# Design: YAML Support for Persona Files (#57) ## Problem JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability. ## Constraints - Backwards compatibility: existing JSON personas must continue to work - Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486) - Consistency: use `.yaml` extension (not `.yml`) - Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation ## Proposed Approach 1. **Update `parsePersona`** to detect format from file extension 2. **Add YAML parsing** with explicit depth limit (defense in depth) 3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension 4. **Convert built-in personas** to YAML format 5. **Update embed directive** to include both formats ### File Extension Detection ```go func parsePersona(data []byte, source string) (*Persona, error) { isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml") if isYAML { return parseYAML(data, source) } return parseJSON(data, source) } ``` ### YAML Parsing with Depth Protection We implement a custom AST-based depth/node-count walk (`checkYAMLDepth` in `review/persona.go`) rather than relying on library decoder options. Key design decisions: - **Library:** `github.com/goccy/go-yaml` with `ast.Node`-based traversal - **Dual-map tracking:** `validated` (depth-aware short-circuit) + `visiting` (cycle detection) - **Node-count limit:** Conservative overcounting bounds total validation work - **Alias-aware depth:** Aliases increment depth and are re-checked when encountered at greater depths See `review/persona.go:checkYAMLDepth` for the authoritative implementation. ## State/Data Model No new state. Same `Persona` struct, just different parsing. ## Error Cases | Error | Handling | |-------|----------| | Invalid YAML syntax | Return parse error with source file | | Deeply nested YAML | Custom AST walk (`checkYAMLDepth`) rejects before decode | | Unknown extension | Fall back to JSON parsing | | Missing required fields | Validation rejects after parse | ## Edge Cases - File with `.json` extension but YAML content → JSON parse fails, user sees error - File with no extension → defaults to JSON - Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`) ## Testing Strategy 1. Unit tests for YAML parsing (valid, invalid, deeply nested) 2. Unit tests for extension detection 3. Integration test for built-in personas (now YAML) 4. Backwards compat test: verify JSON still works for external files ## Completion Checklist 1. [ ] `go-yaml` dependency added at v1.16.0+ 2. [ ] Extension detection uses case-insensitive comparison 3. [ ] YAML parse errors include source file name 4. [ ] JSON parsing still works for `.json` files 5. [ ] Built-in personas converted to YAML with readable multi-line strings 6. [ ] Embed directive updated to include `*.yaml` 7. [ ] Test for deeply nested YAML rejection 8. [ ] All existing tests pass ## Open Questions - Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs. - Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format.