# Design: YAML Support for Persona Files (#57) ## Problem JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability. ## Constraints - Backwards compatibility: existing JSON personas must continue to work - Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486) - Consistency: use `.yaml` extension (not `.yml`) - Library: use `gopkg.in/yaml.v3` (approved in CONVENTIONS.md) with explicit depth limiting ## Proposed Approach 1. **Update `parsePersona`** to detect format from file extension 2. **Add YAML parsing** with explicit depth limit (defense in depth) 3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension 4. **Convert built-in personas** to YAML format 5. **Update embed directive** to include both formats ### File Extension Detection ```go func parsePersona(data []byte, source string) (*Persona, error) { isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml") if isYAML { return parseYAML(data, source) } return parseJSON(data, source) } ``` ### YAML Parsing with Depth Protection ```go func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error { var node yaml.Node dec := yaml.NewDecoder(bytes.NewReader(data)) if err := dec.Decode(&node); err != nil { return err } if err := checkYAMLDepth(&node, 0, maxDepth); err != nil { return err } return node.Decode(out) } func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error { if depth > maxDepth { return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth) } // Handle alias nodes by following the Alias pointer if node.Kind == yaml.AliasNode && node.Alias != nil { return checkYAMLDepth(node.Alias, depth, maxDepth) } for _, child := range node.Content { if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil { return err } } return nil } ``` The `gopkg.in/yaml.v3` library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a `yaml.Node`, walking the tree to verify depth (including alias resolution), then decoding into the target struct. ## State/Data Model No new state. Same `Persona` struct, just different parsing. ## Error Cases | Error | Handling | |-------|----------| | Invalid YAML syntax | Return parse error with source file | | Deeply nested YAML | Library rejects (v1.16.0+ fix) | | Unknown extension | Fall back to JSON parsing | | Missing required fields | Validation rejects after parse | ## Edge Cases - File with `.json` extension but YAML content → JSON parse fails, user sees error - File with no extension → defaults to JSON - Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`) ## Testing Strategy 1. Unit tests for YAML parsing (valid, invalid, deeply nested) 2. Unit tests for extension detection 3. Integration test for built-in personas (now YAML) 4. Backwards compat test: verify JSON still works for external files ## Completion Checklist 1. [ ] `go-yaml` dependency added at v1.16.0+ 2. [ ] Extension detection uses case-insensitive comparison 3. [ ] YAML parse errors include source file name 4. [ ] JSON parsing still works for `.json` files 5. [ ] Built-in personas converted to YAML with readable multi-line strings 6. [ ] Embed directive updated to include `*.yaml` 7. [ ] Test for deeply nested YAML rejection 8. [ ] All existing tests pass ## Open Questions - Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs. - Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format.