3a1e5e443e
- Move nodeCount increment after cycle detection to avoid over-counting cyclic references (sonnet #2) - Use underscores in test case names used as filenames (sonnet #3) - Fix function comment: 'prevent silent data loss' → 'prevent confusing behavior where additional documents are silently ignored' (sonnet #4) - Mark design doc pseudocode as historical since implementation uses goccy/go-yaml ast.Node, not gopkg.in/yaml.v3 yaml.Node (sonnet #5)
114 lines
4.3 KiB
Markdown
114 lines
4.3 KiB
Markdown
# Design: YAML Support for Persona Files (#57)
|
|
|
|
## Problem
|
|
|
|
JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.
|
|
|
|
## Constraints
|
|
|
|
- Backwards compatibility: existing JSON personas must continue to work
|
|
- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
|
|
- Consistency: use `.yaml` extension (not `.yml`)
|
|
- Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation
|
|
|
|
## Proposed Approach
|
|
|
|
1. **Update `parsePersona`** to detect format from file extension
|
|
2. **Add YAML parsing** with explicit depth limit (defense in depth)
|
|
3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension
|
|
4. **Convert built-in personas** to YAML format
|
|
5. **Update embed directive** to include both formats
|
|
|
|
### File Extension Detection
|
|
|
|
```go
|
|
func parsePersona(data []byte, source string) (*Persona, error) {
|
|
isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
|
|
if isYAML {
|
|
return parseYAML(data, source)
|
|
}
|
|
return parseJSON(data, source)
|
|
}
|
|
```
|
|
|
|
### YAML Parsing with Depth Protection
|
|
|
|
> **Note:** The pseudocode below reflects the initial design using `gopkg.in/yaml.v3`
|
|
> types (`yaml.Node`). The actual implementation uses `github.com/goccy/go-yaml`
|
|
> with `ast.Node`-based traversal, dual-map cycle/depth tracking, and node-count
|
|
> limits. See `review/persona.go` for the current implementation.
|
|
|
|
```go
|
|
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
|
|
var node yaml.Node
|
|
dec := yaml.NewDecoder(bytes.NewReader(data))
|
|
if err := dec.Decode(&node); err != nil {
|
|
return err
|
|
}
|
|
if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
|
|
return err
|
|
}
|
|
return node.Decode(out)
|
|
}
|
|
|
|
func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
|
|
if depth > maxDepth {
|
|
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
|
|
}
|
|
// Handle alias nodes by following the Alias pointer
|
|
if node.Kind == yaml.AliasNode && node.Alias != nil {
|
|
return checkYAMLDepth(node.Alias, depth, maxDepth)
|
|
}
|
|
for _, child := range node.Content {
|
|
if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
|
|
return err
|
|
}
|
|
}
|
|
return nil
|
|
}
|
|
```
|
|
|
|
We implement a custom AST-based depth/node-count walk (`checkYAMLDepth`) rather than relying on library decoder options. This gives us precise control over how depth is counted across aliases and anchors, with a depth-aware validated map to prevent alias depth bypass.
|
|
|
|
## State/Data Model
|
|
|
|
No new state. Same `Persona` struct, just different parsing.
|
|
|
|
## Error Cases
|
|
|
|
| Error | Handling |
|
|
|-------|----------|
|
|
| Invalid YAML syntax | Return parse error with source file |
|
|
| Deeply nested YAML | Library rejects (v1.16.0+ fix) |
|
|
| Unknown extension | Fall back to JSON parsing |
|
|
| Missing required fields | Validation rejects after parse |
|
|
|
|
## Edge Cases
|
|
|
|
- File with `.json` extension but YAML content → JSON parse fails, user sees error
|
|
- File with no extension → defaults to JSON
|
|
- Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`)
|
|
|
|
## Testing Strategy
|
|
|
|
1. Unit tests for YAML parsing (valid, invalid, deeply nested)
|
|
2. Unit tests for extension detection
|
|
3. Integration test for built-in personas (now YAML)
|
|
4. Backwards compat test: verify JSON still works for external files
|
|
|
|
## Completion Checklist
|
|
|
|
1. [ ] `go-yaml` dependency added at v1.16.0+
|
|
2. [ ] Extension detection uses case-insensitive comparison
|
|
3. [ ] YAML parse errors include source file name
|
|
4. [ ] JSON parsing still works for `.json` files
|
|
5. [ ] Built-in personas converted to YAML with readable multi-line strings
|
|
6. [ ] Embed directive updated to include `*.yaml`
|
|
7. [ ] Test for deeply nested YAML rejection
|
|
8. [ ] All existing tests pass
|
|
|
|
## Open Questions
|
|
|
|
- Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs.
|
|
- Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format.
|