b5f17ddfc4
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 38s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m18s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m20s
The global 'seen' set allowed anchored subtrees validated at a shallow depth to be skipped when later referenced via alias at a greater depth. This could let effective nesting exceed MaxYAMLDepth, enabling DoS. Fix: replace the single 'seen' set with two tracking maps: - validated (node -> min depth): only short-circuits when current depth <= previously validated depth; re-checks at deeper contexts. - visiting (node -> bool): per-path recursion stack for true cycle detection (breaks alias loops without suppressing depth checks). Add TestYAMLAliasDepthBypass that constructs a document with an anchored 15-level subtree referenced via alias under 6 levels of nesting, verifying the combined effective depth (22) is rejected. Addresses security-review-bot findings on review #2774.
109 lines
4.0 KiB
Markdown
109 lines
4.0 KiB
Markdown
# Design: YAML Support for Persona Files (#57)
|
|
|
|
## Problem
|
|
|
|
JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.
|
|
|
|
## Constraints
|
|
|
|
- Backwards compatibility: existing JSON personas must continue to work
|
|
- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
|
|
- Consistency: use `.yaml` extension (not `.yml`)
|
|
- Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation
|
|
|
|
## Proposed Approach
|
|
|
|
1. **Update `parsePersona`** to detect format from file extension
|
|
2. **Add YAML parsing** with explicit depth limit (defense in depth)
|
|
3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension
|
|
4. **Convert built-in personas** to YAML format
|
|
5. **Update embed directive** to include both formats
|
|
|
|
### File Extension Detection
|
|
|
|
```go
|
|
func parsePersona(data []byte, source string) (*Persona, error) {
|
|
isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
|
|
if isYAML {
|
|
return parseYAML(data, source)
|
|
}
|
|
return parseJSON(data, source)
|
|
}
|
|
```
|
|
|
|
### YAML Parsing with Depth Protection
|
|
|
|
```go
|
|
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
|
|
var node yaml.Node
|
|
dec := yaml.NewDecoder(bytes.NewReader(data))
|
|
if err := dec.Decode(&node); err != nil {
|
|
return err
|
|
}
|
|
if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
|
|
return err
|
|
}
|
|
return node.Decode(out)
|
|
}
|
|
|
|
func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
|
|
if depth > maxDepth {
|
|
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
|
|
}
|
|
// Handle alias nodes by following the Alias pointer
|
|
if node.Kind == yaml.AliasNode && node.Alias != nil {
|
|
return checkYAMLDepth(node.Alias, depth, maxDepth)
|
|
}
|
|
for _, child := range node.Content {
|
|
if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
|
|
return err
|
|
}
|
|
}
|
|
return nil
|
|
}
|
|
```
|
|
|
|
We implement a custom AST-based depth/node-count walk (`checkYAMLDepth`) rather than relying on library decoder options. This gives us precise control over how depth is counted across aliases and anchors, with a depth-aware validated map to prevent alias depth bypass.
|
|
|
|
## State/Data Model
|
|
|
|
No new state. Same `Persona` struct, just different parsing.
|
|
|
|
## Error Cases
|
|
|
|
| Error | Handling |
|
|
|-------|----------|
|
|
| Invalid YAML syntax | Return parse error with source file |
|
|
| Deeply nested YAML | Library rejects (v1.16.0+ fix) |
|
|
| Unknown extension | Fall back to JSON parsing |
|
|
| Missing required fields | Validation rejects after parse |
|
|
|
|
## Edge Cases
|
|
|
|
- File with `.json` extension but YAML content → JSON parse fails, user sees error
|
|
- File with no extension → defaults to JSON
|
|
- Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`)
|
|
|
|
## Testing Strategy
|
|
|
|
1. Unit tests for YAML parsing (valid, invalid, deeply nested)
|
|
2. Unit tests for extension detection
|
|
3. Integration test for built-in personas (now YAML)
|
|
4. Backwards compat test: verify JSON still works for external files
|
|
|
|
## Completion Checklist
|
|
|
|
1. [ ] `go-yaml` dependency added at v1.16.0+
|
|
2. [ ] Extension detection uses case-insensitive comparison
|
|
3. [ ] YAML parse errors include source file name
|
|
4. [ ] JSON parsing still works for `.json` files
|
|
5. [ ] Built-in personas converted to YAML with readable multi-line strings
|
|
6. [ ] Embed directive updated to include `*.yaml`
|
|
7. [ ] Test for deeply nested YAML rejection
|
|
8. [ ] All existing tests pass
|
|
|
|
## Open Questions
|
|
|
|
- Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs.
|
|
- Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format.
|