c3e8f0f231
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m32s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m53s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 10m52s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 11m0s
MAJOR fixes: - Remove false security claim about gopkg.in/yaml.v3 having built-in depth protection - Add explicit YAML depth limiting via yaml.Node API (MaxYAMLDepth=20) - Add file size limit for persona files (MaxPersonaFileSize=64KB) - Add test for deeply nested YAML rejection MINOR fixes: - Add sort.Strings to ListBuiltinPersonas for deterministic ordering - Update design doc to reflect actual library used (gopkg.in/yaml.v3) - Update README: 'Zero dependencies' → 'Minimal dependencies' - Add test for file size limit - Add test for sorted persona list
3.4 KiB
3.4 KiB
Design: YAML Support for Persona Files (#57)
Problem
JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.
Constraints
- Backwards compatibility: existing JSON personas must continue to work
- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
- Consistency: use
.yamlextension (not.yml) - Library: use
gopkg.in/yaml.v3(approved in CONVENTIONS.md) with explicit depth limiting
Proposed Approach
- Update
parsePersonato detect format from file extension - Add YAML parsing with explicit depth limit (defense in depth)
- Keep JSON as fallback for files without
.yaml/.ymlextension - Convert built-in personas to YAML format
- Update embed directive to include both formats
File Extension Detection
func parsePersona(data []byte, source string) (*Persona, error) {
isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
if isYAML {
return parseYAML(data, source)
}
return parseJSON(data, source)
}
YAML Parsing with Depth Protection
func parseYAML(data []byte, source string) (*Persona, error) {
var p Persona
// gopkg.in/yaml.v3 does NOT have built-in depth limiting.
// Use explicit depth check via yaml.Node API.
if err := yaml.Unmarshal(data, &p); err != nil {
return nil, fmt.Errorf("parse persona %s: %w", source, err)
}
if err := validatePersona(&p, source); err != nil {
return nil, err
}
return &p, nil
}
The gopkg.in/yaml.v3 library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a yaml.Node, walking the tree to verify depth, then decoding into the target struct.
State/Data Model
No new state. Same Persona struct, just different parsing.
Error Cases
| Error | Handling |
|---|---|
| Invalid YAML syntax | Return parse error with source file |
| Deeply nested YAML | Library rejects (v1.16.0+ fix) |
| Unknown extension | Fall back to JSON parsing |
| Missing required fields | Validation rejects after parse |
Edge Cases
- File with
.jsonextension but YAML content → JSON parse fails, user sees error - File with no extension → defaults to JSON
- Embedded persona reference like
builtin:security→ detect by embed path (personas/X.yaml)
Testing Strategy
- Unit tests for YAML parsing (valid, invalid, deeply nested)
- Unit tests for extension detection
- Integration test for built-in personas (now YAML)
- Backwards compat test: verify JSON still works for external files
Completion Checklist
go-yamldependency added at v1.16.0+- Extension detection uses case-insensitive comparison
- YAML parse errors include source file name
- JSON parsing still works for
.jsonfiles - Built-in personas converted to YAML with readable multi-line strings
- Embed directive updated to include
*.yaml - Test for deeply nested YAML rejection
- All existing tests pass
Open Questions
- Should we support both
.yamlAND.yml? Issue says.yamlonly for consistency, but some users expect.yml. Decision: Support both for reading, recommend.yamlin docs. - Should we add a "format" field to detect mismatched extension/content? Decision: No, keep it simple. Extension determines format.