Files
review-bot/docs/DESIGN-57-yaml-persona.md
T
Rodin c3e8f0f231
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m32s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m53s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 10m52s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 11m0s
fix: address PR review findings
MAJOR fixes:
- Remove false security claim about gopkg.in/yaml.v3 having built-in depth protection
- Add explicit YAML depth limiting via yaml.Node API (MaxYAMLDepth=20)
- Add file size limit for persona files (MaxPersonaFileSize=64KB)
- Add test for deeply nested YAML rejection

MINOR fixes:
- Add sort.Strings to ListBuiltinPersonas for deterministic ordering
- Update design doc to reflect actual library used (gopkg.in/yaml.v3)
- Update README: 'Zero dependencies' → 'Minimal dependencies'
- Add test for file size limit
- Add test for sorted persona list
2026-05-10 14:43:31 -07:00

3.4 KiB

Design: YAML Support for Persona Files (#57)

Problem

JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.

Constraints

  • Backwards compatibility: existing JSON personas must continue to work
  • Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
  • Consistency: use .yaml extension (not .yml)
  • Library: use gopkg.in/yaml.v3 (approved in CONVENTIONS.md) with explicit depth limiting

Proposed Approach

  1. Update parsePersona to detect format from file extension
  2. Add YAML parsing with explicit depth limit (defense in depth)
  3. Keep JSON as fallback for files without .yaml/.yml extension
  4. Convert built-in personas to YAML format
  5. Update embed directive to include both formats

File Extension Detection

func parsePersona(data []byte, source string) (*Persona, error) {
    isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
    if isYAML {
        return parseYAML(data, source)
    }
    return parseJSON(data, source)
}

YAML Parsing with Depth Protection

func parseYAML(data []byte, source string) (*Persona, error) {
    var p Persona
    // gopkg.in/yaml.v3 does NOT have built-in depth limiting.
    // Use explicit depth check via yaml.Node API.
    if err := yaml.Unmarshal(data, &p); err != nil {
        return nil, fmt.Errorf("parse persona %s: %w", source, err)
    }
    if err := validatePersona(&p, source); err != nil {
        return nil, err
    }
    return &p, nil
}

The gopkg.in/yaml.v3 library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a yaml.Node, walking the tree to verify depth, then decoding into the target struct.

State/Data Model

No new state. Same Persona struct, just different parsing.

Error Cases

Error Handling
Invalid YAML syntax Return parse error with source file
Deeply nested YAML Library rejects (v1.16.0+ fix)
Unknown extension Fall back to JSON parsing
Missing required fields Validation rejects after parse

Edge Cases

  • File with .json extension but YAML content → JSON parse fails, user sees error
  • File with no extension → defaults to JSON
  • Embedded persona reference like builtin:security → detect by embed path (personas/X.yaml)

Testing Strategy

  1. Unit tests for YAML parsing (valid, invalid, deeply nested)
  2. Unit tests for extension detection
  3. Integration test for built-in personas (now YAML)
  4. Backwards compat test: verify JSON still works for external files

Completion Checklist

  1. go-yaml dependency added at v1.16.0+
  2. Extension detection uses case-insensitive comparison
  3. YAML parse errors include source file name
  4. JSON parsing still works for .json files
  5. Built-in personas converted to YAML with readable multi-line strings
  6. Embed directive updated to include *.yaml
  7. Test for deeply nested YAML rejection
  8. All existing tests pass

Open Questions

  • Should we support both .yaml AND .yml? Issue says .yaml only for consistency, but some users expect .yml. Decision: Support both for reading, recommend .yaml in docs.
  • Should we add a "format" field to detect mismatched extension/content? Decision: No, keep it simple. Extension determines format.