review-bot/docs/DESIGN-57-yaml-persona.md
Rodin 6035afeea7
fix: address MINOR review findings from c3e8f0f review
2026-05-10 16:29:44 -07:00

Design: YAML Support for Persona Files (#57)

Problem

JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.

Constraints

  • Backwards compatibility: existing JSON personas must continue to work
  • Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
  • Consistency: use the .yaml extension for built-in personas and in docs (.yml is accepted when reading, but not recommended)
  • Library: use gopkg.in/yaml.v3 (approved in CONVENTIONS.md) with explicit depth limiting

Proposed Approach

  1. Update parsePersona to detect format from file extension
  2. Add YAML parsing with explicit depth limit (defense in depth)
  3. Keep JSON as fallback for files without .yaml/.yml extension
  4. Convert built-in personas to YAML format
  5. Update embed directive to include both formats

File Extension Detection

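// Requires the "path/filepath" and "strings" imports.
func parsePersona(data []byte, source string) (*Persona, error) {
    // Compare the extension case-insensitively so "persona.YAML" is treated as YAML.
    switch strings.ToLower(filepath.Ext(source)) {
    case ".yaml", ".yml":
        return parseYAML(data, source)
    }
    // Everything else, including files with no extension, falls back to JSON.
    return parseJSON(data, source)
}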

YAML Parsing with Depth Protection

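// unmarshalYAMLWithDepthLimit parses data into a yaml.Node tree, rejects trees
// nested deeper than maxDepth, and only then decodes into out.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
    var node yaml.Node
    dec := yaml.NewDecoder(bytes.NewReader(data))
    // Step 1: build the node tree without touching the target struct.
    if err := dec.Decode(&node); err != nil {
        return err
    }
    // Step 2: walk the tree, following aliases, before any struct decoding.
    if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
        return err
    }
    // Step 3: the tree is within limits, so decode it into out.
    return node.Decode(out)
}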

func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
    if depth > maxDepth {
        return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
    }
    // Handle alias nodes by following the Alias pointer
    if node.Kind == yaml.AliasNode && node.Alias != nil {
        return checkYAMLDepth(node.Alias, depth, maxDepth)
    }
    for _, child := range node.Content {
        if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
            return err
        }
    }
    return nil
}

The gopkg.in/yaml.v3 library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a yaml.Node, walking the tree to verify depth (including alias resolution), then decoding into the target struct.
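
To make the depth accounting concrete, here is a minimal sketch of the same walk over a stand-in tree. The `node` type and the `nest` helper are hypothetical and exist only so the example runs without the YAML dependency; `node` mirrors just the two fields of yaml.Node that the walk reads (Content and Alias). It is an illustration of the counting rule, not the project's implementation.

```go
package main

import "fmt"

// node is a hypothetical stand-in for yaml.Node, reduced to the two
// fields the depth walk reads.
type node struct {
	alias   *node   // set only on alias nodes, like yaml.Node.Alias
	content []*node // child nodes, like yaml.Node.Content
}

// checkDepth mirrors checkYAMLDepth: each child adds one level,
// while following an alias stays at the current level.
func checkDepth(n *node, depth, maxDepth int) error {
	if depth > maxDepth {
		return fmt.Errorf("nesting depth exceeds maximum (%d)", maxDepth)
	}
	if n.alias != nil {
		return checkDepth(n.alias, depth, maxDepth)
	}
	for _, child := range n.content {
		if err := checkDepth(child, depth+1, maxDepth); err != nil {
			return err
		}
	}
	return nil
}

// nest (hypothetical helper) builds a root with a chain of `levels`
// nested single children below it.
func nest(levels int) *node {
	root := &node{}
	cur := root
	for i := 0; i < levels; i++ {
		child := &node{}
		cur.content = []*node{child}
		cur = child
	}
	return root
}

func main() {
	// Deepest node sits at depth 3, which equals the limit: accepted.
	fmt.Println(checkDepth(nest(3), 0, 3))
	// Deepest node sits at depth 4: rejected.
	fmt.Println(checkDepth(nest(4), 0, 3))

	// An alias is charged for the depth of the subtree it points to,
	// so a shallow document cannot hide nesting behind an anchor.
	deep := nest(3)
	viaAlias := &node{content: []*node{{alias: deep}}}
	fmt.Println(checkDepth(viaAlias, 0, 3))
}
```

Note that the limit is inclusive: a node at exactly maxDepth passes. With yaml.v3, decoding into a yaml.Node yields a document node as the root, so the top-level mapping of a persona file already sits at depth 1.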

State/Data Model

No new state. Same Persona struct, just different parsing.

Error Cases

| Error | Handling |
|---|---|
| Invalid YAML syntax | Return parse error with source file |
| Deeply nested YAML | Explicit depth check rejects it before decoding into Persona (yaml.v3 has no built-in limit) |
| Unknown extension | Fall back to JSON parsing |
| Missing required fields | Validation rejects after parse |
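
The "parse error with source file" and "validation rejects after parse" rows could look roughly like the sketch below, shown for the JSON path so it runs with the standard library only. The trimmed-down `Persona` struct, its field names, the `validate` method, and the error wording are all assumptions for illustration; the real struct and messages may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Persona is a hypothetical, trimmed-down stand-in for the real struct.
type Persona struct {
	Name     string `json:"name"`
	Identity string `json:"identity"`
}

// validate (hypothetical) rejects personas that lack required fields.
func (p *Persona) validate() error {
	if p.Name == "" {
		return errors.New("missing required field: name")
	}
	if p.Identity == "" {
		return errors.New("missing required field: identity")
	}
	return nil
}

// parseJSON decodes data, then validates; every failure is prefixed
// with the source so the user can tell which file is broken.
func parseJSON(data []byte, source string) (*Persona, error) {
	var p Persona
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, fmt.Errorf("persona %s: parse error: %w", source, err)
	}
	if err := p.validate(); err != nil {
		return nil, fmt.Errorf("persona %s: %w", source, err)
	}
	return &p, nil
}

func main() {
	// Truncated JSON: fails at the parse step.
	_, err := parseJSON([]byte(`{"name": `), "personas/broken.json")
	fmt.Println(err)

	// Well-formed JSON without an identity: fails at validation.
	_, err = parseJSON([]byte(`{"name": "security"}`), "personas/partial.json")
	fmt.Println(err)

	// Complete persona: parses and validates.
	p, err := parseJSON([]byte(`{"name": "security", "identity": "reviewer"}`), "personas/ok.json")
	fmt.Println(p.Name, err)
}
```

A parseYAML counterpart would presumably wrap errors the same way, so that both formats report failures in one shape.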

Edge Cases

  • File with .json extension but YAML content → JSON parse fails, user sees error
  • File with no extension → defaults to JSON
  • Embedded persona reference like builtin:security → detect by embed path (personas/X.yaml)
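
The edge cases above reduce to a single rule: the extension alone picks the parser. A small sketch of that rule follows; `detectFormat` is a hypothetical helper (the design folds this logic into parsePersona) and the file names are made up for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// detectFormat (hypothetical) returns "yaml" for .yaml/.yml, compared
// case-insensitively, and "json" for everything else, including
// sources with no extension.
func detectFormat(source string) string {
	switch strings.ToLower(filepath.Ext(source)) {
	case ".yaml", ".yml":
		return "yaml"
	default:
		return "json"
	}
}

func main() {
	for _, source := range []string{
		"personas/security.yaml", // embed path behind a builtin: reference
		"custom.yml",             // accepted for reading
		"Custom.YAML",            // case-insensitive match
		"legacy.json",            // existing JSON personas keep working
		"persona",                // no extension: defaults to JSON
		"notes.txt",              // unknown extension: falls back to JSON
	} {
		fmt.Printf("%-24s -> %s\n", source, detectFormat(source))
	}
}
```

Because content is never sniffed, a .json file containing YAML goes to the JSON parser and fails there, which matches the first edge case.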

Testing Strategy

  1. Unit tests for YAML parsing (valid, invalid, deeply nested)
  2. Unit tests for extension detection
  3. Integration test for built-in personas (now YAML)
  4. Backwards compat test: verify JSON still works for external files

Completion Checklist

  1. gopkg.in/yaml.v3 dependency added, with the explicit depth check in place (the library has no built-in limit)
  2. Extension detection uses case-insensitive comparison
  3. YAML parse errors include source file name
  4. JSON parsing still works for .json files
  5. Built-in personas converted to YAML with readable multi-line strings
  6. Embed directive updated to include *.yaml
  7. Test for deeply nested YAML rejection
  8. All existing tests pass

Open Questions

  • Should we support both .yaml AND .yml? Issue says .yaml only for consistency, but some users expect .yml. Decision: Support both for reading, recommend .yaml in docs.
  • Should we add a "format" field to detect mismatched extension/content? Decision: No, keep it simple. Extension determines format.