Files
review-bot/docs/DESIGN-57-yaml-persona.md
T
claw b5f17ddfc4
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 38s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m18s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m20s
fix(security): prevent alias depth bypass in YAML validator
The global 'seen' set allowed anchored subtrees validated at a shallow
depth to be skipped when later referenced via alias at a greater depth.
This could let effective nesting exceed MaxYAMLDepth, enabling DoS.

Fix: replace the single 'seen' set with two tracking maps:
- validated (node -> min depth): only short-circuits when current depth
  <= previously validated depth; re-checks at deeper contexts.
- visiting (node -> bool): per-path recursion stack for true cycle
  detection (breaks alias loops without suppressing depth checks).

Add TestYAMLAliasDepthBypass that constructs a document with an
anchored 15-level subtree referenced via alias under 6 levels of
nesting, verifying the combined effective depth (22) is rejected.

Addresses security-review-bot findings on review #2774.
2026-05-12 14:07:05 -07:00

4.0 KiB

Design: YAML Support for Persona Files (#57)

Problem

JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.

Constraints

  • Backwards compatibility: existing JSON personas must continue to work
  • Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
  • Consistency: use .yaml extension (not .yml)
  • Library: use github.com/goccy/go-yaml v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation

Proposed Approach

  1. Update parsePersona to detect format from file extension
  2. Add YAML parsing with explicit depth limit (defense in depth)
  3. Keep JSON as fallback for files without .yaml/.yml extension
  4. Convert built-in personas to YAML format
  5. Update embed directive to include both formats

File Extension Detection

func parsePersona(data []byte, source string) (*Persona, error) {
    isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
    if isYAML {
        return parseYAML(data, source)
    }
    return parseJSON(data, source)
}

YAML Parsing with Depth Protection

func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
    var node yaml.Node
    dec := yaml.NewDecoder(bytes.NewReader(data))
    if err := dec.Decode(&node); err != nil {
        return err
    }
    if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
        return err
    }
    return node.Decode(out)
}

func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
    if depth > maxDepth {
        return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
    }
    // Handle alias nodes by following the Alias pointer
    if node.Kind == yaml.AliasNode && node.Alias != nil {
        return checkYAMLDepth(node.Alias, depth, maxDepth)
    }
    for _, child := range node.Content {
        if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
            return err
        }
    }
    return nil
}

We implement a custom AST-based depth/node-count walk (checkYAMLDepth) rather than relying on library decoder options. This gives us precise control over how depth is counted across aliases and anchors, with a depth-aware validated map to prevent alias depth bypass.

State/Data Model

No new state. Same Persona struct, just different parsing.

Error Cases

Error Handling
Invalid YAML syntax Return parse error with source file
Deeply nested YAML Library rejects (v1.16.0+ fix)
Unknown extension Fall back to JSON parsing
Missing required fields Validation rejects after parse

Edge Cases

  • File with .json extension but YAML content → JSON parse fails, user sees error
  • File with no extension → defaults to JSON
  • Embedded persona reference like builtin:security → detect by embed path (personas/X.yaml)

Testing Strategy

  1. Unit tests for YAML parsing (valid, invalid, deeply nested)
  2. Unit tests for extension detection
  3. Integration test for built-in personas (now YAML)
  4. Backwards compat test: verify JSON still works for external files

Completion Checklist

  1. go-yaml dependency added at v1.16.0+
  2. Extension detection uses case-insensitive comparison
  3. YAML parse errors include source file name
  4. JSON parsing still works for .json files
  5. Built-in personas converted to YAML with readable multi-line strings
  6. Embed directive updated to include *.yaml
  7. Test for deeply nested YAML rejection
  8. All existing tests pass

Open Questions

  • Should we support both .yaml AND .yml? Issue says .yaml only for consistency, but some users expect .yml. Decision: Support both for reading, recommend .yaml in docs.
  • Should we add a "format" field to detect mismatched extension/content? Decision: No, keep it simple. Extension determines format.