Files
review-bot/docs/DESIGN-57-yaml-persona.md
T
claw 3a1e5e443e fix(review): address feedback from reviews 2788, 2789, 2791
- Move nodeCount increment after cycle detection to avoid over-counting
  cyclic references (sonnet #2)
- Use underscores in test case names used as filenames (sonnet #3)
- Fix function comment: 'prevent silent data loss' → 'prevent confusing
  behavior where additional documents are silently ignored' (sonnet #4)
- Mark design doc pseudocode as historical since implementation uses
  goccy/go-yaml ast.Node, not gopkg.in/yaml.v3 yaml.Node (sonnet #5)
2026-05-13 06:44:13 -07:00

4.3 KiB

Design: YAML Support for Persona Files (#57)

Problem

JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.

Constraints

  • Backwards compatibility: existing JSON personas must continue to work
  • Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
  • Consistency: use .yaml extension (not .yml)
  • Library: use github.com/goccy/go-yaml v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation

Proposed Approach

  1. Update parsePersona to detect format from file extension
  2. Add YAML parsing with explicit depth limit (defense in depth)
  3. Keep JSON as fallback for files without .yaml/.yml extension
  4. Convert built-in personas to YAML format
  5. Update embed directive to include both formats

File Extension Detection

func parsePersona(data []byte, source string) (*Persona, error) {
    isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
    if isYAML {
        return parseYAML(data, source)
    }
    return parseJSON(data, source)
}

YAML Parsing with Depth Protection

Note: The pseudocode below reflects the initial design using gopkg.in/yaml.v3 types (yaml.Node). The actual implementation uses github.com/goccy/go-yaml with ast.Node-based traversal, dual-map cycle/depth tracking, and node-count limits. See review/persona.go for the current implementation.

func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
    var node yaml.Node
    dec := yaml.NewDecoder(bytes.NewReader(data))
    if err := dec.Decode(&node); err != nil {
        return err
    }
    if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
        return err
    }
    return node.Decode(out)
}

func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
    if depth > maxDepth {
        return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
    }
    // Handle alias nodes by following the Alias pointer
    if node.Kind == yaml.AliasNode && node.Alias != nil {
        return checkYAMLDepth(node.Alias, depth, maxDepth)
    }
    for _, child := range node.Content {
        if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
            return err
        }
    }
    return nil
}

We implement a custom AST-based depth/node-count walk (checkYAMLDepth) rather than relying on library decoder options. This gives us precise control over how depth is counted across aliases and anchors, with a depth-aware validated map to prevent alias depth bypass.

State/Data Model

No new state. Same Persona struct, just different parsing.

Error Cases

Error Handling
Invalid YAML syntax Return parse error with source file
Deeply nested YAML Library rejects (v1.16.0+ fix)
Unknown extension Fall back to JSON parsing
Missing required fields Validation rejects after parse

Edge Cases

  • File with .json extension but YAML content → JSON parse fails, user sees error
  • File with no extension → defaults to JSON
  • Embedded persona reference like builtin:security → detect by embed path (personas/X.yaml)

Testing Strategy

  1. Unit tests for YAML parsing (valid, invalid, deeply nested)
  2. Unit tests for extension detection
  3. Integration test for built-in personas (now YAML)
  4. Backwards compat test: verify JSON still works for external files

Completion Checklist

  1. go-yaml dependency added at v1.16.0+
  2. Extension detection uses case-insensitive comparison
  3. YAML parse errors include source file name
  4. JSON parsing still works for .json files
  5. Built-in personas converted to YAML with readable multi-line strings
  6. Embed directive updated to include *.yaml
  7. Test for deeply nested YAML rejection
  8. All existing tests pass

Open Questions

  • Should we support both .yaml AND .yml? Issue says .yaml only for consistency, but some users expect .yml. Decision: Support both for reading, recommend .yaml in docs.
  • Should we add a "format" field to detect mismatched extension/content? Decision: No, keep it simple. Extension determines format.