Files
review-bot/docs/DESIGN-57-yaml-persona.md
T
claw 5cedeee9f4
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 39s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m12s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m33s
address self-review findings on PR #89
MINOR fixes:
- docs/DESIGN-57-yaml-persona.md: fix Error Cases table entry to reflect
  custom AST walk (checkYAMLDepth) instead of stale library-level reference
- review/persona.go: add EOF check after JSON decode to reject trailing
  garbage after a valid JSON object (prevents silent acceptance of malformed
  input like '{"name":"x"}garbage')
- review/persona_test.go: add TestJSONTrailingContentRejected test

NIT fixes:
- review/persona.go: add default case to checkYAMLDepth switch with
  explanatory comment about scalar leaf nodes
- review/persona.go: document AnchorNode depth+1 conservative asymmetry
- review/persona.go: simplify redundant if-guard in ListBuiltinPersonas
2026-05-12 14:42:22 -07:00

88 lines
3.5 KiB
Markdown

# Design: YAML Support for Persona Files (#57)
## Problem
JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.
## Constraints
- Backwards compatibility: existing JSON personas must continue to work
- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
- Consistency: use `.yaml` extension (not `.yml`)
- Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation
## Proposed Approach
1. **Update `parsePersona`** to detect format from file extension
2. **Add YAML parsing** with explicit depth limit (defense in depth)
3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension
4. **Convert built-in personas** to YAML format
5. **Update embed directive** to include both formats
### File Extension Detection
```go
func parsePersona(data []byte, source string) (*Persona, error) {
isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
if isYAML {
return parseYAML(data, source)
}
return parseJSON(data, source)
}
```
### YAML Parsing with Depth Protection
We implement a custom AST-based depth/node-count walk (`checkYAMLDepth` in
`review/persona.go`) rather than relying on library decoder options. Key design
decisions:
- **Library:** `github.com/goccy/go-yaml` with `ast.Node`-based traversal
- **Dual-map tracking:** `validated` (depth-aware short-circuit) + `visiting` (cycle detection)
- **Node-count limit:** Conservative overcounting bounds total validation work
- **Alias-aware depth:** Aliases increment depth and are re-checked when encountered at greater depths
See `review/persona.go:checkYAMLDepth` for the authoritative implementation.
## State/Data Model
No new state. Same `Persona` struct, just different parsing.
## Error Cases
| Error | Handling |
|-------|----------|
| Invalid YAML syntax | Return parse error with source file |
| Deeply nested YAML | Custom AST walk (`checkYAMLDepth`) rejects before decode |
| Unknown extension | Fall back to JSON parsing |
| Missing required fields | Validation rejects after parse |
## Edge Cases
- File with `.json` extension but YAML content → JSON parse fails, user sees error
- File with no extension → defaults to JSON
- Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`)
## Testing Strategy
1. Unit tests for YAML parsing (valid, invalid, deeply nested)
2. Unit tests for extension detection
3. Integration test for built-in personas (now YAML)
4. Backwards compat test: verify JSON still works for external files
## Completion Checklist
1. [ ] `go-yaml` dependency added at v1.16.0+
2. [ ] Extension detection uses case-insensitive comparison
3. [ ] YAML parse errors include source file name
4. [ ] JSON parsing still works for `.json` files
5. [ ] Built-in personas converted to YAML with readable multi-line strings
6. [ ] Embed directive updated to include `*.yaml`
7. [ ] Test for deeply nested YAML rejection
8. [ ] All existing tests pass
## Open Questions
- Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs.
- Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format.