fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml #89

Merged
aweiker merged 13 commits from review-bot-issue-87 into main 2026-05-13 03:47:02 +00:00
3 changed files with 20 additions and 37 deletions
Showing only changes of commit 01b6af03a8 - Show all commits
+8 -34
View File
3
@@ -33,42 +33,16 @@ func parsePersona(data []byte, source string) (*Persona, error) {
Review

[MINOR] The design doc references the updated approach correctly but removes the prior code example and now relies on narrative. Consider adding a brief code snippet or explicit mention that MergeKey (<<) merges are traversed to avoid confusion about edge cases.

**[MINOR]** The design doc references the updated approach correctly but removes the prior code example and now relies on narrative. Consider adding a brief code snippet or explicit mention that MergeKey (<<) merges are traversed to avoid confusion about edge cases.
### YAML Parsing with Depth Protection
> **Note:** The pseudocode below reflects the initial design using `gopkg.in/yaml.v3`
> types (`yaml.Node`). The actual implementation uses `github.com/goccy/go-yaml`
> with `ast.Node`-based traversal, dual-map cycle/depth tracking, and node-count
> limits. See `review/persona.go` for the current implementation.
We implement a custom AST-based depth/node-count walk (`checkYAMLDepth` in
`review/persona.go`) rather than relying on library decoder options. Key design
decisions:
```go
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
var node yaml.Node
dec := yaml.NewDecoder(bytes.NewReader(data))
if err := dec.Decode(&node); err != nil {
return err
}
if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
return err
}
return node.Decode(out)
}
- **Library:** `github.com/goccy/go-yaml` with `ast.Node`-based traversal
- **Dual-map tracking:** `validated` (depth-aware short-circuit) + `visiting` (cycle detection)
- **Node-count limit:** Conservative overcounting bounds total validation work
- **Alias-aware depth:** Aliases increment depth and are re-checked when encountered at greater depths
func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
if depth > maxDepth {
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Handle alias nodes by following the Alias pointer
if node.Kind == yaml.AliasNode && node.Alias != nil {
return checkYAMLDepth(node.Alias, depth, maxDepth)
}
for _, child := range node.Content {
if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
return err
}
}
return nil
}
```
We implement a custom AST-based depth/node-count walk (`checkYAMLDepth`) rather than relying on library decoder options. This gives us precise control over how depth is counted across aliases and anchors, with a depth-aware validated map to prevent alias depth bypass.
See `review/persona.go:checkYAMLDepth` for the authoritative implementation.
## State/Data Model
1
+5 -1
View File
34
@@ -224,7 +224,11 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
}
Review

[MINOR] The validated map stores depth as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the current visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, validated[node] becomes 5 — the shorter path. The short-circuit condition depth <= prevDepth then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to visitedAtDepth and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.

**[MINOR]** The `validated` map stores `depth` as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the *current* visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, `validated[node]` becomes 5 — the shorter path. The short-circuit condition `depth <= prevDepth` then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to `visitedAtDepth` and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.
Review

[MAJOR] checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.

**[MAJOR]** checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
// Placed after cycle detection to avoid over-counting cyclic references.
// Placed after cycle detection but before the depth-aware short-circuit. This means
// nodes revisited at shallower depths (via aliases) are counted each time they are
// encountered — intentional conservative overcounting. This bounds the total work
Outdated
Review

[MINOR] Cycle detection in checkYAMLDepth returns nil without error. If the downstream decoder ever mishandles alias cycles, this could allow potentially problematic inputs to proceed, risking DoS. Consider failing closed by returning an explicit error on detected cycles.

**[MINOR]** Cycle detection in checkYAMLDepth returns nil without error. If the downstream decoder ever mishandles alias cycles, this could allow potentially problematic inputs to proceed, risking DoS. Consider failing closed by returning an explicit error on detected cycles.
// performed during validation rather than tracking unique nodes, which is the safer
// security posture for untrusted YAML input.
*nodeCount++
if *nodeCount > maxNodes {
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
Review

[MINOR] The validated map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by ast.Node (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth" — the map is actually storing the current visit's depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.

**[MINOR]** The `validated` map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by `ast.Node` (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth" — the map is actually storing the *current visit's* depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.
10
+7 -2
View File
2
@@ -459,8 +459,13 @@ func TestYAMLDeeplyNestedRejection(t *testing.T) {
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
path := filepath.Join(dir, "deeply-nested.yaml")
// Build a deeply nested YAML structure that exceeds MaxYAMLDepth (20).
Review

[MINOR] TestYAMLDeeplyNestedRejection comment says 'With 25 levels, we exceed MaxYAMLDepth (20)' but the depth counting depends on both MappingNode and MappingValueNode traversal (each contributes depth+1 for key, depth+1 for value). The comment would benefit from a concrete trace of how the depth accumulates to make the test self-documenting and easier to maintain if MaxYAMLDepth changes.

**[MINOR]** TestYAMLDeeplyNestedRejection comment says 'With 25 levels, we exceed MaxYAMLDepth (20)' but the depth counting depends on both MappingNode and MappingValueNode traversal (each contributes depth+1 for key, depth+1 for value). The comment would benefit from a concrete trace of how the depth accumulates to make the test self-documenting and easier to maintain if MaxYAMLDepth changes.
// Each nested mapping key generates a MappingValueNode, incrementing depth
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// by 1 per level in the AST walk. With 25 levels, we exceed MaxYAMLDepth (20).
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// Depth accumulation trace for "nested: \n level0: \n level1: ...":
Outdated
Review

[NIT] The comment in TestYAMLDeeplyNestedRejection was updated from 'Each level adds 2 to the depth count (key + value mapping)' to 'Each level adds to the depth count via mapping values.' The new comment is vague — it doesn't explain the actual counting semantics with the new AST walker, making it harder to reason about whether the test correctly triggers the depth limit. Consider specifying exactly how many depth units each level of nesting adds under the new implementation.

**[NIT]** The comment in `TestYAMLDeeplyNestedRejection` was updated from 'Each level adds 2 to the depth count (key + value mapping)' to 'Each level adds to the depth count via mapping values.' The new comment is vague — it doesn't explain the actual counting semantics with the new AST walker, making it harder to reason about whether the test correctly triggers the depth limit. Consider specifying exactly how many depth units each level of nesting adds under the new implementation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// - Document root parsed at depth 0
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// - Root MappingNode children (MappingValueNodes) visited at depth 1
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// - "nested" MappingValueNode: key at depth 2, value at depth 2
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// - Each levelN mapping adds +1 depth (MappingNode → MappingValueNode → value)
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// - After 25 levels: effective depth reaches ~27, well past MaxYAMLDepth (20)
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
// The test uses 25 levels to provide a comfortable margin above the limit.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
var sb strings.Builder
Review

[NIT] The TestYAMLDeeplyNestedRejection comment says 'Each levelN mapping adds +1 depth (MappingNode → MappingValueNode → value)' but the trace shows MappingValueNode contributes depth via both Key and Value children. The actual depth accumulation might differ from the comment's description depending on how goccy/go-yaml structures the AST for nested maps. The test uses 25 levels with a 'comfortable margin' which compensates, but the depth trace comment may be imprecise.

**[NIT]** The `TestYAMLDeeplyNestedRejection` comment says 'Each levelN mapping adds +1 depth (MappingNode → MappingValueNode → value)' but the trace shows MappingValueNode contributes depth via both Key and Value children. The actual depth accumulation might differ from the comment's description depending on how goccy/go-yaml structures the AST for nested maps. The test uses 25 levels with a 'comfortable margin' which compensates, but the depth trace comment may be imprecise.
Review

Acknowledged. The test comment was revised to avoid the imprecise per-level arithmetic:

Exact depth per level depends on AST structure (MappingNode wrapping), but 25 levels reliably exceeds MaxYAMLDepth (20) with comfortable margin.

This intentionally avoids claiming a specific depth-per-level formula. The 25-level margin makes the test robust regardless of the precise per-level increment. No change needed.

Acknowledged. The test comment was revised to avoid the imprecise per-level arithmetic: > Exact depth per level depends on AST structure (MappingNode wrapping), but 25 levels reliably exceeds MaxYAMLDepth (20) with comfortable margin. This intentionally avoids claiming a specific depth-per-level formula. The 25-level margin makes the test robust regardless of the precise per-level increment. No change needed.
sb.WriteString("name: test\nidentity: test\nnested:\n")
indent := " "
10
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.