fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml #89

Merged
aweiker merged 13 commits from review-bot-issue-87 into main 2026-05-13 03:47:02 +00:00
3 changed files with 17 additions and 10 deletions
Showing only changes of commit 80091fb080 - Show all commits
+5
View File
2
@@ -33,6 +33,11 @@ func parsePersona(data []byte, source string) (*Persona, error) {
Review

[MINOR] The design doc references the updated approach correctly but removes the prior code example and now relies on narrative. Consider adding a brief code snippet or explicit mention that MergeKey (<<) merges are traversed to avoid confusion about edge cases.

**[MINOR]** The design doc references the updated approach correctly but removes the prior code example and now relies on narrative. Consider adding a brief code snippet or explicit mention that MergeKey (<<) merges are traversed to avoid confusion about edge cases.
### YAML Parsing with Depth Protection
> **Note:** The pseudocode below reflects the initial design using `gopkg.in/yaml.v3`
> types (`yaml.Node`). The actual implementation uses `github.com/goccy/go-yaml`
> with `ast.Node`-based traversal, dual-map cycle/depth tracking, and node-count
> limits. See `review/persona.go` for the current implementation.
```go
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
var node yaml.Node
+9 -7
View File
3
@@ -161,7 +161,8 @@ func parsePersona(data []byte, source string) (*Persona, error) {
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
// and strict field checking. This protects against stack exhaustion from deeply
// nested structures and catches typos in field names.
// Multi-document YAML files are rejected to prevent silent data loss.
// Multi-document YAML files are rejected to prevent confusing behavior
// where additional documents are silently ignored.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
// First pass: parse into AST to check depth limits, node counts, and
// multi-document rejection. This prevents stack exhaustion before we
10
@@ -214,12 +215,6 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
*nodeCount++
if *nodeCount > maxNodes {
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
}
// Cycle detection: if we're currently visiting this node on the current
// recursion path, it's a cycle (e.g., alias pointing to an ancestor).
// Return nil to break the cycle without error — cycles are a structural
@@ -228,6 +223,13 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
return nil
}
Review

[MINOR] The validated map stores depth as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the current visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, validated[node] becomes 5 — the shorter path. The short-circuit condition depth <= prevDepth then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to visitedAtDepth and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.

**[MINOR]** The `validated` map stores `depth` as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the *current* visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, `validated[node]` becomes 5 — the shorter path. The short-circuit condition `depth <= prevDepth` then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to `visitedAtDepth` and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.
Review

[MAJOR] checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.

**[MAJOR]** checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
// Placed after cycle detection to avoid over-counting cyclic references.
*nodeCount++
if *nodeCount > maxNodes {
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
}
// Depth-aware short-circuit: only skip re-checking a node if we previously
// validated it at the same or deeper effective depth. If this visit is at a
Review

[MINOR] The validated map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by ast.Node (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth" — the map is actually storing the current visit's depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.

**[MINOR]** The `validated` map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by `ast.Node` (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth" — the map is actually storing the *current visit's* depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.
// greater depth than before (e.g., alias referenced deeper in the tree),
7
+3 -3
View File
12
@@ -491,9 +491,9 @@ func TestYAMLEmptyFileRejection(t *testing.T) {
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
name string
content string
Review

[MINOR] The TestYAMLEmptyFileRejection test uses tc.name (e.g., "completely empty") as the filename fragment but writes all files to the same dir. The test case names contain spaces which are valid in filenames but could cause issues on some filesystems. More importantly, the test writes files with names like completely empty.yaml — using spaces in filenames is unusual and could cause subtle test issues. Prefer replacing spaces with underscores or hyphens.

**[MINOR]** The `TestYAMLEmptyFileRejection` test uses `tc.name` (e.g., `"completely empty"`) as the filename fragment but writes all files to the same `dir`. The test case names contain spaces which are valid in filenames but could cause issues on some filesystems. More importantly, the test writes files with names like `completely empty.yaml` — using spaces in filenames is unusual and could cause subtle test issues. Prefer replacing spaces with underscores or hyphens.
Review

[NIT] TestYAMLEmptyFileRejection has a redundant if err != nil guard on the second condition: if err != nil && !strings.Contains(...). Since the preceding if err == nil { t.Error(...); return } (implicit — actually the test doesn't return on the first error, it falls through) — actually the first check is if err == nil { t.Error(...) } without a return, so err could still be nil here. This means the strings.Contains check is guarded correctly with err != nil, but the test won't t.Error the wrong message if err == nil (it will have already errored on the first check). The pattern is slightly unusual compared to the rest of the file which uses early returns. Consistent style would be to add a return after the first t.Error, then drop the err != nil guard.

**[NIT]** `TestYAMLEmptyFileRejection` has a redundant `if err != nil` guard on the second condition: `if err != nil && !strings.Contains(...)`. Since the preceding `if err == nil { t.Error(...); return }` (implicit — actually the test doesn't return on the first error, it falls through) — actually the first check is `if err == nil { t.Error(...) }` without a return, so `err` could still be nil here. This means the `strings.Contains` check is guarded correctly with `err != nil`, but the test won't `t.Error` the wrong message if `err == nil` (it will have already errored on the first check). The pattern is slightly unusual compared to the rest of the file which uses early returns. Consistent style would be to add a `return` after the first `t.Error`, then drop the `err != nil` guard.
}{
{"completely empty", ""},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"whitespace only", " \n\n "},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"comment only", "# just a comment\n"},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"completely_empty", ""},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"whitespace_only", " \n\n "},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"comment_only", "# just a comment\n"},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
}
for _, tc := range tests {
1
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.