fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml #89

Merged
aweiker merged 13 commits from review-bot-issue-87 into main 2026-05-13 03:47:02 +00:00
2 changed files with 18 additions and 11 deletions
Showing only changes of commit b9b7be3b4e - Show all commits
+16 -9
View File
10
@@ -166,11 +166,10 @@ func parsePersona(data []byte, source string) (*Persona, error) {
return &p, nil
Outdated
Review

[MINOR] Depth/node-count enforcement occurs after parser.ParseBytes builds the AST. Extremely deep but small YAML (within 64KB) could still stress the parser via recursive descent before checks run, presenting a potential DoS vector. Consider a pre-parse sanity check (e.g., shallow indentation-depth scan) or using any available parser limits if supported.

**[MINOR]** Depth/node-count enforcement occurs after parser.ParseBytes builds the AST. Extremely deep but small YAML (within 64KB) could still stress the parser via recursive descent before checks run, presenting a potential DoS vector. Consider a pre-parse sanity check (e.g., shallow indentation-depth scan) or using any available parser limits if supported.
}
Outdated
Review

[MINOR] Empty YAML detection now returns a generic "empty YAML document" error; consider adding a unit test for empty YAML files to ensure this path is covered and the message is stable.

**[MINOR]** Empty YAML detection now returns a generic "empty YAML document" error; consider adding a unit test for empty YAML files to ensure this path is covered and the message is stable.
Outdated
Review

[NIT] If go-yaml provides decoder/parser options to enforce depth/node limits during parsing, consider enabling them in addition to the manual AST check to ensure protection prior to AST traversal. This would be a defense-in-depth improvement.

**[NIT]** If go-yaml provides decoder/parser options to enforce depth/node limits during parsing, consider enabling them in addition to the manual AST check to ensure protection prior to AST traversal. This would be a defense-in-depth improvement.
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
// and strict field checking. This protects against stack exhaustion from deeply
// nested structures and catches typos in field names.
// Multi-document YAML files are rejected to prevent confusing behavior
// where additional documents are silently ignored.
// unmarshalYAMLWithDepthLimit unmarshals YAML data with three safety checks:
// - Depth limiting: rejects AST trees exceeding maxDepth to prevent stack exhaustion.
// - Multi-document rejection: prevents silent data loss from ignored extra documents.
// - Strict field checking: rejects unknown YAML keys to catch typos early.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
// First pass: parse into AST to check depth limits, node counts, and
// multi-document rejection. This prevents stack exhaustion before we
25
@@ -247,10 +246,13 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
Outdated
Review

[MAJOR] Alias handling combined with the global 'seen' short-circuit lets deeply nested aliases evade depth checks. The code follows alias nodes (good), but because the target node may already be in the 'seen' set, subsequent deeper traversals are skipped, allowing overall nesting to exceed limits and enabling DoS.

**[MAJOR]** Alias handling combined with the global 'seen' short-circuit lets deeply nested aliases evade depth checks. The code follows alias nodes (good), but because the target node may already be in the 'seen' set, subsequent deeper traversals are skipped, allowing overall nesting to exceed limits and enabling DoS.
Review

[MINOR] The validated map uses ast.Node (interface) as a key, which relies on pointer equality for interface comparison. This works correctly for pointer types like *ast.MappingNode, but if any ast.Node implementation is a value type (non-pointer), two structurally identical nodes could hash to different entries or the same entry incorrectly. In practice, all goccy/go-yaml AST node types appear to be pointer types, so this is low risk, but it's an implicit assumption worth a comment.

**[MINOR]** The `validated` map uses `ast.Node` (interface) as a key, which relies on pointer equality for interface comparison. This works correctly for pointer types like `*ast.MappingNode`, but if any `ast.Node` implementation is a value type (non-pointer), two structurally identical nodes could hash to different entries or the same entry incorrectly. In practice, all goccy/go-yaml AST node types appear to be pointer types, so this is low risk, but it's an implicit assumption worth a comment.
Review

Already addressed. There's an explicit comment at the validated map access (the depth-aware short-circuit block):

Note: using ast.Node (interface) as map key relies on pointer identity, which is correct because all goccy/go-yaml AST node types are pointer receivers (*MappingNode, *SequenceNode, etc.), never value types.

This documents the assumption directly at the usage site. No change needed.

Already addressed. There's an explicit comment at the `validated` map access (the depth-aware short-circuit block): > Note: using ast.Node (interface) as map key relies on pointer identity, which is correct because all goccy/go-yaml AST node types are pointer receivers (*MappingNode, *SequenceNode, etc.), never value types. This documents the assumption directly at the usage site. No change needed.
}
Outdated
Review

[MINOR] The validated[node] = depth assignment stores the current depth as the 'minimum depth at which it was validated'. However the comment says 'maps each node to the minimum depth at which it was previously checked' and the short-circuit condition is depth <= prevDepth (skip if current depth is shallower or equal). This means validated stores the first depth seen (or the deepest so far), not the minimum. On the first visit validated[node] is set to depth, and subsequent visits only proceed if depth > prevDepth. So the map actually stores the shallowest depth visited. The comment is slightly confusing — 'minimum depth' is correct, but the explanation of re-checking logic could be clearer.

**[MINOR]** The `validated[node] = depth` assignment stores the current depth as the 'minimum depth at which it was validated'. However the comment says 'maps each node to the minimum depth at which it was previously checked' and the short-circuit condition is `depth <= prevDepth` (skip if current depth is shallower or equal). This means `validated` stores the *first* depth seen (or the deepest so far), not the minimum. On the first visit `validated[node]` is set to `depth`, and subsequent visits only proceed if `depth > prevDepth`. So the map actually stores the *shallowest* depth visited. The comment is slightly confusing — 'minimum depth' is correct, but the explanation of re-checking logic could be clearer.
Review

[NIT] The checkYAMLDepth function receives maxNodes int as a parameter but MaxYAMLNodes is passed as the constant from the call site. The maxNodes parameter is never varied between call sites — it's always MaxYAMLNodes. Could simplify by using the constant directly in the function, but having it as a parameter makes testing easier (allows injecting lower limits in tests). Current approach is acceptable.

**[NIT]** The `checkYAMLDepth` function receives `maxNodes int` as a parameter but `MaxYAMLNodes` is passed as the constant from the call site. The `maxNodes` parameter is never varied between call sites — it's always `MaxYAMLNodes`. Could simplify by using the constant directly in the function, but having it as a parameter makes testing easier (allows injecting lower limits in tests). Current approach is acceptable.
// Depth-aware short-circuit: only skip re-checking a node if we previously
// validated it at the same or deeper effective depth. If this visit is at a
// greater depth than before (e.g., alias referenced deeper in the tree),
// we must re-traverse to catch depth limit violations.
// Depth-aware short-circuit: skip re-validation only when the current visit
// depth is the same or shallower than the depth at which this node was
Outdated
Review

[MINOR] The checkYAMLDepth switch statement has no default case. While this is intentional (scalar types are leaf nodes and need no recursion), a comment inside the switch or a default: // scalar leaf node, no children case would make it clearer to future readers that the omission is deliberate rather than an oversight. The existing comment above the closing brace partially covers this but is outside the switch body.

**[MINOR]** The `checkYAMLDepth` switch statement has no default case. While this is intentional (scalar types are leaf nodes and need no recursion), a comment inside the switch or a `default: // scalar leaf node, no children` case would make it clearer to future readers that the omission is deliberate rather than an oversight. The existing comment above the closing brace partially covers this but is outside the switch body.
// previously validated. A shallower (or equal) current depth means the
// prior, deeper validation already covered any subtree depth violations.
// If the current depth exceeds the previous validation depth (e.g., an alias
// references this node deeper in the tree), we must re-traverse to ensure
// the combined effective depth doesn't exceed maxDepth.
//
// Note: using ast.Node (interface) as map key relies on pointer identity,
// which is correct because all goccy/go-yaml AST node types are pointer
Review

[MINOR] The MappingValueNode case visits both Key and Value at depth+1 relative to the MappingValueNode's own depth. Since MappingValueNode is itself visited at depth+1 from its parent MappingNode, keys end up at depth+2 from the mapping. This asymmetry between key depth and value depth means scalar keys consume an extra depth level relative to what might be intuitive, but it's consistent and the tests validate the behavior works. Worth a brief comment explaining that keys consume a depth level intentionally.

**[MINOR]** The `MappingValueNode` case visits both Key and Value at `depth+1` relative to the MappingValueNode's own depth. Since `MappingValueNode` is itself visited at `depth+1` from its parent `MappingNode`, keys end up at `depth+2` from the mapping. This asymmetry between key depth and value depth means scalar keys consume an extra depth level relative to what might be intuitive, but it's consistent and the tests validate the behavior works. Worth a brief comment explaining that keys consume a depth level intentionally.
@@ -273,6 +275,11 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
}
}
Review

[NIT] The AnchorNode case increments depth for the anchor definition. The comment explains this is intentional and asymmetric. However, this means an anchored value at the top level that nests 10 levels will consume 11 depth budget (1 for anchor + 10 for content), and when aliased at depth 5, the alias expands to depth 5+1+10=16. The comment says 'combined budget is halved' which isn't quite accurate — it's more nuanced. The comment could be clearer, but this doesn't affect correctness.

**[NIT]** The `AnchorNode` case increments depth for the anchor definition. The comment explains this is intentional and asymmetric. However, this means an anchored value at the top level that nests 10 levels will consume 11 depth budget (1 for anchor + 10 for content), and when aliased at depth 5, the alias expands to depth 5+1+10=16. The comment says 'combined budget is halved' which isn't quite accurate — it's more nuanced. The comment could be clearer, but this doesn't affect correctness.
Review

Acknowledged. The AnchorNode comment already explains the asymmetry in detail — both the definition site and the reference site each consume a level, making deeply nested anchor/alias pairs hit the limit sooner. The reviewer's observation about the budget not being exactly "halved" is correct (the comment says "reduced" not "halved"), and the existing phrasing captures the design intent accurately. No change — doesn't affect correctness and the comment already explains the reasoning.

Acknowledged. The AnchorNode comment already explains the asymmetry in detail — both the definition site and the reference site each consume a level, making deeply nested anchor/alias pairs hit the limit sooner. The reviewer's observation about the budget not being exactly "halved" is correct (the comment says "reduced" not "halved"), and the existing phrasing captures the design intent accurately. No change — doesn't affect correctness and the comment already explains the reasoning.
case *ast.MappingValueNode:
// Both Key and Value are visited at depth+1 relative to this
// MappingValueNode. Since MappingNode visits its MappingValueNode
// children at depth+1 as well, keys and values end up at depth+2
// from the parent MappingNode. This is intentional: it mirrors the
// actual nesting structure (mapping → key-value pair → key/value).
if err := checkYAMLDepth(n.Key, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
1
+2 -2
View File
15
@@ -510,9 +510,9 @@ func TestYAMLEmptyFileRejection(t *testing.T) {
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
_, err := LoadPersona(path)
if err == nil {
t.Error("expected error for empty YAML input, got nil")
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
t.Fatal("expected error for empty YAML input, got nil")
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
}
if err != nil && !strings.Contains(err.Error(), "empty YAML document") {
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
if !strings.Contains(err.Error(), "empty YAML document") {
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
t.Errorf("expected error containing %q, got: %v", "empty YAML document", err)
}
})
1
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.