fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml #89

Merged
aweiker merged 13 commits from review-bot-issue-87 into main 2026-05-13 03:47:02 +00:00
3 changed files with 17 additions and 10 deletions
Showing only changes of commit 80091fb080 - Show all commits
+5
View File
3
@@ -33,6 +33,11 @@ func parsePersona(data []byte, source string) (*Persona, error) {
Review

[MINOR] The design doc references the updated approach correctly but removes the prior code example and now relies on narrative. Consider adding a brief code snippet or explicit mention that MergeKey (<<) merges are traversed to avoid confusion about edge cases.

**[MINOR]** The design doc references the updated approach correctly but removes the prior code example and now relies on narrative. Consider adding a brief code snippet or explicit mention that MergeKey (<<) merges are traversed to avoid confusion about edge cases.
### YAML Parsing with Depth Protection
> **Note:** The pseudocode below reflects the initial design using `gopkg.in/yaml.v3`
> types (`yaml.Node`). The actual implementation uses `github.com/goccy/go-yaml`
> with `ast.Node`-based traversal, dual-map cycle/depth tracking, and node-count
> limits. See `review/persona.go` for the current implementation.
```go
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
var node yaml.Node
1
+9 -7
View File
5
@@ -161,7 +161,8 @@ func parsePersona(data []byte, source string) (*Persona, error) {
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
// and strict field checking. This protects against stack exhaustion from deeply
// nested structures and catches typos in field names.
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[NIT] Alias cycle handling uses a 'seen' map keyed by ast.Node to prevent infinite recursion, which can undercount repeated alias expansions. In theory, a document could reuse large anchored subtrees via aliases to inflate decoded structures beyond the node-count check. Practical risk is low given the 64KB file size limit and the constrained Persona schema.

**[NIT]** Alias cycle handling uses a 'seen' map keyed by ast.Node to prevent infinite recursion, which can undercount repeated alias expansions. In theory, a document could reuse large anchored subtrees via aliases to inflate decoded structures beyond the node-count check. Practical risk is low given the 64KB file size limit and the constrained Persona schema.
Outdated
Review

[MINOR] The design doc (docs/DESIGN-57-yaml-persona.md) states the library 'provides built-in depth protection via MaxYAMLDepth/MaxYAMLNodes decoder options', implying these would be used. However, the actual implementation does NOT use those decoder options — instead it implements its own AST walk via checkYAMLDepth. The code is correct and the manual walk is arguably more robust (the design doc was aspirational), but the doc comment on unmarshalYAMLWithDepthLimit says 'The github.com/goccy/go-yaml library provides built-in depth protection via MaxYAMLDepth and MaxYAMLNodes decoder options. We use these instead of a manual depth-checking walk' — which is false. The actual implementation contradicts this comment. The design doc should be updated to reflect the actual approach.

**[MINOR]** The design doc (`docs/DESIGN-57-yaml-persona.md`) states the library 'provides built-in depth protection via `MaxYAMLDepth`/`MaxYAMLNodes` decoder options', implying these would be used. However, the actual implementation does NOT use those decoder options — instead it implements its own AST walk via `checkYAMLDepth`. The code is correct and the manual walk is arguably more robust (the design doc was aspirational), but the doc comment on `unmarshalYAMLWithDepthLimit` says 'The `github.com/goccy/go-yaml` library provides built-in depth protection via `MaxYAMLDepth` and `MaxYAMLNodes` decoder options. We use these instead of a manual depth-checking walk' — which is false. The actual implementation contradicts this comment. The design doc should be updated to reflect the actual approach.
Outdated
Review

[NIT] The function comment for unmarshalYAMLWithDepthLimit still says 'Multi-document YAML files are rejected to prevent silent data loss' but the design doc says the library has built-in depth protection. The comment is still accurate for this function but could mention the two-pass design more clearly (AST parse for security checks, then decoder for struct mapping).

**[NIT]** The function comment for `unmarshalYAMLWithDepthLimit` still says 'Multi-document YAML files are rejected to prevent silent data loss' but the design doc says the library has built-in depth protection. The comment is still accurate for this function but could mention the two-pass design more clearly (AST parse for security checks, then decoder for struct mapping).
Outdated
Review

[NIT] The function comment for unmarshalYAMLWithDepthLimit says 'Multi-document YAML files are rejected to prevent silent data loss' but the primary motivation documented elsewhere is 'confusing behavior where users think their changes take effect.' The comment in the code body is more accurate — consider aligning the function-level doc comment.

**[NIT]** The function comment for `unmarshalYAMLWithDepthLimit` says 'Multi-document YAML files are rejected to prevent silent data loss' but the primary motivation documented elsewhere is 'confusing behavior where users think their changes take effect.' The comment in the code body is more accurate — consider aligning the function-level doc comment.
Outdated
Review

[NIT] YAML is parsed twice (AST via parser.ParseBytes and then decoding via yaml.NewDecoder with Strict). This is acceptable for small persona files but note the extra overhead; fine given MaxPersonaFileSize is 64KB.

**[NIT]** YAML is parsed twice (AST via parser.ParseBytes and then decoding via yaml.NewDecoder with Strict). This is acceptable for small persona files but note the extra overhead; fine given MaxPersonaFileSize is 64KB.
// Multi-document YAML files are rejected to prevent silent data loss.
// Multi-document YAML files are rejected to prevent confusing behavior
// where additional documents are silently ignored.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
Outdated
Review

[MINOR] Depth/node-count enforcement occurs after parser.ParseBytes builds the AST. Extremely deep but small YAML (within 64KB) could still stress the parser via recursive descent before checks run, presenting a potential DoS vector. Consider a pre-parse sanity check (e.g., shallow indentation-depth scan) or using any available parser limits if supported.

**[MINOR]** Depth/node-count enforcement occurs after parser.ParseBytes builds the AST. Extremely deep but small YAML (within 64KB) could still stress the parser via recursive descent before checks run, presenting a potential DoS vector. Consider a pre-parse sanity check (e.g., shallow indentation-depth scan) or using any available parser limits if supported.
// First pass: parse into AST to check depth limits, node counts, and
Outdated
Review

[MINOR] Empty YAML detection now returns a generic "empty YAML document" error; consider adding a unit test for empty YAML files to ensure this path is covered and the message is stable.

**[MINOR]** Empty YAML detection now returns a generic "empty YAML document" error; consider adding a unit test for empty YAML files to ensure this path is covered and the message is stable.
// multi-document rejection. This prevents stack exhaustion before we
Outdated
Review

[NIT] If go-yaml provides decoder/parser options to enforce depth/node limits during parsing, consider enabling them in addition to the manual AST check to ensure protection prior to AST traversal. This would be a defense-in-depth improvement.

**[NIT]** If go-yaml provides decoder/parser options to enforce depth/node limits during parsing, consider enabling them in addition to the manual AST check to ensure protection prior to AST traversal. This would be a defense-in-depth improvement.
17
@@ -214,12 +215,6 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
*nodeCount++
if *nodeCount > maxNodes {
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
}
// Cycle detection: if we're currently visiting this node on the current
Outdated
Review

[MINOR] The checkYAMLDepth function uses ast.Node as a map key for cycle detection. ast.Node is an interface — map keys for interface types use the dynamic type and value for equality. For pointer types (which all the concrete ast node types appear to be), this works correctly (pointer identity). This is fine but worth a comment explaining that pointer identity is the intended comparison, not structural equality, to prevent future maintainers from being surprised.

**[MINOR]** The `checkYAMLDepth` function uses `ast.Node` as a map key for cycle detection. `ast.Node` is an interface — map keys for interface types use the dynamic type and value for equality. For pointer types (which all the concrete ast node types appear to be), this works correctly (pointer identity). This is fine but worth a comment explaining that pointer identity is the intended comparison, not structural equality, to prevent future maintainers from being surprised.
// recursion path, it's a cycle (e.g., alias pointing to an ancestor).
Outdated
Review

[MAJOR] Depth check can be bypassed due to global 'seen' set: if an anchored/aliased subtree is first validated at a shallow depth and later referenced via an alias deeper in the structure, the early return on previously seen node skips re-checking at the deeper context. This can allow effective nesting depth to exceed MaxYAMLDepth, potentially leading to stack exhaustion/DoS during decode.

**[MAJOR]** Depth check can be bypassed due to global 'seen' set: if an anchored/aliased subtree is first validated at a shallow depth and later referenced via an alias deeper in the structure, the early return on previously seen node skips re-checking at the deeper context. This can allow effective nesting depth to exceed MaxYAMLDepth, potentially leading to stack exhaustion/DoS during decode.
// Return nil to break the cycle without error — cycles are a structural
Outdated
Review

[MINOR] The nodeCount increment happens before the visiting and validated checks. This means nodes encountered in cycles (where visiting[node] is true and we return early) are still counted against the total, potentially causing false-positive node count limit errors for valid YAML with shared anchors referenced multiple times. The increment should ideally only count genuinely new work, or the behavior should be explicitly documented.

**[MINOR]** The `nodeCount` increment happens before the `visiting` and `validated` checks. This means nodes encountered in cycles (where `visiting[node]` is true and we return early) are still counted against the total, potentially causing false-positive node count limit errors for valid YAML with shared anchors referenced multiple times. The increment should ideally only count genuinely new work, or the behavior should be explicitly documented.
Outdated
Review

[MINOR] The nodeCount is incremented before the visiting cycle detection check. For cyclic structures, this means a node that triggers cycle detection (returns early) still increments nodeCount. This slightly over-counts nodes in cyclic test scenarios, but since real YAML parsed by parser.ParseBytes cannot have true reference cycles in the AST (only alias references that point forward), this is functionally harmless in production. However, in the unit tests that construct artificial cycles, the count will be inflated.

**[MINOR]** The `nodeCount` is incremented before the `visiting` cycle detection check. For cyclic structures, this means a node that triggers cycle detection (returns early) still increments `nodeCount`. This slightly over-counts nodes in cyclic test scenarios, but since real YAML parsed by `parser.ParseBytes` cannot have true reference cycles in the AST (only alias references that point forward), this is functionally harmless in production. However, in the unit tests that construct artificial cycles, the count will be inflated.
@@ -228,6 +223,13 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
return nil
}
Review

[MINOR] The validated map stores depth as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the current visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, validated[node] becomes 5 — the shorter path. The short-circuit condition depth <= prevDepth then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to visitedAtDepth and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.

**[MINOR]** The `validated` map stores `depth` as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the *current* visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, `validated[node]` becomes 5 — the shorter path. The short-circuit condition `depth <= prevDepth` then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to `visitedAtDepth` and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.
Review

[MAJOR] checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.

**[MAJOR]** checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
// Placed after cycle detection to avoid over-counting cyclic references.
*nodeCount++
if *nodeCount > maxNodes {
Outdated
Review

[MINOR] Cycle detection in checkYAMLDepth returns nil without error. If the downstream decoder ever mishandles alias cycles, this could allow potentially problematic inputs to proceed, risking DoS. Consider failing closed by returning an explicit error on detected cycles.

**[MINOR]** Cycle detection in checkYAMLDepth returns nil without error. If the downstream decoder ever mishandles alias cycles, this could allow potentially problematic inputs to proceed, risking DoS. Consider failing closed by returning an explicit error on detected cycles.
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
}
// Depth-aware short-circuit: only skip re-checking a node if we previously
// validated it at the same or deeper effective depth. If this visit is at a
Review

[MINOR] The validated map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by ast.Node (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth" — the map is actually storing the current visit's depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.

**[MINOR]** The `validated` map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by `ast.Node` (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth" — the map is actually storing the *current visit's* depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.
// greater depth than before (e.g., alias referenced deeper in the tree),
10
+3 -3
View File
13
@@ -491,9 +491,9 @@ func TestYAMLEmptyFileRejection(t *testing.T) {
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
name string
content string
Review

[MINOR] The TestYAMLEmptyFileRejection test uses tc.name (e.g., "completely empty") as the filename fragment but writes all files to the same dir. The test case names contain spaces which are valid in filenames but could cause issues on some filesystems. More importantly, the test writes files with names like completely empty.yaml — using spaces in filenames is unusual and could cause subtle test issues. Prefer replacing spaces with underscores or hyphens.

**[MINOR]** The `TestYAMLEmptyFileRejection` test uses `tc.name` (e.g., `"completely empty"`) as the filename fragment but writes all files to the same `dir`. The test case names contain spaces which are valid in filenames but could cause issues on some filesystems. More importantly, the test writes files with names like `completely empty.yaml` — using spaces in filenames is unusual and could cause subtle test issues. Prefer replacing spaces with underscores or hyphens.
Review

[NIT] TestYAMLEmptyFileRejection has a redundant if err != nil guard on the second condition: if err != nil && !strings.Contains(...). Since the preceding if err == nil { t.Error(...); return } (implicit — actually the test doesn't return on the first error, it falls through) — actually the first check is if err == nil { t.Error(...) } without a return, so err could still be nil here. This means the strings.Contains check is guarded correctly with err != nil, but the test won't t.Error the wrong message if err == nil (it will have already errored on the first check). The pattern is slightly unusual compared to the rest of the file which uses early returns. Consistent style would be to add a return after the first t.Error, then drop the err != nil guard.

**[NIT]** `TestYAMLEmptyFileRejection` has a redundant `if err != nil` guard on the second condition: `if err != nil && !strings.Contains(...)`. Since the preceding `if err == nil { t.Error(...); return }` (implicit — actually the test doesn't return on the first error, it falls through) — actually the first check is `if err == nil { t.Error(...) }` without a return, so `err` could still be nil here. This means the `strings.Contains` check is guarded correctly with `err != nil`, but the test won't `t.Error` the wrong message if `err == nil` (it will have already errored on the first check). The pattern is slightly unusual compared to the rest of the file which uses early returns. Consistent style would be to add a `return` after the first `t.Error`, then drop the `err != nil` guard.
}{
{"completely empty", ""},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"whitespace only", " \n\n "},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"comment only", "# just a comment\n"},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"completely_empty", ""},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"whitespace_only", " \n\n "},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
{"comment_only", "# just a comment\n"},
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
}
for _, tc := range tests {
1
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
Review

Fixed in 0b16c41: moved t.TempDir() inside each subtest for proper isolation.

Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.