fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml #89
@@ -166,11 +166,10 @@ func parsePersona(data []byte, source string) (*Persona, error) {
|
||||
return &p, nil
|
||||
|
|
||||
}
|
||||
|
gpt-review-bot
commented
[MINOR] Empty YAML detection now returns a generic "empty YAML document" error; consider adding a unit test for empty YAML files to ensure this path is covered and the message is stable. **[MINOR]** Empty YAML detection now returns a generic "empty YAML document" error; consider adding a unit test for empty YAML files to ensure this path is covered and the message is stable.
|
||||
|
||||
|
gpt-review-bot
commented
[NIT] If go-yaml provides decoder/parser options to enforce depth/node limits during parsing, consider enabling them in addition to the manual AST check to ensure protection prior to AST traversal. This would be a defense-in-depth improvement. **[NIT]** If go-yaml provides decoder/parser options to enforce depth/node limits during parsing, consider enabling them in addition to the manual AST check to ensure protection prior to AST traversal. This would be a defense-in-depth improvement.
|
||||
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
|
||||
// and strict field checking. This protects against stack exhaustion from deeply
|
||||
// nested structures and catches typos in field names.
|
||||
// Multi-document YAML files are rejected to prevent confusing behavior
|
||||
// where additional documents are silently ignored.
|
||||
// unmarshalYAMLWithDepthLimit unmarshals YAML data with three safety checks:
|
||||
// - Depth limiting: rejects AST trees exceeding maxDepth to prevent stack exhaustion.
|
||||
// - Multi-document rejection: prevents silent data loss from ignored extra documents.
|
||||
// - Strict field checking: rejects unknown YAML keys to catch typos early.
|
||||
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
|
||||
// First pass: parse into AST to check depth limits, node counts, and
|
||||
// multi-document rejection. This prevents stack exhaustion before we
|
||||
@@ -247,10 +246,13 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
|
||||
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
|
||||
|
[MAJOR] Alias handling combined with the global 'seen' short-circuit lets deeply nested aliases evade depth checks. The code follows alias nodes (good), but because the target node may already be in the 'seen' set, subsequent deeper traversals are skipped, allowing overall nesting to exceed limits and enabling DoS. **[MAJOR]** Alias handling combined with the global 'seen' short-circuit lets deeply nested aliases evade depth checks. The code follows alias nodes (good), but because the target node may already be in the 'seen' set, subsequent deeper traversals are skipped, allowing overall nesting to exceed limits and enabling DoS.
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `validated` map uses `ast.Node` (interface) as a key, which relies on pointer equality for interface comparison. This works correctly for pointer types like `*ast.MappingNode`, but if any `ast.Node` implementation is a value type (non-pointer), two structurally identical nodes could hash to different entries or the same entry incorrectly. In practice, all goccy/go-yaml AST node types appear to be pointer types, so this is low risk, but it's an implicit assumption worth a comment.
rodin
commented
Already addressed. There's an explicit comment at the
This documents the assumption directly at the usage site. No change needed. Already addressed. There's an explicit comment at the `validated` map access (the depth-aware short-circuit block):
> Note: using ast.Node (interface) as map key relies on pointer identity, which is correct because all goccy/go-yaml AST node types are pointer receivers (*MappingNode, *SequenceNode, etc.), never value types.
This documents the assumption directly at the usage site. No change needed.
|
||||
}
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `validated[node] = depth` assignment stores the current depth as the 'minimum depth at which it was validated'. However the comment says 'maps each node to the minimum depth at which it was previously checked' and the short-circuit condition is `depth <= prevDepth` (skip if current depth is shallower or equal). This means `validated` stores the *first* depth seen (or the deepest so far), not the minimum. On the first visit `validated[node]` is set to `depth`, and subsequent visits only proceed if `depth > prevDepth`. So the map actually stores the *shallowest* depth visited. The comment is slightly confusing — 'minimum depth' is correct, but the explanation of re-checking logic could be clearer.
sonnet-review-bot
commented
[NIT] The **[NIT]** The `checkYAMLDepth` function receives `maxNodes int` as a parameter but `MaxYAMLNodes` is passed as the constant from the call site. The `maxNodes` parameter is never varied between call sites — it's always `MaxYAMLNodes`. Could simplify by using the constant directly in the function, but having it as a parameter makes testing easier (allows injecting lower limits in tests). Current approach is acceptable.
|
||||
|
||||
// Depth-aware short-circuit: only skip re-checking a node if we previously
|
||||
// validated it at the same or deeper effective depth. If this visit is at a
|
||||
// greater depth than before (e.g., alias referenced deeper in the tree),
|
||||
// we must re-traverse to catch depth limit violations.
|
||||
// Depth-aware short-circuit: skip re-validation only when the current visit
|
||||
// depth is the same or shallower than the depth at which this node was
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `checkYAMLDepth` switch statement has no default case. While this is intentional (scalar types are leaf nodes and need no recursion), a comment inside the switch or a `default: // scalar leaf node, no children` case would make it clearer to future readers that the omission is deliberate rather than an oversight. The existing comment above the closing brace partially covers this but is outside the switch body.
|
||||
// previously validated. A shallower (or equal) current depth means the
|
||||
// prior, deeper validation already covered any subtree depth violations.
|
||||
// If the current depth exceeds the previous validation depth (e.g., an alias
|
||||
// references this node deeper in the tree), we must re-traverse to ensure
|
||||
// the combined effective depth doesn't exceed maxDepth.
|
||||
//
|
||||
// Note: using ast.Node (interface) as map key relies on pointer identity,
|
||||
// which is correct because all goccy/go-yaml AST node types are pointer
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `MappingValueNode` case visits both Key and Value at `depth+1` relative to the MappingValueNode's own depth. Since `MappingValueNode` is itself visited at `depth+1` from its parent `MappingNode`, keys end up at `depth+2` from the mapping. This asymmetry between key depth and value depth means scalar keys consume an extra depth level relative to what might be intuitive, but it's consistent and the tests validate the behavior works. Worth a brief comment explaining that keys consume a depth level intentionally.
|
||||
@@ -273,6 +275,11 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[
|
||||
}
|
||||
}
|
||||
|
sonnet-review-bot
commented
[NIT] The **[NIT]** The `AnchorNode` case increments depth for the anchor definition. The comment explains this is intentional and asymmetric. However, this means an anchored value at the top level that nests 10 levels will consume 11 depth budget (1 for anchor + 10 for content), and when aliased at depth 5, the alias expands to depth 5+1+10=16. The comment says 'combined budget is halved' which isn't quite accurate — it's more nuanced. The comment could be clearer, but this doesn't affect correctness.
rodin
commented
Acknowledged. The AnchorNode comment already explains the asymmetry in detail — both the definition site and the reference site each consume a level, making deeply nested anchor/alias pairs hit the limit sooner. The reviewer's observation about the budget not being exactly "halved" is correct (the comment says "reduced" not "halved"), and the existing phrasing captures the design intent accurately. No change — doesn't affect correctness and the comment already explains the reasoning. Acknowledged. The AnchorNode comment already explains the asymmetry in detail — both the definition site and the reference site each consume a level, making deeply nested anchor/alias pairs hit the limit sooner. The reviewer's observation about the budget not being exactly "halved" is correct (the comment says "reduced" not "halved"), and the existing phrasing captures the design intent accurately. No change — doesn't affect correctness and the comment already explains the reasoning.
|
||||
case *ast.MappingValueNode:
|
||||
// Both Key and Value are visited at depth+1 relative to this
|
||||
// MappingValueNode. Since MappingNode visits its MappingValueNode
|
||||
// children at depth+1 as well, keys and values end up at depth+2
|
||||
// from the parent MappingNode. This is intentional: it mirrors the
|
||||
// actual nesting structure (mapping → key-value pair → key/value).
|
||||
if err := checkYAMLDepth(n.Key, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -510,9 +510,9 @@ func TestYAMLEmptyFileRejection(t *testing.T) {
|
||||
|
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
|
||||
|
||||
_, err := LoadPersona(path)
|
||||
if err == nil {
|
||||
t.Error("expected error for empty YAML input, got nil")
|
||||
|
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
|
||||
t.Fatal("expected error for empty YAML input, got nil")
|
||||
|
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
|
||||
}
|
||||
if err != nil && !strings.Contains(err.Error(), "empty YAML document") {
|
||||
|
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
|
||||
if !strings.Contains(err.Error(), "empty YAML document") {
|
||||
|
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
|
||||
t.Errorf("expected error containing %q, got: %v", "empty YAML document", err)
|
||||
}
|
||||
})
|
||||
|
||||
|
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
rodin
commented
Fixed in Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
|
||||
[MINOR] Depth/node-count enforcement occurs after parser.ParseBytes builds the AST. Extremely deep but small YAML (within 64KB) could still stress the parser via recursive descent before checks run, presenting a potential DoS vector. Consider a pre-parse sanity check (e.g., shallow indentation-depth scan) or using any available parser limits if supported.