fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml #89
@@ -9,7 +9,7 @@ JSON is awkward for persona files that contain multi-line text (identity, severi
 - Backwards compatibility: existing JSON personas must continue to work
 - Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
 - Consistency: use `.yaml` extension (not `.yml`)
-- Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); has built-in depth protection via `MaxYAMLDepth`/`MaxYAMLNodes` constants
+- Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation

 ## Proposed Approach

gpt-review-bot
commented
**[MINOR]** Documentation states the library’s built-in depth protection (MaxYAMLDepth/MaxYAMLNodes decoder options) is used instead of manual checks, but the code still performs an AST-based depth/node count walk before decoding. Update the design doc to reflect the current implementation or use the library-provided options if intended.
@@ -63,7 +63,7 @@ func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
 }
```
-The `github.com/goccy/go-yaml` library provides built-in depth protection via `MaxYAMLDepth` and `MaxYAMLNodes` decoder options. We use these instead of a manual depth-checking walk.
+We implement a custom AST-based depth/node-count walk (`checkYAMLDepth`) rather than relying on library decoder options. This gives us precise control over how depth is counted across aliases and anchors, with a depth-aware validated map to prevent alias depth bypass.
sonnet-review-bot
commented
**[NIT]** The design doc still contains the old `gopkg.in/yaml.v3`-based code sample for `unmarshalYAMLWithDepthLimit` and `checkYAMLDepth` (using `yaml.Node` and `yaml.AliasNode`) from the original implementation. These code snippets now describe a design that was NOT implemented; the actual implementation uses the `goccy/go-yaml` AST approach. The design doc should be updated to reflect the actual implementation, or the code samples removed/replaced.

 ## State/Data Model
@@ -184,7 +184,7 @@ func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
 	}

sonnet-review-bot
commented
**[MINOR]** The empty document check (`len(file.Docs) == 0`) comes AFTER the multi-document check (`len(file.Docs) > 1`). While not a bug (both conditions are checked), it would be slightly more natural to check for empty first. More importantly: the previous `gopkg.in/yaml.v3` implementation using `dec.Decode(&node)` would return an error on truly empty input, while `parser.ParseBytes` may return an empty `Docs` slice for whitespace-only or empty input. The new explicit empty-doc check is an improvement, but the error message 'empty YAML document' differs from what the old library would have returned, which is worth noting if callers match on error text.
 	nodeCount := 0
sonnet-review-bot
commented
**[MINOR]** The `go 1.26.2` in go.mod is a pre-release/future version (Go 1.26 has not been released). This may cause `go mod tidy` to behave unexpectedly on stable Go toolchains and suggests the module was initialized with an unstable toolchain. This is pre-existing and not introduced by this PR, but worth flagging.
-	if err := checkYAMLDepth(file.Docs[0].Body, 0, maxDepth, MaxYAMLNodes, make(map[ast.Node]struct{}), &nodeCount); err != nil {
+	if err := checkYAMLDepth(file.Docs[0].Body, 0, maxDepth, MaxYAMLNodes, make(map[ast.Node]int), make(map[ast.Node]bool), &nodeCount); err != nil {
 		return err
 	}

@@ -195,9 +195,17 @@ func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
 }

sonnet-review-bot
commented
**[NIT]** The comment on `unmarshalYAMLWithDepthLimit` says 'Multi-document YAML files are rejected to prevent confusing behavior where additional documents are silently ignored.' but the function header also mentions 'strict field checking'. The doc comment could be tightened to mention all three concerns (depth, multi-doc, strict fields) in the opening summary rather than leaving strict-field checking implicit.
sonnet-review-bot
commented
**[NIT]** The `checkYAMLDepth` function is exercised directly by the test file (`TestYAMLAliasCycleDetection` calls it). Since `checkYAMLDepth` is unexported (lowercase), the test is in the same package (`package review`), which is fine. No issue here; just noting this is a white-box test of an internal function, which is appropriate given the security-critical nature of the cycle detection.
gpt-review-bot
commented
**[MINOR]** Depth counting walks both MappingNode (depth+1) and then MappingValueNode (depth+1 for Key and Value), effectively increasing depth by 2 per mapping level. This is stricter than the test comment's 'incrementing depth by 1 per level' and may unnecessarily reject shallower YAML. Consider clarifying the intended depth semantics or adjusting the increment so each structural level accounts for a single depth step.
 // checkYAMLDepth recursively checks that YAML AST nodes don't exceed the depth
sonnet-review-bot
commented
**[MINOR]** The `checkYAMLDepth` function increments `*nodeCount` before the depth-aware short-circuit check (`validated` map lookup). This means a node revisited at a shallower depth (which immediately returns via the short-circuit) still increments the counter. The comment acknowledges this as 'intentional conservative overcounting', which is a reasonable security posture. However, it also means the node count can be inflated by the number of times aliases reference the same shallow subtree, potentially causing false positives for legitimate YAML with many alias references to the same anchor. This is a trade-off that is documented and defensible, but worth noting.
sonnet-review-bot
commented
**[MINOR]** The `unmarshalYAMLWithDepthLimit` function does a two-pass approach: first `parser.ParseBytes` then `yaml.NewDecoder(...).Decode(out)`. This means the raw bytes are parsed twice. The second decode with `yaml.Strict()` also re-parses from a `bytes.NewReader(data)`. For the file size constraint (64KB max), this is acceptable, but worth noting. A minor concern is that the two passes could theoretically behave differently if the library has any parse-then-decode divergence. However, this is standard practice for AST-pre-validation patterns and is not a bug.
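The validate-then-decode pattern described here can be sketched with the standard library alone. The example below deliberately swaps in `encoding/json` (a different format and API from the PR's `goccy/go-yaml` code) so that it stays self-contained. The `persona` struct and the `checkDepth` and `decodeStrict` helpers are hypothetical names, not taken from the PR.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// persona is a hypothetical target type for the strict decode.
type persona struct {
	Name     string `json:"name"`
	Identity string `json:"identity"`
}

// checkDepth is pass 1: scan tokens and reject input whose nesting
// of objects/arrays exceeds maxDepth, without building any values.
func checkDepth(data []byte, maxDepth int) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	depth := 0
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		if d, ok := tok.(json.Delim); ok {
			switch d {
			case '{', '[':
				depth++
				if depth > maxDepth {
					return fmt.Errorf("nesting exceeds maximum (%d)", maxDepth)
				}
			case '}', ']':
				depth--
			}
		}
	}
}

// decodeStrict is pass 2: re-read the same bytes with unknown
// fields rejected, only after the structural check has passed.
func decodeStrict(data []byte, out any, maxDepth int) error {
	if err := checkDepth(data, maxDepth); err != nil {
		return err
	}
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(out)
}

func main() {
	var p persona
	fmt.Println(decodeStrict([]byte(`{"name":"a","identity":"b"}`), &p, 4))
	fmt.Println(decodeStrict([]byte(`{"name":"a","extra":1}`), &p, 4))
	fmt.Println(decodeStrict([]byte(`[[[[[1]]]]]`), &p, 4))
}
```

As in the PR, the input is parsed twice. For small, size-capped files that cost is usually negligible, and it keeps the structural limit independent of the decoder's own behavior.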
-// limit or the total node count limit. It also detects alias cycles to prevent
-// infinite recursion from crafted YAML with self-referential aliases.
-func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, seen map[ast.Node]struct{}, nodeCount *int) error {
+// limit or the total node count limit. It uses two tracking maps:
sonnet-review-bot
commented
**[MINOR]** The `go 1.26.2` in go.mod references a Go version that does not exist as a stable release (current stable is 1.23/1.24 range). This is likely a typo or pre-release version. While not introduced by this PR, it's worth noting as it could cause toolchain issues.
+// - validated: maps each node to the minimum depth at which it was previously
sonnet-review-bot
commented
**[NIT]** The `unmarshalYAMLWithDepthLimit` function takes `maxDepth int` as a parameter but always calls `checkYAMLDepth` with the package-level constant `MaxYAMLNodes` directly rather than accepting it as a parameter. This asymmetry is mildly inconsistent: either both limits should be constants used directly, or both should be parameters. Low impact since callers always pass `MaxYAMLDepth` anyway.
+// checked. If a node is revisited at a deeper depth (e.g., via an alias),
gpt-review-bot
commented
**[MINOR]** checkYAMLDepth breaks alias cycles by returning nil (skipping the cyclic subtree). Consider returning a specific error on detected cycles to fail-fast instead of relying on downstream decoder behavior, improving safety and transparency for malicious inputs.
+// we re-check it to ensure the combined effective depth doesn't exceed limits.
+// - visiting: per-path recursion stack for true cycle detection. A node on the
+// current path is a cycle (alias loop); we return nil to avoid infinite recursion.
+//
+// This design prevents the alias depth bypass where an anchored subtree validated
sonnet-review-bot
commented
**[MINOR]** The `unmarshalYAMLWithDepthLimit` function performs a two-pass decode (AST parse then full decode). The second pass with `yaml.Strict()` will also parse the YAML from scratch, so the file bytes are parsed twice. This is intentional per the comment but worth noting: if the goccy/go-yaml `Strict()` decoder also does alias resolution internally, the depth protection from the first pass only guards the structural AST walk, not the decoder's internal expansion. This appears acceptable given the design, but should be validated that `yaml.Strict()` doesn't recurse unboundedly on crafted alias chains during decode.
rodin
commented
Already addressed. The comment block immediately above the `dec := yaml.NewDecoder(...)` call explicitly documents this:
> Safety note: goccy/go-yaml's decoder does not expand YAML aliases recursively; it resolves them via the pre-built AST, which our first pass already depth-checked. Alias chains that would exceed depth limits are caught above; the decoder merely reads the resolved scalar values.
The two-pass design is intentional: pass 1 validates structure/depth on the AST (where we have full control), pass 2 uses `Strict()` for field validation (which doesn't re-expand aliases recursively). No change needed.
+// at a shallow depth could be referenced via alias at a greater depth, effectively
gpt-review-bot
commented
**[MINOR]** Alias cycles are treated as non-errors (cycle detection returns nil). While this prevents recursion issues, consider whether rejecting explicit alias cycles with an error would be a safer fail-fast behavior rather than relying on the decoder to handle them.
+// exceeding MaxYAMLDepth.
+func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[ast.Node]int, visiting map[ast.Node]bool, nodeCount *int) error {
 	if node == nil {
 		return nil
gpt-review-bot
commented
**[NIT]** ListBuiltinPersonas treats .json and .yml as valid extensions although built-ins are embedded as YAML only; this is harmless but could be simplified if built-ins will never include those formats.
 	}
sonnet-review-bot
commented
**[MINOR]** The `validated` map stores the depth at which a node was last validated, but `depth <= prevDepth` as the short-circuit condition means a node visited at depth 3 that was previously validated at depth 5 would be skipped, which is correct (deeper previous validation is more conservative). However the comment says 'validated it at the same or deeper effective depth' which is slightly ambiguous. The logic is correct but 'same or shallower current depth compared to the previous validation depth' would be clearer.
@@ -212,48 +220,60 @@ func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, seen map[ast.N
 		return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
sonnet-review-bot
commented
**[MINOR]** The `nodeCount` increment happens before the `visiting` and `validated` checks. This means nodes encountered in cycles (where `visiting[node]` is true and we return early) are still counted against the total, potentially causing false-positive node count limit errors for valid YAML with shared anchors referenced multiple times. The increment should ideally only count genuinely new work, or the behavior should be explicitly documented.
sonnet-review-bot
commented
**[MINOR]** The `nodeCount` is incremented before the `visiting` cycle detection check. For cyclic structures, this means a node that triggers cycle detection (returns early) still increments `nodeCount`. This slightly over-counts nodes in cyclic test scenarios, but since real YAML parsed by `parser.ParseBytes` cannot have true reference cycles in the AST (only alias references that point forward), this is functionally harmless in production. However, in the unit tests that construct artificial cycles, the count will be inflated.
 	}

-	// Cycle detection: uses pointer identity (ast.Node is an interface, but all
-	// concrete node types are pointers) to detect revisits. This intentionally
-	// compares pointer identity, not structural equality, since we want to track
-	// specific node instances in the parsed AST graph.
-	if _, ok := seen[node]; ok {
-		return nil // Already validated this subtree, skip to avoid infinite recursion.
+	// Cycle detection: if we're currently visiting this node on the current
+	// recursion path, it's a cycle (e.g., alias pointing to an ancestor).
sonnet-review-bot
commented
**[MINOR]** The `validated` map stores `depth` as the value but is described as storing 'the maximum depth at which it was previously checked.' The current code stores the *current* visit depth, not the maximum. If a node is first visited at depth 10 and later at depth 5, `validated[node]` becomes 5, the shorter path. The short-circuit condition `depth <= prevDepth` then allows re-traversal at depth 6 even though we already checked at depth 10. In practice this isn't a security issue (the alias depth bypass test covers the real attack vector), but the comment and variable name ('validated' implying completeness) are slightly misleading. Consider renaming to `visitedAtDepth` and updating the comment to say 'the last depth at which this node was validated' rather than 'maximum'.
+	// Return nil to break the cycle without error: cycles are a structural
gpt-review-bot
commented
**[MAJOR]** checkYAMLDepth does not handle ast.MergeKeyNode and treats it as a leaf (per the default case comment). YAML merge keys (<<) can contain aliases to mappings; not traversing MergeKeyNode children can bypass the depth enforcement when deep structures are merged, undermining the DoS protections. Add an explicit case to traverse MergeKeyNode's referenced values (typically aliases) and continue depth/node counting.
+	// property, not a depth violation.
+	if visiting[node] {
+		return nil
 	}
**[MINOR]** Cycle detection in checkYAMLDepth returns nil without error. If the downstream decoder ever mishandles alias cycles, this could allow potentially problematic inputs to proceed, risking DoS. Consider failing closed by returning an explicit error on detected cycles.
-	seen[node] = struct{}{}

+	// Depth-aware short-circuit: only skip re-checking a node if we previously
+	// validated it at the same or deeper effective depth. If this visit is at a
+	// greater depth than before (e.g., alias referenced deeper in the tree),
+	// we must re-traverse to catch depth limit violations.
sonnet-review-bot
commented
**[MINOR]** The `validated` map stores the depth at which a node was last validated, used for depth-aware short-circuiting. However, the map is keyed by `ast.Node` (interface type). Interface map keys use pointer identity for concrete pointer types, which is correct here. But the comment says "maps each node to the minimum depth", while the map is actually storing the *current visit's* depth (overwriting each time the node is visited at a greater depth), not the minimum. The comment should say "most recently validated depth" or "maximum depth at which validated" to avoid confusion.
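Since several comments disagree about what the `validated` map ends up holding, a small trace may help. The sketch below reproduces only the short-circuit condition and assignment quoted in the diff, using string keys in place of `ast.Node` (an assumption made to keep it self-contained). The `visit` helper is a hypothetical name.

```go
package main

import "fmt"

// visit mirrors the quoted short-circuit: skip when the node was already
// checked at the same or a greater depth, otherwise record the new depth.
// It reports whether the subtree would be re-traversed on this visit.
func visit(validated map[string]int, node string, depth int) bool {
	if prev, ok := validated[node]; ok && depth <= prev {
		return false
	}
	validated[node] = depth
	return true
}

func main() {
	validated := map[string]int{}
	for _, depth := range []int{10, 5, 6, 12} {
		traversed := visit(validated, "anchor", depth)
		fmt.Printf("depth=%d traversed=%v stored=%d\n", depth, traversed, validated["anchor"])
	}
}
```

Tracing this by hand, the stored value only ever increases: visits at depths 5 and 6 after a visit at depth 10 are skipped and leave the entry at 10, while a visit at depth 12 re-traverses and raises it to 12. Under that reading, "greatest depth validated so far" describes the map more accurately than "minimum" or "most recent".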
+	if prevDepth, ok := validated[node]; ok && depth <= prevDepth {
+		return nil
+	}
+	validated[node] = depth
+
+	// Mark as visiting (on the current recursion path) for cycle detection.
+	visiting[node] = true
+	defer func() { visiting[node] = false }()
+
 	// Walk children based on node type.
 	switch n := node.(type) {
 	case *ast.MappingNode:
**[MAJOR]** Alias handling combined with the global 'seen' short-circuit lets deeply nested aliases evade depth checks. The code follows alias nodes (good), but because the target node may already be in the 'seen' set, subsequent deeper traversals are skipped, allowing overall nesting to exceed limits and enabling DoS.
sonnet-review-bot
commented
**[MINOR]** The `validated` map uses `ast.Node` (interface) as a key, which relies on pointer equality for interface comparison. This works correctly for pointer types like `*ast.MappingNode`, but if any `ast.Node` implementation is a value type (non-pointer), two structurally identical nodes could hash to different entries or the same entry incorrectly. In practice, all goccy/go-yaml AST node types appear to be pointer types, so this is low risk, but it's an implicit assumption worth a comment.
rodin
commented
Already addressed. There's an explicit comment at the `validated` map access (the depth-aware short-circuit block):
> Note: using ast.Node (interface) as map key relies on pointer identity, which is correct because all goccy/go-yaml AST node types are pointer receivers (*MappingNode, *SequenceNode, etc.), never value types.
This documents the assumption directly at the usage site. No change needed.
 		for _, value := range n.Values {
sonnet-review-bot
commented
**[MINOR]** The `validated[node] = depth` assignment stores the current depth as the 'minimum depth at which it was validated'. However the comment says 'maps each node to the minimum depth at which it was previously checked' and the short-circuit condition is `depth <= prevDepth` (skip if current depth is shallower or equal). This means `validated` stores the *first* depth seen (or the deepest so far), not the minimum. On the first visit `validated[node]` is set to `depth`, and subsequent visits only proceed if `depth > prevDepth`. So the map actually stores the *shallowest* depth visited. The comment is slightly confusing; 'minimum depth' is correct, but the explanation of re-checking logic could be clearer.
sonnet-review-bot
commented
**[NIT]** The `checkYAMLDepth` function receives `maxNodes int` as a parameter but `MaxYAMLNodes` is passed as the constant from the call site. The `maxNodes` parameter is never varied between call sites; it's always `MaxYAMLNodes`. Could simplify by using the constant directly in the function, but having it as a parameter makes testing easier (allows injecting lower limits in tests). Current approach is acceptable.
-			if err := checkYAMLDepth(value, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+			if err := checkYAMLDepth(value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
 				return err
 			}
sonnet-review-bot
commented
**[MINOR]** The `checkYAMLDepth` switch statement has no default case. While this is intentional (scalar types are leaf nodes and need no recursion), a comment inside the switch or a `default: // scalar leaf node, no children` case would make it clearer to future readers that the omission is deliberate rather than an oversight. The existing comment above the closing brace partially covers this but is outside the switch body.
 		}
 	case *ast.MappingValueNode:
-		if err := checkYAMLDepth(n.Key, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+		if err := checkYAMLDepth(n.Key, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
 			return err
 		}
-		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
 			return err
 		}
sonnet-review-bot
commented
**[MINOR]** The `MappingValueNode` case visits both Key and Value at `depth+1` relative to the MappingValueNode's own depth. Since `MappingValueNode` is itself visited at `depth+1` from its parent `MappingNode`, keys end up at `depth+2` from the mapping. This asymmetry between key depth and value depth means scalar keys consume an extra depth level relative to what might be intuitive, but it's consistent and the tests validate the behavior works. Worth a brief comment explaining that keys consume a depth level intentionally.
 	case *ast.SequenceNode:
 		for _, value := range n.Values {
-			if err := checkYAMLDepth(value, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+			if err := checkYAMLDepth(value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
 				return err
 			}
 		}
 	case *ast.AliasNode:
 		// Follow alias to its target, incrementing depth since aliases expand
 		// the effective structure.
-		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
 			return err
 		}
 	case *ast.AnchorNode:
-		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
 			return err
 		}
 	case *ast.TagNode:
-		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
+		if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
sonnet-review-bot
commented
**[NIT]** The `AnchorNode` case increments depth for the anchor definition. The comment explains this is intentional and asymmetric. However, this means an anchored value at the top level that nests 10 levels will consume 11 depth budget (1 for anchor + 10 for content), and when aliased at depth 5, the alias expands to depth 5+1+10=16. The comment says 'combined budget is halved' which isn't quite accurate; it's more nuanced. The comment could be clearer, but this doesn't affect correctness.
rodin
commented
Acknowledged. The AnchorNode comment already explains the asymmetry in detail: both the definition site and the reference site each consume a level, making deeply nested anchor/alias pairs hit the limit sooner. The reviewer's observation about the budget not being exactly "halved" is correct (the comment says "reduced" not "halved"), and the existing phrasing captures the design intent accurately. No change: it doesn't affect correctness and the comment already explains the reasoning.
 			return err
 		}
 	// Scalar types (StringNode, IntegerNode, FloatNode, BoolNode, NullNode,

@@ -484,7 +484,6 @@ func TestYAMLDeeplyNestedRejection(t *testing.T) {
rodin
commented
Fixed in 0b16c41: moved `t.TempDir()` inside each subtest for proper isolation.
 	}
sonnet-review-bot
commented
**[MINOR]** Extra blank line between `TestYAMLDeeplyNestedRejection` and `TestYAMLEmptyFileRejection` (two consecutive blank lines). Minor style issue per `gofmt` conventions (one blank line between top-level declarations).
sonnet-review-bot
commented
**[NIT]** Double blank line before `TestYAMLEmptyFileRejection`. Minor formatting inconsistency; `gofmt` doesn't enforce blank-line counts between top-level declarations but it's slightly noisy.
sonnet-review-bot
commented
**[NIT]** Extra blank line between `TestYAMLDeeplyNestedRejection` closing brace and `TestYAMLEmptyFileRejection`. Minor style inconsistency; `gofmt` doesn't enforce single blank lines between top-level declarations, but two blank lines is unusual.
sonnet-review-bot
commented
**[MINOR]** There is a spurious blank line between `TestYAMLDeeplyNestedRejection` and `TestYAMLEmptyFileRejection` (two consecutive blank lines). Minor style issue per gofmt conventions, but the diff shows it was introduced in this PR.
 }

-
 func TestYAMLEmptyFileRejection(t *testing.T) {
 	dir := t.TempDir()

sonnet-review-bot
commented
**[NIT]** In `TestYAMLEmptyFileRejection`, the `dir` variable is declared in the outer test function but files are written using paths under `dir` from within subtests. Since `t.TempDir()` is called on the outer `t`, it will be cleaned up when the outer test completes, which is fine. However, each subtest creates a file with the test case name as part of the filename directly in the shared `dir`. This is fine for parallel safety (test cases have unique names) but it is slightly more idiomatic to call `t.TempDir()` inside the subtest.
sonnet-review-bot
commented
**[NIT]** The `TestYAMLEmptyFileRejection` subtests write files into a shared `dir` created outside the loop. If two subtests run with the same filename pattern, they'd overwrite each other. Currently the filenames are distinct (`completely_empty.yaml`, `whitespace_only.yaml`, `comment_only.yaml`) so there's no actual issue, but using `t.TempDir()` inside each subtest would be the idiomatic pattern.
sonnet-review-bot
commented
**[NIT]** The `TestYAMLEmptyFileRejection` test creates the temp dir outside the `tests` loop, reusing a single dir for all subtests. Since each subtest writes to `tc.name+".yaml"` (different file names), there's no collision. However, `t.TempDir()` is called once at the top of the test function rather than inside each subtest. This is fine since the file names don't overlap, but using `t.TempDir()` inside each `t.Run` would be slightly more idiomatic for isolation.
@@ -536,7 +535,7 @@ func TestYAMLFileSizeLimit(t *testing.T) {
 func TestYAMLAliasCycleDetection(t *testing.T) {
 	// Test that our checkYAMLDepth function handles alias cycles gracefully
-	// by using the seen map to prevent infinite recursion.
+	// by using the visiting map to prevent infinite recursion.

 	// Create a node structure where an alias points to a parent node,
 	// simulating what could happen with crafted input.
@@ -559,17 +558,18 @@ func TestYAMLAliasCycleDetection(t *testing.T) {
 	})

 	nodeCount := 0
-	seen := make(map[ast.Node]struct{})
+	validated := make(map[ast.Node]int)
+	visiting := make(map[ast.Node]bool)

-	// This should NOT hang or stack overflow - the seen map prevents infinite recursion
-	err := checkYAMLDepth(parent, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount)
+	// This should NOT hang or stack overflow - cycle detection prevents infinite recursion
+	err := checkYAMLDepth(parent, 0, MaxYAMLDepth, MaxYAMLNodes, validated, visiting, &nodeCount)
 	if err != nil {
 		t.Errorf("unexpected error traversing cyclic structure: %v", err)
 	}

-	// Verify we tracked the parent in the seen map
-	if _, ok := seen[parent]; !ok {
-		t.Error("parent node not tracked in seen map")
+	// Verify we tracked the parent in the validated map
+	if _, ok := validated[parent]; !ok {
+		t.Error("parent node not tracked in validated map")
 	}
 }

@@ -644,16 +644,63 @@ func TestCheckYAMLDepthCycleDetectionDirect(t *testing.T) {
 	})

 	nodeCount := 0
-	seen := make(map[ast.Node]struct{})
-	err := checkYAMLDepth(node, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount)
+	validated := make(map[ast.Node]int)
+	visiting := make(map[ast.Node]bool)
+	err := checkYAMLDepth(node, 0, MaxYAMLDepth, MaxYAMLNodes, validated, visiting, &nodeCount)

 	// Should complete without infinite recursion due to cycle detection
 	if err != nil {
 		t.Errorf("unexpected error: %v", err)
 	}
-	// The seen map should contain multiple entries
-	if len(seen) < 2 {
-		t.Errorf("seen map has %d entries, expected at least 2", len(seen))
+	// The validated map should contain multiple entries
+	if len(validated) < 2 {
+		t.Errorf("validated map has %d entries, expected at least 2", len(validated))
+	}
+}
+
+func TestYAMLAliasDepthBypass(t *testing.T) {
+	// Test that an anchored subtree first validated at a shallow depth is
+	// re-checked when referenced via alias at a deeper position. Without the
+	// depth-aware validated map, the alias reference would skip re-checking
+	// and allow the effective nesting to exceed MaxYAMLDepth.
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "alias-depth-bypass.yaml")
+
+	// Build YAML with an anchor at shallow depth containing a subtree near the limit,
+	// then reference it via alias deep enough that effective depth exceeds MaxYAMLDepth.
+	var sb strings.Builder
+	sb.WriteString("name: test\nidentity: test\n")
+
+	// Create the anchored subtree at depth 1 (key level) that nests 15 levels deep.
+	sb.WriteString("anchor_key: &deep_anchor\n")
+	for i := 0; i < 15; i++ {
+		sb.WriteString(strings.Repeat(" ", i+1))
+		sb.WriteString(fmt.Sprintf("level%d:\n", i))
+	}
+	sb.WriteString(strings.Repeat(" ", 16))
+	sb.WriteString("leaf: value\n")
+
+	// Create a wrapper that nests 6 levels deep, then references the anchor.
+	// Effective depth at alias target = 6 (wrapper nesting) + 1 (alias) + 15 (subtree) = 22 > 20
+	sb.WriteString("wrapper:\n")
+	for i := 0; i < 6; i++ {
+		sb.WriteString(strings.Repeat(" ", i+1))
+		sb.WriteString(fmt.Sprintf("n%d:\n", i))
+	}
+	sb.WriteString(strings.Repeat(" ", 7))
+	sb.WriteString("alias_ref: *deep_anchor\n")
+
+	if err := os.WriteFile(path, []byte(sb.String()), 0644); err != nil {
+		t.Fatalf("failed to write test file: %v", err)
+	}
+
+	_, err := LoadPersona(path)
+	if err == nil {
+		t.Fatal("expected error for alias depth bypass, got nil")
+	}
+	if !strings.Contains(err.Error(), "nesting depth exceeds") {
+		t.Errorf("error = %q, want containing 'nesting depth exceeds'", err.Error())
 	}
 }

[MINOR] The design doc claims using go-yaml's built-in depth protection via MaxYAMLDepth/MaxYAMLNodes instead of a manual depth walk, but the implementation still performs a custom AST depth/node-count check. Update the doc to reflect the actual approach or adopt the library's built-in options if available.
[NIT] The design document still contains pseudocode showing the old gopkg.in/yaml.v3 API (yaml.Node, yaml.NewDecoder, etc.), with a note saying it is outdated. Consider either removing the old pseudocode entirely or replacing it with the actual implementation approach, since misleading pseudocode in design docs can confuse future contributors.
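For readers following the alias depth bypass discussion, here is a minimal, self-contained sketch of a depth-aware walk. It is an illustration under stated assumptions, not the PR's `checkYAMLDepth`: the `node` type, the `chain` helper, the error values, and the `checkDepth` signature are all hypothetical stand-ins and do not use the goccy/go-yaml AST types. The idea it shows is that `validated` stores the deepest depth at which a node was already checked, so an alias reaching the same subtree at a greater depth forces a re-walk, while `visiting` only tracks the current recursion path so cycles terminate.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for a YAML AST node (not the goccy/go-yaml type).
type node struct {
	children []*node
	alias    *node // non-nil when this node is an alias to an anchored node
}

var (
	errTooDeep      = errors.New("nesting depth exceeds limit")
	errTooManyNodes = errors.New("node count exceeds limit")
)

// checkDepth walks n and fails when the depth or node budget is exceeded.
// validated maps each node to the deepest depth at which it was walked.
// visiting marks nodes on the current recursion path (cycle detection).
// Re-walks count against the node budget, which also bounds total work.
func checkDepth(n *node, depth, maxDepth, maxNodes int,
	validated map[*node]int, visiting map[*node]bool, count *int) error {
	if n == nil {
		return nil
	}
	if depth > maxDepth {
		return errTooDeep
	}
	if visiting[n] {
		return nil // cycle: n is already on the current path
	}
	if prev, ok := validated[n]; ok && prev >= depth {
		return nil // already checked at this depth or deeper
	}
	*count++
	if *count > maxNodes {
		return errTooManyNodes
	}
	validated[n] = depth
	visiting[n] = true
	defer delete(visiting, n)

	if n.alias != nil {
		if err := checkDepth(n.alias, depth+1, maxDepth, maxNodes, validated, visiting, count); err != nil {
			return err
		}
	}
	for _, c := range n.children {
		if err := checkDepth(c, depth+1, maxDepth, maxNodes, validated, visiting, count); err != nil {
			return err
		}
	}
	return nil
}

// chain wraps leaf in the given number of single-child nesting levels.
func chain(levels int, leaf *node) *node {
	n := leaf
	for i := 0; i < levels; i++ {
		n = &node{children: []*node{n}}
	}
	return n
}

func main() {
	// Anchored subtree is visited first at a shallow depth, then aliased deeper.
	anchored := chain(15, &node{})
	wrapper := chain(6, &node{alias: anchored})
	root := &node{children: []*node{anchored, wrapper}}

	count := 0
	err := checkDepth(root, 0, 20, 1000, map[*node]int{}, map[*node]bool{}, &count)
	fmt.Println("result:", err)
}
```

With a plain "seen" set in place of the depth-valued `validated` map, the second visit to `anchored` would be skipped and the over-deep expansion would go unnoticed, which is the bypass that `TestYAMLAliasDepthBypass` guards against.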