feat: add YAML support for persona files #58
@@ -150,12 +150,15 @@ func parsePersona(data []byte, source string) (*Persona, error) {
|
||||
return &p, nil
|
||||
}
|
||||
|
|
||||
|
||||
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting.
|
||||
// This protects against stack exhaustion from deeply nested structures.
|
||||
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `unmarshalYAMLWithDepthLimit` function uses `interface{}` as the parameter type for `out` instead of `any`. The project targets the latest stable Go release, and `any` is the idiomatic alias since Go 1.18. This is inconsistent with the rest of the codebase convention.
sonnet-review-bot
commented
[MINOR] The two-pass decode approach (first into yaml.Node for depth check, then strict decode from raw bytes) is necessary because KnownFields doesn't work on yaml.Node.Decode(), but this means the input bytes are parsed twice. The comment explains this correctly. A minor concern: if **[MINOR]** The two-pass decode approach (first into yaml.Node for depth check, then strict decode from raw bytes) is necessary because KnownFields doesn't work on yaml.Node.Decode(), but this means the input bytes are parsed twice. The comment explains this correctly. A minor concern: if `gopkg.in/yaml.v3` ever adds KnownFields support to node.Decode(), this could be simplified — but for now this is the correct workaround and is well-documented.
|
||||
// and strict field checking. This protects against stack exhaustion from deeply
|
||||
// nested structures and catches typos in field names.
|
||||
// Note: Multi-document YAML files are accepted but only the first document is
|
||||
// parsed; additional documents are silently ignored. This is acceptable for
|
||||
// persona files where multi-document support is not a use case.
|
||||
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
|
||||
|
[MAJOR] The recursive YAML depth checker does not detect alias/anchor cycles. If a YAML document contains cyclic aliases (e.g., an alias ultimately pointing back to an anchored node that references the alias), checkYAMLDepth can recurse indefinitely, leading to stack exhaustion or a hang (DoS). Add cycle detection (visited set of *yaml.Node) or a total node visitation cap to prevent infinite recursion. **[MAJOR]** The recursive YAML depth checker does not detect alias/anchor cycles. If a YAML document contains cyclic aliases (e.g., an alias ultimately pointing back to an anchored node that references the alias), checkYAMLDepth can recurse indefinitely, leading to stack exhaustion or a hang (DoS). Add cycle detection (visited set of *yaml.Node) or a total node visitation cap to prevent infinite recursion.
|
||||
// First pass: decode into a yaml.Node to check depth limits.
|
||||
|
[MINOR] JSON parsing uses json.Unmarshal, which silently ignores unknown fields. For consistency with YAML's KnownFields(true) and to harden against typos or unintended keys, consider switching to json.Decoder with DisallowUnknownFields() to reject unknown JSON fields. **[MINOR]** JSON parsing uses json.Unmarshal, which silently ignores unknown fields. For consistency with YAML's KnownFields(true) and to harden against typos or unintended keys, consider switching to json.Decoder with DisallowUnknownFields() to reject unknown JSON fields.
|
||||
// This prevents stack exhaustion before we attempt to decode into structs.
|
||||
|
sonnet-review-bot
commented
[MINOR] The decoder created with **[MINOR]** The decoder created with `yaml.NewDecoder` is used to decode a single document but does not check for a second call to `Decode` returning `io.EOF` to confirm there's only one document. For untrusted user-supplied YAML files, silently ignoring additional documents is acceptable (they won't affect the parsed struct), but it's worth noting that multi-document YAML files will have their additional documents silently ignored rather than surfacing an error. This is a minor UX issue, not a correctness bug.
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `unmarshalYAMLWithDepthLimit` function silently ignores multi-document YAML files (only parses the first document). While the comment acknowledges this, it could be a footgun: a user who accidentally writes `---` in their persona file will get no error, just silent truncation. Consider calling `dec.Decode` a second time and returning an error if a second document is found, to give users a clearer signal.
|
||||
var node yaml.Node
|
||||
dec := yaml.NewDecoder(bytes.NewReader(data))
|
||||
if err := dec.Decode(&node); err != nil {
|
||||
|
[MINOR] When encountering an alias node, the depth check calls checkYAMLDepth on the alias target without incrementing depth, potentially undercounting nesting by one compared to non-alias children. While limited in practical bypass (one level), this slightly weakens the intended MaxYAMLDepth enforcement. Consider passing depth+1 when following an alias used as a value. **[MINOR]** When encountering an alias node, the depth check calls checkYAMLDepth on the alias target without incrementing depth, potentially undercounting nesting by one compared to non-alias children. While limited in practical bypass (one level), this slightly weakens the intended MaxYAMLDepth enforcement. Consider passing depth+1 when following an alias used as a value.
[MINOR] Depth enforcement occurs after decoding the full YAML into a yaml.Node. While the 64KB file-size cap mitigates risk, an attacker could still supply deeply nested but small YAML to stress the decoder before the depth check. Consider additional safeguards (e.g., limiting total nodes visited or using a visited set, or further reducing allowed size/depth) for defense in depth. **[MINOR]** Depth enforcement occurs after decoding the full YAML into a yaml.Node. While the 64KB file-size cap mitigates risk, an attacker could still supply deeply nested but small YAML to stress the decoder before the depth check. Consider additional safeguards (e.g., limiting total nodes visited or using a visited set, or further reducing allowed size/depth) for defense in depth.
|
||||
@@ -166,7 +169,13 @@ func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
|
||||
return err
|
||||
}
|
||||
|
||||
|
sonnet-review-bot
commented
[MINOR] The multi-document rejection check ( **[MINOR]** The multi-document rejection check (`dec.Decode(&extra)`) re-uses the same decoder after the first `dec.Decode(&node)` call. If the first document exhausted the decoder but there's trailing non-document content (e.g., a trailing `...` end-of-document marker), the behavior depends on go-yaml internals. This is likely fine in practice, but the comment could clarify that this relies on go-yaml's decoder advancing past the first document.
|
||||
return node.Decode(out)
|
||||
// Second pass: decode with strict field checking enabled.
|
||||
// KnownFields(true) rejects unknown keys, catching typos like "focuss" or "identiy".
|
||||
// We must re-decode from the original data because yaml.Node.Decode() doesn't
|
||||
// support the KnownFields option.
|
||||
strictDec := yaml.NewDecoder(bytes.NewReader(data))
|
||||
|
sonnet-review-bot
commented
[MINOR] The alias-following logic in **[MINOR]** The alias-following logic in `checkYAMLDepth` does not count the alias node's own depth level before recursing into the alias target. If an alias node appears near the depth limit, the target's content is checked starting from the same `depth` value as the alias itself (not `depth+1`). This means a deeply aliased structure could potentially bypass the limit by 1 level. While the impact is minimal given the depth=20 limit, incrementing depth before recursing into the alias target would be more consistent.
|
||||
strictDec.KnownFields(true)
|
||||
|
sonnet-review-bot
commented
[MINOR] The alias depth check uses **[MINOR]** The alias depth check uses `return checkYAMLDepth(node.Alias, depth, maxDepth)` — i.e., the alias itself doesn't add to the depth. This is intentional per the comment, but YAML aliases can be used to create large fan-out structures. An alias to a deeply nested structure effectively doubles the depth impact without incrementing depth for the alias node itself. For the persona use case with MaxYAMLDepth=20 this is acceptable, but worth noting the design tradeoff.
|
||||
return strictDec.Decode(out)
|
||||
}
|
||||
|
||||
// checkYAMLDepth recursively checks that YAML nodes don't exceed the depth limit.
|
||||
|
||||
@@ -512,3 +512,56 @@ func TestListBuiltinPersonasSortedOrder(t *testing.T) {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func TestYAMLUnknownFieldsRejected(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
content string
|
||||
wantErr string
|
||||
}{
|
||||
{
|
||||
name: "unknown top-level field",
|
||||
content: `name: test
|
||||
identity: test identity
|
||||
unknown_field: should fail
|
||||
`,
|
||||
wantErr: "unknown_field",
|
||||
},
|
||||
{
|
||||
name: "typo in field name",
|
||||
content: `name: test
|
||||
identiy: typo should fail
|
||||
`,
|
||||
wantErr: "identiy",
|
||||
},
|
||||
{
|
||||
name: "unknown field in severity",
|
||||
content: `name: test
|
||||
identity: test
|
||||
severity:
|
||||
major: Major
|
||||
minro: typo
|
||||
`,
|
||||
wantErr: "minro",
|
||||
},
|
||||
}
|
||||
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
dir := t.TempDir()
|
||||
path := filepath.Join(dir, "unknown.yaml")
|
||||
if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil {
|
||||
t.Fatalf("failed to write test file: %v", err)
|
||||
}
|
||||
|
||||
_, err := LoadPersona(path)
|
||||
if err == nil {
|
||||
t.Errorf("expected error for unknown field %q, got nil", tt.wantErr)
|
||||
return
|
||||
}
|
||||
if !strings.Contains(err.Error(), tt.wantErr) {
|
||||
t.Errorf("error = %q, want containing %q", err.Error(), tt.wantErr)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
[MINOR] YAML depth check traverses node.Content but ignores Alias nodes (node.Alias). Malicious YAML could leverage anchors/aliases to create effective deep structures without increasing Content depth. Consider handling Alias nodes explicitly.