feat: add YAML support for persona files #58

Merged
aweiker merged 6 commits from issue-57 into main 2026-05-11 01:39:43 +00:00
2 changed files with 65 additions and 3 deletions
Showing only changes of commit 4fed59ac85 - Show all commits
+12 -3
View File
36
@@ -150,12 +150,15 @@ func parsePersona(data []byte, source string) (*Persona, error) {
return &p, nil
}
Outdated
Review

[MINOR] YAML depth check traverses node.Content but ignores Alias nodes (node.Alias). Malicious YAML could leverage anchors/aliases to create effective deep structures without increasing Content depth. Consider handling Alias nodes explicitly.

**[MINOR]** YAML depth check traverses node.Content but ignores Alias nodes (node.Alias). Malicious YAML could leverage anchors/aliases to create effective deep structures without increasing Content depth. Consider handling Alias nodes explicitly.
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting.
// This protects against stack exhaustion from deeply nested structures.
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
Outdated
Review

[MINOR] The unmarshalYAMLWithDepthLimit function uses interface{} as the parameter type for out instead of any. The project targets the latest stable Go release, and any is the idiomatic alias since Go 1.18. This is inconsistent with the rest of the codebase convention.

**[MINOR]** The `unmarshalYAMLWithDepthLimit` function uses `interface{}` as the parameter type for `out` instead of `any`. The project targets the latest stable Go release, and `any` is the idiomatic alias since Go 1.18. This is inconsistent with the rest of the codebase convention.
Outdated
Review

[MINOR] The two-pass decode approach (first into yaml.Node for depth check, then strict decode from raw bytes) is necessary because KnownFields doesn't work on yaml.Node.Decode(), but this means the input bytes are parsed twice. The comment explains this correctly. A minor concern: if gopkg.in/yaml.v3 ever adds KnownFields support to node.Decode(), this could be simplified — but for now this is the correct workaround and is well-documented.

**[MINOR]** The two-pass decode approach (first into yaml.Node for depth check, then strict decode from raw bytes) is necessary because KnownFields doesn't work on yaml.Node.Decode(), but this means the input bytes are parsed twice. The comment explains this correctly. A minor concern: if `gopkg.in/yaml.v3` ever adds KnownFields support to node.Decode(), this could be simplified — but for now this is the correct workaround and is well-documented.
// and strict field checking. This protects against stack exhaustion from deeply
// nested structures and catches typos in field names.
// Note: Multi-document YAML files are accepted but only the first document is
// parsed; additional documents are silently ignored. This is acceptable for
// persona files where multi-document support is not a use case.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
Outdated
Review

[MAJOR] The recursive YAML depth checker does not detect alias/anchor cycles. If a YAML document contains cyclic aliases (e.g., an alias ultimately pointing back to an anchored node that references the alias), checkYAMLDepth can recurse indefinitely, leading to stack exhaustion or a hang (DoS). Add cycle detection (visited set of *yaml.Node) or a total node visitation cap to prevent infinite recursion.

**[MAJOR]** The recursive YAML depth checker does not detect alias/anchor cycles. If a YAML document contains cyclic aliases (e.g., an alias ultimately pointing back to an anchored node that references the alias), checkYAMLDepth can recurse indefinitely, leading to stack exhaustion or a hang (DoS). Add cycle detection (visited set of *yaml.Node) or a total node visitation cap to prevent infinite recursion.
// First pass: decode into a yaml.Node to check depth limits.
Outdated
Review

[MINOR] JSON parsing uses json.Unmarshal, which silently ignores unknown fields. For consistency with YAML's KnownFields(true) and to harden against typos or unintended keys, consider switching to json.Decoder with DisallowUnknownFields() to reject unknown JSON fields.

**[MINOR]** JSON parsing uses json.Unmarshal, which silently ignores unknown fields. For consistency with YAML's KnownFields(true) and to harden against typos or unintended keys, consider switching to json.Decoder with DisallowUnknownFields() to reject unknown JSON fields.
// This prevents stack exhaustion before we attempt to decode into structs.
Outdated
Review

[MINOR] The decoder created with yaml.NewDecoder is used to decode a single document but does not check for a second call to Decode returning io.EOF to confirm there's only one document. For untrusted user-supplied YAML files, silently ignoring additional documents is acceptable (they won't affect the parsed struct), but it's worth noting that multi-document YAML files will have their additional documents silently ignored rather than surfacing an error. This is a minor UX issue, not a correctness bug.

**[MINOR]** The decoder created with `yaml.NewDecoder` is used to decode a single document but does not check for a second call to `Decode` returning `io.EOF` to confirm there's only one document. For untrusted user-supplied YAML files, silently ignoring additional documents is acceptable (they won't affect the parsed struct), but it's worth noting that multi-document YAML files will have their additional documents silently ignored rather than surfacing an error. This is a minor UX issue, not a correctness bug.
Outdated
Review

[MINOR] The unmarshalYAMLWithDepthLimit function silently ignores multi-document YAML files (only parses the first document). While the comment acknowledges this, it could be a footgun: a user who accidentally writes --- in their persona file will get no error, just silent truncation. Consider calling dec.Decode a second time and returning an error if a second document is found, to give users a clearer signal.

**[MINOR]** The `unmarshalYAMLWithDepthLimit` function silently ignores multi-document YAML files (only parses the first document). While the comment acknowledges this, it could be a footgun: a user who accidentally writes `---` in their persona file will get no error, just silent truncation. Consider calling `dec.Decode` a second time and returning an error if a second document is found, to give users a clearer signal.
var node yaml.Node
dec := yaml.NewDecoder(bytes.NewReader(data))
if err := dec.Decode(&node); err != nil {
Outdated
Review

[MINOR] When encountering an alias node, the depth check calls checkYAMLDepth on the alias target without incrementing depth, potentially undercounting nesting by one compared to non-alias children. While limited in practical bypass (one level), this slightly weakens the intended MaxYAMLDepth enforcement. Consider passing depth+1 when following an alias used as a value.

**[MINOR]** When encountering an alias node, the depth check calls checkYAMLDepth on the alias target without incrementing depth, potentially undercounting nesting by one compared to non-alias children. While limited in practical bypass (one level), this slightly weakens the intended MaxYAMLDepth enforcement. Consider passing depth+1 when following an alias used as a value.
Outdated
Review

[MINOR] Depth enforcement occurs after decoding the full YAML into a yaml.Node. While the 64KB file-size cap mitigates risk, an attacker could still supply deeply nested but small YAML to stress the decoder before the depth check. Consider additional safeguards (e.g., limiting total nodes visited or using a visited set, or further reducing allowed size/depth) for defense in depth.

**[MINOR]** Depth enforcement occurs after decoding the full YAML into a yaml.Node. While the 64KB file-size cap mitigates risk, an attacker could still supply deeply nested but small YAML to stress the decoder before the depth check. Consider additional safeguards (e.g., limiting total nodes visited or using a visited set, or further reducing allowed size/depth) for defense in depth.
1
@@ -166,7 +169,13 @@ func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
return err
}
Outdated
Review

[MINOR] The multi-document rejection check (dec.Decode(&extra)) re-uses the same decoder after the first dec.Decode(&node) call. If the first document exhausted the decoder but there's trailing non-document content (e.g., a trailing ... end-of-document marker), the behavior depends on go-yaml internals. This is likely fine in practice, but the comment could clarify that this relies on go-yaml's decoder advancing past the first document.

**[MINOR]** The multi-document rejection check (`dec.Decode(&extra)`) re-uses the same decoder after the first `dec.Decode(&node)` call. If the first document exhausted the decoder but there's trailing non-document content (e.g., a trailing `...` end-of-document marker), the behavior depends on go-yaml internals. This is likely fine in practice, but the comment could clarify that this relies on go-yaml's decoder advancing past the first document.
return node.Decode(out)
// Second pass: decode with strict field checking enabled.
// KnownFields(true) rejects unknown keys, catching typos like "focuss" or "identiy".
// We must re-decode from the original data because yaml.Node.Decode() doesn't
// support the KnownFields option.
strictDec := yaml.NewDecoder(bytes.NewReader(data))
Outdated
Review

[MINOR] The alias-following logic in checkYAMLDepth does not count the alias node's own depth level before recursing into the alias target. If an alias node appears near the depth limit, the target's content is checked starting from the same depth value as the alias itself (not depth+1). This means a deeply aliased structure could potentially bypass the limit by 1 level. While the impact is minimal given the depth=20 limit, incrementing depth before recursing into the alias target would be more consistent.

**[MINOR]** The alias-following logic in `checkYAMLDepth` does not count the alias node's own depth level before recursing into the alias target. If an alias node appears near the depth limit, the target's content is checked starting from the same `depth` value as the alias itself (not `depth+1`). This means a deeply aliased structure could potentially bypass the limit by 1 level. While the impact is minimal given the depth=20 limit, incrementing depth before recursing into the alias target would be more consistent.
strictDec.KnownFields(true)
Outdated
Review

[MINOR] The alias depth check uses return checkYAMLDepth(node.Alias, depth, maxDepth) — i.e., the alias itself doesn't add to the depth. This is intentional per the comment, but YAML aliases can be used to create large fan-out structures. An alias to a deeply nested structure effectively doubles the depth impact without incrementing depth for the alias node itself. For the persona use case with MaxYAMLDepth=20 this is acceptable, but worth noting the design tradeoff.

**[MINOR]** The alias depth check uses `return checkYAMLDepth(node.Alias, depth, maxDepth)` — i.e., the alias itself doesn't add to the depth. This is intentional per the comment, but YAML aliases can be used to create large fan-out structures. An alias to a deeply nested structure effectively doubles the depth impact without incrementing depth for the alias node itself. For the persona use case with MaxYAMLDepth=20 this is acceptable, but worth noting the design tradeoff.
return strictDec.Decode(out)
}
// checkYAMLDepth recursively checks that YAML nodes don't exceed the depth limit.
4
+53
View File
4
@@ -512,3 +512,56 @@ func TestListBuiltinPersonasSortedOrder(t *testing.T) {
}
}
}
func TestYAMLUnknownFieldsRejected(t *testing.T) {
tests := []struct {
name string
content string
wantErr string
}{
{
name: "unknown top-level field",
content: `name: test
identity: test identity
unknown_field: should fail
`,
wantErr: "unknown_field",
},
{
name: "typo in field name",
content: `name: test
identiy: typo should fail
`,
wantErr: "identiy",
},
{
name: "unknown field in severity",
content: `name: test
identity: test
severity:
major: Major
minro: typo
`,
wantErr: "minro",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "unknown.yaml")
if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Errorf("expected error for unknown field %q, got nil", tt.wantErr)
return
}
if !strings.Contains(err.Error(), tt.wantErr) {
t.Errorf("error = %q, want containing %q", err.Error(), tt.wantErr)
}
})
}
}