diff --git a/README.md b/README.md index 583ebd8..23fc038 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ AI-powered code review bot for Gitea pull requests. Fetches diff + context, send - **Smart budget**: Automatically trims context to fit model token limits - **Idempotent reviews**: Posts new review, then cleans up stale ones (one review per bot) - **Custom prompts**: Load additional instructions from a file (e.g. security-focused review) -- **Zero dependencies**: Go stdlib only +- **Minimal dependencies**: Go stdlib + `gopkg.in/yaml.v3` only ## Quick Start: Composite Action @@ -208,7 +208,7 @@ AI Core handles OAuth token management and deployment discovery automatically. M | `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos | | `system-prompt-file` | No | `""` | Local file with additional system prompt instructions | | `persona` | No | `""` | Built-in persona name (security, architect, docs) | -| `persona-file` | No | `""` | Path to persona JSON file with custom review focus | +| `persona-file` | No | `""` | Path to persona file (YAML or JSON) with custom review focus | | `temperature` | No | `0` | LLM temperature (0 = server default) | | `timeout` | No | `300` | LLM request timeout in seconds | | `dry-run` | No | `false` | Print review to stdout instead of posting | @@ -408,32 +408,38 @@ Each persona posts independently with its own sentinel, so reviews don't interfe ### Custom Personas -Create a JSON file with your domain-specific review focus: +Create a YAML file with your domain-specific review focus: -```json -// .review/personas/trading.json -{ - "name": "trading", - "display_name": "Trading Domain Expert", - "identity": "You are a trading systems expert reviewing code for correctness.\n\nYour expertise:\n- Order lifecycle and state machines\n- Fill handling and partial fills\n- Position tracking and P&L calculations\n- Event sourcing invariants", - "focus": [ - "Order state machine correctness", - "Fill handling edge cases (partial, overfill)", - "Position and P&L calculation accuracy", - "Event replay determinism", - "Decimal precision for money" - ], - "ignore": [ - "Code style", - "General performance", - "Documentation formatting" - ], - "severity": { - "major": "Bugs that cause incorrect positions, fills, or money calculations", - "minor": "Edge cases that could cause issues under unusual conditions", - "nit": "Clarity improvements for domain logic" - } -} +```yaml +# .review/personas/trading.yaml +name: trading +display_name: Trading Domain Expert + +identity: | + You are a trading systems expert reviewing code for correctness. + + Your expertise: + - Order lifecycle and state machines + - Fill handling and partial fills + - Position tracking and P&L calculations + - Event sourcing invariants + +focus: + - Order state machine correctness + - Fill handling edge cases (partial, overfill) + - Position and P&L calculation accuracy + - Event replay determinism + - Decimal precision for money + +ignore: + - Code style + - General performance + - Documentation formatting + +severity: + major: "Bugs that cause incorrect positions, fills, or money calculations" + minor: "Edge cases that could cause issues under unusual conditions" + nit: "Clarity improvements for domain logic" ``` Use it in CI: @@ -442,17 +448,24 @@ Use it in CI: - uses: rodin/review-bot/.gitea/actions/review@v1 with: reviewer-name: trading - persona-file: .review/personas/trading.json + persona-file: .review/personas/trading.yaml ... ``` +YAML is the recommended format for personas because it supports: +- Multi-line strings with `|` blocks (cleaner identity definitions) +- Comments for documentation +- More readable arrays and nested structures + +JSON is also supported for backwards compatibility—just use `.json` extension. + ### Persona vs system-prompt-file | Feature | `persona` / `persona-file` | `system-prompt-file` | |---------|---------------------------|----------------------| | Replaces base prompt | Yes | No (appends) | -| Structured format | Yes (JSON) | No (freeform) | +| Structured format | Yes (YAML/JSON) | No (freeform) | | Focus/ignore lists | Yes | Manual | | Severity calibration | Yes | Manual | | Header display name | Yes | No | diff --git a/docs/DESIGN-57-yaml-persona.md b/docs/DESIGN-57-yaml-persona.md new file mode 100644 index 0000000..628e0a6 --- /dev/null +++ b/docs/DESIGN-57-yaml-persona.md @@ -0,0 +1,108 @@ +# Design: YAML Support for Persona Files (#57) + +## Problem + +JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability. + +## Constraints + +- Backwards compatibility: existing JSON personas must continue to work +- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486) +- Consistency: use `.yaml` extension (not `.yml`) +- Library: use `gopkg.in/yaml.v3` (approved in CONVENTIONS.md) with explicit depth limiting + +## Proposed Approach + +1. **Update `parsePersona`** to detect format from file extension +2. **Add YAML parsing** with explicit depth limit (defense in depth) +3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension +4. **Convert built-in personas** to YAML format +5. **Update embed directive** to include both formats + +### File Extension Detection + +```go +func parsePersona(data []byte, source string) (*Persona, error) { + isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml") + if isYAML { + return parseYAML(data, source) + } + return parseJSON(data, source) +} +``` + +### YAML Parsing with Depth Protection + +```go +func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error { + var node yaml.Node + dec := yaml.NewDecoder(bytes.NewReader(data)) + if err := dec.Decode(&node); err != nil { + return err + } + if err := checkYAMLDepth(&node, 0, maxDepth); err != nil { + return err + } + return node.Decode(out) +} + +func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error { + if depth > maxDepth { + return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth) + } + // Handle alias nodes by following the Alias pointer + if node.Kind == yaml.AliasNode && node.Alias != nil { + return checkYAMLDepth(node.Alias, depth, maxDepth) + } + for _, child := range node.Content { + if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil { + return err + } + } + return nil +} +``` + +The `gopkg.in/yaml.v3` library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a `yaml.Node`, walking the tree to verify depth (including alias resolution), then decoding into the target struct. + +## State/Data Model + +No new state. Same `Persona` struct, just different parsing. + +## Error Cases + +| Error | Handling | +|-------|----------| +| Invalid YAML syntax | Return parse error with source file | +| Deeply nested YAML | Library rejects (v1.16.0+ fix) | +| Unknown extension | Fall back to JSON parsing | +| Missing required fields | Validation rejects after parse | + +## Edge Cases + +- File with `.json` extension but YAML content → JSON parse fails, user sees error +- File with no extension → defaults to JSON +- Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`) + +## Testing Strategy + +1. Unit tests for YAML parsing (valid, invalid, deeply nested) +2. Unit tests for extension detection +3. Integration test for built-in personas (now YAML) +4. Backwards compat test: verify JSON still works for external files + +## Completion Checklist + +1. [ ] `go-yaml` dependency added at v1.16.0+ +2. [ ] Extension detection uses case-insensitive comparison +3. [ ] YAML parse errors include source file name +4. [ ] JSON parsing still works for `.json` files +5. [ ] Built-in personas converted to YAML with readable multi-line strings +6. [ ] Embed directive updated to include `*.yaml` +7. [ ] Test for deeply nested YAML rejection +8. [ ] All existing tests pass + +## Open Questions + +- Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs. +- Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format. diff --git a/go.mod b/go.mod index 9b2e8d2..0d9eded 100644 --- a/go.mod +++ b/go.mod @@ -1,3 +1,5 @@ module gitea.weiker.me/rodin/review-bot go 1.26.2 + +require gopkg.in/yaml.v3 v3.0.1 diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..a62c313 --- /dev/null +++ b/go.sum @@ -0,0 +1,4 @@ +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/review/persona.go b/review/persona.go index c8d676c..0f7f141 100644 --- a/review/persona.go +++ b/review/persona.go @@ -1,81 +1,153 @@ package review import ( + "bytes" "embed" "encoding/json" "fmt" "os" + "sort" "strings" "unicode/utf8" + + "gopkg.in/yaml.v3" ) -//go:embed personas/*.json +//go:embed personas/*.yaml var embeddedPersonas embed.FS +// MaxPersonaFileSize is the maximum size for persona files (64 KB). +// This prevents denial-of-service via excessively large files. +const MaxPersonaFileSize = 64 * 1024 + +// MaxYAMLDepth is the maximum nesting depth allowed in YAML persona files. +// This prevents stack exhaustion from deeply nested structures. +const MaxYAMLDepth = 20 + +// MaxYAMLNodes is the maximum number of YAML nodes allowed in persona files. +// This prevents DoS via wide-but-shallow structures that bypass depth limits. +const MaxYAMLNodes = 1000 + // Persona defines a specialized review role with focused expertise. type Persona struct { - Name string `json:"name"` - DisplayName string `json:"display_name"` - ModelPref string `json:"model_preference,omitempty"` - Identity string `json:"identity"` - Focus []string `json:"focus"` - Ignore []string `json:"ignore"` - Severity Severity `json:"severity"` - OutputFormat string `json:"output_format,omitempty"` + Name string `json:"name" yaml:"name"` + DisplayName string `json:"display_name" yaml:"display_name"` + ModelPref string `json:"model_preference,omitempty" yaml:"model_preference,omitempty"` + Identity string `json:"identity" yaml:"identity"` + Focus []string `json:"focus" yaml:"focus"` + Ignore []string `json:"ignore" yaml:"ignore"` + Severity Severity `json:"severity" yaml:"severity"` + OutputFormat string `json:"output_format,omitempty" yaml:"output_format,omitempty"` } // Severity defines what constitutes each severity level for this persona. // These are prompt guidance for the LLM, not output format changes. type Severity struct { - Major string `json:"major"` - Minor string `json:"minor"` - Nit string `json:"nit"` + Major string `json:"major" yaml:"major"` + Minor string `json:"minor" yaml:"minor"` + Nit string `json:"nit" yaml:"nit"` } -// LoadPersona loads a persona from a JSON file path. +// LoadPersona loads a persona from a JSON or YAML file path. +// Format is detected by file extension: .yaml/.yml for YAML, .json or other for JSON. +// Files larger than MaxPersonaFileSize are rejected. +// +// Symlinks are supported: os.Stat follows symlinks, so a symlink pointing to +// a regular file will pass the IsRegular() check. Symlinks to non-regular files +// (directories, FIFOs, devices) are still rejected. func LoadPersona(path string) (*Persona, error) { + // os.Stat follows symlinks, so symlinks to regular files are supported. + // The IsRegular() check operates on the target, not the symlink itself. + info, err := os.Stat(path) + if err != nil { + return nil, fmt.Errorf("read persona file %s: %w", path, err) + } + if !info.Mode().IsRegular() { + return nil, fmt.Errorf("persona file %s is not a regular file", path) + } + if info.Size() > MaxPersonaFileSize { + return nil, fmt.Errorf("persona file %s exceeds maximum size (%d bytes)", path, MaxPersonaFileSize) + } data, err := os.ReadFile(path) if err != nil { return nil, fmt.Errorf("read persona file %s: %w", path, err) } + // Re-check size after read to defend against TOCTOU races where file + // grows between stat and read (e.g., appending process, replaced file). + if len(data) > MaxPersonaFileSize { + return nil, fmt.Errorf("persona file %s exceeds maximum size (%d bytes)", path, MaxPersonaFileSize) + } return parsePersona(data, path) } // LoadBuiltinPersona loads a built-in persona by name. // Returns an error if the persona doesn't exist. +// Built-in personas are stored in YAML format only (see embed directive). func LoadBuiltinPersona(name string) (*Persona, error) { - filename := name + ".json" - data, err := embeddedPersonas.ReadFile("personas/" + filename) // embed.FS paths use forward slashes per io/fs spec + yamlFile := name + ".yaml" + data, err := embeddedPersonas.ReadFile("personas/" + yamlFile) if err != nil { available := ListBuiltinPersonas() return nil, fmt.Errorf("unknown built-in persona %q (available: %s)", name, strings.Join(available, ", ")) } - return parsePersona(data, "builtin:"+name) + return parsePersona(data, "builtin:"+yamlFile) } -// ListBuiltinPersonas returns the names of all built-in personas. +// ListBuiltinPersonas returns the names of all built-in personas in sorted order. // Returns an empty slice if the embedded directory cannot be read. func ListBuiltinPersonas() []string { entries, err := embeddedPersonas.ReadDir("personas") if err != nil { return []string{} } - var names []string + seen := make(map[string]bool) for _, e := range entries { if e.IsDir() { continue } name := e.Name() - if strings.HasSuffix(name, ".json") { - names = append(names, strings.TrimSuffix(name, ".json")) + // Strip extension to get persona name + var personaName string + switch { + case strings.HasSuffix(name, ".yaml"): + personaName = strings.TrimSuffix(name, ".yaml") + case strings.HasSuffix(name, ".yml"): + personaName = strings.TrimSuffix(name, ".yml") + case strings.HasSuffix(name, ".json"): + personaName = strings.TrimSuffix(name, ".json") + default: + continue + } + if !seen[personaName] { + seen[personaName] = true } } + names := make([]string, 0, len(seen)) + for name := range seen { + names = append(names, name) + } + sort.Strings(names) return names } +// parsePersona parses persona data from JSON or YAML format. +// Format is detected by the source file extension. func parsePersona(data []byte, source string) (*Persona, error) { + lowerSource := strings.ToLower(source) + isYAML := strings.HasSuffix(lowerSource, ".yaml") || strings.HasSuffix(lowerSource, ".yml") + var p Persona - if err := json.Unmarshal(data, &p); err != nil { + var err error + if isYAML { + err = unmarshalYAMLWithDepthLimit(data, &p, MaxYAMLDepth) + } else { + // Use json.Decoder with DisallowUnknownFields for consistency with + // YAML's KnownFields(true) - both reject unknown fields to catch typos. + dec := json.NewDecoder(bytes.NewReader(data)) + dec.DisallowUnknownFields() + err = dec.Decode(&p) + } + if err != nil { return nil, fmt.Errorf("parse persona %s: %w", source, err) } if err := validatePersona(&p, source); err != nil { @@ -84,6 +156,74 @@ func parsePersona(data []byte, source string) (*Persona, error) { return &p, nil } +// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting +// and strict field checking. This protects against stack exhaustion from deeply +// nested structures and catches typos in field names. +// Multi-document YAML files are rejected to prevent silent data loss. +func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error { + // First pass: decode into a yaml.Node to check depth limits and node counts. + // This prevents stack exhaustion before we attempt to decode into structs. + var node yaml.Node + dec := yaml.NewDecoder(bytes.NewReader(data)) + if err := dec.Decode(&node); err != nil { + return err + } + + // Reject multi-document YAML files - silently ignoring additional documents + // could lead to confusing behavior where users think their changes take effect. + var extra yaml.Node + if dec.Decode(&extra) == nil { + return fmt.Errorf("multi-document YAML is not supported; only single-document files are allowed") + } + + nodeCount := 0 + if err := checkYAMLDepth(&node, 0, maxDepth, MaxYAMLNodes, make(map[*yaml.Node]struct{}), &nodeCount); err != nil { + return err + } + + // Second pass: decode with strict field checking enabled. + // KnownFields(true) rejects unknown keys, catching typos like "focuss" or "identiy". + // We must re-decode from the original data because yaml.Node.Decode() doesn't + // support the KnownFields option. + strictDec := yaml.NewDecoder(bytes.NewReader(data)) + strictDec.KnownFields(true) + return strictDec.Decode(out) +} + +// checkYAMLDepth recursively checks that YAML nodes don't exceed the depth limit +// or the total node count limit. It also detects alias cycles to prevent infinite +// recursion from crafted YAML with self-referential aliases. +func checkYAMLDepth(node *yaml.Node, depth, maxDepth, maxNodes int, seen map[*yaml.Node]struct{}, nodeCount *int) error { + if depth > maxDepth { + return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth) + } + + // Track total nodes visited as defense-in-depth against wide-but-shallow attacks. + *nodeCount++ + if *nodeCount > maxNodes { + return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes) + } + + // Cycle detection: if we've seen this node before, we're in a cycle. + if _, ok := seen[node]; ok { + return nil // Already validated this subtree, skip to avoid infinite recursion. + } + seen[node] = struct{}{} + + // Handle alias nodes: follow the alias to its anchor target. + // Increment depth when following aliases since they expand the effective structure. + if node.Kind == yaml.AliasNode && node.Alias != nil { + return checkYAMLDepth(node.Alias, depth+1, maxDepth, maxNodes, seen, nodeCount) + } + + for _, child := range node.Content { + if err := checkYAMLDepth(child, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil { + return err + } + } + return nil +} + func validatePersona(p *Persona, source string) error { if p.Name == "" { return fmt.Errorf("persona %s: name is required", source) diff --git a/review/persona_test.go b/review/persona_test.go index f749ef9..b90eae7 100644 --- a/review/persona_test.go +++ b/review/persona_test.go @@ -1,10 +1,13 @@ package review import ( + "fmt" "os" "path/filepath" "strings" "testing" + + "gopkg.in/yaml.v3" ) func TestLoadBuiltinPersona(t *testing.T) { @@ -87,6 +90,83 @@ func TestListBuiltinPersonas(t *testing.T) { } } +func TestLoadPersonaFromYAMLFile(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "test.yaml") + + content := `# Test persona +name: test +display_name: Test Persona +identity: | + You are a test persona. + Multi-line identity works. +focus: + - testing + - validation +ignore: + - nothing +severity: + major: Big problems + minor: Small problems + nit: Tiny problems +` + + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + p, err := LoadPersona(path) + if err != nil { + t.Fatalf("LoadPersona failed: %v", err) + } + + if p.Name != "test" { + t.Errorf("Name = %q, want %q", p.Name, "test") + } + if p.DisplayName != "Test Persona" { + t.Errorf("DisplayName = %q, want %q", p.DisplayName, "Test Persona") + } + if len(p.Focus) != 2 { + t.Errorf("Focus len = %d, want 2", len(p.Focus)) + } + if !strings.Contains(p.Identity, "Multi-line") { + t.Error("Identity should contain multi-line content") + } +} + +func TestLoadPersonaFromYMLFile(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "test.yml") + + content := `name: test +display_name: Test YML +identity: Test identity +focus: + - testing +ignore: [] +severity: + major: Big + minor: Small + nit: Tiny +` + + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + p, err := LoadPersona(path) + if err != nil { + t.Fatalf("LoadPersona failed: %v", err) + } + + if p.Name != "test" { + t.Errorf("Name = %q, want %q", p.Name, "test") + } + if p.DisplayName != "Test YML" { + t.Errorf("DisplayName = %q, want %q", p.DisplayName, "Test YML") + } +} + func TestLoadPersonaFromJSONFile(t *testing.T) { dir := t.TempDir() path := filepath.Join(dir, "test.json") @@ -96,6 +176,7 @@ func TestLoadPersonaFromJSONFile(t *testing.T) { "display_name": "Test Persona", "identity": "You are a test persona.\nMulti-line identity works.", "focus": ["testing", "validation"], + "ignore": ["nothing"], "severity": { "major": "Big problems", @@ -130,22 +211,38 @@ func TestLoadPersonaFromJSONFile(t *testing.T) { func TestLoadPersonaValidation(t *testing.T) { tests := []struct { name string - json string + content string + ext string wantErr string }{ { - name: "missing name", - json: `{"identity": "test"}`, + name: "missing name yaml", + content: "identity: test\n", + ext: ".yaml", wantErr: "name is required", }, { - name: "missing identity", - json: `{"name": "test"}`, + name: "missing identity yaml", + content: "name: test\n", + ext: ".yaml", wantErr: "identity is required", }, { - name: "display_name defaults to name", - json: `{"name": "test", "identity": "test identity"}`, + name: "missing name json", + content: `{"identity": "test"}`, + ext: ".json", + wantErr: "name is required", + }, + { + name: "missing identity json", + content: `{"name": "test"}`, + ext: ".json", + wantErr: "identity is required", + }, + { + name: "display_name defaults to name", + content: "name: test\nidentity: test identity\n", + ext: ".yaml", // No error expected - should succeed }, } @@ -153,8 +250,8 @@ func TestLoadPersonaValidation(t *testing.T) { for _, tt := range tests { t.Run(tt.name, func(t *testing.T) { dir := t.TempDir() - path := filepath.Join(dir, "test.json") - if err := os.WriteFile(path, []byte(tt.json), 0644); err != nil { + path := filepath.Join(dir, "test"+tt.ext) + if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil { t.Fatalf("failed to write test file: %v", err) } @@ -184,12 +281,25 @@ func TestLoadPersonaValidation(t *testing.T) { } func TestLoadPersonaFileNotFound(t *testing.T) { - _, err := LoadPersona("/nonexistent/path/persona.json") + _, err := LoadPersona("/nonexistent/path/persona.yaml") if err == nil { t.Error("expected error for nonexistent file") } } +func TestLoadPersonaInvalidYAML(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "invalid.yaml") + if err := os.WriteFile(path, []byte("not valid yaml:\n - [broken"), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Error("expected error for invalid YAML") + } +} + func TestLoadPersonaInvalidJSON(t *testing.T) { dir := t.TempDir() path := filepath.Join(dir, "invalid.json") @@ -203,6 +313,38 @@ func TestLoadPersonaInvalidJSON(t *testing.T) { } } +func TestLoadPersonaCaseInsensitiveExtension(t *testing.T) { + tests := []struct { + name string + ext string + }{ + {"lowercase yaml", ".yaml"}, + {"uppercase YAML", ".YAML"}, + {"mixed case Yaml", ".Yaml"}, + {"lowercase yml", ".yml"}, + {"uppercase YML", ".YML"}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "test"+tt.ext) + content := "name: test\nidentity: test identity\n" + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + p, err := LoadPersona(path) + if err != nil { + t.Fatalf("LoadPersona failed for extension %s: %v", tt.ext, err) + } + if p.Name != "test" { + t.Errorf("Name = %q, want %q", p.Name, "test") + } + }) + } +} + func TestCapitalizeFirst(t *testing.T) { tests := []struct { input string @@ -237,3 +379,400 @@ func TestListBuiltinPersonasReturnsEmptySlice(t *testing.T) { t.Error("ListBuiltinPersonas should return empty slice, not nil") } } + +func TestYAMLMultilineStrings(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "multiline.yaml") + + // Test literal block scalar (|) which preserves newlines + content := `name: multiline +display_name: Multiline Test +identity: | + First line. + Second line. + Third line. +focus: + - item one +ignore: [] +severity: + major: Major issue + minor: Minor issue + nit: Nit +` + + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + p, err := LoadPersona(path) + if err != nil { + t.Fatalf("LoadPersona failed: %v", err) + } + + // Literal block scalar preserves newlines + if !strings.Contains(p.Identity, "\n") { + t.Error("Identity should contain newlines from literal block scalar") + } + if !strings.Contains(p.Identity, "Second line") { + t.Error("Identity should contain 'Second line'") + } +} + +func TestYAMLComments(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "comments.yaml") + + content := `# This is a comment +name: commented # inline comment +display_name: Commented Persona +# Another comment +identity: Test identity +focus: + - item # comment after item +ignore: [] +severity: + major: Major + minor: Minor + nit: Nit +` + + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + p, err := LoadPersona(path) + if err != nil { + t.Fatalf("LoadPersona failed: %v", err) + } + + // Comments should be ignored + if p.Name != "commented" { + t.Errorf("Name = %q, want %q", p.Name, "commented") + } + if p.Focus[0] != "item" { + t.Errorf("Focus[0] = %q, want %q", p.Focus[0], "item") + } +} + +func TestYAMLDeeplyNestedRejection(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "deeply-nested.yaml") + + // Build a deeply nested YAML structure that exceeds MaxYAMLDepth (20). + // Each level adds 2 to the depth count (key + value mapping). + var sb strings.Builder + sb.WriteString("name: test\nidentity: test\nnested:\n") + indent := " " + for i := 0; i < 25; i++ { + sb.WriteString(strings.Repeat(indent, i+1)) + sb.WriteString(fmt.Sprintf("level%d:\n", i)) + } + sb.WriteString(strings.Repeat(indent, 26)) + sb.WriteString("value: too-deep\n") + + if err := os.WriteFile(path, []byte(sb.String()), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Error("expected error for deeply nested YAML, got nil") + } + if !strings.Contains(err.Error(), "nesting depth exceeds") { + t.Errorf("error = %q, want containing 'nesting depth exceeds'", err.Error()) + } +} + +func TestYAMLFileSizeLimit(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "huge.yaml") + + // Create a file larger than MaxPersonaFileSize (64 KB) + content := "name: test\nidentity: " + strings.Repeat("x", MaxPersonaFileSize+1) + "\n" + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Error("expected error for oversized file, got nil") + } + if !strings.Contains(err.Error(), "exceeds maximum size") { + t.Errorf("error = %q, want containing 'exceeds maximum size'", err.Error()) + } +} + +func TestYAMLAliasCycleDetection(t *testing.T) { + // Test that our checkYAMLDepth function handles alias cycles gracefully + // by using the seen map to prevent infinite recursion. + // We test this directly because go-yaml's parser handles most cycles + // at parse time, but we need to ensure our checker is robust. + + // Create a node structure where an alias points to a parent node, + // simulating what could happen with malicious input that bypasses + // go-yaml's cycle detection. + parent := &yaml.Node{ + Kind: yaml.MappingNode, + Content: []*yaml.Node{ + {Kind: yaml.ScalarNode, Value: "name"}, + {Kind: yaml.ScalarNode, Value: "test"}, + {Kind: yaml.ScalarNode, Value: "nested"}, + }, + } + + // Create a child that aliases back to the parent (artificial cycle) + aliasToParent := &yaml.Node{ + Kind: yaml.AliasNode, + Alias: parent, + } + parent.Content = append(parent.Content, aliasToParent) + + nodeCount := 0 + seen := make(map[*yaml.Node]struct{}) + + // This should NOT hang or stack overflow - the seen map prevents infinite recursion + err := checkYAMLDepth(parent, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount) + if err != nil { + t.Errorf("unexpected error traversing cyclic structure: %v", err) + } + + // Verify we tracked the parent in the seen map + if _, ok := seen[parent]; !ok { + t.Error("parent node not tracked in seen map") + } +} + +func TestYAMLMultiDocumentRejection(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "multi.yaml") + + // Multi-document YAML (documents separated by ---) + content := `name: first +identity: first document +--- +name: second +identity: second document +` + if err := os.WriteFile(path, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Error("expected error for multi-document YAML, got nil") + } + if !strings.Contains(err.Error(), "multi-document") { + t.Errorf("error = %q, want containing 'multi-document'", err.Error()) + } +} + +func TestYAMLNodeCountLimit(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "wide.yaml") + + // Build a YAML structure that's shallow but wide - many keys at the same level + // to test the node count limit (should exceed MaxYAMLNodes = 1000) + var sb strings.Builder + sb.WriteString("name: test\nidentity: test\n") + for i := 0; i < 600; i++ { + sb.WriteString(fmt.Sprintf("key%d: value%d\n", i, i)) + } + + if err := os.WriteFile(path, []byte(sb.String()), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Error("expected error for wide YAML exceeding node count, got nil") + } + if !strings.Contains(err.Error(), "node count exceeds") { + t.Errorf("error = %q, want containing 'node count exceeds'", err.Error()) + } +} + +func TestCheckYAMLDepthCycleDetectionDirect(t *testing.T) { + // Direct test of cycle detection in checkYAMLDepth by creating + // a node structure with an artificial cycle. + // This tests the seen map logic independent of go-yaml's parsing. + node := &yaml.Node{ + Kind: yaml.MappingNode, + Content: []*yaml.Node{ + {Kind: yaml.ScalarNode, Value: "key"}, + {Kind: yaml.ScalarNode, Value: "value"}, + }, + } + + // Create a cycle by making a child reference the parent + cycleChild := &yaml.Node{ + Kind: yaml.AliasNode, + Alias: node, // Points back to the parent + } + node.Content = append(node.Content, + &yaml.Node{Kind: yaml.ScalarNode, Value: "cyclic"}, + cycleChild, + ) + + nodeCount := 0 + seen := make(map[*yaml.Node]struct{}) + err := checkYAMLDepth(node, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount) + + // Should complete without infinite recursion due to cycle detection + if err != nil { + t.Errorf("unexpected error: %v", err) + } + // The seen map should contain multiple entries + if len(seen) < 2 { + t.Errorf("seen map has %d entries, expected at least 2", len(seen)) + } +} + +func TestListBuiltinPersonasSortedOrder(t *testing.T) { + names := ListBuiltinPersonas() + if len(names) < 2 { + t.Skip("need at least 2 personas to test ordering") + } + + // Verify the list is sorted + for i := 1; i < len(names); i++ { + if names[i-1] > names[i] { + t.Errorf("ListBuiltinPersonas not sorted: %q > %q", names[i-1], names[i]) + } + } +} + +func TestYAMLUnknownFieldsRejected(t *testing.T) { + tests := []struct { + name string + content string + wantErr string + }{ + { + name: "unknown top-level field", + content: `name: test +identity: test identity +unknown_field: should fail +`, + wantErr: "unknown_field", + }, + { + name: "typo in field name", + content: `name: test +identiy: typo should fail +`, + wantErr: "identiy", + }, + { + name: "unknown field in severity", + content: `name: test +identity: test +severity: + major: Major + minro: typo +`, + wantErr: "minro", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "unknown.yaml") + if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Errorf("expected error for unknown field %q, got nil", tt.wantErr) + return + } + if !strings.Contains(err.Error(), tt.wantErr) { + t.Errorf("error = %q, want containing %q", err.Error(), tt.wantErr) + } + }) + } +} + +func TestJSONUnknownFieldsRejected(t *testing.T) { + tests := []struct { + name string + content string + wantErr string + }{ + { + name: "unknown top-level field", + content: `{ + "name": "test", + "identity": "test identity", + "unknown_field": "should fail" + }`, + wantErr: "unknown_field", + }, + { + name: "typo in field name", + content: `{ + "name": "test", + "identiy": "typo should fail" + }`, + wantErr: "identiy", + }, + { + name: "unknown field in severity", + content: `{ + "name": "test", + "identity": "test", + "severity": { + "major": "ok", + "miner": "typo" + } + }`, + wantErr: "miner", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "test.json") + if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + _, err := LoadPersona(path) + if err == nil { + t.Fatal("expected error for unknown field, got nil") + } + if !strings.Contains(err.Error(), tt.wantErr) { + t.Errorf("error = %q, want to contain %q", err.Error(), tt.wantErr) + } + }) + } +} + +func TestLoadPersonaSymlink(t *testing.T) { + // Create a regular persona file + dir := t.TempDir() + realFile := filepath.Join(dir, "real.yaml") + content := `name: test +identity: test identity +` + if err := os.WriteFile(realFile, []byte(content), 0644); err != nil { + t.Fatalf("failed to write test file: %v", err) + } + + // Create a symlink to it + symlink := filepath.Join(dir, "link.yaml") + if err := os.Symlink(realFile, symlink); err != nil { + t.Fatalf("failed to create symlink: %v", err) + } + + // LoadPersona should work via symlink + p, err := LoadPersona(symlink) + if err != nil { + t.Fatalf("LoadPersona via symlink failed: %v", err) + } + if p.Name != "test" { + t.Errorf("Name = %q, want %q", p.Name, "test") + } +} diff --git a/review/personas/architect.json b/review/personas/architect.json deleted file mode 100644 index 86541c1..0000000 --- a/review/personas/architect.json +++ /dev/null @@ -1,26 +0,0 @@ -{ - "name": "architect", - "display_name": "Software Architect", - "identity": "You are a software architect reviewing code for design quality.\n\nYour expertise:\n- Design patterns and anti-patterns\n- Code organization and module boundaries\n- API design and contracts\n- Testability and dependency injection\n- Consistency with existing architecture\n- Technical debt identification", - "focus": [ - "Design pattern violations or misuse", - "Module boundary violations (inappropriate coupling)", - "API design issues (unclear contracts, leaky abstractions)", - "Testability problems (hidden dependencies, god objects)", - "Inconsistency with existing codebase patterns", - "Unnecessary complexity or over-engineering", - "Missing abstractions or premature abstraction" - ], - "ignore": [ - "Security vulnerabilities (security persona handles these)", - "Performance micro-optimizations", - "Code style and formatting", - "Documentation typos", - "Test implementation details" - ], - "severity": { - "major": "Architectural violations that will cause maintenance problems or make the codebase harder to evolve", - "minor": "Design issues that reduce clarity or testability but don't block progress", - "nit": "Minor pattern deviations or style preferences" - } -} diff --git a/review/personas/architect.yaml b/review/personas/architect.yaml new file mode 100644 index 0000000..f957ed6 --- /dev/null +++ b/review/personas/architect.yaml @@ -0,0 +1,37 @@ +# Software Architect Persona +# Focuses on design quality, patterns, and code organization + +name: architect +display_name: Software Architect + +identity: | + You are a software architect reviewing code for design quality. + + Your expertise: + - Design patterns and anti-patterns + - Code organization and module boundaries + - API design and contracts + - Testability and dependency injection + - Consistency with existing architecture + - Technical debt identification + +focus: + - Design pattern violations or misuse + - Module boundary violations (inappropriate coupling) + - API design issues (unclear contracts, leaky abstractions) + - Testability problems (hidden dependencies, god objects) + - Inconsistency with existing codebase patterns + - Unnecessary complexity or over-engineering + - Missing abstractions or premature abstraction + +ignore: + - Security vulnerabilities (security persona handles these) + - Performance micro-optimizations + - Code style and formatting + - Documentation typos + - Test implementation details + +severity: + major: "Architectural violations that will cause maintenance problems or make the codebase harder to evolve" + minor: "Design issues that reduce clarity or testability but don't block progress" + nit: "Minor pattern deviations or style preferences" diff --git a/review/personas/docs.json b/review/personas/docs.json deleted file mode 100644 index 1829667..0000000 --- a/review/personas/docs.json +++ /dev/null @@ -1,26 +0,0 @@ -{ - "name": "docs", - "display_name": "Documentation Reviewer", - "identity": "You are a documentation specialist reviewing code for clarity and documentation quality.\n\nYour expertise:\n- API documentation and examples\n- Code comments and their accuracy\n- Error message clarity\n- README and guide quality\n- Naming clarity and self-documenting code", - "focus": [ - "Missing or outdated documentation", - "Unclear or misleading comments", - "Poor error messages (cryptic, unhelpful, missing context)", - "Confusing naming (functions, variables, types)", - "Missing examples for complex APIs", - "Inconsistent terminology", - "Documentation that contradicts the code" - ], - "ignore": [ - "Security vulnerabilities", - "Performance issues", - "Design patterns", - "Test coverage", - "Code style (unless it affects readability)" - ], - "severity": { - "major": "Documentation that actively misleads or missing docs for critical functionality", - "minor": "Unclear documentation or poor error messages that will confuse users", - "nit": "Minor clarity improvements or typo fixes" - } -} diff --git a/review/personas/docs.yaml b/review/personas/docs.yaml new file mode 100644 index 0000000..1c59b32 --- /dev/null +++ b/review/personas/docs.yaml @@ -0,0 +1,36 @@ +# Documentation Reviewer Persona +# Focuses on clarity, documentation quality, and self-documenting code + +name: docs +display_name: Documentation Reviewer + +identity: | + You are a documentation specialist reviewing code for clarity and documentation quality. + + Your expertise: + - API documentation and examples + - Code comments and their accuracy + - Error message clarity + - README and guide quality + - Naming clarity and self-documenting code + +focus: + - Missing or outdated documentation + - Unclear or misleading comments + - Poor error messages (cryptic, unhelpful, missing context) + - Confusing naming (functions, variables, types) + - Missing examples for complex APIs + - Inconsistent terminology + - Documentation that contradicts the code + +ignore: + - Security vulnerabilities + - Performance issues + - Design patterns + - Test coverage + - Code style (unless it affects readability) + +severity: + major: "Documentation that actively misleads or missing docs for critical functionality" + minor: "Unclear documentation or poor error messages that will confuse users" + nit: "Minor clarity improvements or typo fixes" diff --git a/review/personas/security.json b/review/personas/security.json deleted file mode 100644 index fd159f7..0000000 --- a/review/personas/security.json +++ /dev/null @@ -1,26 +0,0 @@ -{ - "name": "security", - "display_name": "Security Specialist", - "identity": "You are a security specialist reviewing code for vulnerabilities.\n\nYour expertise:\n- OWASP Top 10 vulnerabilities\n- Injection attacks (SQL, command, path traversal, template)\n- Authentication and authorization patterns\n- Secrets management and exposure risks\n- Race conditions with security implications\n- Event sourcing attack vectors (replay attacks, event injection)", - "focus": [ - "Injection attacks (SQL, command, path traversal, template injection)", - "Authentication and authorization gaps or bypasses", - "Secrets exposure (hardcoded credentials, tokens in logs, config leaks)", - "Input validation failures (unsanitized input, unsafe deserialization)", - "Race conditions that could be exploited", - "Cryptographic weaknesses (weak algorithms, improper key handling)", - "Information disclosure through error messages or logs" - ], - "ignore": [ - "Code style and naming conventions", - "Performance optimizations (unless security-related)", - "Documentation quality", - "General code quality or readability", - "Test coverage" - ], - "severity": { - "major": "Exploitable vulnerabilities: auth bypass, injection, data exfiltration, privilege escalation, RCE", - "minor": "Defense-in-depth issues: missing rate limiting, verbose errors, weak input validation", - "nit": "Theoretical risks with low exploitability or impact" - } -} diff --git a/review/personas/security.yaml b/review/personas/security.yaml new file mode 100644 index 0000000..1dec792 --- /dev/null +++ b/review/personas/security.yaml @@ -0,0 +1,37 @@ +# Security Specialist Persona +# Focuses on vulnerabilities, auth issues, and security best practices + +name: security +display_name: Security Specialist + +identity: | + You are a security specialist reviewing code for vulnerabilities. + + Your expertise: + - OWASP Top 10 vulnerabilities + - Injection attacks (SQL, command, path traversal, template) + - Authentication and authorization patterns + - Secrets management and exposure risks + - Race conditions with security implications + - Event sourcing attack vectors (replay attacks, event injection) + +focus: + - Injection attacks (SQL, command, path traversal, template injection) + - Authentication and authorization gaps or bypasses + - Secrets exposure (hardcoded credentials, tokens in logs, config leaks) + - Input validation failures (unsanitized input, unsafe deserialization) + - Race conditions that could be exploited + - Cryptographic weaknesses (weak algorithms, improper key handling) + - Information disclosure through error messages or logs + +ignore: + - Code style and naming conventions + - Performance optimizations (unless security-related) + - Documentation quality + - General code quality or readability + - Test coverage + +severity: + major: "Exploitable vulnerabilities: auth bypass, injection, data exfiltration, privilege escalation, RCE" + minor: "Defense-in-depth issues: missing rate limiting, verbose errors, weak input validation" + nit: "Theoretical risks with low exploitability or impact"