Compare commits

..

15 Commits

Author SHA1 Message Date
Rodin 27a9be38bc fix: address PR #63 review findings
1. Refactor err2 to use scoped loadErr variable (MINOR - sonnet-review-bot)
   The else-if branches are mutually exclusive, so the error variable
   should be scoped inside the block, not declared outside with err2.

2. Sanitize DisplayName before embedding in Markdown (MINOR - security-review-bot)
   Remote persona metadata is untrusted. Added sanitizeMarkdownText() to
   escape Markdown special characters and strip control characters.
   Applied to both the header title and the footer attribution.

3. Document YAML DoS mitigations (MINOR - security-review-bot)
   Added comprehensive comment in remote_persona.go explaining existing
   defenses: file size limit, file count cap, depth limit, node count cap,
   and alias cycle detection. These collectively mitigate billion-laughs
   and stack exhaustion attacks.
2026-05-10 20:54:20 -07:00
Rodin 5fac8bc505 fix: address PR #62 review findings
CI / test (pull_request) Successful in 16s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 27s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m5s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m40s
- Remove duplicate flag.Parse() call
- Fix nil map panic in LoadRemotePersonas error path by assigning
  empty map when LoadRemotePersonas returns an error
- Tighten isNotFoundError to only check HTTP 404 (remove broad
  'not found' substring check to avoid false positives)
- Clean up personaErr variable scope using narrower-scoped err variables
- Add proper doc comment to LoadRemotePersonasFromPath (Go convention)
- Add file count cap (50 files) in LoadRemotePersonasFromPath to
  prevent resource exhaustion from repos with thousands of small files
- Update test expectation for tightened isNotFoundError
2026-05-10 20:44:24 -07:00
Rodin 2f8d047ef2 feat: load personas from target repo .review-bot/personas/
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 8m12s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 8m15s
CI / test (pull_request) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Failing after 42s
Adds support for repository-specific personas. When --persona is
specified, review-bot now:

1. Checks the target repo's .review-bot/personas/<name>.yaml directory
2. Falls back to built-in persona if not found in repo

This allows repos to define domain-specific personas (trading, regulatory,
etc.) or override built-in personas with project-specific rules, without
requiring changes to CI configuration.

Implementation:
- New review.PersonaFetcher interface for abstracting Gitea API access
- review.LoadRemotePersonas() with graceful fallback on 404
- review.MergePersonas() for combining remote and built-in personas
- giteaFetcher adapter in main.go to bridge gitea.Client

The feature follows a partial-success model: invalid YAML files or
network errors for individual persona files are logged and skipped,
allowing other valid personas to load.

Closes #60
2026-05-10 19:05:55 -07:00
aweiker 593b249e09 Merge pull request 'feat: add YAML support for persona files' (#58) from issue-57 into main
CI / test (push) Successful in 9m31s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #58
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-11 01:39:43 +00:00
Rodin 10cd6203d4 fix: address remaining PR #58 review findings
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m31s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m54s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 10m40s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 11m27s
1. Remove dead JSON fallback in LoadBuiltinPersona
   - The embed directive only includes *.yaml files
   - JSON fallback code could never succeed
   - Simplified function to only try YAML

2. JSON parsing now rejects unknown fields
   - Switched from json.Unmarshal to json.Decoder
   - DisallowUnknownFields() matches YAML's KnownFields(true)
   - Added test coverage for JSON unknown field rejection

3. Documented symlink support in LoadPersona
   - os.Stat follows symlinks, so symlinks to regular files work
   - Added doc comment explaining the behavior
   - Added test for symlink support
2026-05-10 17:53:42 -07:00
Aaron Weiker 26f326cf51 fix: add YAML alias cycle detection and multi-document rejection
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m34s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m53s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 10m23s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 11m24s
Address security review findings:

MAJOR: Add cycle detection to checkYAMLDepth using a visited set
(seen map[*yaml.Node]struct{}) to prevent infinite recursion from
crafted YAML with self-referential aliases.

MINOR fixes:
- Add MaxYAMLNodes (1000) limit as defense-in-depth against
  wide-but-shallow structures that bypass depth limits
- Increment depth when following alias targets (was incorrectly
  passing same depth, allowing alias chains to bypass depth limit)
- Reject multi-document YAML files instead of silently ignoring
  additional documents (prevents confusing silent data loss)

Tests added:
- TestYAMLAliasCycleDetection: Direct test of cycle detection logic
- TestYAMLMultiDocumentRejection: Verifies multi-doc files rejected
- TestYAMLNodeCountLimit: Verifies wide structures are rejected
- TestCheckYAMLDepthCycleDetectionDirect: Unit test with artificial cycle
2026-05-10 17:12:01 -07:00
Rodin 4fed59ac85 yaml: enable strict field checking to catch typos
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m33s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m54s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 10m51s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 11m30s
Addresses PR #58 MINOR finding: YAML decoder now rejects unknown fields.

- Enable KnownFields(true) on YAML decoder to catch typos like
  'focuss' or 'identiy' in persona files
- Since yaml.Node.Decode() doesn't support KnownFields, we now
  do a two-pass decode: first pass checks depth limits, second
  pass decodes with strict field checking
- Add tests for unknown field rejection at top-level and nested levels
2026-05-10 16:50:07 -07:00
Rodin 6035afeea7 fix: address MINOR review findings from c3e8f0f review
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m33s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m51s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 11m13s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 11m25s
2026-05-10 16:29:44 -07:00
Rodin c3e8f0f231 fix: address PR review findings
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 9m32s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m53s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 10m52s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 11m0s
MAJOR fixes:
- Remove false security claim about gopkg.in/yaml.v3 having built-in depth protection
- Add explicit YAML depth limiting via yaml.Node API (MaxYAMLDepth=20)
- Add file size limit for persona files (MaxPersonaFileSize=64KB)
- Add test for deeply nested YAML rejection

MINOR fixes:
- Add sort.Strings to ListBuiltinPersonas for deterministic ordering
- Update design doc to reflect actual library used (gopkg.in/yaml.v3)
- Update README: 'Zero dependencies' → 'Minimal dependencies'
- Add test for file size limit
- Add test for sorted persona list
2026-05-10 14:43:31 -07:00
Rodin 7898dd939f feat: add YAML support for persona files (#57)
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 9m33s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 9m55s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 10m32s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 11m0s
- Add gopkg.in/yaml.v3 dependency (approved in CONVENTIONS.md)
- Update parsePersona to detect format by file extension
- Support both .yaml and .yml extensions (case-insensitive)
- Convert built-in personas to YAML format
- Add comprehensive tests for YAML parsing
- Update README with YAML examples and documentation

YAML provides cleaner multi-line strings via literal block scalars
and supports comments, making persona definitions more readable.
JSON remains supported for backwards compatibility.

Closes #57
2026-05-10 14:16:41 -07:00
rodin fededd18ad Merge pull request 'docs: allow approved third-party packages' (#59) from allow-deps into main
CI / test (push) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
docs: strict dependency allowlist with CI enforcement
2026-05-10 21:07:10 +00:00
Rodin 01cde16d47 fix: validate all deps and improve robustness
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 34s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m51s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m51s
Addresses GPT review feedback:

1. MAJOR - Test deps now validated: All direct module deps (from go.mod)
   are checked against the allowlist, whether used in prod or tests.

2. MINOR - Prefix match: Uses grep -E with word boundary (^pkg(/|$|$))
   to avoid false positives on similarly-prefixed modules.

3. MINOR - Bash version check: Script now fails early with helpful
   message if Bash < 4 (macOS default). Added shebang: #!/usr/bin/env bash

4. NIT - Removed redundant grep -v '_test' (go list -deps already
   excludes test-only deps without -test flag).
2026-05-10 14:02:06 -07:00
Rodin aeb0c8cb79 fix: enforce Scope column and improve portability
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 14s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 35s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m46s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m2s
Addresses review feedback:

1. MAJOR - Scope enforcement: Script now parses the Scope column and
   ensures 'test only' packages don't appear in non-test code. Uses
   'go list -deps' to check production imports.

2. MINOR - Portability: Replaced 'grep -P' (GNU-only) with awk-based
   parsing that works on macOS/BSD.

3. MINOR - Robustness: Table parsing uses awk to split on '|' and
   extract columns properly, handling whitespace variations.

4. MINOR - Glob safety: Prefix matching now uses parameter expansion
   instead of glob patterns to prevent metacharacter issues.
2026-05-10 13:57:49 -07:00
Rodin 70267b68f4 fix: address review feedback on dependency allowlist
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 14s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 32s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m27s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m28s
Fixes:
- Single source of truth: script now parses allowlist from CONVENTIONS.md
- Fail closed: script exits non-zero if 'go list' fails
- Direct deps only: uses '-f' flag to exclude transitive deps
- Added 'precommit' to .PHONY in Makefile
- Removed unused ALLOWED_PATTERN variable
- Added Scope column to distinguish test-only vs production deps
- Clarified that transitive deps of approved packages are allowed
- Added note that enforcement script parses the table
2026-05-10 13:53:55 -07:00
Rodin 4b96231b32 docs: strict dependency allowlist with CI enforcement
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 28s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m40s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m48s
STRICT ALLOWLIST policy: Only packages explicitly listed in CONVENTIONS.md
may be imported. No exceptions.

## Changes

- Updates CONVENTIONS.md with strict allowlist language
- Adds scripts/check-deps.sh to enforce the allowlist
- Adds 'make check-deps' and 'make precommit' targets
- CI will fail if any unapproved dependency is detected

## Approved packages

- gopkg.in/yaml.v3 — YAML parsing
- github.com/google/go-cmp — test comparisons

## Process for new dependencies

1. Open a PR that ONLY updates CONVENTIONS.md
2. Requires explicit approval from Aaron
3. After merge, a separate PR may use the package
2026-05-10 13:45:12 -07:00
20 changed files with 1886 additions and 175 deletions
+14 -8
View File
@@ -3,19 +3,25 @@
## Language & Dependencies ## Language & Dependencies
- Target the latest stable Go release. - Target the latest stable Go release.
- Prefer Go standard library; approved third-party packages allowed (see below). - **STRICT ALLOWLIST:** Only packages listed below may be imported. No exceptions.
### Approved Third-Party Packages ### Approved Third-Party Packages
| Package | Use Case | Notes | | Package | Use Case | Scope |
|---------|----------|-------| |---------|----------|-------|
| `gopkg.in/yaml.v3` | YAML parsing | Persona files, config | | `gopkg.in/yaml.v3` | YAML parsing (persona files, config) | production |
| `github.com/google/go-cmp` | Test comparisons | `cmp.Diff` for readable diffs | | `github.com/google/go-cmp` | Test comparisons (`cmp.Diff`) | test only |
To add a new dependency: **Any import not in this table or the Go standard library is forbidden.**
1. Open a PR with justification (why stdlib is insufficient)
2. Package must be well-maintained, widely used, minimal transitive deps Transitive dependencies of approved packages are automatically allowed.
3. Update this table when approved
To request a new dependency:
1. Open a PR that ONLY updates this table
2. Requires explicit approval from Aaron
3. After merge, a separate PR may use the package
*Enforcement: `scripts/check-deps.sh` parses this table — update only here.*
## Error Handling ## Error Handling
+7 -1
View File
@@ -1,4 +1,4 @@
.PHONY: build test test-integration lint clean coverage .PHONY: build test test-integration lint clean coverage check-deps precommit
build: build:
go build -o review-bot ./cmd/review-bot/ go build -o review-bot ./cmd/review-bot/
@@ -12,9 +12,15 @@ test-integration:
lint: lint:
go vet ./... go vet ./...
check-deps:
@./scripts/check-deps.sh
clean: clean:
rm -f review-bot rm -f review-bot
coverage: coverage:
go test -coverprofile=coverage.out ./... go test -coverprofile=coverage.out ./...
go tool cover -func=coverage.out go tool cover -func=coverage.out
# Precommit runs all checks required before pushing
precommit: check-deps lint test
+77 -29
View File
@@ -9,7 +9,7 @@ AI-powered code review bot for Gitea pull requests. Fetches diff + context, send
- **Smart budget**: Automatically trims context to fit model token limits - **Smart budget**: Automatically trims context to fit model token limits
- **Idempotent reviews**: Posts new review, then cleans up stale ones (one review per bot) - **Idempotent reviews**: Posts new review, then cleans up stale ones (one review per bot)
- **Custom prompts**: Load additional instructions from a file (e.g. security-focused review) - **Custom prompts**: Load additional instructions from a file (e.g. security-focused review)
- **Zero dependencies**: Go stdlib only - **Minimal dependencies**: Go stdlib + `gopkg.in/yaml.v3` only
## Quick Start: Composite Action ## Quick Start: Composite Action
@@ -208,7 +208,7 @@ AI Core handles OAuth token management and deployment discovery automatically. M
| `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos | | `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos |
| `system-prompt-file` | No | `""` | Local file with additional system prompt instructions | | `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
| `persona` | No | `""` | Built-in persona name (security, architect, docs) | | `persona` | No | `""` | Built-in persona name (security, architect, docs) |
| `persona-file` | No | `""` | Path to persona JSON file with custom review focus | | `persona-file` | No | `""` | Path to persona file (YAML or JSON) with custom review focus |
| `temperature` | No | `0` | LLM temperature (0 = server default) | | `temperature` | No | `0` | LLM temperature (0 = server default) |
| `timeout` | No | `300` | LLM request timeout in seconds | | `timeout` | No | `300` | LLM request timeout in seconds |
| `dry-run` | No | `false` | Print review to stdout instead of posting | | `dry-run` | No | `false` | Print review to stdout instead of posting |
@@ -408,32 +408,38 @@ Each persona posts independently with its own sentinel, so reviews don't interfe
### Custom Personas ### Custom Personas
Create a JSON file with your domain-specific review focus: Create a YAML file with your domain-specific review focus:
```json ```yaml
// .review/personas/trading.json # .review/personas/trading.yaml
{ name: trading
"name": "trading", display_name: Trading Domain Expert
"display_name": "Trading Domain Expert",
"identity": "You are a trading systems expert reviewing code for correctness.\n\nYour expertise:\n- Order lifecycle and state machines\n- Fill handling and partial fills\n- Position tracking and P&L calculations\n- Event sourcing invariants", identity: |
"focus": [ You are a trading systems expert reviewing code for correctness.
"Order state machine correctness",
"Fill handling edge cases (partial, overfill)", Your expertise:
"Position and P&L calculation accuracy", - Order lifecycle and state machines
"Event replay determinism", - Fill handling and partial fills
"Decimal precision for money" - Position tracking and P&L calculations
], - Event sourcing invariants
"ignore": [
"Code style", focus:
"General performance", - Order state machine correctness
"Documentation formatting" - Fill handling edge cases (partial, overfill)
], - Position and P&L calculation accuracy
"severity": { - Event replay determinism
"major": "Bugs that cause incorrect positions, fills, or money calculations", - Decimal precision for money
"minor": "Edge cases that could cause issues under unusual conditions",
"nit": "Clarity improvements for domain logic" ignore:
} - Code style
} - General performance
- Documentation formatting
severity:
major: "Bugs that cause incorrect positions, fills, or money calculations"
minor: "Edge cases that could cause issues under unusual conditions"
nit: "Clarity improvements for domain logic"
``` ```
Use it in CI: Use it in CI:
@@ -442,17 +448,59 @@ Use it in CI:
- uses: rodin/review-bot/.gitea/actions/review@v1 - uses: rodin/review-bot/.gitea/actions/review@v1
with: with:
reviewer-name: trading reviewer-name: trading
persona-file: .review/personas/trading.json persona-file: .review/personas/trading.yaml
... ...
``` ```
YAML is the recommended format for personas because it supports:
- Multi-line strings with `|` blocks (cleaner identity definitions)
- Comments for documentation
- More readable arrays and nested structures
JSON is also supported for backwards compatibility—just use `.json` extension.
### Repository Personas (Auto-Discovery)
Repositories can ship their own personas in `.review-bot/personas/`. When you specify `--persona <name>`, review-bot will:
1. **Try to load from the target repo** — Checks `.review-bot/personas/<name>.yaml` (or `.yml`)
2. **Fall back to built-in** — If not found in repo, uses the built-in persona
This lets each repo define domain-specific personas without modifying CI config:
```
my-trading-repo/
├── .review-bot/
│ └── personas/
│ ├── trading.yaml # Custom trading persona
│ └── regulatory.yaml # Compliance-focused reviews
├── lib/
└── ...
```
```yaml
# CI config (no persona-file needed)
- uses: rodin/review-bot/.gitea/actions/review@v1
with:
reviewer-name: trading
persona: trading # Will find .review-bot/personas/trading.yaml
...
```
**Priority order:**
1. Repo's `.review-bot/personas/<name>.yaml`
2. Built-in persona with matching name
3. Error if neither exists
This allows repos to override built-in personas (e.g., a custom `security` persona that adds project-specific rules) while keeping the simple `persona: security` syntax in CI.
### Persona vs system-prompt-file ### Persona vs system-prompt-file
| Feature | `persona` / `persona-file` | `system-prompt-file` | | Feature | `persona` / `persona-file` | `system-prompt-file` |
|---------|---------------------------|----------------------| |---------|---------------------------|----------------------|
| Replaces base prompt | Yes | No (appends) | | Replaces base prompt | Yes | No (appends) |
| Structured format | Yes (JSON) | No (freeform) | | Structured format | Yes (YAML/JSON) | No (freeform) |
| Focus/ignore lists | Yes | Manual | | Focus/ignore lists | Yes | Manual |
| Severity calibration | Yes | Manual | | Severity calibration | Yes | Manual |
| Header display name | Yes | No | | Header display name | Yes | No |
+67 -23
View File
@@ -79,7 +79,6 @@ func main() {
aicoreAPIURL := flag.String("aicore-api-url", envOrDefault("AICORE_API_URL", ""), "SAP AI Core API URL (for provider=aicore)") aicoreAPIURL := flag.String("aicore-api-url", envOrDefault("AICORE_API_URL", ""), "SAP AI Core API URL (for provider=aicore)")
aicoreResourceGroup := flag.String("aicore-resource-group", envOrDefault("AICORE_RESOURCE_GROUP", "default"), "SAP AI Core resource group (for provider=aicore)") aicoreResourceGroup := flag.String("aicore-resource-group", envOrDefault("AICORE_RESOURCE_GROUP", "default"), "SAP AI Core resource group (for provider=aicore)")
flag.Parse()
flag.Parse() flag.Parse()
if *versionFlag { if *versionFlag {
@@ -116,29 +115,9 @@ func main() {
os.Exit(1) os.Exit(1)
} }
// Load persona if specified // Persona loading is deferred until after giteaClient is initialized,
// so we can try loading from the target repo first.
var persona *review.Persona var persona *review.Persona
if *personaName != "" {
var err error
persona, err = review.LoadBuiltinPersona(*personaName)
if err != nil {
slog.Error("failed to load persona", "persona", *personaName, "error", err)
os.Exit(1)
}
slog.Info("loaded built-in persona", "persona", persona.Name, "display", persona.DisplayName)
} else if *personaFile != "" {
resolvedPath, err := validateWorkspacePath(*personaFile, "persona-file")
if err != nil {
slog.Error("invalid persona-file path", "error", err)
os.Exit(1)
}
persona, err = review.LoadPersona(resolvedPath)
if err != nil {
slog.Error("failed to load persona file", "file", *personaFile, "error", err)
os.Exit(1)
}
slog.Info("loaded persona from file", "file", *personaFile, "persona", persona.Name)
}
// Validate reviewer-name: only safe characters allowed in sentinel // Validate reviewer-name: only safe characters allowed in sentinel
if err := validateReviewerName(*reviewerName); err != nil { if err := validateReviewerName(*reviewerName); err != nil {
@@ -196,6 +175,45 @@ func main() {
ctx, cancel := context.WithTimeout(context.Background(), overallTimeout) ctx, cancel := context.WithTimeout(context.Background(), overallTimeout)
defer cancel() defer cancel()
// Load persona: try remote repo first, then fall back to built-in
if *personaName != "" {
// Try loading from target repo's .review-bot/personas/ directory
fetcher := &giteaFetcher{client: giteaClient}
remotePersonas, err := review.LoadRemotePersonas(ctx, fetcher, owner, repoName)
if err != nil {
slog.Warn("could not load remote personas", "repo", fmt.Sprintf("%s/%s", owner, repoName), "error", err)
// Assign empty map so the lookup below doesn't panic
remotePersonas = map[string]*review.Persona{}
}
if p, ok := remotePersonas[*personaName]; ok {
persona = p
slog.Info("loaded persona from target repo", "persona", persona.Name, "display", persona.DisplayName)
} else {
// Fall back to built-in persona
var err error
persona, err = review.LoadBuiltinPersona(*personaName)
if err != nil {
slog.Error("failed to load persona", "persona", *personaName, "error", err)
os.Exit(1)
}
slog.Info("loaded built-in persona", "persona", persona.Name, "display", persona.DisplayName)
}
} else if *personaFile != "" {
resolvedPath, err := validateWorkspacePath(*personaFile, "persona-file")
if err != nil {
slog.Error("invalid persona-file path", "error", err)
os.Exit(1)
}
loadedPersona, loadErr := review.LoadPersona(resolvedPath)
if loadErr != nil {
slog.Error("failed to load persona file", "file", *personaFile, "error", loadErr)
os.Exit(1)
}
persona = loadedPersona
slog.Info("loaded persona from file", "file", *personaFile, "persona", persona.Name)
}
slog.Info("reviewing pull request", "pr", prNumber, "repo", fmt.Sprintf("%s/%s", owner, repoName)) slog.Info("reviewing pull request", "pr", prNumber, "repo", fmt.Sprintf("%s/%s", owner, repoName))
// Step 1: Fetch PR metadata // Step 1: Fetch PR metadata
@@ -783,3 +801,29 @@ func shouldSkipStaleReview(evaluatedSHA, currentSHA string) bool {
} }
return evaluatedSHA != currentSHA return evaluatedSHA != currentSHA
} }
// giteaFetcher adapts gitea.Client to review.PersonaFetcher interface.
type giteaFetcher struct {
client *gitea.Client
}
func (f *giteaFetcher) ListContents(ctx context.Context, owner, repo, path string) ([]review.ContentEntry, error) {
entries, err := f.client.ListContents(ctx, owner, repo, path)
if err != nil {
return nil, err
}
// Convert gitea.ContentEntry to review.ContentEntry
result := make([]review.ContentEntry, len(entries))
for i, e := range entries {
result[i] = review.ContentEntry{
Name: e.Name,
Path: e.Path,
Type: e.Type,
}
}
return result, nil
}
func (f *giteaFetcher) GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error) {
return f.client.GetFileContent(ctx, owner, repo, filepath)
}
+108
View File
@@ -0,0 +1,108 @@
# Design: YAML Support for Persona Files (#57)
## Problem
JSON is awkward for persona files that contain multi-line text (identity, severity descriptions). YAML supports cleaner multi-line strings and comments, improving readability and maintainability.
## Constraints
- Backwards compatibility: existing JSON personas must continue to work
- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
- Consistency: use `.yaml` extension (not `.yml`)
- Library: use `gopkg.in/yaml.v3` (approved in CONVENTIONS.md) with explicit depth limiting
## Proposed Approach
1. **Update `parsePersona`** to detect format from file extension
2. **Add YAML parsing** with explicit depth limit (defense in depth)
3. **Keep JSON as fallback** for files without `.yaml`/`.yml` extension
4. **Convert built-in personas** to YAML format
5. **Update embed directive** to include both formats
### File Extension Detection
```go
func parsePersona(data []byte, source string) (*Persona, error) {
isYAML := strings.HasSuffix(source, ".yaml") || strings.HasSuffix(source, ".yml")
if isYAML {
return parseYAML(data, source)
}
return parseJSON(data, source)
}
```
### YAML Parsing with Depth Protection
```go
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
var node yaml.Node
dec := yaml.NewDecoder(bytes.NewReader(data))
if err := dec.Decode(&node); err != nil {
return err
}
if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
return err
}
return node.Decode(out)
}
func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
if depth > maxDepth {
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Handle alias nodes by following the Alias pointer
if node.Kind == yaml.AliasNode && node.Alias != nil {
return checkYAMLDepth(node.Alias, depth, maxDepth)
}
for _, child := range node.Content {
if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
return err
}
}
return nil
}
```
The `gopkg.in/yaml.v3` library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a `yaml.Node`, walking the tree to verify depth (including alias resolution), then decoding into the target struct.
## State/Data Model
No new state. Same `Persona` struct, just different parsing.
## Error Cases
| Error | Handling |
|-------|----------|
| Invalid YAML syntax | Return parse error with source file |
| Deeply nested YAML | Library rejects (v1.16.0+ fix) |
| Unknown extension | Fall back to JSON parsing |
| Missing required fields | Validation rejects after parse |
## Edge Cases
- File with `.json` extension but YAML content → JSON parse fails, user sees error
- File with no extension → defaults to JSON
- Embedded persona reference like `builtin:security` → detect by embed path (`personas/X.yaml`)
## Testing Strategy
1. Unit tests for YAML parsing (valid, invalid, deeply nested)
2. Unit tests for extension detection
3. Integration test for built-in personas (now YAML)
4. Backwards compat test: verify JSON still works for external files
## Completion Checklist
1. [ ] `go-yaml` dependency added at v1.16.0+
2. [ ] Extension detection uses case-insensitive comparison
3. [ ] YAML parse errors include source file name
4. [ ] JSON parsing still works for `.json` files
5. [ ] Built-in personas converted to YAML with readable multi-line strings
6. [ ] Embed directive updated to include `*.yaml`
7. [ ] Test for deeply nested YAML rejection
8. [ ] All existing tests pass
## Open Questions
- Should we support both `.yaml` AND `.yml`? Issue says `.yaml` only for consistency, but some users expect `.yml`. **Decision:** Support both for reading, recommend `.yaml` in docs.
- Should we add a "format" field to detect mismatched extension/content? **Decision:** No, keep it simple. Extension determines format.
+2
View File
@@ -1,3 +1,5 @@
module gitea.weiker.me/rodin/review-bot module gitea.weiker.me/rodin/review-bot
go 1.26.2 go 1.26.2
require gopkg.in/yaml.v3 v3.0.1
+4
View File
@@ -0,0 +1,4 @@
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
+27 -5
View File
@@ -2,6 +2,7 @@ package review
import ( import (
"fmt" "fmt"
"regexp"
"strings" "strings"
) )
@@ -22,10 +23,29 @@ func GiteaEvent(verdict string) string {
} }
} }
// markdownSpecialChars matches characters that have special meaning in Markdown.
// We escape these to prevent untrusted input from breaking formatting.
// Uses a quoted string since raw strings can't contain backticks.
var markdownSpecialChars = regexp.MustCompile("([\\\\*_`\\[\\]()#<>|~])")
// sanitizeMarkdownText escapes special Markdown characters in untrusted text.
// This prevents markdown injection attacks where a malicious display name could
// break formatting, inject links, or create unexpected rendering.
func sanitizeMarkdownText(s string) string {
// First, remove any control characters and null bytes
cleaned := strings.Map(func(r rune) rune {
if r < 32 && r != '\t' && r != '\n' {
return -1 // drop the character
}
return r
}, s)
// Escape special Markdown characters by prepending backslash
return markdownSpecialChars.ReplaceAllString(cleaned, `\$1`)
}
// FormatMarkdownWithDisplay formats a ReviewResult with separate display name and sentinel name. // FormatMarkdownWithDisplay formats a ReviewResult with separate display name and sentinel name.
// Note: displayName is not HTML-escaped as Gitea sanitizes rendered Markdown. // displayName is sanitized to prevent Markdown injection from untrusted remote persona metadata.
// Persona display names are controlled by repo owners (trusted input). // sentinelName is used for the cleanup sentinel comment (machine-readable, not rendered).
// displayName is used for the header title, sentinelName is used for the cleanup sentinel.
// If displayName is empty, sentinelName is used for both. // If displayName is empty, sentinelName is used for both.
func FormatMarkdownWithDisplay(result *ReviewResult, displayName, sentinelName string) string { func FormatMarkdownWithDisplay(result *ReviewResult, displayName, sentinelName string) string {
var sb strings.Builder var sb strings.Builder
@@ -37,7 +57,8 @@ func FormatMarkdownWithDisplay(result *ReviewResult, displayName, sentinelName s
} }
if headerName != "" { if headerName != "" {
title := CapitalizeFirst(headerName) // Sanitize the header name to prevent Markdown injection
title := CapitalizeFirst(sanitizeMarkdownText(headerName))
sb.WriteString(fmt.Sprintf("# %s Review\n\n", title)) sb.WriteString(fmt.Sprintf("# %s Review\n\n", title))
} }
@@ -61,7 +82,8 @@ func FormatMarkdownWithDisplay(result *ReviewResult, displayName, sentinelName s
sb.WriteString(fmt.Sprintf("**%s** — %s\n", result.Verdict, result.Recommendation)) sb.WriteString(fmt.Sprintf("**%s** — %s\n", result.Verdict, result.Recommendation))
if sentinelName != "" { if sentinelName != "" {
sb.WriteString(fmt.Sprintf("\n---\n*Review by %s*\n", headerName)) // Sanitize headerName for the footer as well
sb.WriteString(fmt.Sprintf("\n---\n*Review by %s*\n", sanitizeMarkdownText(headerName)))
// Hidden sentinel for identifying this bot's reviews during cleanup // Hidden sentinel for identifying this bot's reviews during cleanup
sb.WriteString(fmt.Sprintf("\n<!-- review-bot:%s -->\n", sentinelName)) sb.WriteString(fmt.Sprintf("\n<!-- review-bot:%s -->\n", sentinelName))
} }
+68
View File
@@ -214,3 +214,71 @@ func TestFormatMarkdownWithDisplay(t *testing.T) {
} }
}) })
} }
func TestSanitizeMarkdownText(t *testing.T) {
tests := []struct {
name string
input string
want string
}{
{
name: "plain text unchanged",
input: "Security Specialist",
want: "Security Specialist",
},
{
name: "escapes asterisks",
input: "**bold** attack",
want: `\*\*bold\*\* attack`,
},
{
name: "escapes brackets for links",
input: "[click me](http://evil.com)",
want: `\[click me\]\(http://evil.com\)`,
},
{
name: "escapes backticks",
input: "`code` injection",
want: "\\`code\\` injection",
},
{
name: "escapes angle brackets",
input: "<script>alert(1)</script>",
want: `\<script\>alert\(1\)\</script\>`,
},
{
name: "escapes hash for headers",
input: "# Fake Header",
want: `\# Fake Header`,
},
{
name: "escapes pipe for tables",
input: "col1 | col2",
want: `col1 \| col2`,
},
{
name: "removes control characters",
input: "hello\x00world\x1f",
want: "helloworld",
},
{
name: "preserves tabs and newlines",
input: "line1\n\tindented",
want: "line1\n\tindented",
},
{
name: "escapes tilde for strikethrough",
input: "~~strikethrough~~",
want: `\~\~strikethrough\~\~`,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := sanitizeMarkdownText(tt.input)
if got != tt.want {
t.Errorf("sanitizeMarkdownText(%q) = %q, want %q", tt.input, got, tt.want)
}
})
}
}
+161 -21
View File
@@ -1,81 +1,153 @@
package review package review
import ( import (
"bytes"
"embed" "embed"
"encoding/json" "encoding/json"
"fmt" "fmt"
"os" "os"
"sort"
"strings" "strings"
"unicode/utf8" "unicode/utf8"
"gopkg.in/yaml.v3"
) )
//go:embed personas/*.json //go:embed personas/*.yaml
var embeddedPersonas embed.FS var embeddedPersonas embed.FS
// MaxPersonaFileSize is the maximum size for persona files (64 KB).
// This prevents denial-of-service via excessively large files.
const MaxPersonaFileSize = 64 * 1024
// MaxYAMLDepth is the maximum nesting depth allowed in YAML persona files.
// This prevents stack exhaustion from deeply nested structures.
const MaxYAMLDepth = 20
// MaxYAMLNodes is the maximum number of YAML nodes allowed in persona files.
// This prevents DoS via wide-but-shallow structures that bypass depth limits.
const MaxYAMLNodes = 1000
// Persona defines a specialized review role with focused expertise. // Persona defines a specialized review role with focused expertise.
type Persona struct { type Persona struct {
Name string `json:"name"` Name string `json:"name" yaml:"name"`
DisplayName string `json:"display_name"` DisplayName string `json:"display_name" yaml:"display_name"`
ModelPref string `json:"model_preference,omitempty"` ModelPref string `json:"model_preference,omitempty" yaml:"model_preference,omitempty"`
Identity string `json:"identity"` Identity string `json:"identity" yaml:"identity"`
Focus []string `json:"focus"` Focus []string `json:"focus" yaml:"focus"`
Ignore []string `json:"ignore"` Ignore []string `json:"ignore" yaml:"ignore"`
Severity Severity `json:"severity"` Severity Severity `json:"severity" yaml:"severity"`
OutputFormat string `json:"output_format,omitempty"` OutputFormat string `json:"output_format,omitempty" yaml:"output_format,omitempty"`
} }
// Severity defines what constitutes each severity level for this persona. // Severity defines what constitutes each severity level for this persona.
// These are prompt guidance for the LLM, not output format changes. // These are prompt guidance for the LLM, not output format changes.
type Severity struct { type Severity struct {
Major string `json:"major"` Major string `json:"major" yaml:"major"`
Minor string `json:"minor"` Minor string `json:"minor" yaml:"minor"`
Nit string `json:"nit"` Nit string `json:"nit" yaml:"nit"`
} }
// LoadPersona loads a persona from a JSON file path. // LoadPersona loads a persona from a JSON or YAML file path.
// Format is detected by file extension: .yaml/.yml for YAML, .json or other for JSON.
// Files larger than MaxPersonaFileSize are rejected.
//
// Symlinks are supported: os.Stat follows symlinks, so a symlink pointing to
// a regular file will pass the IsRegular() check. Symlinks to non-regular files
// (directories, FIFOs, devices) are still rejected.
func LoadPersona(path string) (*Persona, error) { func LoadPersona(path string) (*Persona, error) {
// os.Stat follows symlinks, so symlinks to regular files are supported.
// The IsRegular() check operates on the target, not the symlink itself.
info, err := os.Stat(path)
if err != nil {
return nil, fmt.Errorf("read persona file %s: %w", path, err)
}
if !info.Mode().IsRegular() {
return nil, fmt.Errorf("persona file %s is not a regular file", path)
}
if info.Size() > MaxPersonaFileSize {
return nil, fmt.Errorf("persona file %s exceeds maximum size (%d bytes)", path, MaxPersonaFileSize)
}
data, err := os.ReadFile(path) data, err := os.ReadFile(path)
if err != nil { if err != nil {
return nil, fmt.Errorf("read persona file %s: %w", path, err) return nil, fmt.Errorf("read persona file %s: %w", path, err)
} }
// Re-check size after read to defend against TOCTOU races where file
// grows between stat and read (e.g., appending process, replaced file).
if len(data) > MaxPersonaFileSize {
return nil, fmt.Errorf("persona file %s exceeds maximum size (%d bytes)", path, MaxPersonaFileSize)
}
return parsePersona(data, path) return parsePersona(data, path)
} }
// LoadBuiltinPersona loads a built-in persona by name. // LoadBuiltinPersona loads a built-in persona by name.
// Returns an error if the persona doesn't exist. // Returns an error if the persona doesn't exist.
// Built-in personas are stored in YAML format only (see embed directive).
func LoadBuiltinPersona(name string) (*Persona, error) { func LoadBuiltinPersona(name string) (*Persona, error) {
filename := name + ".json" yamlFile := name + ".yaml"
data, err := embeddedPersonas.ReadFile("personas/" + filename) // embed.FS paths use forward slashes per io/fs spec data, err := embeddedPersonas.ReadFile("personas/" + yamlFile)
if err != nil { if err != nil {
available := ListBuiltinPersonas() available := ListBuiltinPersonas()
return nil, fmt.Errorf("unknown built-in persona %q (available: %s)", name, strings.Join(available, ", ")) return nil, fmt.Errorf("unknown built-in persona %q (available: %s)", name, strings.Join(available, ", "))
} }
return parsePersona(data, "builtin:"+name) return parsePersona(data, "builtin:"+yamlFile)
} }
// ListBuiltinPersonas returns the names of all built-in personas. // ListBuiltinPersonas returns the names of all built-in personas in sorted order.
// Returns an empty slice if the embedded directory cannot be read. // Returns an empty slice if the embedded directory cannot be read.
func ListBuiltinPersonas() []string { func ListBuiltinPersonas() []string {
entries, err := embeddedPersonas.ReadDir("personas") entries, err := embeddedPersonas.ReadDir("personas")
if err != nil { if err != nil {
return []string{} return []string{}
} }
var names []string seen := make(map[string]bool)
for _, e := range entries { for _, e := range entries {
if e.IsDir() { if e.IsDir() {
continue continue
} }
name := e.Name() name := e.Name()
if strings.HasSuffix(name, ".json") { // Strip extension to get persona name
names = append(names, strings.TrimSuffix(name, ".json")) var personaName string
switch {
case strings.HasSuffix(name, ".yaml"):
personaName = strings.TrimSuffix(name, ".yaml")
case strings.HasSuffix(name, ".yml"):
personaName = strings.TrimSuffix(name, ".yml")
case strings.HasSuffix(name, ".json"):
personaName = strings.TrimSuffix(name, ".json")
default:
continue
}
if !seen[personaName] {
seen[personaName] = true
} }
} }
names := make([]string, 0, len(seen))
for name := range seen {
names = append(names, name)
}
sort.Strings(names)
return names return names
} }
// parsePersona parses persona data from JSON or YAML format.
// Format is detected by the source file extension.
func parsePersona(data []byte, source string) (*Persona, error) { func parsePersona(data []byte, source string) (*Persona, error) {
lowerSource := strings.ToLower(source)
isYAML := strings.HasSuffix(lowerSource, ".yaml") || strings.HasSuffix(lowerSource, ".yml")
var p Persona var p Persona
if err := json.Unmarshal(data, &p); err != nil { var err error
if isYAML {
err = unmarshalYAMLWithDepthLimit(data, &p, MaxYAMLDepth)
} else {
// Use json.Decoder with DisallowUnknownFields for consistency with
// YAML's KnownFields(true) - both reject unknown fields to catch typos.
dec := json.NewDecoder(bytes.NewReader(data))
dec.DisallowUnknownFields()
err = dec.Decode(&p)
}
if err != nil {
return nil, fmt.Errorf("parse persona %s: %w", source, err) return nil, fmt.Errorf("parse persona %s: %w", source, err)
} }
if err := validatePersona(&p, source); err != nil { if err := validatePersona(&p, source); err != nil {
@@ -84,6 +156,74 @@ func parsePersona(data []byte, source string) (*Persona, error) {
return &p, nil return &p, nil
} }
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
// and strict field checking. This protects against stack exhaustion from deeply
// nested structures and catches typos in field names.
// Multi-document YAML files are rejected to prevent silent data loss.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
// First pass: decode into a yaml.Node to check depth limits and node counts.
// This prevents stack exhaustion before we attempt to decode into structs.
var node yaml.Node
dec := yaml.NewDecoder(bytes.NewReader(data))
if err := dec.Decode(&node); err != nil {
return err
}
// Reject multi-document YAML files - silently ignoring additional documents
// could lead to confusing behavior where users think their changes take effect.
var extra yaml.Node
if dec.Decode(&extra) == nil {
return fmt.Errorf("multi-document YAML is not supported; only single-document files are allowed")
}
nodeCount := 0
if err := checkYAMLDepth(&node, 0, maxDepth, MaxYAMLNodes, make(map[*yaml.Node]struct{}), &nodeCount); err != nil {
return err
}
// Second pass: decode with strict field checking enabled.
// KnownFields(true) rejects unknown keys, catching typos like "focuss" or "identiy".
// We must re-decode from the original data because yaml.Node.Decode() doesn't
// support the KnownFields option.
strictDec := yaml.NewDecoder(bytes.NewReader(data))
strictDec.KnownFields(true)
return strictDec.Decode(out)
}
// checkYAMLDepth recursively checks that YAML nodes don't exceed the depth limit
// or the total node count limit. It also detects alias cycles to prevent infinite
// recursion from crafted YAML with self-referential aliases.
func checkYAMLDepth(node *yaml.Node, depth, maxDepth, maxNodes int, seen map[*yaml.Node]struct{}, nodeCount *int) error {
if depth > maxDepth {
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
*nodeCount++
if *nodeCount > maxNodes {
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
}
// Cycle detection: if we've seen this node before, we're in a cycle.
if _, ok := seen[node]; ok {
return nil // Already validated this subtree, skip to avoid infinite recursion.
}
seen[node] = struct{}{}
// Handle alias nodes: follow the alias to its anchor target.
// Increment depth when following aliases since they expand the effective structure.
if node.Kind == yaml.AliasNode && node.Alias != nil {
return checkYAMLDepth(node.Alias, depth+1, maxDepth, maxNodes, seen, nodeCount)
}
for _, child := range node.Content {
if err := checkYAMLDepth(child, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
return err
}
}
return nil
}
func validatePersona(p *Persona, source string) error { func validatePersona(p *Persona, source string) error {
if p.Name == "" { if p.Name == "" {
return fmt.Errorf("persona %s: name is required", source) return fmt.Errorf("persona %s: name is required", source)
+549 -10
View File
@@ -1,10 +1,13 @@
package review package review
import ( import (
"fmt"
"os" "os"
"path/filepath" "path/filepath"
"strings" "strings"
"testing" "testing"
"gopkg.in/yaml.v3"
) )
func TestLoadBuiltinPersona(t *testing.T) { func TestLoadBuiltinPersona(t *testing.T) {
@@ -87,6 +90,83 @@ func TestListBuiltinPersonas(t *testing.T) {
} }
} }
func TestLoadPersonaFromYAMLFile(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.yaml")
content := `# Test persona
name: test
display_name: Test Persona
identity: |
You are a test persona.
Multi-line identity works.
focus:
- testing
- validation
ignore:
- nothing
severity:
major: Big problems
minor: Small problems
nit: Tiny problems
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
p, err := LoadPersona(path)
if err != nil {
t.Fatalf("LoadPersona failed: %v", err)
}
if p.Name != "test" {
t.Errorf("Name = %q, want %q", p.Name, "test")
}
if p.DisplayName != "Test Persona" {
t.Errorf("DisplayName = %q, want %q", p.DisplayName, "Test Persona")
}
if len(p.Focus) != 2 {
t.Errorf("Focus len = %d, want 2", len(p.Focus))
}
if !strings.Contains(p.Identity, "Multi-line") {
t.Error("Identity should contain multi-line content")
}
}
func TestLoadPersonaFromYMLFile(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.yml")
content := `name: test
display_name: Test YML
identity: Test identity
focus:
- testing
ignore: []
severity:
major: Big
minor: Small
nit: Tiny
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
p, err := LoadPersona(path)
if err != nil {
t.Fatalf("LoadPersona failed: %v", err)
}
if p.Name != "test" {
t.Errorf("Name = %q, want %q", p.Name, "test")
}
if p.DisplayName != "Test YML" {
t.Errorf("DisplayName = %q, want %q", p.DisplayName, "Test YML")
}
}
func TestLoadPersonaFromJSONFile(t *testing.T) { func TestLoadPersonaFromJSONFile(t *testing.T) {
dir := t.TempDir() dir := t.TempDir()
path := filepath.Join(dir, "test.json") path := filepath.Join(dir, "test.json")
@@ -96,6 +176,7 @@ func TestLoadPersonaFromJSONFile(t *testing.T) {
"display_name": "Test Persona", "display_name": "Test Persona",
"identity": "You are a test persona.\nMulti-line identity works.", "identity": "You are a test persona.\nMulti-line identity works.",
"focus": ["testing", "validation"], "focus": ["testing", "validation"],
"ignore": ["nothing"], "ignore": ["nothing"],
"severity": { "severity": {
"major": "Big problems", "major": "Big problems",
@@ -130,22 +211,38 @@ func TestLoadPersonaFromJSONFile(t *testing.T) {
func TestLoadPersonaValidation(t *testing.T) { func TestLoadPersonaValidation(t *testing.T) {
tests := []struct { tests := []struct {
name string name string
json string content string
ext string
wantErr string wantErr string
}{ }{
{ {
name: "missing name", name: "missing name yaml",
json: `{"identity": "test"}`, content: "identity: test\n",
ext: ".yaml",
wantErr: "name is required", wantErr: "name is required",
}, },
{ {
name: "missing identity", name: "missing identity yaml",
json: `{"name": "test"}`, content: "name: test\n",
ext: ".yaml",
wantErr: "identity is required", wantErr: "identity is required",
}, },
{ {
name: "display_name defaults to name", name: "missing name json",
json: `{"name": "test", "identity": "test identity"}`, content: `{"identity": "test"}`,
ext: ".json",
wantErr: "name is required",
},
{
name: "missing identity json",
content: `{"name": "test"}`,
ext: ".json",
wantErr: "identity is required",
},
{
name: "display_name defaults to name",
content: "name: test\nidentity: test identity\n",
ext: ".yaml",
// No error expected - should succeed // No error expected - should succeed
}, },
} }
@@ -153,8 +250,8 @@ func TestLoadPersonaValidation(t *testing.T) {
for _, tt := range tests { for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) { t.Run(tt.name, func(t *testing.T) {
dir := t.TempDir() dir := t.TempDir()
path := filepath.Join(dir, "test.json") path := filepath.Join(dir, "test"+tt.ext)
if err := os.WriteFile(path, []byte(tt.json), 0644); err != nil { if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err) t.Fatalf("failed to write test file: %v", err)
} }
@@ -184,12 +281,25 @@ func TestLoadPersonaValidation(t *testing.T) {
} }
func TestLoadPersonaFileNotFound(t *testing.T) { func TestLoadPersonaFileNotFound(t *testing.T) {
_, err := LoadPersona("/nonexistent/path/persona.json") _, err := LoadPersona("/nonexistent/path/persona.yaml")
if err == nil { if err == nil {
t.Error("expected error for nonexistent file") t.Error("expected error for nonexistent file")
} }
} }
func TestLoadPersonaInvalidYAML(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "invalid.yaml")
if err := os.WriteFile(path, []byte("not valid yaml:\n - [broken"), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Error("expected error for invalid YAML")
}
}
func TestLoadPersonaInvalidJSON(t *testing.T) { func TestLoadPersonaInvalidJSON(t *testing.T) {
dir := t.TempDir() dir := t.TempDir()
path := filepath.Join(dir, "invalid.json") path := filepath.Join(dir, "invalid.json")
@@ -203,6 +313,38 @@ func TestLoadPersonaInvalidJSON(t *testing.T) {
} }
} }
func TestLoadPersonaCaseInsensitiveExtension(t *testing.T) {
tests := []struct {
name string
ext string
}{
{"lowercase yaml", ".yaml"},
{"uppercase YAML", ".YAML"},
{"mixed case Yaml", ".Yaml"},
{"lowercase yml", ".yml"},
{"uppercase YML", ".YML"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test"+tt.ext)
content := "name: test\nidentity: test identity\n"
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
p, err := LoadPersona(path)
if err != nil {
t.Fatalf("LoadPersona failed for extension %s: %v", tt.ext, err)
}
if p.Name != "test" {
t.Errorf("Name = %q, want %q", p.Name, "test")
}
})
}
}
func TestCapitalizeFirst(t *testing.T) { func TestCapitalizeFirst(t *testing.T) {
tests := []struct { tests := []struct {
input string input string
@@ -237,3 +379,400 @@ func TestListBuiltinPersonasReturnsEmptySlice(t *testing.T) {
t.Error("ListBuiltinPersonas should return empty slice, not nil") t.Error("ListBuiltinPersonas should return empty slice, not nil")
} }
} }
func TestYAMLMultilineStrings(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "multiline.yaml")
// Test literal block scalar (|) which preserves newlines
content := `name: multiline
display_name: Multiline Test
identity: |
First line.
Second line.
Third line.
focus:
- item one
ignore: []
severity:
major: Major issue
minor: Minor issue
nit: Nit
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
p, err := LoadPersona(path)
if err != nil {
t.Fatalf("LoadPersona failed: %v", err)
}
// Literal block scalar preserves newlines
if !strings.Contains(p.Identity, "\n") {
t.Error("Identity should contain newlines from literal block scalar")
}
if !strings.Contains(p.Identity, "Second line") {
t.Error("Identity should contain 'Second line'")
}
}
func TestYAMLComments(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "comments.yaml")
content := `# This is a comment
name: commented # inline comment
display_name: Commented Persona
# Another comment
identity: Test identity
focus:
- item # comment after item
ignore: []
severity:
major: Major
minor: Minor
nit: Nit
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
p, err := LoadPersona(path)
if err != nil {
t.Fatalf("LoadPersona failed: %v", err)
}
// Comments should be ignored
if p.Name != "commented" {
t.Errorf("Name = %q, want %q", p.Name, "commented")
}
if p.Focus[0] != "item" {
t.Errorf("Focus[0] = %q, want %q", p.Focus[0], "item")
}
}
func TestYAMLDeeplyNestedRejection(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "deeply-nested.yaml")
// Build a deeply nested YAML structure that exceeds MaxYAMLDepth (20).
// Each level adds 2 to the depth count (key + value mapping).
var sb strings.Builder
sb.WriteString("name: test\nidentity: test\nnested:\n")
indent := " "
for i := 0; i < 25; i++ {
sb.WriteString(strings.Repeat(indent, i+1))
sb.WriteString(fmt.Sprintf("level%d:\n", i))
}
sb.WriteString(strings.Repeat(indent, 26))
sb.WriteString("value: too-deep\n")
if err := os.WriteFile(path, []byte(sb.String()), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Error("expected error for deeply nested YAML, got nil")
}
if !strings.Contains(err.Error(), "nesting depth exceeds") {
t.Errorf("error = %q, want containing 'nesting depth exceeds'", err.Error())
}
}
func TestYAMLFileSizeLimit(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "huge.yaml")
// Create a file larger than MaxPersonaFileSize (64 KB)
content := "name: test\nidentity: " + strings.Repeat("x", MaxPersonaFileSize+1) + "\n"
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Error("expected error for oversized file, got nil")
}
if !strings.Contains(err.Error(), "exceeds maximum size") {
t.Errorf("error = %q, want containing 'exceeds maximum size'", err.Error())
}
}
func TestYAMLAliasCycleDetection(t *testing.T) {
// Test that our checkYAMLDepth function handles alias cycles gracefully
// by using the seen map to prevent infinite recursion.
// We test this directly because go-yaml's parser handles most cycles
// at parse time, but we need to ensure our checker is robust.
// Create a node structure where an alias points to a parent node,
// simulating what could happen with malicious input that bypasses
// go-yaml's cycle detection.
parent := &yaml.Node{
Kind: yaml.MappingNode,
Content: []*yaml.Node{
{Kind: yaml.ScalarNode, Value: "name"},
{Kind: yaml.ScalarNode, Value: "test"},
{Kind: yaml.ScalarNode, Value: "nested"},
},
}
// Create a child that aliases back to the parent (artificial cycle)
aliasToParent := &yaml.Node{
Kind: yaml.AliasNode,
Alias: parent,
}
parent.Content = append(parent.Content, aliasToParent)
nodeCount := 0
seen := make(map[*yaml.Node]struct{})
// This should NOT hang or stack overflow - the seen map prevents infinite recursion
err := checkYAMLDepth(parent, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount)
if err != nil {
t.Errorf("unexpected error traversing cyclic structure: %v", err)
}
// Verify we tracked the parent in the seen map
if _, ok := seen[parent]; !ok {
t.Error("parent node not tracked in seen map")
}
}
func TestYAMLMultiDocumentRejection(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "multi.yaml")
// Multi-document YAML (documents separated by ---)
content := `name: first
identity: first document
---
name: second
identity: second document
`
if err := os.WriteFile(path, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Error("expected error for multi-document YAML, got nil")
}
if !strings.Contains(err.Error(), "multi-document") {
t.Errorf("error = %q, want containing 'multi-document'", err.Error())
}
}
func TestYAMLNodeCountLimit(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "wide.yaml")
// Build a YAML structure that's shallow but wide - many keys at the same level
// to test the node count limit (should exceed MaxYAMLNodes = 1000)
var sb strings.Builder
sb.WriteString("name: test\nidentity: test\n")
for i := 0; i < 600; i++ {
sb.WriteString(fmt.Sprintf("key%d: value%d\n", i, i))
}
if err := os.WriteFile(path, []byte(sb.String()), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Error("expected error for wide YAML exceeding node count, got nil")
}
if !strings.Contains(err.Error(), "node count exceeds") {
t.Errorf("error = %q, want containing 'node count exceeds'", err.Error())
}
}
func TestCheckYAMLDepthCycleDetectionDirect(t *testing.T) {
// Direct test of cycle detection in checkYAMLDepth by creating
// a node structure with an artificial cycle.
// This tests the seen map logic independent of go-yaml's parsing.
node := &yaml.Node{
Kind: yaml.MappingNode,
Content: []*yaml.Node{
{Kind: yaml.ScalarNode, Value: "key"},
{Kind: yaml.ScalarNode, Value: "value"},
},
}
// Create a cycle by making a child reference the parent
cycleChild := &yaml.Node{
Kind: yaml.AliasNode,
Alias: node, // Points back to the parent
}
node.Content = append(node.Content,
&yaml.Node{Kind: yaml.ScalarNode, Value: "cyclic"},
cycleChild,
)
nodeCount := 0
seen := make(map[*yaml.Node]struct{})
err := checkYAMLDepth(node, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount)
// Should complete without infinite recursion due to cycle detection
if err != nil {
t.Errorf("unexpected error: %v", err)
}
// The seen map should contain multiple entries
if len(seen) < 2 {
t.Errorf("seen map has %d entries, expected at least 2", len(seen))
}
}
func TestListBuiltinPersonasSortedOrder(t *testing.T) {
names := ListBuiltinPersonas()
if len(names) < 2 {
t.Skip("need at least 2 personas to test ordering")
}
// Verify the list is sorted
for i := 1; i < len(names); i++ {
if names[i-1] > names[i] {
t.Errorf("ListBuiltinPersonas not sorted: %q > %q", names[i-1], names[i])
}
}
}
func TestYAMLUnknownFieldsRejected(t *testing.T) {
tests := []struct {
name string
content string
wantErr string
}{
{
name: "unknown top-level field",
content: `name: test
identity: test identity
unknown_field: should fail
`,
wantErr: "unknown_field",
},
{
name: "typo in field name",
content: `name: test
identiy: typo should fail
`,
wantErr: "identiy",
},
{
name: "unknown field in severity",
content: `name: test
identity: test
severity:
major: Major
minro: typo
`,
wantErr: "minro",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "unknown.yaml")
if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Errorf("expected error for unknown field %q, got nil", tt.wantErr)
return
}
if !strings.Contains(err.Error(), tt.wantErr) {
t.Errorf("error = %q, want containing %q", err.Error(), tt.wantErr)
}
})
}
}
func TestJSONUnknownFieldsRejected(t *testing.T) {
tests := []struct {
name string
content string
wantErr string
}{
{
name: "unknown top-level field",
content: `{
"name": "test",
"identity": "test identity",
"unknown_field": "should fail"
}`,
wantErr: "unknown_field",
},
{
name: "typo in field name",
content: `{
"name": "test",
"identiy": "typo should fail"
}`,
wantErr: "identiy",
},
{
name: "unknown field in severity",
content: `{
"name": "test",
"identity": "test",
"severity": {
"major": "ok",
"miner": "typo"
}
}`,
wantErr: "miner",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.json")
if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Fatal("expected error for unknown field, got nil")
}
if !strings.Contains(err.Error(), tt.wantErr) {
t.Errorf("error = %q, want to contain %q", err.Error(), tt.wantErr)
}
})
}
}
func TestLoadPersonaSymlink(t *testing.T) {
// Create a regular persona file
dir := t.TempDir()
realFile := filepath.Join(dir, "real.yaml")
content := `name: test
identity: test identity
`
if err := os.WriteFile(realFile, []byte(content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
// Create a symlink to it
symlink := filepath.Join(dir, "link.yaml")
if err := os.Symlink(realFile, symlink); err != nil {
t.Fatalf("failed to create symlink: %v", err)
}
// LoadPersona should work via symlink
p, err := LoadPersona(symlink)
if err != nil {
t.Fatalf("LoadPersona via symlink failed: %v", err)
}
if p.Name != "test" {
t.Errorf("Name = %q, want %q", p.Name, "test")
}
}
-26
View File
@@ -1,26 +0,0 @@
{
"name": "architect",
"display_name": "Software Architect",
"identity": "You are a software architect reviewing code for design quality.\n\nYour expertise:\n- Design patterns and anti-patterns\n- Code organization and module boundaries\n- API design and contracts\n- Testability and dependency injection\n- Consistency with existing architecture\n- Technical debt identification",
"focus": [
"Design pattern violations or misuse",
"Module boundary violations (inappropriate coupling)",
"API design issues (unclear contracts, leaky abstractions)",
"Testability problems (hidden dependencies, god objects)",
"Inconsistency with existing codebase patterns",
"Unnecessary complexity or over-engineering",
"Missing abstractions or premature abstraction"
],
"ignore": [
"Security vulnerabilities (security persona handles these)",
"Performance micro-optimizations",
"Code style and formatting",
"Documentation typos",
"Test implementation details"
],
"severity": {
"major": "Architectural violations that will cause maintenance problems or make the codebase harder to evolve",
"minor": "Design issues that reduce clarity or testability but don't block progress",
"nit": "Minor pattern deviations or style preferences"
}
}
+37
View File
@@ -0,0 +1,37 @@
# Software Architect Persona
# Focuses on design quality, patterns, and code organization
name: architect
display_name: Software Architect
identity: |
You are a software architect reviewing code for design quality.
Your expertise:
- Design patterns and anti-patterns
- Code organization and module boundaries
- API design and contracts
- Testability and dependency injection
- Consistency with existing architecture
- Technical debt identification
focus:
- Design pattern violations or misuse
- Module boundary violations (inappropriate coupling)
- API design issues (unclear contracts, leaky abstractions)
- Testability problems (hidden dependencies, god objects)
- Inconsistency with existing codebase patterns
- Unnecessary complexity or over-engineering
- Missing abstractions or premature abstraction
ignore:
- Security vulnerabilities (security persona handles these)
- Performance micro-optimizations
- Code style and formatting
- Documentation typos
- Test implementation details
severity:
major: "Architectural violations that will cause maintenance problems or make the codebase harder to evolve"
minor: "Design issues that reduce clarity or testability but don't block progress"
nit: "Minor pattern deviations or style preferences"
-26
View File
@@ -1,26 +0,0 @@
{
"name": "docs",
"display_name": "Documentation Reviewer",
"identity": "You are a documentation specialist reviewing code for clarity and documentation quality.\n\nYour expertise:\n- API documentation and examples\n- Code comments and their accuracy\n- Error message clarity\n- README and guide quality\n- Naming clarity and self-documenting code",
"focus": [
"Missing or outdated documentation",
"Unclear or misleading comments",
"Poor error messages (cryptic, unhelpful, missing context)",
"Confusing naming (functions, variables, types)",
"Missing examples for complex APIs",
"Inconsistent terminology",
"Documentation that contradicts the code"
],
"ignore": [
"Security vulnerabilities",
"Performance issues",
"Design patterns",
"Test coverage",
"Code style (unless it affects readability)"
],
"severity": {
"major": "Documentation that actively misleads or missing docs for critical functionality",
"minor": "Unclear documentation or poor error messages that will confuse users",
"nit": "Minor clarity improvements or typo fixes"
}
}
+36
View File
@@ -0,0 +1,36 @@
# Documentation Reviewer Persona
# Focuses on clarity, documentation quality, and self-documenting code
name: docs
display_name: Documentation Reviewer
identity: |
You are a documentation specialist reviewing code for clarity and documentation quality.
Your expertise:
- API documentation and examples
- Code comments and their accuracy
- Error message clarity
- README and guide quality
- Naming clarity and self-documenting code
focus:
- Missing or outdated documentation
- Unclear or misleading comments
- Poor error messages (cryptic, unhelpful, missing context)
- Confusing naming (functions, variables, types)
- Missing examples for complex APIs
- Inconsistent terminology
- Documentation that contradicts the code
ignore:
- Security vulnerabilities
- Performance issues
- Design patterns
- Test coverage
- Code style (unless it affects readability)
severity:
major: "Documentation that actively misleads or missing docs for critical functionality"
minor: "Unclear documentation or poor error messages that will confuse users"
nit: "Minor clarity improvements or typo fixes"
-26
View File
@@ -1,26 +0,0 @@
{
"name": "security",
"display_name": "Security Specialist",
"identity": "You are a security specialist reviewing code for vulnerabilities.\n\nYour expertise:\n- OWASP Top 10 vulnerabilities\n- Injection attacks (SQL, command, path traversal, template)\n- Authentication and authorization patterns\n- Secrets management and exposure risks\n- Race conditions with security implications\n- Event sourcing attack vectors (replay attacks, event injection)",
"focus": [
"Injection attacks (SQL, command, path traversal, template injection)",
"Authentication and authorization gaps or bypasses",
"Secrets exposure (hardcoded credentials, tokens in logs, config leaks)",
"Input validation failures (unsanitized input, unsafe deserialization)",
"Race conditions that could be exploited",
"Cryptographic weaknesses (weak algorithms, improper key handling)",
"Information disclosure through error messages or logs"
],
"ignore": [
"Code style and naming conventions",
"Performance optimizations (unless security-related)",
"Documentation quality",
"General code quality or readability",
"Test coverage"
],
"severity": {
"major": "Exploitable vulnerabilities: auth bypass, injection, data exfiltration, privilege escalation, RCE",
"minor": "Defense-in-depth issues: missing rate limiting, verbose errors, weak input validation",
"nit": "Theoretical risks with low exploitability or impact"
}
}
+37
View File
@@ -0,0 +1,37 @@
# Security Specialist Persona
# Focuses on vulnerabilities, auth issues, and security best practices
name: security
display_name: Security Specialist
identity: |
You are a security specialist reviewing code for vulnerabilities.
Your expertise:
- OWASP Top 10 vulnerabilities
- Injection attacks (SQL, command, path traversal, template)
- Authentication and authorization patterns
- Secrets management and exposure risks
- Race conditions with security implications
- Event sourcing attack vectors (replay attacks, event injection)
focus:
- Injection attacks (SQL, command, path traversal, template injection)
- Authentication and authorization gaps or bypasses
- Secrets exposure (hardcoded credentials, tokens in logs, config leaks)
- Input validation failures (unsanitized input, unsafe deserialization)
- Race conditions that could be exploited
- Cryptographic weaknesses (weak algorithms, improper key handling)
- Information disclosure through error messages or logs
ignore:
- Code style and naming conventions
- Performance optimizations (unless security-related)
- Documentation quality
- General code quality or readability
- Test coverage
severity:
major: "Exploitable vulnerabilities: auth bypass, injection, data exfiltration, privilege escalation, RCE"
minor: "Defense-in-depth issues: missing rate limiting, verbose errors, weak input validation"
nit: "Theoretical risks with low exploitability or impact"
+171
View File
@@ -0,0 +1,171 @@
package review
import (
"context"
"fmt"
"log/slog"
"sort"
"strings"
)
// PersonaFetcher abstracts fetching files from a remote repository.
// This allows persona loading to work with any Git host API.
type PersonaFetcher interface {
// ListContents returns file/directory entries at a path.
// Returns an error if the path doesn't exist or isn't accessible.
ListContents(ctx context.Context, owner, repo, path string) ([]ContentEntry, error)
// GetFileContent returns the raw content of a file from the default branch.
GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error)
}
// ContentEntry represents a file or directory entry.
type ContentEntry struct {
Name string // filename or directory name
Path string // full path from repo root
Type string // "file" or "dir"
}
// DefaultPersonasPath is the conventional location for repo-specific personas.
const DefaultPersonasPath = ".review-bot/personas"
// LoadRemotePersonas fetches personas from a remote repository's .review-bot/personas/ directory.
// Returns a map of persona name to Persona. If the directory doesn't exist or is empty,
// returns an empty map with no error (graceful fallback to built-in personas).
//
// Files larger than MaxPersonaFileSize are logged and skipped.
// Invalid YAML files are logged and skipped (partial success model).
// Only .yaml and .yml files are processed; other files are ignored.
func LoadRemotePersonas(ctx context.Context, fetcher PersonaFetcher, owner, repo string) (map[string]*Persona, error) {
return LoadRemotePersonasFromPath(ctx, fetcher, owner, repo, DefaultPersonasPath)
}
// LoadRemotePersonasFromPath loads personas from a custom path in a remote repository.
// It behaves the same as LoadRemotePersonas but allows specifying a path other than
// the default .review-bot/personas directory.
func LoadRemotePersonasFromPath(ctx context.Context, fetcher PersonaFetcher, owner, repo, path string) (map[string]*Persona, error) {
entries, err := fetcher.ListContents(ctx, owner, repo, path)
if err != nil {
// 404 is expected when repo doesn't have personas — return empty, not error
if isNotFoundError(err) {
slog.Debug("no remote personas directory found", "repo", fmt.Sprintf("%s/%s", owner, repo), "path", path)
return map[string]*Persona{}, nil
}
return nil, fmt.Errorf("list remote personas: %w", err)
}
// Cap the number of files to process to prevent resource exhaustion
// from repos with thousands of small files.
const maxPersonaFiles = 50
result := make(map[string]*Persona)
processed := 0
for _, entry := range entries {
if processed >= maxPersonaFiles {
slog.Warn("persona file limit reached", "limit", maxPersonaFiles, "repo", fmt.Sprintf("%s/%s", owner, repo))
break
}
if ctx.Err() != nil {
return nil, ctx.Err()
}
// Skip directories and non-YAML files
if entry.Type != "file" {
continue
}
if !isYAMLFile(entry.Name) {
continue
}
content, err := fetcher.GetFileContent(ctx, owner, repo, entry.Path)
if err != nil {
slog.Warn("could not fetch remote persona file", "file", entry.Path, "error", err)
continue
}
// Check size before parsing (defense in depth)
if len(content) > MaxPersonaFileSize {
slog.Warn("remote persona file exceeds size limit", "file", entry.Path, "size", len(content), "limit", MaxPersonaFileSize)
continue
}
// YAML parsing uses parsePersona which has defenses against YAML DoS attacks:
// - MaxPersonaFileSize (above) caps raw input size before any parsing
// - maxPersonaFiles (above) limits the number of files processed per repo
// - unmarshalYAMLWithDepthLimit enforces MaxYAMLDepth to prevent stack exhaustion
// - checkYAMLDepth tracks node counts (MaxYAMLNodes) against "billion laughs" expansion
// - Alias cycles are detected and capped by seen-node tracking
// See persona.go for the implementation details.
persona, err := parsePersona([]byte(content), entry.Path)
if err != nil {
slog.Warn("could not parse remote persona file", "file", entry.Path, "error", err)
continue
}
result[persona.Name] = persona
processed++
slog.Debug("loaded remote persona", "name", persona.Name, "file", entry.Path)
}
return result, nil
}
// MergePersonas combines remote and built-in personas.
// Remote personas take precedence on name collision.
// Returns the merged map and a list of persona names in sorted order.
func MergePersonas(remote, builtin map[string]*Persona) (map[string]*Persona, []string) {
merged := make(map[string]*Persona)
// Add built-in first
for name, p := range builtin {
merged[name] = p
}
// Remote overrides built-in on collision
for name, p := range remote {
if _, exists := merged[name]; exists {
slog.Debug("remote persona overrides built-in", "name", name)
}
merged[name] = p
}
// Collect sorted names
names := make([]string, 0, len(merged))
for name := range merged {
names = append(names, name)
}
sort.Strings(names)
return merged, names
}
// LoadAllBuiltinPersonas loads all built-in personas into a map.
func LoadAllBuiltinPersonas() map[string]*Persona {
result := make(map[string]*Persona)
for _, name := range ListBuiltinPersonas() {
p, err := LoadBuiltinPersona(name)
if err != nil {
slog.Warn("could not load built-in persona", "name", name, "error", err)
continue
}
result[name] = p
}
return result
}
// isYAMLFile returns true if the filename has a YAML extension.
func isYAMLFile(name string) bool {
lower := strings.ToLower(name)
return strings.HasSuffix(lower, ".yaml") || strings.HasSuffix(lower, ".yml")
}
// isNotFoundError checks if an error indicates a 404 response.
// This is a simple string check to avoid importing the gitea package
// (which would create a circular dependency).
func isNotFoundError(err error) bool {
if err == nil {
return false
}
errStr := err.Error()
return strings.Contains(errStr, "HTTP 404")
}
+394
View File
@@ -0,0 +1,394 @@
package review
import (
"context"
"errors"
"testing"
)
// mockFetcher implements PersonaFetcher for testing.
type mockFetcher struct {
contents map[string][]ContentEntry // path -> entries
files map[string]string // path -> content
listErr error // error to return from ListContents
getFileErr map[string]error // path -> error for GetFileContent
listNotFound bool // return 404-style error
}
func newMockFetcher() *mockFetcher {
return &mockFetcher{
contents: make(map[string][]ContentEntry),
files: make(map[string]string),
getFileErr: make(map[string]error),
}
}
func (m *mockFetcher) ListContents(ctx context.Context, owner, repo, path string) ([]ContentEntry, error) {
if m.listNotFound {
return nil, errors.New("HTTP 404: not found")
}
if m.listErr != nil {
return nil, m.listErr
}
entries, ok := m.contents[path]
if !ok {
return nil, errors.New("HTTP 404: not found")
}
return entries, nil
}
func (m *mockFetcher) GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error) {
if err, ok := m.getFileErr[filepath]; ok {
return "", err
}
content, ok := m.files[filepath]
if !ok {
return "", errors.New("HTTP 404: file not found")
}
return content, nil
}
func TestLoadRemotePersonas_NoDirectory(t *testing.T) {
fetcher := newMockFetcher()
fetcher.listNotFound = true
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("expected no error for missing directory, got: %v", err)
}
if len(result) != 0 {
t.Errorf("expected empty map, got %d personas", len(result))
}
}
func TestLoadRemotePersonas_EmptyDirectory(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{}
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 0 {
t.Errorf("expected empty map, got %d personas", len(result))
}
}
func TestLoadRemotePersonas_SinglePersona(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "trading.yaml", Path: ".review-bot/personas/trading.yaml", Type: "file"},
}
fetcher.files[".review-bot/personas/trading.yaml"] = `
name: trading
display_name: Trading Expert
identity: You are a trading systems expert.
focus:
- order execution
- market data
`
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 1 {
t.Fatalf("expected 1 persona, got %d", len(result))
}
if result["trading"] == nil {
t.Fatal("expected 'trading' persona")
}
if result["trading"].DisplayName != "Trading Expert" {
t.Errorf("expected display name 'Trading Expert', got %q", result["trading"].DisplayName)
}
}
func TestLoadRemotePersonas_MultiplePersonas(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "one.yaml", Path: ".review-bot/personas/one.yaml", Type: "file"},
{Name: "two.yml", Path: ".review-bot/personas/two.yml", Type: "file"},
}
fetcher.files[".review-bot/personas/one.yaml"] = `
name: one
identity: First persona.
`
fetcher.files[".review-bot/personas/two.yml"] = `
name: two
identity: Second persona.
`
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 2 {
t.Fatalf("expected 2 personas, got %d", len(result))
}
if result["one"] == nil || result["two"] == nil {
t.Error("expected both personas to be loaded")
}
}
func TestLoadRemotePersonas_SkipsNonYAML(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "valid.yaml", Path: ".review-bot/personas/valid.yaml", Type: "file"},
{Name: "readme.md", Path: ".review-bot/personas/readme.md", Type: "file"},
{Name: "config.json", Path: ".review-bot/personas/config.json", Type: "file"},
}
fetcher.files[".review-bot/personas/valid.yaml"] = `
name: valid
identity: Valid persona.
`
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 1 {
t.Fatalf("expected 1 persona (skipping non-YAML), got %d", len(result))
}
}
func TestLoadRemotePersonas_SkipsDirectories(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "valid.yaml", Path: ".review-bot/personas/valid.yaml", Type: "file"},
{Name: "subdir", Path: ".review-bot/personas/subdir", Type: "dir"},
}
fetcher.files[".review-bot/personas/valid.yaml"] = `
name: valid
identity: Valid persona.
`
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 1 {
t.Fatalf("expected 1 persona (skipping dir), got %d", len(result))
}
}
func TestLoadRemotePersonas_SkipsInvalidYAML(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "valid.yaml", Path: ".review-bot/personas/valid.yaml", Type: "file"},
{Name: "invalid.yaml", Path: ".review-bot/personas/invalid.yaml", Type: "file"},
}
fetcher.files[".review-bot/personas/valid.yaml"] = `
name: valid
identity: Valid persona.
`
fetcher.files[".review-bot/personas/invalid.yaml"] = `
this is not valid yaml: [unclosed bracket
`
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 1 {
t.Fatalf("expected 1 persona (skipping invalid), got %d", len(result))
}
if result["valid"] == nil {
t.Error("expected valid persona to be loaded")
}
}
func TestLoadRemotePersonas_SkipsOversizedFiles(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "huge.yaml", Path: ".review-bot/personas/huge.yaml", Type: "file"},
}
// Create content larger than MaxPersonaFileSize (64KB)
fetcher.files[".review-bot/personas/huge.yaml"] = `
name: huge
identity: ` + string(make([]byte, MaxPersonaFileSize+1000))
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 0 {
t.Errorf("expected 0 personas (oversized file skipped), got %d", len(result))
}
}
func TestLoadRemotePersonas_SkipsFetchErrors(t *testing.T) {
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "valid.yaml", Path: ".review-bot/personas/valid.yaml", Type: "file"},
{Name: "error.yaml", Path: ".review-bot/personas/error.yaml", Type: "file"},
}
fetcher.files[".review-bot/personas/valid.yaml"] = `
name: valid
identity: Valid persona.
`
fetcher.getFileErr[".review-bot/personas/error.yaml"] = errors.New("network error")
result, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result) != 1 {
t.Fatalf("expected 1 persona (skipping error), got %d", len(result))
}
}
func TestLoadRemotePersonas_ListContentsError(t *testing.T) {
fetcher := newMockFetcher()
fetcher.listErr = errors.New("server error")
_, err := LoadRemotePersonas(context.Background(), fetcher, "owner", "repo")
if err == nil {
t.Fatal("expected error for list contents failure")
}
}
func TestLoadRemotePersonas_ContextCancellation(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
cancel() // Cancel immediately
fetcher := newMockFetcher()
fetcher.contents[DefaultPersonasPath] = []ContentEntry{
{Name: "one.yaml", Path: ".review-bot/personas/one.yaml", Type: "file"},
}
fetcher.files[".review-bot/personas/one.yaml"] = `
name: one
identity: One.
`
_, err := LoadRemotePersonas(ctx, fetcher, "owner", "repo")
if err == nil {
t.Fatal("expected context cancellation error")
}
}
func TestMergePersonas_NoOverlap(t *testing.T) {
remote := map[string]*Persona{
"trading": {Name: "trading", Identity: "Trading expert."},
}
builtin := map[string]*Persona{
"security": {Name: "security", Identity: "Security expert."},
}
merged, names := MergePersonas(remote, builtin)
if len(merged) != 2 {
t.Fatalf("expected 2 personas, got %d", len(merged))
}
if len(names) != 2 {
t.Fatalf("expected 2 names, got %d", len(names))
}
// Names should be sorted
if names[0] != "security" || names[1] != "trading" {
t.Errorf("expected sorted names [security, trading], got %v", names)
}
}
func TestMergePersonas_RemoteOverridesBuiltin(t *testing.T) {
remote := map[string]*Persona{
"security": {Name: "security", Identity: "Custom security expert."},
}
builtin := map[string]*Persona{
"security": {Name: "security", Identity: "Default security expert."},
}
merged, _ := MergePersonas(remote, builtin)
if merged["security"].Identity != "Custom security expert." {
t.Errorf("expected remote to override builtin, got identity: %q", merged["security"].Identity)
}
}
func TestMergePersonas_EmptyRemote(t *testing.T) {
remote := map[string]*Persona{}
builtin := map[string]*Persona{
"security": {Name: "security", Identity: "Security."},
}
merged, names := MergePersonas(remote, builtin)
if len(merged) != 1 {
t.Fatalf("expected 1 persona, got %d", len(merged))
}
if names[0] != "security" {
t.Errorf("expected 'security', got %q", names[0])
}
}
func TestMergePersonas_EmptyBuiltin(t *testing.T) {
remote := map[string]*Persona{
"trading": {Name: "trading", Identity: "Trading."},
}
builtin := map[string]*Persona{}
merged, names := MergePersonas(remote, builtin)
if len(merged) != 1 {
t.Fatalf("expected 1 persona, got %d", len(merged))
}
if names[0] != "trading" {
t.Errorf("expected 'trading', got %q", names[0])
}
}
func TestLoadAllBuiltinPersonas(t *testing.T) {
personas := LoadAllBuiltinPersonas()
// Should load at least the known built-in personas
expected := []string{"architect", "docs", "security"}
for _, name := range expected {
if personas[name] == nil {
t.Errorf("expected built-in persona %q to be loaded", name)
}
}
}
func TestIsYAMLFile(t *testing.T) {
tests := []struct {
name string
expected bool
}{
{"test.yaml", true},
{"test.yml", true},
{"test.YAML", true},
{"test.YML", true},
{"test.json", false},
{"test.md", false},
{"yaml", false},
{"", false},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
if got := isYAMLFile(tc.name); got != tc.expected {
t.Errorf("isYAMLFile(%q) = %v, want %v", tc.name, got, tc.expected)
}
})
}
}
func TestIsNotFoundError(t *testing.T) {
tests := []struct {
name string
err error
expected bool
}{
{"nil error", nil, false},
{"HTTP 404", errors.New("HTTP 404: not found"), true},
{"not found text", errors.New("path not found"), false},
{"server error", errors.New("server error"), false},
{"HTTP 500", errors.New("HTTP 500: internal error"), false},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
if got := isNotFoundError(tc.err); got != tc.expected {
t.Errorf("isNotFoundError(%v) = %v, want %v", tc.err, got, tc.expected)
}
})
}
}
+127
View File
@@ -0,0 +1,127 @@
#!/usr/bin/env bash
# check-deps.sh - Enforces the strict dependency allowlist from CONVENTIONS.md
# Exit 1 if any unapproved import is found.
#
# Requires: Bash 4+ (for associative arrays), Go toolchain
#
# The allowlist is parsed from CONVENTIONS.md to maintain a single source of truth.
# Enforces Scope column: "test only" packages cannot appear in non-test code.
set -euo pipefail
# Check bash version
if ((BASH_VERSINFO[0] < 4)); then
echo "❌ Bash 4+ required (found ${BASH_VERSION})"
echo " On macOS: brew install bash"
exit 1
fi
CONVENTIONS_FILE="${1:-CONVENTIONS.md}"
if [ ! -f "$CONVENTIONS_FILE" ]; then
echo "❌ CONVENTIONS.md not found"
exit 1
fi
# Parse approved packages from CONVENTIONS.md table using awk (POSIX-compatible)
# Format: | `package` | use case | scope |
declare -A ALLOWED_PROD=()
declare -A ALLOWED_TEST=()
while IFS= read -r line; do
# Use awk to extract package and scope from table row
pkg=$(echo "$line" | awk -F'|' '{gsub(/^[[:space:]]*`|`[[:space:]]*$/, "", $2); print $2}')
scope=$(echo "$line" | awk -F'|' '{gsub(/^[[:space:]]+|[[:space:]]+$/, "", $4); print tolower($4)}')
if [ -n "$pkg" ] && [ "$pkg" != "Package" ] && [[ "$pkg" =~ ^[a-zA-Z] ]]; then
if [[ "$scope" == *"test"* ]]; then
ALLOWED_TEST["$pkg"]=1
else
ALLOWED_PROD["$pkg"]=1
fi
fi
done < <(grep '| `' "$CONVENTIONS_FILE" 2>/dev/null || true)
ALL_ALLOWED=("${!ALLOWED_PROD[@]}" "${!ALLOWED_TEST[@]}")
if [ ${#ALL_ALLOWED[@]} -eq 0 ]; then
echo "⚠️ No approved packages found in $CONVENTIONS_FILE"
echo " (This is fine if you want stdlib-only)"
fi
# Helper: check if import matches any package in an associative array (literal prefix, no glob)
matches_allowlist() {
local import="$1"
shift
local -n allowlist=$1
for allowed in "${!allowlist[@]}"; do
# Exact match
if [ "$import" = "$allowed" ]; then
return 0
fi
# Literal prefix match for subpackages: must match "pkg/" exactly
if [ "${import#"$allowed/"}" != "$import" ]; then
return 0
fi
done
return 1
}
# Get direct module dependencies from go.mod
DIRECT_IMPORTS=$(go list -m -f '{{if and (not .Indirect) (not .Main)}}{{.Path}}{{end}}' all 2>&1) || {
echo "❌ Failed to list dependencies: $DIRECT_IMPORTS"
exit 1
}
DIRECT_IMPORTS=$(echo "$DIRECT_IMPORTS" | grep -v '^$' || true)
if [ -z "$DIRECT_IMPORTS" ]; then
echo "✅ No external dependencies"
exit 0
fi
# Check ALL direct dependencies are in some allowlist
VIOLATIONS=""
while IFS= read -r import; do
[ -z "$import" ] && continue
if ! matches_allowlist "$import" ALLOWED_PROD && ! matches_allowlist "$import" ALLOWED_TEST; then
VIOLATIONS="${VIOLATIONS} - ${import} (not in allowlist)"$'\n'
fi
done <<< "$DIRECT_IMPORTS"
if [ -n "$VIOLATIONS" ]; then
echo "❌ UNAPPROVED DEPENDENCIES DETECTED"
echo ""
echo "The following imports are not in the allowlist:"
printf "%s" "$VIOLATIONS"
echo ""
echo "To add a dependency, update CONVENTIONS.md (requires Aaron's approval)"
exit 1
fi
# Enforce Scope: test-only packages must not appear in non-test code
# Get imports used by non-test code only (go list -deps without -test excludes test deps)
PROD_IMPORTS=$(go list -deps -f '{{if not .Standard}}{{.ImportPath}}{{end}}' ./... 2>/dev/null || true)
TEST_ONLY_IN_PROD=""
for test_pkg in "${!ALLOWED_TEST[@]}"; do
# Use word-boundary matching: exact match or followed by /
if echo "$PROD_IMPORTS" | grep -qE "^${test_pkg}(/|\$|$)"; then
TEST_ONLY_IN_PROD="${TEST_ONLY_IN_PROD} - ${test_pkg} (marked 'test only' but used in production code)"$'\n'
fi
done
if [ -n "$TEST_ONLY_IN_PROD" ]; then
echo "❌ TEST-ONLY DEPENDENCIES IN PRODUCTION CODE"
echo ""
printf "%s" "$TEST_ONLY_IN_PROD"
echo ""
echo "These packages are marked 'test only' in CONVENTIONS.md"
echo "and must only be imported from *_test.go files."
exit 1
fi
echo "✅ All dependencies are approved"
echo " Direct module deps: $(echo "$DIRECT_IMPORTS" | wc -l | tr -d ' ')"
echo " Production allowlist: ${#ALLOWED_PROD[@]}, Test-only allowlist: ${#ALLOWED_TEST[@]}"