The external dependency (goccy/go-yaml) violates the repository's
stdlib-only convention (CONVENTIONS.md). While YAML provides better
readability for multi-line strings, the convenience doesn't justify
breaking a hard rule.
Reverts:
- External dependency on github.com/goccy/go-yaml
- YAML parsing logic in persona.go
- YAML persona files (restored as JSON)
- YAML-specific tests
- Design document (feature rejected)
The persona files work fine with JSON. Multi-line strings use \n escapes
which is less pretty but acceptable for internal files.
This addresses all MAJOR findings from review bots regarding the external
dependency violation.
- Add github.com/goccy/go-yaml dependency (v1.19.2)
- Update parsePersona to detect format by file extension
- Support both .yaml and .yml extensions (case-insensitive)
- Convert built-in personas to YAML format
- Add comprehensive tests for YAML parsing
- Update README with YAML examples and documentation
YAML provides cleaner multi-line strings via literal block scalars
and supports comments, making persona definitions more readable.
JSON remains supported for backwards compatibility.
Closes#57
MAJOR fixes:
- Remove external YAML dependency (github.com/goccy/go-yaml)
Per project convention: Go standard library only, zero dependencies.
Convert all persona files from YAML to JSON format.
- Fix TestValidateWorkspacePath error expectation
Go 1.21+ filepath.Join normalizes absolute paths differently.
MINOR fixes:
- Remove custom contains helper in persona_test.go (use strings.Contains)
- Add Unicode-safe CapitalizeFirst function for header titles
- ListBuiltinPersonas returns empty slice instead of nil on error
- Fix test comment about filepath.Join behavior
Documentation:
- Update README to reflect JSON-only persona format
- Update design doc with note about JSON decision
- Fix action.yml description for persona-file input
Add persona system for specialized review roles. Each persona defines:
- A specific review focus (security, architecture, documentation)
- Custom system prompt additions
- Personality/tone adjustments
Built-in personas: security, architect, docs
Custom personas: load from JSON via persona-file flag
Includes workspace validation to prevent path traversal attacks.
Closes#51
Addresses intermittent 'unexpected end of JSON input' failures where the
LLM response body is truncated in transit between the proxy and client.
Root cause: network-level truncation where io.ReadAll returns partial data
(observed in 3/50 CI runs through HAI proxy). The response body reading
was already using io.ReadAll correctly, but transient network issues
between the proxy and client can still cause partial reads.
Changes:
- Add Content-Length validation in doRequest: detect when fewer bytes
arrive than the server declared, triggering a retry
- Add retry logic in Complete: retries once on retryable errors (body
read failures, content-length mismatches) with a 500ms backoff
- Add parse-level retry in main: if ParseResponse fails, re-requests
from the LLM once before giving up (defensive, since retries always
succeed per issue evidence)
- Improve ParseResponse error diagnostics: log raw vs cleaned lengths
and a preview of the cleaned content to aid future debugging
Does NOT retry on API errors (4xx/5xx) or structural issues — only
transient body read problems.
Closes#47
When reviewer-name is set, prepend "# Security Review" / "# Sonnet Review"
etc. as a top-level header. Makes it immediately obvious which role each
review represents in the Gitea UI, especially when multiple reviews come
from the same bot account.
Sentinel-based cleanup:
- Reviews embed <!-- review-bot:NAME --> in body (hidden HTML comment)
- Cleanup matches by sentinel, not token identity
- Each reviewer-name is a logical identity (sonnet, gpt, security)
- Same token can run multiple review types without conflict
- No extra API scopes needed
System prompt file (--system-prompt-file / SYSTEM_PROMPT_FILE):
- Loads a local file with additional review instructions
- Appended to system base as "Additional Review Instructions"
- Enables specialized reviews (security, performance, etc.)
- Partially addresses #5
Security review:
- SECURITY_REVIEW.md prompt focused on vulnerabilities
- 3rd CI matrix entry using same token, different prompt
- Focus: injection, auth, secrets, input validation, crypto, races
CI changes:
- REVIEWER_NAME passed from matrix.name
- SYSTEM_PROMPT_FILE passed from matrix (empty for standard reviews)
- 3 reviewers: sonnet (general), gpt (general), security (focused)
- Account for truncation marker tokens when computing diff budget
(prevents EstTokens exceeding model limit in edge cases)
- Rune-safe truncation for both UserMeta and Diff (no split multi-byte)
- Fix misleading comment (1000 chars → ~1000 tokens/4000 chars)
- Extract marker strings as constants
- Add unit tests for BuildSystemBase and BuildUserMeta
Adds a budget package that estimates token usage and progressively
trims context to fit within model-specific limits.
Trim order (least important first):
1. Language patterns
2. Repository conventions
3. Full file context
4. Diff (truncated as last resort)
When content is trimmed, a note is appended to the user prompt so
the LLM knows context was reduced.
- New budget package with Fit(), EstimateTokens(), LimitForModel()
- Model limit table (GPT-4.1: 128K, GPT-5: 200K, Claude: 200K)
- Refactored review/prompt.go: BuildSystemBase() and BuildUserMeta()
extract non-trimmable content; old functions delegate to new ones
- main.go uses budget.Fit() instead of direct prompt assembly
- 7 unit tests covering all trim paths
Closes#19
Major improvements to review quality:
1. Full file context: fetch complete content of all modified files from
the PR branch and include as reference. This eliminates false-positive
"missing import" findings since the model sees the entire file.
2. Patterns repo: new --patterns-repo / PATTERNS_REPO flag fetches
language idiom files from a separate Gitea repo (e.g. rodin/elixir-patterns)
and includes them as review criteria.
3. Multi-file patterns: --patterns-files / PATTERNS_FILES accepts
comma-separated file paths to fetch from the patterns repo.
New API methods:
- GetPullRequestFiles: list changed files in a PR
- GetFileContentRef: fetch file content from a specific branch/ref
Prompt changes:
- BuildSystemPrompt now accepts (conventions, patterns)
- BuildUserPrompt now accepts fileContext parameter
- File context displayed before diff for model reference
- Patterns presented as "review criteria" in system prompt
Composite action updated with patterns-repo and patterns-files inputs.
The LLM was treating the diff as complete file context and flagging
"missing imports" for symbols that exist in unchanged portions of
the file. Added explicit instructions to the system prompt:
- Clarify that the diff is partial; unchanged code already exists
- Explicitly instruct: do not flag missing imports/types unless the
diff introduces a new usage without a corresponding addition
- Add rule: never flag undefined symbols that could exist in the
unchanged portions
- CLI binary with flag/env var configuration
- Gitea API client (PR metadata, diff, CI status, post review)
- OpenAI-compatible LLM client
- Structured review prompt with conventions support
- JSON response parser with validation
- Markdown review formatter for Gitea
- CI failure auto-detection (REQUEST_CHANGES)
- Dry-run mode for testing