Add persona system for specialized review roles. Each persona defines:
- A specific review focus (security, architecture, documentation)
- Custom system prompt additions
- Personality/tone adjustments
Built-in personas: security, architect, docs
Custom personas: load from JSON via persona-file flag
Includes workspace validation to prevent path traversal attacks.
Closes#51
Addresses intermittent 'unexpected end of JSON input' failures where the
LLM response body is truncated in transit between the proxy and client.
Root cause: network-level truncation where io.ReadAll returns partial data
(observed in 3/50 CI runs through HAI proxy). The response body reading
was already using io.ReadAll correctly, but transient network issues
between the proxy and client can still cause partial reads.
Changes:
- Add Content-Length validation in doRequest: detect when fewer bytes
arrive than the server declared, triggering a retry
- Add retry logic in Complete: retries once on retryable errors (body
read failures, content-length mismatches) with a 500ms backoff
- Add parse-level retry in main: if ParseResponse fails, re-requests
from the LLM once before giving up (defensive, since retries always
succeed per issue evidence)
- Improve ParseResponse error diagnostics: log raw vs cleaned lengths
and a preview of the cleaned content to aid future debugging
Does NOT retry on API errors (4xx/5xx) or structural issues — only
transient body read problems.
Closes#47
When reviewer-name is set, prepend "# Security Review" / "# Sonnet Review"
etc. as a top-level header. Makes it immediately obvious which role each
review represents in the Gitea UI, especially when multiple reviews come
from the same bot account.
Sentinel-based cleanup:
- Reviews embed <!-- review-bot:NAME --> in body (hidden HTML comment)
- Cleanup matches by sentinel, not token identity
- Each reviewer-name is a logical identity (sonnet, gpt, security)
- Same token can run multiple review types without conflict
- No extra API scopes needed
System prompt file (--system-prompt-file / SYSTEM_PROMPT_FILE):
- Loads a local file with additional review instructions
- Appended to system base as "Additional Review Instructions"
- Enables specialized reviews (security, performance, etc.)
- Partially addresses #5
Security review:
- SECURITY_REVIEW.md prompt focused on vulnerabilities
- 3rd CI matrix entry using same token, different prompt
- Focus: injection, auth, secrets, input validation, crypto, races
CI changes:
- REVIEWER_NAME passed from matrix.name
- SYSTEM_PROMPT_FILE passed from matrix (empty for standard reviews)
- 3 reviewers: sonnet (general), gpt (general), security (focused)
- Account for truncation marker tokens when computing diff budget
(prevents EstTokens exceeding model limit in edge cases)
- Rune-safe truncation for both UserMeta and Diff (no split multi-byte)
- Fix misleading comment (1000 chars → ~1000 tokens/4000 chars)
- Extract marker strings as constants
- Add unit tests for BuildSystemBase and BuildUserMeta
Adds a budget package that estimates token usage and progressively
trims context to fit within model-specific limits.
Trim order (least important first):
1. Language patterns
2. Repository conventions
3. Full file context
4. Diff (truncated as last resort)
When content is trimmed, a note is appended to the user prompt so
the LLM knows context was reduced.
- New budget package with Fit(), EstimateTokens(), LimitForModel()
- Model limit table (GPT-4.1: 128K, GPT-5: 200K, Claude: 200K)
- Refactored review/prompt.go: BuildSystemBase() and BuildUserMeta()
extract non-trimmable content; old functions delegate to new ones
- main.go uses budget.Fit() instead of direct prompt assembly
- 7 unit tests covering all trim paths
Closes#19
Major improvements to review quality:
1. Full file context: fetch complete content of all modified files from
the PR branch and include as reference. This eliminates false-positive
"missing import" findings since the model sees the entire file.
2. Patterns repo: new --patterns-repo / PATTERNS_REPO flag fetches
language idiom files from a separate Gitea repo (e.g. rodin/elixir-patterns)
and includes them as review criteria.
3. Multi-file patterns: --patterns-files / PATTERNS_FILES accepts
comma-separated file paths to fetch from the patterns repo.
New API methods:
- GetPullRequestFiles: list changed files in a PR
- GetFileContentRef: fetch file content from a specific branch/ref
Prompt changes:
- BuildSystemPrompt now accepts (conventions, patterns)
- BuildUserPrompt now accepts fileContext parameter
- File context displayed before diff for model reference
- Patterns presented as "review criteria" in system prompt
Composite action updated with patterns-repo and patterns-files inputs.
The LLM was treating the diff as complete file context and flagging
"missing imports" for symbols that exist in unchanged portions of
the file. Added explicit instructions to the system prompt:
- Clarify that the diff is partial; unchanged code already exists
- Explicitly instruct: do not flag missing imports/types unless the
diff introduces a new usage without a corresponding addition
- Add rule: never flag undefined symbols that could exist in the
unchanged portions
- CLI binary with flag/env var configuration
- Gitea API client (PR metadata, diff, CI status, post review)
- OpenAI-compatible LLM client
- Structured review prompt with conventions support
- JSON response parser with validation
- Markdown review formatter for Gitea
- CI failure auto-detection (REQUEST_CHANGES)
- Dry-run mode for testing