# review-bot

AI-powered code review bot for Gitea pull requests. Fetches diff + context, sends to an LLM, and posts a structured review (APPROVE / REQUEST_CHANGES) back to the PR.

## Features

- **Multi-provider**: OpenAI-compatible, Anthropic Messages API, and SAP AI Core
- **Context-aware**: Fetches full file content, conventions, language patterns, CI status
- **Path-scoped docs**: `doc-map` config injects only the governing design docs for changed paths
- **Smart budget**: Automatically trims context to fit model token limits
- **Idempotent reviews**: Posts new review, then cleans up stale ones (one review per bot)
- **Custom prompts**: Load additional instructions from a file (e.g. security-focused review)
- **Minimal dependencies**: Go stdlib + `github.com/goccy/go-yaml` only

## Quick Start: Composite Action

The easiest way to use review-bot in your Gitea CI:

```yaml
# .gitea/workflows/review.yml
name: Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
          reviewer-name: code-review
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: gpt-4.1
```

That's it. Every PR gets an automated review.

## Examples

### Single reviewer with conventions

```yaml
jobs:
  review:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
          reviewer-name: reviewer
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: gpt-4.1
          conventions-file: CONVENTIONS.md
          timeout: '600'
```

### Two reviewers with different models (diversity of opinion)

```yaml
jobs:
  review:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        include:
          - name: gpt
            model: gpt-4.1
            token_secret: GPT_REVIEW_TOKEN
          - name: claude
            model: claude-sonnet-4-20250514
            token_secret: CLAUDE_REVIEW_TOKEN
            provider: anthropic
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets[matrix.token_secret] }}
          reviewer-name: ${{ matrix.name }}
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: ${{ matrix.model }}
          llm-provider: ${{ matrix.provider }}
          conventions-file: CONVENTIONS.md
```

Each reviewer posts independently and only cleans up its own stale reviews.

### Multiple review types from a single bot account

Use the same Gitea token but different `reviewer-name` values to run specialized reviews without needing multiple bot accounts:

```yaml
jobs:
  review:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        include:
          - name: code-quality
            model: gpt-4.1
          - name: security
            model: gpt-4.1
            system_prompt_file: .review/SECURITY.md
          - name: performance
            model: gpt-4.1
            system_prompt_file: .review/PERFORMANCE.md
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
          reviewer-name: ${{ matrix.name }}
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: ${{ matrix.model }}
          system-prompt-file: ${{ matrix.system_prompt_file }}
```

The sentinel `<!-- review-bot:security -->` ensures the security review only replaces previous security reviews, never the code-quality or performance reviews.

### With language patterns from another repo

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: reviewer
    llm-base-url: ${{ secrets.LLM_BASE_URL }}
    llm-api-key: ${{ secrets.LLM_API_KEY }}
    llm-model: gpt-4.1
    conventions-file: CLAUDE.md
    patterns-repo: rodin/go-patterns,rodin/kubernetes-conventions
    patterns-files: "README.md,patterns/"
```

Pattern repos are fetched at review time. The reviewer uses them as criteria for idiomatic code.

### Dry run (test without posting)

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: test
    llm-base-url: ${{ secrets.LLM_BASE_URL }}
    llm-api-key: ${{ secrets.LLM_API_KEY }}
    llm-model: gpt-4.1
    dry-run: 'true'
```

Prints the review to CI logs without posting to the PR. Useful for testing prompt changes.

### Using Anthropic directly

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: claude
    llm-base-url: https://api.anthropic.com
    llm-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    llm-model: claude-sonnet-4-20250514
    llm-provider: anthropic
```

### Using SAP AI Core

For SAP environments with AI Core deployments, use the `aicore` provider for native authentication:

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: aicore-review
    llm-model: anthropic--claude-4.6-sonnet  # or gpt-5
    llm-provider: aicore
    aicore-client-id: ${{ secrets.AICORE_CLIENT_ID }}
    aicore-client-secret: ${{ secrets.AICORE_CLIENT_SECRET }}
    aicore-auth-url: ${{ secrets.AICORE_AUTH_URL }}
    aicore-api-url: ${{ secrets.AICORE_API_URL }}
    aicore-resource-group: default
```

AI Core handles OAuth token management and deployment discovery automatically. Model names must match the deployment name in AI Core (e.g. `anthropic--claude-4.6-sonnet`, `gpt-5`).

## Action Inputs

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| `reviewer-token` | Yes | — | Gitea token for posting reviews (needs `write:issue`, `write:repository`) |
| `reviewer-name` | No | `""` | Logical identity for this reviewer. Used as sentinel for idempotent cleanup. Set this when running multiple review bots on the same PR. |
| `llm-base-url` | No* | `""` | LLM API base URL (required unless using aicore provider) |
| `llm-api-key` | No* | `""` | LLM API key (required unless using aicore provider) |
| `llm-model` | Yes | — | Model name |
| `llm-provider` | No | `openai` | API provider: `openai`, `anthropic`, or `aicore` |
| `aicore-client-id` | No** | `""` | SAP AI Core client ID |
| `aicore-client-secret` | No** | `""` | SAP AI Core client secret |
| `aicore-auth-url` | No** | `""` | SAP AI Core authentication URL |
| `aicore-api-url` | No** | `""` | SAP AI Core API URL |
| `aicore-resource-group` | No | `default` | SAP AI Core resource group |
| `conventions-file` | No | `""` | Path to coding conventions file in the repo |
| `patterns-repo` | No | `""` | Comma-separated repos with language patterns (e.g. `rodin/go-patterns`) |
| `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos |
| `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
| `doc-map` | No | `""` | Path to a YAML file mapping source path globs to governing design docs |
| `doc-map-max-bytes` | No | `102400` | Maximum bytes of doc content injected via doc-map (the default is 100 KiB) |
| `doc-map-trusted-ref` | No | `""` | Git ref (e.g. `main`) to fetch the doc-map config from via VCS API instead of local workspace. **Recommended for security**: prevents a PR from modifying the doc-map config to inject arbitrary docs. |
| `persona` | No | `""` | Built-in persona name (security, architect, docs) |
| `persona-file` | No | `""` | Path to persona file (YAML or JSON) with custom review focus |
| `temperature` | No | `0` | LLM temperature (0 = server default) |
| `timeout` | No | `300` | LLM request timeout in seconds |
| `dry-run` | No | `false` | Print review to stdout instead of posting |
| `update-existing` | No | `true` | Delete previous review from same bot before posting. Accepts: true/1/yes or false/0/no |
| `version` | No | `latest` | review-bot version to install |

*Required for `openai` and `anthropic` providers, not for `aicore`.

**Required only for `aicore` provider.

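The `update-existing` input accepts several spellings of a boolean. A minimal Go sketch of that kind of lenient parsing is shown below. The helper name `parseLenientBool`, the case-insensitive matching, and the whitespace trimming are assumptions for illustration and are not taken from review-bot's source.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLenientBool is a hypothetical helper: it maps true/1/yes and
// false/0/no to a bool and rejects everything else.
// Case-insensitivity and whitespace trimming are assumptions.
func parseLenientBool(s string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "true", "1", "yes":
		return true, nil
	case "false", "0", "no":
		return false, nil
	}
	return false, fmt.Errorf("invalid boolean %q: want true/1/yes or false/0/no", s)
}

func main() {
	for _, in := range []string{"yes", "0", "maybe"} {
		v, err := parseLenientBool(in)
		fmt.Println(in, v, err)
	}
}
```

Rejecting unknown values, rather than treating them as `false`, avoids silently disabling cleanup because of a typo.
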
## Runner Requirements

The composite action requires these tools on the runner:

| Tool | Used For |
|------|----------|
| `python3` | JSON parsing during version detection |
| `sha256sum` | Checksum verification of downloaded binary |
| `curl` | Downloading releases and querying the API |

All three are pre-installed on `ubuntu-*` runners (e.g. `ubuntu-24.04`). If you use a custom runner image, ensure these are available.

## How Review Cleanup Works

When `reviewer-name` is set, the bot embeds a hidden sentinel in each review:

```html
<!-- review-bot:code-review -->
```

On the next run, it finds and deletes any review containing its own sentinel (except the one it just posted). This means:

- **One review per bot per PR**: no clutter from repeated pushes
- **Multiple bots coexist**: each only cleans up its own reviews
- **Same token, different roles**: a single bot account can post "code-review" and "security" reviews without conflict
- **No extra permissions**: identity comes from the sentinel, not the API

If `reviewer-name` is empty, cleanup is skipped (reviews stack like before).

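The cleanup rule can be sketched in Go as follows. This is an illustration only: the `review` struct, `sentinelFor`, and `staleReviewIDs` are hypothetical names, and the real bot works against the VCS API rather than an in-memory slice.

```go
package main

import (
	"fmt"
	"strings"
)

// review is a simplified, hypothetical stand-in for a PR review
// returned by the VCS API.
type review struct {
	ID   int64
	Body string
}

// sentinelFor builds the hidden HTML comment for a reviewer name.
func sentinelFor(name string) string {
	return "<!-- review-bot:" + name + " -->"
}

// staleReviewIDs returns the IDs of reviews that carry this reviewer's
// sentinel, skipping the review that was just posted (keepID).
// An empty name disables cleanup, matching the documented behavior.
func staleReviewIDs(reviews []review, name string, keepID int64) []int64 {
	if name == "" {
		return nil
	}
	sentinel := sentinelFor(name)
	var stale []int64
	for _, r := range reviews {
		if r.ID != keepID && strings.Contains(r.Body, sentinel) {
			stale = append(stale, r.ID)
		}
	}
	return stale
}

func main() {
	reviews := []review{
		{ID: 1, Body: "old findings\n" + sentinelFor("code-review")},
		{ID: 2, Body: "security findings\n" + sentinelFor("security")},
		{ID: 3, Body: "new findings\n" + sentinelFor("code-review")},
	}
	// Review 3 was just posted, so only review 1 is stale for "code-review".
	fmt.Println(staleReviewIDs(reviews, "code-review", 3))
}
```

Because the match is on the full sentinel string including the closing ` -->`, a reviewer named `code` would not match reviews posted by `code-review`.
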
### Shared Token: Worst-Wins Behavior

When multiple review types share the same Gitea bot account (e.g. code-quality and security), Gitea determines the user's approval state from their **most recent review**. This creates a race condition: if security finds issues (REQUEST_CHANGES) but code-quality finishes last (APPROVE), the PR appears approved.

review-bot handles this automatically with **worst-wins reconciliation**: before posting, each job checks whether any sibling review from the same user already has REQUEST_CHANGES. If so and this job would post APPROVE, it posts as REQUEST_CHANGES instead, maintaining the block. This ensures the PR stays blocked until all checks pass, regardless of execution order.

**If you need independent approval/block per review type**, use separate Gitea bot accounts with their own tokens.

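The reconciliation rule described above is small enough to express directly. The Go sketch below assumes review states are plain strings and that the sibling states have already been fetched; the function name `reconcile` is hypothetical.

```go
package main

import "fmt"

const (
	approve        = "APPROVE"
	requestChanges = "REQUEST_CHANGES"
)

// reconcile returns the state a job should post. If this job would
// approve but any sibling review from the same account already requests
// changes, the block is preserved by posting REQUEST_CHANGES instead.
func reconcile(own string, siblings []string) string {
	if own != approve {
		return own
	}
	for _, s := range siblings {
		if s == requestChanges {
			return requestChanges
		}
	}
	return own
}

func main() {
	fmt.Println(reconcile(approve, []string{requestChanges})) // REQUEST_CHANGES
	fmt.Println(reconcile(approve, []string{approve}))        // APPROVE
}
```
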
## Custom Review Prompts

Use `system-prompt-file` to specialize the review focus. The file contents are appended to the base system prompt as "Additional Review Instructions."

Example `SECURITY_REVIEW.md`:

```markdown
You are performing a security-focused code review.

Focus areas:
- Injection attacks (SQL, command, path traversal, template)
- Authentication/Authorization (missing checks, privilege escalation)
- Secrets exposure (hardcoded credentials, tokens in logs)
- Input validation (unsanitized input, unsafe deserialization)
- Race conditions (TOCTOU, unsynchronized shared state)

Rules:
- Only report findings with security implications
- Ignore style, naming, and general code quality
- MAJOR = exploitable vulnerability, MINOR = hardening opportunity, NIT = theoretical risk
- If no security-relevant changes exist, APPROVE with empty findings
```

## CLI Usage

```bash
review-bot \
  --vcs-url https://gitea.example.com \
  --repo owner/name \
  --pr 42 \
  --reviewer-token "$REVIEWER_TOKEN" \
  --reviewer-name "code-review" \
  --llm-base-url https://api.openai.com/v1 \
  --llm-api-key "$OPENAI_API_KEY" \
  --llm-model gpt-4.1 \
  --conventions-file CONVENTIONS.md
```

## Subcommands

### `validate-docmap`

Verifies that a `doc-map.yml` is consistent before running a review. Two checks:

1. **Coverage**: every changed file is matched by at least one `paths:` glob.
2. **Stale docs**: every `docs:` entry exists on disk under `--repo-root`.

```bash
# Typical CI usage: pipe git diff into the command
git diff --name-only origin/main HEAD | \
  review-bot validate-docmap \
    --docmap .review-bot/doc-map.yml \
    --repo-root .
```

| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--docmap` | Yes | — | Path to doc-map YAML file |
| `--repo-root` | No | `.` (cwd) | Root for resolving `docs:` paths |

Exit codes: `0`=clean, `1`=failures found, `2`=usage/parse error.

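The two checks can be pictured with the Go sketch below. It is not the tool's implementation: the `mapping` struct, the function names, and the sample file names are hypothetical, and the standard library's `path.Match` is swapped in for whatever glob dialect review-bot actually uses (`path.Match` has no `**` support).

```go
package main

import (
	"fmt"
	"os"
	"path"
	"path/filepath"
)

// mapping is a hypothetical in-memory form of one doc-map entry.
type mapping struct {
	Paths []string // globs matched against changed files
	Docs  []string // design docs, relative to the repo root
}

// uncovered returns the changed files that no paths glob matches
// (the "coverage" check). Malformed globs are treated as non-matching.
func uncovered(changed []string, maps []mapping) []string {
	var out []string
	for _, f := range changed {
		matched := false
		for _, m := range maps {
			for _, g := range m.Paths {
				if ok, err := path.Match(g, f); err == nil && ok {
					matched = true
				}
			}
		}
		if !matched {
			out = append(out, f)
		}
	}
	return out
}

// staleDocs returns docs entries that cannot be stat'ed under repoRoot
// (the "stale docs" check).
func staleDocs(repoRoot string, maps []mapping) []string {
	var out []string
	for _, m := range maps {
		for _, d := range m.Docs {
			if _, err := os.Stat(filepath.Join(repoRoot, d)); err != nil {
				out = append(out, d)
			}
		}
	}
	return out
}

func main() {
	maps := []mapping{{
		Paths: []string{"review/*.go"},
		Docs:  []string{"docs/review-design.md"}, // hypothetical doc path
	}}
	changed := []string{"review/docmap.go", "llm/client.go"}
	fmt.Println("uncovered:", uncovered(changed, maps))
	fmt.Println("stale docs:", staleDocs(".", maps))
}
```

A CI wrapper would exit non-zero when either list is non-empty, in line with the exit codes documented above.
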
### `validate-url`

Resolves a URL and verifies all IPs are publicly routable (used in CI to prevent SSRF).

```bash
review-bot validate-url https://gitea.example.com
```

Exit codes: `0`=safe, `1`=blocked/private IP, `2`=error.

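As a rough illustration of the "publicly routable" check, the Go sketch below classifies IP addresses with the standard `net/netip` package. The helper names are hypothetical and the rule set is an assumption: it rejects loopback, link-local, multicast, unspecified, and private ranges, but it does not cover every special-purpose block (for example carrier-grade NAT), so the real subcommand may be stricter. In practice the address list would come from resolving the URL's host, for example with `net.LookupHost`.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isPublicIP reports whether s parses as an IP address that this sketch
// considers publicly routable.
func isPublicIP(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 like 10.0.0.1
	// IsGlobalUnicast excludes loopback, link-local, multicast and
	// unspecified addresses but still accepts private ranges.
	return addr.IsGlobalUnicast() && !addr.IsPrivate()
}

// allPublic requires at least one address and every address to be public,
// so a host that resolves to a mix of public and private IPs is blocked.
func allPublic(addrs []string) bool {
	if len(addrs) == 0 {
		return false
	}
	for _, a := range addrs {
		if !isPublicIP(a) {
			return false
		}
	}
	return true
}

func main() {
	for _, ip := range []string{"8.8.8.8", "10.0.0.1", "127.0.0.1", "169.254.169.254"} {
		fmt.Println(ip, isPublicIP(ip))
	}
}
```

Checking every resolved address, not just the first, matters because an attacker-controlled DNS name can return several records.
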
## Environment Variables

All flags have environment variable equivalents:

| Flag | Env Var |
|------|---------|
| `--vcs-url` | `VCS_URL` (fallback: `GITEA_URL`) |
| `--vcs-type` | `VCS_TYPE` (auto-detected from URL if not set; `gitea` or `github`) |
| `--repo` | `GITEA_REPO` (a VCS-agnostic `REPO` variable is planned) |
| `--pr` | `PR_NUMBER` |
| `--reviewer-token` | `REVIEWER_TOKEN` |
| `--reviewer-name` | `REVIEWER_NAME` |
| `--llm-base-url` | `LLM_BASE_URL` |
| `--llm-api-key` | `LLM_API_KEY` |
| `--llm-model` | `LLM_MODEL` |
| `--llm-provider` | `LLM_PROVIDER` |
| `--conventions-file` | `CONVENTIONS_FILE` |
| `--patterns-repo` | `PATTERNS_REPO` |
| `--patterns-files` | `PATTERNS_FILES` |
| `--system-prompt-file` | `SYSTEM_PROMPT_FILE` |
| `--llm-temperature` | `LLM_TEMPERATURE` |
| `--llm-timeout` | `LLM_TIMEOUT` |
| `--update-existing` | `UPDATE_EXISTING` |

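The fallback chain for `--vcs-url` (`VCS_URL`, then `GITEA_URL`) can be sketched in Go as below. The helper names are hypothetical, and the precedence shown (an explicit flag value wins over the environment) is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"os"
)

// firstEnv returns the value of the first non-empty variable in keys.
// lookup is os.Getenv in production; passing it in keeps the helper testable.
func firstEnv(lookup func(string) string, keys ...string) string {
	for _, k := range keys {
		if v := lookup(k); v != "" {
			return v
		}
	}
	return ""
}

// resolve prefers an explicit flag value and otherwise falls back to the
// environment variables in the order given (assumed precedence).
func resolve(flagValue string, lookup func(string) string, keys ...string) string {
	if flagValue != "" {
		return flagValue
	}
	return firstEnv(lookup, keys...)
}

func main() {
	// --vcs-url was not passed, so VCS_URL is consulted first, then GITEA_URL.
	fmt.Println(resolve("", os.Getenv, "VCS_URL", "GITEA_URL"))
}
```
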
## Setup

1. **Create a Gitea bot account** (e.g. `review-bot`)
2. **Generate a token** with scopes: `write:issue`, `write:repository`
3. **Add secrets** to your Gitea repo (Settings → Actions → Secrets):
   - `REVIEW_TOKEN`: the bot's Gitea token
   - `LLM_BASE_URL`: your LLM endpoint
   - `LLM_API_KEY`: your LLM key
4. **Add the workflow** (see Quick Start above)

### Token Scopes Required

| Scope | Purpose |
|-------|---------|
| `write:issue` | Post and delete reviews |
| `write:repository` | Read PR diffs, file content, commit statuses |
| `read:user` | Self-request as reviewer (optional but recommended) |

Without `read:user`, the bot still works but cannot add itself to the PR's reviewer list.

## Development

```bash
go test ./...                            # Unit tests
go vet ./...                             # Static analysis
go build -o review-bot ./cmd/review-bot

# Integration tests (requires env vars set)
go test -tags=integration ./...
```

## Architecture

```
cmd/review-bot/    CLI entrypoint + orchestration
gitea/             Gitea API client (reviews, PRs, files)
llm/               Multi-provider LLM client (OpenAI + Anthropic)
review/            Prompt building, response parsing, formatting
budget/            Token estimation + context trimming
```

## License

MIT

## Review Personas

Personas provide role-based review specialization. Instead of generic code review, each persona focuses on a specific domain (security, architecture, documentation) with tailored prompts and severity calibration.

### Built-in Personas

| Persona | Focus |
|---------|-------|
| `security` | Vulnerabilities, auth bypass, secrets exposure, injection attacks |
| `architect` | Design patterns, code organization, API contracts, testability |
| `docs` | Documentation quality, API clarity, error messages |

### Using Built-in Personas

```yaml
- uses: rodin/review-bot/.gitea/actions/review@v1
  with:
    reviewer-name: security
    persona: security
    llm-model: claude-opus-4-20250514  # Security benefits from strong reasoning
    ...
```

### Multiple Personas in Parallel

```yaml
jobs:
  review:
    strategy:
      matrix:
        include:
          - name: security
            persona: security
          - name: architect
            persona: architect
    steps:
      - uses: rodin/review-bot/.gitea/actions/review@v1
        with:
          reviewer-name: ${{ matrix.name }}
          persona: ${{ matrix.persona }}
          ...
```

Each persona posts independently with its own sentinel, so reviews don't interfere.

### Custom Personas

Create a YAML file with your domain-specific review focus:

```yaml
# .review/personas/trading.yaml
name: trading
display_name: Trading Domain Expert

identity: |
  You are a trading systems expert reviewing code for correctness.

  Your expertise:
  - Order lifecycle and state machines
  - Fill handling and partial fills
  - Position tracking and P&L calculations
  - Event sourcing invariants

focus:
  - Order state machine correctness
  - Fill handling edge cases (partial, overfill)
  - Position and P&L calculation accuracy
  - Event replay determinism
  - Decimal precision for money

ignore:
  - Code style
  - General performance
  - Documentation formatting

severity:
  major: "Bugs that cause incorrect positions, fills, or money calculations"
  minor: "Edge cases that could cause issues under unusual conditions"
  nit: "Clarity improvements for domain logic"
```

Use it in CI:

```yaml
- uses: rodin/review-bot/.gitea/actions/review@v1
  with:
    reviewer-name: trading
    persona-file: .review/personas/trading.yaml
    ...
```

YAML is the recommended format for personas because it supports:

- Multi-line strings with `|` blocks (cleaner identity definitions)
- Comments for documentation
- More readable arrays and nested structures

JSON is also supported for backwards compatibility; just use a `.json` extension.

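For the JSON variant, the persona file can be pictured as decoding into a struct like the one in this Go sketch. It uses only the standard library; the struct, the function name `parsePersonaJSON`, the assumption that JSON keys mirror the YAML keys shown above, and the rule that `name` is required are all illustrative guesses, not review-bot's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// persona mirrors the keys of the YAML example above; the JSON key names
// are assumed to be identical.
type persona struct {
	Name        string            `json:"name"`
	DisplayName string            `json:"display_name"`
	Identity    string            `json:"identity"`
	Focus       []string          `json:"focus"`
	Ignore      []string          `json:"ignore"`
	Severity    map[string]string `json:"severity"`
}

// parsePersonaJSON decodes a persona document and applies one sanity
// check (a non-empty name), which is an assumption of this sketch.
func parsePersonaJSON(doc string) (persona, error) {
	var p persona
	if err := json.Unmarshal([]byte(doc), &p); err != nil {
		return persona{}, fmt.Errorf("parse persona: %w", err)
	}
	if p.Name == "" {
		return persona{}, fmt.Errorf("parse persona: missing name")
	}
	return p, nil
}

const example = `{
  "name": "trading",
  "display_name": "Trading Domain Expert",
  "identity": "You are a trading systems expert reviewing code for correctness.",
  "focus": ["Order state machine correctness", "Decimal precision for money"],
  "ignore": ["Code style"],
  "severity": {"major": "Bugs that cause incorrect positions, fills, or money calculations"}
}`

func main() {
	p, err := parsePersonaJSON(example)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.DisplayName, "focus items:", len(p.Focus))
}
```
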
### Persona vs system-prompt-file

| Feature | `persona` / `persona-file` | `system-prompt-file` |
|---------|---------------------------|----------------------|
| Replaces base prompt | Yes | No (appends) |
| Structured format | Yes (YAML/JSON) | No (freeform) |
| Focus/ignore lists | Yes | Manual |
| Severity calibration | Yes | Manual |
| Header display name | Yes | No |
| Built-in options | Yes | No |

Use personas for domain-specialized reviews. Use `system-prompt-file` for minor tweaks to the generic review.

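The replace-versus-append distinction in the first table row can be sketched in Go as follows. This is a simplified model: the function name `buildSystemPrompt`, the exact heading text and spacing, and the idea that a persona and a `system-prompt-file` can be combined in one run are assumptions, not confirmed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSystemPrompt models the two mechanisms: a persona prompt replaces
// the base prompt, while extra instructions are appended under an
// "Additional Review Instructions" heading (heading format assumed).
func buildSystemPrompt(base, personaPrompt, extra string) string {
	prompt := base
	if personaPrompt != "" {
		prompt = personaPrompt
	}
	if strings.TrimSpace(extra) != "" {
		prompt += "\n\n## Additional Review Instructions\n\n" + extra
	}
	return prompt
}

func main() {
	fmt.Println(buildSystemPrompt("You are a code reviewer.", "", "Focus on security."))
}
```
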