diff --git a/README.md b/README.md
index ff2356a..6512eec 100644
--- a/README.md
+++ b/README.md
@@ -1,17 +1,234 @@
 # review-bot
 
-Automated code review bot for Gitea. Fetches a pull request diff, sends it to an LLM for analysis, and posts a structured review back to the PR.
+AI-powered code review bot for Gitea pull requests. Fetches diff + context, sends to an LLM, and posts a structured review (APPROVE / REQUEST_CHANGES) back to the PR.
 
 ## Features
 
-- Fetches PR metadata, diff, and CI status from Gitea API
-- Sends context-rich prompts to any OpenAI-compatible LLM
-- Parses structured JSON review responses
-- Posts formatted reviews (APPROVE / REQUEST_CHANGES) back to Gitea
-- Supports custom coding conventions via repo files
-- Zero external dependencies — Go stdlib only
+- **Multi-provider**: OpenAI-compatible and Anthropic Messages API
+- **Context-aware**: Fetches full file content, conventions, language patterns, CI status
+- **Smart budget**: Automatically trims context to fit model token limits
+- **Idempotent reviews**: Deletes previous review before posting new one (one review per bot)
+- **Custom prompts**: Load additional instructions from a file (e.g. security-focused review)
+- **Zero dependencies**: Go stdlib only
 
-## Usage
+## Quick Start: Composite Action
+
+The easiest way to use review-bot in your Gitea CI:
+
+```yaml
+# .gitea/workflows/review.yml
+name: Review
+on:
+  pull_request:
+    types: [opened, synchronize]
+
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+        with:
+          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+          reviewer-name: code-review
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: gpt-4.1
+```
+
+That's it. Every PR gets an automated review.
+
+## Examples
+
+### Single reviewer with conventions
+
+```yaml
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+        with:
+          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+          reviewer-name: reviewer
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: gpt-4.1
+          conventions-file: CONVENTIONS.md
+          timeout: '600'
+```
+
+### Two reviewers with different models (diversity of opinion)
+
+```yaml
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    strategy:
+      matrix:
+        include:
+          - name: gpt
+            model: gpt-4.1
+            token_secret: GPT_REVIEW_TOKEN
+          - name: claude
+            model: claude-sonnet-4-20250514
+            token_secret: CLAUDE_REVIEW_TOKEN
+            provider: anthropic
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+        with:
+          reviewer-token: ${{ secrets[matrix.token_secret] }}
+          reviewer-name: ${{ matrix.name }}
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: ${{ matrix.model }}
+          llm-provider: ${{ matrix.provider }}
+          conventions-file: CONVENTIONS.md
+```
+
+Each reviewer posts independently and only cleans up its own stale reviews.
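The per-reviewer cleanup described above can be pictured as a filter over the PR's existing reviews. The following is only a minimal sketch, not the bot's actual code: the `Review` type, the `staleReviewIDs` helper, and the `[marker:...]` strings are hypothetical, and the real sentinel format is not shown in this diff, so the sentinel is passed in as an opaque string.

```go
package main

import (
	"fmt"
	"strings"
)

// Review is a hypothetical, trimmed-down view of a PR review.
type Review struct {
	ID   int64
	Body string
}

// staleReviewIDs returns the IDs of reviews whose body contains sentinel,
// skipping keepID (the review that was just posted).
// An empty sentinel disables cleanup and yields nil.
func staleReviewIDs(reviews []Review, sentinel string, keepID int64) []int64 {
	if sentinel == "" {
		return nil
	}
	var stale []int64
	for _, r := range reviews {
		if r.ID != keepID && strings.Contains(r.Body, sentinel) {
			stale = append(stale, r.ID)
		}
	}
	return stale
}

func main() {
	// Placeholder markers for illustration only; not the real sentinel format.
	reviews := []Review{
		{ID: 1, Body: "old gpt review [marker:gpt]"},
		{ID: 2, Body: "old claude review [marker:claude]"},
		{ID: 3, Body: "new gpt review [marker:gpt]"},
	}
	// Review 3 was just posted by the gpt reviewer, so only review 1 is stale.
	fmt.Println(staleReviewIDs(reviews, "[marker:gpt]", 3)) // prints [1]
}
```

Because the filter matches on the reviewer's own marker rather than on the account that posted the review, the claude review in this example is never selected for deletion by the gpt job.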
+
+### Multiple review types from a single bot account
+
+Use the same Gitea token but different `reviewer-name` values to run specialized reviews without needing multiple bot accounts:
+
+```yaml
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    strategy:
+      matrix:
+        include:
+          - name: code-quality
+            model: gpt-4.1
+          - name: security
+            model: gpt-4.1
+            system_prompt_file: .review/SECURITY.md
+          - name: performance
+            model: gpt-4.1
+            system_prompt_file: .review/PERFORMANCE.md
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+        with:
+          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+          reviewer-name: ${{ matrix.name }}
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: ${{ matrix.model }}
+          system-prompt-file: ${{ matrix.system_prompt_file }}
+```
+
+The sentinel derived from each `reviewer-name` ensures the security review only replaces previous security reviews, never the code-quality or performance reviews.
+
+### With language patterns from another repo
+
+```yaml
+- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+  with:
+    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+    reviewer-name: reviewer
+    llm-base-url: ${{ secrets.LLM_BASE_URL }}
+    llm-api-key: ${{ secrets.LLM_API_KEY }}
+    llm-model: gpt-4.1
+    conventions-file: CLAUDE.md
+    patterns-repo: rodin/go-patterns,rodin/kubernetes-conventions
+    patterns-files: "README.md,patterns/"
+```
+
+Pattern repos are fetched at review time. The reviewer uses them as criteria for idiomatic code.
+
+### Dry run (test without posting)
+
+```yaml
+- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+  with:
+    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+    reviewer-name: test
+    llm-base-url: ${{ secrets.LLM_BASE_URL }}
+    llm-api-key: ${{ secrets.LLM_API_KEY }}
+    llm-model: gpt-4.1
+    dry-run: 'true'
+```
+
+Prints the review to CI logs without posting to the PR. Useful for testing prompt changes.
+
+### Using Anthropic directly
+
+```yaml
+- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
+  with:
+    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+    reviewer-name: claude
+    llm-base-url: https://api.anthropic.com
+    llm-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+    llm-model: claude-sonnet-4-20250514
+    llm-provider: anthropic
+```
+
+## Action Inputs
+
+| Input | Required | Default | Description |
+|-------|----------|---------|-------------|
+| `reviewer-token` | Yes | — | Gitea token for posting reviews (needs `write:issue`, `write:repository`) |
+| `reviewer-name` | No | `""` | Logical identity for this reviewer. Used as sentinel for idempotent cleanup. Set this when running multiple review bots on the same PR. |
+| `llm-base-url` | Yes | — | LLM API base URL |
+| `llm-api-key` | Yes | — | LLM API key |
+| `llm-model` | Yes | — | Model name |
+| `llm-provider` | No | `openai` | API provider: `openai` or `anthropic` |
+| `conventions-file` | No | `""` | Path to coding conventions file in the repo |
+| `patterns-repo` | No | `""` | Comma-separated repos with language patterns (e.g. `rodin/go-patterns`) |
+| `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos |
+| `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
+| `temperature` | No | `0` | LLM temperature (0 = server default) |
+| `timeout` | No | `300` | LLM request timeout in seconds |
+| `dry-run` | No | `false` | Print review to stdout instead of posting |
+| `update-existing` | No | `true` | Delete previous review from same bot before posting. Accepts: true/1/yes or false/0/no |
+| `version` | No | `latest` | review-bot version to install |
+
+## How Review Cleanup Works
+
+When `reviewer-name` is set, the bot embeds a hidden sentinel in each review:
+
+```html
+
+```
+
+On the next run, it finds and deletes any review containing its own sentinel (except the one it just posted).
This means:
+
+- **One review per bot per PR** — no clutter from repeated pushes
+- **Multiple bots coexist** — each only cleans up its own reviews
+- **Same token, different roles** — a single bot account can post "code-review" and "security" reviews without conflict
+- **No extra permissions** — identity comes from the sentinel, not the API
+
+If `reviewer-name` is empty, cleanup is skipped (reviews stack like before).
+
+## Custom Review Prompts
+
+Use `system-prompt-file` to specialize the review focus. The file contents are appended to the base system prompt as "Additional Review Instructions."
+
+Example `SECURITY_REVIEW.md`:
+
+```markdown
+You are performing a security-focused code review.
+
+Focus areas:
+- Injection attacks (SQL, command, path traversal, template)
+- Authentication/Authorization (missing checks, privilege escalation)
+- Secrets exposure (hardcoded credentials, tokens in logs)
+- Input validation (unsanitized input, unsafe deserialization)
+- Race conditions (TOCTOU, unsynchronized shared state)
+
+Rules:
+- Only report findings with security implications
+- Ignore style, naming, and general code quality
+- MAJOR = exploitable vulnerability, MINOR = hardening opportunity, NIT = theoretical risk
+- If no security-relevant changes exist, APPROVE with empty findings
+```
+
+## CLI Usage
 
 ```bash
 review-bot \
@@ -19,71 +236,74 @@ review-bot \
   --repo owner/name \
   --pr 42 \
   --reviewer-token "$GITEA_TOKEN" \
+  --reviewer-name "code-review" \
   --llm-base-url https://api.openai.com/v1 \
   --llm-api-key "$OPENAI_API_KEY" \
-  --llm-model gpt-4 \
-  --reviewer-name "Sonnet" \
-  --conventions-file CONVENTIONS.md \
-  --dry-run
+  --llm-model gpt-4.1 \
+  --conventions-file CONVENTIONS.md
 ```
 
 ## Environment Variables
 
-All flags can be set via environment variables:
+All flags have environment variable equivalents:
 
-| Flag | Env Var | Required | Description |
-|------|---------|----------|-------------|
-| `--gitea-url` | `GITEA_URL` | Yes | Gitea instance base
URL |
-| `--repo` | `GITEA_REPO` | Yes | Repository in `owner/name` format |
-| `--pr` | `PR_NUMBER` | Yes | Pull request number |
-| `--reviewer-token` | `REVIEWER_TOKEN` | Yes | Gitea API token for posting reviews |
-| `--llm-base-url` | `LLM_BASE_URL` | Yes | OpenAI-compatible API base URL |
-| `--llm-api-key` | `LLM_API_KEY` | Yes | LLM API key |
-| `--llm-model` | `LLM_MODEL` | Yes | Model identifier |
-| `--reviewer-name` | `REVIEWER_NAME` | No | Display name in review footer |
-| `--conventions-file` | `CONVENTIONS_FILE` | No | Path to conventions file in repo |
-| `--dry-run` | — | No | Print review to stdout instead of posting |
+| Flag | Env Var |
+|------|---------|
+| `--gitea-url` | `GITEA_URL` |
+| `--repo` | `GITEA_REPO` |
+| `--pr` | `PR_NUMBER` |
+| `--reviewer-token` | `REVIEWER_TOKEN` |
+| `--reviewer-name` | `REVIEWER_NAME` |
+| `--llm-base-url` | `LLM_BASE_URL` |
+| `--llm-api-key` | `LLM_API_KEY` |
+| `--llm-model` | `LLM_MODEL` |
+| `--llm-provider` | `LLM_PROVIDER` |
+| `--conventions-file` | `CONVENTIONS_FILE` |
+| `--patterns-repo` | `PATTERNS_REPO` |
+| `--patterns-files` | `PATTERNS_FILES` |
+| `--system-prompt-file` | `SYSTEM_PROMPT_FILE` |
+| `--llm-temperature` | `LLM_TEMPERATURE` |
+| `--llm-timeout` | `LLM_TIMEOUT` |
+| `--update-existing` | `UPDATE_EXISTING` |
 
-## Adding to a Gitea Repository
+## Setup
 
-1. Build the binary or use the CI workflow approach (build in CI).
+1. **Create a Gitea bot account** (e.g. `review-bot`)
+2. **Generate a token** with scopes: `write:issue`, `write:repository`
+3. **Add secrets** to your Gitea repo (Settings → Actions → Secrets):
+   - `REVIEW_TOKEN` — the bot's Gitea token
+   - `LLM_BASE_URL` — your LLM endpoint
+   - `LLM_API_KEY` — your LLM key
+4. **Add the workflow** (see Quick Start above)
 
-2.
Add secrets to your Gitea repo (Settings → Actions → Secrets):
-   - `SONNET_REVIEW_TOKEN` — Gitea token for the Sonnet reviewer account
-   - `GPT_REVIEW_TOKEN` — Gitea token for the GPT reviewer account
-   - `LLM_BASE_URL` — Your LLM API endpoint
-   - `LLM_API_KEY` — Your LLM API key
+### Token Scopes Required
 
-3. Copy `.gitea/workflows/ci.yml` to your repo (or adapt it).
+| Scope | Purpose |
+|-------|---------|
+| `write:issue` | Post and delete reviews |
+| `write:repository` | Read PR diffs, file content, commit statuses |
 
-4. On every PR, the bot will:
-   - Run tests and vet
-   - Build review-bot
-   - Post reviews from each configured LLM reviewer
+No `read:user` scope needed — the bot identifies itself from the review response.
 
 ## Development
 
 ```bash
-# Run tests
-go test ./...
-
-# Run vet
-go vet ./...
-
-# Build
+go test ./...                    # Unit tests
+go vet ./...                     # Static analysis
 go build -o review-bot ./cmd/review-bot
 
-# Integration tests (requires env vars)
+# Integration tests (requires env vars set)
 go test -tags=integration ./...
 ```
 
 ## Architecture
 
 ```
-cmd/review-bot/   CLI entrypoint
-gitea/            Gitea API client
-llm/              OpenAI-compatible LLM client
+cmd/review-bot/   CLI entrypoint + orchestration
+gitea/            Gitea API client (reviews, PRs, files)
+llm/              Multi-provider LLM client (OpenAI + Anthropic)
 review/           Prompt building, response parsing, formatting
+budget/           Token estimation + context trimming
 ```
 
 ## License