docs: comprehensive README with action usage, cleanup behavior, custom prompts

- Quick start example with composite action + matrix strategy
- Full action inputs table with descriptions
- How sentinel-based cleanup works (explains the reviewer-name concept)
- Custom prompt file usage with security review example
- CLI usage with all flags
- Environment variables table
- Token scopes documentation
- Setup guide for new repos
2026-05-01 20:59:34 -07:00
parent 69e0a459c3
commit b8af8306a6
1 changed file with 268 additions and 48 deletions
@@ -1,17 +1,234 @@
 # review-bot

-Automated code review bot for Gitea. Fetches a pull request diff, sends it to an LLM for analysis, and posts a structured review back to the PR.
+AI-powered code review bot for Gitea pull requests. Fetches the PR diff and surrounding context, sends them to an LLM, and posts a structured review (APPROVE / REQUEST_CHANGES) back to the PR.

 ## Features

- Fetches PR metadata, diff, and CI status from Gitea API
- Sends context-rich prompts to any OpenAI-compatible LLM
- Parses structured JSON review responses
- Posts formatted reviews (APPROVE / REQUEST_CHANGES) back to Gitea
- Supports custom coding conventions via repo files
- Zero external dependencies — Go stdlib only
+- **Multi-provider**: OpenAI-compatible and Anthropic Messages API
+- **Context-aware**: Fetches full file content, conventions, language patterns, CI status
+- **Smart budget**: Automatically trims context to fit model token limits
+- **Idempotent reviews**: Deletes previous review before posting new one (one review per bot)
+- **Custom prompts**: Load additional instructions from a file (e.g. security-focused review)
+- **Zero dependencies**: Go stdlib only

-## Usage
+## Quick Start: Composite Action
+
+The easiest way to use review-bot in your Gitea CI:
+
+```yaml
+# .gitea/workflows/review.yml
+name: Review
+on:
+  pull_request:
+    types: [opened, synchronize]
+
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+        with:
+          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+          reviewer-name: code-review
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: gpt-4.1
+```
+
+That's it. Every PR gets an automated review.
+
+## Examples
+
+### Single reviewer with conventions
+
+```yaml
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+        with:
+          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+          reviewer-name: reviewer
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: gpt-4.1
+          conventions-file: CONVENTIONS.md
+          timeout: '600'
+```
+
+### Two reviewers with different models (diversity of opinion)
+
+```yaml
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    strategy:
+      matrix:
+        include:
+          - name: gpt
+            model: gpt-4.1
+            token_secret: GPT_REVIEW_TOKEN
+          - name: claude
+            model: claude-sonnet-4-20250514
+            token_secret: CLAUDE_REVIEW_TOKEN
+            provider: anthropic
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+        with:
+          reviewer-token: ${{ secrets[matrix.token_secret] }}
+          reviewer-name: ${{ matrix.name }}
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: ${{ matrix.model }}
+          llm-provider: ${{ matrix.provider }}
+          conventions-file: CONVENTIONS.md
+```
+
+Each reviewer posts independently and only cleans up its own stale reviews.
+
+### Multiple review types from a single bot account
+
+Use the same Gitea token but different `reviewer-name` values to run specialized reviews without needing multiple bot accounts:
+
+```yaml
+jobs:
+  review:
+    runs-on: ubuntu-24.04
+    strategy:
+      matrix:
+        include:
+          - name: code-quality
+            model: gpt-4.1
+          - name: security
+            model: gpt-4.1
+            system_prompt_file: .review/SECURITY.md
+          - name: performance
+            model: gpt-4.1
+            system_prompt_file: .review/PERFORMANCE.md
+    steps:
+      - uses: actions/checkout@v4
+      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+        with:
+          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+          reviewer-name: ${{ matrix.name }}
+          llm-base-url: ${{ secrets.LLM_BASE_URL }}
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          llm-model: ${{ matrix.model }}
+          system-prompt-file: ${{ matrix.system_prompt_file }}
+```
+
+The sentinel `<!-- review-bot:security -->` ensures the security review only replaces previous security reviews, never the code-quality or performance reviews.
+
+### With language patterns from another repo
+
+```yaml
+- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+  with:
+    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+    reviewer-name: reviewer
+    llm-base-url: ${{ secrets.LLM_BASE_URL }}
+    llm-api-key: ${{ secrets.LLM_API_KEY }}
+    llm-model: gpt-4.1
+    conventions-file: CLAUDE.md
+    patterns-repo: rodin/go-patterns,rodin/kubernetes-conventions
+    patterns-files: "README.md,patterns/"
+```
+
+Pattern repos are fetched at review time. The reviewer uses them as criteria for idiomatic code.
+
+### Dry run (test without posting)
+
+```yaml
+- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+  with:
+    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+    reviewer-name: test
+    llm-base-url: ${{ secrets.LLM_BASE_URL }}
+    llm-api-key: ${{ secrets.LLM_API_KEY }}
+    llm-model: gpt-4.1
+    dry-run: 'true'
+```
+
+Prints the review to CI logs without posting to the PR. Useful for testing prompt changes.
+
+### Using Anthropic directly
+
+```yaml
+- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/[email protected]
+  with:
+    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
+    reviewer-name: claude
+    llm-base-url: https://api.anthropic.com
+    llm-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
+    llm-model: claude-sonnet-4-20250514
+    llm-provider: anthropic
+```
+
+## Action Inputs
+
+| Input | Required | Default | Description |
+|-------|----------|---------|-------------|
+| `reviewer-token` | Yes | — | Gitea token for posting reviews (needs `write:issue`, `write:repository`) |
+| `reviewer-name` | No | `""` | Logical identity for this reviewer. Used as sentinel for idempotent cleanup. Set this when running multiple review bots on the same PR. |
+| `llm-base-url` | Yes | — | LLM API base URL |
+| `llm-api-key` | Yes | — | LLM API key |
+| `llm-model` | Yes | — | Model name |
+| `llm-provider` | No | `openai` | API provider: `openai` or `anthropic` |
+| `conventions-file` | No | `""` | Path to coding conventions file in the repo |
+| `patterns-repo` | No | `""` | Comma-separated repos with language patterns (e.g. `rodin/go-patterns`) |
+| `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos |
+| `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
+| `temperature` | No | `0` | LLM temperature (0 = server default) |
+| `timeout` | No | `300` | LLM request timeout in seconds |
+| `dry-run` | No | `false` | Print review to stdout instead of posting |
+| `update-existing` | No | `true` | Delete previous review from same bot before posting. Accepts: true/1/yes or false/0/no |
+| `version` | No | `latest` | review-bot version to install |
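
The `update-existing` row lists the accepted boolean spellings (true/1/yes and false/0/no). A minimal sketch of how such values could be parsed is shown below. The helper name `parseFlexibleBool` is hypothetical, and the case-insensitive matching and whitespace trimming are assumptions, not confirmed behavior of review-bot.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlexibleBool is a hypothetical helper that accepts the spellings
// listed in the inputs table: true/1/yes and false/0/no.
// Assumption: matching ignores case and surrounding whitespace.
func parseFlexibleBool(s string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "true", "1", "yes":
		return true, nil
	case "false", "0", "no":
		return false, nil
	}
	return false, fmt.Errorf("invalid boolean %q: want true/1/yes or false/0/no", s)
}

func main() {
	for _, in := range []string{"yes", "0", "maybe"} {
		v, err := parseFlexibleBool(in)
		fmt.Println(in, v, err)
	}
}
```

Rejecting unknown values with an error, rather than silently treating them as false, keeps a typo in a workflow file from quietly disabling cleanup.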
+
+## How Review Cleanup Works
+
+When `reviewer-name` is set, the bot embeds a hidden sentinel in each review:
+
+```html
+<!-- review-bot:code-review -->
+```
+
+On the next run, it finds and deletes any review containing its own sentinel (except the one it just posted). This means:
+
+- **One review per bot per PR** — no clutter from repeated pushes
+- **Multiple bots coexist** — each only cleans up its own reviews
+- **Same token, different roles** — a single bot account can post "code-review" and "security" reviews without conflict
+- **No extra permissions** — identity comes from the sentinel, not the API
+
+If `reviewer-name` is empty, cleanup is skipped and each run adds another review to the PR, as before.
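
The cleanup rule described above can be sketched in Go roughly as follows. This is an illustration under stated assumptions, not the bot's actual code: the `Review` type and the `sentinel` and `staleReviewIDs` functions are hypothetical names, and the real client in `gitea/` may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Review is a hypothetical, minimal view of a PR review.
type Review struct {
	ID   int64
	Body string
}

// sentinel builds the hidden HTML comment embedded in each review body.
func sentinel(reviewerName string) string {
	return "<!-- review-bot:" + reviewerName + " -->"
}

// staleReviewIDs returns the IDs of reviews that carry this reviewer's
// sentinel, skipping the review that was just posted (keepID).
func staleReviewIDs(reviews []Review, reviewerName string, keepID int64) []int64 {
	if reviewerName == "" {
		return nil // no reviewer name: cleanup is skipped
	}
	marker := sentinel(reviewerName)
	var stale []int64
	for _, r := range reviews {
		if r.ID != keepID && strings.Contains(r.Body, marker) {
			stale = append(stale, r.ID)
		}
	}
	return stale
}

func main() {
	reviews := []Review{
		{ID: 1, Body: "Old findings\n<!-- review-bot:security -->"},
		{ID: 2, Body: "Style notes\n<!-- review-bot:code-quality -->"},
		{ID: 3, Body: "New findings\n<!-- review-bot:security -->"},
	}
	// Review 3 was just posted by the "security" reviewer.
	fmt.Println(staleReviewIDs(reviews, "security", 3)) // prints [1]
}
```

Because the marker includes the closing ` -->`, a reviewer named `security` does not match the sentinel of a reviewer named `security-strict`, so reviewers sharing one token stay isolated from each other.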
+
+## Custom Review Prompts
+
+Use `system-prompt-file` to specialize the review focus. The file contents are appended to the base system prompt as "Additional Review Instructions."
+
+Example `SECURITY_REVIEW.md`:
+
+```markdown
+You are performing a security-focused code review.
+
+Focus areas:
+- Injection attacks (SQL, command, path traversal, template)
+- Authentication/Authorization (missing checks, privilege escalation)
+- Secrets exposure (hardcoded credentials, tokens in logs)
+- Input validation (unsanitized input, unsafe deserialization)
+- Race conditions (TOCTOU, unsynchronized shared state)
+
+Rules:
+- Only report findings with security implications
+- Ignore style, naming, and general code quality
+- MAJOR = exploitable vulnerability, MINOR = hardening opportunity, NIT = theoretical risk
+- If no security-relevant changes exist, APPROVE with empty findings
+```
+
+## CLI Usage

 ```bash
 review-bot \
@@ -19,71 +236,74 @@ review-bot \
  --repo owner/name \
  --pr 42 \
  --reviewer-token "$GITEA_TOKEN" \
+  --reviewer-name "code-review" \
  --llm-base-url https://api.openai.com/v1 \
  --llm-api-key "$OPENAI_API_KEY" \
-  --llm-model gpt-4 \
-  --reviewer-name "Sonnet" \
-  --conventions-file CONVENTIONS.md \
-  --dry-run
+  --llm-model gpt-4.1 \
+  --conventions-file CONVENTIONS.md
 ```

 ## Environment Variables

-All flags can be set via environment variables:
+All flags have environment variable equivalents:

-| Flag | Env Var | Required | Description |
-|------|---------|----------|-------------|
-| `--gitea-url` | `GITEA_URL` | Yes | Gitea instance base URL |
-| `--repo` | `GITEA_REPO` | Yes | Repository in `owner/name` format |
-| `--pr` | `PR_NUMBER` | Yes | Pull request number |
-| `--reviewer-token` | `REVIEWER_TOKEN` | Yes | Gitea API token for posting reviews |
-| `--llm-base-url` | `LLM_BASE_URL` | Yes | OpenAI-compatible API base URL |
-| `--llm-api-key` | `LLM_API_KEY` | Yes | LLM API key |
-| `--llm-model` | `LLM_MODEL` | Yes | Model identifier |
-| `--reviewer-name` | `REVIEWER_NAME` | No | Display name in review footer |
-| `--conventions-file` | `CONVENTIONS_FILE` | No | Path to conventions file in repo |
-| `--dry-run` | — | No | Print review to stdout instead of posting |
+| Flag | Env Var |
+|------|---------|
+| `--gitea-url` | `GITEA_URL` |
+| `--repo` | `GITEA_REPO` |
+| `--pr` | `PR_NUMBER` |
+| `--reviewer-token` | `REVIEWER_TOKEN` |
+| `--reviewer-name` | `REVIEWER_NAME` |
+| `--llm-base-url` | `LLM_BASE_URL` |
+| `--llm-api-key` | `LLM_API_KEY` |
+| `--llm-model` | `LLM_MODEL` |
+| `--llm-provider` | `LLM_PROVIDER` |
+| `--conventions-file` | `CONVENTIONS_FILE` |
+| `--patterns-repo` | `PATTERNS_REPO` |
+| `--patterns-files` | `PATTERNS_FILES` |
+| `--system-prompt-file` | `SYSTEM_PROMPT_FILE` |
+| `--llm-temperature` | `LLM_TEMPERATURE` |
+| `--llm-timeout` | `LLM_TIMEOUT` |
+| `--update-existing` | `UPDATE_EXISTING` |

-## Adding to a Gitea Repository
+## Setup

-1. Build the binary or use the CI workflow approach (build in CI).
+1. **Create a Gitea bot account** (e.g. `review-bot`)
+2. **Generate a token** with scopes: `write:issue`, `write:repository`
+3. **Add secrets** to your Gitea repo (Settings → Actions → Secrets):
+   - `REVIEW_TOKEN` — the bot's Gitea token
+   - `LLM_BASE_URL` — your LLM endpoint
+   - `LLM_API_KEY` — your LLM key
+4. **Add the workflow** (see Quick Start above)

-2. Add secrets to your Gitea repo (Settings → Actions → Secrets):
-   - `SONNET_REVIEW_TOKEN` — Gitea token for the Sonnet reviewer account
-   - `GPT_REVIEW_TOKEN` — Gitea token for the GPT reviewer account
-   - `LLM_BASE_URL` — Your LLM API endpoint
-   - `LLM_API_KEY` — Your LLM API key
+### Token Scopes Required

-3. Copy `.gitea/workflows/ci.yml` to your repo (or adapt it).
+| Scope | Purpose |
+|-------|---------|
+| `write:issue` | Post and delete reviews |
+| `write:repository` | Read PR diffs, file content, commit statuses |

-4. On every PR, the bot will:
-   - Run tests and vet
-   - Build review-bot
-   - Post reviews from each configured LLM reviewer
+No `read:user` scope needed — the bot identifies itself from the review response.

 ## Development

 ```bash
-# Run tests
-go test ./...
-
-# Run vet
-go vet ./...
-
-# Build
+go test ./...        # Unit tests
+go vet ./...         # Static analysis
 go build -o review-bot ./cmd/review-bot

-# Integration tests (requires env vars)
+# Integration tests (requires env vars set)
 go test -tags=integration ./...
 ```

 ## Architecture

 ```
-cmd/review-bot/     CLI entrypoint
-gitea/              Gitea API client
-llm/                OpenAI-compatible LLM client
+cmd/review-bot/     CLI entrypoint + orchestration
+gitea/              Gitea API client (reviews, PRs, files)
+llm/                Multi-provider LLM client (OpenAI + Anthropic)
 review/             Prompt building, response parsing, formatting
+budget/             Token estimation + context trimming
 ```

 ## License