# review-bot

AI-powered code review bot for Gitea pull requests. It fetches the PR diff and surrounding context, sends them to an LLM, and posts a structured review (APPROVE / REQUEST_CHANGES) back to the PR.

## Features

- **Multi-provider**: OpenAI-compatible and Anthropic Messages API
- **Context-aware**: Fetches full file content, conventions, language patterns, CI status
- **Smart budget**: Automatically trims context to fit model token limits
- **Idempotent reviews**: Posts the new review, then deletes stale ones, leaving one review per reviewer name on each PR
- **Custom prompts**: Load additional instructions from a file (e.g. security-focused review)
- **Zero dependencies**: Go stdlib only

## Quick Start: Composite Action

The easiest way to use review-bot in your Gitea CI:

```yaml
# .gitea/workflows/review.yml
name: Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
          reviewer-name: code-review
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: gpt-4.1
```

That's it. Every PR gets an automated review.

## Examples

### Single reviewer with conventions

```yaml
jobs:
  review:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
          reviewer-name: reviewer
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: gpt-4.1
          conventions-file: CONVENTIONS.md
          timeout: '600'
```

### Two reviewers with different models (diversity of opinion)

```yaml
jobs:
  review:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        include:
          - name: gpt
            model: gpt-4.1
            token_secret: GPT_REVIEW_TOKEN
            provider: openai
          - name: claude
            model: claude-sonnet-4-20250514
            token_secret: CLAUDE_REVIEW_TOKEN
            provider: anthropic
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets[matrix.token_secret] }}
          reviewer-name: ${{ matrix.name }}
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: ${{ matrix.model }}
          llm-provider: ${{ matrix.provider }}
          conventions-file: CONVENTIONS.md
```

Each reviewer posts independently and only cleans up its own stale reviews.

### Multiple review types from a single bot account

Use the same Gitea token but different `reviewer-name` values to run specialized reviews without needing multiple bot accounts:

```yaml
jobs:
  review:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        include:
          - name: code-quality
            model: gpt-4.1
          - name: security
            model: gpt-4.1
            system_prompt_file: .review/SECURITY.md
          - name: performance
            model: gpt-4.1
            system_prompt_file: .review/PERFORMANCE.md
    steps:
      - uses: actions/checkout@v4
      - uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
        with:
          reviewer-token: ${{ secrets.REVIEW_TOKEN }}
          reviewer-name: ${{ matrix.name }}
          llm-base-url: ${{ secrets.LLM_BASE_URL }}
          llm-api-key: ${{ secrets.LLM_API_KEY }}
          llm-model: ${{ matrix.model }}
          system-prompt-file: ${{ matrix.system_prompt_file }}
```

The sentinel `<!-- review-bot:security -->` ensures the security review only replaces previous security reviews, never the code-quality or performance reviews.

### With language patterns from another repo

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: reviewer
    llm-base-url: ${{ secrets.LLM_BASE_URL }}
    llm-api-key: ${{ secrets.LLM_API_KEY }}
    llm-model: gpt-4.1
    conventions-file: CLAUDE.md
    patterns-repo: rodin/go-patterns,rodin/kubernetes-conventions
    patterns-files: "README.md,patterns/"
```

Pattern repos are fetched at review time. The reviewer uses them as criteria for idiomatic code.
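
As a rough illustration of how the two comma-separated inputs could expand into fetch targets, the sketch below pairs every repo with every path. `splitList` and `targets` are hypothetical helpers, and fetching every listed path from every listed repo is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// splitList splits a comma-separated input and drops empty entries.
func splitList(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// targets pairs each repo with each path as "owner/name:path".
func targets(repos, files string) []string {
	var out []string
	for _, r := range splitList(repos) {
		for _, f := range splitList(files) {
			out = append(out, r+":"+f)
		}
	}
	return out
}

func main() {
	for _, t := range targets("rodin/go-patterns,rodin/kubernetes-conventions", "README.md,patterns/") {
		fmt.Println(t)
	}
}
```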

### Dry run (test without posting)

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: test
    llm-base-url: ${{ secrets.LLM_BASE_URL }}
    llm-api-key: ${{ secrets.LLM_API_KEY }}
    llm-model: gpt-4.1
    dry-run: 'true'
```

Prints the review to CI logs without posting to the PR. Useful for testing prompt changes.

### Using Anthropic directly

```yaml
- uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
  with:
    reviewer-token: ${{ secrets.REVIEW_TOKEN }}
    reviewer-name: claude
    llm-base-url: https://api.anthropic.com
    llm-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    llm-model: claude-sonnet-4-20250514
    llm-provider: anthropic
```
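
The two providers differ mainly in endpoint path and authentication headers. The following sketch shows that difference for an unsent request; `newLLMRequest` is a hypothetical helper, and appending `/chat/completions` or `/v1/messages` to the configured base URL is an assumption about how the base URLs in these examples are used, not a description of the bot's `llm/` package.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newLLMRequest builds an unsent POST request for the given provider.
// An empty provider is treated as "openai" (assumed to mirror the default).
func newLLMRequest(provider, baseURL, apiKey, body string) (*http.Request, error) {
	var path string
	headers := map[string]string{"Content-Type": "application/json"}
	switch provider {
	case "", "openai":
		path = "/chat/completions"
		headers["Authorization"] = "Bearer " + apiKey
	case "anthropic":
		path = "/v1/messages"
		headers["x-api-key"] = apiKey
		headers["anthropic-version"] = "2023-06-01"
	default:
		return nil, fmt.Errorf("unknown provider %q", provider)
	}
	url := strings.TrimRight(baseURL, "/") + path
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	return req, nil
}

func main() {
	a, _ := newLLMRequest("anthropic", "https://api.anthropic.com", "key", "{}")
	o, _ := newLLMRequest("openai", "https://api.openai.com/v1", "key", "{}")
	fmt.Println(a.URL)
	fmt.Println(o.URL)
}
```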

## Action Inputs

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| `reviewer-token` | Yes | — | Gitea token for posting reviews (needs `write:issue`, `write:repository`) |
| `reviewer-name` | No | `""` | Logical identity for this reviewer. Used as sentinel for idempotent cleanup. Set this when running multiple review bots on the same PR. |
| `llm-base-url` | Yes | — | LLM API base URL |
| `llm-api-key` | Yes | — | LLM API key |
| `llm-model` | Yes | — | Model name |
| `llm-provider` | No | `openai` | API provider: `openai` or `anthropic` |
| `conventions-file` | No | `""` | Path to coding conventions file in the repo |
| `patterns-repo` | No | `""` | Comma-separated repos with language patterns (e.g. `rodin/go-patterns`) |
| `patterns-files` | No | `README.md` | Files/directories to fetch from pattern repos |
| `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
| `temperature` | No | `0` | LLM temperature (0 = server default) |
| `timeout` | No | `300` | LLM request timeout in seconds |
| `dry-run` | No | `false` | Print review to stdout instead of posting |
| `update-existing` | No | `true` | Delete stale reviews from the same reviewer once the new review is posted. Accepts `true`/`1`/`yes` or `false`/`0`/`no` |
| `version` | No | `latest` | review-bot version to install |
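
Boolean inputs such as `update-existing` accept several spellings. A minimal sketch of that parsing is shown below; `parseBool` is a hypothetical name, and the case-insensitive matching and whitespace trimming are assumptions beyond what the table states.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBool accepts true/1/yes and false/0/no and rejects anything else.
func parseBool(s string) (bool, error) {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "true", "1", "yes":
		return true, nil
	case "false", "0", "no":
		return false, nil
	}
	return false, fmt.Errorf("invalid boolean %q (want true/1/yes or false/0/no)", s)
}

func main() {
	for _, in := range []string{"yes", "0", "maybe"} {
		v, err := parseBool(in)
		fmt.Println(in, v, err)
	}
}
```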

## Runner Requirements

The composite action requires these tools on the runner:

| Tool | Used For |
|------|----------|
| `python3` | JSON parsing during version detection |
| `sha256sum` | Checksum verification of downloaded binary |
| `curl` | Downloading releases and querying the API |

All three are pre-installed on `ubuntu-*` runners (e.g. `ubuntu-24.04`). If you use a custom runner image, ensure these are available.

## How Review Cleanup Works

When `reviewer-name` is set, the bot embeds a hidden sentinel in each review:

```html
<!-- review-bot:code-review -->
```

On the next run, it finds and deletes any review containing its own sentinel (except the one it just posted). This means:

- **One review per bot per PR** — no clutter from repeated pushes
- **Multiple bots coexist** — each only cleans up its own reviews
- **Same token, different roles** — a single bot account can post "code-review" and "security" reviews without conflict
- **No extra permissions** — identity comes from the sentinel, not the API

If `reviewer-name` is empty, cleanup is skipped and each run adds another review to the PR.
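
The cleanup rule described in this section can be sketched in a few lines of Go. The `review` struct and the `sentinel` and `staleIDs` functions are hypothetical names chosen for illustration; only the sentinel format is taken from this README.

```go
package main

import (
	"fmt"
	"strings"
)

// review is a hypothetical, trimmed-down view of a PR review.
type review struct {
	ID   int64
	Body string
}

// sentinel returns the hidden marker embedded for a reviewer name.
func sentinel(name string) string {
	return "<!-- review-bot:" + name + " -->"
}

// staleIDs returns the IDs of reviews that carry this reviewer's sentinel,
// excluding keepID (the review that was just posted).
func staleIDs(reviews []review, name string, keepID int64) []int64 {
	if name == "" {
		return nil // no reviewer name: cleanup is skipped
	}
	marker := sentinel(name)
	var ids []int64
	for _, r := range reviews {
		if r.ID != keepID && strings.Contains(r.Body, marker) {
			ids = append(ids, r.ID)
		}
	}
	return ids
}

func main() {
	reviews := []review{
		{1, "Old findings\n" + sentinel("security")},
		{2, "LGTM\n" + sentinel("code-quality")},
		{3, "New findings\n" + sentinel("security")},
	}
	fmt.Println(staleIDs(reviews, "security", 3)) // [1]
}
```

Because matching is done on the review body alone, the sketch needs no lookup of the bot's own user, which is consistent with the "no extra permissions" point above.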

### Shared Token: Worst-Wins Behavior

When multiple review types share the same Gitea bot account (e.g. code-quality and security), Gitea determines the user's approval state from their **most recent review**. This creates a race condition: if security finds issues (REQUEST_CHANGES) but code-quality finishes last (APPROVE), the PR appears approved.

review-bot handles this automatically with **worst-wins reconciliation**: before posting, each job checks whether any sibling review from the same user already has REQUEST_CHANGES. If so, and this job would otherwise post APPROVE, it posts REQUEST_CHANGES instead, which keeps the block in place. This ensures the PR stays blocked until all checks pass, regardless of execution order.
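
The worst-wins rule reduces to a small pure function. The sketch below is a minimal illustration, assuming sibling verdicts have already been collected as strings; `reconcile` and the constant names are hypothetical.

```go
package main

import "fmt"

const (
	approve        = "APPROVE"
	requestChanges = "REQUEST_CHANGES"
)

// reconcile returns the verdict to post, given this job's own verdict and
// the verdicts of sibling reviews from the same user.
func reconcile(own string, siblings []string) string {
	if own != approve {
		return own // a blocking verdict is never softened
	}
	for _, s := range siblings {
		if s == requestChanges {
			return requestChanges // keep the PR blocked
		}
	}
	return own
}

func main() {
	fmt.Println(reconcile(approve, []string{requestChanges})) // REQUEST_CHANGES
	fmt.Println(reconcile(approve, []string{approve}))        // APPROVE
	fmt.Println(reconcile(requestChanges, nil))               // REQUEST_CHANGES
}
```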

**If you need independent approval/block per review type**, use separate Gitea bot accounts with their own tokens.

## Custom Review Prompts

Use `system-prompt-file` to specialize the review focus. The file contents are appended to the base system prompt as "Additional Review Instructions."

Example `SECURITY_REVIEW.md`:

```markdown
You are performing a security-focused code review.

Focus areas:
- Injection attacks (SQL, command, path traversal, template)
- Authentication/Authorization (missing checks, privilege escalation)
- Secrets exposure (hardcoded credentials, tokens in logs)
- Input validation (unsanitized input, unsafe deserialization)
- Race conditions (TOCTOU, unsynchronized shared state)

Rules:
- Only report findings with security implications
- Ignore style, naming, and general code quality
- MAJOR = exploitable vulnerability, MINOR = hardening opportunity, NIT = theoretical risk
- If no security-relevant changes exist, APPROVE with empty findings
```
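
Appending the file to the base prompt might look like the sketch below. `appendInstructions` and `buildSystemPrompt` are hypothetical names, and the exact heading markup (`## Additional Review Instructions`) is an assumption; only the heading text comes from this README.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// appendInstructions adds extra instructions under a dedicated heading.
func appendInstructions(base, extra string) string {
	extra = strings.TrimSpace(extra)
	if extra == "" {
		return base
	}
	return base + "\n\n## Additional Review Instructions\n\n" + extra
}

// buildSystemPrompt reads the optional prompt file and appends its contents.
func buildSystemPrompt(base, path string) (string, error) {
	if path == "" {
		return base, nil // no system-prompt-file configured
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("read system prompt file: %w", err)
	}
	return appendInstructions(base, string(data)), nil
}

func main() {
	fmt.Println(appendInstructions("You are a code reviewer.", "Only report security findings.\n"))
}
```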

## CLI Usage

```bash
review-bot \
  --gitea-url https://gitea.example.com \
  --repo owner/name \
  --pr 42 \
  --reviewer-token "$GITEA_TOKEN" \
  --reviewer-name "code-review" \
  --llm-base-url https://api.openai.com/v1 \
  --llm-api-key "$OPENAI_API_KEY" \
  --llm-model gpt-4.1 \
  --conventions-file CONVENTIONS.md
```

## Environment Variables

All flags have environment variable equivalents:

| Flag | Env Var |
|------|---------|
| `--gitea-url` | `GITEA_URL` |
| `--repo` | `GITEA_REPO` |
| `--pr` | `PR_NUMBER` |
| `--reviewer-token` | `REVIEWER_TOKEN` |
| `--reviewer-name` | `REVIEWER_NAME` |
| `--llm-base-url` | `LLM_BASE_URL` |
| `--llm-api-key` | `LLM_API_KEY` |
| `--llm-model` | `LLM_MODEL` |
| `--llm-provider` | `LLM_PROVIDER` |
| `--conventions-file` | `CONVENTIONS_FILE` |
| `--patterns-repo` | `PATTERNS_REPO` |
| `--patterns-files` | `PATTERNS_FILES` |
| `--system-prompt-file` | `SYSTEM_PROMPT_FILE` |
| `--llm-temperature` | `LLM_TEMPERATURE` |
| `--llm-timeout` | `LLM_TIMEOUT` |
| `--update-existing` | `UPDATE_EXISTING` |

## Setup

1. **Create a Gitea bot account** (e.g. `review-bot`)
2. **Generate a token** with scopes: `write:issue`, `write:repository`
3. **Add secrets** to your Gitea repo (Settings → Actions → Secrets):
   - `REVIEW_TOKEN` — the bot's Gitea token
   - `LLM_BASE_URL` — your LLM endpoint
   - `LLM_API_KEY` — your LLM key
4. **Add the workflow** (see Quick Start above)

### Token Scopes Required

| Scope | Purpose |
|-------|---------|
| `write:issue` | Post and delete reviews |
| `write:repository` | Read PR diffs, file content, commit statuses |

No `read:user` scope needed — the bot identifies itself from the review response.

## Development

```bash
go test ./...        # Unit tests
go vet ./...         # Static analysis
go build -o review-bot ./cmd/review-bot

# Integration tests (requires env vars set)
go test -tags=integration ./...
```

## Architecture

```
cmd/review-bot/     CLI entrypoint + orchestration
gitea/              Gitea API client (reviews, PRs, files)
llm/                Multi-provider LLM client (OpenAI + Anthropic)
review/             Prompt building, response parsing, formatting
budget/             Token estimation + context trimming
```

## License

MIT