model-research/review-prompts/GENERATE.md
Rodin cfcad67baa feat: add generic review prompts and generation guide
- review-prompts/generic/sonnet.md: language-agnostic structural review
- review-prompts/generic/gpt5.md: language-agnostic semantic/domain review
- review-prompts/generic/opus.md: language-agnostic design coherence review
- review-prompts/GENERATE.md: meta-prompt for tailoring to any repo
- review-prompts/ORCHESTRATION.md: multi-model review orchestration pattern
2026-05-06 08:00:59 -07:00

Generating Specialized Review Prompts for a Repository

Use this prompt to generate tailored review prompt files for any repository. Feed it to an AI model along with the repo's conventions file (CLAUDE.md, CONTRIBUTING.md, etc.) and a sample of the codebase.


The Prompt

I need you to generate three specialized code review prompt files for the following repository. Each prompt assigns a specific ROLE to one AI model, so that when all three run in parallel on the same PR, they produce complementary (non-overlapping) findings.

The three roles are:
1. **Structural/Pattern Reviewer** (for Claude Sonnet) — form over meaning
2. **Semantic/Domain Reviewer** (for GPT-5) — meaning over form
3. **Design Coherence Reviewer** (for Claude Opus) — system-level tensions

## Repository Context

- **Language/Framework:** [e.g., Go 1.22, Python/FastAPI, TypeScript/React, Elixir/Phoenix]
- **Domain:** [e.g., payment processing, e-commerce, trading system, infrastructure tooling]
- **Key patterns:** [e.g., hexagonal architecture, CQRS/ES, microservices with gRPC, monolith with modules]
- **Conventions file content:**
  [paste CLAUDE.md / CONTRIBUTING.md / .editorconfig / relevant docs here]
- **Critical invariants:** [e.g., "financial calculations must never silently produce wrong numbers", "all API responses must include correlation IDs", "mutations must be idempotent"]
- **Safety mechanisms:** [e.g., "circuit breakers on all external calls", "rate limiting on user-facing endpoints", "kill switch for trading"]

## Instructions for Each Prompt File

For each role, generate a markdown file with:

1. **A one-line role statement** — what this reviewer does (form, meaning, or design)
2. **"Your Domain" section** — 5 specific focus areas tailored to THIS repo's language, framework, and domain. Be concrete (e.g., "correct use of context.Context propagation" not "correct patterns"). Reference the repo's actual conventions.
3. **"NOT Your Domain" section** — explicitly exclude what the other two reviewers handle. This prevents overlap.
4. **"Output Rules" section** — severity definitions specific to this repo's risk profile. What counts as MAJOR in a payment system differs from what counts as MAJOR in a blog.
5. **Optional "Context" section** — domain-specific priorities (e.g., "silent data corruption > crashes" for financial systems, "user data exposure > downtime" for auth systems)
6. **Optional "When to Engage" section** (Opus only) — path-based trigger guidance for when the design reviewer adds value vs when it should just approve.

## Quality Criteria for Generated Prompts

- Each prompt must be SPECIFIC to this repo — no generic advice that applies everywhere
- The three prompts must be COMPLEMENTARY — reading all three, every reasonable finding type is covered exactly once
- The "NOT Your Domain" sections must form a clean partition — nothing falls through the cracks
- Severity definitions must reflect the repo's actual risk profile (a NIT in a blog engine might be a MAJOR in a payment system)
- Focus areas must reference actual frameworks/libraries/patterns the repo uses (not hypothetical ones)
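Before sending the prompt above to a model, it can help to confirm that every bracketed placeholder in the Repository Context section has been replaced with real values. The following is a minimal sketch of such a pre-flight check; the function name `find_unfilled_placeholders` is hypothetical, and the two patterns only cover the `[e.g., ...]` and `[paste ...]` brackets used in this template.

```shell
# Hypothetical pre-flight check: list template placeholders that were never
# filled in. The patterns mirror the "[e.g., ...]" and "[paste ...]" brackets
# in the Repository Context section; other placeholder styles are not detected.
find_unfilled_placeholders() {
  local file="$1"
  # -n prints line numbers, -E enables the alternation.
  # grep exits 0 when a placeholder is found and 1 when the file is clean.
  grep -nE '\[(e\.g\.,|paste )' "$file"
}
```

A possible use is `if find_unfilled_placeholders prompt.md; then echo "fill these in first"; fi`, where `prompt.md` is an assumed name for your filled-in copy of the prompt.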

Example Usage

To generate prompts for a new repo, run something like:

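# Gather context (this example samples Elixir sources under lib/; adjust the path and glob for your repo)
cat CLAUDE.md CONTRIBUTING.md > /tmp/repo-context.md
find lib/ -name "*.ex" | head -n 20 | xargs head -n 30 >> /tmp/repo-context.md  # first 30 lines of up to 20 files
tree -L 2 >> /tmp/repo-context.md  # directory structure, two levels deep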

# Feed to a model with the prompt above

Then review the output, test on a real PR (dry-run mode), and iterate.
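One way to carry out the "feed to a model" step is to join the prompt and the gathered context into a single file first. The sketch below assumes the prompt text has been saved to a file of its own; `build_model_input`, the file names, and the separator line are illustrative and not part of the original guide.

```shell
# Hypothetical helper: join the generation prompt and the gathered repo context
# into one text stream. The separator wording is an assumption, not a required format.
build_model_input() {
  local prompt_file="$1" context_file="$2" f

  # Fail early if either input is missing or empty.
  for f in "$prompt_file" "$context_file"; do
    if [ ! -s "$f" ]; then
      echo "error: $f is missing or empty" >&2
      return 1
    fi
  done

  cat "$prompt_file"
  printf '\n\n---\nRepository context (conventions, code sample, tree):\n\n'
  cat "$context_file"
}
```

For example, `build_model_input prompt.md /tmp/repo-context.md > /tmp/model-input.md` would produce one file to paste or pipe into whichever model interface you use.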


What Makes Good Prompts

Based on 29 model-research experiments:

  1. Specificity beats generality. "Check for correct context.Context propagation in gRPC handlers" catches more than "check for correct patterns."
  2. Explicit exclusions prevent overlap. Without "NOT Your Domain," models default to broad review and duplicate each other's work.
  3. Domain-calibrated severity prevents noise. A missing error check in a CLI tool is a NIT. The same missing check in a payment handler is a MAJOR.
  4. Models follow instructions. If you tell Sonnet not to look for race conditions, it won't. The specialization actually works (Finding #26 from our research: prompt framing dominates model personality).
  5. Short is better. Keep each prompt under 3 KB. Models need clear boundaries, not verbose instructions.
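Some of these guidelines can be checked mechanically once the prompt files have been generated. The following is a rough sketch, not part of the original tooling: `check_prompt` is a hypothetical name, the 3072-byte limit is one reading of the 3 KB guideline, and the check assumes the generated files use markdown headings such as `## Your Domain`.

```shell
# Hypothetical helper: check a generated review prompt against the guidelines above.
# Assumptions: sections appear as markdown headings ("## Your Domain"), and
# "3KB" is read as 3072 bytes.
check_prompt() {
  local file="$1" status=0 size section

  # Size in bytes; reading from stdin keeps the filename out of wc's output.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -ge 3072 ]; then
    echo "FAIL: $file is $size bytes (limit 3072)"
    status=1
  fi

  # Each required section must appear as its own heading line. Anchoring the
  # match keeps "NOT Your Domain" from satisfying the "Your Domain" check.
  for section in "Your Domain" "NOT Your Domain" "Output Rules"; do
    if ! grep -qE "^#+ +${section} *\$" "$file"; then
      echo "FAIL: $file is missing section: $section"
      status=1
    fi
  done

  [ "$status" -eq 0 ] && echo "OK: $file"
  return "$status"
}
```

Running it over all three files, for instance `for f in sonnet.md gpt5.md opus.md; do check_prompt "$f"; done`, gives a quick pass or fail per prompt. It does not test specificity or overlap, which still need a human read.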