model-research/review-prompts/GENERATE.md

# Generating Specialized Review Prompts for a Repository

Use this prompt to generate tailored review prompt files for any repository. Feed it to an AI model along with the repo's conventions file (CLAUDE.md, CONTRIBUTING.md, etc.) and a sample of the codebase.

---

## The Prompt

```
I need you to generate three specialized code review prompt files for the following repository. Each prompt assigns a specific ROLE to one AI model, so that when all three run in parallel on the same PR, they produce complementary (non-overlapping) findings.

The three roles are:
1. **Structural/Pattern Reviewer** (for Claude Sonnet) — form over meaning
2. **Semantic/Domain Reviewer** (for GPT-5) — meaning over form
3. **Design Coherence Reviewer** (for Claude Opus) — system-level tensions

## Repository Context

- **Language/Framework:** [e.g., Go 1.22, Python/FastAPI, TypeScript/React, Elixir/Phoenix]
- **Domain:** [e.g., payment processing, e-commerce, trading system, infrastructure tooling]
- **Key patterns:** [e.g., hexagonal architecture, CQRS/ES, microservices with gRPC, monolith with modules]
- **Conventions file content:**
  [paste CLAUDE.md / CONTRIBUTING.md / .editorconfig / relevant docs here]
- **Critical invariants:** [e.g., "financial calculations must never silently produce wrong numbers", "all API responses must include correlation IDs", "mutations must be idempotent"]
- **Safety mechanisms:** [e.g., "circuit breakers on all external calls", "rate limiting on user-facing endpoints", "kill switch for trading"]

## Instructions for Each Prompt File

For each role, generate a markdown file with:

1. **A one-line role statement** — what this reviewer does (form, meaning, or design)
2. **"Your Domain" section** — 5 specific focus areas tailored to THIS repo's language, framework, and domain. Be concrete (e.g., "correct use of context.Context propagation" not "correct patterns"). Reference the repo's actual conventions.
3. **"NOT Your Domain" section** — explicitly exclude what the other two reviewers handle. This prevents overlap.
4. **"Output Rules" section** — severity definitions specific to this repo's risk profile. What counts as MAJOR in a payment system differs from what counts as MAJOR in a blog.
5. **Optional "Context" section** — domain-specific priorities (e.g., "silent data corruption > crashes" for financial systems, "user data exposure > downtime" for auth systems)
6. **Optional "When to Engage" section** (Opus only) — path-based trigger guidance for when the design reviewer adds value vs when it should just approve.

## Quality Criteria for Generated Prompts

- Each prompt must be SPECIFIC to this repo — no generic advice that applies everywhere
- The three prompts must be COMPLEMENTARY — reading all three, every reasonable finding type is covered exactly once
- The "NOT Your Domain" sections must leave no gaps: anything one reviewer excludes must be owned by exactly one of the other two
- Severity definitions must reflect the repo's actual risk profile (a NIT in a blog engine might be a MAJOR in a payment system)
- Focus areas must reference actual frameworks/libraries/patterns the repo uses (not hypothetical ones)
```
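Once the model has returned the three files, a quick structural check can confirm that each one contains the required sections before you read it closely. The sketch below is illustrative only: `check_required_sections` is a hypothetical helper, and it assumes the model renders the section names from the instructions verbatim as markdown headings (for example `## Your Domain`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify a generated prompt file has the three required sections.
# Assumption: sections appear as markdown headings with the exact names used in the
# instructions. The optional sections ("Context", "When to Engage") are not checked.

check_required_sections() {
  local file="$1" missing=0 section
  for section in "Your Domain" "NOT Your Domain" "Output Rules"; do
    # Anchor on the heading marker so "NOT Your Domain" does not satisfy "Your Domain".
    if ! grep -qE "^#+[[:space:]]+${section}" "$file"; then
      echo "missing section: $section"
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is to call `check_required_sections` on each of the three generated files and regenerate any file that reports a missing section.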

---

## Example Usage

To generate prompts for a new repo, run something like:

```bash
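# Gather context (drop whichever conventions file your repo lacks)
cat CLAUDE.md CONTRIBUTING.md > /tmp/repo-context.md
# Sample code: first 30 lines of up to 20 source files (Elixir here; adjust path and extension)
find lib/ -name "*.ex" | head -n 20 | xargs head -n 30 >> /tmp/repo-context.md
# Directory structure, two levels deep (requires the `tree` utility)
tree -L 2 >> /tmp/repo-context.md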

# Feed to a model with the prompt above
```

Then review the generated prompts, test them on a real PR (in dry-run mode), and iterate.
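The "feed to a model" step is left open on purpose, since it depends on your tooling. One way to prepare the input is sketched below, assuming the prompt lives in the first fenced block of this file and the gathered context sits in `/tmp/repo-context.md`. The helper names `extract_first_fence` and `build_model_input`, and the "Gathered Repository Context" heading, are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for assembling one model input from this file plus repo context.

# Print the body of the first fenced block in a markdown file.
# "\140" is the octal escape for a backtick, so `fence` holds the three-backtick marker.
extract_first_fence() {
  awk '
    BEGIN { fence = "\140\140\140" }
    index($0, fence) == 1 { if (inside) exit; inside = 1; next }
    inside { print }
  ' "$1"
}

# Emit the prompt followed by the gathered context under its own heading.
build_model_input() {
  local prompt_file="$1" context_file="$2"
  extract_first_fence "$prompt_file"
  printf '\n## Gathered Repository Context\n\n'
  cat "$context_file"
}
```

For example, `build_model_input GENERATE.md /tmp/repo-context.md > /tmp/model-input.md` would produce a single file to pass to whichever model CLI or API client you use. The bracketed placeholders under "Repository Context" still need to be filled in by hand.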

---

## What Makes Good Prompts

Based on 29 model-research experiments:

1. **Specificity beats generality.** "Check for correct `context.Context` propagation in gRPC handlers" catches more than "check for correct patterns."
2. **Explicit exclusions prevent overlap.** Without "NOT Your Domain," models default to broad review and duplicate each other's work.
3. **Domain-calibrated severity prevents noise.** A missing error check in a CLI tool is a NIT. The same missing check in a payment handler is a MAJOR.
4. **Models follow instructions.** If you tell Sonnet not to look for race conditions, it won't. The specialization actually works (Finding #26 from our research: prompt framing dominates model personality).
5. **Short is better.** Each prompt should be under 3 KB. Models need clear boundaries, not verbose instructions.
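The size guideline in point 5 is easy to enforce mechanically. The following is a minimal sketch, assuming "3 KB" means 3 × 1024 = 3072 bytes; `check_prompt_size` and `MAX_PROMPT_BYTES` are hypothetical names, not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag generated prompt files that exceed the size budget.
# Assumption: the 3 KB guideline is read as 3 * 1024 = 3072 bytes.

MAX_PROMPT_BYTES=3072

check_prompt_size() {
  local file="$1" size
  # Reading from stdin makes wc print only the count; tr strips padding some wc builds add.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$size" -gt "$MAX_PROMPT_BYTES" ]; then
    echo "TOO LONG: $file ($size bytes)"
    return 1
  fi
  echo "ok: $file ($size bytes)"
}
```

Running it in a loop over the generated files (for instance `for f in *.md; do check_prompt_size "$f"; done`) shows which prompts need trimming before the dry run.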