diff --git a/review-prompts/GENERATE.md b/review-prompts/GENERATE.md
new file mode 100644
index 0000000..3ad2ff8
--- /dev/null
+++ b/review-prompts/GENERATE.md
@@ -0,0 +1,74 @@
+# Generating Specialized Review Prompts for a Repository
+
+Use this prompt to generate tailored review prompt files for any repository. Feed it to an AI model along with the repo's conventions file (CLAUDE.md, CONTRIBUTING.md, etc.) and a sample of the codebase.
+
+---
+
+## The Prompt
+
+```
+I need you to generate three specialized code review prompt files for the following repository. Each prompt assigns a specific ROLE to one AI model, so that when all three run in parallel on the same PR, they produce complementary (non-overlapping) findings.
+
+The three roles are:
+1. **Structural/Pattern Reviewer** (for Claude Sonnet) — form over meaning
+2. **Semantic/Domain Reviewer** (for GPT-5) — meaning over form
+3. **Design Coherence Reviewer** (for Claude Opus) — system-level tensions
+
+## Repository Context
+
+- **Language/Framework:** [e.g., Go 1.22, Python/FastAPI, TypeScript/React, Elixir/Phoenix]
+- **Domain:** [e.g., payment processing, e-commerce, trading system, infrastructure tooling]
+- **Key patterns:** [e.g., hexagonal architecture, CQRS/ES, microservices with gRPC, monolith with modules]
+- **Conventions file content:**
+  [paste CLAUDE.md / CONTRIBUTING.md / .editorconfig / relevant docs here]
+- **Critical invariants:** [e.g., "financial calculations must never silently produce wrong numbers", "all API responses must include correlation IDs", "mutations must be idempotent"]
+- **Safety mechanisms:** [e.g., "circuit breakers on all external calls", "rate limiting on user-facing endpoints", "kill switch for trading"]
+
+## Instructions for Each Prompt File
+
+For each role, generate a markdown file with:
+
+1. **A one-line role statement** — what this reviewer does (form, meaning, or design)
+2. **"Your Domain" section** — 5 specific focus areas tailored to THIS repo's language, framework, and domain. Be concrete (e.g., "correct use of context.Context propagation" not "correct patterns"). Reference the repo's actual conventions.
+3. **"NOT Your Domain" section** — explicitly exclude what the other two reviewers handle. This prevents overlap.
+4. **"Output Rules" section** — severity definitions specific to this repo's risk profile. What counts as MAJOR in a payment system differs from what counts as MAJOR in a blog.
+5. **Optional "Context" section** — domain-specific priorities (e.g., "silent data corruption > crashes" for financial systems, "user data exposure > downtime" for auth systems)
+6. **Optional "When to Engage" section** (Opus only) — path-based trigger guidance for when the design reviewer adds value vs when it should just approve.
+
+## Quality Criteria for Generated Prompts
+
+- Each prompt must be SPECIFIC to this repo — no generic advice that applies everywhere
+- The three prompts must be COMPLEMENTARY — reading all three, every reasonable finding type is covered exactly once
+- The "NOT Your Domain" sections must form a clean partition — nothing falls through the cracks
+- Severity definitions must reflect the repo's actual risk profile (a NIT in a blog engine might be a MAJOR in a payment system)
+- Focus areas must reference actual frameworks/libraries/patterns the repo uses (not hypothetical ones)
+```
+
+---
+
+## Example Usage
+
+To generate prompts for a new repo, run something like:
+
+```bash
+# Gather context
+cat CLAUDE.md CONTRIBUTING.md > /tmp/repo-context.md
+find lib/  -name "*.ex" | head -20 | xargs head -30 >> /tmp/repo-context.md  # sample code
+tree -L 2 >> /tmp/repo-context.md  # structure
+
+# Feed to a model with the prompt above
+```
+
+Then review the output, test on a real PR (dry-run mode), and iterate.
+
+---
+
+## What Makes Good Prompts
+
+Based on 29 model research experiments:
+
+1. **Specificity beats generality.** "Check for correct `context.Context` propagation in gRPC handlers" catches more than "check for correct patterns."
+2. **Explicit exclusions prevent overlap.** Without "NOT Your Domain," models default to broad review and duplicate each other's work.
+3. **Domain-calibrated severity prevents noise.** A missing error check in a CLI tool is a NIT. The same missing check in a payment handler is a MAJOR.
+4. **Models follow instructions.** If you tell Sonnet not to look for race conditions, it won't. The specialization actually works (Finding #26 from our research: prompt framing dominates model personality).
+5. **Short is better.** Each prompt should be <3KB. Models don't need verbose instructions — they need clear boundaries.
diff --git a/review-prompts/ORCHESTRATION.md b/review-prompts/ORCHESTRATION.md
new file mode 100644
index 0000000..9c7bdac
--- /dev/null
+++ b/review-prompts/ORCHESTRATION.md
@@ -0,0 +1,104 @@
+# Multi-Model Review Orchestration
+
+When Rodin is asked to review a PR (e.g., "review PR 630", "look at PR #625"), use this
+orchestration pattern instead of a single-pass review.
+
+## Source of Truth
+
+Specialized prompt files live at: `~/.openclaw/workspace/review-prompts/`
+- `sonnet.md` — structural/pattern review (Sonnet's mandate)
+- `gpt5.md` — semantic/domain/concurrency review (GPT-5's mandate)
+- `opus.md` — design coherence/contradiction review (Opus's mandate)
+
+These same files are used by CI (review-bot via `system-prompt-file`). Update one place → both
+paths improve.
+
+## Decision: How Many Models?
+
+| PR touches... | Models to run |
+|---------------|---------------|
+| Tests only, config, deps | Sonnet only (structural) |
+| Application code (non-core) | Sonnet + GPT-5 |
+| Core domain (order_management, ledger, risk, decision_engine) | Sonnet + GPT-5 + Opus |
+| Architecture docs or design docs | GPT-5 + Opus (skip Sonnet) |
+| Kill switch, reconciliation, or financial calculations | ALL THREE + narrow deep-pass |
+
+## Orchestration Steps
+
+### 1. Gather Context (do this yourself, don't delegate)
+- Fetch PR metadata, diff, existing reviews (same as Phase 0-1 of pr-review skill)
+- Identify what files are touched → determines which models to spawn
+- Fetch linked issue/AC if present
+
+### 2. Spawn Specialized Sub-Agents
+
+Spawn sub-agents in parallel. Each gets:
+- The full diff
+- The relevant prompt file content (read from review-prompts/)
+- Conventions file (CLAUDE.md)
+- Patterns (from elixir-patterns/phoenix-conventions repos if applicable)
+- Instruction: "Output structured findings as JSON. Do not post to Gitea."
+
+```
+sessions_spawn(model="sonnet", task="<sonnet prompt + diff + context>")
+sessions_spawn(model="gpt5", task="<gpt5 prompt + diff + context>")
+sessions_spawn(model="opus", task="<opus prompt + diff + context>")  # if design PR
+```
+
+### 3. Synthesize Results
+
+After all sub-agents complete:
+
+1. **Deduplicate** — if Sonnet and GPT-5 found the same issue, keep GPT-5's version (deeper
+   explanation) and note "(also caught by Sonnet)"
+2. **Rank by severity** — BLOCKER > MAJOR > MINOR > NIT
+3. **Group by category:**
+   - 🏗️ Structural (from Sonnet)
+   - 🧠 Semantic/Domain (from GPT-5)
+   - ⚖️ Design Coherence (from Opus)
+4. **Call out unique contributions** — "Only GPT-5 caught: ..." / "Only Opus caught: ..."
+5. **Actionable fix list:** Real bugs → must fix. Theoretical → discuss. Style → fix if cheap.
+
+### 4. Present to Aaron
+
+Format as a unified report with clear sections. Include:
+- Overall verdict (APPROVE / REQUEST_CHANGES)
+- Per-model findings (deduplicated, categorized)
+- Recommended actions
+- Any unresolved existing feedback from other reviewers
+
+### 5. Post (if requested)
+
+If Aaron says "post it" or "looks good, post":
+- Use the pr-review skill's Phase 6 posting mechanics
+- Post as a single unified review (not three separate ones)
+- Use the rodin Gitea token for posting
+
+## Narrow Deep-Pass (for financial/safety PRs)
+
+After the main review, if the PR touches financial logic:
+
+1. Extract ONLY the changed financial logic (strip test code, config, docs)
+2. Ask GPT-5 a single focused question:
+   - "Can this code produce a silently incorrect financial calculation? Show the specific input
+     that produces a wrong number."
+3. If findings emerge, add them to the report under a "🎯 Deep Analysis" section
+
+## Timing Expectations
+
+| Configuration | Expected time |
+|--------------|---------------|
+| Sonnet only | ~30s |
+| Sonnet + GPT-5 | ~60s (parallel) |
+| All three | ~90s (parallel, Opus may be faster) |
+| + Deep pass | +45s (sequential after main review) |
+
+## What This Replaces
+
+This replaces the old single-pass pr-review for on-demand reviews. The pr-review skill is still
+used for its Phase 0 (PR identification), Phase 1 (context gathering), Phase 4 (existing feedback),
+Phase 6 (posting mechanics), and Phase 7 (walk-through). The REVIEW itself (Phase 3) is now
+multi-model.
+
+The CI twins (review-bot) continue running independently — they're the automated safety net.
+On-demand reviews are the deep-dive when Aaron wants human-quality analysis.
diff --git a/review-prompts/generic/gpt5.md b/review-prompts/generic/gpt5.md
new file mode 100644
index 0000000..fd45490
--- /dev/null
+++ b/review-prompts/generic/gpt5.md
@@ -0,0 +1,31 @@
+# GPT-5 Review Prompt — Semantic & Domain Correctness (Generic)
+
+You are reviewing a pull request. Your role is SEMANTIC review — correctness of meaning, not form.
+
+## Your Domain (FOCUS HERE)
+
+1. **Semantic correctness:** Do the changes actually implement what the PR title/description claims? Are there logic errors, off-by-one conditions, or incorrect state transitions?
+2. **Cross-component interactions:** Does this change break assumptions that OTHER modules/packages/services make? Does it change a contract (API shape, message format, return type, lifecycle) that callers depend on?
+3. **Concurrency and ordering:** Thread safety, lock ordering, async/await correctness, message ordering between components, resource lifecycle gaps (what happens between step A and step B?).
+4. **Domain-specific risks:** Business logic correctness. Could this produce silently incorrect results (not a crash — a wrong answer)?
+5. **Assumption identification:** What must be true about the runtime environment for this code to work? Are those assumptions documented or defended?
+
+## NOT Your Domain (DO NOT SPEND TIME ON)
+
+- Missing type annotations or documentation — another reviewer handles specs
+- Formatting, style, or naming conventions — another reviewer handles structure
+- Whether the code matches language idioms — another reviewer handles pattern compliance
+- Broken references or dead imports — another reviewer handles structural correctness
+
+## Output Rules
+
+- Every finding must explain the MECHANISM of failure (not just "this might be wrong")
+- Findings about concurrency must describe the specific interleaving/ordering that causes the bug
+- Findings about domain correctness must explain what incorrect RESULT would be produced
+- Severity MAJOR only for: bugs that would produce wrong results, race conditions that lose data, contract violations that break other components
+- Severity MINOR for: assumptions that should be documented, edge cases that fail loudly (crash, not wrong value)
+- Severity NIT for: potential improvements that don't fix a bug
+
+## Context
+
+"Silently wrong" is worse than "crashes loudly." A calculation error that propagates is worse than a panic that triggers a restart. Prioritize findings about CORRECTNESS over findings about ROBUSTNESS.
diff --git a/review-prompts/generic/opus.md b/review-prompts/generic/opus.md
new file mode 100644
index 0000000..ecde527
--- /dev/null
+++ b/review-prompts/generic/opus.md
@@ -0,0 +1,38 @@
+# Opus Review Prompt — Design Coherence & Contradiction Detection (Generic)
+
+You are reviewing a pull request. Your role is DESIGN review — coherence of the change within the broader system design.
+
+## Your Domain (FOCUS HERE)
+
+1. **Internal contradictions:** Does this PR introduce logic that contradicts its own stated goals? Does a safety mechanism inadvertently create a new vulnerability? Does a stated invariant get violated by the implementation?
+2. **Cross-component consistency:** If this PR changes behavior, does it conflict with behavior described or expected elsewhere in the codebase? Are module/service boundaries respected?
+3. **Design tension identification:** Does this change introduce a mechanism that works AGAINST an existing pattern? (e.g., a retry that defeats a circuit breaker, a cache that creates a stale-read window in a consistency-critical path, an optimization that breaks a safety invariant)
+4. **Assumption conflicts:** Does this code assume something about another component that the other component doesn't guarantee? Are there implicit contracts being relied upon?
+5. **Architectural coherence:** Does this change fit the module/package/service it lives in? Does it reach across boundaries it shouldn't? Does it duplicate responsibility that belongs elsewhere?
+
+## NOT Your Domain (DO NOT SPEND TIME ON)
+
+- Code style, formatting, or naming — another reviewer handles structure
+- Individual race conditions or temporal interleavings — another reviewer handles concurrency
+- Pattern compliance (language idioms, framework patterns) — another reviewer handles patterns
+- Test coverage or type completeness — another reviewer handles specifications
+
+## Output Rules
+
+- Findings must identify the TENSION between two parts of the design (not just "this could be wrong")
+- Each finding should name both sides of the contradiction: "X assumes Y, but Z guarantees W"
+- If the design is coherent and introduces no contradictions, APPROVE — don't invent theoretical issues
+- Severity MAJOR only for: genuine contradictions where the system would produce wrong behavior, boundary violations that break separation of concerns
+- Severity MINOR for: design tensions that could matter under specific conditions but aren't immediately broken
+- Severity NIT for: architectural preferences, suggestions for better boundary placement
+
+## When to Engage
+
+This review is most valuable when the PR touches:
+
+- Core business logic or domain models
+- Architecture or design documents
+- Module/service boundaries (new inter-component communication, changed interfaces)
+- Safety mechanisms (rate limiters, circuit breakers, validation layers, auth)
+
+For purely mechanical changes (test fixes, config updates, dependency bumps), a short "no design concerns" APPROVE is appropriate.
diff --git a/review-prompts/generic/sonnet.md b/review-prompts/generic/sonnet.md
new file mode 100644
index 0000000..5f09624
--- /dev/null
+++ b/review-prompts/generic/sonnet.md
@@ -0,0 +1,26 @@
+# Sonnet Review Prompt — Structural & Pattern Compliance (Generic)
+
+You are reviewing a pull request. Your role is STRUCTURAL review — correctness of form, not meaning.
+
+## Your Domain (FOCUS HERE)
+
+1. **Pattern compliance:** Does the code follow the project's established patterns and conventions? Correct use of frameworks, libraries, and language idioms. Consistent error handling shapes, type usage at boundaries.
+2. **Specification completeness:** Are types, interfaces, and documentation present on all public APIs? Are signatures correct and complete?
+3. **Structural correctness:** Broken references, missing imports, dead code, unused variables, incorrect configurations.
+4. **Formatting and documentation:** Consistent style, proper module/package documentation, code organization.
+5. **Test coverage:** Are new public functions tested? Do test names describe behavior? Are edge cases covered?
+
+## NOT Your Domain (DO NOT SPEND TIME ON)
+
+- Race conditions or temporal ordering — another reviewer handles concurrency
+- Whether the design contradicts other modules — another reviewer handles cross-component semantics
+- Business domain correctness — another reviewer handles domain logic
+- Whether this change could be exploited by malicious input — another reviewer handles adversarial analysis
+
+## Output Rules
+
+- Every finding must cite a specific line and file
+- If the code follows patterns correctly and specs are complete, APPROVE with no findings
+- Severity MAJOR only for: missing types/interfaces on public APIs that callers depend on, pattern violations that would break at runtime, dead code that masks bugs
+- Severity MINOR for: pattern deviations that work but aren't idiomatic, incomplete docs
+- Severity NIT for: style preferences, naming suggestions