Rodin cfcad67baa feat: add generic review prompts and generation guide

- review-prompts/generic/sonnet.md: language-agnostic structural review
- review-prompts/generic/gpt5.md: language-agnostic semantic/domain review
- review-prompts/generic/opus.md: language-agnostic design coherence review
- review-prompts/GENERATE.md: meta-prompt for tailoring to any repo
- review-prompts/ORCHESTRATION.md: multi-model review orchestration pattern

2026-05-06 08:00:59 -07:00

4.1 KiB


Multi-Model Review Orchestration

When Rodin is asked to review a PR (e.g., "review PR 630", "look at PR #625"), use this orchestration pattern instead of a single-pass review.

Source of Truth

Specialized prompt files live at: ~/.openclaw/workspace/review-prompts/

sonnet.md — structural/pattern review (Sonnet's mandate)
gpt5.md — semantic/domain/concurrency review (GPT-5's mandate)
opus.md — design coherence/contradiction review (Opus's mandate)

CI uses these same files (review-bot loads them via system-prompt-file), so editing a prompt file once improves both the on-demand and CI paths.

Decision: How Many Models?

PR touches...	Models to run
Tests only, config, deps	Sonnet only (structural)
Application code (non-core)	Sonnet + GPT-5
Core domain (order_management, ledger, risk, decision_engine)	Sonnet + GPT-5 + Opus
Architecture docs or design docs	GPT-5 + Opus (skip Sonnet)
Kill switch, reconciliation, or financial calculations	ALL THREE + narrow deep-pass
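
The decision table can be sketched as a lookup. This is a minimal illustration, not the actual routing code: the `Touch` categories and `plan_review` are hypothetical names, and taking the union of model sets when a PR touches several categories is an assumption the table does not state.

```python
from enum import Enum


class Touch(Enum):
    """Hypothetical labels for the rows of the decision table."""
    LOW_RISK = "tests only, config, deps"
    APP_CODE = "application code (non-core)"
    CORE_DOMAIN = "core domain"
    DESIGN_DOCS = "architecture or design docs"
    CRITICAL = "kill switch, reconciliation, or financial calculations"


MODELS_FOR = {
    Touch.LOW_RISK: {"sonnet"},
    Touch.APP_CODE: {"sonnet", "gpt5"},
    Touch.CORE_DOMAIN: {"sonnet", "gpt5", "opus"},
    Touch.DESIGN_DOCS: {"gpt5", "opus"},
    Touch.CRITICAL: {"sonnet", "gpt5", "opus"},
}


def plan_review(touches):
    """Return (models, deep_pass) for the categories a PR touches.

    Assumption: multiple categories combine by union of their model sets.
    """
    models = set()
    for touch in touches:
        models |= MODELS_FOR[touch]
    return sorted(models), Touch.CRITICAL in touches
```

Classifying the touched files into these categories is left to the caller, since it depends on the repo layout.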

Orchestration Steps

1. Gather Context (do this yourself, don't delegate)

Fetch PR metadata, diff, existing reviews (same as Phase 0-1 of the pr-review skill)
Identify which files are touched; this determines which models to spawn
Fetch linked issue/AC if present

2. Spawn Specialized Sub-Agents

Spawn sub-agents in parallel. Each gets:

The full diff
The relevant prompt file content (read from review-prompts/)
Conventions file (CLAUDE.md)
Patterns (from elixir-patterns/phoenix-conventions repos if applicable)
Instruction: "Output structured findings as JSON. Do not post to Gitea."

# Spawn only the models selected by the decision table.
sessions_spawn(model="sonnet", task="<sonnet prompt + diff + context>")
sessions_spawn(model="gpt5", task="<gpt5 prompt + diff + context>")
sessions_spawn(model="opus", task="<opus prompt + diff + context>")
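
As a rough sketch of the fan-out, the task for each sub-agent can be assembled from the pieces listed above and the spawns issued concurrently. Here `spawn` is a stand-in callable for `sessions_spawn(model=..., task=...)`. `build_task` and `run_reviews` are hypothetical helpers, and the sketch assumes the spawn call blocks until the sub-agent returns its findings.

```python
from concurrent.futures import ThreadPoolExecutor

INSTRUCTION = "Output structured findings as JSON. Do not post to Gitea."


def build_task(prompt, diff, conventions, patterns=""):
    """Join the prompt file, conventions, optional patterns, diff, and instruction."""
    parts = [prompt, conventions, patterns, diff, INSTRUCTION]
    return "\n\n".join(p for p in parts if p)


def run_reviews(models, prompts, diff, conventions, spawn):
    """Spawn one sub-agent per model in parallel; return results keyed by model.

    `spawn` stands in for sessions_spawn and is assumed to block until done.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {
            m: pool.submit(spawn, model=m,
                           task=build_task(prompts[m], diff, conventions))
            for m in models
        }
        return {m: f.result() for m, f in futures.items()}
```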

3. Synthesize Results

After all sub-agents complete:

Deduplicate — if Sonnet and GPT-5 found the same issue, keep GPT-5's version (deeper explanation) and note "(also caught by Sonnet)"
Rank by severity — BLOCKER > MAJOR > MINOR > NIT
Group by category:
- 🏗️ Structural (from Sonnet)
- 🧠 Semantic/Domain (from GPT-5)
- ⚖️ Design Coherence (from Opus)
Call out unique contributions — "Only GPT-5 caught: ..." / "Only Opus caught: ..."
Actionable fix list: Real bugs → must fix. Theoretical → discuss. Style → fix if cheap.
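
The deduplicate, rank, and group steps might look like the following sketch. The finding schema (`model`, `severity`, `file`, `line`, `title`) is an assumption, since this document does not define the JSON shape. Treating two findings as "the same issue" when they share a file and line is also a simplification.

```python
SEVERITY_ORDER = ["BLOCKER", "MAJOR", "MINOR", "NIT"]
CATEGORY = {"sonnet": "Structural", "gpt5": "Semantic/Domain", "opus": "Design Coherence"}
DISPLAY = {"sonnet": "Sonnet", "gpt5": "GPT-5", "opus": "Opus"}


def synthesize(findings):
    """Deduplicate, rank by severity, and group findings by category."""
    # Assumption: findings on the same file and line describe the same issue.
    by_key = {}
    for f in findings:
        by_key.setdefault((f["file"], f["line"]), []).append(f)

    merged = []
    for group in by_key.values():
        # Prefer GPT-5's version of a duplicate; otherwise keep the first seen.
        keep = next((f for f in group if f["model"] == "gpt5"), group[0])
        others = [DISPLAY[f["model"]] for f in group if f is not keep]
        note = f" (also caught by {', '.join(others)})" if others else ""
        merged.append({**keep, "title": keep["title"] + note, "unique": not others})

    merged.sort(key=lambda f: SEVERITY_ORDER.index(f["severity"]))

    grouped = {}
    for f in merged:
        grouped.setdefault(CATEGORY[f["model"]], []).append(f)
    return grouped
```

The `unique` flag marks findings only one model reported, which is what the "Only GPT-5 caught: ..." callouts need.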

4. Present to Aaron

Format as a unified report with clear sections. Include:

Overall verdict (APPROVE / REQUEST_CHANGES)
Per-model findings (deduplicated, categorized)
Recommended actions
Any unresolved existing feedback from other reviewers

5. Post (if requested)

If Aaron says "post it" or "looks good, post":

Use the pr-review skill's Phase 6 posting mechanics
Post as a single unified review (not three separate ones)
Use the rodin Gitea token for posting

Narrow Deep-Pass (for financial/safety PRs)

After the main review, if the PR touches financial or safety-critical logic (kill switch, reconciliation, or financial calculations):

Extract ONLY the changed financial logic (strip test code, config, docs)
Ask GPT-5 a single focused question:
- "Can this code produce a silently incorrect financial calculation? Show the specific input that produces a wrong number."
If findings emerge, add them to the report under a "🎯 Deep Analysis" section
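
A minimal sketch of preparing the deep-pass task, assuming the diff is available as a mapping from file path to per-file diff text. The `EXCLUDE` patterns and `deep_pass_task` are hypothetical and would need adjusting to the real repo layout. The question text is the one quoted above.

```python
from fnmatch import fnmatch

# Hypothetical patterns for test code, config, and docs to strip.
EXCLUDE = ("test/*", "*_test.exs", "config/*", "docs/*", "*.md")

QUESTION = (
    "Can this code produce a silently incorrect financial calculation? "
    "Show the specific input that produces a wrong number."
)


def deep_pass_task(file_diffs):
    """Build the focused GPT-5 task, or return None if nothing remains.

    file_diffs maps a file path to the diff text for that file.
    """
    kept = {
        path: diff
        for path, diff in file_diffs.items()
        if not any(fnmatch(path, pattern) for pattern in EXCLUDE)
    }
    if not kept:
        return None
    body = "\n".join(f"--- {path}\n{diff}" for path, diff in kept.items())
    return f"{QUESTION}\n\n{body}"
```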

Timing Expectations

Configuration	Expected time
Sonnet only	~30s
Sonnet + GPT-5	~60s (parallel)
All three	~90s (parallel, Opus may be faster)
+ Deep pass	+45s (sequential after main review)
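
The table is consistent with a simple model: parallel reviews take as long as the slowest model, and the deep pass adds its time on top. The per-model figures below are inferred from the table rows rather than measured, so treat them as assumptions.

```python
# Assumed per-model latencies in seconds, inferred from the table above.
LATENCY = {"sonnet": 30, "gpt5": 60, "opus": 90}
DEEP_PASS = 45  # runs sequentially after the main review


def expected_seconds(models, deep_pass=False):
    """Estimate wall-clock time: slowest parallel model plus optional deep pass."""
    parallel = max((LATENCY[m] for m in models), default=0)
    return parallel + (DEEP_PASS if deep_pass else 0)
```

Under these assumptions a full three-model review with a deep pass comes to roughly 135s.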

What This Replaces

This replaces the old single-pass pr-review for on-demand reviews. The pr-review skill is still used for its Phase 0 (PR identification), Phase 1 (context gathering), Phase 4 (existing feedback), Phase 6 (posting mechanics), and Phase 7 (walk-through). The REVIEW itself (Phase 3) is now multi-model.

The CI twins (review-bot) continue running independently — they're the automated safety net. On-demand reviews are the deep-dive when Aaron wants human-quality analysis.
