Rodin cfcad67baa feat: add generic review prompts and generation guide
- review-prompts/generic/sonnet.md: language-agnostic structural review
- review-prompts/generic/gpt5.md: language-agnostic semantic/domain review
- review-prompts/generic/opus.md: language-agnostic design coherence review
- review-prompts/GENERATE.md: meta-prompt for tailoring to any repo
- review-prompts/ORCHESTRATION.md: multi-model review orchestration pattern
2026-05-06 08:00:59 -07:00

Multi-Model Review Orchestration

When Rodin is asked to review a PR (e.g., "review PR 630", "look at PR #625"), use this orchestration pattern instead of a single-pass review.

Source of Truth

Specialized prompt files live at: ~/.openclaw/workspace/review-prompts/

  • sonnet.md — structural/pattern review (Sonnet's mandate)
  • gpt5.md — semantic/domain/concurrency review (GPT-5's mandate)
  • opus.md — design coherence/contradiction review (Opus's mandate)

These same files are used by CI (review-bot via system-prompt-file). Update one place → both paths improve.

Decision: How Many Models?

| PR touches... | Models to run |
| --- | --- |
| Tests only, config, deps | Sonnet only (structural) |
| Application code (non-core) | Sonnet + GPT-5 |
| Core domain (order_management, ledger, risk, decision_engine) | Sonnet + GPT-5 + Opus |
| Architecture docs or design docs | GPT-5 + Opus (skip Sonnet) |
| Kill switch, reconciliation, or financial calculations | ALL THREE + narrow deep-pass |
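
The table can be read as a lookup from change category to model set. The sketch below assumes the PR's files have already been classified into hypothetical category labels (`tests`, `app`, `core`, and so on, none of which appear in the original), and that a PR touching several categories gets the union of their rows. The table itself does not say how mixed PRs are handled, so that union rule is an assumption.

```python
# Hypothetical category labels mapped to (models, needs_deep_pass).
DECISION_TABLE = {
    "tests": (("sonnet",), False),
    "config": (("sonnet",), False),
    "deps": (("sonnet",), False),
    "app": (("sonnet", "gpt5"), False),
    "core": (("sonnet", "gpt5", "opus"), False),
    "docs": (("gpt5", "opus"), False),
    "high_risk": (("sonnet", "gpt5", "opus"), True),
}

MODEL_ORDER = ("sonnet", "gpt5", "opus")


def select_models(categories):
    """Return (models, deep_pass) for the categories a PR touches.

    Assumption: mixed PRs take the union of every matching row.
    """
    selected = set()
    deep_pass = False
    for category in categories:
        models, needs_deep_pass = DECISION_TABLE[category]
        selected.update(models)
        deep_pass = deep_pass or needs_deep_pass
    return [m for m in MODEL_ORDER if m in selected], deep_pass
```

For example, `select_models({"docs"})` yields GPT-5 and Opus with no deep pass, while `select_models({"tests", "high_risk"})` yields all three models plus the deep pass.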

Orchestration Steps

1. Gather Context (do this yourself, don't delegate)

  • Fetch PR metadata, diff, existing reviews (same as Phase 0-1 of pr-review skill)
  • Identify what files are touched → determines which models to spawn
  • Fetch linked issue/AC if present

2. Spawn Specialized Sub-Agents

Spawn sub-agents in parallel. Each gets:

  • The full diff
  • The relevant prompt file content (read from review-prompts/)
  • Conventions file (CLAUDE.md)
  • Patterns (from elixir-patterns/phoenix-conventions repos if applicable)
  • Instruction: "Output structured findings as JSON. Do not post to Gitea."

sessions_spawn(model="sonnet", task="<sonnet prompt + diff + context>")
sessions_spawn(model="gpt5", task="<gpt5 prompt + diff + context>")
sessions_spawn(model="opus", task="<opus prompt + diff + context>")  # only when the decision table calls for Opus
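
A minimal sketch of how the per-model task text might be assembled and the spawns run in parallel. The `spawn` argument is a hypothetical stand-in for `sessions_spawn`, whose real signature and return value are not shown in this document. The section headings inside the task text and the `build_task` and `spawn_all` helpers are illustrative assumptions too.

```python
from concurrent.futures import ThreadPoolExecutor

INSTRUCTION = "Output structured findings as JSON. Do not post to Gitea."


def build_task(prompt, diff, conventions, patterns=""):
    """Concatenate prompt file content, diff, and context into one task string."""
    sections = [
        prompt,
        "Diff:\n" + diff,
        "Conventions (CLAUDE.md):\n" + conventions,
    ]
    if patterns:  # patterns are only included when applicable
        sections.append("Patterns:\n" + patterns)
    sections.append(INSTRUCTION)
    return "\n\n".join(sections)


def spawn_all(models, prompts, diff, conventions, spawn, patterns=""):
    """Run spawn(model, task) for every model concurrently.

    spawn is a caller-supplied callable standing in for sessions_spawn.
    Returns a dict of model name -> whatever spawn returned.
    """
    tasks = {m: build_task(prompts[m], diff, conventions, patterns) for m in models}
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {m: pool.submit(spawn, m, task) for m, task in tasks.items()}
        return {m: future.result() for m, future in futures.items()}
```

Passing the spawner in as a parameter keeps the sketch testable with a fake and avoids guessing at the real tool's interface.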

3. Synthesize Results

After all sub-agents complete:

  1. Deduplicate — if Sonnet and GPT-5 found the same issue, keep GPT-5's version (deeper explanation) and note "(also caught by Sonnet)"
  2. Rank by severity — BLOCKER > MAJOR > MINOR > NIT
  3. Group by category:
    • 🏗️ Structural (from Sonnet)
    • 🧠 Semantic/Domain (from GPT-5)
    • ⚖️ Design Coherence (from Opus)
  4. Call out unique contributions — "Only GPT-5 caught: ..." / "Only Opus caught: ..."
  5. Actionable fix list: Real bugs → must fix. Theoretical → discuss. Style → fix if cheap.
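
The deduplicate, rank, and group steps can be sketched as below. Several things here are assumptions rather than part of the original: two findings count as "the same issue" when they share a file and line, the `Finding` fields are a hypothetical schema for the sub-agents' JSON, and the preference order is only partly given (the text states GPT-5 wins over Sonnet; where Opus sits is a guess).

```python
from dataclasses import dataclass, field, replace

SEVERITY_RANK = {"BLOCKER": 0, "MAJOR": 1, "MINOR": 2, "NIT": 3}
# Earlier entries win when two models report the same issue (Opus position assumed).
KEEP_PREFERENCE = ["gpt5", "opus", "sonnet"]
CATEGORY = {"sonnet": "Structural", "gpt5": "Semantic/Domain", "opus": "Design Coherence"}


@dataclass
class Finding:  # hypothetical schema
    model: str
    severity: str
    file: str
    line: int
    message: str
    also_caught_by: list = field(default_factory=list)


def synthesize(findings):
    """Deduplicate by (file, line), then rank by severity and group by category."""
    merged = {}
    for finding in findings:
        key = (finding.file, finding.line)
        kept = merged.get(key)
        if kept is None:
            merged[key] = replace(finding, also_caught_by=[])
            continue
        if KEEP_PREFERENCE.index(finding.model) < KEEP_PREFERENCE.index(kept.model):
            winner, loser = finding, kept
        else:
            winner, loser = kept, finding
        others = [m for m in kept.also_caught_by + [loser.model] if m != winner.model]
        merged[key] = replace(winner, also_caught_by=list(dict.fromkeys(others)))

    grouped = {name: [] for name in CATEGORY.values()}
    for finding in sorted(merged.values(), key=lambda f: SEVERITY_RANK[f.severity]):
        grouped[CATEGORY[finding.model]].append(finding)
    return grouped
```

Findings whose `also_caught_by` list is empty are the unique contributions to call out per model.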

4. Present to Aaron

Format as a unified report with clear sections. Include:

  • Overall verdict (APPROVE / REQUEST_CHANGES)
  • Per-model findings (deduplicated, categorized)
  • Recommended actions
  • Any unresolved existing feedback from other reviewers
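
One possible shape for the unified report, as a sketch. The rule that any BLOCKER or MAJOR finding forces REQUEST_CHANGES is an assumed threshold, since the original does not define how the verdict is derived. The plain-dict finding shape and the `render_report` name are likewise illustrative.

```python
def render_report(sections, actions, unresolved, blocking=("BLOCKER", "MAJOR")):
    """Render synthesized findings as one plain-text report.

    sections: mapping of section title -> list of {"severity", "message"} dicts.
    blocking: severities that force REQUEST_CHANGES (assumed threshold).
    """
    findings = [f for items in sections.values() for f in items]
    has_blocking = any(f["severity"] in blocking for f in findings)
    verdict = "REQUEST_CHANGES" if has_blocking else "APPROVE"

    lines = [f"Overall verdict: {verdict}"]
    for title, items in sections.items():
        if not items:
            continue  # skip empty sections
        lines += ["", title]
        lines += [f"  [{f['severity']}] {f['message']}" for f in items]
    lines += ["", "Recommended actions"]
    lines += [f"  - {a}" for a in actions] or ["  (none)"]
    lines += ["", "Unresolved existing feedback"]
    lines += [f"  - {u}" for u in unresolved] or ["  (none)"]
    return "\n".join(lines)
```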

5. Post (if requested)

If Aaron says "post it" or "looks good, post":

  • Use the pr-review skill's Phase 6 posting mechanics
  • Post as a single unified review (not three separate ones)
  • Use the rodin Gitea token for posting

Narrow Deep-Pass (for financial/safety PRs)

After the main review, if the PR touches financial logic:

  1. Extract ONLY the changed financial logic (strip test code, config, docs)
  2. Ask GPT-5 a single focused question:
    • "Can this code produce a silently incorrect financial calculation? Show the specific input that produces a wrong number."
  3. If findings emerge, add them to the report under a "🎯 Deep Analysis" section
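
Step 1 can be partly mechanized by dropping whole files from a unified diff before the question is asked. The sketch below assumes a `git`-style diff with `diff --git a/... b/...` headers and uses hypothetical path patterns for tests, config, and docs. It only removes files by path. Deciding which of the remaining hunks are actually financial logic still needs judgment.

```python
import re

# Hypothetical path patterns; adjust to the real repository layout.
EXCLUDED_PATHS = [
    re.compile(r"(^|/)test/"),
    re.compile(r"_test\.exs$"),
    re.compile(r"(^|/)config/"),
    re.compile(r"(^|/)docs/"),
    re.compile(r"\.md$"),
]

DEEP_PASS_QUESTION = (
    "Can this code produce a silently incorrect financial calculation? "
    "Show the specific input that produces a wrong number."
)


def strip_non_logic(diff):
    """Keep only per-file diff chunks whose path matches no excluded pattern."""
    kept = []
    for chunk in re.split(r"(?m)^(?=diff --git )", diff):
        header = re.match(r"diff --git a/(\S+) b/(\S+)", chunk)
        if header is None:
            continue  # leading text before the first file header
        path = header.group(2)
        if any(pattern.search(path) for pattern in EXCLUDED_PATHS):
            continue
        kept.append(chunk)
    return "".join(kept)


def build_deep_pass_prompt(diff):
    """Return the focused prompt, or None when nothing is left to analyze."""
    logic = strip_non_logic(diff)
    if not logic:
        return None
    return DEEP_PASS_QUESTION + "\n\n" + logic
```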

Timing Expectations

| Configuration | Expected time |
| --- | --- |
| Sonnet only | ~30s |
| Sonnet + GPT-5 | ~60s (parallel) |
| All three | ~90s (parallel, Opus may be faster) |
| + Deep pass | +45s (sequential after main review) |
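
Because the deep pass runs sequentially after the main review, its time adds to the parallel figure: all three models plus a deep pass comes to roughly 90s + 45s = 135s. A small sketch of that arithmetic, using only the figures listed above (the function name is illustrative, and configurations not listed, such as GPT-5 + Opus, return `None` rather than a guess):

```python
EXPECTED_SECONDS = {
    frozenset({"sonnet"}): 30,
    frozenset({"sonnet", "gpt5"}): 60,
    frozenset({"sonnet", "gpt5", "opus"}): 90,
}
DEEP_PASS_SECONDS = 45


def estimate_seconds(models, deep_pass=False):
    """Approximate wall-clock time, or None for an unlisted configuration."""
    base = EXPECTED_SECONDS.get(frozenset(models))
    if base is None:
        return None
    return base + (DEEP_PASS_SECONDS if deep_pass else 0)
```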

What This Replaces

This replaces the old single-pass pr-review for on-demand reviews. The pr-review skill is still used for its Phase 0 (PR identification), Phase 1 (context gathering), Phase 4 (existing feedback), Phase 6 (posting mechanics), and Phase 7 (walk-through). The REVIEW itself (Phase 3) is now multi-model.

The CI twins (review-bot) continue running independently — they're the automated safety net. On-demand reviews are the deep-dive when Aaron wants human-quality analysis.