# Finding #80: Config-A/B Dispatcher Malfunction in Multi-Model Review Pipeline

**Date:** 2026-05-15
**Severity:** HIGH (cost impact, measurement invalidation)
**Component:** gargoyle AI review pipeline (PR #776)
**Impact:** Phase 2 (lint-suppression) deployment blocked

## Issue Summary

The Config-A/B even/odd PR# parity routing mechanism in gargoyle's multi-model review pipeline is **NOT operational**. Instead of alternating reviewers by PR parity, all 6 reviewers fire on every PR, resulting in:

- **3.5x API cost overage** (14+ reviews per PR instead of 4)
- **Invalidated baseline metrics** (Phase 1 data collected with a broken dispatcher)
- **Blocked Phase 2 deployment** (lint-suppression improvement cannot be measured without working parity)

## Expected vs. Actual Behavior

### Expected (Config-A/B Parity)

```
Even PR# (e.g., #784) → Config A only
  - GPT-5 (investigates)
  - Opus (judges)
  - Security reviewer (specialized)

Odd PR# (e.g., #781) → Config B only
  - Opus (investigates)
  - GPT-5 (judges)
  - Security reviewer (specialized)
```

### Actual (Broken Dispatcher)

```
All PR# → ALL 6 reviewers, always
  - Elixir-otp-reviewer (multiple passes)
  - Security-reviewer (multiple passes)
  - Trading-domain-reviewer
  - Event-sourcing-reviewer
  - Operational-gaps-reviewer
  - Structural-reviewer
```

## Evidence

### PR #784 (DashboardLive Real-Time Monitoring)

- **Created:** 2026-05-15 07:24:18Z
- **Expected:** Config A (even PR#)
- **Actual:** 14+ reviews from all 6 reviewers across multiple passes

**Review timeline (fetched from Gitea API):**

| Timestamp | Reviewer | State | Issue |
|-----------|----------|-------|-------|
| 07:24:43 | Elixir-OTP | ✅ APPROVED | Patterns good |
| 07:25:33 | Security | ⚠️ REQUEST_CHANGES | Auth/trust missing |
| 07:25:58 | Event-sourcing | ✅ APPROVED | Projection layer OK |
| 07:25:59 | Trading-domain | ✅ APPROVED | No logic concerns |
| 07:26:26 | Structural | ✅ APPROVED | Doc format OK |
| 07:27:11 | Operational-gaps | ⚠️ REQUEST_CHANGES | P&L inconsistent |
| (Pass 2 triggered by PR update) | | | |
| 07:36:19 | Elixir-OTP | ⚠️ REQUEST_CHANGES | CI lint-docs failing |
| 07:36:56 | Elixir-OTP | ✅ APPROVED | Patterns validated |
| 07:38:30 | Security | ⚠️ REQUEST_CHANGES | PubSub hardening |
| 07:38:52 | Trading-domain | ⚠️ REQUEST_CHANGES | CI lint-docs failing |
| 07:39:33 | Structural | ⚠️ REQUEST_CHANGES | CI + consistency |
| 07:40:49 | Operational-gaps | ⚠️ REQUEST_CHANGES | Assumptions unclear |
| 07:43:07 | Elixir-OTP | ✅ APPROVED | Lifecycle validated |

## Root Causes (Hypotheses)

1. **PR #776 implementation gap:** Config-A/B parity logic was not deployed
2. **Router misconfiguration:** The router broadcasts to all reviewers instead of filtering by PR# parity
3. **Webhook configuration:** All reviewers are subscribed to a global webhook (no parity filter)
4. **Code-to-config mismatch:** Even/odd logic exists in code but is not used by the dispatcher

## Operational Impact

| Metric | Expected | Actual | Δ | Business Impact |
|--------|----------|--------|---|-----------------|
| Reviews/PR | 4 | 14+ | 3.5x | **Cost: 3.5x API spend** |
| Passes/PR | 1 | 2+ | 2x | Slow feedback (multi-pass) |
| Config comparison | Measurable | Conflated | n/a | **Can't measure A vs B** |
| Phase 1 baseline | Valid | Questionable | n/a | **Metrics contaminated** |
| Phase 2 deployment | Ready | Blocked | n/a | **Can't proceed** |

## Real Issues Found (Legitimate)

Despite the dispatcher malfunction, the reviews ARE catching genuine issues in PR #784:

**Security concerns:**
- PubSub payload validation is assumed, not explicit
- Message/PubSub surfaces need hardening
- Authorization and trust-boundary details are missing

**Operational gaps:**
- Trading-day boundary handling is inconsistent
- P&L assumptions are underspecified
- Either gap could produce wrong operational results

**CI failures:**
- lint-docs check failing (prevents merge)

## Recommendations

### Immediate (Blocking)

1. **Investigate PR #776 implementation:** Verify that Config-A/B parity routing was deployed
2. **Check gargoyle webhook/router:** Why are all 6 reviewers firing on all PRs?
3. **Cost review:** Confirm whether 3.5x API spend is acceptable
4. **Decision:** Fix parity first, or proceed with lint-suppression despite the broken dispatcher?

### Short-term (Phase 2)

- Revalidate Phase 1 baseline metrics with a working dispatcher
- Measure true Config-A quality vs. Config-B (currently conflated)
- Establish a correct baseline before the lint-suppression rollout

### Long-term (Process)

- Add parity routing verification to the PR review checklist
- Monitor API costs per PR for anomalies
- Automated test: verify even/odd dispatch ratios in test runs

## Status

🔴 **BLOCKING:** Cannot proceed with Phase 2 (lint-suppression) deployment until parity routing is verified working.

**Next checkpoint:** 2026-05-15 ~09:00 UTC, after Aaron's investigation.

---

**Logged by:** rodin (dev-loop cron)
**Session:** 5342ac81-4bbc-4e4c-a123-347a7788d50c
**Tracker:** `/home/ubuntu/.openclaw/workspace/memory/review-experiments/tracker.md`
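
## Appendix: Parity Routing Sketch

The even/odd routing described under "Expected (Config-A/B Parity)" and the automated dispatch check proposed under "Long-term (Process)" could look roughly like the Python sketch below. This is not gargoyle's actual dispatcher code, which this finding has not inspected. The names `CONFIG_A`, `CONFIG_B`, `select_config`, `expected_reviewers`, and `unexpected_reviewers` are hypothetical, and the reviewer identifiers are lowercased versions of the labels used in this report.

```python
from typing import Dict, Iterable, Set

# Role assignments copied from the "Expected" block of this report.
# The dictionary layout and identifier spellings are assumptions.
CONFIG_A: Dict[str, str] = {
    "investigator": "gpt-5",
    "judge": "opus",
    "specialist": "security-reviewer",
}
CONFIG_B: Dict[str, str] = {
    "investigator": "opus",
    "judge": "gpt-5",
    "specialist": "security-reviewer",
}


def select_config(pr_number: int) -> str:
    """Return "A" for even PR numbers and "B" for odd PR numbers."""
    if pr_number <= 0:
        raise ValueError(f"PR number must be positive, got {pr_number}")
    return "A" if pr_number % 2 == 0 else "B"


def expected_reviewers(pr_number: int) -> Dict[str, str]:
    """Return a copy of the role-to-reviewer mapping selected by PR parity."""
    config = CONFIG_A if select_config(pr_number) == "A" else CONFIG_B
    return dict(config)


def unexpected_reviewers(pr_number: int, observed: Iterable[str]) -> Set[str]:
    """Return reviewers that fired but are not part of the selected config."""
    allowed = set(expected_reviewers(pr_number).values())
    return {name.lower() for name in observed} - allowed


if __name__ == "__main__":
    # Reviewer names as listed in the "Actual (Broken Dispatcher)" block.
    observed_784 = [
        "Elixir-otp-reviewer",
        "Security-reviewer",
        "Trading-domain-reviewer",
        "Event-sourcing-reviewer",
        "Operational-gaps-reviewer",
        "Structural-reviewer",
    ]
    print(select_config(784))
    print(sorted(unexpected_reviewers(784, observed_784)))
```

Run against the reviewers observed on PR #784, the check flags every reviewer except `security-reviewer`, which is the only one that belongs to Config A. A check of this shape could run in CI or in the dev-loop cron to catch a broadcast-to-all regression before it contaminates further measurements.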