Finding #80: Config-A/B Dispatcher Malfunction in Multi-Model Review Pipeline
Date: 2026-05-15
Severity: HIGH (cost impact, measurement invalidation)
Component: gargoyle AI review pipeline (PR #776)
Impact: Phase 2 (lint-suppression) deployment blocked
Issue Summary
The Config-A/B routing in gargoyle's multi-model review pipeline, which should select one reviewer configuration per PR by PR# parity (even or odd), is NOT operational. Instead of alternating configurations by parity, all 6 reviewers fire on every PR, resulting in:
- 3.5x API cost overage (14+ reviews per PR instead of 4)
- Invalidated baseline metrics (Phase 1 data collected with broken dispatcher)
- Blocked Phase 2 deployment (can't measure lint-suppression improvement without working parity)
Expected vs. Actual Behavior
Expected (Config-A/B Parity)
Even PR# (e.g., #784) → Config A only
- GPT-5 (investigates)
- Opus (judges)
- Security reviewer (specialized)
Odd PR# (e.g., #781) → Config B only
- Opus (investigates)
- GPT-5 (judges)
- Security reviewer (specialized)
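The expected routing above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not gargoyle's actual dispatcher: `CONFIG_A`, `CONFIG_B`, `select_config`, and `select_reviewers` are hypothetical names, and the role labels are copied from the two lists above.

```python
from typing import Dict

# Hypothetical role tables; the entries mirror the expected-behavior lists.
CONFIG_A: Dict[str, str] = {
    "investigator": "GPT-5",
    "judge": "Opus",
    "specialist": "Security reviewer",
}
CONFIG_B: Dict[str, str] = {
    "investigator": "Opus",
    "judge": "GPT-5",
    "specialist": "Security reviewer",
}


def select_config(pr_number: int) -> str:
    """Return "A" for even PR numbers and "B" for odd ones."""
    if pr_number <= 0:
        raise ValueError(f"PR number must be positive, got {pr_number}")
    return "A" if pr_number % 2 == 0 else "B"


def select_reviewers(pr_number: int) -> Dict[str, str]:
    """Return the role-to-reviewer mapping for exactly one config."""
    chosen = CONFIG_A if select_config(pr_number) == "A" else CONFIG_B
    return dict(chosen)  # copy so callers cannot mutate the shared table
```

Under this sketch, PR #784 resolves to Config A and PR #781 to Config B, and no PR can ever receive both configurations.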
Actual (Broken Dispatcher)
All PR# → ALL 6 reviewers, always
- Elixir-otp-reviewer (multiple passes)
- Security-reviewer (multiple passes)
- Trading-domain-reviewer
- Event-sourcing-reviewer
- Operational-gaps-reviewer
- Structural-reviewer
Evidence
PR #784 (DashboardLive Real-Time Monitoring)
- Created: 2026-05-15 07:24:18Z
- Expected: Config A (even PR#)
- Actual: 14+ reviews from all 6 reviewers across multiple passes
Review timeline (fetched from Gitea API):
| Timestamp | Reviewer | State | Issue |
|---|---|---|---|
| 07:24:43 | Elixir-OTP | ✅ APPROVED | Patterns good |
| 07:25:33 | Security | ⚠️ REQUEST_CHANGES | Auth/trust missing |
| 07:25:58 | Event-sourcing | ✅ APPROVED | Projection layer OK |
| 07:25:59 | Trading-domain | ✅ APPROVED | No logic concerns |
| 07:26:26 | Structural | ✅ APPROVED | Doc format OK |
| 07:27:11 | Operational-gaps | ⚠️ REQUEST_CHANGES | P&L inconsistent |
| (Pass 2 triggered by PR update) | | | |
| 07:36:19 | Elixir-OTP | ⚠️ REQUEST_CHANGES | CI lint-docs failing |
| 07:36:56 | Elixir-OTP | ✅ APPROVED | Patterns validated |
| 07:38:30 | Security | ⚠️ REQUEST_CHANGES | PubSub hardening |
| 07:38:52 | Trading-domain | ⚠️ REQUEST_CHANGES | CI lint-docs failing |
| 07:39:33 | Structural | ⚠️ REQUEST_CHANGES | CI + consistency |
| 07:40:49 | Operational-gaps | ⚠️ REQUEST_CHANGES | Assumptions unclear |
| 07:43:07 | Elixir-OTP | ✅ APPROVED | Lifecycle validated |
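A timeline like this can be reduced to per-reviewer and per-state counts, which makes it easy to see how many distinct reviewers fired. The sketch below is illustrative only: `Review`, `tally`, and `FIRST_PASS` are hypothetical names, the sample rows are the six first-pass rows copied from the table, and fetching from the Gitea API is deliberately left out so the function stays pure.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Review(NamedTuple):
    timestamp: str  # HH:MM:SS, as shown in the table
    reviewer: str
    state: str      # "APPROVED" or "REQUEST_CHANGES"


def tally(reviews: Iterable[Review]) -> Tuple[Counter, Counter]:
    """Count reviews per reviewer and per review state."""
    rows = list(reviews)
    by_reviewer = Counter(r.reviewer for r in rows)
    by_state = Counter(r.state for r in rows)
    return by_reviewer, by_state


# First-pass rows for PR #784, copied from the table above.
FIRST_PASS = [
    Review("07:24:43", "Elixir-OTP", "APPROVED"),
    Review("07:25:33", "Security", "REQUEST_CHANGES"),
    Review("07:25:58", "Event-sourcing", "APPROVED"),
    Review("07:25:59", "Trading-domain", "APPROVED"),
    Review("07:26:26", "Structural", "APPROVED"),
    Review("07:27:11", "Operational-gaps", "REQUEST_CHANGES"),
]

by_reviewer, by_state = tally(FIRST_PASS)
```

For the first pass alone, `len(by_reviewer)` is 6, which already exceeds what a single parity-selected configuration should produce.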
Root Causes (Hypotheses)
- PR #776 implementation gap — Config-A/B parity logic not deployed
- Router misconfiguration — Broadcasts to all reviewers instead of filtering by PR# parity
- Webhook configuration — All reviewers subscribed to global webhook (no parity filter)
- Code-to-config mismatch — Even/odd logic exists in code but not used by dispatcher
Operational Impact
| Metric | Expected | Actual | Δ | Business Impact |
|---|---|---|---|---|
| Reviews/PR | 4 | 14+ | 3.5x | Cost: 3.5x API spend |
| Passes/PR | 1 | 2+ | 2x | Slow feedback (multi-pass) |
| Config comparison | Measurable | Conflated | — | Can't measure A vs B |
| Phase 1 baseline | Valid | Questionable | — | Metrics contaminated |
| Phase 2 deployment | Ready | Blocked | — | Can't proceed |
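The 3.5x figure in the first row is the ratio of observed to expected reviews per PR. A minimal sketch of that arithmetic follows; `review_overage` is a hypothetical helper, and 14 is treated as a lower bound because the report states "14+".

```python
def review_overage(actual_reviews: int, expected_reviews: int) -> float:
    """Return how many times more reviews ran than the design expects."""
    if expected_reviews <= 0:
        raise ValueError("expected_reviews must be positive")
    return actual_reviews / expected_reviews


# 14 observed reviews (a lower bound) against 4 expected.
ratio = review_overage(14, 4)
```

Reading the same ratio as an API spend multiplier assumes that cost scales roughly linearly with review count, which this report does not verify.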
Real Issues Found (Legitimate)
Despite the dispatcher malfunction, the reviews ARE catching genuine issues in PR #784:
Security concerns:
- PubSub payload validation assumed, not explicit
- Message/PubSub surfaces need hardening
- Authorization and trust-boundary details missing
Operational gaps:
- Trading-day boundary handling inconsistent
- P&L assumptions underspecified
- Either gap could produce wrong operational results
CI failures:
- lint-docs check failing (prevents merge)
Recommendations
Immediate (Blocking)
- Investigate PR #776 implementation — Verify Config-A/B parity routing deployed
- Check gargoyle webhook/router — Why are all 6 reviewers firing on all PRs?
- Cost review: Confirm whether 3.5x API spend is acceptable
- Decision: Fix parity first, or proceed with lint-suppression despite broken dispatcher?
Short-term (Phase 2)
- Revalidate Phase 1 baseline metrics with working dispatcher
- Measure true Config-A quality vs. Config-B (currently conflated)
- Establish correct baseline before lint-suppression rollout
Long-term (Process)
- Add parity routing verification to PR review checklist
- Monitor API costs per PR for anomalies
- Automated test: Verify even/odd dispatch ratios in test runs
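The automated dispatch test in the last item could look something like the sketch below. It is an assumption-heavy illustration: the real dispatcher's interface is not described in this finding, so `check_parity_dispatch` takes any callable that maps a PR number to a config label, and `reference_dispatch` is a hypothetical stand-in for it.

```python
from collections import Counter
from typing import Callable, Iterable


def check_parity_dispatch(
    dispatch: Callable[[int], str], pr_numbers: Iterable[int]
) -> Counter:
    """Verify even PRs go to "A" and odd PRs to "B"; return per-config counts."""
    counts: Counter = Counter()
    for pr in pr_numbers:
        expected = "A" if pr % 2 == 0 else "B"
        actual = dispatch(pr)
        if actual != expected:
            raise AssertionError(
                f"PR #{pr}: expected config {expected}, got {actual}"
            )
        counts[actual] += 1
    return counts


def reference_dispatch(pr_number: int) -> str:
    """Stand-in for the real dispatcher, whose interface is an assumption."""
    return "A" if pr_number % 2 == 0 else "B"


# 100 consecutive PR numbers should split evenly between the two configs.
counts = check_parity_dispatch(reference_dispatch, range(700, 800))
```

A dispatcher that ignores parity fails on the first mismatched PR number, so the check reports the exact PR where routing went wrong and does not depend on aggregate ratios alone.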
Status
🔴 BLOCKING — Cannot proceed with Phase 2 (lint-suppression) deployment until parity routing is verified working.
Next checkpoint: 2026-05-15 ~09:00 UTC, after Aaron's investigation.
Logged by: rodin (dev-loop cron)
Session: 5342ac81-4bbc-4e4c-a123-347a7788d50c
Tracker: /home/ubuntu/.openclaw/workspace/memory/review-experiments/tracker.md