Files
model-research/findings/2026-05-15-config-ab-dispatcher-malfunction.md
T

5.0 KiB

Finding #80: Config-A/B Dispatcher Malfunction in Multi-Model Review Pipeline

Date: 2026-05-15
Severity: HIGH (cost impact, measurement invalidation)
Component: gargoyle AI review pipeline (PR #776)
Impact: Phase 2 (lint-suppression) deployment blocked

Issue Summary

The Config-A/B even/odd PR# parity routing mechanism in gargoyle's multi-model review pipeline is NOT operational. Instead of alternating reviewers by PR parity, all 6 reviewers fire on all PRs simultaneously, resulting in:

  • 3.5x API cost overage (14+ reviews per PR instead of 4)
  • Invalidated baseline metrics (Phase 1 data collected with broken dispatcher)
  • Blocked Phase 2 deployment (can't measure lint-suppression improvement without working parity)

Expected vs. Actual Behavior

Expected (Config-A/B Parity)

Even PR# (e.g., #784) → Config A only
- GPT-5 (investigates)
- Opus (judges)
- Security reviewer (specialized)

Odd PR# (e.g., #781) → Config B only
- Opus (investigates)  
- GPT-5 (judges)
- Security reviewer (specialized)

Actual (Broken Dispatcher)

All PR# → ALL 6 reviewers, always
- Elixir-otp-reviewer (multiple passes)
- Security-reviewer (multiple passes)
- Trading-domain-reviewer
- Event-sourcing-reviewer
- Operational-gaps-reviewer
- Structural-reviewer

Evidence

PR #784 (DashboardLive Real-Time Monitoring)

  • Created: 2026-05-15 07:24:18Z
  • Expected: Config A (even PR#)
  • Actual: 14+ reviews from all 6 reviewers across multiple passes

Review timeline (fetched from Gitea API):

Timestamp Reviewer State Issue
07:24:43 Elixir-OTP APPROVED Patterns good
07:25:33 Security ⚠️ REQUEST_CHANGES Auth/trust missing
07:25:58 Event-sourcing APPROVED Projection layer OK
07:25:59 Trading-domain APPROVED No logic concerns
07:26:26 Structural APPROVED Doc format OK
07:27:11 Operational-gaps ⚠️ REQUEST_CHANGES P&L inconsistent
(Pass 2 triggered by PR update)
07:36:19 Elixir-OTP ⚠️ REQUEST_CHANGES CI lint-docs failing
07:36:56 Elixir-OTP APPROVED Patterns validated
07:38:30 Security ⚠️ REQUEST_CHANGES PubSub hardening
07:38:52 Trading-domain ⚠️ REQUEST_CHANGES CI lint-docs failing
07:39:33 Structural ⚠️ REQUEST_CHANGES CI + consistency
07:40:49 Operational-gaps ⚠️ REQUEST_CHANGES Assumptions unclear
07:43:07 Elixir-OTP APPROVED Lifecycle validated

Root Causes (Hypotheses)

  1. PR #776 implementation gap — Config-A/B parity logic not deployed
  2. Router misconfiguration — Broadcasts to all reviewers instead of filtering by PR# parity
  3. Webhook configuration — All reviewers subscribed to global webhook (no parity filter)
  4. Code-to-config mismatch — Even/odd logic exists in code but not used by dispatcher

Operational Impact

Metric Expected Actual Δ Business Impact
Reviews/PR 4 14+ 3.5x Cost: 3.5x API spend
Passes/PR 1 2+ 2x Slow feedback (multi-pass)
Config comparison Measurable Conflated Can't measure A vs B
Phase 1 baseline Valid Questionable Metrics contaminated
Phase 2 deployment Ready Blocked Can't proceed

Real Issues Found (Legitimate)

Despite dispatcher malfunction, reviews ARE catching genuine issues in PR #784:

Security concerns:

  • PubSub payload validation assumed, not explicit
  • Message/PubSub surfaces need hardening
  • Authorization and trust-boundary details missing

Operational gaps:

  • Trading-day boundary handling inconsistent
  • P&L assumptions underspecified
  • Could produce wrong operational results

CI failures:

  • lint-docs check failing (prevents merge)

Recommendations

Immediate (Blocking)

  1. Investigate PR #776 implementation — Verify Config-A/B parity routing deployed
  2. Check gargoyle webhook/router — Why are all 6 reviewers firing on all PRs?
  3. Cost review — Confirm if 3.5x API spend is acceptable
  4. Decision: Fix parity first, or proceed with lint-suppression despite broken dispatcher?

Short-term (Phase 2)

  • Revalidate Phase 1 baseline metrics with working dispatcher
  • Measure true Config-A quality vs. Config-B (currently conflated)
  • Establish correct baseline before lint-suppression rollout

Long-term (Process)

  • Add parity routing verification to PR review checklist
  • Monitor API costs per PR for anomalies
  • Automated test: Verify even/odd dispatch ratios in test runs

Status

🔴 BLOCKING — Cannot proceed with Phase 2 (lint-suppression) deployment until parity routing is verified working.

Next checkpoint: 2026-05-15 ~09:00 UTC after Aaron investigation.


Logged by: rodin (dev-loop cron)
Session: 5342ac81-4bbc-4e4c-a123-347a7788d50c
Tracker: /home/ubuntu/.openclaw/workspace/memory/review-experiments/tracker.md