Commit afbc013e2e (Rodin, 2026-05-15 08:37:01 +00:00): finding #80: config-a/b dispatcher malfunction detected in multi-model review pipeline (3.5x cost overage)


Finding #80: Config-A/B Dispatcher Malfunction in Multi-Model Review Pipeline

Date: 2026-05-15
Severity: HIGH (cost impact, measurement invalidation)
Component: gargoyle AI review pipeline (PR #776)
Impact: Phase 2 (lint-suppression) deployment blocked

Issue Summary

The Config-A/B even/odd PR# parity routing mechanism in gargoyle's multi-model review pipeline is NOT operational. Each PR should be routed to a single configuration by PR-number parity. Instead, all 6 reviewers fire on every PR, often across multiple passes, resulting in:

- 3.5x API cost overage (14+ reviews per PR instead of 4)
- Invalidated baseline metrics (Phase 1 data collected with broken dispatcher)
- Blocked Phase 2 deployment (can't measure lint-suppression improvement without working parity)

Expected vs. Actual Behavior

Expected (Config-A/B Parity)

Even PR# (e.g., #784) → Config A only
- GPT-5 (investigates)
- Opus (judges)
- Security reviewer (specialized)

Odd PR# (e.g., #781) → Config B only
- Opus (investigates)  
- GPT-5 (judges)
- Security reviewer (specialized)
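The parity rule above is small enough to state as code. The following is a minimal Python sketch that assumes the router needs only the PR number. `ReviewConfig`, `select_config`, and `reviewers_for` are hypothetical names for illustration. This is not the implementation from PR #776, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewConfig:
    name: str
    investigator: str
    judge: str
    specialist: str


# Hypothetical encoding of Config A and Config B as described above.
CONFIG_A = ReviewConfig("A", investigator="GPT-5", judge="Opus", specialist="Security reviewer")
CONFIG_B = ReviewConfig("B", investigator="Opus", judge="GPT-5", specialist="Security reviewer")


def select_config(pr_number: int) -> ReviewConfig:
    """Even PR numbers go to Config A, odd PR numbers to Config B."""
    if pr_number <= 0:
        raise ValueError("PR numbers must be positive")
    return CONFIG_A if pr_number % 2 == 0 else CONFIG_B


def reviewers_for(pr_number: int) -> list[str]:
    """The three reviewers that should fire for this PR, and no others."""
    cfg = select_config(pr_number)
    return [cfg.investigator, cfg.judge, cfg.specialist]


if __name__ == "__main__":
    for pr in (784, 781):
        print(pr, select_config(pr).name, reviewers_for(pr))
```

Under this rule, PR #784 should have received exactly three reviewers from Config A.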

Actual (Broken Dispatcher)

All PR# → ALL 6 reviewers, always
- Elixir-otp-reviewer (multiple passes)
- Security-reviewer (multiple passes)
- Trading-domain-reviewer
- Event-sourcing-reviewer
- Operational-gaps-reviewer
- Structural-reviewer

Evidence

PR #784 (DashboardLive Real-Time Monitoring)

Created: 2026-05-15 07:24:18Z
Expected: Config A (even PR#)
Actual: 14+ reviews from all 6 reviewers across multiple passes

Review timeline (fetched from Gitea API):

Timestamp	Reviewer	State	Issue
07:24:43	Elixir-OTP	✅ APPROVED	Patterns good
07:25:33	Security	⚠️ REQUEST_CHANGES	Auth/trust missing
07:25:58	Event-sourcing	✅ APPROVED	Projection layer OK
07:25:59	Trading-domain	✅ APPROVED	No logic concerns
07:26:26	Structural	✅ APPROVED	Doc format OK
07:27:11	Operational-gaps	⚠️ REQUEST_CHANGES	P&L inconsistent
(Pass 2 triggered by PR update)
07:36:19	Elixir-OTP	⚠️ REQUEST_CHANGES	CI lint-docs failing
07:36:56	Elixir-OTP	✅ APPROVED	Patterns validated
07:38:30	Security	⚠️ REQUEST_CHANGES	PubSub hardening
07:38:52	Trading-domain	⚠️ REQUEST_CHANGES	CI lint-docs failing
07:39:33	Structural	⚠️ REQUEST_CHANGES	CI + consistency
07:40:49	Operational-gaps	⚠️ REQUEST_CHANGES	Assumptions unclear
07:43:07	Elixir-OTP	✅ APPROVED	Lifecycle validated
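The timeline above can be tallied mechanically to check reviewer and pass counts. This sketch hard-codes the rows as transcribed here and uses an assumed `(timestamp, reviewer, state, pass)` tuple layout. Pass numbers follow the "Pass 2 triggered by PR update" marker. Fetching live data from the Gitea API is left out. Counting only the rows shown gives 13 reviews from 6 distinct reviewers across two passes, with Elixir-OTP firing four times.

```python
from collections import Counter

# (timestamp, reviewer, state, pass) rows transcribed from the PR #784 table above.
TIMELINE = [
    ("07:24:43", "Elixir-OTP", "APPROVED", 1),
    ("07:25:33", "Security", "REQUEST_CHANGES", 1),
    ("07:25:58", "Event-sourcing", "APPROVED", 1),
    ("07:25:59", "Trading-domain", "APPROVED", 1),
    ("07:26:26", "Structural", "APPROVED", 1),
    ("07:27:11", "Operational-gaps", "REQUEST_CHANGES", 1),
    ("07:36:19", "Elixir-OTP", "REQUEST_CHANGES", 2),
    ("07:36:56", "Elixir-OTP", "APPROVED", 2),
    ("07:38:30", "Security", "REQUEST_CHANGES", 2),
    ("07:38:52", "Trading-domain", "REQUEST_CHANGES", 2),
    ("07:39:33", "Structural", "REQUEST_CHANGES", 2),
    ("07:40:49", "Operational-gaps", "REQUEST_CHANGES", 2),
    ("07:43:07", "Elixir-OTP", "APPROVED", 2),
]


def summarize(rows):
    """Count reviews overall, per reviewer, per pass, and per review state."""
    return {
        "total": len(rows),
        "distinct_reviewers": len({r[1] for r in rows}),
        "per_reviewer": Counter(r[1] for r in rows),
        "per_pass": Counter(r[3] for r in rows),
        "per_state": Counter(r[2] for r in rows),
    }


if __name__ == "__main__":
    for key, value in summarize(TIMELINE).items():
        print(f"{key}: {value}")
```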

Root Causes (Hypotheses)

- PR #776 implementation gap: the Config-A/B parity logic was never deployed
- Router misconfiguration: the router broadcasts to all reviewers instead of filtering by PR# parity
- Webhook configuration: all reviewers are subscribed to a global webhook with no parity filter
- Code-to-config mismatch: the even/odd logic exists in code but is not used by the dispatcher
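The router, webhook, and code-to-config hypotheses can be made concrete by contrasting a broadcast dispatcher with a parity-filtered one. Everything in this sketch is hypothetical and is not taken from gargoyle's router: the registry shape, the reviewer names, and the `ACTIVE_DISPATCH` wiring are all illustrative assumptions.

```python
# Hypothetical registry: reviewer name -> configs it belongs to ("A", "B").
Registry = dict[str, frozenset[str]]

EXAMPLE_REGISTRY: Registry = {
    "gpt5-investigator": frozenset({"A"}),
    "opus-judge": frozenset({"A"}),
    "opus-investigator": frozenset({"B"}),
    "gpt5-judge": frozenset({"B"}),
    "security": frozenset({"A", "B"}),
}


def parity_config(pr_number: int) -> str:
    """Even PR numbers map to Config A, odd ones to Config B."""
    return "A" if pr_number % 2 == 0 else "B"


def broadcast_dispatch(registry: Registry, pr_number: int) -> list[str]:
    """Suspected current behaviour: every registered reviewer fires; pr_number is ignored."""
    return sorted(registry)


def parity_dispatch(registry: Registry, pr_number: int) -> list[str]:
    """Intended behaviour: only reviewers tagged with the PR's config fire."""
    cfg = parity_config(pr_number)
    return sorted(name for name, configs in registry.items() if cfg in configs)


# Code-to-config mismatch in miniature: parity_dispatch exists,
# but the entry point is still wired to the broadcast variant.
ACTIVE_DISPATCH = broadcast_dispatch
```

In this sketch, a parity filter silently drops any reviewer that has no config tag. Role reviewers like the six listed under "Actual" would therefore need explicit tags if they are meant to run under either config.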

Operational Impact

Metric	Expected	Actual	Δ	Business Impact
Reviews/PR	4	14+	3.5x	Cost: 3.5x API spend
Passes/PR	1	2+	2x	Slow feedback (multi-pass)
Config comparison	Measurable	Conflated	—	Can't measure A vs B
Phase 1 baseline	Valid	Questionable	—	Metrics contaminated
Phase 2 deployment	Ready	Blocked	—	Can't proceed
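The multipliers in the table are ratios of actual to expected counts (14 / 4 = 3.5, 2 / 1 = 2). The step from "3.5x reviews" to "3.5x API spend" holds only if each review costs roughly the same. The sketch below builds a per-PR anomaly check on those ratios. The `PrReviewStats` record and the `max_ratio` threshold of 1.5 are assumptions for illustration.

```python
from dataclasses import dataclass

# "Expected" column of the Operational Impact table.
EXPECTED_REVIEWS_PER_PR = 4
EXPECTED_PASSES_PER_PR = 1


@dataclass(frozen=True)
class PrReviewStats:
    """Hypothetical per-PR record; a real monitor would build this from the Gitea API."""
    pr_number: int
    reviews: int
    passes: int


def overage(expected: float, actual: float) -> float:
    """Ratio of actual to expected; 1.0 means on plan."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return actual / expected


def flag_anomalies(
    stats: list[PrReviewStats], max_ratio: float = 1.5
) -> list[tuple[int, float, float]]:
    """Return (pr_number, review_overage, pass_overage) for PRs whose review overage exceeds max_ratio."""
    flagged = []
    for s in stats:
        review_ratio = overage(EXPECTED_REVIEWS_PER_PR, s.reviews)
        if review_ratio > max_ratio:
            flagged.append((s.pr_number, review_ratio, overage(EXPECTED_PASSES_PER_PR, s.passes)))
    return flagged
```

For example, `flag_anomalies([PrReviewStats(784, reviews=14, passes=2)])` returns `[(784, 3.5, 2.0)]`. A PR with 4 reviews in 1 pass is not flagged.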

Real Issues Found (Legitimate)

Despite the dispatcher malfunction, the reviews ARE catching genuine issues in PR #784:

Security concerns:

- PubSub payload validation is assumed, not explicit
- Message/PubSub surfaces need hardening
- Authorization and trust-boundary details are missing

Operational gaps:

- Trading-day boundary handling is inconsistent
- P&L assumptions are underspecified
- These gaps could produce wrong operational results

CI failures:

- lint-docs check failing (prevents merge)

Recommendations

Immediate (Blocking)

- Investigate the PR #776 implementation: verify that Config-A/B parity routing is actually deployed
- Check the gargoyle webhook/router: why are all 6 reviewers firing on every PR?
- Cost review: confirm whether the 3.5x API spend is acceptable
- Decision: fix parity first, or proceed with lint-suppression despite the broken dispatcher?

Short-term (Phase 2)

- Revalidate Phase 1 baseline metrics with a working dispatcher
- Measure true Config-A vs. Config-B quality (currently conflated)
- Establish a correct baseline before the lint-suppression rollout

Long-term (Process)

- Add parity routing verification to the PR review checklist
- Monitor API costs per PR for anomalies
- Automated test: verify even/odd dispatch ratios in test runs
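The automated-test item could take roughly the following shape. It runs a dispatcher over a range of PR numbers and checks two things: that each PR receives exactly its parity's reviewer set, and that the two configs split evenly. The reviewer names and the `dispatch_violations` / `config_split` helpers are hypothetical. A real test would call gargoyle's dispatcher rather than the two stand-ins defined here.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical reviewer sets for each config (names are placeholders).
CONFIG_REVIEWERS = {
    "A": frozenset({"gpt5-investigator", "opus-judge", "security"}),
    "B": frozenset({"opus-investigator", "gpt5-judge", "security"}),
}
ALL_REVIEWERS = CONFIG_REVIEWERS["A"] | CONFIG_REVIEWERS["B"]

Dispatcher = Callable[[int], frozenset]


def expected_config(pr_number: int) -> str:
    return "A" if pr_number % 2 == 0 else "B"


def parity_dispatch(pr_number: int) -> frozenset:
    """Stand-in for a correctly working dispatcher."""
    return CONFIG_REVIEWERS[expected_config(pr_number)]


def broadcast_dispatch(pr_number: int) -> frozenset:
    """Stand-in for the malfunction described in this finding."""
    return ALL_REVIEWERS


def dispatch_violations(dispatch: Dispatcher, pr_numbers: Iterable[int]) -> list[int]:
    """PR numbers whose reviewer set differs from the set for their parity."""
    return [pr for pr in pr_numbers if dispatch(pr) != CONFIG_REVIEWERS[expected_config(pr)]]


def config_split(dispatch: Dispatcher, pr_numbers: Iterable[int]) -> Counter:
    """How many PRs each config served; a reviewer set matching neither config is not counted."""
    served = Counter()
    for pr in pr_numbers:
        got = dispatch(pr)
        for name, reviewers in CONFIG_REVIEWERS.items():
            if got == reviewers:
                served[name] += 1
    return served


if __name__ == "__main__":
    prs = range(700, 800)
    for label, fn in (("parity", parity_dispatch), ("broadcast", broadcast_dispatch)):
        print(label, len(dispatch_violations(fn, prs)), dict(config_split(fn, prs)))
```

A correct dispatcher yields no violations and a 50/50 split over any run of consecutive PR numbers of even length. The broadcast stand-in violates parity on every PR and serves neither config.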

Status

🔴 BLOCKING — Cannot proceed with Phase 2 (lint-suppression) deployment until parity routing is verified working.

Next checkpoint: 2026-05-15 ~09:00 UTC, after Aaron's investigation.

Logged by: rodin (dev-loop cron)
Session: 5342ac81-4bbc-4e4c-a123-347a7788d50c
Tracker: /home/ubuntu/.openclaw/workspace/memory/review-experiments/tracker.md
