58e69e21f8
Tested kill-switch.md + escalation-policy.md (same bounded context, shared vocabulary). Key insight: shared vocabulary claims are the most dangerous inconsistency — same words with opposite severity ordering. Opus found the severity-ordering inversion (restrict/liquidate ladders run in opposite directions). GPT-5 found the meta-issue (the 'same vocabulary' claim is itself the problem). Sonnet fast but shallow. Tightly coupled docs produce more Critical findings than loosely coupled ones (Finding #28).
Model Findings — Analytical & Research Work
Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.
Started: 2026-04-26
Context
We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, smaller models for focused analytical tasks. Most public discussion is about coding. We found almost no published methodology for using models in analytical research tasks (searched 2026-04-26). That gap is why we're tracking this.
Each experiment lives in its own file. See individual finding files below.