Files
model-research/findings
Rodin 873591877d Finding #64: Specification gap analysis - new analytical lens
Tested GPT-5, Opus, Sonnet on specid-lot-selection.md (125 lines)
for implementation specification gaps.

Key findings:
- Opus most cost-effective (4.6 gaps/1K tokens vs 1.8 for GPT-5)
- GPT-5 catches operational/financial edge cases (fees, multi-execution)
- Opus catches design-level binding ambiguities
- Sonnet too shallow for serious spec review

New lens distinct from hidden assumptions and race conditions:
focuses on ambiguity of intent, not risks.
2026-05-10 11:10:33 -07:00
..

Model Findings — Analytical & Research Work

Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.

Started: 2026-04-26

Context

We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, smaller models for focused analytical tasks. Most public discussion is about coding. We found almost no published methodology for using models in analytical research tasks (searched 2026-04-26). That gap is why we're tracking this.

Each experiment lives in its own file. See individual finding files below.