Rodin 2988f31fc3 finding 59: convention rule gap analysis
New task type: analyzing prescriptive/specification documents for completeness.

- GPT-5 dominates with exhaustive enumeration (34 findings)
- Opus traces gaps to consequences (routing failures, compiler issues)
- Sonnet stays surface-level (not recommended for thorough analysis)

Key insight: GPT-5 found an internal contradiction (telemetry verb rule vs. example)
that neither Claude model caught. Opus was the only model to trace the PubSub
collision to an actual routing failure scenario.

Task taxonomy: convention gap analysis follows the same pattern as architecture
docs: GPT-5 for coverage, Opus for consequences.
2026-05-09 17:28:53 -07:00

Model Findings — Analytical & Research Work

Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.

Started: 2026-04-26

Context

We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, smaller models for focused analytical tasks. Most public discussion is about coding. We found almost no published methodology for using models in analytical research tasks (searched 2026-04-26). That gap is why we're tracking this.

Each experiment lives in its own file; see the individual finding files below.