model-research/findings
claw bb191e48d1 finding #54: wash sale multi-model design review analysis
Compared Sonnet 4, GPT-5, and Opus 4.6 on gargoyle wash-sale-tracking.md.
Key insights:
- GPT-5 requires 16K+ completion tokens (4K for reasoning alone)
- Opus caught holding period add-vs-backdate correctness issue
- Sonnet caught Section 1259 (constructive sales) that others missed
- All three missed multi-broker 1099-B reconciliation problem
- Multi-model review justified for tax compliance domains
2026-05-09 03:35:12 -07:00

Model Findings — Analytical & Research Work

Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.

Started: 2026-04-26

Context

We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, and smaller models for focused analytical tasks. Most public discussion of model use is about coding. When we searched on 2026-04-26, we found almost no published methodology for using models in analytical research tasks. That gap is why we're tracking this.

Each experiment lives in its own file; see the individual finding files below.