claw b7acbd7662 Finding #56: Operational burden analysis - new analytical lens
Tests a novel lens asking 'what cognitive/procedural load does this design
place on operators?' Applied to escalation-policy.md with GPT-5, Sonnet 4.6,
and Opus 4.6.

Key findings:
- All models identified that the manual liquidate→restrict path has no procedure (CRITICAL)
- GPT-5 excels at exhaustive enumeration (21+ findings, config gaps)
- Opus identifies systemic vulnerabilities (monitor crash → silent unsafe state)
- Sonnet catches procedural gaps (authorization, timeouts)

Recommendation: use Opus alone for time-constrained analysis and GPT-5 + Opus
for thoroughness, since the two find different types of issues with minimal
overlap.
2026-05-09 06:46:29 -07:00

Model Findings — Analytical & Research Work

Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.

Started: 2026-04-26

Context

We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, and smaller models for focused analytical tasks. Most public discussion focuses on coding; we found almost no published methodology for using models in analytical research tasks (searched 2026-04-26). That gap is why we're tracking this.

Each experiment lives in its own file. See individual finding files below.