claw 0c632c255a finding #39 : narrow framing does not close Sonnet-GPT-5 gap for semantic consistency

Tested open question from Finding #5: does narrow framing give Sonnet
GPT-5-level semantic analysis?

Result: NO. Narrow framing changes WHAT Sonnet looks for (redirects from
gaps to contradictions) but not HOW WELL it reasons. With narrow framing,
Sonnet reported 3 contradictions, but only 1 was genuine (the other 2 were
analytical errors or misreads). GPT-5 reported 4 findings, all genuine, with
precise reasoning.
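
The counts in this result can be read as a precision figure per model (genuine findings divided by reported findings). A minimal sketch of that arithmetic follows; the function name `precision` and the `runs` layout are illustrative assumptions, not part of the experiment tooling:

```python
from fractions import Fraction


def precision(genuine: int, reported: int) -> Fraction:
    """Share of reported findings that turned out to be genuine."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    return Fraction(genuine, reported)


# Counts taken from the result summary: (genuine, reported).
runs = {
    "Sonnet (narrow framing)": (1, 3),
    "GPT-5": (4, 4),
}

for model, (genuine, reported) in runs.items():
    p = precision(genuine, reported)
    print(f"{model}: {genuine}/{reported} genuine = {float(p):.0%}")
# Sonnet (narrow framing): 1/3 genuine = 33%
# GPT-5: 4/4 genuine = 100%
```

With a single run per model and only a handful of findings, these ratios are descriptive and should not be read as stable estimates.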

Key insight: framing controls scope, not reasoning depth. For tasks
requiring logical verification (contradictions, race conditions, invariant
violations), reasoning tokens are necessary; framing alone is insufficient.

Updated open-questions.md: marked Sonnet+narrow as answered, added new
question about Opus+narrow for contradiction detection.

2026-05-07 09:26:08 -07:00

2026-04-26-01-different-models-catch-different-things.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-02-cheap-model-narrow-lens-expensive.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-03-gpt5-times-out-on-complex.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-04-gpt5-defaults-to-delegation-claude.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-05-sonnet-is-fast-and-catches.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-06-single-agent-cant-handle-1000.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-07-emerging-role-assignments-pattern-not.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-27-08-bias-detection-all-models-catch.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-09-gapfinding-in-architecture-docs-gpt5.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-10-hiddenassumption-identification-gpt5s-reasoning-produces.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-11-hiddenassumption-identification-on-simpler-doc.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-12-sonnet-46-outperforms-expectations-on.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-07b-token-budget-matters-more-than.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-13-race-condition-identification-opus-excels.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-14-crosscomponent-interaction-analysis-gpt5-mini.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-15-design-coherence-analysis.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-16-specification-completeness-sonnet-45-produces.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-18-temporal-boundary-analysis-gpt5-is.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-19-union-coverage-test-gpt5-mini.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-20-invariant-violation-path-analysis-gpt5.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-21-reasoning-effort-lowmediumhigh-has-negligible.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-22-silent-correctness-failures-new-analytical.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-23-regulatory-compliance-analysis-gpt5-finds.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-24-design-improvement-proposals-gpt5-excels.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-25-contradiction-detection-new-task-type.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-26-missingfeature-identification-is-promptable-across.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-27-design-coherence-on-riskcontrolsmd-gpt5.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-28-crossdocument-consistency-analysis-new-task.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-29-adversarial-manipulation-analysis-new-task.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-06-31-spec-gap-analysis-continuous-risk-monitoring.md

finding #31 : spec-gap analysis on continuous-risk-monitoring.md

2026-05-06 08:27:00 -07:00

2026-05-06-32-testability-analysis-wash-sale-tracking.md

experiment #32 : testability analysis — new analytical lens

2026-05-06 10:09:05 -07:00

2026-05-06-33-observability-gap-analysis-aggregation.md

feat: experiment #33 — observability gap analysis on aggregation.md

2026-05-06 11:49:05 -07:00

2026-05-06-34-information-flow-hazard-analysis.md

finding #34 : information flow hazard analysis on lot-accounting.md

2026-05-06 18:29:06 -07:00

2026-05-07-35-adversarial-ensemble-critique-extend.md

finding #35 : adversarial ensemble (critique+extend) produces 30% more coverage

2026-05-06 21:29:17 -07:00

2026-05-07-36-compositional-interface-analysis.md

Finding #36 : Compositional interface analysis - two-document interface assumptions

2026-05-07 02:48:46 -07:00

2026-05-07-37-crossdoc-consistency-tightlycoupled-risk-docs.md

finding 37: cross-doc consistency on tightly coupled risk docs

2026-05-07 04:29:23 -07:00

2026-05-07-38-regulatory-compliance-gap-analysis.md

finding #38 : regulatory compliance gap analysis (FINRA/PDT domain knowledge test)

2026-05-07 07:47:11 -07:00

2026-05-07-39-narrow-framing-does-not-close-sonnet-gpt5-gap.md

finding #39 : narrow framing does not close Sonnet-GPT-5 gap for semantic consistency

2026-05-07 09:26:08 -07:00

README.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

README.md

Model Findings — Analytical & Research Work

Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.

Started: 2026-04-26

Context

We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, smaller models for focused analytical tasks. Most public discussion is about coding. We found almost no published methodology for using models in analytical research tasks (searched 2026-04-26). That gap is why we're tracking this.

Each experiment lives in its own file. See individual finding files below.