Files

Rodin 8e64f8f012 finding(79): multi-model security review catches HTTPS bypass in GitHub client (PR #131)

Security-review-bot persona caught an HTTPS enforcement inconsistency in write-path
methods (PostReview, DeleteReview, RequestReviewer) that generalist reviewers missed.
Issue fixed within 30 minutes; all reviewers re-approved. Validates the value of a
specialized security persona in the multi-model pipeline.

2026-05-14 21:56:58 +00:00

2026-04-26-01-different-models-catch-different-things.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-02-cheap-model-narrow-lens-expensive.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-03-gpt5-times-out-on-complex.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-04-gpt5-defaults-to-delegation-claude.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-05-sonnet-is-fast-and-catches.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-06-single-agent-cant-handle-1000.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-26-07-emerging-role-assignments-pattern-not.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-04-27-08-bias-detection-all-models-catch.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-09-gapfinding-in-architecture-docs-gpt5.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-10-hiddenassumption-identification-gpt5s-reasoning-produces.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-11-hiddenassumption-identification-on-simpler-doc.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-02-12-sonnet-46-outperforms-expectations-on.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-07b-token-budget-matters-more-than.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-13-race-condition-identification-opus-excels.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-14-crosscomponent-interaction-analysis-gpt5-mini.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-15-design-coherence-analysis.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-03-16-specification-completeness-sonnet-45-produces.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-18-temporal-boundary-analysis-gpt5-is.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-19-union-coverage-test-gpt5-mini.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-20-invariant-violation-path-analysis-gpt5.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-04-21-reasoning-effort-lowmediumhigh-has-negligible.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-22-silent-correctness-failures-new-analytical.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-23-regulatory-compliance-analysis-gpt5-finds.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-24-design-improvement-proposals-gpt5-excels.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-25-contradiction-detection-new-task-type.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-26-missingfeature-identification-is-promptable-across.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-27-design-coherence-on-riskcontrolsmd-gpt5.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-28-crossdocument-consistency-analysis-new-task.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-05-29-adversarial-manipulation-analysis-new-task.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

2026-05-06-31-spec-gap-analysis-continuous-risk-monitoring.md

finding #31: spec-gap analysis on continuous-risk-monitoring.md

2026-05-06 08:27:00 -07:00

2026-05-06-32-testability-analysis-wash-sale-tracking.md

experiment #32 : testability analysis — new analytical lens

2026-05-06 10:09:05 -07:00

2026-05-06-33-observability-gap-analysis-aggregation.md

feat: experiment #33 — observability gap analysis on aggregation.md

2026-05-06 11:49:05 -07:00

2026-05-06-34-information-flow-hazard-analysis.md

finding #34: information flow hazard analysis on lot-accounting.md

2026-05-06 18:29:06 -07:00

2026-05-07-35-adversarial-ensemble-critique-extend.md

finding #35: adversarial ensemble (critique+extend) produces 30% more coverage

2026-05-06 21:29:17 -07:00

2026-05-07-36-compositional-interface-analysis.md

Finding #36: Compositional interface analysis - two-document interface assumptions

2026-05-07 02:48:46 -07:00

2026-05-07-37-crossdoc-consistency-tightlycoupled-risk-docs.md

finding 37: cross-doc consistency on tightly coupled risk docs

2026-05-07 04:29:23 -07:00

2026-05-07-38-regulatory-compliance-gap-analysis.md

finding #38: regulatory compliance gap analysis (FINRA/PDT domain knowledge test)

2026-05-07 07:47:11 -07:00

2026-05-07-39-narrow-framing-does-not-close-sonnet-gpt5-gap.md

finding #39: narrow framing does not close Sonnet-GPT-5 gap for semantic consistency

2026-05-07 09:26:08 -07:00

2026-05-07-40-silent-data-corruption-paths-financial-accounting.md

Finding #40: Silent data corruption paths in financial accounting

2026-05-07 11:09:58 -07:00

2026-05-07-41-temporal-ordering-dependency-analysis.md

finding 41: temporal ordering dependency analysis on kill-switch.md

2026-05-07 12:47:03 -07:00

2026-05-07-42-failure-propagation-chain-analysis.md

finding #42: failure propagation chain analysis on system-overview.md

2026-05-07 14:28:26 -07:00

2026-05-07-43-opus-narrow-contradiction-detection.md

finding #43: opus + narrow framing for contradiction detection

2026-05-07 16:05:14 -07:00

2026-05-07-44-cross-doc-consistency-subtle-contradictions.md

finding 44: cross-doc consistency on closely related docs

2026-05-07 19:27:20 -07:00

2026-05-07-45-operator-decision-support-gap-analysis.md

finding 45: operator decision support gap analysis — new task type

2026-05-07 21:07:46 -07:00

2026-05-08-46-operational-blind-spot-analysis-observability.md

finding #46 : operational blind spot analysis — new task type

2026-05-08 00:27:23 -07:00

2026-05-08-47-emergent-behavior-rule-composition.md

finding 47: emergent behavior from rule composition - new analytical lens

2026-05-08 02:06:25 -07:00

2026-05-08-48-defense-in-depth-gap-analysis.md

finding #48: defense-in-depth gap analysis on auth-and-credentials.md

2026-05-08 03:47:09 -07:00

2026-05-08-49-adversarial-evasion-tampering-audit-log.md

finding 49: adversarial evasion/tampering analysis on audit-log.md

2026-05-08 09:09:58 -07:00

2026-05-08-50-concurrency-race-condition-analysis.md

finding 50: concurrency and race condition analysis lens

2026-05-08 11:06:06 -07:00

2026-05-08-51-implementation-ambiguity-analysis.md

finding 51: implementation ambiguity analysis — new analytical lens

2026-05-08 12:46:32 -07:00

2026-05-08-52-degraded-mode-propagation-analysis.md

finding #52: degraded-mode propagation analysis (new lens)

2026-05-08 14:29:29 -07:00

2026-05-09-53-unstated-constraints.md

Add finding #53: unstated constraint detection on state machines

2026-05-08 23:47:51 -07:00

2026-05-09-54-wash-sale-multi-model-analysis.md

finding #54: wash sale multi-model design review analysis

2026-05-09 03:35:12 -07:00

2026-05-09-55-state-reconstruction-correctness.md

experiment #55 : state reconstruction correctness — new analytical lens

2026-05-09 05:06:45 -07:00

2026-05-09-56-operational-burden-analysis.md

Finding #56: Operational burden analysis - new analytical lens

2026-05-09 06:46:29 -07:00

2026-05-09-57-event-flow-correctness-analysis.md

Finding #57: Event flow correctness analysis - new analytical lens

2026-05-09 13:29:58 -07:00

2026-05-09-58-state-machine-completeness-analysis.md

Finding 58: State machine completeness analysis on kill-switch.md

2026-05-09 15:06:32 -07:00

2026-05-09-59-convention-rule-gap-analysis.md

finding 59: convention rule gap analysis

2026-05-09 17:28:53 -07:00

2026-05-09-60-counterfactual-event-ordering-analysis.md

Add finding #60: Counterfactual event ordering analysis

2026-05-09 18:28:40 -07:00

2026-05-09-61-regulatory-completeness-analysis.md

finding #61: regulatory completeness analysis lens

2026-05-09 20:06:51 -07:00

2026-05-09-62-data-integrity-signal-flow.md

Finding #62: Data integrity analysis on signal-lifecycle.md

2026-05-09 22:26:46 -07:00

2026-05-10-63-external-system-assumptions.md

Finding #63: External System Assumptions Analysis

2026-05-10 02:27:53 -07:00

2026-05-10-64-regulatory-implementation-gap-analysis.md

Finding #64: Regulatory implementation gap analysis

2026-05-10 12:30:20 -07:00

2026-05-10-64-specification-gap-analysis.md

Finding #64: Specification gap analysis - new analytical lens

2026-05-10 11:10:33 -07:00

2026-05-10-65-concurrent-write-hazards-event-sourcing.md

Add finding #65: concurrent write hazards in event sourcing

2026-05-10 11:48:41 -07:00

2026-05-10-65-temporal-correctness-analysis.md

Add finding #65: Temporal correctness analysis (new lens)

2026-05-10 14:50:56 -07:00

2026-05-10-68-cross-context-contract-coherence.md

Finding #68: Cross-context contract coherence analysis

2026-05-10 21:47:27 -07:00

2026-05-10-boundary-contract-analysis.md

Add Finding #62: Boundary contract analysis (new analytical lens)

2026-05-09 23:35:36 -07:00

2026-05-10-boundary-violation-analysis.md

Add Finding #30: Boundary violation analysis on context README

2026-05-10 17:28:54 -07:00

2026-05-10-inter-document-contradiction-analysis.md

Add finding #67: Inter-document contradiction analysis

2026-05-10 18:32:45 -07:00

2026-05-10-security-boundary-analysis.md

Add security boundary analysis experiment (2026-05-10)

2026-05-10 16:05:45 -07:00

2026-05-11-audit-log-data-integrity-analysis.md

Add finding #25: Data integrity analysis on audit-log.md

2026-05-11 08:49:32 -07:00

2026-05-11-wash-sale-regulatory-compliance.md

Finding 28: Regulatory compliance analysis on wash sale tracking

2026-05-11 00:29:12 -07:00

2026-05-14-cgn-proxy-ssrf-multimodel-catch.md

finding #79: multi-model security review catches CGN + proxy-assisted SSRF gaps

2026-05-14 12:24:54 +00:00

2026-05-14-dev-loop-effectiveness-analysis.md

data: dev loop effectiveness analysis (2026-05-14)

2026-05-14 06:54:42 +00:00

2026-05-14-github-client-https-bypass-multimodel-catch.md

finding(79): multi-model security review catches HTTPS bypass in GitHub client (PR #131)

2026-05-14 21:56:58 +00:00

README.md

refactor(findings): split ALL-FINDINGS.md into per-experiment files

2026-05-06 07:15:50 -07:00

README.md

Model Findings — Analytical & Research Work

Tracking what actually works (and doesn't) when using AI models for research, analysis, bias detection, and document review — not coding.

Started: 2026-04-26

Context

We use multiple models in different roles: Claude Code (Opus/Sonnet) for generation, Sonnet + GPT-5 for independent dual review, smaller models for focused analytical tasks. Most public discussion is about coding. We found almost no published methodology for using models in analytical research tasks (searched 2026-04-26). That gap is why we're tracking this.

Each experiment lives in its own file. See individual finding files below.
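
Since each experiment lives in its own file, an index can be derived from the filenames alone. The following is a minimal sketch, not part of this repository: it assumes the naming pattern visible in the listing (a date, an optional experiment id such as 12 or 07b, then a slug, ending in .md). The names FINDING_RE, Finding, and parse_finding_name are hypothetical. A slug that itself begins with digits followed by a hyphen would be misread as an experiment id.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: YYYY-MM-DD, optional experiment id (digits plus an
# optional lowercase letter, e.g. "12" or "07b"), a slug, then ".md".
FINDING_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})-(?:(?P<num>\d+[a-z]?)-)?(?P<slug>.+)\.md$"
)


class Finding(NamedTuple):
    date: str               # date prefix from the filename
    number: Optional[str]   # experiment id, or None when the filename has none
    slug: str               # remainder of the filename without ".md"


def parse_finding_name(name: str) -> Optional[Finding]:
    """Split a finding filename into its parts; return None if it does not match."""
    match = FINDING_RE.match(name)
    if match is None:
        return None
    return Finding(match.group("date"), match.group("num"), match.group("slug"))


if __name__ == "__main__":
    for filename in (
        "2026-05-03-07b-token-budget-matters-more-than.md",
        "2026-05-10-boundary-contract-analysis.md",
        "README.md",
    ):
        print(filename, "->", parse_finding_name(filename))
```

Files without an experiment id in the name (several of the 2026-05-10 and later entries) come back with number set to None, so they can be listed separately or matched to a finding number by hand.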