9 Commits

Author SHA1 Message Date
Rodin 8cfabfdc55 experiment #32: testability analysis — new analytical lens
Tested GPT-5, Opus, Sonnet on wash-sale-tracking.md spec.
Opus found a genuine spec bug (trigger logic described backwards).
Confirms pattern: GPT-5 for breadth, Opus for logic contradictions,
Sonnet adds no value for systematic analytical tasks.
2026-05-06 10:09:05 -07:00
Rodin ee3063997a finding #31: spec-gap analysis on continuous-risk-monitoring.md
New task type: specification gap/completeness analysis (vs adversarial gaming).
GPT-5 dominates count (25 findings), Opus produces best single insight
(realized P&L non-reversibility violates de-escalation model assumption).
Sonnet adds no unique value for this task type — skip for completeness audits.
2026-05-06 08:27:00 -07:00
Rodin cfcad67baa feat: add generic review prompts and generation guide
- review-prompts/generic/sonnet.md: language-agnostic structural review
- review-prompts/generic/gpt5.md: language-agnostic semantic/domain review
- review-prompts/generic/opus.md: language-agnostic design coherence review
- review-prompts/GENERATE.md: meta-prompt for tailoring to any repo
- review-prompts/ORCHESTRATION.md: multi-model review orchestration pattern
2026-05-06 08:00:59 -07:00
Rodin a3aebc7cc1 docs(readme): add Reports section with links to REPORT.md and LESSONS.md
Explains what each file contains, that they're auto-regenerated weekly,
and includes generation timestamps.
2026-05-06 07:29:03 -07:00
Rodin b832f32a16 docs: add generation timestamps to REPORT.md and LESSONS.md
2026-05-06 07:26:48 -07:00
Rodin f865a0d778 docs: add research report and actionable lessons summary
REPORT.md — full analysis of 29 experiments: model strengths, task-type
mappings, meta-findings, cost-effectiveness, and open questions.

LESSONS.md — distilled operational playbook: which model for which task,
anti-patterns, decision framework, and the three core rules.
2026-05-06 07:24:12 -07:00
Rodin 6af8a6ee10 refactor(findings): split ALL-FINDINGS.md into per-experiment files
Break the monolithic 3249-line findings file into 29 individual files,
one per experiment. Each file is named YYYY-MM-DD-NN-slug.md for easy
chronological sorting and discovery.

No content changes — purely structural reorganization.
2026-05-06 07:15:50 -07:00
Rodin 1b108ff66e Initial publish: 29 findings, 6 prompts, methodology, open questions
Full comparative analysis of GPT-5, Claude Opus 4.6, Claude Sonnet 4.6,
GPT-4.1, and GPT-4.1 Mini on analytical tasks (not coding).

Contents:
- findings/ALL-FINDINGS.md — complete 3,249-line research log with all
  29 findings, methodology notes, and open questions
- prompts/ — 6 exact prompts used across experiments
- methodology.md — experimental setup and evaluation criteria
- open-questions.md — unanswered questions for future work
- README.md — overview and summary table

Key findings:
- Cross-document consistency: Opus is 2.4x faster with more findings
- Gap-finding: GPT-5 reasoning tokens find domain-specific gaps
- Race conditions: Opus excels at temporal interaction reasoning
- Bias detection: Signal-to-noise ratio > model capability
- Adversarial analysis: GPT-5 exhaustive, Opus qualitatively different

Signed-off-by: Rodin
2026-05-05 19:13:03 -07:00
rodin 4aea0d004b Initial commit
2026-05-06 02:10:14 +00:00