Rodin 6af8a6ee10 refactor(findings): split ALL-FINDINGS.md into per-experiment files

Break the monolithic 3249-line findings file into 29 individual files,
one per experiment. Each file is named YYYY-MM-DD-NN-slug.md for easy
chronological sorting and discovery.

No content changes — purely structural reorganization.

2026-05-06 07:15:50 -07:00

Finding 1: Different models catch different things (confirmed)

Date: 2026-04-26

Task: PR reviews on DDD reference docs (~6,600 lines across 18 files)

How we used them: Both models got the same task via the pr-review skill: fetch the diff, fetch full file content for changed files, and review against the PR description and the linked issue's acceptance criteria. Rich context: full diff, project CLAUDE.md conventions, issue body. Each reviewer ran independently in its own sub-agent with its own Gitea token. No cross-pollination.

GPT-5 caught SUMMARY.md verdict mismatches (Commanded classification, small teams classification) that Sonnet missed entirely (PR #375)
Sonnet caught a broken cross-reference link that GPT-5 missed (PR #378)
Takeaway: Different blind spots are real. Neither model is strictly better for analytical review — they complement each other. This is why we run two independent reviewers from different model families.
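
The setup described above (independent reviewers, same context, no shared output) can be sketched roughly as follows. This is a minimal illustration, not the actual pr-review skill: the `Reviewer` type, `run_independently`, `unique_catches`, the stub reviewers, and the finding labels are all hypothetical names introduced here, and real reviewers would call a model rather than return fixed sets.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, FrozenSet

# Hypothetical type: a reviewer takes the review context and returns
# a set of short finding labels.
Reviewer = Callable[[dict], FrozenSet[str]]


def run_independently(
    reviewers: Dict[str, Reviewer], context: dict
) -> Dict[str, FrozenSet[str]]:
    """Run every reviewer on its own copy of the context.

    No reviewer receives another reviewer's output, which mirrors the
    "no cross-pollination" rule.
    """
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {
            name: pool.submit(fn, dict(context)) for name, fn in reviewers.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


def unique_catches(
    results: Dict[str, FrozenSet[str]]
) -> Dict[str, FrozenSet[str]]:
    """For each reviewer, keep only the findings no other reviewer reported."""
    unique = {}
    for name, found in results.items():
        others = frozenset().union(
            *(f for n, f in results.items() if n != name)
        )
        unique[name] = found - others
    return unique


# Stub reviewers standing in for two model families (illustrative labels only).
def reviewer_a(context: dict) -> FrozenSet[str]:
    return frozenset({"verdict-mismatch", "shared-issue"})


def reviewer_b(context: dict) -> FrozenSet[str]:
    return frozenset({"broken-link", "shared-issue"})


results = run_independently(
    {"reviewer_a": reviewer_a, "reviewer_b": reviewer_b},
    {"diff": "...", "issue_body": "..."},
)
unique = unique_catches(results)
```

A non-empty entry for every reviewer in `unique` is the pattern this finding describes: each reviewer contributes something the other did not.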
