New analytical lens applied to lot-accounting.md (181 lines).
Tests how well models identify sequences of individually correct
operations that produce silently wrong financial results.
Results:
- GPT-5: 12 findings (137s, 10688 reasoning tokens) - tax law domain knowledge
- Opus: 8 findings (121s) - concurrent systems / crash recovery focus
- Sonnet: 8 findings (111s) - structural meta-analysis, highest-leverage finding
Key insight: This is the first experiment where domain-specific
knowledge (tax law) is the primary differentiator. Each model reasons
from a different knowledge domain: GPT-5 from tax law, Opus from
distributed systems, Sonnet from architecture patterns.
Sonnet produced the most architecturally significant finding: the
system's reconciliation mechanism confirms corruption rather than
detecting it, because it re-derives from LotClosed, which is itself the
corrupted source.
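
A minimal sketch of the circular-reconciliation problem described above.
Only the name LotClosed comes from lot-accounting.md; LotOpened, the
field names, the ledger value, and both reconcile functions are
hypothetical and exist only to illustrate the pattern. The sketch
assumes a corrupted cost basis on a LotClosed record and compares a
check that re-derives from LotClosed with one that re-derives from an
independent source.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LotOpened:  # hypothetical: independent record of the original basis
    lot_id: str
    cost_basis: Decimal


@dataclass(frozen=True)
class LotClosed:  # name from the source; fields are assumptions
    lot_id: str
    proceeds: Decimal
    cost_basis: Decimal  # copied at close time, so it can be corrupted


def gain_from_closes(closes):
    """Realized gain derived only from LotClosed records."""
    return sum((c.proceeds - c.cost_basis for c in closes), Decimal("0"))


def circular_reconcile(ledger_gain, closes):
    """Re-derives from LotClosed, the same source the ledger used."""
    return ledger_gain == gain_from_closes(closes)


def independent_reconcile(ledger_gain, opens, closes):
    """Re-derives the basis from LotOpened instead of trusting LotClosed."""
    basis = {o.lot_id: o.cost_basis for o in opens}
    expected = sum((c.proceeds - basis[c.lot_id] for c in closes), Decimal("0"))
    return ledger_gain == expected


opens = [LotOpened("lot-1", Decimal("1000"))]
# Corrupted close: basis recorded as 900 instead of the original 1000.
closes = [LotClosed("lot-1", Decimal("1500"), Decimal("900"))]

ledger_gain = gain_from_closes(closes)  # ledger inherits the corruption

circular_ok = circular_reconcile(ledger_gain, closes)
independent_ok = independent_reconcile(ledger_gain, opens, closes)
print(circular_ok, independent_ok)
```

In this sketch the circular check passes while the independent check
fails, because only the second one compares the ledger against data
that the corruption did not touch.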