New task type: distributed systems consistency analysis.
GPT-5 found 18 issues (using 4,416 reasoning tokens); Sonnet found 13.
Key insight: distributed systems analysis benefits from extended
reasoning. Sonnet found 72% as many issues as GPT-5 (13/18), which
places this task between race condition analysis (58%) and
assumption-finding (85%).
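The 72% figure is Sonnet's issue count divided by GPT-5's (13 / 18, about 0.722). A minimal Python sketch of that arithmetic follows. It ranks the new task against the two earlier ratios exactly as stated in this note. Their underlying issue counts are not given here, so they are hard-coded as assumed percentages.

```python
# Issue counts reported for the distributed systems consistency task.
gpt5_issues = 18
sonnet_issues = 13

# Sonnet's count as a fraction of GPT-5's count (13 / 18).
ratio = sonnet_issues / gpt5_issues

# Ratios for earlier task types, taken as stated in the note (raw counts assumed unknown).
prior_ratios = {"race condition analysis": 0.58, "assumption-finding": 0.85}

# Rank all tasks by ratio, lowest first (lowest = widest gap between the models).
ranked = sorted(
    {**prior_ratios, "distributed systems": ratio}.items(),
    key=lambda kv: kv[1],
)

for task, r in ranked:
    print(f"{task}: {r:.0%}")
# race condition analysis: 58%
# distributed systems: 72%
# assumption-finding: 85%
```

A lower ratio means a wider gap between the two models, so by this measure distributed systems analysis sits in the middle of the three task types.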