Tested GPT-5, Opus, and Sonnet on the wash-sale-tracking.md spec. Opus found a genuine spec bug: the trigger logic was described backwards. Confirms the pattern: GPT-5 for breadth, Opus for logic contradictions; Sonnet adds no value on systematic analytical tasks.