Rodin ac55ecdb98 Finding 28: Regulatory compliance analysis on wash sale tracking

- GPT-5 most comprehensive on IRS-specific rules (18 findings, 9600 reasoning tokens)
- Sonnet fast first-pass (14 findings in 25s)
- Opus high-density actionable (11 findings with clear remediation)
- New insight: domain expertise tasks favor GPT-5 reasoning depth
- Updated model assignment for compliance review workflow

2026-05-11 00:29:12 -07:00

5.1 KiB


Finding 28: Regulatory Compliance Assumption-Finding on Wash Sale Tracking

Date: 2026-05-11
Task: Regulatory compliance assumption-finding on tax law specification
Document: wash-sale-tracking.md (127 lines)
Topic: IRC §1091 wash sale detection, disallowed loss computation, holding period tacking

Summary

GPT-5 is most comprehensive on domain-specific IRS rules; Opus produces high-density actionable recommendations; Sonnet is strong on structural compliance gaps.

Results

Model	Time	Output tokens	Reasoning tokens	Findings	Critical	High	Medium
GPT-5	~160s	2,832	9,600	18	2	11	5
Claude Sonnet 4.6	~25s	1,614	(internal)	14	3	5	6
Claude Opus 4.6	~77s	2,688	(internal)	11	2	4	5
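
The raw counts can be normalized to make the speed and density trade-off easier to compare. The sketch below derives findings per minute and output tokens per finding from the table above. The times are the approximate values reported in the table, and the dictionary layout and function names are illustrative assumptions, not part of the experiment tooling.

```python
# Figures copied from the results table; times are approximate (~).
RESULTS = {
    "GPT-5": {"seconds": 160, "output_tokens": 2832, "findings": 18},
    "Claude Sonnet 4.6": {"seconds": 25, "output_tokens": 1614, "findings": 14},
    "Claude Opus 4.6": {"seconds": 77, "output_tokens": 2688, "findings": 11},
}


def findings_per_minute(row: dict) -> float:
    """Findings produced per minute of wall-clock time."""
    return row["findings"] / row["seconds"] * 60


def tokens_per_finding(row: dict) -> float:
    """Visible output tokens spent per reported finding."""
    return row["output_tokens"] / row["findings"]


for model, row in RESULTS.items():
    print(
        f"{model}: {findings_per_minute(row):.1f} findings/min, "
        f"{tokens_per_finding(row):.0f} output tokens/finding"
    )
```

On these figures Sonnet produces roughly five times as many findings per minute as GPT-5 (about 34 versus about 7), while Opus spends the most output tokens per finding (about 244).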

Common Ground (All 3 Identified)

"Substantially identical" definition too narrow (misses options, convertibles, ADRs)
No cross-account/cross-entity detection (IRS rule applies to spouse, controlled corp)
Trade date vs settlement date ambiguity
Short sale wash sale rules not addressed
Corporate action interactions (splits, mergers creating replacement lots)
DRIP purchases should trigger wash sale detection
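
The trade date versus settlement date item is easy to underestimate, so here is a minimal sketch of why the convention matters. It assumes the usual 30-days-before-or-after window of IRC §1091; the function name and the example dates are hypothetical, and the sketch deliberately does not decide which date convention the specification should adopt.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=30)  # 30 days before or after the loss sale


def in_wash_sale_window(sale: date, acquisition: date) -> bool:
    """True if the acquisition is within 30 days either side of the sale."""
    return abs(acquisition - sale) <= WINDOW


# Hypothetical example: a loss sale on Friday 2026-03-06 that settles on
# Monday 2026-03-09, and a repurchase on Monday 2026-04-06 that settles
# the next day. The weekend makes the two settlement lags differ.
by_trade_date = in_wash_sale_window(date(2026, 3, 6), date(2026, 4, 6))
by_settlement_date = in_wash_sale_window(date(2026, 3, 9), date(2026, 4, 7))

print(by_trade_date, by_settlement_date)
```

The same pair of transactions is 31 days apart by trade date and 29 days apart by settlement date, so the two conventions disagree on whether a wash sale occurred. The same helper also covers the year-end case, since it compares dates without regard to the tax year.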

GPT-5 Unique Findings

IRA/Roth IRA permanent disallowance — If replacement is in IRA, loss is PERMANENTLY disallowed (no basis adjustment permitted), not just deferred. The doc's model is incorrect for IRA replacements.
Per-share vs lot-level granularity — Doc says "replacement lot's effective cost basis" but IRS requires per-share matching within lots.
Multiple replacement lot allocation order — IRS requires FIFO chronological matching; doc's simple min() formula doesn't handle multiple replacements.
Pre-sale purchase "still held" requirement — Shares bought 30 days before sale must still be held ON sale date to count as replacement.
Replacement chain propagation — Successive wash sales need to layer basis adjustments through multiple lots.
§475(f) mark-to-market trader carve-out — MTM traders are exempt from wash sale rules.
Global caps across multiple replacement lots — Sum of disallowed losses can't exceed total loss.
Conversions/exercise/assignment as acquisitions — Option exercise counts as acquisition.
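
Several of these findings (per-share matching, chronological allocation across multiple replacement lots, the global cap, and the IRA case) can be illustrated together. The following is a sketch under stated assumptions, not the specification's algorithm: all class, field, and function names are hypothetical, dates are whatever convention the spec settles on, and the "still held" test, replacement chain propagation, and reuse of replacement shares across several loss sales are left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal

WINDOW = timedelta(days=30)


@dataclass
class ReplacementLot:
    lot_id: str
    acquired: date
    shares: int
    in_ira: bool = False  # hypothetical flag for IRA/Roth IRA accounts


@dataclass
class Match:
    lot_id: str
    shares: int
    disallowed: Decimal
    basis_adjustable: bool  # False for IRA replacements (no basis step-up)


def allocate_disallowed_loss(sale_date, shares_sold, loss_per_share, lots):
    """Match loss shares to replacement shares, earliest acquisition first."""
    eligible = sorted(
        (lot for lot in lots if abs(lot.acquired - sale_date) <= WINDOW),
        key=lambda lot: lot.acquired,
    )
    remaining = shares_sold  # caps total matches at the shares actually sold
    matches = []
    for lot in eligible:
        if remaining == 0:
            break
        matched = min(remaining, lot.shares)
        matches.append(
            Match(lot.lot_id, matched, matched * loss_per_share, not lot.in_ira)
        )
        remaining -= matched
    return matches


lots = [
    ReplacementLot("B", date(2026, 3, 20), 70, in_ira=True),
    ReplacementLot("A", date(2026, 3, 1), 60),
    ReplacementLot("C", date(2026, 6, 1), 50),  # outside the window
]
matches = allocate_disallowed_loss(date(2026, 3, 15), 100, Decimal("5.00"), lots)
for m in matches:
    print(m)
```

Here lot A absorbs 60 shares and lot B the remaining 40, so the disallowed total equals the full loss on 100 shares and can never exceed it. The 40 shares matched to the IRA lot are flagged as not basis-adjustable, which is the case a single deferred-loss model would get wrong.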

Sonnet Unique Findings

Constructive sales and short-against-the-box interactions
Related party trust/estate transactions (IRC §1091 extends to trusts)
Year-end timing boundary issues (December losses with January replacements span tax years)
Broker 1099-B reconciliation differences creating user confusion
Fractional share rounding errors compounding over many wash sales

Opus Unique Findings

Holding period tacking should ADD periods, not just backdate — Doc says "backdated" but correct rule is sum of holding periods
Same-day transaction ordering race condition
(Note: One finding referenced wrong document — rare analytical error)
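
The difference between adding holding periods and backdating is the gap between the loss sale and the replacement purchase. A minimal sketch with hypothetical dates and function names follows; it uses plain calendar-day differences and ignores day-count conventions such as whether the acquisition day itself is counted.

```python
from datetime import date


def tacked_holding_days(orig_acquired, orig_sold, repl_acquired, as_of):
    """Sum of the original lot's holding period and the replacement's."""
    return (orig_sold - orig_acquired).days + (as_of - repl_acquired).days


def backdated_holding_days(orig_acquired, as_of):
    """Holding period if the replacement is simply backdated to the original buy."""
    return (as_of - orig_acquired).days


orig_acquired = date(2025, 1, 10)
orig_sold = date(2025, 6, 10)      # sold at a loss
repl_acquired = date(2025, 6, 25)  # replacement bought 15 days later
as_of = date(2025, 12, 31)

tacked = tacked_holding_days(orig_acquired, orig_sold, repl_acquired, as_of)
backdated = backdated_holding_days(orig_acquired, as_of)
print(tacked, backdated, backdated - tacked)
```

In this example backdating overstates the holding period by the 15 days during which no shares were held, which could matter for a lot sitting near the long-term threshold.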

Key Insights

Domain Expertise Matters

This experiment tested a domain-specific analytical task (IRS tax law) rather than general architectural analysis. The results show:

GPT-5's reasoning tokens excel at regulatory specificity. It consistently cited correct IRS publication sections, code sections, and regulatory references. The 9,600 reasoning tokens appear to have been spent on verification.
Opus is more reliable on regulatory tasks than on general architecture review. 10 of 11 findings were precise and actionable; the remaining one referenced the wrong document. Domain specificity appears to constrain Opus's tendency toward exploratory analysis.
Sonnet is well suited to a compliance first pass. At 25s for 14 findings, it provides good breadth for identifying areas of concern.

Severity Distribution

Model	Critical	High	Medium	Total
GPT-5	2	11	5	18
Sonnet	3	5	6	14
Opus	2	4	5	11

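Sonnet rated the most findings as Critical (3 of 14, about 21%) despite reporting fewer findings than GPT-5 (2 of 18, about 11%); Opus falls in between (2 of 11, about 18%). This suggests the models calibrate severity differently.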

Practical Implication

Updated Model Assignment for Compliance Review

First pass: Sonnet — Fast, broad coverage, identifies areas of concern
Deep dive: GPT-5 — Regulatory specificity, correct citations, highest precision
Recommendations: Opus — Actionable remediation, design-level scope statements

Prompt Used

Analyze this architecture document for regulatory compliance assumptions.

The document describes wash sale tracking for IRS tax compliance. Your task: identify
assumptions the design makes about regulatory requirements, edge cases where the
implementation might diverge from IRS rules, and places where the simplifications
described could lead to incorrect tax reporting.

Focus on:
1. IRS rule interpretation gaps
2. Edge cases the detection model might miss
3. Calculation edge cases that could produce incorrect disallowed amounts
4. Cross-account/cross-entity issues
5. Timing/settlement date issues
6. Corporate action interactions with wash sale tracking

For each finding, specify:
- The assumption or gap
- The relevant IRS rule or section (if known)
- How it could lead to incorrect tax reporting
- Severity (Critical/High/Medium/Low)
