Finding 28: Regulatory compliance analysis on wash sale tracking

- GPT-5 most comprehensive on IRS-specific rules (18 findings, 9,600 reasoning tokens)
- Sonnet fast first-pass (14 findings in 25s)
- Opus high-density actionable (11 findings with clear remediation)
- New insight: domain expertise tasks favor GPT-5 reasoning depth
- Updated model assignment for compliance review workflow
@@ -0,0 +1,107 @@
# Finding 28: Regulatory Compliance Assumption-Finding on Wash Sale Tracking

**Date:** 2026-05-11
**Task:** Regulatory compliance assumption-finding on tax law specification
**Document:** `wash-sale-tracking.md` (127 lines)
**Topic:** IRC §1091 wash sale detection, disallowed loss computation, holding period tacking

## Summary

GPT-5 is most comprehensive on domain-specific IRS rules; Opus produces high-density actionable recommendations; Sonnet is strong on structural compliance gaps.

## Results

| Model | Time | Output tokens | Reasoning tokens | Findings | Critical | High | Medium |
|---|---|---|---|---|---|---|---|
| GPT-5 | ~160s | 2,832 | 9,600 | 18 | 2 | 11 | 5 |
| Claude Sonnet 4.6 | ~25s | 1,614 | (internal) | 14 | 3 | 5 | 6 |
| Claude Opus 4.6 | ~77s | 2,688 | (internal) | 11 | 2 | 4 | 5 |

## Common Ground (All 3 Identified)

- "Substantially identical" definition is too narrow (misses options, convertibles, ADRs)
- No cross-account/cross-entity detection (the IRS rule also applies to purchases by a spouse or a controlled corporation)
- Trade date vs settlement date ambiguity
- Short sale wash sale rules not addressed
- Corporate action interactions (splits, mergers creating replacement lots)
- DRIP purchases should trigger wash sale detection
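The detection gaps listed above (cross-account coverage, trade date as the reference date, DRIP purchases as acquisitions) can be illustrated with a minimal sketch of the §1091 window test: an acquisition counts as a candidate replacement if it falls within 30 days before or after the loss sale. This is a sketch under stated assumptions, not the design in `wash-sale-tracking.md`: the `Acquisition` type, its fields, and the use of a shared `security_id` to stand in for "substantially identical" are all hypothetical, and the window is measured on trade dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Window from IRC §1091: 30 days before through 30 days after the loss sale.
WINDOW = timedelta(days=30)


@dataclass(frozen=True)
class Acquisition:
    account: str      # hypothetical: own, spouse, or controlled-entity account
    security_id: str  # hypothetical: shared by "substantially identical" securities
    trade_date: date  # assumption: trade date, not settlement date
    source: str       # e.g. "buy", "drip", "option_exercise"


def in_wash_window(sale_trade_date: date, acq: Acquisition) -> bool:
    """True if the acquisition falls within 30 days either side of the sale."""
    return abs(acq.trade_date - sale_trade_date) <= WINDOW


def candidate_replacements(sale_security_id, sale_trade_date, acquisitions):
    """Return acquisitions in ANY account that could trigger a wash sale."""
    return [
        a for a in acquisitions
        if a.security_id == sale_security_id
        and in_wash_window(sale_trade_date, a)
    ]
```

Because the filter ignores `account` and `source`, a DRIP reinvestment in a spouse's account is treated the same as a direct purchase in the selling account.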

## GPT-5 Unique Findings

1. **IRA/Roth IRA permanent disallowance:** If the replacement is in an IRA, the loss is PERMANENTLY disallowed (no basis adjustment permitted), not just deferred. The doc's model is incorrect for IRA replacements.
2. **Per-share vs lot-level granularity:** The doc says "replacement lot's effective cost basis", but the IRS requires per-share matching within lots.
3. **Multiple replacement lot allocation order:** The IRS requires FIFO chronological matching; the doc's simple min() formula doesn't handle multiple replacements.
4. **Pre-sale purchase "still held" requirement:** Shares bought within the 30 days before the sale must still be held ON the sale date to count as replacement.
5. **Replacement chain propagation:** Successive wash sales need to layer basis adjustments through multiple lots.
6. **§475(f) mark-to-market trader carve-out:** Traders who have made the §475(f) mark-to-market election are exempt from wash sale rules.
7. **Global caps across multiple replacement lots:** The sum of disallowed losses can't exceed the total loss.
8. **Conversions/exercise/assignment as acquisitions:** Option exercise counts as an acquisition.
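Findings 1, 2, 3, and 7 together describe an allocation procedure, which the following minimal sketch tries to make concrete: loss shares are matched per share against replacement lots in acquisition order, an IRA replacement receives no basis adjustment, and the total disallowed amount can never exceed the loss on the shares sold. The names `ReplacementLot` and `allocate_disallowed_loss` are hypothetical, and the sketch assumes a uniform loss per share and lots already sorted by acquisition date.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ReplacementLot:
    lot_id: str
    shares: Decimal        # shares not yet matched to an earlier loss sale
    in_ira: bool = False   # replacement acquired inside an IRA/Roth IRA


def allocate_disallowed_loss(loss_per_share, shares_sold, lots):
    """Match loss shares to replacement lots in the order given (earliest first).

    Returns (allocations, allowed_loss); each allocation is a tuple
    (lot_id, matched_shares, disallowed_amount, basis_adjustment).
    Mutates lot.shares so a share is never matched twice.
    """
    remaining = shares_sold
    allocations = []
    for lot in lots:
        if remaining <= 0:
            break
        matched = min(remaining, lot.shares)  # per-share matching within the lot
        if matched <= 0:
            continue
        disallowed = matched * loss_per_share
        # IRA replacement: loss is disallowed but basis is NOT adjusted.
        basis_adjustment = Decimal("0") if lot.in_ira else disallowed
        allocations.append((lot.lot_id, matched, disallowed, basis_adjustment))
        lot.shares -= matched
        remaining -= matched
    # Unmatched shares keep their loss, so disallowed + allowed == total loss.
    allowed_loss = remaining * loss_per_share
    return allocations, allowed_loss
```

Since matched shares can never exceed `shares_sold`, the global cap in finding 7 holds by construction rather than needing a separate check.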

## Sonnet Unique Findings

1. Constructive sales and short-against-the-box interactions
2. Related party trust/estate transactions (IRC §1091 extends to trusts)
3. Year-end timing boundary issues (December losses with January replacements span tax years)
4. Broker 1099-B reconciliation differences creating user confusion
5. Fractional share rounding errors compounding over many wash sales

## Opus Unique Findings

1. **Holding period tacking should ADD periods, not just backdate:** The doc says "backdated", but the correct rule is to sum the holding periods.
2. Same-day transaction ordering race condition
3. (Note: one of the 11 findings referenced the wrong document, a rare analytical error.)
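The difference between adding holding periods and backdating shows up whenever there is a gap between the loss sale and the replacement purchase. The sketch below compares the two under one possible reading of "backdated" (treating the replacement as acquired on the original lot's acquisition date); that reading, the function names, and the plain calendar-day count are assumptions for illustration, and the sketch ignores day-count conventions such as starting the period the day after acquisition.

```python
from datetime import date


def tacked_holding_days(orig_acquired, orig_sold, repl_acquired, as_of):
    """Replacement holding period with the original lot's period ADDED."""
    original_days = (orig_sold - orig_acquired).days
    replacement_days = (as_of - repl_acquired).days
    return original_days + replacement_days


def backdated_holding_days(orig_acquired, as_of):
    """Assumed reading of "backdated": count from the original acquisition date."""
    return (as_of - orig_acquired).days


orig_acquired = date(2025, 1, 10)
orig_sold = date(2025, 12, 1)       # loss sale
repl_acquired = date(2025, 12, 20)  # replacement bought 19 days later
as_of = date(2026, 2, 1)

tacked = tacked_holding_days(orig_acquired, orig_sold, repl_acquired, as_of)
backdated = backdated_holding_days(orig_acquired, as_of)

# Backdating also counts the 19 days when no shares were held.
gap_days = (repl_acquired - orig_sold).days
print(tacked, backdated, backdated - tacked == gap_days)
```

In this example the backdated count passes 365 days while the summed count is lower by the 19-day gap, which matters when a position sits near the long-term threshold.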

## Key Insights

### Domain Expertise Matters

This experiment tested a domain-specific analytical task (IRS tax law) rather than general architectural analysis. The results show:

1. **GPT-5's reasoning tokens excel at regulatory specificity.** It consistently cited correct IRS publication sections, code numbers, and regulatory references. The 9,600 reasoning tokens appear to have been spent on VERIFICATION.

2. **Opus is MORE reliable on regulatory tasks than general architecture review.** 10/11 findings were precise and actionable. Domain specificity constrains Opus's tendency toward exploratory analysis.

3. **Sonnet is well-suited for compliance first-pass.** At 25s for 14 findings, it provides good breadth for identifying areas of concern.
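To put the speed comparison on a common scale, the Results table can be converted into findings per minute. This is only a sketch of the arithmetic: the times are the approximate wall-clock values from the table, and the metric counts findings without weighting them by quality or severity.

```python
# (seconds, findings) taken from the Results table; times are approximate.
results = {
    "GPT-5": (160, 18),
    "Claude Sonnet 4.6": (25, 14),
    "Claude Opus 4.6": (77, 11),
}


def findings_per_minute(seconds, findings):
    """Raw throughput: findings produced per minute of wall-clock time."""
    return findings * 60 / seconds


rates = {model: findings_per_minute(s, f) for model, (s, f) in results.items()}

# Print fastest first.
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.1f} findings/min")
```

On these numbers Sonnet's raw throughput is several times that of the other two models, which is consistent with using it as a first pass.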

### Severity Distribution

| Model | Critical | High | Medium | Total |
|---|---|---|---|---|
| GPT-5 | 2 | 11 | 5 | 18 |
| Sonnet | 3 | 5 | 6 | 14 |
| Opus | 2 | 4 | 5 | 11 |

Sonnet rated MORE findings as Critical (3 of 14) than GPT-5 (2 of 18) or Opus (2 of 11), despite finding fewer issues than GPT-5, suggesting different severity calibration.
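The calibration difference is easier to compare as a share of each model's findings rated Critical. A short sketch of that calculation, using only the counts from the severity table:

```python
# (critical, total) per model, from the severity table.
severity = {
    "GPT-5": (2, 18),
    "Sonnet": (3, 14),
    "Opus": (2, 11),
}

critical_share = {
    model: critical / total for model, (critical, total) in severity.items()
}

# Print highest Critical share first.
for model, share in sorted(critical_share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {share:.1%} of findings rated Critical")
```

With a single document and so few Critical findings per model, these shares are indicative only.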

## Practical Implication

### Updated Model Assignment for Compliance Review

1. **First pass: Sonnet.** Fast, broad coverage, identifies areas of concern
2. **Deep dive: GPT-5.** Regulatory specificity, correct citations, highest precision
3. **Recommendations: Opus.** Actionable remediation, design-level scope statements

## Prompt Used

```
Analyze this architecture document for regulatory compliance assumptions.

The document describes wash sale tracking for IRS tax compliance. Your task: identify
assumptions the design makes about regulatory requirements, edge cases where the
implementation might diverge from IRS rules, and places where the simplifications
described could lead to incorrect tax reporting.

Focus on:
1. IRS rule interpretation gaps
2. Edge cases the detection model might miss
3. Calculation edge cases that could produce incorrect disallowed amounts
4. Cross-account/cross-entity issues
5. Timing/settlement date issues
6. Corporate action interactions with wash sale tracking

For each finding, specify:
- The assumption or gap
- The relevant IRS rule or section (if known)
- How it could lead to incorrect tax reporting
- Severity (Critical/High/Medium/Low)
```