diff --git a/findings/2026-05-10-security-boundary-analysis.md b/findings/2026-05-10-security-boundary-analysis.md
new file mode 100644
index 0000000..6126431
--- /dev/null
+++ b/findings/2026-05-10-security-boundary-analysis.md
@@ -0,0 +1,165 @@
+# Security Boundary Analysis: A New Analytical Lens
+
+**Date:** 2026-05-10
+**Task:** Identify security boundary violations and trust assumption gaps in gargoyle's
+`system-overview.md` (323 lines) — a high-level architecture document describing
+component interactions, ports, bounded contexts, and domain invariants.
+**How we used them:** Same document (full text) + same focused analytical question to
+both models via HAI proxy. Highly structured prompt specifying 5 categories of security
+analysis: trust boundary crossings, privilege escalation paths, data integrity assumptions,
+authentication/authorization gaps, and audit trail exploitability. Required specific
+output format per finding. No tools, no project context beyond the document itself.
+
+| Model | Time | Output tokens | Findings | Critical | High | Medium | Low |
+|---|---|---|---|---|---|---|---|
+| Claude Sonnet 4 | ~18s | 1,343 | 10 | 2 | 4 | 4 | 0 |
+| Claude Opus | ~180s* | 3,370 | 15 | 3 | 9 | 3 | 0 |
+
+\*Opus time is approximate due to a timeout during collection.
+
+## What They Found — Common Ground (both identified)
+
+Both models independently identified these core security concerns:
+
+1. **MarketDataIngestion trust (Finding 1 in both):** The shared tick distribution
+   mechanism lacks integrity verification — downstream components trust that ticks
+   are authentic without validation. A compromised adapter could inject false prices
+   affecting all users simultaneously.
+
+2. **BrokerAdapter fill trust:** Fill events are treated as authoritative ("ground truth")
+   without authentication. A compromised adapter could inject fabricated fills,
+   corrupting position state across all downstream components.
+
+3. **Instrument ID manipulation:** The resolution from raw ticker to `instrument_id`
+   at the ingestion boundary could be manipulated to cause trades in the wrong securities.
+
+4. **User isolation enforcement:** User instances are described as "isolated" but the
+   enforcement mechanism isn't specified. Both models questioned whether process-level
+   isolation prevents cross-user message injection.
+
+5. **Kill switch authorization:** The kill switch can be engaged by "Operator/System"
+   but authentication/authorization requirements aren't specified.
+
+6. **Audit trail integrity:** The `decision_id`/`signal_id` linking is referential
+   rather than cryptographically bound, allowing potential falsification.
+
+7. **PortfolioMonitor privilege:** PortfolioMonitor can issue "close-only" orders
+   directly to OrderManager with no authorization check described.
+
+8. **Reconciliation trust:** The reconciliation process trusts broker-provided data
+   without verification of authenticity.
+
+## Opus Unique Findings (not in Sonnet)
+
+1. **Signal reconnaissance via audit blind spot:** Since "signals are never persisted"
+   and only "relevant" signals are logged, a malicious strategy could emit thousands
+   of probing signals that never reach rejection or aggregation thresholds — leaving
+   no audit trail while conducting reconnaissance on risk limits and portfolio state.
+   **This is architecturally significant** — it identifies that the audit model has
+   a structural blind spot by design, not by oversight.
+
+2. **Paper trading state leakage:** The design's explicit "substitutability" principle
+   means the system cannot distinguish the paper simulator from the production broker.
+   Any configuration error or migration bug could cause simulated fills to pollute
+   production records.
+
+3. **Corporate action injection:** The "Lot adjusted" event for corporate actions lacks
+   specified authorization — an attacker with access could inject fake splits/dividends
+   to manipulate positions and cost basis.
+
+4. **Administrative action audit gap:** The domain events table lists trading events
+   but not administrative operations (risk limit changes, kill switch state, reconciliation
+   overrides). Operators could therefore change these without leaving an audit trail.
+
+5. **Cross-user event observation:** Although users have "isolated" instances, they all
+   subscribe to the same shared tick event stream. Depending on implementation, a
+   compromised instance might observe other users' tick consumption patterns through
+   timing attacks.
+
+## Sonnet Unique Findings (not in Opus)
+
+1. **Cross-user fill routing:** Sonnet explicitly identified that BrokerAdapter must
+   correctly route fills to the right user's OrderManager, but there's no verification
+   mechanism described. A compromised adapter could send User A's fills to User B.
+   *Opus mentioned adapter compromise for injection but didn't specifically address
+   the routing problem.*
+
+## Quality Assessment
+
+- **Claude Opus** produced the most findings (15 vs 10) with notably deeper reasoning
+  about second-order effects. The signal reconnaissance finding (#5 in Opus) is the
+  most architecturally significant discovery — it identifies that a core design choice
+  (transient signals for performance) creates a structural security vulnerability that
+  can't be fixed without changing the persistence model. The paper trading state
+  leakage finding shows reasoning about how substitutability principles can backfire.
+  Opus also uniquely identified the administrative audit gap — a compliance-critical
+  finding for a trading system.
+
+- **Claude Sonnet 4** was remarkably fast (~18s vs ~180s) and found 10 solid issues.
+  The cross-user fill routing finding shows good boundary reasoning. Sonnet's findings
+  were well-structured and actionable. However, Sonnet stayed closer to the explicit
+  trust boundaries in the document, while Opus reasoned about implications of design
+  principles (like substitutability and signal transience).
+
+## Key Insight — Security Boundary Analysis as a Task Type
+
+This is a genuinely new analytical lens that we had not tested before. Earlier lenses asked:
+- **Assumption-finding:** What must be true for this to work?
+- **Race condition identification:** What timing interleavings cause problems?
+- **Design coherence:** Does the document contradict itself?
+
+Security boundary analysis instead asks: **Where can a malicious or compromised component
+affect things beyond its intended scope?**
+
+This requires reasoning about:
+1. What each component can see (data visibility)
+2. What each component can do (action scope)
+3. How trust is established (or not) at boundaries
+4. What happens when trust assumptions are violated
+
+The models performed well because the document explicitly describes component
+boundaries and interactions. Both models successfully identified that "isolation"
+is claimed but not specified, that "ground truth" status doesn't mean "verified
+authentic," and that audit coverage has structural gaps.
+
+## Comparison to Previous Task Types
+
+| Analytical lens | Primary reasoning mode | Opus strength? | Sonnet strength? |
+|---|---|---|---|
+| Assumption-finding | What's unstated | ✓ | ✓ |
+| Race conditions | Temporal interleavings | ✓ | ✗ (per Finding #13) |
+| Design coherence | Self-contradiction | ✓ | Mixed |
+| Security boundaries | Adversarial scope | ✓✓ | ✓ |
+
+Security boundary analysis appears to favor reasoning models (Opus) because it
+requires modeling adversarial behavior and reasoning about trust transitivity.
+However, Sonnet performed better here than on race conditions, which suggests that
+security analysis depends more on boundary enumeration than on temporal reasoning.
+
+## Practical Implications
+
+1. **Security boundary analysis is viable for architecture review.** Both models
+   produced actionable findings that would matter for a real trading system.
+
+2. **The audit blind spot finding is worth pursuing.** Opus's insight that transient
+   signals create a reconnaissance capability points to a genuine security design flaw.
+   This should be added to gargoyle's security review backlog.
+
+3. **Run this lens on other architecture docs.** The technique worked well on a
+   system overview. Would it work on more detailed component docs? Would findings
+   overlap with assumption-finding, or remain distinct?
+
+4. **Opus's depth justifies the time cost for security-critical analysis.** The roughly
+   10x longer run (~180s vs ~18s) produced 50% more findings (15 vs 10) and higher-order
+   insights. For security review, the extra investment is worth it.
+
+## Next Experiments
+
+1. Run security boundary analysis on a detailed component doc (e.g., `order-execution.md`)
+   to see if findings overlap with Finding #12's assumption analysis.
+
+2. Test whether adding an explicit "adversarial actor model" to the prompt changes
+   the findings (e.g., "assume the Strategy Worker author is malicious").
+
+3. Compare against GPT-5 when available — does reasoning-token-heavy analysis
+   produce different security insights than Opus's internal reasoning?