From a65c471a3f873871bc4e1e6097c6926e4286a6d6 Mon Sep 17 00:00:00 2001
From: claw
Date: Thu, 7 May 2026 12:47:03 -0700
Subject: [PATCH] finding 41: temporal ordering dependency analysis on kill-switch.md

New analytical lens testing whether models can identify sequential operations
where order matters but isn't mechanically enforced. GPT-5 finds systemic gaps
(WHY ordering matters), Opus finds inverted dangers (WHICH direction is
dangerous), Sonnet identifies themes without unique depth.
---
 ...1-temporal-ordering-dependency-analysis.md | 122 ++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 findings/2026-05-07-41-temporal-ordering-dependency-analysis.md

diff --git a/findings/2026-05-07-41-temporal-ordering-dependency-analysis.md b/findings/2026-05-07-41-temporal-ordering-dependency-analysis.md
new file mode 100644
index 0000000..4d1b6ca
--- /dev/null
+++ b/findings/2026-05-07-41-temporal-ordering-dependency-analysis.md
@@ -0,0 +1,122 @@
+# Finding 41: Temporal Ordering Dependency Analysis
+
+**Date:** 2026-05-07
+**Document:** gargoyle `kill-switch.md` (293 lines)
+**Analytical lens:** Temporal ordering dependencies: places where operations are described
+sequentially but nothing mechanically enforces that ordering
+
+## Experiment Design
+
+**Task:** Identify places where the document assumes operations happen in a specific
+sequence but doesn't mechanically enforce it, and where reordering (due to crashes,
+async events, operator timing, or message ordering) would violate correctness.
+
+**Key distinction from race condition analysis:** This is about SEQUENTIAL operations
+where order matters but isn't guaranteed, not about truly concurrent events.
+
+**Prompt structure:** Specified 5 focus areas (multi-step engagement/disengagement,
+cross-component coordination, recovery/restart, operator timing, event vs state ordering).
+Required per-finding format (dependency, assumed ordering, enforcement gap, violation
+scenario, impact). Excluded single-component bugs, hardware failures, pure race conditions.
+
+**Models:** GPT-5, Claude Opus 4.6, Claude Sonnet 4.6 via HAI proxy. No tools, no
+project context beyond the document itself. Single prompt, no conversation history.
+
+## Results
+
+| Model | Time | Output tokens | Reasoning tokens | Findings |
+|---|---|---|---|---|
+| GPT-5 | 122s | 12,437 | 9,856 | 12 |
+| Claude Opus 4.6 | 70s | 2,903 | (internal) | 9 |
+| Claude Sonnet 4.6 | 24s | 1,231 | (internal) | 7 |
+
+## Common Ground (all 3 identified)
+
+- Persistence write before decision engine termination (crash recovery gap)
+- Decision engine termination vs acceptance policy update ordering
+- Acceptance policy change before order cancellations
+- Event emission vs state change visibility
+- Application restart: components starting before kill state loads
+- Restrict→liquidate transition without state verification enforcement
+
+## GPT-5 Unique Findings
+
+1. **Acceptance policy authority conflict**: OM's "blind enforcement" + "other sources can set
+   policy" = no lock/lease/priority prevents a risk monitor clearing a transient alert and
+   setting OM back to "open" during engagement. Ordering matters BECAUSE no coordination exists.
+2. **Global vs per-user write ordering**: OM doesn't distinguish sources. Per-user "open"
+   arriving after global "engage" wins (last-writer-wins). No composite policy computation.
+3. **Disengage: policy open before operator release**: If release fires before OM applies
+   open policy, restarted engine's orders are rejected (false start).
+4. **Cancel-all position snapshot timing**: Liquidation "load all positions" doesn't wait
+   for fill/cancel events from step 1 to be ingested, so sizing is wrong.
+5. **Event ordering: per-user disengage vs global engage**: User lifecycle processes events
+   in delivery order, not logical order; could remove user from pending-release during
+   active global kill.
+6. **OM-unavailable: pending cancel-all not re-driven on recovery**: If OM was down during
+   engagement, cancels never fired and nothing persists a "cancel-all pending" instruction.
+
+## Claude Opus Unique Findings
+
+1. **Close-only policy accepts dying engine's close orders** (THE insight): Everyone worries
+   about reject-all blocking cancels, but LIQUIDATE mode's close-only policy is the dangerous
+   one: a strategy generating a close signal during the termination window gets it accepted.
+   "A dying-but-not-yet-dead decision engine makes a trading decision that passes the relaxed
+   acceptance policy." Safety mechanism becomes vulnerability.
+2. **OM restart: queued messages vs policy initialization**: Messages from the (terminated)
+   decision engine sit in OM's inbound queue. Queue drain may begin before policy initialization
+   completes, allowing zombie orders through.
+3. **Cold-start reconciliation gate vs market data**: After release, policy is "open" but
+   reconciliation hasn't verified positions. Market data feed (which survived engagement)
+   delivers ticks; engine may generate signals against stale positions before reconciliation
+   completes.
+
+## Claude Sonnet Assessment
+
+- Identified all common-ground themes (7 findings) with correct structure
+- No unique findings beyond what the other models found
+- Vaguer violation scenarios ("brief window" without specific interleavings)
+- Muted severity assessments ("inconsistent audit trail" where others saw "financial exposure")
+- Good for quick sanity check; not for deep temporal analysis
+
+## Key Insights
+
+### GPT-5 reasons about WHY ordering matters (not just THAT it matters)
+
+GPT-5's distinctive contribution: several findings (#5, #6, #11, #12) identify temporal
+issues arising not from incorrect sequencing of correctly-designed operations, but from
+the ABSENCE of mechanisms that would make ordering irrelevant.
+
+"Global always wins" is a temporal ordering assumption only because no composite policy
+computation exists. If OM computed effective policy from all sources, ordering wouldn't
+matter. GPT-5 identifies these "ordering matters because architecture is incomplete"
+findings, a level deeper than "A must happen before B."
+
+### Opus's inversion insight
+
+Opus finding #2 (close-only accepting dying engine's close orders) inverts the obvious
+concern direction. Everyone thinks "reject-all is dangerous for ordering" (blocks cancels).
+Opus finds LIQUIDATE mode is more dangerous temporally because it permits a SUBSET of
+automated orders, and a dying engine could produce exactly that subset. Consistent with
+Opus's pattern: finding where safety mechanisms become vulnerabilities.
+
+### Task-type positioning
+
+"Temporal ordering dependency analysis" sits between assumption-finding and race conditions:
+- Closer to assumptions (what must be true about sequencing?) → Sonnet performs adequately
+- Further from races (what happens with truly concurrent events?) → Sonnet doesn't fail
+- GPT-5 and Opus both excel but at different aspects (systemic gaps vs inverted dangers)
+
+## Practical Implications
+
+For temporal ordering analysis on architecture docs:
+- **GPT-5**: exhaustive coverage + systemic insights (why does ordering matter?)
+- **Opus**: inverted/non-obvious ordering dangers (which direction is actually dangerous?)
+- **Sonnet**: adequate sanity check but zero unique insights
+- Total unique findings after deduplication: ~15 distinct temporal dependencies from 293 lines
+
+## Model Hierarchy for This Task Type
+
+1. GPT-5: broadest, identifies systemic ordering issues + missing mechanisms (12 findings)
+2. Opus: fewer but includes the architecturally most significant insight (9 findings)
+3. Sonnet: correct themes, no unique depth, fast/cheap (7 findings)
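
Note on the "ordering matters because architecture is incomplete" point under Key Insights: the contrast between last-writer-wins and a composite policy can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not taken from `kill-switch.md`: the `Policy` enum, its restrictiveness ordering (open < close-only < reject-all), the source names `global_kill` and `risk_monitor`, and the "most restrictive request wins" rule.

```python
from enum import IntEnum
from typing import Dict, Iterable, Tuple


class Policy(IntEnum):
    """Hypothetical acceptance policies, ordered least to most restrictive."""
    OPEN = 0
    CLOSE_ONLY = 1
    REJECT_ALL = 2


# One policy update: (source name, requested policy). Source names are made up.
Write = Tuple[str, Policy]


def last_writer_wins(writes: Iterable[Write]) -> Policy:
    """Model of the reported gap: each write replaces the policy outright."""
    current = Policy.OPEN
    for _source, policy in writes:
        current = policy
    return current


def composite(writes: Iterable[Write]) -> Policy:
    """Keep the latest request per source, then enforce the most restrictive."""
    latest: Dict[str, Policy] = {}
    for source, policy in writes:
        latest[source] = policy
    return max(latest.values(), default=Policy.OPEN)


# The same two writes, delivered in both possible orders.
engage_then_clear = [
    ("global_kill", Policy.REJECT_ALL),
    ("risk_monitor", Policy.OPEN),
]
clear_then_engage = list(reversed(engage_then_clear))

print(last_writer_wins(engage_then_clear).name)  # OPEN (kill overridden)
print(last_writer_wins(clear_then_engage).name)  # REJECT_ALL
print(composite(engage_then_clear).name)         # REJECT_ALL
print(composite(clear_then_engage).name)         # REJECT_ALL
```

Under these assumptions the composite result does not depend on the order in which different sources write, which is the property the finding says the OM lacks. Ordering between writes from the same source still matters in this sketch, since only each source's latest request is kept.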