# Finding #65: Temporal Correctness Analysis

**Date:** 2026-05-10

**Analytical Lens:** Temporal Correctness Analysis (NEW)

**Document:** gargoyle's `aggregation.md` (239 lines)

## Summary

Tested a new analytical lens: identifying problems where time-dependent mechanisms (timers, timeouts, windows, expiration) may behave incorrectly under real-world temporal conditions.

## Results

| Model | Time | Output tokens | Issues found | Critical | High | Medium | Low |
|---|---|---|---|---|---|---|---|
| Claude Opus 4 | 38s | 1,042 | 8 | 1 | 3 | 3 | 1 |
| Claude Sonnet 4 | 22s | 938 | 7 | 0 | 2 | 4 | 1 |

## Common Ground (both models identified)

1. **Timer references invalid after restart:** When the aggregator crashes while timers are active, timer messages may fire after restart but reference groups that no longer exist.
2. **No monotonic clock specification:** The document specifies "fixed duration" timeouts but doesn't say which clock type measures them. System time changes could affect timeout accuracy.
3. **"First signal" timestamp ambiguity:** The document doesn't specify whether the timer starts from the signal's creation timestamp or from its arrival time.
4. **Window boundary conditions undefined:** Pattern-complete predicates with time constraints don't state whether their boundaries are inclusive or exclusive.
5. **Duration metric uses unspecified clock:** Telemetry duration could produce nonsensical values if the system clock changes during measurement.

## Opus Unique Findings

1. **Crash loses temporal context** (CRITICAL): After a crash/restart, the aggregator cannot reconstruct how much time remained on active windows. Timing state is lost, not just signal data.
2. **No time coordination between aggregator and strategy** (HIGH): Clock skew between processes in a distributed deployment makes pattern timing unreliable.
3. **Capacity vs timeout race condition** (MEDIUM): The evaluation order is unclear, which affects audit trail reliability.

## Sonnet Unique Findings

1. **Timeout drift in repeated operations** (MEDIUM): Processing overhead accumulates, causing calibration guidance to become inaccurate over time.
2. **Strategy-aggregator restart coordination gap** (MEDIUM): Non-atomic restart creates brief data inconsistency windows.

## Key Insight

"Temporal correctness" is distinct from "race conditions":

- **Race conditions:** What if events arrive in unexpected ORDER?
- **Temporal correctness:** What if TIME itself behaves unexpectedly?

The two lenses find different things and should be used together on any document with time-dependent mechanisms.

## Model Strengths

- **Opus:** Stronger on cross-component temporal coupling; found the critical temporal context loss issue
- **Sonnet:** Efficient at identifying core timer hazards; better on operational concerns like drift accumulation

## Limitations

GPT-5 was not tested because credentials were not available. Future experiments should include GPT-5 to compare reasoning model performance on this lens.

## Prompt Categories

The prompt specified 6 categories of temporal issues:

1. Clock anomalies (skew, NTP, DST, suspend/resume)
2. Timer reliability (delivery, post-crash, cancelled messages)
3. Window boundary conditions (inclusive/exclusive)
4. Ordering under delay (arrival vs timestamp)
5. Duration drift (cumulative timing jitter)
6. Temporal coupling (synchronized clock assumptions)
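
To make Common Ground items 2, 4, and 5 concrete, here is a minimal Python sketch of a window timer that reads an injectable clock and states its boundary rule explicitly. It is not taken from `aggregation.md`: `WindowTimer` and `FakeClock` are hypothetical names, and the half-open boundary and arrival-time start are assumptions chosen for illustration. The only real API used is the standard library's `time.monotonic`.

```python
import time
from typing import Callable


class FakeClock:
    """Manually controlled clock, used here to simulate a wall clock that steps."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class WindowTimer:
    """Hypothetical aggregation-window timer driven by an injectable clock."""

    def __init__(
        self,
        duration_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._duration_s = duration_s
        # Assumption: the window starts at arrival time, not at the
        # signal's own creation timestamp.
        self._started_at = clock()

    def elapsed(self) -> float:
        return self._clock() - self._started_at

    def expired(self) -> bool:
        # Assumption: half-open window [start, start + duration), so the
        # boundary instant itself counts as expired.
        return self.elapsed() >= self._duration_s


# Simulate a wall clock that is stepped back one hour (for example by NTP)
# ten seconds into a 30-second window.
wall = FakeClock(1_000_000.0)
wall_timer = WindowTimer(30.0, clock=wall)
wall.advance(10.0)
wall.now -= 3600.0
print("elapsed on stepped wall clock:", wall_timer.elapsed())  # negative value

# The default clock is time.monotonic, which never goes backwards, so
# elapsed time measured with it cannot become negative.
mono_timer = WindowTimer(30.0)
print("elapsed on monotonic clock:", mono_timer.elapsed())
```

A monotonic clock only addresses clock-change hazards within one running process. Python documents the reference point of `time.monotonic` as undefined, so its readings should not be relied on across a restart, and the crash-related findings (lost temporal context, stale timer messages) would need a separate mechanism.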