Finding #65: Temporal Correctness Analysis
Date: 2026-05-10
Analytical Lens: Temporal Correctness Analysis (NEW)
Document: gargoyle's aggregation.md (239 lines)
Summary
Tested a new analytical lens: identifying problems where time-dependent mechanisms (timers, timeouts, windows, expiration) may behave incorrectly under real-world temporal conditions.
Results
| Model | Time | Output tokens | Issues found | Critical | High | Medium | Low |
|---|---|---|---|---|---|---|---|
| Claude Opus 4 | 38s | 1,042 | 8 | 1 | 3 | 3 | 1 |
| Claude Sonnet 4 | 22s | 938 | 7 | 0 | 2 | 4 | 1 |
Common Ground (both models identified)
- Timer references invalid after restart — When aggregator crashes while timers are active, timer messages may fire after restart but reference invalid groups.
- No monotonic clock specification — Document specifies "fixed duration" timeouts but doesn't clarify clock type. System time changes could affect timeout accuracy.
- "First signal" timestamp ambiguity — Doesn't specify whether timer starts from signal creation timestamp or arrival time.
- Window boundary conditions undefined — Pattern-complete predicates with time constraints have undefined inclusive/exclusive boundaries.
- Duration metric uses unspecified clock — Telemetry duration could produce nonsensical values if system clock changes during measurement.
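The clock-type findings above can be illustrated with a minimal sketch. It assumes a Python implementation, which the reviewed document does not specify. The names `Deadline` and `measure_duration` are hypothetical and exist only for illustration. The idea is that timeouts and duration metrics read a monotonic clock, so NTP steps or manual changes to system time cannot shorten or lengthen them, or produce negative durations.

```python
import time


class Deadline:
    """Hypothetical fixed-duration timeout tracked on a monotonic clock."""

    def __init__(self, timeout_s, now=time.monotonic):
        # `now` is injectable so the behaviour can be tested with a fake clock.
        self._now = now
        self._expires_at = now() + timeout_s

    def remaining(self):
        """Seconds left before expiry, never negative."""
        return max(0.0, self._expires_at - self._now())

    def expired(self):
        return self._now() >= self._expires_at


def measure_duration(work):
    """Run `work()` and return (result, elapsed_seconds).

    time.monotonic() never goes backwards, so the elapsed value
    cannot be negative even if the wall clock is adjusted mid-run.
    """
    start = time.monotonic()
    result = work()
    return result, time.monotonic() - start
```

A wall-clock reading (`time.time()`) is still the right choice for timestamps shown to humans or compared across machines. The sketch only covers interval measurement within one process.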
Opus Unique Findings
- Crash loses temporal context (CRITICAL) — After crash/restart, cannot reconstruct how much time remained on active windows. Timing state is lost, not just signal data.
- No time coordination between aggregator and strategy (HIGH) — Clock skew between processes in distributed deployment makes pattern timing unreliable.
- Capacity vs timeout race condition (MEDIUM) — Unclear evaluation order affects audit trail reliability.
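One possible mitigation for the lost temporal context is sketched below. This is an assumption-laden illustration, not the design in aggregation.md: the JSON file layout and the functions `save_window` and `restore_window` are hypothetical. Because monotonic clock readings are not meaningful across process restarts, the sketch persists a wall-clock deadline and converts it back to a monotonic deadline on recovery.

```python
import json
import time


def save_window(path, group_id, timeout_s, wall_now=time.time):
    """Persist the absolute wall-clock deadline when a window opens."""
    record = {"group_id": group_id, "deadline_wall": wall_now() + timeout_s}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    return record


def restore_window(path, wall_now=time.time, mono_now=time.monotonic):
    """After a restart, recompute how much of the window remains.

    Returns (group_id, remaining_seconds, monotonic_deadline).
    A window whose deadline passed during the outage gets 0.0 remaining,
    so the caller can fire its timeout immediately.
    """
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    remaining = max(0.0, record["deadline_wall"] - wall_now())
    return record["group_id"], remaining, mono_now() + remaining
```

The trade-off is that the recovered remaining time inherits any wall-clock adjustment that happened while the process was down. That may be acceptable for coarse windows but would need review for short ones.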
Sonnet Unique Findings
- Timeout drift in repeated operations (MEDIUM) — Processing overhead accumulates causing calibration guidance to become inaccurate over time.
- Strategy-aggregator restart coordination gap (MEDIUM) — Non-atomic restart creates brief data inconsistency windows.
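The drift finding can be made concrete with a short sketch, assuming a repeated operation driven by a sleep loop (the reviewed document may schedule timers differently). The helper names `next_tick` and `run_periodic` are hypothetical. Each wake-up is computed from a fixed start time rather than from the end of the previous iteration, so processing overhead does not accumulate.

```python
import time


def next_tick(start, interval_s, now):
    """Return the first tick strictly after `now`, anchored to `start`."""
    ticks_elapsed = int((now - start) // interval_s) + 1
    return start + ticks_elapsed * interval_s


def run_periodic(task, interval_s, iterations,
                 clock=time.monotonic, sleep=time.sleep):
    """Run `task` repeatedly on a fixed grid of start + k * interval_s.

    A naive loop that calls sleep(interval_s) after each task drifts by
    the task's own run time on every iteration; this loop does not.
    """
    start = clock()
    for _ in range(iterations):
        task()
        target = next_tick(start, interval_s, clock())
        sleep(max(0.0, target - clock()))
```

If a task overruns one or more intervals, `next_tick` skips the missed ticks instead of firing a burst to catch up. Whether skipping or catching up is correct depends on the operation being scheduled.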
Key Insight
"Temporal correctness" is distinct from "race conditions":
- Race conditions: What if events arrive in unexpected ORDER?
- Temporal correctness: What if TIME itself behaves unexpectedly?
The two lenses surface different issues and should be used together on any document with time-dependent mechanisms.
Model Strengths
- Opus: Stronger on cross-component temporal coupling; found the critical temporal context loss issue
- Sonnet: Efficient at identifying core timer hazards, better on operational concerns like drift accumulation
Limitations
GPT-5 was not tested because credentials were unavailable. Future experiments should include GPT-5 to compare reasoning-model performance on this lens.
Prompt Categories
The prompt specified 6 categories of temporal issues:
- Clock anomalies (skew, NTP, DST, suspend/resume)
- Timer reliability (delivery, post-crash, cancelled messages)
- Window boundary conditions (inclusive/exclusive)
- Ordering under delay (arrival vs timestamp)
- Duration drift (cumulative timing jitter)
- Temporal coupling (synchronized clock assumptions)
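As an illustration of the window boundary category, here is a minimal sketch of one way to make boundaries unambiguous. The half-open convention `[start, start + length)` is an assumption chosen for the example. The source does not say which convention aggregation.md should adopt. The function names are hypothetical, and timestamps are integer milliseconds to avoid floating-point edge effects.

```python
def in_window(event_ts_ms, window_start_ms, window_len_ms):
    """Half-open membership test: start <= ts < start + length.

    An event exactly on the end boundary belongs to the next window,
    so adjacent windows never both claim (or both drop) an event.
    """
    return window_start_ms <= event_ts_ms < window_start_ms + window_len_ms


def window_index(event_ts_ms, origin_ms, window_len_ms):
    """Index of the tumbling window that contains the event."""
    return (event_ts_ms - origin_ms) // window_len_ms
```

Whatever convention is chosen, stating it explicitly in the specification removes the ambiguity both models flagged.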