Finding #65: Temporal Correctness Analysis

Date: 2026-05-10
Analytical Lens: Temporal Correctness Analysis (NEW)
Document: gargoyle's aggregation.md (239 lines)

Summary

Tested a new analytical lens: identifying problems where time-dependent mechanisms (timers, timeouts, windows, expiration) may behave incorrectly under real-world temporal conditions.

Results

Model	Time	Output tokens	Issues found	Critical	High	Medium	Low
Claude Opus 4	38s	1,042	8	1	3	3	1
Claude Sonnet 4	22s	938	7	0	2	4	1

Common Ground (both models identified)

Timer references invalid after restart — If the aggregator crashes while timers are active, timer messages may fire after the restart but reference groups that are no longer valid.
No monotonic clock specification — Document specifies "fixed duration" timeouts but doesn't clarify clock type. System time changes could affect timeout accuracy.
"First signal" timestamp ambiguity — Doesn't specify whether timer starts from signal creation timestamp or arrival time.
Window boundary conditions undefined — Pattern-complete predicates with time constraints have undefined inclusive/exclusive boundaries.
Duration metric uses unspecified clock — Telemetry duration could produce nonsensical values if system clock changes during measurement.
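
The clock-type concern shared by the items above can be sketched in Python (an assumed language; the reviewed document does not say what the aggregator is written in). The `Deadline` class and its `now` parameter are hypothetical names used only for illustration. The sketch measures elapsed time with `time.monotonic()`, which never goes backwards and is unaffected by system clock updates, so neither the timeout nor a reported duration can turn negative or jump when the wall clock is adjusted.

```python
import time


class Deadline:
    """Hypothetical timeout helper that reads only the monotonic clock."""

    def __init__(self, timeout_s, now=time.monotonic):
        # `now` is injectable so the behavior can be tested with a fake clock.
        self._now = now
        self._started = now()
        self._timeout_s = timeout_s

    def elapsed(self):
        # Non-negative as long as `now` is monotonic.
        return self._now() - self._started

    def remaining(self):
        return max(0.0, self._timeout_s - self.elapsed())

    def expired(self):
        # Inclusive boundary: the window counts as expired at exactly timeout_s.
        return self.elapsed() >= self._timeout_s
```

The inclusive comparison in `expired()` is itself a design choice of the kind the boundary finding asks the document to state explicitly. How the monotonic clock behaves across suspend/resume varies by platform, so that case would still need its own decision.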

Opus Unique Findings

Crash loses temporal context (CRITICAL) — After crash/restart, cannot reconstruct how much time remained on active windows. Timing state is lost, not just signal data.
No time coordination between aggregator and strategy (HIGH) — Clock skew between processes in distributed deployment makes pattern timing unreliable.
Capacity vs timeout race condition (MEDIUM) — Unclear evaluation order affects audit trail reliability.
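
One possible mitigation for the critical finding, sketched in Python under stated assumptions: the aggregator persists an absolute deadline when a window opens and recomputes the remaining time after a restart. The function names, the JSON layout, and the `deadline_unix` field are hypothetical and do not come from the reviewed document. Monotonic clock readings are not meaningful across process restarts, so this sketch falls back to wall-clock time for the persisted value and accepts its exposure to clock adjustments.

```python
import json
import time


def snapshot_window(group_id, timeout_s, wall_now=time.time):
    """Serialize an absolute wall-clock deadline at the moment a window opens."""
    record = {"group_id": group_id, "deadline_unix": wall_now() + timeout_s}
    return json.dumps(record)


def remaining_after_restart(snapshot, wall_now=time.time):
    """Return seconds left on a restored window (0.0 if it expired while down)."""
    record = json.loads(snapshot)
    return max(0.0, record["deadline_unix"] - wall_now())
```

A window restored with `0.0` remaining expired during the outage. Whether such a window should fire immediately or be discarded is a policy decision this sketch leaves open.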

Sonnet Unique Findings

Timeout drift in repeated operations (MEDIUM) — Processing overhead accumulates, causing the calibration guidance to become inaccurate over time.
Strategy-aggregator restart coordination gap (MEDIUM) — Non-atomic restart creates brief data inconsistency windows.
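
The drift finding can be illustrated with a small Python simulation. It is only a sketch: the reviewed document does not say how the aggregator reschedules its timers, and both function names are hypothetical. The first function schedules each tick relative to when the previous one finished, so per-tick overhead accumulates. The second targets fixed offsets from the start time, so it does not.

```python
def relative_schedule(start, interval, overhead, ticks):
    """Tick times when each tick is scheduled `interval` after the last one ends."""
    times = []
    now = start
    for _ in range(ticks):
        now += interval + overhead  # overhead pushes every later tick back
        times.append(now)
    return times


def absolute_schedule(start, interval, ticks):
    """Tick times when tick n always targets start + n * interval."""
    return [start + n * interval for n in range(1, ticks + 1)]
```

With a 10-unit interval and 1 unit of overhead per tick, the relative schedule is 3 units late by the third tick, while the absolute schedule stays on its grid.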

Key Insight

"Temporal correctness" is distinct from "race conditions":

Race conditions: What if events arrive in unexpected ORDER?
Temporal correctness: What if TIME itself behaves unexpectedly?

The two lenses surface different issues and should be used together on any document with time-dependent mechanisms.

Model Strengths

Opus: Stronger on cross-component temporal coupling; found the critical temporal context loss issue
Sonnet: Efficient at identifying core timer hazards, better on operational concerns like drift accumulation

Limitations

GPT-5 was not tested because credentials were not available. Future experiments should include it to compare reasoning-model performance on this lens.

Prompt Categories

The prompt specified 6 categories of temporal issues:

Clock anomalies (skew, NTP, DST, suspend/resume)
Timer reliability (delivery, post-crash, cancelled messages)
Window boundary conditions (inclusive/exclusive)
Ordering under delay (arrival vs timestamp)
Duration drift (cumulative timing jitter)
Temporal coupling (synchronized clock assumptions)
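
As an illustration of the timer reliability category, the following Python sketch shows one common way to make late, cancelled, or post-crash timer messages harmless: each timer carries a token, and a fired message is ignored unless its token is still current. The `GroupTimers` class and its methods are hypothetical and are not taken from the reviewed document. The per-process `boot_id` is an assumption added so that tokens issued before a restart can never match tokens issued after it.

```python
import itertools
import uuid


class GroupTimers:
    """Hypothetical registry that tags each group's timer with a unique token."""

    def __init__(self):
        self._boot_id = uuid.uuid4().hex  # changes on every process start
        self._counter = itertools.count(1)
        self._tokens = {}

    def start(self, group_id):
        """Register a new timer for the group, replacing any earlier one."""
        token = (self._boot_id, next(self._counter))
        self._tokens[group_id] = token
        return token

    def cancel(self, group_id):
        self._tokens.pop(group_id, None)

    def is_current(self, group_id, token):
        """True only if this token belongs to the group's active timer."""
        return self._tokens.get(group_id) == token
```

A handler would call `is_current` when a timer message arrives and drop the message on `False`, which covers cancelled timers, superseded timers, and messages left over from before a restart.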
