model-research/findings/2026-05-10-65-temporal-correctness-analysis.md

Finding #65: Temporal Correctness Analysis

Date: 2026-05-10
Analytical Lens: Temporal Correctness Analysis (NEW)
Document: gargoyle's aggregation.md (239 lines)

Summary

Tested a new analytical lens: identifying problems where time-dependent mechanisms (timers, timeouts, windows, expiration) may behave incorrectly under real-world temporal conditions.

Results

| Model | Time | Output tokens | Issues found | Critical | High | Medium | Low |
|---|---|---|---|---|---|---|---|
| Claude Opus 4 | 38s | 1,042 | 8 | 1 | 3 | 3 | 1 |
| Claude Sonnet 4 | 22s | 938 | 7 | 0 | 2 | 4 | 1 |

Common Ground (both models identified)

  1. Timer references invalid after restart — When aggregator crashes while timers are active, timer messages may fire after restart but reference invalid groups.
  2. No monotonic clock specification — Document specifies "fixed duration" timeouts but doesn't clarify clock type. System time changes could affect timeout accuracy.
  3. "First signal" timestamp ambiguity — Doesn't specify whether timer starts from signal creation timestamp or arrival time.
  4. Window boundary conditions undefined — Pattern-complete predicates with time constraints have undefined inclusive/exclusive boundaries.
  5. Duration metric uses unspecified clock — Telemetry duration could produce nonsensical values if system clock changes during measurement.
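Several of the shared findings come down to which clock is read and how the boundary is compared. The following is a minimal sketch, not taken from aggregation.md: `Window`, `FakeClock`, and every method name are hypothetical. It assumes timeouts are measured on a monotonic clock, that the window is the half-open interval [open, open + timeout), and that each timer message carries a generation number so a message that outlives its group can be discarded.

```python
import time
from typing import Callable


class FakeClock:
    """Hypothetical test clock that only moves when advance() is called."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class Window:
    """Hypothetical aggregation window; all names are illustrative."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock          # monotonic: unaffected by wall-clock changes
        self.generation = 0         # bumped whenever the window is re-armed
        self.opened_at = clock()

    def elapsed(self) -> float:
        # Duration derived from the same monotonic clock, so it cannot go negative.
        return self.clock() - self.opened_at

    def is_open(self) -> bool:
        # Half-open boundary: a signal at exactly timeout_s is outside the window.
        return self.elapsed() < self.timeout_s

    def reset(self) -> None:
        # Re-arm the window and invalidate timer messages issued earlier.
        self.generation += 1
        self.opened_at = self.clock()

    def accepts_timer(self, timer_generation: int) -> bool:
        # A timer message is honoured only if it matches the current generation.
        return timer_generation == self.generation
```

Injecting the clock keeps the boundary behaviour testable: with `FakeClock`, a test can advance time to exactly `timeout_s` and check which side of the boundary the window reports.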

Opus Unique Findings

  1. Crash loses temporal context (CRITICAL) — After crash/restart, cannot reconstruct how much time remained on active windows. Timing state is lost, not just signal data.
  2. No time coordination between aggregator and strategy (HIGH) — Clock skew between processes in distributed deployment makes pattern timing unreliable.
  3. Capacity vs timeout race condition (MEDIUM) — Unclear evaluation order affects audit trail reliability.
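Two of these findings can be illustrated with a short sketch under stated assumptions. Monotonic clock readings are generally not comparable across process restarts, so the sketch assumes the aggregator persists an absolute wall-clock deadline and recomputes the remaining time on startup, which trades restart recovery for exposure to wall-clock adjustments. It also assumes a fixed precedence in which capacity is checked before timeout. The function names and the JSON file layout are hypothetical, not part of aggregation.md.

```python
import json
import time
from typing import Optional


def save_deadline(path: str, remaining_s: float,
                  wall_now: Optional[float] = None) -> None:
    """Persist the window deadline as an absolute wall-clock time."""
    wall_now = time.time() if wall_now is None else wall_now
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"deadline_wall": wall_now + remaining_s}, f)


def restore_remaining(path: str, wall_now: Optional[float] = None) -> float:
    """After a restart, return the seconds left on the window (never negative)."""
    wall_now = time.time() if wall_now is None else wall_now
    with open(path, encoding="utf-8") as f:
        deadline = json.load(f)["deadline_wall"]
    return max(0.0, deadline - wall_now)


def close_reason(count: int, capacity: int,
                 elapsed_s: float, timeout_s: float) -> Optional[str]:
    """Assumed precedence: capacity wins when both conditions hold at once."""
    if count >= capacity:
        return "capacity"
    if elapsed_s >= timeout_s:
        return "timeout"
    return None
```

Whichever precedence is chosen, stating it once in a single function means the audit trail records the same close reason every time both conditions hold.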

Sonnet Unique Findings

  1. Timeout drift in repeated operations (MEDIUM) — Processing overhead accumulates causing calibration guidance to become inaccurate over time.
  2. Strategy-aggregator restart coordination gap (MEDIUM) — Non-atomic restart creates brief data inconsistency windows.
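The drift finding can be sketched as follows. This is an illustrative example, not the document's design: `run_periodic` and its parameters are hypothetical. It assumes each tick is scheduled against a fixed grid (`start + n * interval`) on a monotonic clock, so per-iteration processing overhead is absorbed instead of accumulating the way it does when the loop sleeps for a full interval after every task.

```python
import time
from typing import Callable


def run_periodic(task: Callable[[], None], interval_s: float, iterations: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Run task `iterations` times on a fixed schedule (hypothetical helper)."""
    start = clock()
    for n in range(1, iterations + 1):
        task()
        # Target the n-th grid point, not "now + interval", so overhead
        # from task() does not push later ticks further out.
        target = start + n * interval_s
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        # If delay <= 0 the task overran its slot; continue without sleeping.
```

With a 1-second interval and 0.25 seconds of work per tick, ten iterations finish 10 seconds after the start, whereas sleeping a full interval after each task would take 12.5 seconds.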

Key Insight

"Temporal correctness" is distinct from "race conditions":

  • Race conditions: What if events arrive in unexpected ORDER?
  • Temporal correctness: What if TIME itself behaves unexpectedly?

The two lenses surface different issues and should be used together on any document with time-dependent mechanisms.

Model Strengths

  • Opus: Stronger on cross-component temporal coupling, found the critical temporal context loss issue
  • Sonnet: Efficient at identifying core timer hazards, better on operational concerns like drift accumulation

Limitations

GPT-5 was not tested because credentials were unavailable. Future experiments should include GPT-5 to compare reasoning-model performance on this lens.

Prompt Categories

The prompt specified six categories of temporal issues:

  1. Clock anomalies (skew, NTP, DST, suspend/resume)
  2. Timer reliability (delivery, post-crash, cancelled messages)
  3. Window boundary conditions (inclusive/exclusive)
  4. Ordering under delay (arrival vs timestamp)
  5. Duration drift (cumulative timing jitter)
  6. Temporal coupling (synchronized clock assumptions)