diff --git a/findings/2026-05-10-65-temporal-correctness-analysis.md b/findings/2026-05-10-65-temporal-correctness-analysis.md
new file mode 100644
index 0000000..0a09ff2
--- /dev/null
+++ b/findings/2026-05-10-65-temporal-correctness-analysis.md
@@ -0,0 +1,79 @@
+# Finding #65: Temporal Correctness Analysis
+
+**Date:** 2026-05-10
+**Analytical Lens:** Temporal Correctness Analysis (NEW)
+**Document:** gargoyle's `aggregation.md` (239 lines)
+
+## Summary
+
+Tested a new analytical lens: identifying problems where time-dependent mechanisms
+(timers, timeouts, windows, expiration) may behave incorrectly under real-world
+temporal conditions.
+
+## Results
+
+| Model | Time | Output tokens | Issues found | Critical | High | Medium | Low |
+|---|---|---|---|---|---|---|---|
+| Claude Opus 4 | 38s | 1,042 | 8 | 1 | 3 | 3 | 1 |
+| Claude Sonnet 4 | 22s | 938 | 7 | 0 | 2 | 4 | 1 |
+
+## Common Ground (both models identified)
+
+1. **Timer references invalid after restart** — When aggregator crashes while timers
+   are active, timer messages may fire after restart but reference invalid groups.
+2. **No monotonic clock specification** — Document specifies "fixed duration" timeouts
+   but doesn't clarify clock type. System time changes could affect timeout accuracy.
+3. **"First signal" timestamp ambiguity** — Doesn't specify whether timer starts from
+   signal creation timestamp or arrival time.
+4. **Window boundary conditions undefined** — Pattern-complete predicates with time
+   constraints have undefined inclusive/exclusive boundaries.
+5. **Duration metric uses unspecified clock** — Telemetry duration could produce
+   nonsensical values if system clock changes during measurement.
+
+## Opus Unique Findings
+
+1. **Crash loses temporal context** (CRITICAL) — After crash/restart, cannot
+   reconstruct how much time remained on active windows. Timing state is lost, not
+   just signal data.
+2. **No time coordination between aggregator and strategy** (HIGH) — Clock skew
+   between processes in distributed deployment makes pattern timing unreliable.
+3. **Capacity vs timeout race condition** (MEDIUM) — Unclear evaluation order affects
+   audit trail reliability.
+
+## Sonnet Unique Findings
+
+1. **Timeout drift in repeated operations** (MEDIUM) — Processing overhead accumulates
+   causing calibration guidance to become inaccurate over time.
+2. **Strategy-aggregator restart coordination gap** (MEDIUM) — Non-atomic restart
+   creates brief data inconsistency windows.
+
+## Key Insight
+
+"Temporal correctness" is distinct from "race conditions":
+- **Race conditions:** What if events arrive in unexpected ORDER?
+- **Temporal correctness:** What if TIME itself behaves unexpectedly?
+
+Both lenses find different things and should be used together on any document with
+time-dependent mechanisms.
+
+## Model Strengths
+
+- **Opus:** Stronger on cross-component temporal coupling, found the critical temporal
+  context loss issue
+- **Sonnet:** Efficient at identifying core timer hazards, better on operational
+  concerns like drift accumulation
+
+## Limitations
+
+GPT-5 was not tested due to credential availability. Future experiments should include
+GPT-5 to compare reasoning model performance on this lens.
+
+## Prompt Categories
+
+The prompt specified 6 categories of temporal issues:
+1. Clock anomalies (skew, NTP, DST, suspend/resume)
+2. Timer reliability (delivery, post-crash, cancelled messages)
+3. Window boundary conditions (inclusive/exclusive)
+4. Ordering under delay (arrival vs timestamp)
+5. Duration drift (cumulative timing jitter)
+6. Temporal coupling (synchronized clock assumptions)
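
Three of the hazards listed in the findings (no monotonic clock specification, undefined window boundaries, and loss of remaining window time on crash) can be illustrated with a minimal sketch. It is not taken from `aggregation.md`: `WindowTimer` and all of its methods are hypothetical names, and the restore policy shown (time spent down does not count against the window) is only one possible assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class WindowTimer:
    """Hypothetical timer for one aggregation window (illustrative names only)."""

    duration_s: float
    started_mono: float  # reading from a monotonic clock, not wall-clock time

    @classmethod
    def start(cls, duration_s, now=None):
        # time.monotonic() is unaffected by NTP steps or manual clock changes.
        now = time.monotonic() if now is None else now
        return cls(duration_s=duration_s, started_mono=now)

    def remaining(self, now=None):
        now = time.monotonic() if now is None else now
        return max(0.0, self.duration_s - (now - self.started_mono))

    def contains(self, event_mono):
        # Half-open window [start, start + duration): the start instant is
        # inside the window, the end instant is outside it.
        return self.started_mono <= event_mono < self.started_mono + self.duration_s

    def snapshot(self, now=None):
        # Persist the remaining time, not the monotonic reading itself:
        # monotonic values are not comparable across process restarts.
        return {"duration_s": self.duration_s, "remaining_s": self.remaining(now)}

    @classmethod
    def restore(cls, snap, now=None):
        # Assumption: downtime is NOT charged to the window. Charging it
        # would require an additional wall-clock timestamp in the snapshot.
        now = time.monotonic() if now is None else now
        elapsed = snap["duration_s"] - snap["remaining_s"]
        return cls(duration_s=snap["duration_s"], started_mono=now - elapsed)
```

Passing `now` explicitly keeps the sketch deterministic under test. A real aggregator would also have to decide whether the timer starts from signal creation or signal arrival, which this sketch leaves open.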