Add finding #65: Temporal correctness analysis (new lens)

2026-05-10 14:50:56 -07:00
parent 398f33aad4
commit c1eb97ed6c
1 changed files with 79 additions and 0 deletions
@@ -0,0 +1,79 @@
+# Finding #65: Temporal Correctness Analysis
+
+**Date:** 2026-05-10
+**Analytical Lens:** Temporal Correctness Analysis (NEW)
+**Document:** gargoyle's `aggregation.md` (239 lines)
+
+## Summary
+
+Tested a new analytical lens: identifying problems where time-dependent mechanisms
+(timers, timeouts, windows, expiration) may behave incorrectly under real-world 
+temporal conditions.
+
+## Results
+
+| Model | Time | Output tokens | Issues found | Critical | High | Medium | Low |
+|---|---|---|---|---|---|---|---|
+| Claude Opus 4 | 38s | 1,042 | 8 | 1 | 3 | 3 | 1 |
+| Claude Sonnet 4 | 22s | 938 | 7 | 0 | 2 | 4 | 1 |
+
+## Common Ground (both models identified)
+
+1. **Timer references invalid after restart** — When the aggregator crashes while timers
+   are active, timer messages may fire after restart and reference groups that are no longer valid.
+2. **No monotonic clock specification** — The document specifies "fixed duration" timeouts
+   but doesn't say which clock measures them. System time changes could affect timeout accuracy.
+3. **"First signal" timestamp ambiguity** — The document doesn't specify whether the timer
+   starts from the signal's creation timestamp or its arrival time.
+4. **Window boundary conditions undefined** — Pattern-complete predicates with time
+   constraints have undefined inclusive/exclusive boundaries.
+5. **Duration metric uses unspecified clock** — Telemetry duration could produce
+   nonsensical values if system clock changes during measurement.
+
+## Opus Unique Findings
+
+1. **Crash loses temporal context** (CRITICAL) — After a crash and restart, there is no way
+   to reconstruct how much time remained on active windows. Timing state is lost, not
+   just signal data.
+2. **No time coordination between aggregator and strategy** (HIGH) — Clock skew
+   between processes in a distributed deployment makes pattern timing unreliable.
+3. **Capacity vs timeout race condition** (MEDIUM) — The evaluation order of the capacity
+   and timeout checks is unclear, which affects audit trail reliability.
+
+## Sonnet Unique Findings
+
+1. **Timeout drift in repeated operations** (MEDIUM) — Processing overhead accumulates,
+   causing the calibration guidance to become inaccurate over time.
+2. **Strategy-aggregator restart coordination gap** (MEDIUM) — A non-atomic restart
+   creates brief windows of data inconsistency.
+
+## Key Insight
+
+"Temporal correctness" is distinct from "race conditions":
+- **Race conditions:** What if events arrive in unexpected ORDER?
+- **Temporal correctness:** What if TIME itself behaves unexpectedly?
+
+Both lenses find different things and should be used together on any document with
+time-dependent mechanisms.
+
+## Model Strengths
+
+- **Opus:** Stronger on cross-component temporal coupling; found the critical temporal
+  context loss issue
+- **Sonnet:** Efficient at identifying core timer hazards; better on operational
+  concerns like drift accumulation
+
+## Limitations
+
+GPT-5 was not tested because credentials were not available. Future experiments should include
+GPT-5 to compare reasoning-model performance on this lens.
+
+## Prompt Categories
+
+The prompt specified 6 categories of temporal issues:
+1. Clock anomalies (skew, NTP, DST, suspend/resume)
+2. Timer reliability (delivery, post-crash, cancelled messages)
+3. Window boundary conditions (inclusive/exclusive)
+4. Ordering under delay (arrival vs timestamp)
+5. Duration drift (cumulative timing jitter)
+6. Temporal coupling (synchronized clock assumptions)
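
Two of the shared findings (no monotonic clock specification, undefined window boundaries) can be illustrated with a short sketch. This is not taken from `aggregation.md`; `WindowTimer` and `FakeClock` are hypothetical names, and the half-open boundary is an assumed choice that the document under review would still need to make explicit.

```python
import time
from typing import Callable


class WindowTimer:
    """Hypothetical aggregation-window timer (illustrative only).

    Uses a monotonic clock so that wall-clock adjustments (NTP steps, DST,
    manual changes) cannot shorten or lengthen the window.
    """

    def __init__(self, duration_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.duration_s = duration_s
        self.clock = clock          # injectable so tests can control time
        self.started_at = clock()   # window opens at construction

    def elapsed(self) -> float:
        return self.clock() - self.started_at

    def remaining(self) -> float:
        # Clamp at zero so callers never see a negative remaining time.
        return max(0.0, self.duration_s - self.elapsed())

    def expired(self) -> bool:
        return self.elapsed() >= self.duration_s

    def contains(self, t: float) -> bool:
        # Assumed boundary rule: half-open [start, start + duration).
        # A reading exactly at the end belongs to the next window.
        return self.started_at <= t < self.started_at + self.duration_s


class FakeClock:
    """Hypothetical test helper: a clock whose value is set by hand."""

    def __init__(self, now: float = 100.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now
```

With an injected `FakeClock`, the boundary behavior can be checked deterministically: a reading at the window's start is inside it, and a reading exactly at `start + duration` is outside it. Storing `remaining()` rather than `started_at` is one possible way to preserve timing state across a crash, since monotonic readings should not be assumed comparable across restarts.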