# Finding #65: Temporal Correctness Analysis

**Date:** 2026-05-10

**Analytical Lens:** Temporal Correctness Analysis (NEW)

**Document:** gargoyle's `aggregation.md` (239 lines)

## Summary

Tested a new analytical lens for finding problems where time-dependent mechanisms
(timers, timeouts, windows, expiration) may behave incorrectly under real-world
temporal conditions.

## Results

| Model | Time | Output tokens | Issues found | Critical | High | Medium | Low |
|---|---|---|---|---|---|---|---|
| Claude Opus 4 | 38s | 1,042 | 8 | 1 | 3 | 3 | 1 |
| Claude Sonnet 4 | 22s | 938 | 7 | 0 | 2 | 4 | 1 |

## Common Ground (both models identified)

1. **Timer references invalid after restart:** When the aggregator crashes while timers
   are active, timer messages may fire after restart and reference groups that are no
   longer valid.
2. **No monotonic clock specification:** The document specifies "fixed duration"
   timeouts but doesn't say which clock measures them. If wall-clock time is used,
   system time changes (for example, NTP corrections) could make timeouts fire early or
   late.
3. **"First signal" timestamp ambiguity:** The document doesn't specify whether the
   timer starts from the signal's creation timestamp or from its arrival time.
4. **Window boundary conditions undefined:** For pattern-complete predicates with time
   constraints, the document doesn't say whether window boundaries are inclusive or
   exclusive.
5. **Duration metric uses unspecified clock:** The telemetry duration metric could
   report nonsensical values (such as negative durations) if the system clock changes
   during measurement.

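Items 1, 2 and 4 each ask the spec to pin down a convention. The sketch below shows one hypothetical way to do that. It is not gargoyle's design, and the `Aggregator` class, its `epoch` counter and the method names are illustrative assumptions.

The sketch does four things:

- It measures windows on `time.monotonic()`, so wall-clock changes cannot shift a deadline. The same clock would suit the duration metric in item 5.
- It treats each window as half-open, `[start, deadline)`.
- It assumes signals are timestamped on arrival with that same clock, which is one possible answer to item 3.
- It stamps every timer message with the epoch that scheduled it. A timer that fires after a restart can then be recognized as stale.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Aggregator:
    """Hypothetical aggregator used only to illustrate timing conventions."""

    epoch: int  # assumed to change on every restart
    # Monotonic clock: unaffected by NTP steps or manual wall-clock changes.
    clock: Callable[[], float] = time.monotonic
    # group_id -> (start, deadline), both in monotonic seconds
    windows: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def open_window(self, group_id: str, timeout_s: float) -> Tuple[int, str]:
        start = self.clock()
        self.windows[group_id] = (start, start + timeout_s)
        # The timer message records which epoch scheduled it.
        return (self.epoch, group_id)

    def in_window(self, group_id: str, t: float) -> bool:
        # t is assumed to be an arrival time read from self.clock.
        # Half-open interval [start, deadline): a signal arriving exactly at
        # the deadline falls outside the window.
        start, deadline = self.windows[group_id]
        return start <= t < deadline

    def on_timer(self, message: Tuple[int, str]) -> str:
        epoch, group_id = message
        if epoch != self.epoch:
            # Scheduled by a previous process; its monotonic deadline is meaningless now.
            return "stale: scheduled before restart"
        if group_id not in self.windows:
            return "stale: unknown group"
        del self.windows[group_id]
        return "expired"


if __name__ == "__main__":
    now = [100.0]  # fake monotonic clock for a deterministic demo
    agg = Aggregator(epoch=1, clock=lambda: now[0])
    msg = agg.open_window("g1", timeout_s=5.0)
    print(agg.in_window("g1", 104.999))  # True
    print(agg.in_window("g1", 105.0))    # False: boundary excluded
    restarted = Aggregator(epoch=2, clock=lambda: now[0])
    print(restarted.on_timer(msg))       # stale: scheduled before restart
    print(agg.on_timer(msg))             # expired
```

Whether the boundary is inclusive or exclusive matters less than choosing one and stating it. The half-open form is a common choice because adjacent windows then neither overlap nor leave gaps.
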
## Opus Unique Findings

1. **Crash loses temporal context** (CRITICAL): After a crash and restart, the aggregator
   cannot reconstruct how much time remained on active windows. Timing state is lost,
   not just signal data.
2. **No time coordination between aggregator and strategy** (HIGH): In a distributed
   deployment, clock skew between processes makes pattern timing unreliable.
3. **Capacity vs. timeout race condition** (MEDIUM): The order in which the capacity
   limit and the timeout are evaluated is unspecified. This undermines the reliability
   of the audit trail.

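The critical crash finding arises because monotonic clock readings are not guaranteed to be comparable across a restart. Python's `time.monotonic()` documents only the difference between two calls as meaningful. Remaining window time therefore has to be checkpointed in a form that survives a restart.

The sketch below shows one hypothetical approach; `aggregation.md` does not specify one. A `WindowCheckpoint` record stores the elapsed time plus the wall-clock instant of the checkpoint, and recovery charges the estimated downtime against the window. The sketch also makes the capacity-vs-timeout evaluation order explicit. The "timeout wins ties" rule is an arbitrary assumption, chosen only so the audit record is deterministic.

```python
from dataclasses import dataclass


@dataclass
class WindowCheckpoint:
    """Hypothetical persisted state for one open window (all values in seconds)."""

    group_id: str
    timeout_s: float
    elapsed_s: float          # time already spent in the window, from the monotonic clock
    checkpoint_wall_s: float  # wall-clock time when the checkpoint was written


def remaining_after_restart(cp: WindowCheckpoint, wall_now_s: float) -> float:
    """Estimate how much of the window is left after a restart."""
    # Downtime can only be estimated from wall-clock time. If the wall clock
    # stepped backwards, treat downtime as zero rather than granting extra time.
    downtime = max(0.0, wall_now_s - cp.checkpoint_wall_s)
    return max(0.0, cp.timeout_s - cp.elapsed_s - downtime)


def close_reason(count: int, capacity: int, elapsed_s: float, timeout_s: float) -> str:
    """Decide why a group closes, using a fixed evaluation order."""
    # Assumed tie-break: the timeout is checked first, so when both conditions
    # hold in the same evaluation the audit trail always records "timeout".
    if elapsed_s >= timeout_s:
        return "timeout"
    if count >= capacity:
        return "capacity"
    return "open"


if __name__ == "__main__":
    cp = WindowCheckpoint("g1", timeout_s=10.0, elapsed_s=4.0, checkpoint_wall_s=1000.0)
    print(remaining_after_restart(cp, wall_now_s=1003.0))  # 3.0
    print(remaining_after_restart(cp, wall_now_s=995.0))   # 6.0 (clock stepped back)
    print(close_reason(count=5, capacity=5, elapsed_s=10.0, timeout_s=10.0))  # timeout
```

Estimating downtime from the wall clock reintroduces the clock-anomaly risk described in item 2 of the common-ground list. That is why the sketch clamps the difference instead of trusting it.
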
## Sonnet Unique Findings

1. **Timeout drift in repeated operations** (MEDIUM): Processing overhead accumulates
   across repeated timer cycles, so the calibration guidance becomes inaccurate over
   time.
2. **Strategy-aggregator restart coordination gap** (MEDIUM): Because the strategy and
   aggregator do not restart atomically, there are brief windows during which their
   data is inconsistent.

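Sonnet's drift finding reflects the difference between re-arming a timer relative to "now" and scheduling against a fixed anchor. The simulation below is a sketch on a fake clock that assumes a constant per-cycle processing overhead. `fire_times` is a hypothetical helper, not code from gargoyle. Re-arming as `now + period` lets each cycle's overhead push every later deadline back. Computing deadlines on a fixed grid from the anchor keeps that overhead from accumulating.

```python
from typing import List


def fire_times(period: float, overhead: float, cycles: int, anchored: bool) -> List[float]:
    """Simulate a periodic timer on a fake clock; all values are in seconds."""
    now = 0.0
    anchor = now
    next_deadline = anchor + period
    fired = []
    for k in range(1, cycles + 1):
        now = max(now, next_deadline)  # wait for the deadline; if already late, fire now
        fired.append(now)
        now += overhead  # assumed fixed processing cost per cycle
        if anchored:
            # Fixed grid: the next deadline is anchor + (k + 1) * period.
            next_deadline = anchor + (k + 1) * period
        else:
            # Relative re-arm: this cycle's overhead shifts every later deadline.
            next_deadline = now + period
    return fired


if __name__ == "__main__":
    print(fire_times(1.0, 0.1, 5, anchored=True))   # [1.0, 2.0, 3.0, 4.0, 5.0]
    print(fire_times(1.0, 0.1, 5, anchored=False))  # last value is about 5.4
```

With these assumed inputs, the relative schedule ends about 0.4 s late after five cycles, that is, (cycles - 1) * overhead, and the error keeps growing. The anchored schedule stays on the grid. If the overhead ever exceeds the period, the anchored version also fires late, so a real implementation would need an explicit policy for skipped ticks.
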
## Key Insight

"Temporal correctness" is distinct from "race conditions":

- **Race conditions:** What if events arrive in an unexpected ORDER?
- **Temporal correctness:** What if TIME itself behaves unexpectedly?

The two lenses surface different issues. They should be used together on any document
with time-dependent mechanisms.

## Model Strengths

- **Opus:** Stronger on cross-component temporal coupling; found the critical temporal
  context loss issue.
- **Sonnet:** Efficient at identifying core timer hazards; better on operational
  concerns such as drift accumulation.

## Limitations

GPT-5 was not tested because credentials were unavailable. Future experiments should
include GPT-5 to compare reasoning-model performance on this lens.

## Prompt Categories

The prompt specified 6 categories of temporal issues:

1. Clock anomalies (skew, NTP, DST, suspend/resume)
2. Timer reliability (delivery, post-crash, cancelled messages)
3. Window boundary conditions (inclusive/exclusive)
4. Ordering under delay (arrival vs. timestamp)
5. Duration drift (cumulative timing jitter)
6. Temporal coupling (synchronized clock assumptions)