# DDD & Event Sourcing Review Checklist

What LLMs get subtly wrong in event-sourced systems. Based on actual review findings.

---

## Replay Determinism (CRITICAL)

Models frequently introduce non-determinism into replay paths:

### Timestamps
- [ ] **No `DateTime.utc_now()` in apply/reduce functions** — timestamp must come from the event
- [ ] **No `System.monotonic_time()` in state reconstruction** — time-based decisions use the event clock
- [ ] If you need "when did this happen", the event carries that data

**LLM mistake:** Adding `updated_at: DateTime.utc_now()` to state in an apply function. Replay then produces a different state than the original run.
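A minimal sketch of the deterministic shape. The `OrderShipped` event, its `occurred_at` field, and the `OrderState` module are hypothetical names chosen for illustration, not part of any library:

```elixir
defmodule OrderShipped do
  # Hypothetical event: the timestamp is captured once, at command time.
  @enforce_keys [:order_id, :occurred_at]
  defstruct [:order_id, :occurred_at]
end

defmodule OrderState do
  # Pure apply: updated_at is copied from the event, never read from the
  # system clock, so replaying the event always yields the same state.
  def apply_event(state, %OrderShipped{occurred_at: at}) do
    Map.merge(state, %{status: :shipped, updated_at: at})
  end
end
```

Applying the same event twice to the same starting state gives equal results, which is the property a replay depends on.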

### Randomness
- [ ] **No `:rand` calls in apply functions** — random values must be computed at command time and stored in the event
- [ ] **No UUID generation during replay** — IDs are assigned at command/event creation, never during reconstruction

**LLM mistake:** Generating order IDs in the apply function instead of the command handler.
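One way to split the two sides, as a sketch. The `Orders` module and the hex ID format are assumptions made for the example; a real system would likely use its own ID scheme:

```elixir
defmodule OrderPlaced do
  @enforce_keys [:order_id, :items]
  defstruct [:order_id, :items]
end

defmodule Orders do
  # Command side: runs once, so generating a random ID is fine here.
  def place_order(items) do
    order_id = Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)
    %OrderPlaced{order_id: order_id, items: items}
  end

  # Replay side: runs many times, so it only copies what the event recorded.
  def apply_event(state, %OrderPlaced{order_id: id, items: items}) do
    Map.merge(state, %{order_id: id, items: items})
  end
end
```

The ID is random exactly once. Every later replay of the stored event reconstructs the same `order_id`.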

### External Calls
- [ ] **No HTTP/DB/external calls in apply functions** — apply is pure: event in, state out
- [ ] **No side effects in apply** — logging at debug level is acceptable, nothing else
- [ ] External data needed for decisions must be fetched BEFORE emitting the event

**LLM mistake:** Fetching current price during apply to validate an order event. The price should be in the event.
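The sketch below shows the lookup happening before the event is built. `Pricing`, the `:item_added` event shape, and the injected `fetch_price` function (standing in for an HTTP or database call) are all hypothetical:

```elixir
defmodule Pricing do
  # Command side: fetch external data first, then record it in the event.
  # `fetch_price` is injected so the lookup can fail without touching state.
  def add_item(product_id, qty, fetch_price) when is_function(fetch_price, 1) do
    with {:ok, price} <- fetch_price.(product_id) do
      {:ok, %{type: :item_added, product_id: product_id, qty: qty, unit_price: price}}
    end
  end

  # Replay side: pure, uses only the price stored in the event.
  def apply_event(state, %{type: :item_added, qty: qty, unit_price: price}) do
    Map.update(state, :total, qty * price, &(&1 + qty * price))
  end
end
```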

---

## Event Design

### Self-Contained Events
- [ ] **Event contains all data needed to understand what happened** — don't rely on external lookups
- [ ] **Include denormalized data that might change** — `{product_id, product_name, price_at_time}` not just `{product_id}`
- [ ] **Actor/causation metadata** — who triggered this, correlation_id for tracing

**LLM mistake:** `%OrderPlaced{order_id: "123", product_ids: [...]}` — missing prices, quantities, everything needed to understand the order without external lookups.
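A possible self-contained version of that event, as a sketch. The field names beyond `order_id`, `product_id`, `product_name`, `price_at_time`, and `correlation_id` are assumptions for the example:

```elixir
defmodule OrderPlaced do
  # Hypothetical field layout; every key is enforced so an incomplete
  # event cannot be built with struct!/2.
  @enforce_keys [:order_id, :lines, :placed_at, :actor_id, :correlation_id]
  defstruct [:order_id, :lines, :placed_at, :actor_id, :correlation_id]

  # Each line snapshots the product name and price at the time of the order.
  def line(product_id, product_name, unit_price, quantity) do
    %{product_id: product_id, product_name: product_name,
      price_at_time: unit_price, quantity: quantity}
  end

  # The total is derivable from the event alone, with no catalog lookup.
  def total(%__MODULE__{lines: lines}) do
    Enum.reduce(lines, 0, fn l, acc -> acc + l.price_at_time * l.quantity end)
  end
end
```

Because prices and names are copied into the event, a later catalog change cannot alter what this order meant when it was placed.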

### Event Naming
- [ ] **Past tense** — `OrderPlaced`, `PaymentReceived`, `ItemShipped`
- [ ] **Not commands** — never `PlaceOrder` or `ProcessPayment` as event names
- [ ] **Specific over generic** — `OrderItemQuantityAdjusted` not `OrderUpdated`

**LLM mistake:** Creating `OrderUpdated` events with a `changes` map. This is CRUD-in-disguise, not event sourcing.

### Event Immutability
- [ ] **Never suggest "fixing" or "updating" an event** — events are immutable facts
- [ ] **Compensating events for corrections** — `OrderCorrected`, `AmountAdjusted`
- [ ] **Schema versioning for evolution** — old events must remain readable forever

**LLM mistake:** Suggesting a migration that modifies existing events to fix a bug. The correct answer is always a compensating event or projection rebuild.
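A small sketch of a correction recorded as a new fact. The `Ledger` module, the map-based event shapes, and the `corrects` field are hypothetical:

```elixir
defmodule Ledger do
  # Events are plain maps here to keep the sketch short.
  def apply_event(state, %{type: :amount_charged, amount: amount}) do
    Map.update(state, :balance, amount, &(&1 + amount))
  end

  # Compensating event: a new fact that points back at the event it corrects.
  def apply_event(state, %{type: :amount_adjusted, delta: delta}) do
    Map.update(state, :balance, delta, &(&1 + delta))
  end

  def replay(events), do: Enum.reduce(events, %{balance: 0}, &apply_event(&2, &1))
end

history = [
  # Recorded with the wrong amount; it should have been 100.
  %{id: 1, type: :amount_charged, amount: 120},
  # Appended later; event 1 stays untouched in the log.
  %{id: 2, type: :amount_adjusted, delta: -20, corrects: 1}
]

IO.inspect(Ledger.replay(history)) # prints %{balance: 100}
```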

---

## Aggregate Boundaries

### Size
- [ ] **Aggregate = consistency boundary, not data grouping** — what MUST be consistent in one transaction?
- [ ] **Smaller is better** — large aggregates = contention, scaling pain
- [ ] **Reference other aggregates by ID** — never embed full objects

**LLM mistake:** Making `Portfolio` contain a list of `Position` aggregates. Each position should be its own aggregate referenced by ID.

### Invariants
- [ ] **Business rules inside the aggregate** — not in application services
- [ ] **Always valid after any operation** — reject operations that would violate invariants
- [ ] **Constructor enforces required fields** — no invalid aggregate instances

**LLM mistake:** Putting validation in a service layer: "OrderService checks if order has at least one item." The Order aggregate should reject an empty order.
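As a sketch, the "at least one item" rule can live in the aggregate's own command functions. The function names, error atoms, and event shapes below are illustrative assumptions:

```elixir
defmodule Order do
  defstruct [:id, items: []]

  # The aggregate decides: an order without items is rejected here,
  # so no caller can create one by skipping a service-layer check.
  def place(_id, []), do: {:error, :empty_order}

  def place(id, items) when is_list(items) do
    {:ok, %{type: :order_placed, order_id: id, items: items}}
  end

  # Removing the last remaining item would violate the same invariant.
  def remove_item(%__MODULE__{items: [_only]}, _item), do: {:error, :empty_order}

  def remove_item(%__MODULE__{id: id, items: items}, item) do
    if item in items do
      {:ok, %{type: :order_item_removed, order_id: id, item: item}}
    else
      {:error, :unknown_item}
    end
  end
end
```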

---

## Projections (Read Models)

### Not Source of Truth
- [ ] **Projections are derived, disposable** — can always rebuild from events
- [ ] **If projection is wrong, fix the projection logic and rebuild** — don't "fix" projection data directly
- [ ] **One projection per query need** — don't share if requirements differ

**LLM mistake:** Treating a projection table as canonical and syncing events to match it.
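"Disposable" can be read literally: a rebuild is a fold over the event log. The sketch below assumes a hypothetical in-memory read model that counts open orders per customer:

```elixir
defmodule OrderCountProjection do
  # Read model: number of open orders per customer.
  def init, do: %{}

  def project(model, %{type: :order_placed, customer_id: c}) do
    Map.update(model, c, 1, &(&1 + 1))
  end

  def project(model, %{type: :order_cancelled, customer_id: c}) do
    Map.update(model, c, 0, &max(&1 - 1, 0))
  end

  # Events this projection does not care about leave the model unchanged.
  def project(model, _event), do: model

  # Rebuild = discard the old read model and fold the full event log again.
  def rebuild(events), do: Enum.reduce(events, init(), &project(&2, &1))
end
```

If the counting logic turns out to be wrong, the fix goes into `project/2` and `rebuild/1` is run again. The stored events are never touched.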

### Eventual Consistency
- [ ] **Read models may lag behind writes** — UI must handle this
- [ ] **Don't return projection state immediately after command** — it might not be updated yet
- [ ] **Idempotent handlers** — same event delivered twice produces same result

**LLM mistake:** API endpoint that does `append_event(...)` then immediately `query_projection(...)` and returns it. This is a race condition: the projection may not have processed the new event yet.

---

## Idempotency

### Event Handling
- [ ] **Idempotency keys for commands** — especially payments, orders
- [ ] **Check for duplicate events before processing** — at-least-once delivery is common
- [ ] **Make apply functions idempotent** — applying same event twice = same state

**LLM mistake:** An event handler that increments a counter without checking if this event was already processed.
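A minimal sketch of a counter that tolerates redelivery. It assumes every event carries a unique `event_id`; that field and the `ShipmentCounter` module are hypothetical:

```elixir
defmodule ShipmentCounter do
  # The handler keeps the IDs it has already processed next to the count,
  # so both are updated together.
  def new, do: %{count: 0, seen: MapSet.new()}

  def handle(%{seen: seen} = state, %{event_id: id, type: :item_shipped}) do
    if MapSet.member?(seen, id) do
      # Duplicate delivery: return the state unchanged.
      state
    else
      %{state | count: state.count + 1, seen: MapSet.put(seen, id)}
    end
  end
end
```

In a persistent handler the same idea usually means storing the processed ID (or a last-seen position) in the same write as the counter update. That part is left out of this in-memory sketch.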

### Command Handling
- [ ] **Use `append_if_absent` patterns** — check before write, atomically
- [ ] **Return success for duplicate valid commands** — don't error on retry
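The sketch below illustrates both bullets with a hypothetical in-memory store built on `Agent`. `InMemoryStore` and its `append_if_absent/3` are invented for the example. A real event store would more likely enforce this with a unique constraint or an expected-version check:

```elixir
defmodule InMemoryStore do
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{keys: MapSet.new(), events: []} end)
  end

  # The check and the write happen inside one Agent callback, so two
  # concurrent retries with the same key cannot both append.
  def append_if_absent(store, idempotency_key, event) do
    Agent.get_and_update(store, fn %{keys: keys, events: events} = state ->
      if MapSet.member?(keys, idempotency_key) do
        # A retry of an already accepted command still reports success.
        {{:ok, :duplicate}, state}
      else
        {{:ok, :appended}, %{keys: MapSet.put(keys, idempotency_key), events: events ++ [event]}}
      end
    end)
  end

  def events(store), do: Agent.get(store, & &1.events)
end
```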

---

## Process Managers / Sagas

- [ ] **Long-running coordination across aggregates** — don't do this in application services
- [ ] **Own state machine with explicit states** — what step are we on?
- [ ] **Handle timeouts** — what if a step never completes?
- [ ] **Compensating actions for failures** — if step 3 fails, undo steps 1-2

**LLM mistake:** Multi-aggregate coordination in a service with direct calls and no failure handling:
```elixir
def transfer(from, to, amount) do
  Wallet.debit(from, amount)   # What if this succeeds but the next call fails?
  Wallet.credit(to, amount)
end
```
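For contrast, here is a sketch of the same transfer as an explicit state machine. The `TransferSaga` module, its state names, and the command and event tuples are hypothetical. Dispatching the returned commands and scheduling the `:timeout` message are left to the surrounding infrastructure, and a timeout while debiting is not handled here:

```elixir
defmodule TransferSaga do
  # States: :debiting -> :crediting -> :completed
  #         :crediting -> :refunding -> :failed (compensation path)
  defstruct [:from, :to, :amount, state: :debiting]

  def start(from, to, amount) do
    saga = %__MODULE__{from: from, to: to, amount: amount}
    {saga, [{:debit, from, amount}]}
  end

  # Each clause returns the next saga state and the commands to dispatch.
  def handle(%__MODULE__{state: :debiting} = s, :debited),
    do: {%{s | state: :crediting}, [{:credit, s.to, s.amount}]}

  def handle(%__MODULE__{state: :debiting} = s, :debit_failed),
    do: {%{s | state: :failed}, []}

  def handle(%__MODULE__{state: :crediting} = s, :credited),
    do: {%{s | state: :completed}, []}

  # Credit failed or never completed: compensate by refunding the debit.
  def handle(%__MODULE__{state: :crediting} = s, reason)
      when reason in [:credit_failed, :timeout],
      do: {%{s | state: :refunding}, [{:credit, s.from, s.amount}]}

  def handle(%__MODULE__{state: :refunding} = s, :credited),
    do: {%{s | state: :failed}, []}
end
```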

---

## Elixir/OTP Specific

### GenServer State Recovery
- [ ] **Use `handle_continue` for replay** — not `init/1` directly, which blocks the caller of `start_link` until it returns
- [ ] **Keep apply functions pure** — use a reducer pattern
- [ ] **Trap exits if cleanup needed** — but prefer stateless design
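A minimal sketch of replay via `handle_continue/2`. The `CounterAggregate` module and its event shape are hypothetical, and the list passed to `start_link/1` stands in for events loaded from a store:

```elixir
defmodule CounterAggregate do
  use GenServer

  # `events` stands in for a stream loaded from an event store.
  def start_link(events), do: GenServer.start_link(__MODULE__, events)
  def value(pid), do: GenServer.call(pid, :value)

  @impl true
  def init(events) do
    # Return immediately so the process that started us is not blocked;
    # the replay runs in handle_continue before any other message.
    {:ok, %{count: 0}, {:continue, {:replay, events}}}
  end

  @impl true
  def handle_continue({:replay, events}, state) do
    {:noreply, Enum.reduce(events, state, &apply_event(&2, &1))}
  end

  @impl true
  def handle_call(:value, _from, state), do: {:reply, state.count, state}

  # Pure reducer: event in, state out.
  defp apply_event(state, %{type: :incremented, by: n}), do: %{state | count: state.count + n}
  defp apply_event(state, _other), do: state
end
```

Because the continue callback is handled before anything in the mailbox, a `value/1` call made right after `start_link/1` still sees the fully replayed state.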

### Process Dictionary for Test Isolation
- [ ] **`Process.get/put` for store references** — allows per-test isolation
- [ ] **Set in GenServer init, read in public API** — callers don't pass store around
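A sketch of the lookup side, with hypothetical names (`StoreRef`, `OrderApi`, the `:event_store` key, and the `:default_store` fallback). The process dictionary is local to each process, so this example sets and reads the reference in the same process, for instance a test process. A value put in one process is not visible from another:

```elixir
defmodule StoreRef do
  @key :event_store

  # Stores the reference in the calling process's dictionary.
  def put(store), do: Process.put(@key, store)

  # Falls back to a default when the calling process never set one.
  def get, do: Process.get(@key, :default_store)
end

defmodule OrderApi do
  # Reads the store from the process dictionary instead of taking it
  # as an argument.
  def store_in_use, do: StoreRef.get()
end

StoreRef.put(:test_store_a)
IO.inspect(OrderApi.store_in_use())                               # :test_store_a
IO.inspect(Task.await(Task.async(fn -> OrderApi.store_in_use() end))) # :default_store
```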

---

## Anti-Patterns to Flag Immediately

| Pattern | Problem |
|---------|---------|
| `DateTime.utc_now()` in apply | Non-deterministic replay |
| `OrderUpdated` with `changes` map | CRUD-in-disguise |
| Projection used as source of truth | Data inconsistency |
| Event "fix" migration | Violates immutability |
| Multi-aggregate in one transaction | Wrong boundaries |
| External call in apply | Side effects break replay |