ddd-patterns/DDD-CHECKLIST.md
Aaron Weiker b72c14f370 refocus on LLM mistakes, not textbook definitions
Based on actual review findings:
- Replay determinism: DateTime.utc_now() in apply, random in state
- Event design: OrderUpdated with changes map (CRUD-in-disguise)
- Projections as source of truth
- Suggesting event 'fixes' instead of compensating events
- Missing idempotency in handlers

Added Elixir/OTP specific patterns (handle_continue for replay,
Process dictionary for test isolation).

Anti-patterns table for quick flagging.
2026-05-11 00:29:58 -07:00

DDD & Event Sourcing Review Checklist

What LLMs get subtly wrong in event-sourced systems. Based on actual review findings.


Replay Determinism (CRITICAL)

Models frequently introduce non-determinism into replay paths:

Timestamps

  • No DateTime.utc_now() in apply/reduce functions — timestamp must come from the event
  • No System.monotonic_time() in state reconstruction — time-based decisions use event clock
  • If you need "when did this happen", the event carries that data

LLM mistake: Adding updated_at: DateTime.utc_now() to state in an apply function. This means replay produces different state than the original run.
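
A minimal sketch of the deterministic alternative, assuming a hypothetical OrderShipped event and Order aggregate (neither name comes from a real codebase). The wall clock is read once on the command side and carried in the event; the apply function only copies it.

```elixir
defmodule OrderShipped do
  # Hypothetical event: the timestamp is captured once, at command time.
  defstruct [:order_id, :occurred_at]
end

defmodule Order do
  defstruct [:id, :status, :updated_at]

  # Command side: the only place the wall clock is read.
  def ship(%__MODULE__{id: id}, now \\ DateTime.utc_now()) do
    %OrderShipped{order_id: id, occurred_at: now}
  end

  # Replay side: a pure function of (state, event).
  def apply_event(%__MODULE__{} = order, %OrderShipped{occurred_at: at}) do
    %__MODULE__{order | status: :shipped, updated_at: at}
  end
end
```

Applying the same event any number of times, on any day, yields the same updated_at.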

Randomness

  • No :rand calls in apply functions — random values must be computed at command time and stored in event
  • No UUID generation during replay — IDs assigned at command/event creation, never reconstruction

LLM mistake: Generating order IDs in the apply function instead of the command handler.
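
The same split for randomness, sketched under the assumption of a hypothetical OrderIdAssigned event and Orders module. The ID format (16 hex characters from :crypto.strong_rand_bytes/1) is illustrative only.

```elixir
defmodule OrderIdAssigned do
  defstruct [:order_id]
end

defmodule Orders do
  # Command handler: randomness happens here, exactly once.
  def place_order do
    order_id = Base.encode16(:crypto.strong_rand_bytes(8), case: :lower)
    %OrderIdAssigned{order_id: order_id}
  end

  # Apply: copies the ID out of the event and never generates one.
  def apply_event(state, %OrderIdAssigned{order_id: id}) do
    Map.put(state, :order_id, id)
  end
end
```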

External Calls

  • No HTTP/DB/external calls in apply functions — apply is pure: event in, state out
  • No side effects in apply — logging at debug level is acceptable, nothing else
  • External data needed for decisions must be fetched BEFORE emitting the event

LLM mistake: Fetching current price during apply to validate an order event. The price should be in the event.
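
One way to keep apply pure is to do the lookup in the command path and store the result in the event. In this sketch, price_lookup is a hypothetical function argument standing in for an HTTP or DB call, and OrderLinePriced is an illustrative event name.

```elixir
defmodule OrderLinePriced do
  defstruct [:order_id, :product_id, :unit_price_cents]
end

defmodule Pricing do
  # Command side: the external lookup runs before the event exists.
  def add_line(order_id, product_id, price_lookup) when is_function(price_lookup, 1) do
    case price_lookup.(product_id) do
      {:ok, cents} ->
        {:ok,
         %OrderLinePriced{order_id: order_id, product_id: product_id, unit_price_cents: cents}}

      {:error, reason} ->
        # No event is emitted if the price cannot be fetched.
        {:error, reason}
    end
  end

  # Apply side: reads the price from the event; no I/O.
  def apply_event(%{total_cents: total} = state, %OrderLinePriced{unit_price_cents: cents}) do
    %{state | total_cents: total + cents}
  end
end
```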


Event Design

Self-Contained Events

  • Event contains all data needed to understand what happened — don't rely on external lookups
  • Include denormalized data that might change: {product_id, product_name, price_at_time}, not just {product_id}
  • Actor/causation metadata — who triggered this, correlation_id for tracing

LLM mistake: %OrderPlaced{order_id: "123", product_ids: [...]} — missing prices, quantities, everything needed to understand the order without external lookups.
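
A sketch of what a self-contained version of that event could look like. The field names (lines, actor_id, correlation_id, unit_price_cents) are assumptions for illustration, not a prescribed schema.

```elixir
defmodule OrderPlaced.Line do
  # Denormalized copy of the product data as it was when the order was placed.
  @enforce_keys [:product_id, :product_name, :quantity, :unit_price_cents]
  defstruct @enforce_keys
end

defmodule OrderPlaced do
  @enforce_keys [:order_id, :lines, :placed_at, :actor_id, :correlation_id]
  defstruct @enforce_keys

  # The order total can be derived from the event alone, with no lookups.
  def total_cents(%__MODULE__{lines: lines}) do
    Enum.reduce(lines, 0, fn line, acc -> acc + line.quantity * line.unit_price_cents end)
  end
end
```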

Event Naming

  • Past tense: OrderPlaced, PaymentReceived, ItemShipped
  • Not commands — never PlaceOrder or ProcessPayment as event names
  • Specific over generic: OrderItemQuantityAdjusted, not OrderUpdated

LLM mistake: Creating OrderUpdated events with a changes map. This is CRUD-in-disguise, not event sourcing.
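
For contrast, a sketch with one event per business fact. Each apply clause then documents exactly what can happen to an order. OrderShippingAddressChanged, OrderState and the field names are hypothetical.

```elixir
defmodule OrderItemQuantityAdjusted do
  defstruct [:order_id, :sku, :old_quantity, :new_quantity, :reason]
end

defmodule OrderShippingAddressChanged do
  defstruct [:order_id, :new_address]
end

defmodule OrderState do
  # One clause per named fact; there is no generic "changes" map to interpret.
  def apply_event(state, %OrderItemQuantityAdjusted{sku: sku, new_quantity: qty}) do
    put_in(state, [:quantities, sku], qty)
  end

  def apply_event(state, %OrderShippingAddressChanged{new_address: address}) do
    Map.put(state, :shipping_address, address)
  end
end
```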

Event Immutability

  • Never suggest "fixing" or "updating" an event — events are immutable facts
  • Compensating events for corrections: OrderCorrected, AmountAdjusted
  • Schema versioning for evolution — old events must remain readable forever

LLM mistake: Suggesting a migration that modifies existing events to fix a bug. The correct answer is always a compensating event or projection rebuild.
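
A small sketch of a compensating event, assuming hypothetical PaymentRecorded and PaymentAmountAdjusted events. The original, wrong event stays in the stream; the correction is appended as a new fact and replay arrives at the right balance.

```elixir
defmodule PaymentRecorded do
  defstruct [:payment_id, :amount_cents]
end

defmodule PaymentAmountAdjusted do
  # Compensating event: states the correction instead of editing history.
  defstruct [:payment_id, :delta_cents, :reason]
end

defmodule Ledger do
  def apply_event(balance, %PaymentRecorded{amount_cents: cents}), do: balance + cents
  def apply_event(balance, %PaymentAmountAdjusted{delta_cents: delta}), do: balance + delta

  # Replay folds every event, including the mistaken one and its correction.
  def replay(events), do: Enum.reduce(events, 0, &apply_event(&2, &1))
end
```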


Aggregate Boundaries

Size

  • Aggregate = consistency boundary, not data grouping — what MUST be consistent in one transaction?
  • Smaller is better — large aggregates = contention, scaling pain
  • Reference other aggregates by ID — never embed full objects

LLM mistake: Making Portfolio contain a list of Position aggregates. Each position should be its own aggregate referenced by ID.
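
Sketched as structs, the ID-reference shape might look like this. The field names are assumptions; the point is that Portfolio holds position IDs only, and each Position is loaded and changed on its own stream.

```elixir
defmodule Position do
  # Its own aggregate; points back to the portfolio by ID.
  defstruct [:id, :portfolio_id, :symbol, quantity: 0]
end

defmodule Portfolio do
  # Holds IDs, never embedded Position structs.
  defstruct [:id, position_ids: MapSet.new()]

  def open_position(%__MODULE__{} = portfolio, position_id) do
    %__MODULE__{portfolio | position_ids: MapSet.put(portfolio.position_ids, position_id)}
  end
end
```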

Invariants

  • Business rules inside the aggregate — not in application services
  • Always valid after any operation — reject operations that would violate invariants
  • Constructor enforces required fields — no invalid aggregate instances

LLM mistake: Putting validation in a service layer: "OrderService checks if order has at least one item." The Order aggregate should reject an empty order.
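
A sketch of the invariant living inside the aggregate. OrderAggregate and the error atom are hypothetical names; items are assumed to be maps with a :sku key.

```elixir
defmodule OrderAggregate do
  @enforce_keys [:id, :items]
  defstruct @enforce_keys

  # Constructor: refuses to build an order with no items.
  def new(_id, []), do: {:error, :order_requires_at_least_one_item}
  def new(id, [_ | _] = items), do: {:ok, %__MODULE__{id: id, items: items}}

  # Removing the last item would violate the invariant, so it is rejected.
  def remove_item(%__MODULE__{items: items} = order, sku) do
    case Enum.reject(items, &(&1.sku == sku)) do
      [] -> {:error, :order_requires_at_least_one_item}
      remaining -> {:ok, %__MODULE__{order | items: remaining}}
    end
  end
end
```

No caller, service or otherwise, can obtain an empty order through these functions.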


Projections (Read Models)

Not Source of Truth

  • Projections are derived, disposable — can always rebuild from events
  • If projection is wrong, fix the projection logic and rebuild — don't "fix" projection data directly
  • One projection per query need — don't share if requirements differ

LLM mistake: Treating a projection table as canonical and syncing events to match it.
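
Because a projection is a fold over the event stream, rebuilding it is just running the fold again. A minimal sketch, assuming events are plain maps with :type and :customer_id keys (an illustrative shape, not a required one):

```elixir
defmodule OrderCountProjection do
  def init, do: %{}

  def project(counts, %{type: :order_placed, customer_id: id}),
    do: Map.update(counts, id, 1, &(&1 + 1))

  def project(counts, %{type: :order_cancelled, customer_id: id}),
    do: Map.update(counts, id, 0, &(&1 - 1))

  # Events this read model does not care about leave it unchanged.
  def project(counts, _other), do: counts

  # Throw the old read model away and fold the full stream again.
  def rebuild(events), do: Enum.reduce(events, init(), &project(&2, &1))
end
```

If the counts are wrong, the fix goes into project/2 and rebuild/1 is run again; the stored rows are never patched by hand.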

Eventual Consistency

  • Read models may lag behind writes — UI must handle this
  • Don't return projection state immediately after command — it might not be updated yet
  • Idempotent handlers — same event delivered twice produces same result

LLM mistake: API endpoint that does append_event(...) then immediately query_projection(...) and returns it. Race condition.
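
One way to avoid the race is to answer from what the write path already knows and let the client read the projection later. In this sketch, append is a hypothetical stand-in for the event store's append call, assumed to return {:ok, version} or {:error, reason}.

```elixir
defmodule PlaceOrderEndpoint do
  def handle(%{order_id: id, lines: lines}, append) when is_function(append, 2) do
    event = %{type: :order_placed, order_id: id, lines: lines}

    with {:ok, version} <- append.("order-" <> id, event) do
      # Respond from data the command side holds; no projection query here.
      {:ok, %{order_id: id, status: :accepted, version: version}}
    end
  end
end
```

Returning the stream version gives the client something to wait on if it needs to know when the read model has caught up.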


Idempotency

Event Handling

  • Idempotency keys for commands — especially payments, orders
  • Check for duplicate events before processing — at-least-once delivery is common
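  • Make apply functions idempotent where possible: applying the same event twice = same state; when an update is inherently additive (a counter, a balance), dedupe by event ID before applying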

LLM mistake: An event handler that increments a counter without checking if this event was already processed.
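
A sketch of the counter handler with a duplicate check, assuming each event carries a unique :event_id. The seen set is held in memory here for brevity; a real handler would more likely persist it (or a stream position) together with the counter.

```elixir
defmodule ViewCounter do
  defstruct count: 0, seen: MapSet.new()

  def handle(%__MODULE__{seen: seen} = state, %{event_id: event_id}) do
    if MapSet.member?(seen, event_id) do
      # Redelivery under at-least-once: already counted, so do nothing.
      state
    else
      %__MODULE__{state | count: state.count + 1, seen: MapSet.put(seen, event_id)}
    end
  end
end
```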

Command Handling

  • Use append_if_absent patterns — check before write, atomically
  • Return success for duplicate valid commands — don't error on retry
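
The two points above can be sketched together. CommandLog is a hypothetical in-memory stand-in for a store that offers an atomic "append if absent" keyed by idempotency key; a plain map is used only to keep the example self-contained.

```elixir
defmodule CommandLog do
  def append_if_absent(log, key, event) do
    case Map.fetch(log, key) do
      {:ok, existing} -> {:duplicate, existing, log}
      :error -> {:appended, event, Map.put(log, key, event)}
    end
  end
end

defmodule Payments do
  def charge(log, %{idempotency_key: key, amount_cents: cents}) do
    event = %{type: :payment_charged, amount_cents: cents}

    case CommandLog.append_if_absent(log, key, event) do
      {:appended, stored, log} -> {:ok, stored, log}
      # A retry of an accepted command succeeds and returns the original event.
      {:duplicate, stored, log} -> {:ok, stored, log}
    end
  end
end
```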

Process Managers / Sagas

  • Long-running coordination across aggregates — don't do this in application services
  • Own state machine with explicit states — what step are we on?
  • Handle timeouts — what if a step never completes?
  • Compensating actions for failures — if step 3 fails, undo steps 1-2

LLM mistake: Multi-aggregate coordination in a service with direct calls and no failure handling:

```elixir
def transfer(from, to, amount) do
  Wallet.debit(from, amount)  # What if this succeeds but the next call fails?
  Wallet.credit(to, amount)
end
```
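
A sketch of the same transfer as a process manager with explicit states. TransferSaga, its state names, and the event and command atoms are all hypothetical. Treating a timeout like a failure is a simplifying assumption; a real saga would first confirm whether the step actually happened.

```elixir
defmodule TransferSaga do
  defstruct [:transfer_id, :from, :to, :amount, state: :pending]

  # Each clause maps (saga, event) to {next saga, commands to dispatch}.
  def handle(%__MODULE__{state: :pending} = s, :start),
    do: {%{s | state: :debiting}, [{:debit, s.from, s.amount}]}

  def handle(%__MODULE__{state: :debiting} = s, :debited),
    do: {%{s | state: :crediting}, [{:credit, s.to, s.amount}]}

  def handle(%__MODULE__{state: :debiting} = s, event) when event in [:debit_failed, :timeout],
    do: {%{s | state: :failed}, []}

  def handle(%__MODULE__{state: :crediting} = s, :credited),
    do: {%{s | state: :completed}, []}

  # Credit did not go through: compensate by refunding the debit.
  def handle(%__MODULE__{state: :crediting} = s, event) when event in [:credit_failed, :timeout],
    do: {%{s | state: :compensating}, [{:refund, s.from, s.amount}]}

  def handle(%__MODULE__{state: :compensating} = s, :refunded),
    do: {%{s | state: :failed}, []}
end
```

The saga's own state answers "what step are we on?", and every failure path ends in a defined state instead of a half-finished transfer.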

Elixir/OTP Specific

GenServer State Recovery

  • Use handle_continue for replay, not init/1 directly: replaying inside init/1 blocks start_link (and the supervisor) until it finishes
  • Keep apply functions pure — use a reducer pattern
  • Trap exits if cleanup needed — but prefer stateless design
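
A minimal sketch of this recovery pattern. AccountServer is a hypothetical module, and load_events is an assumed zero-arity function standing in for the event store read. init/1 returns at once with a :continue tuple, and handle_continue/2 runs the replay before the process handles any other message.

```elixir
defmodule AccountServer do
  use GenServer

  def start_link(load_events) when is_function(load_events, 0) do
    GenServer.start_link(__MODULE__, load_events)
  end

  def balance(pid), do: GenServer.call(pid, :balance)

  @impl true
  def init(load_events) do
    # Return immediately so the caller of start_link is not blocked by replay.
    {:ok, %{balance: 0}, {:continue, {:replay, load_events}}}
  end

  @impl true
  def handle_continue({:replay, load_events}, state) do
    {:noreply, Enum.reduce(load_events.(), state, &apply_event(&2, &1))}
  end

  @impl true
  def handle_call(:balance, _from, state), do: {:reply, state.balance, state}

  # Pure reducer: testable without starting a process.
  def apply_event(state, {:deposited, amount}), do: %{state | balance: state.balance + amount}
  def apply_event(state, {:withdrawn, amount}), do: %{state | balance: state.balance - amount}
end
```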

Process Dictionary for Test Isolation

  • Process.get/put for store references — allows per-test isolation
  • Set in GenServer init, read in public API — callers don't pass store around
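
A sketch of the process-dictionary lookup, with a hypothetical EventStoreRef module and key name. The process dictionary is per-process, so a value set in one process is visible only to code running in that same process; that is what gives each test (or each GenServer) its own store.

```elixir
defmodule EventStoreRef do
  # Hypothetical key name for the store reference.
  @key :event_store_ref

  def put(store), do: Process.put(@key, store)

  # Falls back to a default store when nothing was set in this process.
  def get(default \\ :default_store), do: Process.get(@key, default)
end
```

A test calls EventStoreRef.put/1 with its own store in setup; code running in any other process still sees the default.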

Anti-Patterns to Flag Immediately

| Pattern | Problem |
| --- | --- |
| DateTime.utc_now() in apply | Non-deterministic replay |
| OrderUpdated with changes map | CRUD-in-disguise |
| Projection used as source of truth | Data inconsistency |
| Event "fix" migration | Violates immutability |
| Multi-aggregate in one transaction | Wrong boundaries |
| External call in apply | Side effects break replay |