diff --git a/DDD-CHECKLIST.md b/DDD-CHECKLIST.md index 8d8c476..a021295 100644 --- a/DDD-CHECKLIST.md +++ b/DDD-CHECKLIST.md @@ -1,165 +1,148 @@ # DDD & Event Sourcing Review Checklist -Patterns and gotchas for domain-driven design and event sourcing. Focus on what models get subtly wrong. +What LLMs get subtly wrong in event-sourced systems. Based on actual review findings. --- -## Aggregate Design +## Replay Determinism (CRITICAL) -### Boundaries -- [ ] Aggregate defines a **consistency boundary**, not a data grouping -- [ ] Ask: "What MUST be consistent in a single transaction?" - that's the aggregate -- [ ] Prefer smaller aggregates; large aggregates = contention and scaling pain -- [ ] Reference other aggregates by ID, never by embedding the object +Models frequently introduce non-determinism into replay paths: -**Common mistake:** Making aggregates too large. An `Order` aggregate doesn't need to contain `Customer` - it needs a `customer_id`. +### Timestamps +- [ ] **No `DateTime.utc_now()` in apply/reduce functions** — timestamp must come from the event +- [ ] **No `System.monotonic_time()` in state reconstruction** — time-based decisions use event clock +- [ ] If you need "when did this happen", the event carries that data -### Invariants -- [ ] Business rules enforced **inside** the aggregate, not in services -- [ ] Aggregate is always valid after any operation (or rejects the operation) -- [ ] Constructor enforces required fields - no invalid aggregate should exist +**LLM mistake:** Adding `updated_at: DateTime.utc_now()` to state in an apply function. This means replay produces different state than the original run. -**Common mistake:** Putting validation in application services. If `Order` requires at least one line item, the `Order` aggregate enforces this, not `OrderService`. +### Randomness +- [ ] **No `:rand` calls in apply functions** — random values must be computed at command time and stored in event +- [ ] **No UUID generation during replay** — IDs assigned at command/event creation, never reconstruction -### Identity -- [ ] Aggregate root has a stable identity (UUID preferred over DB sequence) -- [ ] Identity assigned at creation, never changes -- [ ] Child entities within aggregate may have local IDs (only unique within aggregate) +**LLM mistake:** Generating order IDs in the apply function instead of the command handler. + +### External Calls +- [ ] **No HTTP/DB/external calls in apply functions** — apply is pure: event in, state out +- [ ] **No side effects in apply** — logging at debug level is acceptable, nothing else +- [ ] External data needed for decisions must be fetched BEFORE emitting the event + +**LLM mistake:** Fetching current price during apply to validate an order event. The price should be in the event. --- -## Event Sourcing +## Event Design -### Event Immutability -- [ ] Events are **facts that happened** - never modify, never delete -- [ ] Bad data? Emit a **compensating event** (e.g., `OrderCorrected`, `AmountAdjusted`) -- [ ] Event schema versioned; old events must remain readable forever +### Self-Contained Events +- [ ] **Event contains all data needed to understand what happened** — don't rely on external lookups +- [ ] **Include denormalized data that might change** — `{product_id, product_name, price_at_time}` not just `{product_id}` +- [ ] **Actor/causation metadata** — who triggered this, correlation_id for tracing -**Critical:** Models sometimes suggest "fixing" events in the store. This is always wrong. Events are immutable historical facts. +**LLM mistake:** `%OrderPlaced{order_id: "123", product_ids: [...]}` — missing prices, quantities, everything needed to understand the order without external lookups. ### Event Naming -- [ ] Events are **past tense**: `OrderPlaced`, `PaymentReceived`, `ItemShipped` -- [ ] Commands are **imperative**: `PlaceOrder`, `ProcessPayment`, `ShipItem` -- [ ] Events describe what happened, not what to do +- [ ] **Past tense** — `OrderPlaced`, `PaymentReceived`, `ItemShipped` +- [ ] **Not commands** — never `PlaceOrder` or `ProcessPayment` as event names +- [ ] **Specific over generic** — `OrderItemQuantityAdjusted` not `OrderUpdated` -### Event Content -- [ ] Include all data needed to understand the event in isolation -- [ ] Don't reference external state that might change -- [ ] Include actor/causation metadata (who, why, correlation_id) -- [ ] Avoid large payloads - events aren't documents +**LLM mistake:** Creating `OrderUpdated` events with a `changes` map. This is CRUD-in-disguise, not event sourcing. -**Common mistake:** Storing just IDs and looking up current state. Event should be self-contained: `ItemAdded { product_id, name, price, quantity }` not just `ItemAdded { product_id }`. +### Event Immutability +- [ ] **Never suggest "fixing" or "updating" an event** — events are immutable facts +- [ ] **Compensating events for corrections** — `OrderCorrected`, `AmountAdjusted` +- [ ] **Schema versioning for evolution** — old events must remain readable forever -### Aggregate Reconstruction -- [ ] Replay all events to rebuild current state -- [ ] `apply/2` is pure: event in, state out, no side effects -- [ ] Handle unknown event types gracefully (forward compatibility) +**LLM mistake:** Suggesting a migration that modifies existing events to fix a bug. The correct answer is always a compensating event or projection rebuild. --- -## Domain vs Integration Events +## Aggregate Boundaries -### Domain Events -- Internal to bounded context -- Rich, contains domain concepts -- Can evolve with internal refactoring -- Consumed by projections, process managers within same context +### Size +- [ ] **Aggregate = consistency boundary, not data grouping** — what MUST be consistent in one transaction? +- [ ] **Smaller is better** — large aggregates = contention, scaling pain +- [ ] **Reference other aggregates by ID** — never embed full objects -### Integration Events -- Cross bounded context communication -- Stable contract, versioned schema -- Minimal data (anti-corruption layer transforms as needed) -- Published to message bus/event broker +**LLM mistake:** Making `Portfolio` contain a list of `Position` aggregates. Each position should be its own aggregate referenced by ID. -**Common mistake:** Publishing internal domain events directly to other services. Use an anti-corruption layer to transform to integration events. +### Invariants +- [ ] **Business rules inside the aggregate** — not in application services +- [ ] **Always valid after any operation** — reject operations that would violate invariants +- [ ] **Constructor enforces required fields** — no invalid aggregate instances ---- - -## Eventual Consistency - -### Expectations -- [ ] Read models may lag behind writes (milliseconds to seconds typically) -- [ ] UI must handle "command accepted" != "projection updated" -- [ ] Idempotent event handlers (same event delivered twice = same result) - -### Patterns -- [ ] Optimistic UI: show expected state, reconcile when projection catches up -- [ ] Polling/subscription for read model updates -- [ ] Correlation IDs to track command → event → projection flow - -**Common mistake:** Returning the "updated" read model immediately after a command. The projection might not have processed the event yet. +**LLM mistake:** Putting validation in a service layer: "OrderService checks if order has at least one item." The Order aggregate should reject an empty order. --- ## Projections (Read Models) -### Design -- [ ] Projections are disposable - can always rebuild from events -- [ ] One projection per query need (don't share if requirements differ) -- [ ] Denormalized for query performance - no joins at read time -- [ ] Include projection version/position for consistency checks +### Not Source of Truth +- [ ] **Projections are derived, disposable** — can always rebuild from events +- [ ] **If projection is wrong, fix the projection logic and rebuild** — don't "fix" projection data directly +- [ ] **One projection per query need** — don't share if requirements differ -### Rebuilding -- [ ] Support full rebuild from event stream -- [ ] Handle schema migrations via rebuild or versioned projections -- [ ] Track last processed event position for resume +**LLM mistake:** Treating a projection table as canonical and syncing events to match it. -**Common mistake:** Treating projections as source of truth. If projection is wrong, fix the projection logic and rebuild - don't "fix" the projection data directly. +### Eventual Consistency +- [ ] **Read models may lag behind writes** — UI must handle this +- [ ] **Don't return projection state immediately after command** — it might not be updated yet +- [ ] **Idempotent handlers** — same event delivered twice produces same result + +**LLM mistake:** API endpoint that does `append_event(...)` then immediately `query_projection(...)` and returns it. Race condition. --- -## Snapshotting +## Idempotency -### When to Snapshot -- [ ] Only when replay performance is a problem (measure first) -- [ ] Typical threshold: 100-1000 events before snapshot becomes worthwhile -- [ ] Snapshot frequency based on write volume and rebuild tolerance +### Event Handling +- [ ] **Idempotency keys for commands** — especially payments, orders +- [ ] **Check for duplicate events before processing** — at-least-once delivery is common +- [ ] **Make apply functions idempotent** — applying same event twice = same state -### Implementation -- [ ] Snapshot = serialized aggregate state at event N -- [ ] Load snapshot, then replay events after snapshot -- [ ] Snapshot schema must evolve with aggregate (version snapshots) -- [ ] Keep events forever - snapshots are optimization, not replacement +**LLM mistake:** An event handler that increments a counter without checking if this event was already processed. -**Common mistake:** Snapshotting too early (premature optimization) or deleting events after snapshot (data loss). +### Command Handling +- [ ] **Use `append_if_absent` patterns** — check before write, atomically +- [ ] **Return success for duplicate valid commands** — don't error on retry --- ## Process Managers / Sagas -### Coordination -- [ ] Long-running processes spanning multiple aggregates/contexts -- [ ] Maintain own state (what step are we on, what's pending) -- [ ] React to events, emit commands -- [ ] Handle failures: compensating actions, retries, timeouts +- [ ] **Long-running coordination across aggregates** — don't do this in application services +- [ ] **Own state machine with explicit states** — what step are we on? +- [ ] **Handle timeouts** — what if a step never completes? +- [ ] **Compensating actions for failures** — if step 3 fails, undo steps 1-2 -### State Machine -- [ ] Explicit states and transitions -- [ ] Idempotent transitions (receiving same event twice is safe) -- [ ] Timeout handling for stuck processes - -**Common mistake:** Putting multi-aggregate coordination in application services with direct calls. Use process managers for anything that spans consistency boundaries. +**LLM mistake:** Multi-aggregate coordination in a service with direct calls and no failure handling: +```elixir +def transfer(from, to, amount) do + Wallet.debit(from, amount) # What if this succeeds but next fails? + Wallet.credit(to, amount) +end +``` --- -## Value Objects +## Elixir/OTP Specific -- [ ] Immutable (no setters, all state set in constructor) -- [ ] Equality by value, not reference (`Money(100, :USD) == Money(100, :USD)`) -- [ ] Self-validating (invalid value object cannot be constructed) -- [ ] No identity - two VOs with same values are interchangeable +### GenServer State Recovery +- [ ] **Use `handle_continue` for replay** — not `init/1` directly +- [ ] **Keep apply functions pure** — use a reducer pattern +- [ ] **Trap exits if cleanup needed** — but prefer stateless design -**Examples:** `Money`, `EmailAddress`, `DateRange`, `Coordinates`, `OrderLineItem` - -**Common mistake:** Making value objects mutable or giving them database IDs. +### Process Dictionary for Test Isolation +- [ ] **`Process.get/put` for store references** — allows per-test isolation +- [ ] **Set in GenServer init, read in public API** — callers don't pass store around --- -## Anti-Patterns to Flag +## Anti-Patterns to Flag Immediately -- **Anemic domain model**: Aggregates with only getters/setters, logic in services -- **God aggregate**: Everything in one aggregate, massive contention -- **Event sourcing the UI**: Storing UI state changes as domain events -- **Synchronous projections**: Blocking command until projection updates -- **Shared kernel abuse**: Too much shared code between bounded contexts -- **CRUD in disguise**: `EntityUpdated` events that just dump all fields +| Pattern | Problem | +|---------|---------| +| `DateTime.utc_now()` in apply | Non-deterministic replay | +| `OrderUpdated` with `changes` map | CRUD-in-disguise | +| Projection used as source of truth | Data inconsistency | +| Event "fix" migration | Violates immutability | +| Multi-aggregate in one transaction | Wrong boundaries | +| External call in apply | Side effects break replay | diff --git a/README.md b/README.md index 2707066..7230bab 100644 --- a/README.md +++ b/README.md @@ -1,31 +1,32 @@ # DDD & Event Sourcing Patterns -Review guidance for domain-driven design and event sourcing. Focus on subtle mistakes models make and patterns they get wrong. - -## Contents - -- `DDD-CHECKLIST.md` - The review checklist covering: - - Aggregate Design (boundaries, invariants, identity) - - Event Sourcing (immutability, naming, content, reconstruction) - - Domain vs Integration Events - - Eventual Consistency - - Projections (read models) - - Snapshotting - - Process Managers / Sagas - - Value Objects - - Anti-Patterns to Flag +What LLMs get subtly wrong in event-sourced systems. Based on actual code review findings. ## Why This Exists -Models know the textbook definitions but often get the nuances wrong: -- Making aggregates too large (data grouping vs consistency boundary) +LLMs know the textbook definitions of event sourcing and DDD. But they make subtle mistakes: + +- Adding `DateTime.utc_now()` in apply functions (breaks replay determinism) +- Creating `OrderUpdated` events with a changes map (CRUD-in-disguise) +- Treating projections as source of truth - Suggesting "fixes" to immutable events -- Mixing domain and integration events -- Expecting synchronous read model updates -- Snapshotting prematurely +- Making aggregates too large (data grouping vs consistency boundary) +- Multi-aggregate coordination in services without saga patterns This checklist catches those mistakes. +## Contents + +- `DDD-CHECKLIST.md` - Review prompts covering: + - Replay Determinism (CRITICAL) + - Event Design + - Aggregate Boundaries + - Projections + - Idempotency + - Process Managers / Sagas + - Elixir/OTP Specific + - Anti-patterns table + ## Integration ```yaml