# Temporal: Architectural Analysis

**Repo:** github.com/temporalio/temporal
**Size:** 181M, 2,645 Go files, 8,958 commits, 290 contributors
**Category:** Durable execution engine / workflow orchestrator

---

## Repository Shape

```
temporal/
├── api/                    # Generated protobuf service definitions
├── chasm/                  # NEW: Component-based HSM Architecture
├── client/                 # Internal service clients
├── cmd/                    # Entry points (server, tools)
├── common/                 # Shared infrastructure (massive)
│   ├── backoff/
│   ├── channel/
│   ├── clock/
│   ├── dynamicconfig/      # 566 runtime-configurable settings
│   ├── goro/               # Goroutine lifecycle management
│   ├── log/
│   ├── metrics/
│   ├── namespace/
│   ├── persistence/        # Multi-backend storage abstraction
│   ├── quotas/             # Rate limiting infrastructure
│   ├── softassert/         # Production assertions (log, don't crash)
│   └── tasks/              # Scheduler primitives (IWRR, FIFO, etc.)
├── components/             # Feature modules (callbacks, nexus, schedulers)
├── service/
│   ├── frontend/           # gRPC API handlers
│   ├── history/            # Workflow state machine execution
│   │   ├── hsm/            # Hierarchical State Machine framework
│   │   └── queues/         # Task queue processing
│   ├── matching/           # Task dispatch / worker routing
│   └── worker/             # System workflows
└── tests/                  # Integration / functional tests
```

### Import Hierarchy (most depended-upon)

1. `common`: 7,257 imports (the foundation)
2. `api`: 1,731 (protobuf contracts)
3. `service`: 1,693 (business logic)
4. `chasm`: 497 (rapidly growing new framework)
5. `tests`: 125 (integration harness)

---

## Key Architectural Patterns

### 1. Hierarchical State Machines (HSM)

**PR #5494 (Mar 2024, 51 review comments):** The HSM framework is Temporal's core abstraction. Every workflow execution is a tree of state machines: the workflow itself, its activities, child workflows, timers, callbacks, and nexus operations.
```go
// StateMachine is implemented by every node in the execution tree.
type StateMachine[S comparable] interface {
	TaskRegenerator
	State() S
	SetState(S)
}

// Transition describes one allowed state change for a machine of type SM.
type Transition[S comparable, SM StateMachine[S], E any] struct {
	Sources     []S // states from which this transition may start
	Destination S   // state the machine ends in
	apply       func(SM, E) (TransitionOutput, error)
}
```

**The key insight:** Type-safe state transitions with source validation. `Transition.Apply()` checks `slices.Contains(t.Sources, sm.State())` before allowing the state change. Invalid transitions return `ErrInvalidTransition` rather than silently corrupting state.

**From PR discussion (tdeebswihart):**

> "I wish we'd gone with the standard `fsm` name here.
> HSM keeps making me think of Hardware Security
> Modules."

**From PR discussion (bergundy, the author):**

> "I don't consider this a final approach but I do think
> it's a step in the right direction. We need to model
> more state machines on top of this to form a more
> solid API."

The author is explicit that the design is iterative. The framework shipped "not final" and evolved through real usage.

### 2. CHASM (Component Architecture for State Machines)

**PR #6987 (Dec 2024–Jan 2025, 60 review comments):** CHASM replaces the old ad-hoc component system. It's a framework for building HSM-based components with:

- Declarative field definitions
- Mutable vs immutable contexts (type-enforced)
- Parent-child component relationships
- Task generation from transitions

**Key discussion points:**

**bergundy (author):** "I would put this in a top level `chasm` directory. There's likely going to be some chasm related code in other services."

**yycptt:** "Having the implementation in the top level package instead of service/history feels weird." (Responded with re-export strategy.)

**Sushisource:** "I think I prefer them separate, because what happens if you mutate something and then say 'not ready'? That would be some weird violation that shouldn't be possible, and separate contexts enforces that at the type level."
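The concern Sushisource raises can be illustrated with a small Go sketch. Everything here is an assumption made for illustration: the method sets on `Context` and `MutableContext`, the `store` type, and the `IsReady` and `RecordAttempt` functions are hypothetical and are not CHASM's actual API. The point is only that a read path which receives the narrower interface cannot call a mutating method without a compile error.

```go
package main

import "fmt"

// Context is a read-only view (hypothetical, not CHASM's real interface).
type Context interface {
	Value(key string) (int, bool)
}

// MutableContext adds write access on top of the read-only view.
type MutableContext interface {
	Context
	Set(key string, v int)
}

// store is a toy backing type that satisfies both interfaces.
type store struct{ data map[string]int }

func (s *store) Value(key string) (int, bool) {
	v, ok := s.data[key]
	return v, ok
}

func (s *store) Set(key string, v int) { s.data[key] = v }

// IsReady is a read path. It only receives Context, so calling
// ctx.Set here would not compile.
func IsReady(ctx Context) bool {
	v, ok := ctx.Value("attempts")
	return ok && v >= 3
}

// RecordAttempt is a write path and must request MutableContext explicitly.
func RecordAttempt(ctx MutableContext) {
	v, _ := ctx.Value("attempts")
	ctx.Set("attempts", v+1)
}

func main() {
	s := &store{data: map[string]int{}}
	for i := 0; i < 3; i++ {
		RecordAttempt(s)
	}
	fmt.Println(IsReady(s)) // prints "true"
}
```

A type assertion can still recover the mutable view from a `Context` value, so in Go the guarantee is that mutating in a read path cannot happen by accident, not that it is impossible.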
→ **Decision: Split MutableContext vs Context at the type level** to make invalid operations unrepresentable. This is the "making wrong things impossible" philosophy in action.

### 3. Goroutine Lifecycle (goro.Handle)

**PR #1892 (Sep 2021, 15 review comments):** Introduced to fix a **double-close panic** in the task writer. The pattern is strikingly similar to CockroachDB's Handle (introduced 2025), but predates it by 3.5 years.

```go
type Handle struct {
	context context.Context
	cancel  context.CancelFunc
	done    chan struct{}
	err     atomic.Value
}
```

**From PR discussion (mmcshane, author):**

> "One thing you might not guess about Stop() is that
> it removes itself from the parent matching engine. I
> don't like this 'remove yourself' behavior because it
> puts the control logic in the wrong place (i.e. in
> the controlled object rather than the controller)."

**Reviewer (paulnpdev):**

> "If an expert questions what the code is doing, it
> deserves a comment."

This principle ("if a reviewer needs to ask, the code needs a comment") is enforced through review culture.

### 4. Soft Assertions (softassert)

**PR #7411 (Mar 2025, 46 review comments):** Production code that logs errors for invariant violations but doesn't crash:

```go
softassert.That(logger, object.state == "ready", "object is not ready")
```

**From PR (stephanos):**

> "**Why not panic?** Maybe in the future. For now,
> we're happy with finding these failed assertions in
> functional tests."

This is Temporal's version of CockroachDB's `errors.AssertionFailedf`: a way to mark "this should never happen" without crashing production. The key difference: CockroachDB promotes these to errors that may crash; Temporal logs them and continues.

### 5. Dynamic Configuration (566 settings)

Temporal's most extreme pattern: **566 runtime-configurable settings** with type-safe resolution and namespace-scoped overrides.
```go
var AdminEnableListHistoryTasks = NewGlobalBoolSetting(
	"admin.enableListHistoryTasks", // setting key
	true,                           // default value
	`Description here`,
)
```

Settings use generics for type safety and resolve with precedence: task queue → namespace → global. The `Collection` uses `weak.Pointer` for cache invalidation (a Go 1.24 feature) and `goro.Group` for background polling, which shows how the internal packages compose.

### 6. Persistence Plugin System (init registration)

```go
func init() {
	sql.RegisterPlugin(PluginName, &plugin{
		driver: &driver.PQDriver{},
	})
}
```

Classic Go plugin pattern using `init()` + global registry. Supports: PostgreSQL (lib/pq + pgx), MySQL, SQLite, Cassandra. Because registration happens at init time, a plugin is available only if its package is imported into the binary (the `cmd/` packages import the plugins they want).

### 7. uber/fx Dependency Injection

Temporal uses uber/fx for service construction. Each service has an `fx.go` that declares providers and consumers:

```go
type GrpcServerOptionsParams struct {
	fx.In

	Logger                        log.Logger
	RPCFactory                    common.RPCFactory
	RetryableInterceptor          *interceptor.RetryableInterceptor
	NamespaceRateLimitInterceptor interceptor.NamespaceRateLimitInterceptor `optional:"true"`
}
```

This is unusual for Go, where most projects avoid DI frameworks. Temporal justifies it because the service graph is genuinely complex (4 services × multiple backends × configurable interceptors).

---

## Code Quality Markers

| Metric | Count |
|--------|-------|
| TODOs (non-test) | 738 |
| FIXMEs | 0 |
| HACKs | 5 |
| Mock files | 152 |
| Test files | 785 |
| Integration tests | 113 |
| Generic usages | 1,928 |

**TODO style:** `// TODO: description` (no owner tag). Compare to CockroachDB's `// TODO(username):`. Temporal doesn't track who is responsible for a TODO.

---

## Patterns Unique to Temporal

### ShutdownOnce (safe multi-close)

```go
func (c *ShutdownOnceImpl) Shutdown() {
	// Only the caller that wins the CAS closes the channel.
	if atomic.CompareAndSwapInt32(
		&c.status,
		shutdownOnceStatusOpen,
		shutdownOnceStatusClosed,
	) {
		close(c.channel)
	}
}
```

CAS-based channel close that's safe to call multiple times. Solves the "close of closed channel" panic that plagues concurrent shutdown code.

### Interleaved Weighted Round Robin Scheduler

Custom task scheduler that interleaves tasks from different channels based on configurable weights. Uses dynamic config for weight updates without restart. This is their answer to fair scheduling across namespaces with different SLAs.

### serviceerror Package (domain error types)

Instead of wrapping standard errors, Temporal defines domain-specific error types that map directly to gRPC status codes:

- `StickyWorkerUnavailable`
- `ShardOwnershipLost`
- `TaskAlreadyStarted`
- `CurrentBranchChanged`

Each is a struct implementing `error` with specific fields needed for retry/recovery decisions.

---

## Cross-Ecosystem Observations

### Temporal vs CockroachDB

| Concern | Temporal | CockroachDB |
|---------|----------|-------------|
| Goroutine mgmt | goro.Handle (2021) | stop.Handle (2025) |
| Assertions | softassert (log) | AssertionFailedf (error) |
| Config | 566 dynamic settings | Cluster settings |
| DI | uber/fx | Manual wiring |
| State machines | First-class HSM framework | Ad-hoc per component |
| Error types | Domain structs → gRPC | Sentinel + wrapping |
| TODO style | No owner | `TODO(username)` |

### Temporal vs Prometheus

| Concern | Temporal | Prometheus |
|---------|----------|------------|
| Plugin system | init() registration | init() registration |
| Logging | Custom log package | promslog (slog) |
| Interfaces | Heavy use | Minimal, targeted |
| Generics | 1,928 usages | Minimal |
| Global state | Avoided (fx wiring) | Accepted for hot paths |

### Key Differences from CockroachDB

1. **uber/fx is a conscious choice:** Temporal's service graph is complex enough to justify a DI framework. CockroachDB explicitly avoids frameworks.
2. **HSM is THE architecture:** Everything in Temporal is a state machine. CockroachDB has state machines but doesn't have a unified framework for them.
3. **CHASM splits mutable/immutable at the type level:** This is Temporal's strongest pattern. It makes mutation impossible in read paths via the type system.
4. **goro.Handle predates CockroachDB's Handle by 3.5 years:** Same problem (goroutine lifecycle), same solution (context + done channel + safe multi-stop), invented independently.

---

## Lessons for Code Review

1. **"If a reviewer needs to ask, the code needs a comment":** Temporal's review culture promotes comments that explain non-obvious decisions.
2. **Separate mutable from immutable contexts at the type level:** Don't rely on documentation to prevent mutation in read paths.
3. **Soft assertions > panics in distributed systems:** Log the invariant violation and continue serving. Catch the failed assertion later in tests.
4. **Domain error types beat generic wrapping** when errors drive retry/routing decisions. A struct with specific fields is more useful than `fmt.Errorf`.
5. **DI frameworks are justified when the service graph is genuinely complex:** 4 services × multiple backends × configurable interceptors × optional features = real complexity.
6. **HSM frameworks centralize correctness:** Moving state transition validation into a framework means every component gets it right by construction instead of by discipline.
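
The last lesson can be made concrete with a small sketch of source-validated transitions. This is a simplified, non-generic illustration and not Temporal's implementation: the `Timer` type, the `State` constants, and the `fire` and `cancel` transitions are hypothetical, and only the `Sources`/`Destination` fields, the `slices.Contains` check, and the `ErrInvalidTransition` name are taken from the analysis above.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// ErrInvalidTransition reuses the error name mentioned in the analysis.
var ErrInvalidTransition = errors.New("invalid transition")

// State is a hypothetical timer state used only in this sketch.
type State int

const (
	Scheduled State = iota
	Fired
	Canceled
)

// Timer is a toy state machine with a single state field.
type Timer struct{ state State }

// Transition lists the states it may start from and the state it ends in.
type Transition struct {
	Sources     []State
	Destination State
}

// Apply changes the timer's state only if the timer is currently in one
// of the allowed source states; otherwise it leaves the timer untouched.
func (t Transition) Apply(tm *Timer) error {
	if !slices.Contains(t.Sources, tm.state) {
		return ErrInvalidTransition
	}
	tm.state = t.Destination
	return nil
}

var (
	fire   = Transition{Sources: []State{Scheduled}, Destination: Fired}
	cancel = Transition{Sources: []State{Scheduled}, Destination: Canceled}
)

func main() {
	tm := &Timer{state: Scheduled}
	fmt.Println(fire.Apply(tm))   // first fire succeeds
	fmt.Println(fire.Apply(tm))   // firing twice is rejected
	fmt.Println(cancel.Apply(tm)) // canceling a fired timer is rejected
}
```

Because the check lives in `Apply`, a new transition only has to declare its sources; no component needs to remember to validate the current state itself.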