Data Models

Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.

Why

A lot of messy Python comes from making one class do four jobs at once:

  • validate external input
  • represent internal state
  • serialize output
  • own persistence or business behavior

Mature codebases usually split those concerns.

The recurring pattern is:

  • use explicit boundary models for validated input and output
  • keep serialization explicit
  • use lightweight value-like classes for simple internal state
  • avoid turning every data container into a god object

The pattern

  1. Use dedicated boundary models where data enters or leaves the system.
  2. Validate and serialize explicitly.
  3. Use lightweight declarative classes for data-heavy, behavior-light state.
  4. Keep persistence, transport, and business logic from collapsing into one model.
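
The four steps can be sketched with the standard library alone. The names SignupRequest, Account, SignupResponse, and register are hypothetical, and the hand-written checks in from_payload stand in for what a validation library such as Pydantic would normally do.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignupRequest:
    """Inbound boundary model (hypothetical): the one place raw input is validated."""
    email: str
    name: str

    @classmethod
    def from_payload(cls, payload: dict) -> "SignupRequest":
        email = payload.get("email")
        name = payload.get("name")
        if not isinstance(email, str) or "@" not in email:
            raise ValueError("email must be a string containing '@'")
        if not isinstance(name, str) or not name.strip():
            raise ValueError("name must be a non-empty string")
        # Normalization also lives at the boundary, not in callers.
        return cls(email=email.strip().lower(), name=name.strip())


@dataclass(frozen=True)
class Account:
    """Internal value-like state: no validation, no I/O."""
    account_id: int
    email: str
    name: str


@dataclass(frozen=True)
class SignupResponse:
    """Outbound boundary model: the serialization contract."""
    id: int
    email: str


def register(request: SignupRequest, next_id: int) -> Account:
    """Business logic operates on data that is already validated."""
    return Account(account_id=next_id, email=request.email, name=request.name)


request = SignupRequest.from_payload({"email": " Ada@Example.com ", "name": "Ada"})
account = register(request, next_id=1)
# Serialization is an explicit step, applied only to the outbound model.
response = asdict(SignupResponse(id=account.account_id, email=account.email))
```

Persistence is deliberately absent here: under this split it would live in a separate function or repository that accepts an Account, rather than in a method on any of the three classes.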

When to use

Use validated boundary models when:

  • data enters from HTTP, files, queues, or user input
  • you need a predictable serialization contract
  • callers need one clear place for validation and defaults

Use small value-like objects when:

  • the object mainly carries state
  • immutability helps reasoning
  • behavior is narrow and local

When not to use

Do not make every internal object a heavyweight validation model.

Do not scatter custom to_dict() logic across the codebase.

Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.

Preferred shapes

Explicit boundary model

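from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    name: str

# Illustrative payload; in practice this is untrusted external input.
payload = {"email": "ada@example.com", "name": "Ada"}
user = UserIn.model_validate(payload)  # raises ValidationError on bad input
serialized = user.model_dump()         # explicit conversion to a plain dict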

Why this works:

  • validation is explicit
  • serialization is explicit
  • the model's job is clear

Small value-like internal class

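from dataclasses import dataclass

@dataclass(frozen=True)
class Instant:  # simplified stand-in so the example is self-contained
    perf_count: float

@dataclass(frozen=True)
class Duration:
    start: Instant
    stop: Instant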

Why this works:

  • state is simple
  • mutation is constrained
  • the class stays easy to trust
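
A short sketch of what "mutation is constrained" means in practice. Instant is reduced here to a single float field as a simplifying assumption, and the seconds property is included only to illustrate narrow, local behavior.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Instant:
    """Simplified stand-in: one monotonic clock reading, in seconds."""
    perf_count: float


@dataclass(frozen=True)
class Duration:
    start: Instant
    stop: Instant

    @property
    def seconds(self) -> float:
        # Derived purely from stored state; no I/O, no hidden mutation.
        return self.stop.perf_count - self.start.perf_count


d = Duration(start=Instant(10.0), stop=Instant(12.5))

try:
    d.stop = Instant(99.0)  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    # Frozen dataclasses reject attribute assignment after construction.
    mutated = False

# "Changing" a frozen value means building a new one.
longer = replace(d, stop=Instant(15.0))
```

Because the class is frozen and compares by value, instances are also hashable, so they can be used as dictionary keys or set members without extra code.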

Counterexamples

Hand-rolled serialization everywhere

class User:
    def to_dict(self):
        return {
            "id": str(self.id),
            "name": self.name,
            "created_at": self.created_at.isoformat(),
        }

Every model now invents its own boundary behavior.

One class owns every concern

class OrderModel:
    def validate(self): ...
    def save(self): ...
    def send_webhook(self): ...
    def render_html(self): ...

That is not a model. That is a junk drawer.
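
A sketch of one way to take the drawer apart, with every name below (Order, validate_order, OrderStore, InMemoryOrderStore, place_order) being hypothetical. The data stays in a small frozen class, and each remaining concern becomes a function or a narrow collaborator that is passed in explicitly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass(frozen=True)
class Order:
    """Plain state only."""
    order_id: int
    total_cents: int


def validate_order(order: Order) -> None:
    """Validation is a function over the data, not a method that also saves."""
    if order.total_cents < 0:
        raise ValueError("total_cents must be non-negative")


class OrderStore(Protocol):
    def save(self, order: Order) -> None: ...


class InMemoryOrderStore:
    """Persistence sits behind its own small interface."""

    def __init__(self) -> None:
        self.saved: Dict[int, Order] = {}

    def save(self, order: Order) -> None:
        self.saved[order.order_id] = order


def place_order(
    order: Order, store: OrderStore, notify: Callable[[Order], None]
) -> None:
    """The service function wires the concerns together in a visible order."""
    validate_order(order)
    store.save(order)
    notify(order)


store = InMemoryOrderStore()
notified: List[int] = []
place_order(Order(order_id=7, total_cents=1999), store, lambda o: notified.append(o.order_id))
```

Each piece can now be tested or replaced on its own: the store can be swapped for a database-backed one, and the notify callable can be a webhook sender, without touching Order.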

Source signals

Pydantic

  • pydantic/main.py:253-264 says BaseModel.__init__ parses and validates input data and raises ValidationError on bad input.
  • pydantic/main.py:455-519 defines model_dump(...) as an explicit serialization step with caller-controlled include/exclude behavior.
  • pydantic/main.py:721-768 defines model_validate(...) -> Self as a named boundary-crossing API.
  • docs/index.md:82-107 pairs model creation with model_dump() instead of treating instances as already wire-ready.
  • docs/index.md:109-124 shows invalid external input failing loudly with ValidationError.

Attrs

  • docs/examples.md:24-44 shows @define creating lightweight typed classes with generated constructor, repr, and equality behavior.
  • docs/examples.md:143-205 uses keyword-only fields to keep construction explicit at the call site.
  • docs/examples.md:209-220 uses asdict(...) as an intentional conversion step.

Pytest

  • src/_pytest/timing.py:24-64 models Instant and Duration as frozen dataclasses for simple internal timing state.

Bottom line

Boundary models should validate and serialize cleanly.

Internal models should stay small and honest.

If one model starts owning every concern in the system, split it before it turns to mud.