Data Models

Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.

Why

A lot of messy Python comes from making one class do four jobs at once:

  • validate external input
  • represent internal state
  • serialize output
  • own persistence or business behavior

Mature codebases usually split those concerns:

  1. Use dedicated boundary models where data enters or leaves the system.
  2. Validate and serialize explicitly.
  3. Use lightweight declarative classes for data-heavy, behavior-light state.
  4. Keep persistence, transport, and business logic from collapsing into one model.
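The split above can be sketched with only the standard library. This is a minimal illustration, not a recommended implementation: `Signup`, `parse_signup`, and `signup_to_dict` are hypothetical names, and the hand-written checks stand in for whatever boundary library the project actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signup:
    """Internal state: small, immutable, no I/O."""
    email: str
    name: str


def parse_signup(payload: dict) -> Signup:
    """Boundary step: validate untrusted input in one named place."""
    email = payload.get("email")
    name = payload.get("name")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email: expected a string containing '@'")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name: expected a non-empty string")
    return Signup(email=email.strip().lower(), name=name.strip())


def signup_to_dict(signup: Signup) -> dict:
    """Serialization step: the output contract lives here, not on the class."""
    return {"email": signup.email, "name": signup.name}


signup = parse_signup({"email": " Ada@Example.com ", "name": "Ada"})
print(signup_to_dict(signup))
```

Each job has one obvious home, so changing the output contract or the validation rules does not touch the internal type.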

When to use

Use validated boundary models when:

  • data enters from HTTP, files, queues, or user input
  • you need a predictable serialization contract
  • callers need one clear place for validation and defaults

Use small value-like objects when:

  • the object mainly carries state
  • immutability helps reasoning
  • behavior is narrow and local

When not to use

Do not take the use of a boundary model as a reason to turn every internal object into a heavyweight validation model.

Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.

Preferred shapes

Explicit boundary model

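from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    name: str

# payload stands in for untrusted input, e.g. a decoded JSON body
payload = {"email": "ada@example.com", "name": "Ada"}
user = UserIn.model_validate(payload)  # raises ValidationError on bad input
serialized = user.model_dump()  # plain dict, ready to encode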

Why this works:

  • validation is explicit
  • serialization is explicit
  • the model's job is clear

Small value-like internal class

from dataclasses import dataclass

@dataclass(frozen=True)
class Duration:
    start: Instant  # Instant is itself a small frozen dataclass
    stop: Instant

Why this works:

  • state is simple
  • mutation is constrained
  • the class stays easy to trust
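To make "mutation is constrained" concrete, here is a runnable sketch of the same shape. The single `seconds` field on `Instant` and the `elapsed` property are assumptions for illustration; they are not taken from pytest's actual definitions.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Instant:
    # Assumed shape: one timestamp in seconds.
    seconds: float


@dataclass(frozen=True)
class Duration:
    start: Instant
    stop: Instant

    @property
    def elapsed(self) -> float:
        return self.stop.seconds - self.start.seconds


d = Duration(start=Instant(10.0), stop=Instant(12.5))
print(d.elapsed)

try:
    d.stop = Instant(99.0)  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False

# Value semantics: equal fields mean equal (and equally hashable) objects.
same = Duration(start=Instant(10.0), stop=Instant(12.5))
print(mutation_blocked, d == same, hash(d) == hash(same))
```

Because the class is frozen and compares by value, it can be used as a dict key or set member without surprises.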

Counterexamples

Hand-rolled serialization everywhere

class User:
    def to_dict(self):
        return {
            "id": str(self.id),
            "name": self.name,
            "created_at": self.created_at.isoformat(),
        }

Every model now invents its own boundary behavior.
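One alternative, sketched here with the standard library only, is a single conversion step shared by every model. `encode_boundary_value` and `to_json` are hypothetical helper names, and the `User` fields are assumed from the counterexample above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from uuid import UUID


@dataclass(frozen=True)
class User:
    id: UUID
    name: str
    created_at: datetime


def encode_boundary_value(value):
    """Shared rule for types that json cannot encode natively."""
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_json(obj) -> str:
    # asdict() is the explicit conversion step; the encoder is defined once.
    return json.dumps(asdict(obj), default=encode_boundary_value, sort_keys=True)


user = User(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    name="Ada",
    created_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(to_json(user))
```

The class carries no serialization code at all, so a second model gets the same UUID and datetime rules for free.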

One class owns every concern

class OrderModel:
    def validate(self): ...
    def save(self): ...
    def send_webhook(self): ...
    def render_html(self): ...

That is not a model. That is a junk drawer.
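A hedged sketch of how the same responsibilities might be pulled apart. All names here (`Order`, `validate_order`, `InMemoryOrderStore`, `render_order_html`) are hypothetical, and the in-memory store is a stand-in for real persistence; webhook delivery would be one more separate function or object.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Order:
    """State only: no I/O, no rendering."""
    order_id: str
    total_cents: int


def validate_order(order: Order) -> None:
    # Business rule kept outside the data class.
    if order.total_cents < 0:
        raise ValueError("total_cents must be non-negative")


class InMemoryOrderStore:
    """Persistence lives behind its own object."""

    def __init__(self) -> None:
        self.saved: dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self.saved[order.order_id] = order


def render_order_html(order: Order) -> str:
    # Presentation is a plain function over the state.
    return f"<p>Order {escape(order.order_id)}: {order.total_cents / 100:.2f}</p>"


order = Order(order_id="A-1", total_cents=1250)
validate_order(order)
store = InMemoryOrderStore()
store.save(order)
print(render_order_html(order))
```

Each piece can now be tested and replaced on its own, and `Order` stays trivial to construct in tests.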

Source signals

Pydantic

  • pydantic/main.py:253-264 says BaseModel.__init__ parses and validates input data and raises ValidationError on bad input.
  • pydantic/main.py:455-519 defines model_dump(...) as an explicit serialization step with caller-controlled include/exclude behavior.
  • pydantic/main.py:721-768 defines model_validate(...) -> Self as a named boundary-crossing API with explicit validation options.
  • docs/index.md:68-107 shows raw external data being coerced into typed fields and then dumped back out explicitly.
  • docs/index.md:109-152 shows invalid external input failing with structured per-field validation errors instead of ad hoc strings.

Attrs

  • docs/examples.md:24-44 shows @define creating lightweight typed classes with generated constructor, repr, and equality behavior.
  • docs/examples.md:143-205 uses keyword-only fields to keep construction explicit at the call site.
  • docs/examples.md:209-220 uses asdict(...) as an intentional conversion step.

Pytest

  • src/_pytest/timing.py:24-64 models Instant and Duration as frozen dataclasses for simple internal timing state.
  • comparison/pydantic-vs-python.md

Bottom line

Boundary models should validate and serialize cleanly.

Internal models should stay small and honest.

If one model starts owning every concern in the system, split it before it turns to mud.