Data Models
Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.
Why
A lot of messy Python comes from making one class do four jobs at once:
- validate external input
- represent internal state
- serialize output
- own persistence or business behavior
Mature codebases usually split those concerns:
- Use dedicated boundary models where data enters or leaves the system.
- Validate and serialize explicitly.
- Use lightweight declarative classes for data-heavy, behavior-light state.
- Keep persistence, transport, and business logic from collapsing into one model.
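A minimal sketch of that split, using only the standard library. The names parse_user, dump_user, and User are hypothetical, and the validation rules are deliberately thin assumptions; a validation library would normally take the place of the hand-written checks at the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Internal state: small, immutable, no I/O."""
    email: str
    name: str


def parse_user(payload: dict) -> User:
    """Boundary in: validate untrusted input once, then return internal state."""
    email = payload.get("email")
    name = payload.get("name")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    # Normalization is part of the boundary contract (an assumption of this sketch).
    return User(email=email.strip().lower(), name=name.strip())


def dump_user(user: User) -> dict:
    """Boundary out: the one place that defines the serialized shape."""
    return {"email": user.email, "name": user.name}


user = parse_user({"email": " Ada@Example.com ", "name": "Ada"})
```

Code past parse_user never re-validates: it receives a User and can trust it. Persistence and transport would live in their own modules and take User as an argument.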
When to use
Use validated boundary models when:
- data enters from HTTP, files, queues, or user input
- you need a predictable serialization contract
- callers need one clear place for validation and defaults
Use small value-like objects when:
- the object mainly carries state
- immutability helps reasoning
- behavior is narrow and local
When not to use
Do not treat a boundary model as proof that every internal object should also become a heavyweight validation model.
Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.
Preferred shapes
Explicit boundary model
from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    name: str

# payload: raw external input, e.g. a dict decoded from JSON
user = UserIn.model_validate(payload)
serialized = user.model_dump()
Why this works:
- validation is explicit
- serialization is explicit
- the model’s job is clear
Small value-like internal class
from dataclasses import dataclass

@dataclass(frozen=True)
class Duration:
    start: Instant  # Instant is itself a small frozen dataclass
    stop: Instant
Why this works:
- state is simple
- mutation is constrained
- the class stays easy to trust
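A runnable sketch of the same shape. The single seconds field on Instant and the elapsed property are assumptions made for illustration, not a copy of any library's definitions.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Instant:
    seconds: float  # hypothetical single-field timestamp


@dataclass(frozen=True)
class Duration:
    start: Instant
    stop: Instant

    @property
    def elapsed(self) -> float:
        """Narrow, local behavior derived purely from the stored state."""
        return self.stop.seconds - self.start.seconds


d = Duration(start=Instant(10.0), stop=Instant(12.5))

# Assigning to a field of a frozen dataclass raises instead of mutating.
try:
    d.stop = Instant(99.0)
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Frozen dataclasses also get generated equality, so two Duration values with the same endpoints compare equal without any hand-written code.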
Counterexamples
Hand-rolled serialization everywhere
class User:
    def to_dict(self):
        return {
            "id": str(self.id),
            "name": self.name,
            "created_at": self.created_at.isoformat(),
        }
Every model now invents its own boundary behavior.
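One way out, sketched with the standard library only: move the conversion rules into a single helper so no class carries its own to_dict. The helper name to_jsonable and the set of types it handles are assumptions; a serialization library would usually fill this role.

```python
from dataclasses import dataclass, fields, is_dataclass
from datetime import date, datetime, timezone
from uuid import UUID


def to_jsonable(value):
    """Convert dataclass instances, UUIDs, and dates to JSON-friendly values."""
    if is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_jsonable(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    return value


@dataclass(frozen=True)
class User:
    id: UUID
    name: str
    created_at: datetime


user = User(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    name="Ada",
    created_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
serialized = to_jsonable(user)
```

The rule for how a UUID or a timestamp is rendered now lives in exactly one place, so every model crossing the boundary gets the same treatment.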
One class owns every concern
class OrderModel:
    def validate(self): ...
    def save(self): ...
    def send_webhook(self): ...
    def render_html(self): ...
That is not a model. That is a junk drawer.
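A possible split of the same responsibilities, as a sketch. Every name here (Order, validate_order, InMemoryOrderStore, send_order_webhook, render_order_html) is hypothetical, the in-memory store stands in for a real database, and the post callable stands in for a real HTTP client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Plain state only."""
    order_id: str
    total_cents: int


def validate_order(order: Order) -> None:
    """Business rule check, kept out of the data class."""
    if order.total_cents < 0:
        raise ValueError("total_cents must not be negative")


class InMemoryOrderStore:
    """Persistence lives here (stand-in for a real database)."""

    def __init__(self) -> None:
        self._rows = {}  # order_id -> Order

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order

    def get(self, order_id: str) -> Order:
        return self._rows[order_id]


def send_order_webhook(order: Order, post) -> None:
    """Side effect: `post` is an injected callable that delivers the payload."""
    post({"order_id": order.order_id, "total_cents": order.total_cents})


def render_order_html(order: Order) -> str:
    """Presentation, separate from state and storage."""
    return f"<p>Order {order.order_id}: {order.total_cents / 100:.2f}</p>"


order = Order(order_id="A1", total_cents=1250)
validate_order(order)
store = InMemoryOrderStore()
store.save(order)
sent = []
send_order_webhook(order, sent.append)
html = render_order_html(order)
```

Each piece can now be tested and replaced on its own, and Order stays a value that any of them can accept.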
Source signals
Pydantic
- pydantic/main.py:253-264 says BaseModel.__init__ parses and validates input data and raises ValidationError on bad input.
- pydantic/main.py:455-519 defines model_dump(...) as an explicit serialization step with caller-controlled include/exclude behavior.
- pydantic/main.py:721-768 defines model_validate(...) -> Self as a named boundary-crossing API with explicit validation options.
- docs/index.md:68-107 shows raw external data being coerced into typed fields and then dumped back out explicitly.
- docs/index.md:109-152 shows invalid external input failing with structured per-field validation errors instead of ad hoc strings.
Attrs
- docs/examples.md:24-44 shows @define creating lightweight typed classes with generated constructor, repr, and equality behavior.
- docs/examples.md:143-205 uses keyword-only fields to keep construction explicit at the call site.
- docs/examples.md:209-220 uses asdict(...) as an intentional conversion step.
Pytest
- src/_pytest/timing.py:24-64 models Instant and Duration as frozen dataclasses for simple internal timing state.
Related comparison
comparison/pydantic-vs-python.md
Bottom line
Boundary models should validate and serialize cleanly.
Internal models should stay small and honest.
If one model starts owning every concern in the system, split it before it turns to mud.