# Data Models
Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.
## Why
A lot of messy Python comes from making one class do four jobs at once:
- validate external input
- represent internal state
- serialize output
- own persistence or business behavior
Mature codebases usually split those concerns:
1. Use dedicated boundary models where data enters or leaves the system.
2. Validate and serialize explicitly.
3. Use lightweight declarative classes for data-heavy, behavior-light state.
4. Keep persistence, transport, and business logic from collapsing into one model.
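
As a rough illustration of that split, here is a minimal standard-library sketch. The names `Signup`, `parse_signup`, and `signup_to_response` are hypothetical, and the hand-written checks only stand in for what a validation library would normally do at the boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signup:
    """Internal state: small, immutable, no I/O."""

    email: str
    name: str


def parse_signup(payload: dict) -> Signup:
    """Boundary in: validate untrusted input once, in one place."""
    email = payload.get("email")
    name = payload.get("name")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    return Signup(email=email.strip().lower(), name=name.strip())


def signup_to_response(signup: Signup) -> dict:
    """Boundary out: the serialization contract lives here, not on the class."""
    return {"email": signup.email, "name": signup.name}
```

Persistence and business rules would sit in their own functions or classes that accept a `Signup`, so none of the four jobs ends up on the data class itself.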
## When to use
Use validated boundary models when:
- data enters from HTTP, files, queues, or user input
- you need a predictable serialization contract
- callers need one clear place for validation and defaults
Use small value-like objects when:
- the object mainly carries state
- immutability helps reasoning
- behavior is narrow and local
## When not to use
Do not treat a boundary model as proof that every internal object should also become a heavyweight validation model.
Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.
## Preferred shapes
### Explicit boundary model
```python
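from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    name: str

# Illustrative payload; in practice this is untrusted input such as decoded JSON.
payload = {"email": "ada@example.com", "name": "Ada"}
user = UserIn.model_validate(payload)
serialized = user.model_dump()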
```
Why this works:
- validation is explicit
- serialization is explicit
- the model's job is clear
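
To make the "explicit" part concrete, the sketch below shows one way a caller might handle a failed validation and choose which fields to serialize. It assumes Pydantic v2; the helper `validate_user` is hypothetical and not part of Pydantic.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class UserIn(BaseModel):
    email: str
    name: str


def validate_user(payload: dict) -> tuple[Optional[UserIn], list[str]]:
    """Return the validated model, or the names of the fields that failed."""
    try:
        return UserIn.model_validate(payload), []
    except ValidationError as exc:
        # Each error dict carries a "loc" tuple locating the offending field.
        failed = [".".join(str(part) for part in err["loc"]) for err in exc.errors()]
        return None, failed


user, failed = validate_user({"email": "ada@example.com", "name": "Ada"})
public_view = user.model_dump(include={"name"})  # the caller picks the fields

_, missing = validate_user({"email": "ada@example.com"})  # "name" is absent
```

The failure path yields structured, per-field information rather than a free-form string, which is what makes the boundary easy to report on.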
### Small value-like internal class
```python
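from dataclasses import dataclass

@dataclass(frozen=True)
class Duration:
    # Instant is assumed to be another small frozen value type defined nearby.
    start: Instant
    stop: Instant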
```
Why this works:
- state is simple
- mutation is constrained
- the class stays easy to trust
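
A small sketch of what "mutation is constrained" looks like in practice, using only the standard library. `Window` is a hypothetical value type chosen for illustration, not something from the cited sources.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Window:
    """Hypothetical value object: an interval measured in seconds."""

    start: float
    stop: float

    @property
    def seconds(self) -> float:
        # Narrow, local behavior derived purely from the fields.
        return self.stop - self.start


window = Window(start=1.0, stop=3.5)

try:
    window.stop = 9.0  # frozen=True makes attribute assignment raise
    mutated = True
except FrozenInstanceError:
    mutated = False

# "Changing" a value means building a new one; the original is untouched.
extended = replace(window, stop=5.0)
```

Because the class is frozen and compares by value, instances can also be used as dictionary keys or set members.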
## Counterexamples
### Hand-rolled serialization everywhere
```python
class User:
    def to_dict(self):
        return {
            "id": str(self.id),
            "name": self.name,
            "created_at": self.created_at.isoformat(),
        }
```
Every model now invents its own boundary behavior.
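
One possible alternative, sketched with the standard library only: keep the class as plain state and put the type-to-wire conversion rules in a single shared function. The names `encode_value` and `to_json` are hypothetical; a boundary library would usually provide this step.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from uuid import UUID


@dataclass(frozen=True)
class User:
    id: UUID
    name: str
    created_at: datetime


def encode_value(value):
    """Shared rules for types json cannot encode; used as the `default` hook."""
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def to_json(obj) -> str:
    """Serialize any dataclass instance through the same encoding rules."""
    return json.dumps(asdict(obj), default=encode_value, sort_keys=True)
```

Every dataclass that passes through `to_json` now gets the same treatment for IDs and timestamps, so the wire format is decided once.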
### One class owns every concern
```python
class OrderModel:
    def validate(self): ...
    def save(self): ...
    def send_webhook(self): ...
    def render_html(self): ...
```
That is not a model. That is a junk drawer.
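
A hedged sketch of how the same responsibilities might be pulled apart. Every name here (`Order`, `validate_order`, `OrderStore`, `InMemoryOrderStore`, `render_order_html`) is hypothetical, and the webhook sender is left out; it would be one more collaborator that accepts an `Order`.

```python
from dataclasses import dataclass
from html import escape
from typing import Protocol


@dataclass(frozen=True)
class Order:
    """State only: no I/O, no rendering."""

    order_id: str
    total_cents: int


def validate_order(order: Order) -> list[str]:
    """Business rule check, returned as a list of problems."""
    problems = []
    if not order.order_id:
        problems.append("order_id is required")
    if order.total_cents < 0:
        problems.append("total_cents must not be negative")
    return problems


class OrderStore(Protocol):
    """Persistence lives behind its own interface."""

    def save(self, order: Order) -> None: ...


class InMemoryOrderStore:
    """Toy store for the sketch; a real one would talk to a database."""

    def __init__(self) -> None:
        self.saved: dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self.saved[order.order_id] = order


def render_order_html(order: Order) -> str:
    """Presentation is a plain function over the state."""
    return f"<p>Order {escape(order.order_id)}: {order.total_cents / 100:.2f}</p>"
```

Each piece can now be tested and replaced on its own, and `Order` stays small enough to trust at a glance.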
## Source signals
### Pydantic
- `pydantic/main.py:253-264` says `BaseModel.__init__` parses and validates input data and raises `ValidationError` on bad input.
- `pydantic/main.py:455-519` defines `model_dump(...)` as an explicit serialization step with caller-controlled include/exclude behavior.
- `pydantic/main.py:721-768` defines `model_validate(...) -> Self` as a named boundary-crossing API with explicit validation options.
- `docs/index.md:68-107` shows raw external data being coerced into typed fields and then dumped back out explicitly.
- `docs/index.md:109-152` shows invalid external input failing with structured per-field validation errors instead of ad hoc strings.
### Attrs
- `docs/examples.md:24-44` shows `@define` creating lightweight typed classes with generated constructor, repr, and equality behavior.
- `docs/examples.md:143-205` uses keyword-only fields to keep construction explicit at the call site.
- `docs/examples.md:209-220` uses `asdict(...)` as an intentional conversion step.
### Pytest
- `src/_pytest/timing.py:24-64` models `Instant` and `Duration` as frozen dataclasses for simple internal timing state.
## Related comparison
- `comparison/pydantic-vs-python.md`
## Bottom line
Boundary models should validate and serialize cleanly.
Internal models should stay small and honest.
If one model starts owning every concern in the system, split it before it turns to mud.