# Data Models

Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.

## Why

A lot of messy Python comes from making one class do four jobs at once:

- validate external input
- represent internal state
- serialize output
- own persistence or business behavior

Mature codebases usually split those concerns:

1. Use dedicated boundary models where data enters or leaves the system.
2. Validate and serialize explicitly.
3. Use lightweight declarative classes for data-heavy, behavior-light state.
4. Keep persistence, transport, and business logic from collapsing into one model.

## When to use

Use validated boundary models when:

- data enters from HTTP, files, queues, or user input
- you need a predictable serialization contract
- callers need one clear place for validation and defaults

Use small value-like objects when:

- the object mainly carries state
- immutability helps reasoning
- behavior is narrow and local

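The boundary rules above can be sketched with the standard library alone. One function turns untrusted JSON into a trusted value, applies defaults, and reports every bad field at once. This is a minimal illustration, not a replacement for a validation library. `Signup`, `parse_signup`, and `PayloadError` are hypothetical names, and the checks are deliberately simplistic:

```python
import json
from dataclasses import dataclass


class PayloadError(ValueError):
    """Raised once per payload, carrying every field problem found."""

    def __init__(self, errors: dict[str, str]) -> None:
        super().__init__(f"invalid payload: {errors}")
        self.errors = errors


@dataclass(frozen=True)
class Signup:
    email: str
    name: str
    newsletter: bool


def parse_signup(raw: str) -> Signup:
    """The single place where untrusted JSON becomes a trusted Signup."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PayloadError({"__root__": f"not valid JSON: {exc.msg}"}) from exc
    if not isinstance(data, dict):
        raise PayloadError({"__root__": "expected a JSON object"})

    # Check every field before failing, so callers see all problems at once.
    errors: dict[str, str] = {}
    email = data.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors["email"] = "must be a string containing '@'"
    name = data.get("name")
    if not isinstance(name, str) or not name.strip():
        errors["name"] = "must be a non-empty string"
    newsletter = data.get("newsletter", False)  # the default is applied here, once
    if not isinstance(newsletter, bool):
        errors["newsletter"] = "must be a boolean"

    if errors:
        raise PayloadError(errors)
    return Signup(email=email, name=name.strip(), newsletter=newsletter)


ok = parse_signup('{"email": "ada@example.com", "name": "Ada"}')
print(ok)  # Signup(email='ada@example.com', name='Ada', newsletter=False)

try:
    parse_signup('{"email": "nope", "newsletter": "yes"}')
except PayloadError as exc:
    bad_fields = sorted(exc.errors)
print(bad_fields)  # ['email', 'name', 'newsletter']
```

Code past `parse_signup` can take a `Signup` and trust it. In the library case, Pydantic's `model_validate` plays the same boundary role and reports structured per-field `ValidationError` details.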
## When not to use

Do not take the usefulness of boundary models as a reason to turn every internal object into a heavyweight validation model.

Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.

## Preferred shapes

### Explicit boundary model

```python
from pydantic import BaseModel


class UserIn(BaseModel):
    email: str
    name: str


payload = {"email": "ada@example.com", "name": "Ada"}  # example external input
user = UserIn.model_validate(payload)  # raises ValidationError on bad input
serialized = user.model_dump()  # plain dict, ready to leave the system
```

Why this works:

- validation is explicit
- serialization is explicit
- the model’s job is clear

### Small value-like internal class

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Duration:
    start: Instant  # another small frozen value type, defined alongside (not shown)
    stop: Instant
```

Why this works:

- state is simple
- mutation is blocked: assigning to a field raises `dataclasses.FrozenInstanceError`
- the class stays easy to trust

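Here is a slightly fuller version of the pair, as a minimal sketch loosely modeled on the pytest timing types cited under Source signals. Each instant captures a `time.perf_counter()` reading once, and a derived value is computed on demand rather than stored. The `perf_count` field, `elapsed()`, and `seconds` are illustrative assumptions here, not a copy of pytest's code:

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, field
from time import perf_counter


@dataclass(frozen=True)
class Instant:
    """A point in time, captured once at construction and never changed."""

    # Assumed field: a high-resolution performance-counter reading.
    perf_count: float = field(default_factory=perf_counter)

    def elapsed(self) -> Duration:
        """Return the span from this instant to a freshly captured one."""
        return Duration(start=self, stop=Instant())


@dataclass(frozen=True)
class Duration:
    """A span between two instants; derived values are computed, not stored."""

    start: Instant
    stop: Instant

    @property
    def seconds(self) -> float:
        return self.stop.perf_count - self.start.perf_count


start = Instant()
span = start.elapsed()
print(f"{span.seconds:.6f} s")

# Frozen dataclasses reject attribute assignment after construction.
try:
    span.start = Instant()
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True
print(mutation_rejected)  # True
```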
## Counterexamples

### Hand-rolled serialization everywhere

```python
class User:
    def to_dict(self):
        return {
            "id": str(self.id),
            "name": self.name,
            "created_at": self.created_at.isoformat(),
        }
```

Every model now invents its own boundary behavior: which fields to emit, how to convert IDs, how to format timestamps. Nothing keeps those choices consistent across models.

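One alternative is a single shared conversion step instead of a `to_dict` on every class. This is a minimal stdlib sketch that assumes dataclass models. `dump` and `_encode` are hypothetical helpers. In practice, a library's own serializer (such as Pydantic's `model_dump` or attrs' `asdict`) usually fills this role:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from uuid import UUID


def _encode(value: object) -> str:
    """One shared rule for the types the json module cannot encode itself."""
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


def dump(model: object) -> str:
    """One explicit serialization step for any dataclass instance."""
    return json.dumps(asdict(model), default=_encode)


@dataclass(frozen=True)
class User:
    # Plain state: no to_dict(), no per-class formatting decisions.
    id: UUID
    name: str
    created_at: datetime


user = User(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    name="Ada",
    created_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(dump(user))
# {"id": "12345678-1234-5678-1234-567812345678", "name": "Ada", "created_at": "2024-01-02T03:04:05+00:00"}
```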
### One class owns every concern

```python
class OrderModel:
    def validate(self): ...
    def save(self): ...
    def send_webhook(self): ...
    def render_html(self): ...
```

That is not a model. That is a junk drawer.

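Splitting the junk drawer might look like the following minimal sketch. The frozen `Order` holds state, while persistence, notification, and rendering each live in their own small piece. All names here (`Order`, `OrderRepository`, `Notifier`, `InMemoryOrders`, `RecordingNotifier`, `render_order_html`, `place_order`) are hypothetical. The in-memory classes stand in for a real database and webhook client:

```python
from dataclasses import dataclass, field
from html import escape
from typing import Protocol


@dataclass(frozen=True)
class Order:
    """Internal state only: no persistence, transport, or rendering."""

    order_id: str
    total_cents: int

    def __post_init__(self) -> None:
        # A narrow, local invariant; input validation belongs at the boundary.
        if self.total_cents < 0:
            raise ValueError("total_cents must be non-negative")


class OrderRepository(Protocol):
    def save(self, order: Order) -> None: ...


class Notifier(Protocol):
    def order_placed(self, order: Order) -> None: ...


@dataclass
class InMemoryOrders:
    """Persistence lives here (an in-memory stand-in for a real store)."""

    rows: dict[str, Order] = field(default_factory=dict)

    def save(self, order: Order) -> None:
        self.rows[order.order_id] = order


@dataclass
class RecordingNotifier:
    """Side effects live here (records calls instead of sending webhooks)."""

    sent: list[str] = field(default_factory=list)

    def order_placed(self, order: Order) -> None:
        self.sent.append(order.order_id)


def render_order_html(order: Order) -> str:
    """Presentation is a plain function of the state."""
    return f"<p>Order {escape(order.order_id)}: ${order.total_cents / 100:.2f}</p>"


def place_order(order: Order, repo: OrderRepository, notifier: Notifier) -> None:
    """A small service function coordinates the collaborators."""
    repo.save(order)
    notifier.order_placed(order)


repo, notifier = InMemoryOrders(), RecordingNotifier()
order = Order(order_id="A-1", total_cents=1250)
place_order(order, repo, notifier)
print(render_order_html(order))  # <p>Order A-1: $12.50</p>
```

Each piece can now be tested or replaced on its own. Swapping the webhook client, for example, does not touch the model.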
## Source signals

### Pydantic

- `pydantic/main.py:253-264` says `BaseModel.__init__` parses and validates input data and raises `ValidationError` on bad input.
- `pydantic/main.py:455-519` defines `model_dump(...)` as an explicit serialization step with caller-controlled include/exclude behavior.
- `pydantic/main.py:721-768` defines `model_validate(...) -> Self` as a named boundary-crossing API with explicit validation options.
- `docs/index.md:68-107` shows raw external data being coerced into typed fields and then dumped back out explicitly.
- `docs/index.md:109-152` shows invalid external input failing with structured per-field validation errors instead of ad hoc strings.

### Attrs

- `docs/examples.md:24-44` shows `@define` creating lightweight typed classes with generated constructor, repr, and equality behavior.
- `docs/examples.md:143-205` uses keyword-only fields to keep construction explicit at the call site.
- `docs/examples.md:209-220` uses `asdict(...)` as an intentional conversion step.

### Pytest

- `src/_pytest/timing.py:24-64` models `Instant` and `Duration` as frozen dataclasses for simple internal timing state.

## Related comparison

- `comparison/pydantic-vs-python.md`

## Bottom line

Boundary models should validate and serialize cleanly.

Internal models should stay small and honest.

If one model starts owning every concern in the system, split it before it turns to mud.