Refine Python pattern boundaries and Pydantic guidance

Rodin
2026-06-02 00:53:38 +00:00
parent 60ffec18e4
commit 7007b73099
6 changed files with 273 additions and 83 deletions
+73 -23
@@ -1,56 +1,106 @@
# Pydantic source notes
Repo: `pydantic/pydantic`
Local checkout: `/home/ubuntu/repos/rodin-bootstrap/upstream/pydantic`
## Why this repo is useful
- Pydantic is a strong source for runtime boundary-object patterns: validating incoming data, coercing raw values into typed fields, and serializing back out explicitly.
- It is also useful for validation-hook design because the docs distinguish validator phases and call out where those phases change the guarantees readers should expect.
- It is *not* a generic argument that every Python data object should become a `BaseModel`. The strongest repeated signals are boundary-oriented.
## Model construction is a validation boundary, not plain attribute assignment
### Repeated evidence
- `pydantic/main.py:253-264` says `BaseModel.__init__` creates a model by parsing and validating input data and raises `ValidationError` if the input cannot form a valid model.
- `pydantic/main.py:263-264` routes construction through `self.__pydantic_validator__.validate_python(...)`, which is a much stronger runtime contract than normal Python object initialization.
- `pydantic/main.py:721-768` exposes `model_validate(...) -> Self` as an explicit alternative validation entrypoint with knobs for `strict`, `extra`, `from_attributes`, `by_alias`, and `by_name`.
- `docs/index.md:68-107` shows external data entering through model construction, being coerced into typed fields, and then becoming a typed model instance.
- `docs/index.md:93-107` explicitly explains coercions such as strings to integers, strings to datetimes, and bytes keys to strings.
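The sketch below illustrates construction as a validation boundary. It assumes Pydantic v2, and the `Reading` model and its fields are hypothetical. In the default (lax) mode, constructing the model coerces the string, datetime and bytes-key inputs. Passing the same payload to `model_validate(..., strict=True)` rejects it instead of coercing it.

```python
from datetime import datetime
from typing import Dict

from pydantic import BaseModel, ValidationError


class Reading(BaseModel):
    # Hypothetical boundary model: each annotation is a runtime parsing rule.
    sensor_id: int
    taken_at: datetime
    counts: Dict[str, int]


raw = {
    "sensor_id": "42",                  # str -> int
    "taken_at": "2024-05-01T12:00:00",  # str -> datetime
    "counts": {b"floor": "3"},          # bytes key -> str, str value -> int
}

# Construction parses, coerces, and validates (lax mode by default).
reading = Reading(**raw)

# Same payload through the explicit entrypoint, but strict: no coercion allowed.
try:
    Reading.model_validate(raw, strict=True)
    strict_rejected = False
except ValidationError:
    strict_rejected = True

# Input that cannot be coerced fails at construction time, not later.
try:
    Reading(**{**raw, "sensor_id": "not-a-number"})
    bad_rejected = False
except ValidationError:
    bad_rejected = True
```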
### Why it matters
Repeated signal: once you choose Pydantic, object construction is no longer just “assign the fields.” It is a boundary-crossing operation that parses, coerces, and validates raw input.
## Field annotations are runtime parsing and schema instructions, not just static hints
### Repeated evidence
- `pydantic/main.py:156-205` exposes typed class-level metadata such as `model_config`, `__pydantic_core_schema__`, `__pydantic_serializer__`, `__pydantic_validator__`, and `__pydantic_fields__`.
- `pydantic/main.py:167-168` documents a synthesized `__init__` signature for the model.
- `docs/index.md:93-104` explains field annotations in runtime terms: requiredness, accepted/coerced input shapes, and typed container expectations.
- `docs/index.md:99-100` ties `PositiveInt` directly to an annotated constrained type, showing that type declarations are part of the runtime contract.
### Why it matters
Repeated signal: in Pydantic, changing a field annotation can change runtime acceptance, coercion, and emitted schema behavior. These annotations are not mere editor decoration.
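Here is a minimal sketch, assuming Pydantic v2, of how one annotation change alters both runtime acceptance and the emitted JSON Schema. `Batch` and `LooseBatch` are hypothetical names.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Batch(BaseModel):
    size: PositiveInt  # constrained int (greater than 0)


class LooseBatch(BaseModel):
    size: int  # plain int: any integer is accepted


# Same input, different annotation, different runtime outcome.
loose = LooseBatch(size=0)

try:
    Batch(size=0)
    error_type = None
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]

# The annotation also feeds the generated JSON Schema.
constrained_schema = Batch.model_json_schema()["properties"]["size"]
plain_schema = LooseBatch.model_json_schema()["properties"]["size"]
```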
## Models are strongest as explicit boundary objects
### Repeated evidence
- `docs/index.md:68-82` starts from `external_data` and immediately feeds it into a model.
- `docs/index.md:82-89` then immediately uses `model_dump()` to cross back out of the model into plain data.
- `pydantic/main.py:455-519` defines `model_dump(...)` as an explicit serialization API with include/exclude, alias, unset/default/none filtering, and error-handling controls.
- `pydantic/main.py:521-569` provides `model_dump_json(...)` as the corresponding JSON-mode serialization boundary.
### Why it matters
Repeated signal: Pydantic models are designed to sit at runtime boundaries where input validation and output shaping matter. The repo keeps showing “raw external data in, explicit dump back out,” not “all domain state everywhere should live inside `BaseModel` forever.”
### Caveat / counterexample
The strong pattern is boundary ownership, not model monoculture. If an internal object only needs simple state and no runtime parsing or schema behavior, generic Python types may be clearer.
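One way to keep a model at the boundary is sketched below: validate once at the edge, then hand plain typed state inward. It assumes Pydantic v2. `OrderIn`, `OrderLine` and `parse_order` are hypothetical, and using a frozen dataclass for the internal object is an illustrative choice, not something the repo prescribes.

```python
from dataclasses import dataclass
from typing import Any, Dict

from pydantic import BaseModel, PositiveInt


class OrderIn(BaseModel):
    """Boundary model: parses and validates untrusted input."""
    sku: str
    quantity: PositiveInt


@dataclass(frozen=True)
class OrderLine:
    """Internal value object: plain typed state, no runtime parsing."""
    sku: str
    quantity: int


def parse_order(raw: Dict[str, Any]) -> OrderLine:
    # Validation happens once, at the edge...
    checked = OrderIn.model_validate(raw)
    # ...and the rest of the code works with a simple frozen dataclass.
    return OrderLine(sku=checked.sku, quantity=checked.quantity)


line = parse_order({"sku": "A-1", "quantity": "3"})
```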
## Serialization is explicit and configurable
### Repeated evidence
- `docs/index.md:82-89` uses `model_dump()` as the normal way to convert a model to a dictionary.
- `pydantic/main.py:455-519` gives `model_dump(...)` explicit controls for aliasing, partial output, omission of unset/default/none values, round-tripping, and serialization error handling.
- `pydantic/main.py:493-496` shows that serialization errors are configurable behavior, not an afterthought.
### Why it matters
Repeated signal: Pydantic wants serialization to be named and explicit. That is stronger and safer than scattering hand-built dict shaping around the codebase.
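The sketch below, assuming Pydantic v2, shows a few of those serialization knobs on a hypothetical `Event` model: aliases, omitting unset or `None` fields, and JSON mode. Each output shape is requested by name at the call site rather than built by hand.

```python
import json
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field


class Event(BaseModel):
    event_id: int = Field(alias="eventId")  # external name differs from attribute
    happened_at: datetime
    note: Optional[str] = None


event = Event.model_validate({"eventId": 7, "happened_at": "2024-05-01T12:00:00"})

# Python mode: field names, native types (datetime stays a datetime).
python_view = event.model_dump()

# JSON-compatible dict using aliases, dropping fields the input never set.
wire_view = event.model_dump(mode="json", by_alias=True, exclude_unset=True)

# Direct JSON string, dropping fields whose value is None.
json_text = event.model_dump_json(by_alias=True, exclude_none=True)
```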
## Validation errors are structured boundary output
### Repeated evidence
- `pydantic/main.py:253-257` documents `ValidationError` on invalid model construction.
- `pydantic/main.py:745-749` documents `ValidationError` on `model_validate(...)` as well.
- `docs/index.md:109-152` shows one bad input producing a list of per-field errors with `type`, `loc`, `msg`, `input`, and documentation URL.
### Why it matters
Repeated signal: Pydantic treats invalid input as a structured parsing result, not just a plain exception string. That is part of the contract callers and outer boundaries can build on.
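A caller can consume those structured errors as data rather than parsing a message string. This minimal sketch assumes Pydantic v2, and the `Signup` model and `collect_errors` helper are hypothetical.

```python
from typing import Any, Dict, List, Tuple

from pydantic import BaseModel, ValidationError


class Signup(BaseModel):
    name: str
    age: int


def collect_errors(raw: Dict[str, Any]) -> List[Tuple[tuple, str]]:
    """Return (loc, type) pairs for every problem, or [] if the input is valid."""
    try:
        Signup.model_validate(raw)
    except ValidationError as exc:
        # One bad payload yields one entry per failing field.
        return [(err["loc"], err["type"]) for err in exc.errors(include_url=False)]
    return []


problems = collect_errors({"age": "abc"})
```

Because `loc` is a tuple of path segments, an outer HTTP or CLI layer can map each entry back to the offending input field without string matching.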
## Validators are narrow and phase-aware
### Repeated evidence
- `docs/concepts/validators.md:94-114` shows an `after` field validator checking one parsed field and returning the validated value.
- `docs/concepts/validators.md:160-167` says `before` validators run before internal parsing and validation.
- `docs/concepts/validators.md:177-209` shows a `before` validator reshaping raw input while Pydantic still performs normal type validation afterward.
- `docs/concepts/validators.md:201-206` explicitly warns that `before` validators receive arbitrary raw input and therefore must account for more cases.
### Why it matters
Repeated signal: the best validator hooks are small in scope and explicit about phase:
- `before` for raw-input normalization
- `after` for parsed-value invariants
This keeps validator logic from becoming a second opaque parser.
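The following sketch, assuming Pydantic v2, pairs the two phases on one field. The `Post` model and the comma-splitting rule are hypothetical. The `before` hook only reshapes raw input and leaves type checking to Pydantic. The `after` hook only checks an invariant on the already-parsed `List[str]`.

```python
from typing import Any, List

from pydantic import BaseModel, ValidationError, field_validator


class Post(BaseModel):
    tags: List[str]

    # 'before': runs on raw input, so it must tolerate arbitrary values.
    @field_validator("tags", mode="before")
    @classmethod
    def split_comma_string(cls, value: Any) -> Any:
        if isinstance(value, str):
            return [part.strip() for part in value.split(",")]
        return value  # anything else goes on to normal List[str] validation

    # 'after' (the default mode): runs on the parsed List[str].
    @field_validator("tags")
    @classmethod
    def reject_empty_tags(cls, value: List[str]) -> List[str]:
        if any(not tag for tag in value):
            raise ValueError("tags must not be empty strings")
        return value  # after validators must return the validated value


def is_rejected(raw: Any) -> bool:
    try:
        Post(tags=raw)
    except ValidationError:
        return True
    return False
```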
## Validator mode choice changes guarantees
### Repeated evidence
- `docs/concepts/validators.md:160-164` warns against careless mutation in `before` validators, especially when later raising errors and when unions are involved.
- `docs/concepts/validators.md:254-255` states that `plain` validators terminate validation immediately.
- `docs/concepts/validators.md:273-283` shows the consequence directly: a `PlainValidator` can return `'invalid'` for a field annotated as `int`, and Pydantic will accept it.
- `docs/concepts/validators.md:296-308` repeats the same consequence in decorator form.
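One defensive habit that follows from the mutation warning is sketched below, assuming Pydantic v2. The `Account` model and its legacy `user` key are hypothetical. The `before` validator copies the incoming dict before rewriting it. If validation fails later, or another union member is tried, the caller's original input is still intact.

```python
from typing import Any

from pydantic import BaseModel, model_validator


class Account(BaseModel):
    username: str
    email: str

    @model_validator(mode="before")
    @classmethod
    def rename_legacy_key(cls, data: Any) -> Any:
        # Raw input can be any type, so only handle the dict case.
        if isinstance(data, dict) and "user" in data:
            data = dict(data)  # shallow copy: never edit the caller's object
            data["username"] = data.pop("user")
        return data


raw = {"user": "ada", "email": "ada@example.com"}
account = Account.model_validate(raw)
```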
### Why it matters
Repeated signal: validator mode is not a cosmetic option. It changes whether core type validation still runs.
### Caveat / counterexample
This is the sharpest anti-pattern in the repo: `plain` validators are powerful, but they can bypass the type guarantee a reader expects from the annotation. Use them only when terminating validation is the real goal.
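The contrast can be reproduced with a sketch modelled on the docs' `PlainValidator` example, assuming Pydantic v2. The `PlainModel`, `AfterModel` and `double_if_int` names are hypothetical. The same function attached as `plain` lets a string through an `int` field, while attaching it as `after` keeps core int validation in front of it.

```python
from typing import Annotated, Any

from pydantic import AfterValidator, BaseModel, PlainValidator, ValidationError


def double_if_int(value: Any) -> Any:
    # Double ints, pass anything else through unchanged.
    return value * 2 if isinstance(value, int) else value


class PlainModel(BaseModel):
    # 'plain' replaces core validation: the int annotation is NOT enforced.
    number: Annotated[int, PlainValidator(double_if_int)]


class AfterModel(BaseModel):
    # 'after' runs only once core int validation has succeeded.
    number: Annotated[int, AfterValidator(double_if_int)]


plain_result = PlainModel(number="invalid").number

try:
    AfterModel(number="invalid")
    after_rejected = False
except ValidationError:
    after_rejected = True
```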
## Pattern candidates supported by this repo
- use Pydantic models at runtime input/output boundaries
- treat model construction as a validation step, not plain assignment
- treat field annotations as runtime parsing contracts when using `BaseModel`
- serialize explicitly with `model_dump()` / `model_dump_json()`
- keep validators field-scoped and phase-aware
- treat `plain` validators as an escape hatch, not the default