Pydantic source notes

Repo: pydantic/pydantic
Local checkout: /home/ubuntu/repos/rodin-bootstrap/upstream/pydantic

Why this repo is useful

  • Pydantic is a strong source for runtime boundary-object patterns: validating incoming data, coercing raw values into typed fields, and serializing back out explicitly.
  • It is also useful for validation-hook design because the docs distinguish validator phases and call out where those phases change the guarantees readers should expect.
  • It is not a generic argument that every Python data object should become a BaseModel. The strongest repeated signals are boundary-oriented.

Model construction is a validation boundary, not plain attribute assignment

Repeated evidence

  • pydantic/main.py:253-264 says BaseModel.__init__ creates a model by parsing and validating input data and raises ValidationError if the input cannot form a valid model.
  • pydantic/main.py:263-264 routes construction through self.__pydantic_validator__.validate_python(...), which is a much stronger runtime contract than normal Python object initialization.
  • pydantic/main.py:721-768 exposes model_validate(...) -> Self as an explicit alternative validation entrypoint with knobs for strict, extra, from_attributes, by_alias, and by_name.
  • docs/index.md:68-107 shows external data entering through model construction, being coerced into typed fields, and then becoming a typed model instance.
  • docs/index.md:93-107 explicitly explains coercions such as strings to integers, strings to datetimes, and bytes keys to strings.

Why it matters

Repeated signal: once you choose Pydantic, object construction is no longer just “assign the fields.” It is a boundary-crossing operation that parses, coerces, and validates raw input.
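A minimal sketch of that boundary, assuming Pydantic v2 is installed. `User` and `external_data` follow the shape of the docs/index.md example. The `strict=True` call is an added illustration of one of the `model_validate(...)` knobs listed above, and `strict_errors` is just a local name.

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, PositiveInt, ValidationError


class User(BaseModel):
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime]  # required, but may be None
    tastes: dict[str, PositiveInt]


external_data = {
    'id': '123',                      # str, coerced to int
    'signup_ts': '2019-06-01 12:22',  # str, parsed into a datetime
    'tastes': {'wine': 9, b'cheese': 7, 'cabbage': '1'},  # bytes key and str value coerced
}

# Construction parses, coerces, and validates; it is not plain attribute assignment.
user = User(**external_data)

# model_validate(...) is the explicit entrypoint for the same validation.
same_user = User.model_validate(external_data)

# strict=True disables lax coercion, so the same raw payload is rejected.
try:
    User.model_validate(external_data, strict=True)
    strict_errors = []
except ValidationError as exc:
    strict_errors = exc.errors(include_url=False)
```

The same dictionary is accepted in the default (lax) mode and rejected under `strict=True`, which is a concrete way to see that construction applies a parsing policy rather than assigning attributes.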

Caveat / counterexample

That makes Pydantic great for untrusted or external data. It does not automatically make it the right default for every small internal value object.

Field annotations are runtime parsing and schema instructions, not just static hints

Repeated evidence

  • pydantic/main.py:156-205 exposes typed class-level metadata such as model_config, __pydantic_core_schema__, __pydantic_serializer__, __pydantic_validator__, and __pydantic_fields__.
  • pydantic/main.py:167-168 documents a synthesized __init__ signature for the model.
  • docs/index.md:93-104 explains field annotations in runtime terms: requiredness, accepted/coerced input shapes, and typed container expectations.
  • docs/index.md:99-100 ties PositiveInt directly to an annotated constrained type, showing that type declarations are part of the runtime contract.

Why it matters

Repeated signal: in Pydantic, changing a field annotation can change runtime acceptance, coercion, and emitted schema behavior. These annotations are not mere editor decoration.
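A small sketch of that claim, assuming Pydantic v2. The model names `LooseCount` and `PositiveCount` are hypothetical. The only difference between them is the annotation on `count`, yet one accepts `'-3'` and the other rejects it, and their emitted JSON Schemas differ.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class LooseCount(BaseModel):
    count: int


class PositiveCount(BaseModel):
    count: PositiveInt  # Annotated[int, annotated_types.Gt(0)] in Pydantic v2


# Same raw input, different annotation, different runtime acceptance.
loose = LooseCount(count='-3')  # lax mode coerces '-3' to -3 and accepts it

try:
    PositiveCount(count='-3')
    positive_rejected = False
except ValidationError:
    positive_rejected = True

# The annotation also flows into the emitted JSON Schema.
loose_schema = LooseCount.model_json_schema()['properties']['count']
positive_schema = PositiveCount.model_json_schema()['properties']['count']
```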

Models are strongest as explicit boundary objects

Repeated evidence

  • docs/index.md:68-82 starts from external_data and immediately feeds it into a model.
  • docs/index.md:82-89 then immediately uses model_dump() to cross back out of the model into plain data.
  • pydantic/main.py:455-519 defines model_dump(...) as an explicit serialization API with include/exclude, alias, unset/default/none filtering, and error-handling controls.
  • pydantic/main.py:521-569 provides model_dump_json(...) as the corresponding JSON-mode serialization boundary.

Why it matters

Repeated signal: Pydantic models are designed to sit at runtime boundaries where input validation and output shaping matter. The repo keeps showing “raw external data in, explicit dump back out,” not “all domain state everywhere should live inside BaseModel forever.”

Caveat / counterexample

The strong pattern is boundary ownership, not model monoculture. If an internal object only needs simple state and no runtime parsing or schema behavior, generic Python types may be clearer.
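A minimal sketch of boundary ownership, assuming Pydantic v2. The booking domain, the `handle_booking` function, and all model and dataclass names are hypothetical. Models parse the raw request and shape the response, while the internal value in between is a plain frozen dataclass with no runtime parsing.

```python
from dataclasses import dataclass
from datetime import date, timedelta

from pydantic import BaseModel


class BookingRequest(BaseModel):
    """Boundary model: parses and coerces untrusted input."""
    guest: str
    nights: int
    check_in: date


@dataclass(frozen=True)
class Stay:
    """Internal value object: plain Python, no runtime parsing or coercion."""
    guest: str
    check_in: date
    check_out: date


class BookingResponse(BaseModel):
    """Boundary model: shapes the outgoing payload."""
    guest: str
    check_in: date
    check_out: date


def handle_booking(raw: dict) -> dict:
    request = BookingRequest.model_validate(raw)  # raw external data in
    stay = Stay(
        guest=request.guest,
        check_in=request.check_in,
        check_out=request.check_in + timedelta(days=request.nights),
    )
    response = BookingResponse(
        guest=stay.guest, check_in=stay.check_in, check_out=stay.check_out
    )
    return response.model_dump(mode='json')  # explicit dump back out to JSON-compatible data
```

The dataclass stores whatever it is given (`Stay(guest=123, check_in='x', check_out='y')` would not raise), which is acceptable only because every value reaching it has already crossed a validating boundary.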

Serialization is explicit and configurable

Repeated evidence

  • docs/index.md:82-89 uses model_dump() as the normal way to convert a model to a dictionary.
  • pydantic/main.py:455-519 gives model_dump(...) explicit controls for aliasing, partial output, omission of unset/default/none values, round-tripping, and serialization error handling.
  • pydantic/main.py:493-496 shows that serialization errors are configurable behavior, not an afterthought.

Why it matters

Repeated signal: Pydantic wants serialization to be named and explicit. That is safer and easier to audit than scattering hand-built dict-shaping code around the codebase.
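A sketch of the explicit controls on `model_dump(...)` and `model_dump_json(...)`, assuming Pydantic v2. The `Profile` model and its `userId` alias are hypothetical. Each call names the output shape it wants instead of post-processing a dict by hand.

```python
import json
from typing import Optional

from pydantic import BaseModel, Field


class Profile(BaseModel):
    user_id: int = Field(alias='userId')  # external name differs from the Python name
    display_name: str = 'anonymous'
    bio: Optional[str] = None


# Input uses the alias; bio is set explicitly (to None), display_name is left unset.
profile = Profile.model_validate({'userId': 7, 'bio': None})

full = profile.model_dump()                              # every field, Python names
aliased = profile.model_dump(by_alias=True)              # external names
only_set = profile.model_dump(exclude_unset=True)        # only fields present in the input
no_defaults = profile.model_dump(exclude_defaults=True)  # drop values equal to their default
no_none = profile.model_dump(exclude_none=True)          # drop None values
picked = profile.model_dump(include={'user_id'})         # explicit allow-list
wire = json.loads(profile.model_dump_json(by_alias=True, exclude_none=True))
```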

Validation errors are structured boundary output

Repeated evidence

  • pydantic/main.py:253-257 documents ValidationError on invalid model construction.
  • pydantic/main.py:745-749 documents ValidationError on model_validate(...) as well.
  • docs/index.md:109-152 shows one bad input producing a list of per-field errors with type, loc, msg, input, and documentation URL.

Why it matters

Repeated signal: Pydantic treats invalid input as a structured parsing result, not just a plain exception string. That is part of the contract callers and outer boundaries can build on.
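A sketch of an outer boundary building on the structured errors, assuming Pydantic v2. `field_errors` is a hypothetical helper that flattens `ValidationError.errors()` into a field-to-message map, for example to feed an API error response. It relies only on the `loc` and `msg` keys each error entry carries.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class User(BaseModel):
    id: int
    tastes: dict[str, PositiveInt]


def field_errors(raw: dict) -> dict[str, str]:
    """Hypothetical helper: flatten a ValidationError into {"dotted.loc": "message"}."""
    try:
        User.model_validate(raw)
    except ValidationError as exc:
        # Each entry is a dict with keys such as 'type', 'loc', 'msg', and 'input'.
        return {
            '.'.join(str(part) for part in err['loc']): err['msg']
            for err in exc.errors(include_url=False)
        }
    return {}


# One bad payload yields one entry per failing location.
sample = field_errors({'id': 'abc', 'tastes': {'wine': -1}})
```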

Validators are narrow and phase-aware

Repeated evidence

  • docs/concepts/validators.md:94-114 shows an after field validator checking one parsed field and returning the validated value.
  • docs/concepts/validators.md:160-167 says before validators run before internal parsing and validation.
  • docs/concepts/validators.md:177-209 shows a before validator reshaping raw input while Pydantic still performs normal type validation afterward.
  • docs/concepts/validators.md:201-206 explicitly warns that before validators receive arbitrary raw input and therefore must account for more cases.

Why it matters

Repeated signal: the best validator hooks are small in scope and explicit about phase:

  • before for raw-input normalization
  • after for parsed-value invariants

This keeps validator logic from becoming a second opaque parser.
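A sketch of the before/after split, assuming Pydantic v2. The `Query` model, its validators, and the `error_types` helper are hypothetical. The before validator reshapes only one known raw shape and defers everything else to normal type validation, and the after validator checks a single invariant on an already parsed `int`.

```python
from typing import Any, List

from pydantic import BaseModel, ValidationError, field_validator


class Query(BaseModel):
    tags: List[str]
    page: int

    @field_validator('tags', mode='before')
    @classmethod
    def split_csv(cls, value: Any) -> Any:
        # Before: runs on raw input, which can be anything.
        # Normalize only the one shape we understand; pass everything else through
        # so the normal List[str] validation still accepts or rejects it.
        if isinstance(value, str):
            return [part.strip() for part in value.split(',') if part.strip()]
        return value

    @field_validator('page')  # mode='after' is the default
    @classmethod
    def page_is_positive(cls, value: int) -> int:
        # After: value is already a parsed int, so only the invariant is checked here.
        if value < 1:
            raise ValueError('page must be >= 1')
        return value


def error_types(raw: dict) -> List[str]:
    """Hypothetical helper: return the error 'type' codes for a raw payload."""
    try:
        Query.model_validate(raw)
    except ValidationError as exc:
        return [err['type'] for err in exc.errors()]
    return []
```

Because the before hook passes unknown shapes through, `tags=42` is still rejected by ordinary list validation rather than by custom code.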

Validator mode choice changes guarantees

Repeated evidence

  • docs/concepts/validators.md:160-164 warns against mutating the value in a before validator if a validation error may be raised later, because with unions the mutated value can be passed on to other validators.
  • docs/concepts/validators.md:254-255 states that plain validators terminate validation immediately.
  • docs/concepts/validators.md:273-283 shows the consequence directly: a PlainValidator can return 'invalid' for a field annotated as int, and Pydantic will accept it.
  • docs/concepts/validators.md:296-308 repeats the same consequence in decorator form.

Why it matters

Repeated signal: validator mode is not a cosmetic option. It changes whether core type validation still runs.
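A sketch adapted from the docs' plain-validator example, assuming Pydantic v2. The same hypothetical `double_if_int` function is attached in before, after, and plain mode, and only in plain mode does the `int` annotation stop being enforced. `accepts` is a hypothetical helper.

```python
from typing import Annotated, Any

from pydantic import (
    AfterValidator,
    BaseModel,
    BeforeValidator,
    PlainValidator,
    ValidationError,
)


def double_if_int(value: Any) -> Any:
    # Same function in every mode; only the phase changes.
    if isinstance(value, int):
        return value * 2
    return value


class BeforeModel(BaseModel):
    number: Annotated[int, BeforeValidator(double_if_int)]  # runs first, then int validation


class AfterModel(BaseModel):
    number: Annotated[int, AfterValidator(double_if_int)]  # int validation first, then this


class PlainModel(BaseModel):
    number: Annotated[int, PlainValidator(double_if_int)]  # replaces int validation entirely


def accepts(model: type[BaseModel], raw: Any) -> bool:
    """Hypothetical helper: does this model accept {'number': raw}?"""
    try:
        model(number=raw)
    except ValidationError:
        return False
    return True
```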

Caveat / counterexample

This is the sharpest anti-pattern the repo warns about: plain validators are powerful, but they can bypass the type guarantee a reader expects from the annotation. Use them only when terminating validation is the real goal.
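When terminating validation is the goal, the plain validator has to take over the whole contract itself. A minimal sketch, assuming Pydantic v2; `parse_flag`, `Settings`, and `accepts_beta` are hypothetical. The validator accepts only real bools and `'yes'`/`'no'`, and raises for everything else, because no `bool` validation runs after it.

```python
from typing import Annotated, Any

from pydantic import BaseModel, PlainValidator, ValidationError


def parse_flag(value: Any) -> bool:
    # A plain validator owns the whole contract: no bool validation runs after it,
    # so it must reject every input it does not explicitly accept.
    if isinstance(value, bool):
        return value
    if isinstance(value, str) and value.strip().lower() in {'yes', 'no'}:
        return value.strip().lower() == 'yes'
    raise ValueError(f'expected a bool or "yes"/"no", got {value!r}')


class Settings(BaseModel):
    beta: Annotated[bool, PlainValidator(parse_flag)]


def accepts_beta(raw: Any) -> bool:
    """Hypothetical helper: does Settings accept {'beta': raw}?"""
    try:
        Settings(beta=raw)
    except ValidationError:
        return False
    return True
```

Without the `PlainValidator`, a lax `bool` field would accept inputs such as `1` or `'true'`. Here they are rejected, and the annotation stays honest because the function enforces it.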

Pattern candidates supported by this repo

  • use Pydantic models at runtime input/output boundaries
  • treat model construction as a validation step, not plain assignment
  • treat field annotations as runtime parsing contracts when using BaseModel
  • serialize explicitly with model_dump() / model_dump_json()
  • keep validators field-scoped and phase-aware
  • treat plain validators as an escape hatch, not the default