Initial extracted documentation set

Rodin
2026-06-01 21:42:05 +00:00
commit 60ffec18e4
17 changed files with 1590 additions and 0 deletions
@@ -0,0 +1,117 @@
# Async Boundaries
Keep sync and async APIs as separate, explicit surfaces when their semantics differ.
## Why
Async stops being an implementation detail as soon as it changes:
- how resources are acquired and released
- whether methods must be awaited
- which transport or session types are valid
- how cancellation behaves
- whether the caller needs an event loop
Trying to hide that behind one magical API usually makes things worse. The common failure mode is a fake sync wrapper over async internals: it breaks inside existing event loops, hides resource lifetime, and muddies cancellation.
Mature libraries usually accept the split and make it visible.
## The pattern
1. Expose separate sync and async entrypoints when semantics differ.
2. Keep their shapes parallel where that helps learnability.
3. Keep resource and transport types distinct.
4. Make lifecycle visible with normal sync and async context management.
5. Do not smuggle event-loop control into a sync-looking API.
## When to use
Use this when:
- your library owns network, database, filesystem, or other long-lived resources
- sync and async variants need different transport or session implementations
- callers need predictable lifetime control
- the library must work in many runtime environments
## When not to use
Do not split APIs just for symmetry.
Skip the split when:
- async adds no meaningful semantic difference
- the operation is trivial and one-shot
- a separate async API would mostly duplicate noise
But if the alternative requires hidden `asyncio.run(...)`, loop-detection tricks, or silent runtime switching, the split is probably the cleaner design.
## Preferred shape
```python
class Client:
    def get(self, url: str) -> Response:
        ...

class AsyncClient:
    async def get(self, url: str) -> Response:
        ...
```
Make the surfaces parallel enough to learn once, but distinct enough that their lifecycle stays honest.
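A minimal sketch of what "parallel but honest" can look like, using only the standard library. `Client`, `AsyncClient`, and their string responses are hypothetical stand-ins; the `close()` / `aclose()` naming mirrors HTTPX's convention, but nothing here is HTTPX code. Each surface owns its resource and releases it through the matching context-manager protocol, so lifetime stays visible at the call site.

```python
import asyncio


class Client:
    """Sync surface: owns its resource and releases it via `with`."""

    def __init__(self) -> None:
        self.closed = False

    def get(self, url: str) -> str:
        if self.closed:
            raise RuntimeError("client is closed")
        return f"sync response from {url}"

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "Client":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()


class AsyncClient:
    """Async surface: same method names, but every I/O step is awaited."""

    def __init__(self) -> None:
        self.closed = False

    async def get(self, url: str) -> str:
        if self.closed:
            raise RuntimeError("client is closed")
        await asyncio.sleep(0)  # stand-in for real non-blocking I/O
        return f"async response from {url}"

    async def aclose(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "AsyncClient":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        await self.aclose()


with Client() as client:
    print(client.get("https://example.com"))


async def main() -> str:
    async with AsyncClient() as client:
        return await client.get("https://example.com")


print(asyncio.run(main()))
```

The event loop is started once, by the application (`asyncio.run(main())`), never by the library.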
## Why this works
- callers immediately know whether `await` is involved
- transport types stay correct
- connection pooling and cleanup remain visible
- cancellation behavior is not hidden behind sync-looking calls
## Counterexamples
### Fake sync wrapper over async internals
```python
import asyncio

def get(url: str) -> Response:
    return asyncio.run(_async_get(url))
```
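This breaks in environments that already run an event loop: `asyncio.run()` raises `RuntimeError` when called from a running loop (for example, in a Jupyter notebook or inside an async web framework). It also hides lifecycle costs, because every call creates and closes a fresh loop, so nothing tied to that loop, such as a connection pool, survives between calls.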
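A small reproduction of that failure mode, assuming a hypothetical `_async_get` coroutine in place of real network I/O. The wrapper works from plain sync code but fails as soon as an already-async caller uses it.

```python
import asyncio


async def _async_get(url: str) -> str:
    await asyncio.sleep(0)  # stand-in for real network I/O
    return f"response from {url}"


def get(url: str) -> str:
    # The anti-pattern: a sync-looking call that owns a whole event loop.
    coro = _async_get(url)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning, then re-raise
        raise


async def handler() -> str:
    # Any caller that is already async (web handler, notebook cell) hits this path.
    try:
        return get("https://example.com")
    except RuntimeError as exc:
        return f"failed: {exc}"


print(get("https://example.com"))  # works: no loop is running yet
print(asyncio.run(handler()))      # prints a "failed: ..." message
```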
### One class with mode flags
```python
client = Client(async_mode=True)
```
Now methods, cleanup, and caller expectations depend on ambient configuration instead of the type.
### Shared transport types that are not really shared
If one path needs `BaseTransport` and the other needs `AsyncBaseTransport`, pretending they are interchangeable is lying to the caller.
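One way to keep the two transport contracts distinct is to give each path its own protocol. This is a sketch under assumptions: the method names loosely mirror HTTPX's `handle_request` / `handle_async_request`, but the string-based signatures and all class names here are hypothetical simplifications.

```python
import asyncio
from typing import Protocol


class Transport(Protocol):
    def handle_request(self, request: str) -> str: ...


class AsyncTransport(Protocol):
    async def handle_async_request(self, request: str) -> str: ...


class Client:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def get(self, url: str) -> str:
        return self._transport.handle_request(f"GET {url}")


class AsyncClient:
    def __init__(self, transport: AsyncTransport) -> None:
        self._transport = transport

    async def get(self, url: str) -> str:
        return await self._transport.handle_async_request(f"GET {url}")


class EchoTransport:
    def handle_request(self, request: str) -> str:
        return f"echo: {request}"


class AsyncEchoTransport:
    async def handle_async_request(self, request: str) -> str:
        await asyncio.sleep(0)  # stand-in for non-blocking I/O
        return f"echo: {request}"


print(Client(EchoTransport()).get("/users"))
print(asyncio.run(AsyncClient(AsyncEchoTransport()).get("/users")))
```

A static type checker such as mypy or pyright should reject `AsyncClient(EchoTransport())`, because `EchoTransport` has no `handle_async_request`. The mismatch surfaces before runtime instead of as an `AttributeError` deep inside a request.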
## Source signals
### HTTPX
- `httpx/_client.py:594-661` defines `Client` as the sync entrypoint and documents that it can be shared between threads.
- `httpx/_client.py:639-660` types the sync constructor in sync-native terms, including `BaseTransport`.
- `httpx/_client.py:1275-1304` uses normal sync context management for lifecycle.
- `httpx/_client.py:1307-1375` defines `AsyncClient` separately and documents task-sharing semantics.
- `httpx/_client.py:1316-1318` shows async usage with `async with` and `await`.
- `httpx/_client.py:1353-1374` keeps the constructor parallel but switches to async-native types like `AsyncBaseTransport`.
- `httpx/_client.py:1445-1452` initializes `AsyncHTTPTransport` on the async path rather than pretending the sync transport is reusable.
### SQLAlchemy
- `examples/asyncio/async_orm.py:15-18` imports async-specific engine and session primitives.
- `examples/asyncio/async_orm.py:61-67` creates an `async_sessionmaker(...)` and enters explicit async session and transaction scopes.
- `examples/asyncio/async_orm.py:78-104` uses async-native query and commit methods.
- `examples/inheritance/joined.py:16` imports the sync `Session` separately.
- `examples/inheritance/joined.py:93-120` shows the corresponding sync session lifetime and explicit commit boundary.
## Bottom line
If sync and async usage have different semantics, give them different types.
Parallel APIs are good.
Pretending the difference is not there is not.
@@ -0,0 +1,140 @@
# Data Models
Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.
## Why
A lot of messy Python comes from making one class do four jobs at once:
- validate external input
- represent internal state
- serialize output
- own persistence or business behavior
Mature codebases usually split those concerns.
The recurring pattern is:
- use explicit boundary models for validated input and output
- keep serialization explicit
- use lightweight value-like classes for simple internal state
- avoid turning every data container into a god object
## The pattern
1. Use dedicated boundary models where data enters or leaves the system.
2. Validate and serialize explicitly.
3. Use lightweight declarative classes for data-heavy, behavior-light state.
4. Keep persistence, transport, and business logic from collapsing into one model.
## When to use
Use validated boundary models when:
- data enters from HTTP, files, queues, or user input
- you need a predictable serialization contract
- callers need one clear place for validation and defaults
Use small value-like objects when:
- the object mainly carries state
- immutability helps reasoning
- behavior is narrow and local
## When not to use
Do not make every internal object a heavyweight validation model.
Do not scatter custom `to_dict()` logic across the codebase.
Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.
## Preferred shapes
### Explicit boundary model
```python
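from pydantic import BaseModel

class UserIn(BaseModel):
    email: str
    name: str

payload = {"email": "ada@example.com", "name": "Ada"}  # e.g. a parsed JSON body
user = UserIn.model_validate(payload)
serialized = user.model_dump()  # plain dict, ready for the wire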
```
Why this works:
- validation is explicit
- serialization is explicit
- the model's job is clear
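If a project cannot take a Pydantic dependency, the same boundary contract can be sketched with the standard library. `UserIn.from_payload`, `to_payload`, and `InvalidPayload` are hypothetical names chosen for this illustration; they play the roles of `model_validate`, `model_dump`, and `ValidationError` but are not Pydantic APIs.

```python
from dataclasses import asdict, dataclass
from typing import Any


class InvalidPayload(ValueError):
    """Raised when external input does not match the boundary contract."""


@dataclass(frozen=True)
class UserIn:
    email: str
    name: str

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "UserIn":
        # Validate once, at the edge, and fail loudly.
        email = payload.get("email")
        name = payload.get("name")
        if not isinstance(email, str) or "@" not in email:
            raise InvalidPayload(f"invalid email: {email!r}")
        if not isinstance(name, str) or not name.strip():
            raise InvalidPayload("name must be a non-empty string")
        return cls(email=email, name=name.strip())

    def to_payload(self) -> dict[str, Any]:
        # One explicit serialization step instead of ad hoc to_dict() copies.
        return asdict(self)


user = UserIn.from_payload({"email": "ada@example.com", "name": " Ada "})
print(user.to_payload())
```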
### Small value-like internal class
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Duration:
    start: "Instant"  # Instant: a small value type defined alongside Duration
    stop: "Instant"
```
Why this works:
- state is simple
- mutation is constrained
- the class stays easy to trust
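A runnable sketch of the pair, assuming a hypothetical `Instant` that wraps a single `time.perf_counter()` reading. This is an illustration of the value-type idea, not pytest's actual implementation.

```python
import dataclasses
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instant:
    # Defaults to "now" on a monotonic clock; explicit values keep tests deterministic.
    perf_count: float = field(default_factory=time.perf_counter)


@dataclass(frozen=True)
class Duration:
    start: Instant
    stop: Instant

    @property
    def seconds(self) -> float:
        # Derived on demand, so there is no mutable cached state to drift.
        return self.stop.perf_count - self.start.perf_count


elapsed = Duration(Instant(1.0), Instant(2.5))
print(elapsed.seconds)  # 1.5

try:
    elapsed.start = Instant(0.0)  # frozen: assignment is rejected
except dataclasses.FrozenInstanceError:
    print("Duration is immutable")
```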
## Counterexamples
### Hand-rolled serialization everywhere
```python
class User:
    def to_dict(self):
        return {
            "id": str(self.id),
            "name": self.name,
            "created_at": self.created_at.isoformat(),
        }
```
Every model now invents its own boundary behavior.
### One class owns every concern
```python
class OrderModel:
    def validate(self): ...
    def save(self): ...
    def send_webhook(self): ...
    def render_html(self): ...
```
That is not a model. That is a junk drawer.
## Source signals
### Pydantic
- `pydantic/main.py:253-264` says `BaseModel.__init__` parses and validates input data and raises `ValidationError` on bad input.
- `pydantic/main.py:455-519` defines `model_dump(...)` as an explicit serialization step with caller-controlled include/exclude behavior.
- `pydantic/main.py:721-768` defines `model_validate(...) -> Self` as a named boundary-crossing API.
- `docs/index.md:82-107` pairs model creation with `model_dump()` instead of treating instances as already wire-ready.
- `docs/index.md:109-124` shows invalid external input failing loudly with `ValidationError`.
### Attrs
- `docs/examples.md:24-44` shows `@define` creating lightweight typed classes with generated constructor, repr, and equality behavior.
- `docs/examples.md:143-205` uses keyword-only fields to keep construction explicit at the call site.
- `docs/examples.md:209-220` uses `asdict(...)` as an intentional conversion step.
### Pytest
- `src/_pytest/timing.py:24-64` models `Instant` and `Duration` as frozen dataclasses for simple internal timing state.
## Bottom line
Boundary models should validate and serialize cleanly.
Internal models should stay small and honest.
If one model starts owning every concern in the system, split it before it turns to mud.
@@ -0,0 +1,140 @@
# Error Handling
Use exception types to encode what callers can do next.
## Why
Good Python libraries do not collapse every failure into `RuntimeError` or `Exception`. They shape errors around recovery boundaries:
- one base type for “something in this subsystem failed”
- narrower subtypes when callers need different recovery
- structured fields when the branch depends on data, not wording
That gives callers a clean ladder:
- catch broadly at subsystem boundaries
- catch narrowly when retry/report/ignore differs
- surface better API or CLI errors without parsing strings
## The pattern
1. Define a subsystem-level base exception.
2. Add subtypes only when callers need different handling.
3. Put structured context on the exception when branching depends on it.
4. Translate internal failures at API, CLI, or transport boundaries.
## When to use
Use this when:
- a module or library exposes a public API
- failure modes need different handling
- an outer boundary must turn internal failures into user-facing errors
- retry, ignore, and abort decisions differ by failure kind
## When not to use
Do not build a hierarchy when:
- the code is tiny and has one obvious failure mode
- every failure is handled the same way
- the only distinction is wording, not behavior
- a normal return value like `None` is already the contract
Do not make callers parse exception text. If the distinction matters, make it a type or a field.
## Good shape
```python
class MailError(Exception):
    pass

class TemporaryMailError(MailError):
    pass

class PermanentMailError(MailError):
    pass

class MailRejected(PermanentMailError):
    def __init__(self, code: int, reason: str) -> None:
        super().__init__(reason)
        self.code = code
        self.reason = reason
Caller:
```python
try:
    send_mail(message)
except TemporaryMailError:
    retry_later(message)
except PermanentMailError as exc:
    mark_failed(message, reason=str(exc))
```
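A self-contained variant of the caller that also branches on the structured `code` field. `send_mail` here is a hypothetical stub that fails in a few canned ways, and `deliver` is an illustrative wrapper. Note the ordering: the most specific subtype comes first, otherwise `PermanentMailError` would swallow `MailRejected`.

```python
class MailError(Exception):
    """Base type: something in the mail subsystem failed."""


class TemporaryMailError(MailError):
    """Safe to retry later."""


class PermanentMailError(MailError):
    """Retrying will not help."""


class MailRejected(PermanentMailError):
    def __init__(self, code: int, reason: str) -> None:
        super().__init__(reason)
        self.code = code
        self.reason = reason


def send_mail(message: str) -> None:
    # Hypothetical stub standing in for a real SMTP client.
    if message == "flaky":
        raise TemporaryMailError("connection reset")
    if message == "spam":
        raise MailRejected(550, "rejected as spam")
    if message == "bounce":
        raise PermanentMailError("mailbox does not exist")


def deliver(message: str) -> str:
    try:
        send_mail(message)
    except TemporaryMailError:
        return "retry"
    except MailRejected as exc:  # most specific first
        return f"failed ({exc.code}): {exc.reason}"
    except PermanentMailError as exc:
        return f"failed: {exc}"
    return "sent"


for msg in ["hello", "flaky", "spam", "bounce"]:
    print(msg, "->", deliver(msg))
```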
## Counterexamples
### Stringly-typed branching
```python
try:
    do_work()
except Exception as exc:
    if "timeout" in str(exc).lower():
        retry()
The recovery rule is hiding in fragile text matching.
### One catch-all with no domain meaning
```python
class AppError(Exception):
    pass

raise AppError("not found")
raise AppError("permission denied")
raise AppError("timeout")
```
Callers cannot branch meaningfully.
### Boundary types leaking into the core
```python
from fastapi import HTTPException

def charge_card(card: Card) -> Receipt:
    if card.expired:
        raise HTTPException(status_code=400, detail="expired card")
```
This couples domain logic to one transport. Raise a domain error here; translate to HTTP at the boundary.
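A framework-free sketch of that split, assuming hypothetical `Card`, `PaymentError`, and `handle_charge` names. The core raises only domain errors; a single boundary function maps them to status codes. In a real web framework, that mapping would typically live in its exception-handling hook rather than in a hand-written wrapper.

```python
from dataclasses import dataclass


class PaymentError(Exception):
    """Domain-level base type; knows nothing about HTTP."""


class CardExpired(PaymentError):
    pass


class CardDeclined(PaymentError):
    pass


@dataclass(frozen=True)
class Card:
    number: str
    expired: bool = False
    declined: bool = False


def charge_card(card: Card) -> str:
    # Core logic raises domain errors only.
    if card.expired:
        raise CardExpired("expired card")
    if card.declined:
        raise CardDeclined("insufficient funds")
    return f"receipt for {card.number[-4:]}"


# Boundary: the only place that knows about status codes.
STATUS_BY_ERROR: dict[type, int] = {CardExpired: 400, CardDeclined: 402}


def status_for(exc: PaymentError) -> int:
    # Walk the MRO so subclasses inherit their parent's mapping.
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_ERROR:
            return STATUS_BY_ERROR[cls]
    return 500


def handle_charge(card: Card) -> tuple[int, str]:
    try:
        return 200, charge_card(card)
    except PaymentError as exc:
        return status_for(exc), str(exc)


print(handle_charge(Card("4242424242424242")))
print(handle_charge(Card("4242424242424242", expired=True)))
```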
## Source signals
### Stdlib / CPython
- `Lib/smtplib.py:69-71` defines `SMTPException` as the module-wide base type.
- `Lib/smtplib.py:88-100` defines `SMTPResponseException` and stores structured fields on the exception itself: `smtp_code` and `smtp_error`.
- `Lib/smtplib.py:102-125` adds subtype-specific payload like `sender` on `SMTPSenderRefused` and `recipients` on `SMTPRecipientsRefused`.
### HTTPX
- `httpx/_exceptions.py:74-90` defines `HTTPError` as a broad catch point and explicitly documents it as useful around request + `raise_for_status()` flows.
- `httpx/_exceptions.py:107-125` narrows that into `RequestError` and `TransportError` for request-time failures.
- `httpx/_exceptions.py:132-160` further splits timeout handling into `ConnectTimeout`, `ReadTimeout`, `WriteTimeout`, and `PoolTimeout`.
### Click
- `src/click/exceptions.py:35-65` defines `ClickException` with behavior, not just categorization: an exit code and a `show()` renderer.
- `src/click/exceptions.py:68-111` makes `UsageError` a narrower subtype with a different exit code and help-aware output.
## Bottom line
If callers need different behavior, give them different exception types.
If callers need details, attach fields.
If an outer layer needs user-facing output, translate there instead of pushing boundary concerns through the whole codebase.
@@ -0,0 +1,127 @@
# Module Design
Design a small, stable public surface. Keep implementation modules movable behind it.
## Why
Python makes it easy to publish internals by accident:
- every file is importable
- helpers become de facto API once users depend on them
- refactors turn into breaking changes when file layout becomes the contract
Mature libraries push back on that. They usually:
- publish one obvious import surface
- re-export supported names deliberately
- keep internal modules non-authoritative
- use `__all__` when the boundary needs to be explicit
That buys two things:
- simpler imports for callers
- freedom to reorganize internals later
## The pattern
1. Decide what callers should import.
2. Re-export those names from the package boundary.
3. Keep implementation details in internal modules.
4. Use `__all__` when you want an explicit contract.
5. Treat internal file layout as private unless you intentionally publish it.
## When to use
Use this when:
- a package spans multiple modules
- internals will evolve faster than the public API
- you want callers thinking in domain concepts, not filenames
- compatibility matters across releases
## When not to use
Do not build a facade when:
- the package is tiny and direct imports are already clear
- the abstraction boundary is still moving fast
- re-exporting would turn `__init__.py` into a junk drawer
A curated surface is not the same as a flat surface. Keep structure where the concepts are meaningfully different.
## Preferred shapes
### Package facade over internal modules
```python
# mypkg/__init__.py
from ._client import Client
from ._errors import AppError
from ._models import Item
__all__ = ["Client", "AppError", "Item"]
```
Why this works:
- callers learn one stable import path
- internals can move without import churn
- the package advertises its real contract
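A throwaway demonstration of the facade, assuming a hypothetical package `facade_demo` written to a temporary directory. The private `_client.py` holds the implementation; `__init__.py` re-exports it, pins `__all__`, and rewrites `__module__` in the spirit of HTTPX's `__init__.py`, so `repr()` shows the public path instead of the private file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout: one private implementation module plus a facade.
FILES = {
    "_client.py": (
        "class Client:\n"
        "    def get(self, url):\n"
        "        return f'GET {url}'\n"
    ),
    "__init__.py": (
        "from ._client import Client\n"
        "\n"
        "__all__ = ['Client']\n"
        "\n"
        "# Advertise the public import path instead of the private module.\n"
        "for _name in __all__:\n"
        "    globals()[_name].__module__ = __name__\n"
        "del _name\n"
    ),
}

with tempfile.TemporaryDirectory() as root:
    pkg_dir = Path(root) / "facade_demo"
    pkg_dir.mkdir()
    for filename, source in FILES.items():
        (pkg_dir / filename).write_text(source)
    importlib.invalidate_caches()  # files were created after interpreter start
    sys.path.insert(0, root)
    try:
        facade_demo = importlib.import_module("facade_demo")
    finally:
        sys.path.remove(root)

print(facade_demo.Client().get("/items"))  # GET /items
print(repr(facade_demo.Client))            # <class 'facade_demo.Client'>
```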
### Explicit module contract
```python
__all__ = ["Number", "Complex", "Real", "Rational", "Integral"]
```
This says: these names are supported; everything else is implementation detail.
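To see what `__all__` does and does not do at runtime, here is a sketch that builds a hypothetical in-memory module `contract_demo`. `__all__` controls what `from module import *` exports, but it does not hide anything: non-listed names remain reachable by attribute access, which is why the contract is a convention backed by documentation rather than enforcement.

```python
import sys
import types

SOURCE = '''
__all__ = ["Real", "Integral"]

class Real: ...

class Integral(Real): ...

def helper():  # public-looking, but not part of the contract
    return "not exported"
'''

mod = types.ModuleType("contract_demo")
exec(SOURCE, mod.__dict__)
sys.modules["contract_demo"] = mod

namespace: dict[str, object] = {}
exec("from contract_demo import *", namespace)
exported = sorted(name for name in namespace if name != "__builtins__")

print(exported)      # ['Integral', 'Real']
print(mod.helper())  # still reachable directly: not exported
```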
## Counterexamples
### File layout becomes the API by accident
```python
from mypkg.utils import helper_a
from mypkg.impl_v2 import thing
from mypkg.more_helpers import other_thing
```
Refactoring internal modules now breaks users.
### Everything dumped into `__init__.py`
If `__init__.py` exports fifty unrelated names, you did not create a clean facade. You created autocomplete noise.
### Public API mirrors the folder tree too literally
If callers need to know today's internal layout to use the library, the boundary is still underdesigned.
## Source signals
### CPython
- `Lib/numbers.py:8-23` warns that published ABC APIs are hard to change and should be designed carefully.
- `Lib/numbers.py:35` publishes a narrow `__all__` rather than treating every helper as public.
- `Lib/operator.py:13-15` and `Lib/smtplib.py:55-58` do the same in stdlib modules with mixed public/internal names.
### Pytest
- `src/pytest/__init__.py:6-80` builds the top-level `pytest` API by importing from many `_pytest.*` internals.
- `src/pytest/__init__.py:98-186` then pins that facade with an explicit `__all__`.
### HTTPX
- `httpx/__init__.py:1-12` re-exports the package surface from internal modules such as `._client`, `._exceptions`, and `._models`.
- `httpx/__init__.py:29-100` defines the supported top-level export list explicitly.
- `httpx/__init__.py:103-106` rewrites exported objects' `__module__` to `httpx`, reinforcing the facade instead of leaking internal filenames.
### Click
- `src/click/__init__.py:10-75` exposes the package through re-exports.
- `src/click/__init__.py:77-126` keeps compatibility shims and deprecations at the boundary instead of freezing old internal layout forever.
## Bottom line
Make the public API intentional.
Callers should depend on your concepts, not your current file tree.
@@ -0,0 +1,137 @@
# Testing
Use fixtures for reusable resource setup, parametrization for behavior matrices, and explicit boundary seams instead of ad hoc mocking.
## Why
Good Python tests optimize for three things at once:
- local readability
- cheap variation across inputs and modes
- reusable setup and cleanup without hiding intent
The mature pattern is not just “use pytest.” It is:
- model resources with fixtures
- make fixture lifetime visible with `yield` when cleanup matters
- use parametrization when one behavior should hold across several inputs
- test through boundary seams like transports instead of patching internals blindly
## The pattern
1. Use fixtures for shared setup and resources.
2. Use `yield` fixtures when setup and teardown both matter.
3. Use parametrization when the assertion shape is the same but inputs vary.
4. Prefer explicit seams over invasive mocking.
5. Keep the test body focused on behavior, not scaffolding.
## When to use
Use fixtures when:
- multiple tests need the same resource wiring
- setup or cleanup would otherwise dominate the body
- the setup is a dependency, not the behavior under test
Use parametrization when:
- one behavior should hold across several inputs or modes
- the data varies but the test story stays the same
Use transport or injected seams when:
- the behavior crosses I/O boundaries
- you want realistic flow without spinning up the whole world
## When not to use
Do not hide essential behavior behind a giant fixture tower.
Do not parametrize cases that deserve different narratives or different assertions.
Do not call fixtures directly like helper functions; if you want a helper, write a helper.
Do not mock deep internals when a cleaner external seam exists.
## Preferred shapes
### Yield fixture for lifecycle
```python
@pytest.fixture
def resource():
obj = make_resource()
yield obj
obj.close()
```
This keeps setup and teardown obvious.
### Parametrization for behavior matrices
```python
import pytest

@pytest.mark.parametrize("mode", ["prepend", "append", "importlib"])
def test_import_behavior(mode: str) -> None:
    ...
```
One behavior, several inputs, no duplicated body.
### Boundary seam instead of monkeypatch soup
```python
transport = httpx.MockTransport(handler)
client = httpx.Client(transport=transport)
```
This is usually cleaner than patching internals in three places.
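The same idea works without HTTPX. This stdlib-only sketch assumes a hypothetical `fetch_username` that receives a transport callable with the shape `(method, url) -> (status, body)` instead of opening sockets itself. The test hands it canned responses, with no patching anywhere.

```python
import json
from typing import Callable

# Hypothetical seam: (method, url) -> (status, body)
Transport = Callable[[str, str], tuple[int, str]]


def fetch_username(user_id: int, transport: Transport) -> str:
    status, body = transport("GET", f"/users/{user_id}")
    if status != 200:
        raise LookupError(f"user {user_id} not found")
    return json.loads(body)["name"]


def fake_transport(method: str, url: str) -> tuple[int, str]:
    # Canned responses stand in for the network.
    if (method, url) == ("GET", "/users/1"):
        return 200, json.dumps({"name": "ada"})
    return 404, ""


def test_fetch_username() -> None:
    assert fetch_username(1, fake_transport) == "ada"


def test_missing_user() -> None:
    try:
        fetch_username(2, fake_transport)
    except LookupError:
        pass
    else:
        raise AssertionError("expected LookupError")


test_fetch_username()
test_missing_user()
```

Production code passes a real transport; tests pass a fake one. The seam is part of the function's signature, so it cannot silently drift the way a patched import path can.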
## Counterexamples
### Repeated setup in every test
```python
def test_a():
    client = make_client()
    tmpdir = make_tmpdir()
    seed_db()

def test_b():
    client = make_client()
    tmpdir = make_tmpdir()
    seed_db()
```
The scaffolding overwhelms the behavior.
### Fixture tower opacity
If understanding the test requires opening six fixtures before reading one assertion, the abstraction has gone too far.
### Calling fixtures directly
Pytest explicitly rejects this because fixtures are injected dependencies, not disguised utility functions.
## Source signals
### Pytest
- `src/_pytest/fixtures.py:1378-1440` makes the contract explicit twice: calling a fixture directly is an error, and `yield` fixtures run teardown code after the test outcome.
- `testing/test_threadexception.py:84-91` shows a real `yield` fixture with post-test cleanup work after the `yield`.
- `testing/acceptance_test.py:158-169` uses `@pytest.mark.parametrize(...)` to check one behavior across multiple import modes without cloning the test body.
- `testing/acceptance_test.py:561-574` shows another compact parametrized case where only the example data changes.
### HTTPX
- `httpx/_transports/asgi.py:63-83` exposes `ASGITransport` as an in-process integration seam and even documents `raise_app_exceptions=False` for testing 500 responses.
- `httpx/_transports/mock.py:15-43` exposes `MockTransport` as a first-class request/response seam for tests.
- `httpx/_client.py:639-660` accepts `transport=` directly on `Client`, which is what makes transport substitution a normal testing path instead of a hack.
## Bottom line
A good Python test makes the behavior easy to see and the environment cheap to vary.
Use fixtures for lifetime.
Use parametrization for variation.
Use explicit seams instead of brittle patching.
@@ -0,0 +1,119 @@
# Typing
Use types to describe accepted shapes and behavioral contracts, not to pretend Python is a different language.
## Why
Good Python typing improves APIs in two ways:
- callers can see which shapes are accepted
- maintainers can preserve real boundaries without smearing `Any` everywhere
The mature pattern is not “make everything maximally abstract.” It is:
- use structural typing when capability matters more than inheritance
- use explicit aliases and unions for ergonomic public inputs
- keep public APIs typed even when internals stay dynamic
That gives users a real contract without freezing implementation choices too early.
## The pattern
1. Type public APIs precisely.
2. Prefer `Protocol` when callers care about behavior, not ancestry.
3. Use explicit unions and aliases for user-facing flexibility.
4. Keep dynamic internals from leaking into the public contract.
5. Avoid `Any` unless you truly mean “anything goes.”
## When to use
Use this when:
- multiple implementations can satisfy one behavioral need
- callers naturally have more than one valid input shape
- you want strong editor and type-checker help at public boundaries
- internals are dynamic but the public contract is still stable
## When not to use
Do not use a protocol when a concrete type is the real contract.
Do not use broad unions just to avoid choosing a better API.
Do not over-trust `@runtime_checkable`: CPython is explicit that runtime protocol checks verify only attribute presence, not signature correctness.
## Preferred shapes
### Structural typing for capability-based contracts
```python
from typing import Protocol

class Writer(Protocol):
    def write(self, data: bytes) -> int: ...
```
If the caller only needs `write()`, do not require a specific base class.
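A runnable sketch of the capability-based contract. `write_all` and `CountingSink` are hypothetical. Neither `io.BytesIO` nor `CountingSink` inherits from `Writer`; both satisfy it structurally. The parameter is declared positional-only (`/`) so that `io.BytesIO.write`, whose parameter is positional-only in the stubs, matches under strict checkers.

```python
import io
from typing import Iterable, Protocol


class Writer(Protocol):
    def write(self, data: bytes, /) -> int: ...


def write_all(out: Writer, chunks: Iterable[bytes]) -> int:
    # Only the write() capability is required; no base class is needed.
    return sum(out.write(chunk) for chunk in chunks)


class CountingSink:
    """Unrelated class that satisfies Writer structurally."""

    def __init__(self) -> None:
        self.total = 0

    def write(self, data: bytes, /) -> int:
        self.total += len(data)
        return len(data)


buffer = io.BytesIO()
print(write_all(buffer, [b"hello ", b"world"]))  # 11
print(buffer.getvalue())                          # b'hello world'

sink = CountingSink()
print(write_all(sink, [b"abc"]))                  # 3
```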
### Explicit flexible public inputs
```python
URLInput = URL | str
```
This is better than either extreme:
- forcing callers to pre-wrap everything
- accepting `Any` and hoping for the best
## Counterexamples
### Inheritance-only abstraction
```python
class BaseStore:
    ...

def persist(store: BaseStore) -> None:
    ...
```
This is too rigid when the function only needs a small capability surface.
### Type surrender
```python
from typing import Any

def send(data: Any, options: Any) -> Any:
    ...
```
The API contract disappeared.
### Runtime protocol overconfidence
If runtime safety matters, attribute-presence checks are not enough. Protocols do most of their work at static-analysis time.
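A short demonstration of that limit. `WrongWriter` is a hypothetical class with a `write` attribute whose signature has nothing in common with the protocol, yet `isinstance` still accepts it. The mismatch only shows up when the method is actually called.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Writer(Protocol):
    def write(self, data: bytes, /) -> int: ...


class WrongWriter:
    # Has a `write` attribute, but the signature does not match Writer's.
    def write(self) -> str:
        return "oops"


print(isinstance(WrongWriter(), Writer))  # True: only attribute presence is checked
print(isinstance(object(), Writer))       # False: no `write` attribute at all

try:
    WrongWriter().write(b"data")
except TypeError as exc:
    print(f"TypeError: {exc}")  # the real mismatch, found only at call time
```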
## Source signals
### CPython / typing
- `Lib/typing.py:2132-2157` defines `Protocol` around structural subtyping and explicitly frames it as static duck typing.
- `Lib/typing.py:2155-2157` states that `@runtime_checkable` protocols check only attribute presence, ignoring type signatures.
- `Lib/typing.py:2190-2250` shows `Annotated` as “type plus metadata,” not a new underlying runtime type.
### HTTPX
- `httpx/_client.py:639-660` gives `Client` precise constructor types for auth, params, headers, cookies, timeouts, transports, and `base_url`.
- `httpx/_client.py:1353-1374` mirrors that precision on `AsyncClient` instead of collapsing to untyped arguments.
### Pydantic
- `pydantic/main.py:156-205` exposes typed `ClassVar[...]` metadata for config, fields, serializer, and validator state even though framework internals are dynamic.
- `pydantic/main.py:253-264` makes model construction validate `**data: Any` immediately instead of pretending arbitrary inputs are already safe.
- `pydantic/main.py:721-768` gives `model_validate(...) -> Self` an explicit typed boundary contract.
## Bottom line
Use typing to make public boundaries clearer.
Be flexible where callers need flexibility.
Be precise where contracts matter.
Do not hide uncertainty behind `Any`.