Initial extracted documentation set

2026-06-01 21:42:05 +00:00
commit 60ffec18e4
17 changed files with 1590 additions and 0 deletions
@@ -0,0 +1,117 @@
+# Async Boundaries
+
+Keep sync and async APIs as separate, explicit surfaces when their semantics differ.
+
+## Why
+
+Async stops being an implementation detail as soon as it changes:
+
+- how resources are acquired and released
+- whether methods must be awaited
+- which transport or session types are valid
+- how cancellation behaves
+- whether the caller needs an event loop
+
+Trying to hide that behind one magical API usually makes things worse. The common failure mode is a fake sync wrapper over async internals: it breaks inside existing event loops, hides resource lifetime, and muddies cancellation.
+
+Mature libraries such as HTTPX and SQLAlchemy accept the split and make it visible.
+
+## The pattern
+
+1. Expose separate sync and async entrypoints when semantics differ.
+2. Keep their shapes parallel where that helps learnability.
+3. Keep resource and transport types distinct.
+4. Make lifecycle visible with ordinary `with` and `async with` context management.
+5. Do not smuggle event-loop control into a sync-looking API.
+
+## When to use
+
+Use this when:
+
+- your library owns network, database, filesystem, or other long-lived resources
+- sync and async variants need different transport or session implementations
+- callers need predictable lifetime control
+- the library must work in many runtime environments
+
+## When not to use
+
+Do not split APIs just for symmetry.
+
+Skip the split when:
+
+- async adds no meaningful semantic difference
+- the operation is trivial and one-shot
+- a separate async API would mostly be duplicated boilerplate
+
+But if the alternative requires hidden `asyncio.run(...)`, loop-detection tricks, or silent runtime switching, the split is probably the cleaner design.
+
+## Preferred shape
+
+```python
+class Client:
+    def get(self, url: str) -> Response:
+        ...
+
+class AsyncClient:
+    async def get(self, url: str) -> Response:
+        ...
+```
+
+Make the surfaces parallel enough to learn once, but distinct enough that their lifecycle stays honest.
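+
+A slightly fuller sketch of the same shape, with the lifecycle made explicit. This is a hedged illustration, not any library's real client: the string return values stand in for a `Response`, and `close()` / `aclose()` are method names assumed for the example.
+
+```python
+import asyncio
+
+
+class Client:
+    """Sync surface: plain calls, closed by a plain `with` block."""
+
+    def __init__(self) -> None:
+        self.closed = False
+
+    def get(self, url: str) -> str:
+        return f"sync:{url}"  # stand-in for a blocking request
+
+    def close(self) -> None:
+        self.closed = True
+
+    def __enter__(self) -> "Client":
+        return self
+
+    def __exit__(self, *exc_info: object) -> None:
+        self.close()
+
+
+class AsyncClient:
+    """Async surface: same method names, awaited, closed by `async with`."""
+
+    def __init__(self) -> None:
+        self.closed = False
+
+    async def get(self, url: str) -> str:
+        await asyncio.sleep(0)  # stand-in for non-blocking I/O
+        return f"async:{url}"
+
+    async def aclose(self) -> None:
+        self.closed = True
+
+    async def __aenter__(self) -> "AsyncClient":
+        return self
+
+    async def __aexit__(self, *exc_info: object) -> None:
+        await self.aclose()
+
+
+def sync_demo() -> str:
+    with Client() as client:
+        return client.get("/items")
+
+
+async def async_demo() -> str:
+    async with AsyncClient() as client:
+        return await client.get("/items")
+```
+
+The two classes read almost the same, but the type alone tells the caller which calling convention and which cleanup path apply.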
+
+## Why this works
+
+- callers immediately know whether `await` is involved
+- transport types stay correct
+- connection pooling and cleanup remain visible
+- cancellation behavior is not hidden behind sync-looking calls
+
+## Counterexamples
+
+### Fake sync wrapper over async internals
+
+```python
+import asyncio
+
+def get(url: str) -> Response:
+    return asyncio.run(_async_get(url))
+```
+
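+`asyncio.run()` raises `RuntimeError` when the calling thread already has a running event loop (an async web handler or a notebook, for example), and it creates and closes a new loop on every call, so nothing bound to a loop can be reused between calls.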
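+
+The event-loop failure can be reproduced with the standard library alone. In this minimal sketch, `_async_get` returns a plain string instead of a `Response`, and `handler` is a hypothetical stand-in for any code that already runs inside a loop.
+
+```python
+import asyncio
+
+
+async def _async_get(url: str) -> str:
+    await asyncio.sleep(0)  # stand-in for real network I/O
+    return f"body of {url}"
+
+
+def get(url: str) -> str:
+    # The fake sync wrapper: looks blocking, secretly drives an event loop.
+    return asyncio.run(_async_get(url))
+
+
+async def handler() -> str:
+    # Already inside a running loop, so the wrapper cannot start another one.
+    try:
+        return get("/items")
+    except RuntimeError as exc:
+        return f"failed: {exc}"
+```
+
+Calling `get("/items")` from ordinary sync code works, while `asyncio.run(handler())` takes the `except` branch. Python may also warn that the inner coroutine was never awaited.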
+
+### One class with mode flags
+
+```python
+client = Client(async_mode=True)
+```
+
+Now methods, cleanup, and caller expectations depend on ambient configuration instead of the type.
+
+### Shared transport types that are not really shared
+
+If one path needs `BaseTransport` and the other needs `AsyncBaseTransport`, pretending they are interchangeable is lying to the caller.
+
+## Source signals
+
+### HTTPX
+
+- `httpx/_client.py:594-661` defines `Client` as the sync entrypoint and documents that it can be shared between threads.
+- `httpx/_client.py:639-660` types the sync constructor in sync-native terms, including `BaseTransport`.
+- `httpx/_client.py:1275-1304` uses normal sync context management for lifecycle.
+- `httpx/_client.py:1307-1375` defines `AsyncClient` separately and documents task-sharing semantics.
+- `httpx/_client.py:1316-1318` shows async usage with `async with` and `await`.
+- `httpx/_client.py:1353-1374` keeps the constructor parallel but switches to async-native types like `AsyncBaseTransport`.
+- `httpx/_client.py:1445-1452` initializes `AsyncHTTPTransport` on the async path rather than pretending the sync transport is reusable.
+
+### SQLAlchemy
+
+- `examples/asyncio/async_orm.py:15-18` imports async-specific engine and session primitives.
+- `examples/asyncio/async_orm.py:61-67` creates an `async_sessionmaker(...)` and enters explicit async session and transaction scopes.
+- `examples/asyncio/async_orm.py:78-104` uses async-native query and commit methods.
+- `examples/inheritance/joined.py:16` imports the sync `Session` separately.
+- `examples/inheritance/joined.py:93-120` shows the corresponding sync session lifetime and explicit commit boundary.
+
+## Bottom line
+
+If sync and async usage have different semantics, give them different types.
+
+Parallel APIs are good.
+Pretending the difference is not there is not.
@@ -0,0 +1,140 @@
+# Data Models
+
+Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.
+
+## Why
+
+A lot of messy Python comes from making one class do four jobs at once:
+
+- validate external input
+- represent internal state
+- serialize output
+- own persistence or business behavior
+
+Mature codebases usually split those concerns.
+
+The recurring pattern is:
+
+- use explicit boundary models for validated input and output
+- keep serialization explicit
+- use lightweight value-like classes for simple internal state
+- avoid turning every data container into a god object
+
+## The pattern
+
+1. Use dedicated boundary models where data enters or leaves the system.
+2. Validate and serialize explicitly.
+3. Use lightweight declarative classes for data-heavy, behavior-light state.
+4. Keep persistence, transport, and business logic from collapsing into one model.
+
+## When to use
+
+Use validated boundary models when:
+
+- data enters from HTTP, files, queues, or user input
+- you need a predictable serialization contract
+- callers need one clear place for validation and defaults
+
+Use small value-like objects when:
+
+- the object mainly carries state
+- immutability helps reasoning
+- behavior is narrow and local
+
+## When not to use
+
+Do not make every internal object a heavyweight validation model.
+
+Do not scatter custom `to_dict()` logic across the codebase.
+
+Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.
+
+## Preferred shapes
+
+### Explicit boundary model
+
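+```python
+from pydantic import BaseModel
+
+class UserIn(BaseModel):
+    email: str
+    name: str
+
+payload = {"email": "ada@example.com", "name": "Ada"}  # illustrative external input
+user = UserIn.model_validate(payload)  # raises ValidationError on bad input
+serialized = user.model_dump()  # explicit conversion back to a plain dict
+```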
+
+Why this works:
+
+- validation is explicit
+- serialization is explicit
+- the model’s job is clear
+
+### Small value-like internal class
+
+```python
+from dataclasses import dataclass
+
+@dataclass(frozen=True)
+class Duration:
+    start: Instant  # Instant: another small frozen value type, defined elsewhere
+    stop: Instant
+```
+
+Why this works:
+
+- state is simple
+- mutation is constrained
+- the class stays easy to trust
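+
+A runnable version of the same idea, as a sketch only. The `seconds` field on `Instant` and the `elapsed` property are assumptions made for this illustration and do not mirror any library's definition.
+
+```python
+from dataclasses import FrozenInstanceError, asdict, dataclass
+
+
+@dataclass(frozen=True)
+class Instant:
+    seconds: float  # hypothetical field, for illustration only
+
+
+@dataclass(frozen=True)
+class Duration:
+    start: Instant
+    stop: Instant
+
+    @property
+    def elapsed(self) -> float:
+        # Narrow, local behavior derived purely from the stored state.
+        return self.stop.seconds - self.start.seconds
+
+
+span = Duration(Instant(1.5), Instant(4.0))
+
+try:
+    span.stop = Instant(9.0)  # type: ignore[misc]
+except FrozenInstanceError:
+    mutated = False  # frozen dataclasses reject attribute assignment
+else:
+    mutated = True
+
+as_plain_dict = asdict(span)  # explicit, on-demand conversion step
+```
+
+Equality, repr, and the constructor are generated, so the class carries state without growing behavior it does not need.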
+
+## Counterexamples
+
+### Hand-rolled serialization everywhere
+
+```python
+class User:
+    def to_dict(self):
+        return {
+            "id": str(self.id),
+            "name": self.name,
+            "created_at": self.created_at.isoformat(),
+        }
+```
+
+Every model now invents its own boundary behavior.
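+
+One alternative, short of adopting a validation library, is a single shared conversion path. The sketch below uses only the standard library; `encode_extra` and `to_wire` are hypothetical helper names, and the field set simply mirrors the counterexample above.
+
+```python
+import json
+from dataclasses import asdict, dataclass
+from datetime import datetime
+from uuid import UUID
+
+
+@dataclass(frozen=True)
+class User:
+    id: UUID
+    name: str
+    created_at: datetime
+
+
+def encode_extra(value: object) -> str:
+    # One shared rule set for types that json cannot encode natively.
+    if isinstance(value, UUID):
+        return str(value)
+    if isinstance(value, datetime):
+        return value.isoformat()
+    raise TypeError(f"cannot serialize {type(value).__name__}")
+
+
+def to_wire(user: User) -> str:
+    # asdict() is the explicit conversion step; json.dumps applies the shared rules.
+    return json.dumps(asdict(user), default=encode_extra, sort_keys=True)
+```
+
+The model stays a plain data holder, and the encoding rules for IDs and timestamps live in one place instead of in every class.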
+
+### One class owns every concern
+
+```python
+class OrderModel:
+    def validate(self): ...
+    def save(self): ...
+    def send_webhook(self): ...
+    def render_html(self): ...
+```
+
+That is not a model. That is a junk drawer.
+
+## Source signals
+
+### Pydantic
+
+- `pydantic/main.py:253-264` says `BaseModel.__init__` parses and validates input data and raises `ValidationError` on bad input.
+- `pydantic/main.py:455-519` defines `model_dump(...)` as an explicit serialization step with caller-controlled include/exclude behavior.
+- `pydantic/main.py:721-768` defines `model_validate(...) -> Self` as a named boundary-crossing API.
+- `docs/index.md:82-107` pairs model creation with `model_dump()` instead of treating instances as already wire-ready.
+- `docs/index.md:109-124` shows invalid external input failing loudly with `ValidationError`.
+
+### Attrs
+
+- `docs/examples.md:24-44` shows `@define` creating lightweight typed classes with generated constructor, repr, and equality behavior.
+- `docs/examples.md:143-205` uses keyword-only fields to keep construction explicit at the call site.
+- `docs/examples.md:209-220` uses `asdict(...)` as an intentional conversion step.
+
+### Pytest
+
+- `src/_pytest/timing.py:24-64` models `Instant` and `Duration` as frozen dataclasses for simple internal timing state.
+
+## Bottom line
+
+Boundary models should validate and serialize cleanly.
+
+Internal models should stay small and honest.
+
+If one model starts owning every concern in the system, split it before it turns to mud.
@@ -0,0 +1,140 @@
+# Error Handling
+
+Use exception types to encode what callers can do next.
+
+## Why
+
+Good Python libraries do not collapse every failure into `RuntimeError` or `Exception`. They shape errors around recovery boundaries:
+
+- one base type for “something in this subsystem failed”
+- narrower subtypes when callers need different recovery
+- structured fields when the branch depends on data, not wording
+
+That gives callers a clean ladder:
+
+- catch broadly at subsystem boundaries
+- catch narrowly when retry/report/ignore differs
+- surface better API or CLI errors without parsing strings
+
+## The pattern
+
+1. Define a subsystem-level base exception.
+2. Add subtypes only when callers need different handling.
+3. Put structured context on the exception when branching depends on it.
+4. Translate internal failures at API, CLI, or transport boundaries.
+
+## When to use
+
+Use this when:
+
+- a module or library exposes a public API
+- failure modes need different handling
+- an outer boundary must turn internal failures into user-facing errors
+- retry, ignore, and abort decisions differ by failure kind
+
+## When not to use
+
+Do not build a hierarchy when:
+
+- the code is tiny and has one obvious failure mode
+- every failure is handled the same way
+- the only distinction is wording, not behavior
+- a normal return value like `None` is already the contract
+
+Do not make callers parse exception text. If the distinction matters, make it a type or a field.
+
+## Good shape
+
+```python
+class MailError(Exception):
+    pass
+
+class TemporaryMailError(MailError):
+    pass
+
+class PermanentMailError(MailError):
+    pass
+
+class MailRejected(PermanentMailError):
+    def __init__(self, code: int, reason: str) -> None:
+        super().__init__(reason)
+        self.code = code
+        self.reason = reason
+```
+
+Caller:
+
+```python
+try:
+    send_mail(message)
+except TemporaryMailError:
+    retry_later(message)
+except PermanentMailError as exc:
+    mark_failed(message, reason=str(exc))
+```
+
+## Counterexamples
+
+### Stringly-typed branching
+
+```python
+try:
+    do_work()
+except Exception as exc:
+    if "timeout" in str(exc).lower():
+        retry()
+```
+
+The recovery rule is hiding in fragile text matching.
+
+### One catch-all with no domain meaning
+
+```python
+class AppError(Exception):
+    pass
+
+raise AppError("not found")
+raise AppError("permission denied")
+raise AppError("timeout")
+```
+
+Callers cannot branch meaningfully.
+
+### Boundary types leaking into the core
+
+```python
+from fastapi import HTTPException
+
+def charge_card(card: Card) -> Receipt:
+    if card.expired:
+        raise HTTPException(status_code=400, detail="expired card")
+```
+
+This couples domain logic to one transport. Raise a domain error here; translate to HTTP at the boundary.
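+
+A minimal sketch of that translation, kept framework-free so it runs on its own. `PaymentError`, `CardExpired`, `handle_charge`, and the status codes are hypothetical choices for the example; a real boundary would map onto whatever the transport expects.
+
+```python
+from dataclasses import dataclass
+
+
+class PaymentError(Exception):
+    """Base type for payment failures (hypothetical)."""
+
+
+class CardExpired(PaymentError):
+    pass
+
+
+@dataclass(frozen=True)
+class Card:
+    number: str
+    expired: bool
+
+
+def charge_card(card: Card) -> str:
+    # Core logic raises a domain error and knows nothing about HTTP.
+    if card.expired:
+        raise CardExpired("expired card")
+    return f"receipt-for-{card.number[-4:]}"
+
+
+def handle_charge(card: Card) -> tuple[int, dict[str, str]]:
+    # Boundary: translate domain errors into a transport-level response.
+    try:
+        receipt = charge_card(card)
+    except CardExpired as exc:
+        return 400, {"detail": str(exc)}
+    except PaymentError:
+        return 502, {"detail": "payment failed"}
+    return 200, {"receipt": receipt}
+```
+
+The same `charge_card` can now sit behind a CLI or a queue worker, each with its own translation layer.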
+
+## Source signals
+
+### Stdlib / CPython
+
+- `Lib/smtplib.py:69-71` defines `SMTPException` as the module-wide base type.
+- `Lib/smtplib.py:88-100` defines `SMTPResponseException` and stores structured fields on the exception itself: `smtp_code` and `smtp_error`.
+- `Lib/smtplib.py:102-125` adds subtype-specific payload like `sender` on `SMTPSenderRefused` and `recipients` on `SMTPRecipientsRefused`.
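+
+Because those fields live on the exception objects, a caller can choose the next step without reading message text. A minimal sketch: `next_step` and its return strings are hypothetical, and the 4xx/5xx split follows the SMTP reply-code convention (4yz transient, 5yz permanent).
+
+```python
+import smtplib
+
+
+def next_step(exc: smtplib.SMTPException) -> str:
+    # Branch on types and fields, never on message wording.
+    if isinstance(exc, smtplib.SMTPRecipientsRefused):
+        return f"drop {len(exc.recipients)} recipient(s)"
+    if isinstance(exc, smtplib.SMTPResponseException):
+        # Covers SMTPSenderRefused too, since it subclasses SMTPResponseException.
+        return "retry" if 400 <= exc.smtp_code < 500 else "fail"
+    return "fail"
+```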
+
+### HTTPX
+
+- `httpx/_exceptions.py:74-90` defines `HTTPError` as a broad catch point and explicitly documents it as useful around request + `raise_for_status()` flows.
+- `httpx/_exceptions.py:107-125` narrows that into `RequestError` and `TransportError` for request-time failures.
+- `httpx/_exceptions.py:132-160` further splits timeout handling into `ConnectTimeout`, `ReadTimeout`, `WriteTimeout`, and `PoolTimeout`.
+
+### Click
+
+- `src/click/exceptions.py:35-65` defines `ClickException` with behavior, not just categorization: an exit code and a `show()` renderer.
+- `src/click/exceptions.py:68-111` makes `UsageError` a narrower subtype with a different exit code and help-aware output.
+
+## Bottom line
+
+If callers need different behavior, give them different exception types.
+
+If callers need details, attach fields.
+
+If an outer layer needs user-facing output, translate there instead of pushing boundary concerns through the whole codebase.
@@ -0,0 +1,127 @@
+# Module Design
+
+Design a small, stable public surface. Keep implementation modules movable behind it.
+
+## Why
+
+Python makes it easy to publish internals by accident:
+
+- every file is importable
+- helpers become de facto API once users depend on them
+- refactors turn into breaking changes when file layout becomes the contract
+
+Mature libraries push back on that. They usually:
+
+- publish one obvious import surface
+- re-export supported names deliberately
+- treat internal modules as private rather than as supported import paths
+- use `__all__` when the boundary needs to be explicit
+
+That buys two things:
+
+- simpler imports for callers
+- freedom to reorganize internals later
+
+## The pattern
+
+1. Decide what callers should import.
+2. Re-export those names from the package boundary.
+3. Keep implementation details in internal modules.
+4. Use `__all__` when you want an explicit contract.
+5. Treat internal file layout as private unless you intentionally publish it.
+
+## When to use
+
+Use this when:
+
+- a package spans multiple modules
+- internals will evolve faster than the public API
+- you want callers thinking in domain concepts, not filenames
+- compatibility matters across releases
+
+## When not to use
+
+Do not build a facade when:
+
+- the package is tiny and direct imports are already clear
+- the abstraction boundary is still moving fast
+- re-exporting would turn `__init__.py` into a junk drawer
+
+A curated surface is not the same as a flat surface. Keep structure where the concepts are meaningfully different.
+
+## Preferred shapes
+
+### Package facade over internal modules
+
+```python
+# mypkg/__init__.py
+from ._client import Client
+from ._errors import AppError
+from ._models import Item
+
+__all__ = ["Client", "AppError", "Item"]
+```
+
+Why this works:
+
+- callers learn one stable import path
+- internals can move without import churn
+- the package advertises its real contract
+
+### Explicit module contract
+
+```python
+__all__ = ["Number", "Complex", "Real", "Rational", "Integral"]
+```
+
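+This says: these names are supported; everything else is implementation detail. Note that `__all__` only controls `from module import *` and documents intent; it does not stop callers from importing other names directly.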
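+
+The effect of `__all__` on star-imports can be checked without creating files. The sketch below builds a throwaway in-memory module; `shapes_demo`, `area`, and `scale` are hypothetical names used only for this demonstration.
+
+```python
+import sys
+import types
+
+# Build a throwaway module in memory so the example is self-contained.
+shapes = types.ModuleType("shapes_demo")
+exec(
+    "__all__ = ['area']\n"
+    "def area(w, h):\n"
+    "    return _mul(w, h)\n"
+    "def _mul(a, b):\n"
+    "    return a * b\n"
+    "def scale(w, k):\n"
+    "    return w * k\n",
+    shapes.__dict__,
+)
+sys.modules["shapes_demo"] = shapes
+
+# Star-import into a fresh namespace and see which names arrive.
+namespace: dict[str, object] = {}
+exec("from shapes_demo import *", namespace)
+exported = sorted(name for name in namespace if not name.startswith("__"))
+```
+
+Only `area` lands in `exported`, yet `shapes.scale` is still reachable by attribute access. `__all__` is a declared contract, not an enforcement mechanism.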
+
+## Counterexamples
+
+### File layout becomes the API by accident
+
+```python
+from mypkg.utils import helper_a
+from mypkg.impl_v2 import thing
+from mypkg.more_helpers import other_thing
+```
+
+Refactoring internal modules now breaks users.
+
+### Everything dumped into `__init__.py`
+
+If `__init__.py` exports fifty unrelated names, you did not create a clean facade. You created autocomplete noise.
+
+### Public API mirrors the folder tree too literally
+
+If callers need to know today’s internal layout to use the library, the boundary is still underdesigned.
+
+## Source signals
+
+### CPython
+
+- `Lib/numbers.py:8-23` warns that published ABC APIs are hard to change and should be designed carefully.
+- `Lib/numbers.py:35` publishes a narrow `__all__` rather than treating every helper as public.
+- `Lib/operator.py:13-15` and `Lib/smtplib.py:55-58` do the same in stdlib modules with mixed public/internal names.
+
+### Pytest
+
+- `src/pytest/__init__.py:6-80` builds the top-level `pytest` API by importing from many `_pytest.*` internals.
+- `src/pytest/__init__.py:98-186` then pins that facade with an explicit `__all__`.
+
+### HTTPX
+
+- `httpx/__init__.py:1-12` re-exports the package surface from internal modules such as `._client`, `._exceptions`, and `._models`.
+- `httpx/__init__.py:29-100` defines the supported top-level export list explicitly.
+- `httpx/__init__.py:103-106` rewrites exported objects’ `__module__` to `httpx`, reinforcing the facade instead of leaking internal filenames.
+
+### Click
+
+- `src/click/__init__.py:10-75` exposes the package through re-exports.
+- `src/click/__init__.py:77-126` keeps compatibility shims and deprecations at the boundary instead of freezing old internal layout forever.
+
+## Bottom line
+
+Make the public API intentional.
+
+Callers should depend on your concepts, not your current file tree.
@@ -0,0 +1,137 @@
+# Testing
+
+Use fixtures for reusable resource setup, parametrization for behavior matrices, and explicit boundary seams instead of ad hoc mocking.
+
+## Why
+
+Good Python tests optimize for three things at once:
+
+- local readability
+- cheap variation across inputs and modes
+- reusable setup and cleanup without hiding intent
+
+The mature pattern is not just “use pytest.” It is:
+
+- model resources with fixtures
+- make fixture lifetime visible with `yield` when cleanup matters
+- use parametrization when one behavior should hold across several inputs
+- test through boundary seams like transports instead of patching internals blindly
+
+## The pattern
+
+1. Use fixtures for shared setup and resources.
+2. Use `yield` fixtures when setup and teardown both matter.
+3. Use parametrization when the assertion shape is the same but inputs vary.
+4. Prefer explicit seams over invasive mocking.
+5. Keep the test body focused on behavior, not scaffolding.
+
+## When to use
+
+Use fixtures when:
+
+- multiple tests need the same resource wiring
+- setup or cleanup would otherwise dominate the body
+- the setup is a dependency, not the behavior under test
+
+Use parametrization when:
+
+- one behavior should hold across several inputs or modes
+- the data varies but the test story stays the same
+
+Use transport or injected seams when:
+
+- the behavior crosses I/O boundaries
+- you want realistic flow without spinning up the whole world
+
+## When not to use
+
+Do not hide essential behavior behind a giant fixture tower.
+
+Do not parametrize cases that deserve different narratives or different assertions.
+
+Do not call fixtures directly like helper functions; if you want a helper, write a helper.
+
+Do not mock deep internals when a cleaner external seam exists.
+
+## Preferred shapes
+
+### Yield fixture for lifecycle
+
+```python
+import pytest
+
+@pytest.fixture
+def resource():
+    obj = make_resource()
+    yield obj  # the test body runs here
+    obj.close()  # teardown runs after the test finishes
+```
+
+This keeps setup and teardown obvious.
+
+### Parametrization for behavior matrices
+
+```python
+import pytest
+
+@pytest.mark.parametrize("mode", ["prepend", "append", "importlib"])
+def test_import_behavior(mode: str) -> None:
+    ...
+```
+
+One behavior, several inputs, no duplicated body.
+
+### Boundary seam instead of monkeypatch soup
+
+```python
+import httpx
+# handler: a callable that takes an httpx.Request and returns an httpx.Response
+transport = httpx.MockTransport(handler)
+client = httpx.Client(transport=transport)
+```
+
+This is usually cleaner than patching internals in three places.
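+
+The same idea applies to code you own: accept the I/O boundary as a constructor argument so a test can substitute it. A framework-free sketch; `Transport`, `StatusChecker`, and `fake_transport` are hypothetical names, not part of HTTPX.
+
+```python
+from typing import Callable
+
+# A transport is anything that maps a URL to (status code, body).
+Transport = Callable[[str], tuple[int, str]]
+
+
+class StatusChecker:
+    def __init__(self, transport: Transport) -> None:
+        self._transport = transport
+
+    def is_healthy(self, url: str) -> bool:
+        status, _body = self._transport(url)
+        return 200 <= status < 300
+
+
+def fake_transport(url: str) -> tuple[int, str]:
+    # Test double substituted at the I/O boundary; no patching required.
+    return (200, "ok") if url.endswith("/health") else (503, "down")
+
+
+checker = StatusChecker(fake_transport)
+```
+
+The test exercises the real `is_healthy` logic end to end, and only the network edge is replaced.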
+
+## Counterexamples
+
+### Repeated setup in every test
+
+```python
+def test_a():
+    client = make_client()
+    tmpdir = make_tmpdir()
+    seed_db()
+
+
+def test_b():
+    client = make_client()
+    tmpdir = make_tmpdir()
+    seed_db()
+```
+
+The scaffolding overwhelms the behavior.
+
+### Fixture tower opacity
+
+If understanding the test requires opening six fixtures before reading one assertion, the abstraction has gone too far.
+
+### Calling fixtures directly
+
+Pytest raises an error when a fixture function is called directly, because fixtures are injected dependencies, not disguised utility functions.
+
+## Source signals
+
+### Pytest
+
+- `src/_pytest/fixtures.py:1378-1440` makes the contract explicit twice: calling a fixture directly is an error, and `yield` fixtures run teardown code after the test outcome.
+- `testing/test_threadexception.py:84-91` shows a real `yield` fixture with post-test cleanup work after the `yield`.
+- `testing/acceptance_test.py:158-169` uses `@pytest.mark.parametrize(...)` to check one behavior across multiple import modes without cloning the test body.
+- `testing/acceptance_test.py:561-574` shows another compact parametrized case where only the example data changes.
+
+### HTTPX
+
+- `httpx/_transports/asgi.py:63-83` exposes `ASGITransport` as an in-process integration seam and even documents `raise_app_exceptions=False` for testing 500 responses.
+- `httpx/_transports/mock.py:15-43` exposes `MockTransport` as a first-class request/response seam for tests.
+- `httpx/_client.py:639-660` accepts `transport=` directly on `Client`, which is what makes transport substitution a normal testing path instead of a hack.
+
+## Bottom line
+
+A good Python test makes the behavior easy to see and the environment cheap to vary.
+
+Use fixtures for lifetime.
+Use parametrization for variation.
+Use explicit seams instead of brittle patching.
@@ -0,0 +1,119 @@
+# Typing
+
+Use types to describe accepted shapes and behavioral contracts, not to pretend Python is a different language.
+
+## Why
+
+Good Python typing improves APIs in two ways:
+
+- callers can see which shapes are accepted
+- maintainers can preserve real boundaries without smearing `Any` everywhere
+
+The mature pattern is not “make everything maximally abstract.” It is:
+
+- use structural typing when capability matters more than inheritance
+- use explicit aliases and unions for ergonomic public inputs
+- keep public APIs typed even when internals stay dynamic
+
+That gives users a real contract without freezing implementation choices too early.
+
+## The pattern
+
+1. Type public APIs precisely.
+2. Prefer `Protocol` when callers care about behavior, not ancestry.
+3. Use explicit unions and aliases for user-facing flexibility.
+4. Keep dynamic internals from leaking into the public contract.
+5. Avoid `Any` unless you truly mean “anything goes.”
+
+## When to use
+
+Use this when:
+
+- multiple implementations can satisfy one behavioral need
+- callers naturally have more than one valid input shape
+- you want strong editor and type-checker help at public boundaries
+- internals are dynamic but the public contract is still stable
+
+## When not to use
+
+Do not use a protocol when a concrete type is the real contract.
+
+Do not use broad unions just to avoid choosing a better API.
+
+Do not over-trust `@runtime_checkable`: CPython is explicit that runtime protocol checks verify only attribute presence, not signature correctness.
+
+## Preferred shapes
+
+### Structural typing for capability-based contracts
+
+```python
+from typing import Protocol
+
+class Writer(Protocol):
+    def write(self, data: bytes) -> int: ...
+```
+
+If the caller only needs `write()`, do not require a specific base class.
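+
+A small runnable sketch of that contract in use. `CountingSink` and `emit` are hypothetical names; the point is that the sink matches `Writer` by shape alone, with no shared base class.
+
+```python
+from typing import Protocol
+
+
+class Writer(Protocol):
+    def write(self, data: bytes) -> int: ...
+
+
+class CountingSink:
+    """Matches Writer structurally; it does not inherit from it."""
+
+    def __init__(self) -> None:
+        self.total = 0
+
+    def write(self, data: bytes) -> int:
+        self.total += len(data)
+        return len(data)
+
+
+def emit(out: Writer, payload: bytes) -> int:
+    # Depends only on the write() capability, not on a concrete type.
+    return out.write(payload)
+
+
+sink = CountingSink()
+written = emit(sink, b"hello")
+```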
+
+### Explicit flexible public inputs
+
+```python
+URLInput = URL | str
+```
+
+This is better than either extreme:
+
+- forcing callers to pre-wrap everything
+- accepting `Any` and hoping for the best
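+
+A union like this pairs naturally with one normalization step at the boundary. In the sketch below, `urllib.parse.SplitResult` stands in for the `URL` type, and `normalize` and `host_of` are hypothetical helpers.
+
+```python
+from typing import Union
+from urllib.parse import SplitResult, urlsplit
+
+URLInput = Union[SplitResult, str]
+
+
+def normalize(url: URLInput) -> SplitResult:
+    # Accept both shapes at the edge, then work with one type internally.
+    if isinstance(url, str):
+        return urlsplit(url)
+    return url
+
+
+def host_of(url: URLInput) -> str:
+    return normalize(url).hostname or ""
+```
+
+Callers can pass whichever form they already have, and everything past `normalize` deals with a single concrete type.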
+
+## Counterexamples
+
+### Inheritance-only abstraction
+
+```python
+class BaseStore:
+    ...
+
+def persist(store: BaseStore) -> None:
+    ...
+```
+
+This is too rigid when the function only needs a small capability surface.
+
+### Type surrender
+
+```python
+from typing import Any
+
+def send(data: Any, options: Any) -> Any:
+    ...
+```
+
+The API contract disappeared.
+
+### Runtime protocol overconfidence
+
+If runtime safety matters, attribute-presence checks are not enough. Protocols do most of their work at static-analysis time.
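+
+A short sketch of that gap. `WrongSignature` is a hypothetical class whose `write` method has the right name but the wrong shape.
+
+```python
+from typing import Protocol, runtime_checkable
+
+
+@runtime_checkable
+class Writer(Protocol):
+    def write(self, data: bytes) -> int: ...
+
+
+class WrongSignature:
+    def write(self) -> None:  # accepts no data and returns nothing
+        return None
+
+
+# The runtime check only looks for an attribute named `write`.
+passes_isinstance = isinstance(WrongSignature(), Writer)
+```
+
+`passes_isinstance` is `True`, even though calling `write(b"x")` on that object would raise `TypeError`. A static type checker is the tool that catches the mismatch.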
+
+## Source signals
+
+### CPython / typing
+
+- `Lib/typing.py:2132-2157` defines `Protocol` around structural subtyping and explicitly frames it as static duck typing.
+- `Lib/typing.py:2155-2157` states that `@runtime_checkable` protocols check only attribute presence, ignoring type signatures.
+- `Lib/typing.py:2190-2250` shows `Annotated` as “type plus metadata,” not a new underlying runtime type.
+
+### HTTPX
+
+- `httpx/_client.py:639-660` gives `Client` precise constructor types for auth, params, headers, cookies, timeouts, transports, and `base_url`.
+- `httpx/_client.py:1353-1374` mirrors that precision on `AsyncClient` instead of collapsing to untyped arguments.
+
+### Pydantic
+
+- `pydantic/main.py:156-205` exposes typed `ClassVar[...]` metadata for config, fields, serializer, and validator state even though framework internals are dynamic.
+- `pydantic/main.py:253-264` makes model construction validate `**data: Any` immediately instead of pretending arbitrary inputs are already safe.
+- `pydantic/main.py:721-768` gives `model_validate(...) -> Self` an explicit typed boundary contract.
+
+## Bottom line
+
+Use typing to make public boundaries clearer.
+
+Be flexible where callers need flexibility.
+Be precise where contracts matter.
+Do not hide uncertainty behind `Any`.