commit 60ffec18e4cc6b04029256846e66b1e48592823e
Author: Rodin
Date:   Mon Jun 1 21:42:05 2026 +0000

    Initial extracted documentation set

diff --git a/PROCESS.md b/PROCESS.md
new file mode 100644
index 0000000..0c99a2c
--- /dev/null
+++ b/PROCESS.md
@@ -0,0 +1,149 @@
+# Python Patterns Process
+
+This file documents the workflow used to build and refine this repo so the extraction can be repeated without guesswork.
+
+## Goal
+
+Turn repeated patterns from mature Python codebases into concise, prescriptive docs with verifiable citations.
+
+## Scope split
+
+Keep this repo at the **language/library design** level.
+
+Good fits:
+- module/public-surface design
+- exception design
+- sync vs async API boundaries
+- typing and protocol design
+- data-model design
+- Python testing patterns
+
+Push framework/service-boundary concerns into a separate repo instead of muddying this one.
+
+## Upstream selection rule
+
+Choose mature repos that expose different parts of the problem space.
+
+Current first-wave set:
+- `python/cpython`
+- `encode/httpx`
+- `pytest-dev/pytest`
+- `pydantic/pydantic`
+
+Current refinement set:
+- `python-attrs/attrs`
+- `pallets/click`
+- `sqlalchemy/sqlalchemy`
+
+Selection criteria:
+- respected and maintained
+- stable public APIs
+- enough repetition to support non-hand-wavy conclusions
+- examples/tests/internal structure that reveal tradeoffs, not just happy paths
+
+## Directory contract
+
+- `sources/` = raw evidence notes, one file per upstream repo
+- `patterns/` = synthesized guidance from repeated evidence
+- `smells/` = anti-patterns derived from the same evidence base
+
+## Step-by-step workflow
+
+### 1) Split the problem cleanly
+Do not mix Python-wide guidance with FastAPI/service conventions in one repo.
+
+### 2) Clone or otherwise make upstream sources available locally
+Work from local checkouts so citations can be verified quickly.
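A local checkout is what makes a citation mechanically checkable. The sketch below shows one way such a check could look; it is not part of this repo. The helper name `read_citation` and the `path:start-end` citation syntax are assumptions made for illustration.

```python
from pathlib import Path


def read_citation(checkout: Path, citation: str) -> list[str]:
    """Return the lines a citation points at inside a local checkout.

    Hypothetical helper: assumes citations look like 'path:start-end'
    or 'path:line', with 1-indexed, inclusive line numbers.
    """
    # Split on the last colon so the path itself may contain colons.
    path, _, span = citation.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start

    lines = (checkout / path).read_text(encoding="utf-8").splitlines()
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"{citation}: range is outside the file ({len(lines)} lines)")
    # Convert the 1-indexed inclusive range to a Python slice.
    return lines[start - 1 : end]
```

Printing the returned lines next to the claim in a source note is usually enough to spot a citation that has drifted after an upstream change.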
+
+### 3) Write source notes first
+For each upstream repo, create `sources/<repo>.md` with:
+- why this repo is useful
+- repeated patterns
+- caveats/counterexamples
+- exact `file:line` citations
+- pattern candidates supported by the evidence
+
+Important: source notes are not polished guidance. They are reusable evidence.
+
+### 4) Synthesize only after evidence exists
+Turn strong repeated signals into `patterns/*.md` docs.
+
+Each pattern doc should usually include:
+- what the pattern is
+- why it exists
+- when to use it
+- when not to use it
+- preferred shapes
+- counterexamples
+- source signals/citations
+
+### 5) Mine anti-patterns explicitly
+After the positive patterns are stable enough, write smell docs by inverting the evidence:
+- broad catch-all exceptions with no recovery meaning
+- hidden resource lifetime
+- accidental public APIs
+- fake sync wrappers over async code
+- overuse of `Any`
+- mixed transport/business/persistence responsibilities in one model
+
+### 6) Run a refinement wave before broadening source coverage
+Once the main docs exist, do **not** immediately add more repos.
+
+Instead:
+- improve synthesized docs in fresh contexts
+- tighten weak citations
+- rewrite source-note files to be denser and more reusable
+- reduce duplicated guidance across docs
+
+That was the right move for this repo after the first bootstrap wave.
+
+## Fresh-context refinement pattern
+
+A good refinement split is:
+- one fresh pass over `patterns/*.md`
+- one fresh pass over `sources/*.md`
+- one fresh pass doing citation audit and anti-vagueness cleanup
+
+The point is to avoid simply echoing earlier wording.
+
+## Review checklist
+
+### For source notes
+- Does the file separate repeated patterns from one-off examples?
+- Are caveats preserved?
+- Are citations exact and easy to re-check?
+- Does it avoid vague claims like “mature code usually...” unless evidence is shown?
+
+### For pattern docs
+- Does each rule follow from repeated evidence?
+- Is the guidance prescriptive without pretending there were no tradeoffs?
+- Are counterexamples concrete?
+- Is the doc still Python-level rather than framework-specific?
+
+## Local git workflow used here
+
+When the repo is ready for human review:
+1. initialize a local git repo
+2. stage the current documentation set
+3. create a single initial commit so review has a stable baseline
+
+This repo intentionally avoids pushing or creating remotes unless explicitly requested.
+
+## How to repeat this process next time
+
+1. Define the scope split first.
+2. Pick a small, high-signal upstream set.
+3. Build `sources/` before `patterns/`.
+4. Synthesize only the strongest topics first.
+5. Add smell docs after positive patterns exist.
+6. Run a fresh-context refinement wave.
+7. Initialize git only once the repo is reviewable.
+
+## What to avoid
+
+- writing docs from memory
+- mixing Python and framework guidance together
+- broadening source coverage before tightening weak docs
+- flattening caveats away during synthesis
+- leaving citations too vague to verify quickly
+- treating source notes as prose polish instead of evidence storage
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5a78255
--- /dev/null
+++ b/README.md
@@ -0,0 +1,71 @@
+# Python Patterns
+
+**Prescriptive.** Follow these when writing Python code.
+
+This repo captures reusable Python patterns extracted from mature upstream codebases, then turns them into concise guidance with citations.
+
+A good pattern doc here includes:
+- **Why** — the reasoning, not just the rule
+- **When to use** — the trigger conditions
+- **When NOT to use** — where the pattern causes harm
+- **Preferred shapes** — examples of the intended form
+- **Counterexamples** — what to avoid and why
+- **Source citations** — verified `file:line` anchors from real codebases
+
+These docs should be derived from what strong Python codebases actually do, not from generic style-blog advice.
+
+## Structure
+
+- `patterns/` — what to do
+- `smells/` — what to avoid
+- `sources/` — extracted source-study notes and upstream references
+- `PROCESS.md` — the repeatable extraction/refinement workflow used to build this repo
+
+## Current source base
+
+Primary upstreams mined so far:
+- `python/cpython`
+- `encode/httpx`
+- `pytest-dev/pytest`
+- `pydantic/pydantic`
+- `python-attrs/attrs`
+- `pallets/click`
+- `sqlalchemy/sqlalchemy`
+
+Why this mix works:
+- CPython: API boundaries, exceptions, context managers, typing surface design
+- HTTPX: package facade design, sync/async split, transport seams, error taxonomy
+- pytest: fixture lifetime, parametrization, test ergonomics
+- Pydantic: validation and serialization boundaries
+- attrs: lightweight value-object/data-model patterns
+- Click: CLI-facing exception and public-surface patterns
+- SQLAlchemy: explicit persistence/session lifetime and sync-vs-async caveats
+
+## Current pattern set
+
+- `patterns/module-design.md`
+- `patterns/typing.md`
+- `patterns/error-handling.md`
+- `patterns/async-boundaries.md`
+- `patterns/testing.md`
+- `patterns/data-models.md`
+- `smells/common-mistakes.md`
+
+## Reviewing this repo
+
+Recommended review order:
+1. `README.md`
+2. `PROCESS.md`
+3. `sources/*.md` for evidence quality
+4. `patterns/*.md` for synthesis quality
+5. `smells/*.md` for anti-pattern coverage
+
+Questions to ask during review:
+- Is each claim grounded in repeated upstream evidence?
+- Are caveats preserved instead of flattened away?
+- Does the pattern stay at the Python level rather than drifting into framework guidance?
+- Are citations specific enough to be re-checked quickly?
+
+## Extraction rule
+
+Do not write pattern docs from memory. First collect repeated source examples with `file:line` citations, then synthesize the rule.
\ No newline at end of file
diff --git a/patterns/async-boundaries.md b/patterns/async-boundaries.md
new file mode 100644
index 0000000..4cc7154
--- /dev/null
+++ b/patterns/async-boundaries.md
@@ -0,0 +1,117 @@
+# Async Boundaries
+
+Keep sync and async APIs as separate, explicit surfaces when their semantics differ.
+
+## Why
+
+Async stops being an implementation detail as soon as it changes:
+
+- how resources are acquired and released
+- whether methods must be awaited
+- which transport or session types are valid
+- how cancellation behaves
+- whether the caller needs an event loop
+
+Trying to hide that behind one magical API usually makes things worse. The common failure mode is a fake sync wrapper over async internals: it breaks inside existing event loops, hides resource lifetime, and muddies cancellation.
+
+Mature libraries usually accept the split and make it visible.
+
+## The pattern
+
+1. Expose separate sync and async entrypoints when semantics differ.
+2. Keep their shapes parallel where that helps learnability.
+3. Keep resource and transport types distinct.
+4. Make lifecycle visible with normal sync and async context management.
+5. Do not smuggle event-loop control into a sync-looking API.
+
+## When to use
+
+Use this when:
+
+- your library owns network, database, filesystem, or other long-lived resources
+- sync and async variants need different transport or session implementations
+- callers need predictable lifetime control
+- the library must work in many runtime environments
+
+## When not to use
+
+Do not split APIs just for symmetry.
+
+Skip the split when:
+
+- async adds no meaningful semantic difference
+- the operation is trivial and one-shot
+- a separate async API would mostly duplicate noise
+
+But if the alternative requires hidden `asyncio.run(...)`, loop-detection tricks, or silent runtime switching, the split is probably the cleaner design.
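The hidden `asyncio.run(...)` failure is easy to reproduce with the standard library alone. The sketch below is illustrative only: `_async_get`, `get`, and `handler` are hypothetical stand-ins, the URL is a placeholder, and no real network I/O happens.

```python
import asyncio


async def _async_get(url: str) -> str:
    # Hypothetical stand-in for real async I/O.
    await asyncio.sleep(0)
    return f"response from {url}"


def get(url: str) -> str:
    # Fake sync wrapper: tries to start its own event loop on every call.
    coro = _async_get(url)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run() refused to start; close the coroutine so Python
        # does not warn that it was never awaited.
        coro.close()
        raise


async def handler() -> str:
    # This code is already running inside an event loop, as it would be in
    # any async application, so the sync-looking call cannot work.
    try:
        return get("https://example.invalid/")
    except RuntimeError as exc:
        return f"failed: {exc}"


print(get("https://example.invalid/"))  # no loop running yet: succeeds
print(asyncio.run(handler()))           # loop already running: reports the failure
```

The same function works or fails depending on where it is called from, which is the ambient behavior the split into `Client` and `AsyncClient` avoids.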
+
+## Preferred shape
+
+```python
+class Client:
+    def get(self, url: str) -> Response:
+        ...
+
+class AsyncClient:
+    async def get(self, url: str) -> Response:
+        ...
+```
+
+Make the surfaces parallel enough to learn once, but distinct enough that their lifecycle stays honest.
+
+## Why this works
+
+- callers immediately know whether `await` is involved
+- transport types stay correct
+- connection pooling and cleanup remain visible
+- cancellation behavior is not hidden behind sync-looking calls
+
+## Counterexamples
+
+### Fake sync wrapper over async internals
+
+```python
+def get(url: str) -> Response:
+    return asyncio.run(_async_get(url))
+```
+
+This breaks in environments that already have an event loop and hides lifecycle costs.
+
+### One class with mode flags
+
+```python
+client = Client(async_mode=True)
+```
+
+Now methods, cleanup, and caller expectations depend on ambient configuration instead of the type.
+
+### Shared transport types that are not really shared
+
+If one path needs `BaseTransport` and the other needs `AsyncBaseTransport`, pretending they are interchangeable is lying to the caller.
+
+## Source signals
+
+### HTTPX
+
+- `httpx/_client.py:594-661` defines `Client` as the sync entrypoint and documents that it can be shared between threads.
+- `httpx/_client.py:639-660` types the sync constructor in sync-native terms, including `BaseTransport`.
+- `httpx/_client.py:1275-1304` uses normal sync context management for lifecycle.
+- `httpx/_client.py:1307-1375` defines `AsyncClient` separately and documents task-sharing semantics.
+- `httpx/_client.py:1316-1318` shows async usage with `async with` and `await`.
+- `httpx/_client.py:1353-1374` keeps the constructor parallel but switches to async-native types like `AsyncBaseTransport`.
+- `httpx/_client.py:1445-1452` initializes `AsyncHTTPTransport` on the async path rather than pretending the sync transport is reusable.
+
+### SQLAlchemy
+
+- `examples/asyncio/async_orm.py:15-18` imports async-specific engine and session primitives.
+- `examples/asyncio/async_orm.py:61-67` creates an `async_sessionmaker(...)` and enters explicit async session and transaction scopes.
+- `examples/asyncio/async_orm.py:78-104` uses async-native query and commit methods.
+- `examples/inheritance/joined.py:16` imports the sync `Session` separately.
+- `examples/inheritance/joined.py:93-120` shows the corresponding sync session lifetime and explicit commit boundary.
+
+## Bottom line
+
+If sync and async usage have different semantics, give them different types.
+
+Parallel APIs are good.
+Pretending the difference is not there is not.
diff --git a/patterns/data-models.md b/patterns/data-models.md
new file mode 100644
index 0000000..1fb5664
--- /dev/null
+++ b/patterns/data-models.md
@@ -0,0 +1,140 @@
+# Data Models
+
+Treat data models as explicit boundary objects, and keep simple internal state in small value-like types.
+
+## Why
+
+A lot of messy Python comes from making one class do four jobs at once:
+
+- validate external input
+- represent internal state
+- serialize output
+- own persistence or business behavior
+
+Mature codebases usually split those concerns.
+
+The recurring pattern is:
+
+- use explicit boundary models for validated input and output
+- keep serialization explicit
+- use lightweight value-like classes for simple internal state
+- avoid turning every data container into a god object
+
+## The pattern
+
+1. Use dedicated boundary models where data enters or leaves the system.
+2. Validate and serialize explicitly.
+3. Use lightweight declarative classes for data-heavy, behavior-light state.
+4. Keep persistence, transport, and business logic from collapsing into one model.
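The four steps above can be sketched with only the standard library. This is a deliberately small illustration, not how Pydantic or attrs implement validation: `SignupRequest`, `User`, `parse_signup`, and `to_wire` are hypothetical names, and the validation rules are placeholders.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignupRequest:
    """Boundary model: an instance only exists after validation succeeded."""
    email: str
    name: str


@dataclass(frozen=True)
class User:
    """Internal value-like state: no validation, persistence, or I/O."""
    user_id: int
    email: str
    name: str


def parse_signup(payload: dict[str, object]) -> SignupRequest:
    # Validate explicitly at the point where external data enters.
    email = payload.get("email")
    name = payload.get("name")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    return SignupRequest(email=email, name=name.strip())


def to_wire(user: User) -> dict[str, object]:
    # Serialization is one named step rather than a method on every model.
    return asdict(user)
```

Persistence and business rules would live in separate functions or services that accept these objects, so neither class grows beyond carrying data.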
+
+## When to use
+
+Use validated boundary models when:
+
+- data enters from HTTP, files, queues, or user input
+- you need a predictable serialization contract
+- callers need one clear place for validation and defaults
+
+Use small value-like objects when:
+
+- the object mainly carries state
+- immutability helps reasoning
+- behavior is narrow and local
+
+## When not to use
+
+Do not make every internal object a heavyweight validation model.
+
+Do not scatter custom `to_dict()` logic across the codebase.
+
+Do not let one model become schema, ORM wrapper, validator, service object, and side-effect manager at the same time.
+
+## Preferred shapes
+
+### Explicit boundary model
+
+```python
+class UserIn(BaseModel):
+    email: str
+    name: str
+
+user = UserIn.model_validate(payload)
+serialized = user.model_dump()
+```
+
+Why this works:
+
+- validation is explicit
+- serialization is explicit
+- the model’s job is clear
+
+### Small value-like internal class
+
+```python
+@dataclass(frozen=True)
+class Duration:
+    start: Instant
+    stop: Instant
+```
+
+Why this works:
+
+- state is simple
+- mutation is constrained
+- the class stays easy to trust
+
+## Counterexamples
+
+### Hand-rolled serialization everywhere
+
+```python
+class User:
+    def to_dict(self):
+        return {
+            "id": str(self.id),
+            "name": self.name,
+            "created_at": self.created_at.isoformat(),
+        }
+```
+
+Every model now invents its own boundary behavior.
+
+### One class owns every concern
+
+```python
+class OrderModel:
+    def validate(self): ...
+    def save(self): ...
+    def send_webhook(self): ...
+    def render_html(self): ...
+```
+
+That is not a model. That is a junk drawer.
+
+## Source signals
+
+### Pydantic
+
+- `pydantic/main.py:253-264` says `BaseModel.__init__` parses and validates input data and raises `ValidationError` on bad input.
+- `pydantic/main.py:455-519` defines `model_dump(...)` as an explicit serialization step with caller-controlled include/exclude behavior.
+- `pydantic/main.py:721-768` defines `model_validate(...) -> Self` as a named boundary-crossing API.
+- `docs/index.md:82-107` pairs model creation with `model_dump()` instead of treating instances as already wire-ready.
+- `docs/index.md:109-124` shows invalid external input failing loudly with `ValidationError`.
+
+### Attrs
+
+- `docs/examples.md:24-44` shows `@define` creating lightweight typed classes with generated constructor, repr, and equality behavior.
+- `docs/examples.md:143-205` uses keyword-only fields to keep construction explicit at the call site.
+- `docs/examples.md:209-220` uses `asdict(...)` as an intentional conversion step.
+
+### Pytest
+
+- `src/_pytest/timing.py:24-64` models `Instant` and `Duration` as frozen dataclasses for simple internal timing state.
+
+## Bottom line
+
+Boundary models should validate and serialize cleanly.
+
+Internal models should stay small and honest.
+
+If one model starts owning every concern in the system, split it before it turns to mud.
diff --git a/patterns/error-handling.md b/patterns/error-handling.md
new file mode 100644
index 0000000..b196eeb
--- /dev/null
+++ b/patterns/error-handling.md
@@ -0,0 +1,140 @@
+# Error Handling
+
+Use exception types to encode what callers can do next.
+
+## Why
+
+Good Python libraries do not collapse every failure into `RuntimeError` or `Exception`. They shape errors around recovery boundaries:
+
+- one base type for “something in this subsystem failed”
+- narrower subtypes when callers need different recovery
+- structured fields when the branch depends on data, not wording
+
+That gives callers a clean ladder:
+
+- catch broadly at subsystem boundaries
+- catch narrowly when retry/report/ignore differs
+- surface better API or CLI errors without parsing strings
+
+## The pattern
+
+1. Define a subsystem-level base exception.
+2. Add subtypes only when callers need different handling.
+3. Put structured context on the exception when branching depends on it.
+4. Translate internal failures at API, CLI, or transport boundaries.
+
+## When to use
+
+Use this when:
+
+- a module or library exposes a public API
+- failure modes need different handling
+- an outer boundary must turn internal failures into user-facing errors
+- retry, ignore, and abort decisions differ by failure kind
+
+## When not to use
+
+Do not build a hierarchy when:
+
+- the code is tiny and has one obvious failure mode
+- every failure is handled the same way
+- the only distinction is wording, not behavior
+- a normal return value like `None` is already the contract
+
+Do not make callers parse exception text. If the distinction matters, make it a type or a field.
+
+## Good shape
+
+```python
+class MailError(Exception):
+    pass
+
+class TemporaryMailError(MailError):
+    pass
+
+class PermanentMailError(MailError):
+    pass
+
+class MailRejected(PermanentMailError):
+    def __init__(self, code: int, reason: str) -> None:
+        super().__init__(reason)
+        self.code = code
+        self.reason = reason
+```
+
+Caller:
+
+```python
+try:
+    send_mail(message)
+except TemporaryMailError:
+    retry_later(message)
+except PermanentMailError as exc:
+    mark_failed(message, reason=str(exc))
+```
+
+## Counterexamples
+
+### Stringly-typed branching
+
+```python
+try:
+    do_work()
+except Exception as exc:
+    if "timeout" in str(exc).lower():
+        retry()
+```
+
+The recovery rule is hiding in fragile text matching.
+
+### One catch-all with no domain meaning
+
+```python
+class AppError(Exception):
+    pass
+
+raise AppError("not found")
+raise AppError("permission denied")
+raise AppError("timeout")
+```
+
+Callers cannot branch meaningfully.
+
+### Boundary types leaking into the core
+
+```python
+from fastapi import HTTPException
+
+def charge_card(card: Card) -> Receipt:
+    if card.expired:
+        raise HTTPException(status_code=400, detail="expired card")
+```
+
+This couples domain logic to one transport. Raise a domain error here; translate to HTTP at the boundary.
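Translating at the boundary can be sketched without any web framework. In this illustration every name is hypothetical (`PaymentError`, `CardExpired`, `Card`, `charge_card`, `handle_charge`), and the boundary returns a plain `(status, body)` tuple as a stand-in for whatever response type a real transport layer would use.

```python
from dataclasses import dataclass


class PaymentError(Exception):
    """Base type for payment failures in this hypothetical domain."""


class CardExpired(PaymentError):
    def __init__(self, last4: str) -> None:
        super().__init__(f"card ending {last4} is expired")
        self.last4 = last4  # structured field, so callers never parse text


@dataclass(frozen=True)
class Card:
    last4: str
    expired: bool


def charge_card(card: Card) -> str:
    # Core logic raises a domain error and knows nothing about HTTP.
    if card.expired:
        raise CardExpired(card.last4)
    return f"receipt-{card.last4}"


def handle_charge(card: Card) -> tuple[int, dict[str, str]]:
    # Boundary layer: the only place that maps domain errors to status codes.
    try:
        return 200, {"receipt": charge_card(card)}
    except CardExpired as exc:
        return 400, {"error": "card_expired", "last4": exc.last4}
    except PaymentError:
        # Broad subsystem catch for any other payment failure.
        return 502, {"error": "payment_failed"}
```

A CLI entrypoint could reuse `charge_card` unchanged and map the same exceptions to exit codes instead.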
+
+## Source signals
+
+### Stdlib / CPython
+
+- `Lib/smtplib.py:69-71` defines `SMTPException` as the module-wide base type.
+- `Lib/smtplib.py:88-100` defines `SMTPResponseException` and stores structured fields on the exception itself: `smtp_code` and `smtp_error`.
+- `Lib/smtplib.py:102-125` adds subtype-specific payload like `sender` on `SMTPSenderRefused` and `recipients` on `SMTPRecipientsRefused`.
+
+### HTTPX
+
+- `httpx/_exceptions.py:74-90` defines `HTTPError` as a broad catch point and explicitly documents it as useful around request + `raise_for_status()` flows.
+- `httpx/_exceptions.py:107-125` narrows that into `RequestError` and `TransportError` for request-time failures.
+- `httpx/_exceptions.py:132-160` further splits timeout handling into `ConnectTimeout`, `ReadTimeout`, `WriteTimeout`, and `PoolTimeout`.
+
+### Click
+
+- `src/click/exceptions.py:35-65` defines `ClickException` with behavior, not just categorization: an exit code and a `show()` renderer.
+- `src/click/exceptions.py:68-111` makes `UsageError` a narrower subtype with a different exit code and help-aware output.
+
+## Bottom line
+
+If callers need different behavior, give them different exception types.
+
+If callers need details, attach fields.
+
+If an outer layer needs user-facing output, translate there instead of pushing boundary concerns through the whole codebase.
diff --git a/patterns/module-design.md b/patterns/module-design.md
new file mode 100644
index 0000000..cd87941
--- /dev/null
+++ b/patterns/module-design.md
@@ -0,0 +1,127 @@
+# Module Design
+
+Design a small, stable public surface. Keep implementation modules movable behind it.
+
+## Why
+
+Python makes it easy to publish internals by accident:
+
+- every file is importable
+- helpers become de facto API once users depend on them
+- refactors turn into breaking changes when file layout becomes the contract
+
+Mature libraries push back on that. They usually:
+
+- publish one obvious import surface
+- re-export supported names deliberately
+- keep internal modules non-authoritative
+- use `__all__` when the boundary needs to be explicit
+
+That buys two things:
+
+- simpler imports for callers
+- freedom to reorganize internals later
+
+## The pattern
+
+1. Decide what callers should import.
+2. Re-export those names from the package boundary.
+3. Keep implementation details in internal modules.
+4. Use `__all__` when you want an explicit contract.
+5. Treat internal file layout as private unless you intentionally publish it.
+
+## When to use
+
+Use this when:
+
+- a package spans multiple modules
+- internals will evolve faster than the public API
+- you want callers thinking in domain concepts, not filenames
+- compatibility matters across releases
+
+## When not to use
+
+Do not build a facade when:
+
+- the package is tiny and direct imports are already clear
+- the abstraction boundary is still moving fast
+- re-exporting would turn `__init__.py` into a junk drawer
+
+A curated surface is not the same as a flat surface. Keep structure where the concepts are meaningfully different.
+
+## Preferred shapes
+
+### Package facade over internal modules
+
+```python
+# mypkg/__init__.py
+from ._client import Client
+from ._errors import AppError
+from ._models import Item
+
+__all__ = ["Client", "AppError", "Item"]
+```
+
+Why this works:
+
+- callers learn one stable import path
+- internals can move without import churn
+- the package advertises its real contract
+
+### Explicit module contract
+
+```python
+__all__ = ["Number", "Complex", "Real", "Rational", "Integral"]
+```
+
+This says: these names are supported; everything else is implementation detail.
+
+## Counterexamples
+
+### File layout becomes the API by accident
+
+```python
+from mypkg.utils import helper_a
+from mypkg.impl_v2 import thing
+from mypkg.more_helpers import other_thing
+```
+
+Refactoring internal modules now breaks users.
+
+### Everything dumped into `__init__.py`
+
+If `__init__.py` exports fifty unrelated names, you did not create a clean facade. You created autocomplete noise.
+
+### Public API mirrors the folder tree too literally
+
+If callers need to know today’s internal layout to use the library, the boundary is still underdesigned.
+
+## Source signals
+
+### CPython
+
+- `Lib/numbers.py:8-23` warns that published ABC APIs are hard to change and should be designed carefully.
+- `Lib/numbers.py:35` publishes a narrow `__all__` rather than treating every helper as public.
+- `Lib/operator.py:13-15` and `Lib/smtplib.py:55-58` do the same in stdlib modules with mixed public/internal names.
+
+### Pytest
+
+- `src/pytest/__init__.py:6-80` builds the top-level `pytest` API by importing from many `_pytest.*` internals.
+- `src/pytest/__init__.py:98-186` then pins that facade with an explicit `__all__`.
+
+### HTTPX
+
+- `httpx/__init__.py:1-12` re-exports the package surface from internal modules such as `._client`, `._exceptions`, and `._models`.
+- `httpx/__init__.py:29-100` defines the supported top-level export list explicitly.
+- `httpx/__init__.py:103-106` rewrites exported objects’ `__module__` to `httpx`, reinforcing the facade instead of leaking internal filenames.
+
+### Click
+
+- `src/click/__init__.py:10-75` exposes the package through re-exports.
+- `src/click/__init__.py:77-126` keeps compatibility shims and deprecations at the boundary instead of freezing old internal layout forever.
+
+## Bottom line
+
+Make the public API intentional.
+
+Callers should depend on your concepts, not your current file tree.
diff --git a/patterns/testing.md b/patterns/testing.md
new file mode 100644
index 0000000..38ba635
--- /dev/null
+++ b/patterns/testing.md
@@ -0,0 +1,137 @@
+# Testing
+
+Use fixtures for reusable resource setup, parametrization for behavior matrices, and explicit boundary seams instead of ad hoc mocking.
+
+## Why
+
+Good Python tests optimize for three things at once:
+
+- local readability
+- cheap variation across inputs and modes
+- reusable setup and cleanup without hiding intent
+
+The mature pattern is not just “use pytest.” It is:
+
+- model resources with fixtures
+- make fixture lifetime visible with `yield` when cleanup matters
+- use parametrization when one behavior should hold across several inputs
+- test through boundary seams like transports instead of patching internals blindly
+
+## The pattern
+
+1. Use fixtures for shared setup and resources.
+2. Use `yield` fixtures when setup and teardown both matter.
+3. Use parametrization when the assertion shape is the same but inputs vary.
+4. Prefer explicit seams over invasive mocking.
+5. Keep the test body focused on behavior, not scaffolding.
+
+## When to use
+
+Use fixtures when:
+
+- multiple tests need the same resource wiring
+- setup or cleanup would otherwise dominate the body
+- the setup is a dependency, not the behavior under test
+
+Use parametrization when:
+
+- one behavior should hold across several inputs or modes
+- the data varies but the test story stays the same
+
+Use transport or injected seams when:
+
+- the behavior crosses I/O boundaries
+- you want realistic flow without spinning up the whole world
+
+## When not to use
+
+Do not hide essential behavior behind a giant fixture tower.
+
+Do not parametrize cases that deserve different narratives or different assertions.
+
+Do not call fixtures directly like helper functions; if you want a helper, write a helper.
+
+Do not mock deep internals when a cleaner external seam exists.
+
+## Preferred shapes
+
+### Yield fixture for lifecycle
+
+```python
+@pytest.fixture
+def resource():
+    obj = make_resource()
+    yield obj
+    obj.close()
+```
+
+This keeps setup and teardown obvious.
+
+### Parametrization for behavior matrices
+
+```python
+@pytest.mark.parametrize("mode", ["prepend", "append", "importlib"])
+def test_import_behavior(mode: str) -> None:
+    ...
+```
+
+One behavior, several inputs, no duplicated body.
+
+### Boundary seam instead of monkeypatch soup
+
+```python
+transport = httpx.MockTransport(handler)
+client = httpx.Client(transport=transport)
+```
+
+This is usually cleaner than patching internals in three places.
+
+## Counterexamples
+
+### Repeated setup in every test
+
+```python
+def test_a():
+    client = make_client()
+    tmpdir = make_tmpdir()
+    seed_db()
+
+
+def test_b():
+    client = make_client()
+    tmpdir = make_tmpdir()
+    seed_db()
+```
+
+The scaffolding overwhelms the behavior.
+
+### Fixture tower opacity
+
+If understanding the test requires opening six fixtures before reading one assertion, the abstraction has gone too far.
+
+### Calling fixtures directly
+
+Pytest explicitly rejects this because fixtures are injected dependencies, not disguised utility functions.
+
+## Source signals
+
+### Pytest
+
+- `src/_pytest/fixtures.py:1378-1440` makes the contract explicit twice: calling a fixture directly is an error, and `yield` fixtures run teardown code after the test outcome.
+- `testing/test_threadexception.py:84-91` shows a real `yield` fixture with post-test cleanup work after the `yield`.
+- `testing/acceptance_test.py:158-169` uses `@pytest.mark.parametrize(...)` to check one behavior across multiple import modes without cloning the test body.
+- `testing/acceptance_test.py:561-574` shows another compact parametrized case where only the example data changes.
+
+### HTTPX
+
+- `httpx/_transports/asgi.py:63-83` exposes `ASGITransport` as an in-process integration seam and even documents `raise_app_exceptions=False` for testing 500 responses.
+- `httpx/_transports/mock.py:15-43` exposes `MockTransport` as a first-class request/response seam for tests.
+- `httpx/_client.py:639-660` accepts `transport=` directly on `Client`, which is what makes transport substitution a normal testing path instead of a hack.
+
+## Bottom line
+
+A good Python test makes the behavior easy to see and the environment cheap to vary.
+
+Use fixtures for lifetime.
+Use parametrization for variation.
+Use explicit seams instead of brittle patching.
diff --git a/patterns/typing.md b/patterns/typing.md
new file mode 100644
index 0000000..9926f2e
--- /dev/null
+++ b/patterns/typing.md
@@ -0,0 +1,119 @@
+# Typing
+
+Use types to describe accepted shapes and behavioral contracts, not to pretend Python is a different language.
+
+## Why
+
+Good Python typing improves APIs in two ways:
+
+- callers can see which shapes are accepted
+- maintainers can preserve real boundaries without smearing `Any` everywhere
+
+The mature pattern is not “make everything maximally abstract.” It is:
+
+- use structural typing when capability matters more than inheritance
+- use explicit aliases and unions for ergonomic public inputs
+- keep public APIs typed even when internals stay dynamic
+
+That gives users a real contract without freezing implementation choices too early.
+
+## The pattern
+
+1. Type public APIs precisely.
+2. Prefer `Protocol` when callers care about behavior, not ancestry.
+3. Use explicit unions and aliases for user-facing flexibility.
+4. Keep dynamic internals from leaking into the public contract.
+5. Avoid `Any` unless you truly mean “anything goes.”
+
+## When to use
+
+Use this when:
+
+- multiple implementations can satisfy one behavioral need
+- callers naturally have more than one valid input shape
+- you want strong editor and type-checker help at public boundaries
+- internals are dynamic but the public contract is still stable
+
+## When not to use
+
+Do not use a protocol when a concrete type is the real contract.
+
+Do not use broad unions just to avoid choosing a better API.
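One limit of protocols is worth seeing concretely: an `isinstance` check against a `@runtime_checkable` protocol looks only at whether the attribute exists. The sketch below uses a hypothetical `Writer` protocol and two throwaway classes to show the gap; a static type checker would reject `WrongSignature` where a `Writer` is expected, but the runtime check does not.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Writer(Protocol):
    def write(self, data: bytes) -> int: ...


class WrongSignature:
    # Has a `write` attribute, but takes no data and returns nothing,
    # so it does not actually satisfy Writer's signature.
    def write(self) -> None:
        return None


class NoWrite:
    # Has no `write` attribute at all.
    pass


# The runtime check passes on attribute presence alone.
print(isinstance(WrongSignature(), Writer))  # True
print(isinstance(NoWrite(), Writer))         # False
```

If runtime safety matters at a boundary, validate the call itself or use a concrete type rather than relying on this check.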
+
+Do not over-trust `@runtime_checkable`: CPython is explicit that runtime protocol checks verify only attribute presence, not signature correctness.
+
+## Preferred shapes
+
+### Structural typing for capability-based contracts
+
+```python
+class Writer(Protocol):
+    def write(self, data: bytes) -> int: ...
+```
+
+If the caller only needs `write()`, do not require a specific base class.
+
+### Explicit flexible public inputs
+
+```python
+URLInput = URL | str
+```
+
+This is better than either extreme:
+
+- forcing callers to pre-wrap everything
+- accepting `Any` and hoping for the best
+
+## Counterexamples
+
+### Inheritance-only abstraction
+
+```python
+class BaseStore:
+    ...
+
+def persist(store: BaseStore) -> None:
+    ...
+```
+
+This is too rigid when the function only needs a small capability surface.
+
+### Type surrender
+
+```python
+def send(data: Any, options: Any) -> Any:
+    ...
+```
+
+The API contract disappeared.
+
+### Runtime protocol overconfidence
+
+If runtime safety matters, attribute-presence checks are not enough. Protocols do most of their work at static-analysis time.
+
+## Source signals
+
+### CPython / typing
+
+- `Lib/typing.py:2132-2157` defines `Protocol` around structural subtyping and explicitly frames it as static duck typing.
+- `Lib/typing.py:2155-2157` states that `@runtime_checkable` protocols check only attribute presence, ignoring type signatures.
+- `Lib/typing.py:2190-2250` shows `Annotated` as “type plus metadata,” not a new underlying runtime type.
+
+### HTTPX
+
+- `httpx/_client.py:639-660` gives `Client` precise constructor types for auth, params, headers, cookies, timeouts, transports, and `base_url`.
+- `httpx/_client.py:1353-1374` mirrors that precision on `AsyncClient` instead of collapsing to untyped arguments.
+
+### Pydantic
+
+- `pydantic/main.py:156-205` exposes typed `ClassVar[...]` metadata for config, fields, serializer, and validator state even though framework internals are dynamic.
+- `pydantic/main.py:253-264` makes model construction validate `**data: Any` immediately instead of pretending arbitrary inputs are already safe. +- `pydantic/main.py:721-768` gives `model_validate(...) -> Self` an explicit typed boundary contract. + +## Bottom line + +Use typing to make public boundaries clearer. + +Be flexible where callers need flexibility. +Be precise where contracts matter. +Do not hide uncertainty behind `Any`. diff --git a/smells/common-mistakes.md b/smells/common-mistakes.md new file mode 100644 index 0000000..17b4edc --- /dev/null +++ b/smells/common-mistakes.md @@ -0,0 +1,198 @@ +# Common Mistakes in Python + +This file captures recurring non-idiomatic patterns, especially code that reads like Java or TypeScript in a Python costume. + +## 1. Boolean-flag methods that hide two behaviors + +### Smell + +```python +def fetch_user(user_id: str, include_deleted: bool = False) -> User | None: + ... +``` + +### Why it is a smell + +Boolean mode flags often mean one function is doing two jobs. They create call sites like `fetch_user(id, True)` that say almost nothing. + +### Better shape + +- split into separate functions when the behaviors are meaningfully different +- or promote the difference to a real enum or config object when there are several modes + +## 2. Broad `except Exception:` without a boundary reason + +### Smell + +```python +try: + do_work() +except Exception: + return None +``` + +### Why it is a smell + +This erases recovery meaning. Mature libraries like HTTPX and Click use exception hierarchies so callers can catch broadly or narrowly depending on what they can recover from. + +### Better shape + +- catch the specific failures you can handle +- use broad catch points only at real process, API, or CLI boundaries +- translate internal failures into a clear outer error contract + +### Source signals + +- `httpx/_exceptions.py:74-90` defines `HTTPError` as a broad catch point. 
+- `httpx/_exceptions.py:107-160` then narrows request, transport, and timeout failures for recovery. +- `src/click/exceptions.py:35-111` ties exception subtypes to different exit codes and help output. + +## 3. Hidden I/O in constructors or properties + +### Smell + +```python +class User: + def __init__(self, user_id: str): + self.profile = requests.get(...).json() +``` + +### Why it is a smell + +Object creation now performs surprising network I/O, which makes testing, lifetime, and failure handling muddy. + +### Better shape + +- keep I/O in explicit factory or boundary methods +- make resource acquisition visible +- separate data containers from loading logic + +## 4. Import-time side effects + +### Smell + +```python +client = connect_to_db() +load_config_from_network() +``` + +### Why it is a smell + +Importing a module should usually define names, not secretly talk to the outside world. Import-time side effects make startup order brittle and tests unpredictable. + +### Better shape + +- move initialization into explicit startup paths +- create resources in app setup, dependency wiring, or main entrypoints + +## 5. Mixing validation, transport, and business logic in one class + +### Smell + +```python +class OrderModel(BaseModel): + def save(self): ... + def send_webhook(self): ... + def render_html(self): ... +``` + +### Why it is a smell + +Boundary schema, persistence logic, and business behavior cannot evolve independently anymore. + +### Better shape + +- let Pydantic, attrs, or dataclasses own data-shape concerns +- keep boundary translation explicit +- split long-lived behavior into services or domain objects when needed + +### Source signals + +- `pydantic/main.py:253-264` treats model construction as input validation. +- `pydantic/main.py:455-519` makes dumping an explicit serialization step. +- `src/_pytest/timing.py:24-64` uses frozen dataclasses for small internal value objects. + +## 6. 
Fake sync wrappers around async code + +### Smell + +```python +def get(url: str) -> Response: + return asyncio.run(async_get(url)) +``` + +### Why it is a smell + +This breaks in existing event loops and hides real async lifetime and cancellation semantics. + +### Better shape + +- provide separate sync and async entrypoints when semantics differ +- keep them parallel, not secretly interchangeable + +### Source signals + +- `httpx/_client.py:594-661` and `httpx/_client.py:1307-1375` define separate `Client` and `AsyncClient` types. +- `httpx/_client.py:639-660` vs. `httpx/_client.py:1353-1374` keep the constructor shapes parallel while changing transport types. +- `examples/asyncio/async_orm.py:61-67` vs. `examples/inheritance/joined.py:93-120` show the same split in SQLAlchemy session usage. + +## 7. Overuse of `Any` + +### Smell + +```python +def send(data: Any, options: Any) -> Any: + ... +``` + +### Why it is a smell + +The API contract disappeared. + +### Better shape + +- use concrete types when the contract is specific +- use `Protocol` when callers care about capability +- use explicit unions or aliases for ergonomic flexibility + +### Source signals + +- `Lib/typing.py:2132-2157` frames `Protocol` around structural subtyping. +- `Lib/typing.py:2155-2157` warns that runtime protocol checks ignore signatures. +- `httpx/_client.py:639-660` shows rich public typing instead of `Any`-shaped parameters. + +## 8. Accidental public APIs through file layout + +### Smell + +```python +from mypkg.utils import helper_a +from mypkg.impl_v2 import thing +``` + +### Why it is a smell + +Internal module layout becomes the public contract by accident, which makes refactors painful. + +### Better shape + +- publish a deliberate import surface +- re-export supported names +- use `__all__` when the boundary should be explicit + +### Source signals + +- `src/pytest/__init__.py:6-80` builds a top-level facade over `_pytest.*` internals. 
+- `src/pytest/__init__.py:98-186` pins that facade with `__all__`. +- `httpx/__init__.py:1-12` and `httpx/__init__.py:29-106` do the same while also rewriting exported objects to appear under `httpx`. + +## Heuristic + +If the code: + +- hides resource lifetime +- hides write boundaries +- hides which failures matter +- hides what the public API really is + +…it is usually fighting Python instead of using it. diff --git a/sources/README.md b/sources/README.md new file mode 100644 index 0000000..6fd4104 --- /dev/null +++ b/sources/README.md @@ -0,0 +1,32 @@ +# Source Notes + +This directory stores the reusable evidence behind the pattern docs. + +## What belongs here + +One note per upstream repo, with: +- why the repo is useful +- repeated patterns, not a reading diary +- caveats and counterexamples +- exact `file:line` anchors +- pattern candidates supported by the evidence + +## Current notes + +- `cpython.md` +- `httpx.md` +- `pytest.md` +- `pydantic.md` +- `attrs.md` +- `click.md` +- `sqlalchemy.md` + +## Quality bar + +A source note is good when it makes later synthesis cheap: +- repeated evidence is separated from one-off examples +- vague claims are avoided +- citations are fast to verify +- caveats survive instead of being flattened away + +Read `../PROCESS.md` for the full repeatable workflow. \ No newline at end of file diff --git a/sources/attrs.md b/sources/attrs.md new file mode 100644 index 0000000..c41ef6f --- /dev/null +++ b/sources/attrs.md @@ -0,0 +1,48 @@ +# Attrs source notes + +Repo: `python-attrs/attrs` +Local checkout: `/home/ubuntu/repos/rodin-sources/attrs` + +## Why this repo is useful +- `attrs` is a strong source for explicit data-carrier design: generated methods, constructor-shape control, and deliberate conversion at boundaries. +- Its docs are especially valuable because they include caveats and failure cases, not just happy-path examples. 
+ +## Declarative fields are the default for data-heavy classes + +### Repeated evidence +- `docs/examples.md:24-44` shows `@define` with typed fields immediately generating constructor, repr, and equality behavior. +- `docs/examples.md:31-44` makes the generated behavior visible at the REPL rather than implicit. +- `docs/examples.md:51-58` shows the same declarative shape without relying on type annotations, via `field()`. + +### Why it matters +Repeated signal: when a class mostly carries data, `attrs` prefers declaring fields and letting the library generate the boilerplate. The code emphasizes structure and invariants over handwritten dunder noise. + +### Caveat / counterexample +`docs/examples.md:60-62` warns that mixing `field()` declarations without annotations flips `attrs` into a no-typing mode and can ignore annotation-only attributes. That is a sharp edge worth preserving in synthesis: declarative does not mean "mix styles freely." + +## Keyword-only fields are used to protect call-site clarity and inheritance + +### Repeated evidence +- `docs/examples.md:147-157` shows `field(kw_only=True)` forcing explicit construction at the call site. +- `docs/examples.md:159-172` shows decorator-level `@define(kw_only=True)` applying the same rule to the whole class. +- `docs/examples.md:176-191` shows the practical inheritance payoff: subclasses can add required fields even when the base class already has defaults. +- `docs/examples.md:193-205` shows the counterexample when `kw_only=True` is omitted: invalid attribute ordering raises a `ValueError`. + +### Why it matters +Repeated signal: keyword-only fields are not just cosmetic. They are a tool for making constructor calls self-describing and for avoiding inheritance-order traps. + +## Serialization is explicit and filterable + +### Repeated evidence +- `docs/examples.md:211-217` shows `asdict(...)` as a deliberate conversion step from object to plain data. 
+- `docs/examples.md:219-235` shows `asdict(..., filter=...)` excluding sensitive fields like passwords. +- `docs/examples.md:238-253` shows built-in include/exclude helpers for more reusable serialization control. + +### Why it matters +Repeated signal: even value-like objects are not assumed to be wire-ready. `attrs` makes the serialization boundary explicit and gives callers hooks to shape what crosses it. + +## Pattern candidates supported by this repo +- use declarative field definitions for data-carrier classes +- prefer keyword-only construction when call-site clarity or inheritance safety matters +- keep serialization as an explicit step +- preserve caveats about mixed declaration styles and constructor ordering diff --git a/sources/click.md b/sources/click.md new file mode 100644 index 0000000..9058bae --- /dev/null +++ b/sources/click.md @@ -0,0 +1,50 @@ +# Click source notes + +Repo: `pallets/click` +Local checkout: `/home/ubuntu/repos/rodin-sources/click` + +## Why this repo is useful +- Click is a strong source for CLI API design: stable top-level exports, context-passing conventions, and user-facing exception behavior. +- It is especially useful because the implementation ties API design directly to operator experience at the terminal. + +## The package root is a curated facade with compatibility shims + +### Repeated evidence +- `src/click/__init__.py:10-75` re-exports commands, decorators, exceptions, types, and terminal helpers from internal modules. +- `src/click/__init__.py:77-124` uses `__getattr__` to keep deprecated compatibility names (`BaseCommand`, `MultiCommand`, `OptionParser`, `__version__`) working while emitting warnings. + +### Why it matters +Repeated signal: mature libraries often keep the package root stable even while internal layout evolves. Click treats the package root as the user-facing API and places compatibility logic there deliberately. + +### Caveat / counterexample +Compatibility shims are useful, but they are debt. 
Click's use of deprecation warnings is the important pattern: keep compatibility explicit and time-bounded rather than silently permanent. + +## Command state is passed through context objects, not globals + +### Repeated evidence +- `docs/complex.md:53-61` explains that callbacks do not receive context unless they opt in, and that `Context.invoke` mediates invocation. +- `docs/complex.md:92-99` shows a root command storing application state on `ctx.obj`. +- `docs/complex.md:107-113` states directly that `Context.obj` is the place commands are supposed to remember what they need to pass to children. +- `src/click/decorators.py:51-93` implements `make_pass_decorator(...)` by searching the linked context chain for the nearest object of the desired type and invoking the callback with it. + +### Why it matters +Repeated signal: Click favors explicit, nestable context propagation over module globals or hidden singletons. That matters for complex CLIs with subcommands and plugins. + +### Caveat / counterexample +`docs/complex.md:143-163` points out the interleaved-command problem: plugin layers can replace `ctx.obj`. That is why `make_pass_decorator(...)` exists; plain `pass_obj` is not always enough once commands are nested by third parties. + +## Exceptions encode user-visible behavior, not just categorization + +### Repeated evidence +- `src/click/exceptions.py:35-65` defines `ClickException` with an exit code, cached color behavior, and a `show()` method for terminal rendering. +- `src/click/exceptions.py:68-111` defines `UsageError` with a different exit code and help-aware rendering that prints usage plus a "Try '--help'" hint when context is available. +- `src/click/exceptions.py:114-118` documents `BadParameter` as a subtype that gains parameter context automatically. + +### Why it matters +Repeated signal: in CLI libraries, exceptions often need to carry exit semantics and presentation rules, not just messages. 
Click's hierarchy is built around what the operator should see next. + +## Pattern candidates supported by this repo +- expose a stable package-level facade over internal modules +- use explicit compatibility shims with deprecation warnings +- pass CLI state through typed/named context objects rather than globals +- design exception types around exit behavior and user guidance diff --git a/sources/cpython.md b/sources/cpython.md new file mode 100644 index 0000000..4992c70 --- /dev/null +++ b/sources/cpython.md @@ -0,0 +1,53 @@ +# CPython source notes + +Repo: `python/cpython` +Local checkout: `/home/ubuntu/repos/rodin-sources/cpython` + +## Why this repo is useful +- CPython is a strong source for patterns that survive long-term compatibility pressure. +- It is especially useful for API-boundary choices and error-shape choices because stdlib modules must stay understandable to a huge caller base. +- Caveat: stdlib code is not stylistically uniform. Treat repeated shapes across modules as signal; do not treat a single module's convention as "the Python way." + +## Public API boundaries are often declared explicitly + +### Repeated evidence +- `Lib/operator.py:13-20` declares a tight `__all__` list instead of exporting every helper or alias in the module namespace. +- `Lib/smtplib.py:55-58` does the same for the SMTP module, listing public exceptions and the `SMTP` client. +- `Lib/warnings.py:3-18` likewise enumerates the supported warning helpers instead of relying on incidental module globals. + +### Why it matters +Repeated signal: when a module has helper names, aliases, or internal scaffolding, CPython often freezes the supported surface explicitly. That makes the public API a maintained decision rather than an accident of file layout. + +### Caveat / counterexample +This is common, not universal. 
Some stdlib modules still expose names without a curated `__all__`, so the useful pattern is not "always add `__all__`" but "use it when the file contains more than the intended public surface." + +## Exception trees are shaped around recovery needs, not just taxonomy + +### Repeated evidence +- `Lib/smtplib.py:69-71` defines `SMTPException` as the broad catch point for the module. +- `Lib/smtplib.py:73-86` adds semantic subtypes for unsupported commands and disconnected-session failures. +- `Lib/smtplib.py:88-100` defines `SMTPResponseException` and stores structured payload on the exception itself via `smtp_code` and `smtp_error`. +- `Lib/smtplib.py:102-125` and `Lib/smtplib.py:128-142` narrow further into sender, recipient, data, connect, HELO, and auth failures. + +### Why it matters +Repeated signal: CPython exception trees usually give callers two useful options at once: +- catch broadly at the subsystem boundary, or +- branch narrowly on structured failure data when recovery differs. + +This is stronger than a flat set of unrelated exception classes and stronger than raising plain `OSError`/`ValueError` with message parsing. + +## Resource-owning helpers usually make lifetime visible + +### Repeated evidence +- `Lib/contextlib.py:31-43` defines the abstract context-manager protocol directly in the stdlib. +- `Lib/tempfile.py:487-539` wraps temporary files so `__enter__` returns the wrapper and `__exit__` guarantees cleanup. +- `Lib/tempfile.py:758-761` also guards context entry on closed temporary files, showing that lifetime rules are enforced, not just documented. + +### Why it matters +Repeated signal: when a stdlib helper owns cleanup-sensitive state, CPython prefers explicit context-manager boundaries over ambient cleanup assumptions. 
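
A minimal sketch of the lifetime shape those `contextlib` and `tempfile` citations point at, using only the standard library. The `scratch_file` helper and `demo` function are hypothetical names written for illustration; they are not taken from CPython.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def scratch_file(suffix: str = ".txt") -> Iterator[str]:
    """Yield the path of a temporary file and remove it on exit.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is handed out; callers open it themselves
    try:
        yield path
    finally:
        # Cleanup runs whether the body returned normally or raised.
        if os.path.exists(path):
            os.remove(path)


def demo() -> tuple[bool, bool]:
    """Report whether the file existed inside and after the `with` block."""
    with scratch_file() as path:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("hello")
        existed_inside = os.path.exists(path)
    existed_after = os.path.exists(path)
    return existed_inside, existed_after
```

The point of the shape is that a reader can see where the resource starts and ends from the indentation alone, without knowing anything about the helper's internals.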
+ +## Pattern candidates supported by this repo +- declare public module surfaces explicitly when helper names would otherwise leak +- build exception hierarchies around caller recovery paths +- attach structured data to exceptions when callers need to branch without string parsing +- make resource lifetime obvious with context-manager boundaries diff --git a/sources/httpx.md b/sources/httpx.md new file mode 100644 index 0000000..9928663 --- /dev/null +++ b/sources/httpx.md @@ -0,0 +1,54 @@ +# HTTPX source notes + +Repo: `encode/httpx` +Local checkout: `/home/ubuntu/repos/rodin-sources/httpx` + +## Why this repo is useful +- HTTPX is a strong source for modern boundary-design patterns in Python libraries: sync/async separation, transport seams, and caller-oriented exception design. +- It is especially useful because the same conceptual API is implemented twice (sync and async), making repeated design choices easy to spot. + +## Sync and async APIs are parallel types, not a mode flag + +### Repeated evidence +- `httpx/_client.py:594-660` defines `Client` as the synchronous entrypoint with `BaseTransport` and thread-sharing semantics. +- `httpx/_client.py:1307-1374` defines `AsyncClient` separately with the same broad constructor shape but `AsyncBaseTransport` and task-sharing semantics. +- `httpx/_client.py:688-696` and `httpx/_client.py:1402-1410` show the same transport-initialization flow in each class, reinforcing that the APIs are intentionally parallel rather than conditionally branching inside one type. + +### Why it matters +Repeated signal: mature Python networking libraries keep sync and async usage obviously separate in the type system while preserving familiar parameter shapes. That lowers cognitive overhead without hiding execution-model differences. + +### Caveat / counterexample +The pattern is not "duplicate everything." 
HTTPX keeps shared behavior in `BaseClient`; the duplication is at the public entrypoint where transport types and calling style genuinely differ. + +## Exceptions are layered for catch-broadly / recover-narrowly behavior + +### Repeated evidence +- `httpx/_exceptions.py:74-90` makes `HTTPError` the top-level catch point and explicitly documents `try/except httpx.HTTPError` as a supported usage pattern. +- `httpx/_exceptions.py:107-120` defines `RequestError` for failures that occur while issuing a request and explains why request context may be attached later. +- `httpx/_exceptions.py:123-160` narrows transport failures into `TransportError` and timeout-specific subclasses. +- `httpx/_exceptions.py:167-178` continues the layering into network-specific failures. + +### Why it matters +Repeated signal: the exception tree is organized around what callers do next: +- catch one broad library exception for "request failed somehow" +- catch a narrower transport or timeout subtype for retry/backoff behavior +- still access `exc.request` when request context has been attached + +## Testing and embedding happen at the transport boundary + +### Repeated evidence +- `httpx/_transports/asgi.py:63-97` exposes `ASGITransport` as a first-class transport for routing requests directly into an ASGI app. +- `httpx/_transports/asgi.py:78-83` explicitly calls out `raise_app_exceptions=False` for testing 500 responses instead of surfacing app exceptions immediately. +- `httpx/_transports/mock.py:15-43` defines `MockTransport` as a shared sync/async seam that accepts a handler and adapts it through the transport interface. + +### Why it matters +Repeated signal: HTTPX prefers substitutable transports over monkeypatching internal request code. That is a strong pattern for any client library that needs both real I/O and test-time embedding. + +### Caveat / counterexample +Transport seams are great for boundary tests, but they are not a full replacement for end-to-end network tests. 
The strong pattern is to make the boundary swappable, not to pretend boundary tests cover every production behavior. + +## Pattern candidates supported by this repo +- split sync and async public APIs into separate types +- keep constructor shapes parallel across sync/async variants +- design exception trees around recovery decisions +- expose transport seams for testing, embedding, and alternate runtimes diff --git a/sources/pydantic.md b/sources/pydantic.md new file mode 100644 index 0000000..722cda6 --- /dev/null +++ b/sources/pydantic.md @@ -0,0 +1,56 @@ +# Pydantic source notes + +Repo: `pydantic/pydantic` +Local checkout: `/home/ubuntu/repos/rodin-sources/pydantic` + +## Why this repo is useful +- Pydantic is a strong source for boundary-object patterns: validating incoming data, preserving typed state, and serializing back out explicitly. +- It is also useful for validation-hook design because the docs distinguish several validator phases and call out their tradeoffs clearly. + +## Models are explicit validation + serialization boundaries + +### Repeated evidence +- `pydantic/main.py:119-145` defines `BaseModel` as the central abstraction and documents that models carry schema, field metadata, and decorator metadata. +- `pydantic/main.py:201-205` explicitly exposes model-level serializer and validator machinery as core parts of the abstraction. +- `docs/index.md:68-82` shows external data entering through model construction. +- `docs/index.md:82-89` immediately turns the model back into a plain data structure with `model_dump()`. +- `docs/index.md:109-152` shows invalid boundary data raising `ValidationError` with structured per-field errors instead of silently degrading. + +### Why it matters +Repeated signal: Pydantic models are meant to sit at I/O boundaries. Input is validated/coerced at construction time; output is serialized through an explicit dump step. + +### Caveat / counterexample +The strong pattern is not "models are your whole domain model." 
The evidence here is boundary-oriented: construct from external data, then call `model_dump()` when leaving the boundary again. + +## Validators are narrow and phase-aware + +### Repeated evidence +- `docs/concepts/validators.md:91-114` shows an `after` field validator that checks one parsed field and must return the validated value. +- `docs/concepts/validators.md:160-167` explains that `before` validators run prior to internal parsing and therefore receive raw input. +- `docs/concepts/validators.md:220-252` demonstrates a `before` validator that reshapes raw input and then lets normal item validation continue. + +### Why it matters +Repeated signal: the best validator hooks are small in scope and explicit about phase: +- `before` for raw-input normalization +- `after` for post-parse invariants + +This prevents validation logic from becoming an opaque second parser. + +## Validator mode choice has real behavioral consequences + +### Repeated evidence +- `docs/concepts/validators.md:160-164` warns that `before` validators should avoid careless mutation when raising later, especially with unions. +- `docs/concepts/validators.md:254-255` states that `plain` validators terminate validation immediately. +- `docs/concepts/validators.md:273-283` shows the consequence directly: a `PlainValidator` can return `'invalid'` for a field annotated as `int`, and Pydantic will accept it. + +### Why it matters +Repeated signal: validator mode is not just an implementation detail. It changes whether core type validation still runs. + +### Caveat / counterexample +This is the sharpest anti-pattern in the repo: `plain` validators are powerful, but they can bypass the type guarantee a reader expects from the annotation. Use them only when terminating validation is the actual goal. 
+ +## Pattern candidates supported by this repo +- use typed models at I/O boundaries +- serialize explicitly with `model_dump()` +- keep validators field-scoped and phase-aware +- treat `plain` validators as an escape hatch, not the default diff --git a/sources/pytest.md b/sources/pytest.md new file mode 100644 index 0000000..8e53502 --- /dev/null +++ b/sources/pytest.md @@ -0,0 +1,50 @@ +# Pytest source notes + +Repo: `pytest-dev/pytest` +Local checkout: `/home/ubuntu/repos/rodin-sources/pytest` + +## Why this repo is useful +- Pytest is a strong source for package-facade patterns and lifecycle-heavy test helper patterns. +- It is especially useful because the public package is deliberately small compared to the internal `_pytest.*` implementation tree. + +## The public package is a curated facade over internal modules + +### Repeated evidence +- `src/pytest/__init__.py:6-92` re-exports the supported testing API from many `_pytest.*` implementation modules. +- `src/pytest/__init__.py:98-186` defines `__all__` explicitly, turning the package root into a maintained public surface. +- `src/pytest/__init__.py:23-30` includes compatibility-minded exports like `yield_fixture`, showing that the facade also absorbs historical API pressure. + +### Why it matters +Repeated signal: large libraries can keep internal structure fluid while giving users one stable import surface. The top-level package behaves like an API contract, not just a mirror of file layout. + +### Caveat / counterexample +This pattern does create maintenance pressure: once the facade exports a name, deprecating or removing it becomes a public compatibility event. Pytest accepts that tradeoff intentionally. + +## Fixture cleanup is modeled as an explicit lifetime boundary + +### Repeated evidence +- `testing/test_monkeypatch.py:17-23` defines a fixture that snapshots global state, `yield`s the resource, then restores state after the test. 
+- `testing/test_threadexception.py:84-90` uses a `yield` fixture where the teardown action intentionally runs after test execution. +- `doc/en/how-to/fixtures.rst:551-553` marks `yield` fixtures as the recommended path. +- `doc/en/how-to/fixtures.rst:669-677` contrasts them with `addfinalizer`, explicitly framing finalizers as the alternative when needed. + +### Why it matters +Repeated signal: pytest prefers resource lifetime that is legible in source order: setup before `yield`, teardown after `yield`. That shape scales better than hidden cleanup hooks. + +### Caveat / counterexample +`request.addfinalizer(...)` still exists for cases where teardown must be registered dynamically, but pytest's own docs present it as the less straightforward option. That is important evidence that `yield` fixtures are the convention, not just one possible style. + +## Failure expectations are explicit context boundaries + +### Repeated evidence +- `testing/test_monkeypatch.py:31-32`, `testing/test_monkeypatch.py:50-51`, and `testing/test_monkeypatch.py:76-85` repeatedly use `with pytest.raises(...)` around the exact failing operation. +- `testing/acceptance_test.py:513-514` and `testing/test_pluginmanager.py:254-255` show the same shape in broader integration tests. + +### Why it matters +Repeated signal: pytest encourages failure expectations that wrap the smallest relevant operation, keeping the expected failure boundary local and visible. 
+ +## Pattern candidates supported by this repo +- expose a stable top-level facade over private implementation packages +- use explicit `__all__`/re-export curation for public APIs +- model test resource lifetime with `yield` fixtures +- express expected failures with tight context-manager boundaries diff --git a/sources/sqlalchemy.md b/sources/sqlalchemy.md new file mode 100644 index 0000000..da88cec --- /dev/null +++ b/sources/sqlalchemy.md @@ -0,0 +1,49 @@ +# SQLAlchemy source notes + +Repo: `sqlalchemy/sqlalchemy` +Local checkout: `/home/ubuntu/repos/rodin-sources/sqlalchemy` + +## Why this repo is useful +- SQLAlchemy is a strong source for persistence-boundary patterns: explicit session lifetime, transaction visibility, and parallel sync/async APIs. +- It is especially useful because the examples make lifecycle boundaries visible in ordinary calling code rather than hiding them in framework glue. + +## Sync and async persistence APIs are parallel but distinct + +### Repeated evidence +- `examples/asyncio/async_orm.py:15-18` imports async-specific engine and session primitives. +- `examples/asyncio/async_orm.py:51-67` builds an async engine and `async_sessionmaker(...)`, then enters an async session and transaction block explicitly. +- `examples/inheritance/joined.py:7-17` imports sync engine/session primitives separately. +- `examples/inheritance/joined.py:90-120` uses `Session(engine)` with explicit add/commit calls in the synchronous path. + +### Why it matters +Repeated signal: SQLAlchemy does not pretend sync and async persistence are the same execution model. The APIs are conceptually parallel, but the entrypoints stay distinct. + +### Caveat / counterexample +The useful pattern is not "keep two unrelated APIs." `examples/asyncio/async_orm.py:78-79` explicitly notes that async execution uses the same 2.0-style ORM execution concepts as the sync API. The separation is at runtime model and lifecycle, not at overall mental model. 
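
The "parallel but distinct" split can be sketched without SQLAlchemy. The `KeyValueStore` and `AsyncKeyValueStore` classes below are hypothetical stand-ins, not SQLAlchemy APIs. They assume a plain in-memory dict as the backend and only illustrate two entrypoints that share one constructor shape and one mental model.

```python
import asyncio


class KeyValueStore:
    """Synchronous entrypoint (hypothetical; backed by a plain dict)."""

    def __init__(self, *, prefix: str = "") -> None:
        self._prefix = prefix
        self._data: dict[str, str] = {}
        self.closed = False

    def __enter__(self) -> "KeyValueStore":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.closed = True

    def put(self, key: str, value: str) -> None:
        self._data[self._prefix + key] = value

    def get(self, key: str) -> str:
        return self._data[self._prefix + key]


class AsyncKeyValueStore:
    """Asynchronous entrypoint with the same constructor shape."""

    def __init__(self, *, prefix: str = "") -> None:
        self._prefix = prefix
        self._data: dict[str, str] = {}
        self.closed = False

    async def __aenter__(self) -> "AsyncKeyValueStore":
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self.closed = True

    async def put(self, key: str, value: str) -> None:
        await asyncio.sleep(0)  # stand-in for real awaitable I/O
        self._data[self._prefix + key] = value

    async def get(self, key: str) -> str:
        await asyncio.sleep(0)  # stand-in for real awaitable I/O
        return self._data[self._prefix + key]


def sync_roundtrip() -> str:
    with KeyValueStore(prefix="user:") as store:
        store.put("1", "ada")
        return store.get("1")


async def async_roundtrip() -> str:
    async with AsyncKeyValueStore(prefix="user:") as store:
        await store.put("1", "ada")
        return await store.get("1")
```

Callers choose the execution model by choosing the type. Nothing in the sync class hides an `asyncio.run(...)` call, and the async class never blocks on behalf of a sync caller.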
+ +## Session and transaction lifetime are made visible in calling code + +### Repeated evidence +- `examples/asyncio/async_orm.py:56-59` uses `engine.begin()` blocks for schema setup. +- `examples/asyncio/async_orm.py:65-74` nests `session.begin()` inside `async_session()` so write scope is easy to see. +- `examples/inheritance/joined.py:93-120` uses `with Session(engine) as session:` and makes the write boundary explicit with `session.add(...)` followed by `session.commit()`. +- `examples/inheritance/joined.py:133-135` shows a later mutation followed by another explicit `session.commit()` rather than hidden autoflush-as-commit semantics. + +### Why it matters +Repeated signal: SQLAlchemy favors visible unit-of-work boundaries. You can usually point to the exact lines where a session starts, a transaction begins, and persistence becomes durable. + +## Async examples surface async-specific caveats instead of hiding them + +### Repeated evidence +- `examples/asyncio/async_orm.py:61-63` calls out `expire_on_commit=False` and explains the post-commit attribute-expiration consequence directly. +- `examples/asyncio/async_orm.py:75-80` notes that eager loading should be applied for relationship loading in the async example. +- `examples/asyncio/async_orm.py:106-107` shows the explicit `AsyncAttrs` path for lazy-loaded relationships via `awaitable_attrs`. + +### Why it matters +Repeated signal: the async API is not just a renamed sync API. SQLAlchemy documents where async changes loading and object-lifetime behavior, which is exactly the kind of caveat future synthesis should preserve. + +## Pattern candidates supported by this repo +- keep sync and async persistence entrypoints distinct but conceptually parallel +- make session and transaction scope visible in user code +- use explicit commit boundaries for writes +- preserve async-specific loading/lifecycle caveats rather than smoothing them over
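
The write-boundary candidates above can be sketched with the standard library's `sqlite3` module standing in for SQLAlchemy. This is a deliberate swap so the example has no third-party dependency; it is not how the cited SQLAlchemy examples are written. The `person` table and the `record_names` function are hypothetical.

```python
import sqlite3
from contextlib import closing


def record_names(names: list[str]) -> int:
    """Insert names in one visible transaction and return the row count."""
    # closing(...) makes the connection lifetime visible; sqlite3's own
    # context manager scopes the transaction, not the connection.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE person (name TEXT NOT NULL)")

        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO person (name) VALUES (?)",
                [(name,) for name in names],
            )

        try:
            with conn:
                conn.execute(
                    "INSERT INTO person (name) VALUES (?)", ("ghost",)
                )
                raise RuntimeError("abort this unit of work")
        except RuntimeError:
            pass  # the second transaction was rolled back

        (count,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
        return count
```

Only the rows written inside the block that finished normally survive, so the durable write boundary is the end of that `with conn:` block and can be pointed at in the source.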