5 steps: Quantify → Extract one → Decision tree → Cross-refs → Hyperlinks. Delegation strategy (per-entry, not per-file). Discovery greps for Go, Elixir, Rust, Python. Hyperlink scripts per language.
| name | description |
|---|---|
| codebase-analysis | Analyze open source repositories to extract conventions or patterns. Two modes: "conventions" (how a project works architecturally) and "patterns" (how to write idiomatic code in that language/ecosystem). Use when asked to "analyze a repo", "extract patterns from", "what conventions does X use", "how should I write X", "what's idiomatic", "add X to the analysis repos", or "how does X do Y architecturally". Do NOT use for: code review of specific PRs (use pr-review), security audits (use vuln-scout), or reading a single file for a quick answer. |
Codebase Analysis
Extract conventions or idiomatic patterns from open source repos.
Mode
Set MODE when invoking (or infer from request):
| Mode | Question | Output | Repo suffix |
|---|---|---|---|
| conventions | "How does this project work?" | Architecture, governance, unique infra | *-conventions |
| patterns | "How should I write code like this?" | Prescriptive rules for users | *-patterns |
Default: conventions unless the request says "idiomatic",
"how to write", "style guide", or "patterns for users".
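The default rule above can be sketched as a small function. This is only an illustration: the name `infer_mode` and the `explicit_mode` parameter are hypothetical, and plain substring matching is an assumption about how "the request says" should be interpreted.

```python
from typing import Optional

# Phrases that switch the default from "conventions" to "patterns".
PATTERN_TRIGGERS = ("idiomatic", "how to write", "style guide", "patterns for users")


def infer_mode(request: str, explicit_mode: Optional[str] = None) -> str:
    """Return 'conventions' or 'patterns' for an analysis request."""
    if explicit_mode is not None:
        # An explicitly set MODE always wins over inference.
        if explicit_mode not in ("conventions", "patterns"):
            raise ValueError(f"unknown MODE: {explicit_mode!r}")
        return explicit_mode
    text = request.lower()
    if any(trigger in text for trigger in PATTERN_TRIGGERS):
        return "patterns"
    return "conventions"
```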
Both modes share Phases 1-7. They diverge at Phase 8 (synthesis).
Configuration
Set these in your workspace context (TOOLS.md, AGENTS.md, or pass explicitly when invoking the skill):
| Parameter | Description | Example |
|---|---|---|
| CLONE_DIR | Directory to clone repos into | ~/src/analysis/ |
| CLONE_HOST | Machine with disk + git for cloning | forge, localhost |
| GIT_REMOTE | Where convention repos are pushed | https://git.example.com |
| GIT_ORG | Org/user for convention repos | myorg, username |
| GIT_TOKEN_PATH | Path to auth token for pushing | ~/.credentials/git-token |
Minimum required: CLONE_DIR and GIT_REMOTE. If others are omitted:
- CLONE_HOST defaults to localhost (current machine)
- GIT_ORG defaults to the authenticated user
- GIT_TOKEN_PATH uses the default git credential helper
Example in TOOLS.md:
## Codebase Analysis
- CLONE_DIR: ~/src/analysis/
- CLONE_HOST: my-dev-server (ssh user@host)
- GIT_REMOTE: https://git.example.com
- GIT_ORG: my-patterns
- GIT_TOKEN_PATH: ~/.credentials/git-token
If not explicitly provided, infer from workspace context (TOOLS.md, shell environment, or git remote configuration).
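A minimal sketch of how the required settings and defaults could be resolved. The function name `resolve_config` and the `authenticated_user` argument are hypothetical; how the authenticated user is actually discovered is left open here.

```python
REQUIRED = ("CLONE_DIR", "GIT_REMOTE")


def resolve_config(provided: dict, authenticated_user: str) -> dict:
    """Merge provided settings over the documented defaults."""
    missing = [key for key in REQUIRED if not provided.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    config = {
        "CLONE_HOST": "localhost",       # current machine
        "GIT_ORG": authenticated_user,   # assumption: caller supplies this
        "GIT_TOKEN_PATH": None,          # None = rely on git's credential helper
    }
    # Only non-empty provided values override a default.
    config.update({key: value for key, value in provided.items() if value})
    return config
```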
Naming
- *-patterns = prescriptive (how users should write code)
- *-conventions = descriptive (how a specific codebase works)
A language can have both: go-patterns (write Go like this) AND
golang-conventions (how the Go team builds Go itself).
Thinking Framework
Before starting any analysis, ask:
- What is this project's essence? A trading system is a state machine where the state is money. A workflow engine is a tree of state machines. Name the essence — the patterns follow from it.
- What forces shaped it? Team size, age, performance constraints, backward compatibility obligations. These predict WHERE conventions will be strict vs relaxed.
- What would surprise me? The interesting findings are never "they use interfaces" — it's "they have 566 dynamic config settings" or "zero TODOs in 3.8M of code." Surprise = insight.
Prioritization: What to Dig Into
Not everything is interesting. Focus on patterns that:
- Appear >50 times — this is a conscious convention, not a one-off
- Have a dedicated package — someone thought it was important enough to abstract
- Other projects solve differently — reveals a real design tradeoff
- Have a surprising name — indicates the team had to invent vocabulary for a novel concept
- Were introduced recently with many PR comments — active design decisions with recorded rationale
Skip patterns that are:
- Standard library usage (unless the project wraps/extends it)
- Single-use internal helpers
- Generated code
- Exact copies of well-known open-source patterns without modification
Phases
Phase 1: Shape (5 min)
Clone to CLONE_DIR/<name> on CLONE_HOST. Full clone — never shallow.
Measure: size, files, commits, contributors, top-level dirs.
What matters here: The ratio of test files to production files.
The presence/absence of internal/ vs flat structure. Whether there's
a single pkg/ or many top-level packages. These reveal organizational
philosophy before you read a single line.
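The shape signals named above (test-to-production ratio, internal/ vs flat layout, top-level directories) could be collected with something like the following. It is a rough sketch for a Go repo: the `_test.go` suffix is Go's convention, while the function name `shape` and the returned keys are made up for illustration.

```python
from pathlib import Path


def shape(repo_root: str, ext: str = ".go", test_suffix: str = "_test.go") -> dict:
    """Count test vs production files and list top-level directories."""
    root = Path(repo_root)
    test_files = production_files = 0
    for path in root.rglob(f"*{ext}"):
        rel = path.relative_to(root)
        if not path.is_file() or rel.parts[0] == ".git":
            continue  # skip directories and git internals
        if path.name.endswith(test_suffix):
            test_files += 1
        else:
            production_files += 1
    top_level_dirs = sorted(
        p.name for p in root.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    ratio = round(test_files / production_files, 2) if production_files else None
    return {
        "test_files": test_files,
        "production_files": production_files,
        "test_to_production": ratio,
        "top_level_dirs": top_level_dirs,
        "has_internal": "internal" in top_level_dirs,
        "has_pkg": "pkg" in top_level_dirs,
    }
```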
Phase 2: What the Codebase Values (10 min)
Find the most-imported internal packages. The top 5 are the project's definition of "foundational."
Ask: Why these? What do they share? Usually: logging, errors, config, and one domain-specific abstraction that IS the project. That domain-specific one is where the real conventions live.
See references/commands.md for grep patterns by language.
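As an illustration of "most-imported internal packages", here is a rough Python sketch for Go sources. It is not the content of references/commands.md. It assumes the caller passes file contents as a dict and knows the module path (for example from go.mod); the regexes are approximate and will be confused by quoted strings inside comments in an import block.

```python
import re

IMPORT_BLOCK = re.compile(r"^import\s*\((.*?)^\)", re.DOTALL | re.MULTILINE)
SINGLE_IMPORT = re.compile(r'^import\s+(?:[\w.]+\s+)?"([^"]+)"', re.MULTILINE)
QUOTED = re.compile(r'"([^"]+)"')


def top_internal_imports(sources: dict, module_path: str, n: int = 5) -> list:
    """Rank internal packages by the number of files that import them."""
    counts = {}
    for text in sources.values():
        imported = set(SINGLE_IMPORT.findall(text))
        for block in IMPORT_BLOCK.findall(text):
            imported.update(QUOTED.findall(block))
        for path in imported:
            # "Internal" here means: lives under the repo's own module path.
            if path == module_path or path.startswith(module_path + "/"):
                counts[path] = counts.get(path, 0) + 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

The top of this ranking is the candidate list for "foundational" packages; the domain-specific one still has to be picked out by reading.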
Phase 3: Interface Contracts (10 min)
Find interfaces/behaviours/protocols — but don't list them all.
Focus on: Interfaces with >3 implementations (these are real extension points). Interfaces in constructor signatures (these are dependency injection boundaries). Interfaces that appear in BOTH production and test code (these are the testability seams).
Skip: One-method interfaces (usually just for mocking). Interfaces only used in one place (not yet conventions).
Phase 4: Quality Fingerprint (5 min)
Measure: TODO count, FIXME count, HACK count, test count, mock count.
What to notice:
- TODO format reveals discipline: TODO(owner): = accountability, TODO: = aspirational, version-gated = systematic cleanup
- Zero TODOs in a large codebase means active cleanup culture
- High mock count relative to test count suggests heavy DI
- HACK count > 0 is honest; HACK count = 0 in a large project is suspicious (they probably use different words)
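The marker counts, and the split between owned and bare TODOs, could be gathered with a sketch like this. The function name `fingerprint` and the result keys are illustrative; version-gated TODOs are not detected because their format varies by project.

```python
import re
from collections import Counter

# Matches TODO, FIXME, HACK, with an optional "(owner)" right after the marker.
MARKER = re.compile(r"\b(TODO|FIXME|HACK)\b(\([^)]+\))?")


def fingerprint(lines) -> Counter:
    """Count quality markers and classify TODOs as owned or bare."""
    counts = Counter()
    for line in lines:
        for match in MARKER.finditer(line):
            kind, owner = match.group(1), match.group(2)
            counts[kind] += 1
            if kind == "TODO":
                counts["TODO(owner)" if owner else "TODO(bare)"] += 1
    return counts
```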
Phase 5: Unique Patterns (15 min)
Look for infrastructure NOT in stdlib. Categories:
- Concurrency: goroutine handles, schedulers, shutdown primitives
- Testing: custom assertions, fake registries, golden file systems
- Configuration: dynamic config, feature flags, runtime toggles
- Error handling: custom error types, assertion systems, panic recovery patterns
- Extension: plugin registration, hook systems, middleware chains
The test for uniqueness: Would you be surprised to find this in another project of similar size? If yes → convention worth documenting. If no → standard practice, skip.
Phase 6: Git Archaeology (20 min)
For each unique pattern found in Phase 5:
- Find the commit that introduced it (git log --diff-filter=A)
- Read the commit message: the "why" is usually there
- Check if it replaced something (git log -S "old_name")
- Note the date and author, as context for why shortcuts were taken
The insight is always WHY, not WHAT. A bare goroutine with a TODO is uninteresting as a listing. A bare goroutine introduced during a complex 20-file admission control feature, tagged by the author in the same commit, that survived 3 years because nobody touched the function — that's a lesson about how real codebases evolve.
See references/commands.md for git archaeology patterns.
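The two git queries from the steps above can be wrapped for scripting. This is a hedged sketch rather than the contents of references/commands.md: the helper names are invented, and the `--follow`, `--date=short`, `--format`, and `--oneline` options are additions beyond what the text specifies.

```python
import subprocess


def introduced_cmd(path: str) -> list:
    """git log invocation that lists the commit(s) that added a file."""
    return [
        "git", "log", "--diff-filter=A", "--follow",
        "--date=short", "--format=%H %ad %an", "--", path,
    ]


def replaced_cmd(old_name: str) -> list:
    """git log pickaxe search for commits that added or removed a string."""
    return ["git", "log", "-S", old_name, "--oneline"]


def run(cmd: list, repo_dir: str) -> str:
    """Run a git command inside the cloned repo and return its output."""
    result = subprocess.run(
        cmd, cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `run(introduced_cmd("path/to/file.go"), repo_dir)` needs a full clone; on a shallow clone the introducing commit may simply be missing.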
If the repo is on a forge without PR history (self-hosted, mailing list-based): Fall back to commit messages and CHANGELOG. The commit body IS the PR description for these projects. Look for "Reviewed-by" trailers and linked issues.
Phase 7: PR Discussions (20 min)
Find PRs where key patterns were introduced. Read:
- The PR body (author's motivation)
- Review comments (the debate)
- The resolution
What to extract from discussions:
- What the author was defending (= where the real insight is)
- What reviewers pushed back on (= non-obvious tradeoffs)
- Whether it was "merge and iterate" vs "perfect before merge"
- Whether external validation was cited (benchmarks, user feedback)
- The migration strategy (big-bang vs gradual coexistence)
The highest-value finding: When a reviewer says "I wish we'd done X instead" and the author explains why X doesn't work. That tradeoff reasoning is pure expert knowledge.
Phase 8: Synthesis
Produce output based on MODE. Push to GIT_REMOTE.
MODE: conventions
Output: <project>-conventions repo.
analysis.md — the full story:
- Repo shape and organizational philosophy
- Import hierarchy (what it values)
- Key patterns with code examples + origin stories
- PR discussion excerpts (attributed quotes)
- Cross-ecosystem comparisons (prior art, independent invention)
- Quality metrics in context (not bare numbers)
conventions.md — the reference:
For each unique pattern:
- Name and location in source
- Code example (real, not simplified)
- When to use / When NOT to use
- Origin (commit date, author, PR# if available)
Tone: Descriptive. "This project does X because Y."
MODE: patterns
Output: <language>-patterns or <ecosystem>-patterns repo.
Synthesis question: "What should a developer copy from this codebase?" Filter everything through: "If I were writing new code in this language/ecosystem, what rules does this source teach me?"
This is iterative, not one-shot. The method produces quality through decomposition, not through asking one agent to "write a good file." Each step is bounded, mechanical, and verifiable.
The Repeatable Method
Step 1: Quantify (5 min per topic)
For each topic area, run frequency grep commands to find patterns. The goal is COUNTS — how often does this pattern appear?
# Example: error handling in Go (observed counts shown as trailing comments)
grep -rn "^var Err" --include="*.go" . | grep -v test | wc -l          # 55
grep -rn "fmt.Errorf.*%w" --include="*.go" . | grep -v test | wc -l    # 115
grep -rn "errors\.Is\|errors\.As" --include="*.go" . | wc -l           # 212
Output: a numbered list of pattern names + counts. This IS the table of contents for that topic file.
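The same counting step, sketched in Python so the output is already the numbered list. The names `quantify` and `table_of_contents` are hypothetical. One assumption to note: test files are skipped by the `_test.go` suffix for every pattern, whereas the grep lines above filter on the substring "test" and leave the third pattern unfiltered.

```python
import re


def quantify(sources: dict, patterns: dict) -> list:
    """Count matching lines per pattern in non-test files, highest first."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = {name: 0 for name in patterns}
    for path, text in sources.items():
        if path.endswith("_test.go"):
            continue
        for line in text.splitlines():
            for name, rx in compiled.items():
                if rx.search(line):
                    counts[name] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def table_of_contents(ranked: list) -> list:
    """Render ranked (name, count) pairs as a numbered list."""
    return [f"{i}. {name} ({count})" for i, (name, count) in enumerate(ranked, 1)]
```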
Step 2: Extract one (5-10 min per pattern)
For EACH pattern from the list, in order:
- Find the best example (grep → pick the clearest one)
- Read 10 lines of surrounding context (understand WHY)
- Write one pattern entry (40-80 lines, all required sections)
- Move to the next pattern
The key constraint: write one pattern entry completely before starting the next. Never read all patterns then write all entries. This prevents context exhaustion and ensures each entry is complete.
Step 3: Decision tree (5 min per topic)
After all patterns are written, add a decision tree at the end. Format: "If X, use pattern A. If Y, use pattern B."
Step 4: Cross-references (2 min per topic)
Add See also: links to related topic files.
Step 5: Hyperlinks (mechanical, scriptable)
Convert all source references to clickable permalinks:
HEAD=$(git rev-parse HEAD)
BASE="https://github.com/OWNER/REPO/blob/${HEAD}"
sed -i -E "s|\`(path/file\.ext):([0-9]+)\`|[\1#L\2](${BASE}/\1#L\2)|g" file.md
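For cases where the path is not known in advance, a Python variant of the same substitution might look like this. The regex for "any path-like reference in backticks" is an assumption, and `linkify` is an illustrative name; the sed line above targets one literal path.

```python
import re

# Backticked `path/to/file.ext:LINE` references.
REF = re.compile(r"`([\w./-]+\.\w+):(\d+)`")


def linkify(markdown: str, base: str) -> str:
    """Turn `path:line` references into permalinks under the given base URL."""
    def to_link(match):
        path, line = match.group(1), match.group(2)
        return f"[{path}#L{line}]({base}/{path}#L{line})"
    return REF.sub(to_link, markdown)
```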
Delegation Strategy
When using sub-agents:
- DO: One agent per pattern entry (bounded: read one, write one)
- DO: Give the agent the grep output as input (they don't discover, they deepen a known pattern)
- DO: Include one complete example entry in the prompt as the quality reference
- DON'T: Ask one agent to write an entire topic file
- DON'T: Ask agents to "discover patterns" (they'll find 5 obvious ones and miss 10 important ones)
- DON'T: Let agents choose their own structure (give them the template)
Template for sub-agent task:
Write pattern entry for: [PATTERN NAME]
Source repo: [REPO] at commit [SHA]
Access: [SSH command to get to the source]
Permalink base: [URL]
Grep that found this: [the grep command + sample output]
Reference quality: [paste ONE complete pattern entry as example]
Write to: [output path]
Parallelism
- Step 1 (quantify): run for ALL topics in parallel (just grep)
- Step 2 (extract): run per-pattern entries in parallel (max 5)
- Steps 3-5: sequential (need all entries to exist first)
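One way to express this schedule, assuming each step is available as a plain callable. The parameter names `quantify`, `write_entry`, and `finalize` are placeholders for however the steps are actually dispatched (sub-agents, scripts, and so on).

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_ENTRIES = 5  # cap from the Step 2 rule above


def run_topics(topics, quantify, write_entry, finalize) -> dict:
    """Quantify all topics in parallel, extract entries 5 at a time, then finalize."""
    # Step 1: all topics at once (cheap, just counting).
    with ThreadPoolExecutor() as pool:
        pattern_lists = list(pool.map(quantify, topics))

    results = {}
    for topic, patterns in zip(topics, pattern_lists):
        # Step 2: one task per pattern entry, at most five in flight.
        with ThreadPoolExecutor(max_workers=MAX_PARALLEL_ENTRIES) as pool:
            entries = list(pool.map(write_entry, patterns))
        # Steps 3-5: sequential, only after every entry exists.
        results[topic] = finalize(topic, entries)
    return results
```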
Done Criteria
A topic file is done when:
- Every pattern from Step 1's list has an entry
- Each entry has ALL required sections (source, why, when to use with before/after, when NOT to use with over-application)
- Decision tree exists at the end
- All source refs are hyperlinked
- PATTERN_COMPLETE sentinel at EOF
- File is 500-1000 lines (if shorter, entries are too shallow)
A language is done when:
- 8-12 topic files exist
- Each topic has 10-15+ patterns
- Total is 5,000-10,000+ lines
- No grep scan reveals patterns not yet documented
- smells.md covers anti-patterns found in the source
Output structure — one file per topic:
patterns/<topic>.md — topics include (but aren't limited to):
- Error handling (sentinel errors, error types, wrapping, multi-error)
- Naming conventions (packages, types, functions, receivers)
- Concurrency patterns (goroutines, channels, mutexes, sync primitives)
- Testing patterns (table-driven, helpers, fixtures, benchmarks, examples)
- Interface/protocol design (size, composition, assertion, extension)
- Module/package organization (layout, internal/, visibility)
- Documentation conventions (godoc, deprecation, package-level)
- Performance idioms (pooling, preallocate, append, zero-alloc)
- Configuration patterns (functional options, config structs, defaults)
- Extension/plugin patterns (registration, middleware, hooks)
- Struct patterns (constructors, zero values, embedding, tags)
- API design (backwards compat, versioning, deprecation strategy)
Start with 8–10 topics for a language stdlib; add more if the source shows distinct patterns in additional areas. Each topic should map to a real problem domain that developers face.
File naming: Use lowercase, hyphenated names that describe the
topic clearly: error-handling.md, testing-advanced.md,
api-conventions.md, concurrency.md.
Each pattern entry requires ALL of these sections:
## N. Pattern Name
Short, linkable heading (no generic names like "Pattern 1").
### Source:
Hyperlinked to the exact file and line on the forge.
Format: [src/io/io.go#L86](https://github.com/golang/go/blob/COMMIT_SHA/src/io/io.go#L86)
Use permalink format (commit SHA) for stability.
### Real source example
The actual code from the source, with file:line comments. Not simplified, not invented. This IS the evidence.
### Why
The force that makes this the right choice. Not "because the stdlib does it" — explain the FORCE (testability, allocation cost, readability under diff, composability).
### When to Use
Triggers: — bullet list of specific situations that call for this.
Example — before: — code showing the problem WITHOUT the pattern. This is critical. Readers must recognize their own bad code here.
Example — after: — code showing the same problem WITH the pattern. The before/after pair is what makes patterns teachable.
### When NOT to Use
Don't use this when: — bullet list of boundary conditions.
Over-application example: — code showing what happens when you use this pattern where it doesn't belong. This prevents cargo-culting.
Better alternative: — what to do instead in those cases.
### Anti-pattern (when relevant)
Explicit DON'T: block showing the wrong approach with a comment
explaining why it's wrong, followed by DO: showing the fix.
Each topic file ALSO needs:
- Summary/Decision Tree at the end — "If X, use pattern A. If Y, use pattern B." Readers should be able to skip to the decision tree and find their situation.
- Cross-references — link to related patterns in other topic files. e.g., error-handling links to interfaces when discussing error types.
Quality bar: Each pattern entry should be 40–80 lines including code examples. A topic file with 10 patterns should be 500–900 lines. If entries are shorter than 40 lines, they're missing before/after examples or anti-patterns.
smells.md — anti-patterns found in the source:
- What it looks like (with real code)
- Why it exists (technical debt? deliberate tradeoff? historical?)
- What to do instead (with code showing the fix)
- How to detect it (grep pattern or linter rule)
Tone: Prescriptive. "Write it this way because X."
Key difference from conventions mode: Skip governance, team structure, TODO culture, and project history unless they directly inform HOW to write code. Focus on patterns a user should copy.
Done criteria: You've scanned every major directory in the source. No new patterns emerge from further grep/read. Each topic file has 10–15+ patterns, each with before/after examples, anti-patterns, and decision guidance. Total output for a language stdlib should be 5,000–10,000+ lines across all topic files.
End all output files with <!-- PATTERN_COMPLETE --> sentinel.
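Parts of the done criteria are mechanical enough to check with a script. The sketch below covers the sentinel, the required section headings, and the presence of a decision tree; it assumes entries start with a `## N. ` heading as in the template and that the decision tree is labelled with the words "decision tree". Line-count limits are left out.

```python
import re

SENTINEL = "<!-- PATTERN_COMPLETE -->"
REQUIRED_SECTIONS = ("### Source", "### Why", "### When to Use", "### When NOT to Use")
ENTRY_HEADING = re.compile(r"^## \d+\. .+", re.MULTILINE)


def check_topic_file(text: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not text.rstrip().endswith(SENTINEL):
        problems.append("missing PATTERN_COMPLETE sentinel at EOF")
    starts = [m.start() for m in ENTRY_HEADING.finditer(text)]
    if not starts:
        problems.append("no pattern entries found")
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        entry = text[start:end]
        title = entry.splitlines()[0]
        for section in REQUIRED_SECTIONS:
            if section not in entry:
                problems.append(f"{title}: missing {section!r}")
    if "decision tree" not in text.lower():
        problems.append("missing decision tree")
    return problems
```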
Cross-Ecosystem Observations
Always note when a pattern exists in multiple repos. These independent inventions reveal forces that transcend project context:
- Temporal goro.Handle (2021) ↔ CockroachDB stop.Handle (2025)
- Ecto zero TODOs (version-gated) ↔ Oban zero TODOs (2-week cleanup)
- Prometheus init() plugins ↔ Temporal init() plugins
The 4 Categories of Pattern Breaks
When you find convention violations, classify:
- Ship behavior, fix plumbing later — tagged with TODO same commit
- Better tooling exposed limitation — observability, not correctness
- Removal cost > carrying cost — zero-interest debt
- Context needs different pattern — not actually a break
See references/pattern-breaks.md for real examples with git history.
NEVER
- NEVER analyze with a shallow clone and assume full picture — archaeology requires full history
- NEVER present patterns from one file as repo-wide conventions — verify frequency across the codebase first
- NEVER skip PR discussions — code without context is just syntax; the discussion IS the insight
- NEVER report bare numbers ("738 TODOs") — always contextualize (per 1000 files, vs comparable projects, trending up/down)
- NEVER confuse "the maintainer likes X" with "X is the right pattern" — solo-maintained projects reflect one person's taste; team projects reflect negotiated conventions
- NEVER present a pattern as "unique" without checking if stdlib has it or if it's a well-known library pattern
- NEVER list patterns without when-NOT-to-use — that's where the expertise actually lives
- NEVER quote PR discussions without attribution — who said it matters (maintainer vs drive-by contributor)
- NEVER analyze repos <1000 commits — not enough history for meaningful archaeology
- NEVER conflate language patterns with project conventions: go-patterns is stdlib idiom; temporal-conventions is project choice
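For the "no bare numbers" rule, the simplest contextualization is a per-1000-files rate. A small sketch follows; the helper names are illustrative, and the file count of 12,000 used in the usage note is a made-up figure, not a measurement from any repo.

```python
def per_thousand_files(count: int, total_files: int) -> float:
    """Normalize a raw count to a rate per 1000 files."""
    if total_files <= 0:
        raise ValueError("total_files must be positive")
    return round(count * 1000 / total_files, 1)


def contextualize(label: str, count: int, total_files: int) -> str:
    """Format a count together with its per-1000-files rate."""
    rate = per_thousand_files(count, total_files)
    return f"{count} {label} ({rate} per 1000 files)"
```

With the hypothetical inputs `contextualize("TODOs", 738, 12000)`, the result reads "738 TODOs (61.5 per 1000 files)", which can then be compared against similar projects.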
Output Repos
Push to GIT_REMOTE under:
- conventions mode: GIT_ORG/<project>-conventions
- patterns mode: GIT_ORG/<language>-patterns
See references/commands.md for repo creation and push commands.
Fallbacks
- No PR discussions? Use commit messages as primary source. Many projects (Linux, PostgreSQL) do all review in commit messages and mailing lists.
- Repo too large to clone fully? Clone shallow first, do Phases 1-5, then git fetch --unshallow only if Phases 6-7 are needed.
- Private repo / no forge API? Skip Phase 7. Phase 6 (local git history) still works.
- <3000 commits? Reduce Phase 6-7 expectations. Younger projects have less archaeology to mine — focus on Phase 5 (unique patterns) and the project's README/docs for rationale.
Execution Notes
- Clone on CLONE_HOST: needs disk space for full git history
- gh api or equivalent for forge PR lookups (requires authentication)
- One repo at a time for focused analysis
- Markdownlint all output before pushing