codebase-analysis
An OpenClaw skill for analyzing open source repositories to extract architectural conventions, patterns, and the design decisions behind them.
What It Does
Given a repository URL, this skill produces two artifacts:
- analysis.md: the full story, covering repo shape, import hierarchy, key patterns with code examples, PR discussion excerpts, and cross-ecosystem comparisons
- conventions.md: a reference of extracted patterns with when to use and when NOT to use each one
Output is pushed to a convention repo (<project>-conventions).
The Methodology
8 phases, each building on the previous:
| Phase | Focus | Time |
|---|---|---|
| 1. Shape | Clone, measure dimensions | 5 min |
| 2. Values | Most-imported packages reveal priorities | 10 min |
| 3. Interfaces | Key abstractions and extension points | 10 min |
| 4. Quality | TODOs, test ratios, discipline markers | 5 min |
| 5. Unique Patterns | Infrastructure not in stdlib | 15 min |
| 6. Git Archaeology | Trace WHY decisions were made | 20 min |
| 7. PR Discussions | Read the actual debates | 20 min |
| 8. Synthesis | Produce analysis + conventions docs | — |
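As an illustration of Phase 2, the sketch below tallies import paths across Go source files to surface the most-imported packages. It is a minimal approximation rather than the skill's actual tooling: the function name `count_go_imports`, the regex-based parsing, and the sample sources are all assumptions made for this example, and a real run would read files from the cloned repository.

```python
import re
from collections import Counter

# Matches a quoted import path, optionally preceded by an alias (e.g. log "pkg/log").
_IMPORT_PATH = re.compile(r'^\s*(?:[\w.]+\s+)?"([^"]+)"')


def count_go_imports(sources):
    """Count how often each import path appears across Go source texts."""
    counts = Counter()
    for text in sources:
        in_block = False
        for line in text.splitlines():
            stripped = line.strip()
            if in_block:
                if stripped.startswith(")"):
                    in_block = False
                    continue
                match = _IMPORT_PATH.match(stripped)
                if match:
                    counts[match.group(1)] += 1
            elif stripped.startswith("import ("):
                in_block = True
            elif stripped.startswith("import "):
                match = _IMPORT_PATH.match(stripped[len("import "):])
                if match:
                    counts[match.group(1)] += 1
    return counts


# Hypothetical sample sources standing in for files from a cloned repo.
SAMPLE_A = '''package main

import (
    "context"
    "fmt"
    log "github.com/example/log"
)
'''

SAMPLE_B = '''package util

import "context"
'''

counts = count_go_imports([SAMPLE_A, SAMPLE_B])
top_package = counts.most_common(1)[0]
```

Sorting the resulting counter with `most_common` gives a rough ranking of what the codebase depends on most, which is the signal Phase 2 is after.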
Key Principles
- WHY > WHAT: a pattern listing is boring; the commit history and PR debate that explain why it exists are the real insight
- Cross-ecosystem comparison: note when multiple projects independently invent the same solution (e.g., Temporal's goro.Handle in 2021 ↔ CockroachDB's stop.Handle in 2025)
- Contextualize everything: "738 TODOs" means nothing without knowing the repo has 20K files and 700 contributors
- When NOT to use is where expertise lives: every pattern has a shadow; documenting it is the real value
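To make the "contextualize everything" principle concrete, the arithmetic below normalizes the TODO count from the example against repository size and contributor count. The helper name `per_thousand` is hypothetical; the numbers are the ones quoted in the principle above.

```python
def per_thousand(count, total):
    """Return count normalized per 1,000 units of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count * 1000 / total


# Figures taken from the example in the principle above.
todos, files, contributors = 738, 20_000, 700

todos_per_1k_files = per_thousand(todos, files)   # 36.9 TODOs per 1,000 files
todos_per_contributor = todos / contributors      # roughly 1.05 TODOs per contributor
```

Reported this way, the raw count becomes comparable across repositories of very different sizes.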
Configuration
Set these in your workspace (TOOLS.md, AGENTS.md, or invocation context):
| Parameter | Required | Description |
|---|---|---|
| CLONE_DIR | Yes | Where to clone repos |
| GIT_REMOTE | Yes | Where convention repos are pushed |
| CLONE_HOST | No | Machine to clone on (default: localhost) |
| GIT_ORG | No | Org/user for repos (default: authenticated user) |
| GIT_TOKEN_PATH | No | Auth token path (default: git credential helper) |
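One way to picture how these parameters combine is the sketch below, which checks the required values and fills in the documented defaults. It assumes the parameters have already been collected into a plain dictionary; how the skill actually reads them from TOOLS.md, AGENTS.md, or the invocation context is not specified here, and `resolve_config` and the example values are hypothetical.

```python
REQUIRED = ("CLONE_DIR", "GIT_REMOTE")

DEFAULTS = {
    "CLONE_HOST": "localhost",
    "GIT_ORG": None,         # None stands for "use the authenticated user"
    "GIT_TOKEN_PATH": None,  # None stands for "use the git credential helper"
}


def resolve_config(raw):
    """Validate required parameters and apply the documented defaults."""
    missing = [key for key in REQUIRED if not raw.get(key)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    unknown = sorted(set(raw) - set(REQUIRED) - set(DEFAULTS))
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))
    config = dict(DEFAULTS)
    config.update(raw)
    return config


# Hypothetical values for illustration only.
config = resolve_config({
    "CLONE_DIR": "/tmp/clones",
    "GIT_REMOTE": "https://git.example.com",
})
```

Rejecting unknown keys is a design choice in this sketch: it catches misspelled parameter names early instead of silently ignoring them.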
Naming Convention
- *-patterns: language-level idioms (how Go/Elixir wants you to write)
- *-conventions: project-specific (how a codebase chose to do it)
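The naming rule can be expressed as a pair of small helpers. This is only a sketch: `convention_repo_name` and `classify_repo` are hypothetical names, and the project names passed in are examples.

```python
def convention_repo_name(project):
    """Build the project-specific repo name, e.g. <project>-conventions."""
    return f"{project}-conventions"


def classify_repo(name):
    """Classify a repo name by its suffix according to the naming convention."""
    if name.endswith("-patterns"):
        return "language-level idioms"
    if name.endswith("-conventions"):
        return "project-specific conventions"
    return "unknown"


oban_repo = convention_repo_name("oban")
go_kind = classify_repo("go-patterns")
```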
Proven On
This methodology was developed and validated across 5 repos:
- CockroachDB (845M, 117K commits) — stopper handles, error wrapping purity, stale TODOs
- Prometheus (39M, 15K commits) — slog migration, AppenderV2, global vars in hot paths
- Temporal (181M, 9K commits) — HSM framework, CHASM, soft assertions, effect buffers
- Ecto (3.8M, 12K commits) — zero TODOs, protocol extensibility, version-gated cleanup
- Oban (2.1M, 3K commits) — engine behaviour, inline testing, 2-week TODO cleanup
Installation
Drop the skill directory into your OpenClaw workspace:
skills/codebase-analysis/
├── SKILL.md
└── references/
    ├── commands.md
    └── pattern-breaks.md
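After copying the directory, a quick check like the following can confirm that the layout above is in place. It is an optional sketch, not part of the skill: `missing_skill_files` is a hypothetical helper, and the demonstration builds the layout in a temporary directory rather than a real workspace.

```python
import tempfile
from pathlib import Path

# Files expected under skills/codebase-analysis/, per the tree above.
EXPECTED = (
    "SKILL.md",
    "references/commands.md",
    "references/pattern-breaks.md",
)


def missing_skill_files(skill_dir):
    """Return the expected files that are absent from skill_dir."""
    root = Path(skill_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


# Demonstration in a throwaway directory standing in for a workspace.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "skills" / "codebase-analysis"
    (skill / "references").mkdir(parents=True)
    (skill / "SKILL.md").write_text("")
    before = missing_skill_files(skill)   # reference files not yet created
    for rel in EXPECTED[1:]:
        (skill / rel).write_text("")
    after = missing_skill_files(skill)    # layout now complete
```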
License
MIT