# codebase-analysis

An [OpenClaw](https://github.com/openclaw/openclaw) skill for analyzing open source repositories to extract architectural conventions, patterns, and the design decisions behind them.

## What It Does

Given a repository URL, this skill produces two artifacts:

- **`analysis.md`**: the full story, covering repo shape, import hierarchy, key patterns with code examples, PR discussion excerpts, and cross-ecosystem comparisons.
- **`conventions.md`**: a reference of the extracted patterns, with when to use and when NOT to use each one.

Output is pushed to a convention repo (named with the `-conventions` suffix).

## The Methodology

Eight phases, each building on the previous one:

| Phase | Focus | Estimated time |
|-------|-------|----------------|
| 1. Shape | Clone the repo and measure its dimensions | 5 min |
| 2. Values | Most-imported packages reveal priorities | 10 min |
| 3. Interfaces | Key abstractions and extension points | 10 min |
| 4. Quality | TODOs, test ratios, discipline markers | 5 min |
| 5. Unique Patterns | Infrastructure the project built that the stdlib doesn't provide | 15 min |
| 6. Git Archaeology | Trace WHY decisions were made | 20 min |
| 7. PR Discussions | Read the actual debates | 20 min |
| 8. Synthesis | Produce the analysis and conventions docs | n/a |

## Key Principles

- **WHY > WHAT**: a pattern listing is boring. The real insight is the commit history and PR debate that explain *why* a pattern exists.
- **Cross-ecosystem comparison**: note when multiple projects independently invent the same solution (e.g., Temporal's `goro.Handle` in 2021 ↔ CockroachDB's `stop.Handle` in 2025).
- **Contextualize everything**: "738 TODOs" means nothing without knowing the repo has 20K files and 700 contributors.
- **When NOT to use is where expertise lives**: every pattern has a shadow, and documenting it is the real value.

## Configuration

Set these in your workspace (`TOOLS.md`, `AGENTS.md`, or the invocation context):

| Parameter | Required | Description |
|-----------|----------|-------------|
| `CLONE_DIR` | Yes | Directory where repos are cloned |
| `GIT_REMOTE` | Yes | Remote where convention repos are pushed |
| `CLONE_HOST` | No | Machine to clone on (default: localhost) |
| `GIT_ORG` | No | Org or user that owns the repos (default: the authenticated user) |
| `GIT_TOKEN_PATH` | No | Path to the auth token (default: the git credential helper) |

## Naming Convention

- `*-patterns`: language-level idioms (how Go or Elixir wants you to write code)
- `*-conventions`: project-specific choices (how a particular codebase chose to do it)

## Proven On

This methodology was developed and validated across 5 repos:

- **CockroachDB** (845M, 117K commits): stopper handles, error-wrapping purity, stale TODOs
- **Prometheus** (39M, 15K commits): slog migration, AppenderV2, global vars in hot paths
- **Temporal** (181M, 9K commits): HSM framework, CHASM, soft assertions, effect buffers
- **Ecto** (3.8M, 12K commits): zero TODOs, protocol extensibility, version-gated cleanup
- **Oban** (2.1M, 3K commits): engine behaviour, inline testing, 2-week TODO cleanup

## Installation

Drop the skill directory into your OpenClaw workspace:

```
skills/codebase-analysis/
├── SKILL.md
└── references/
    ├── commands.md
    └── pattern-breaks.md
```

## License

MIT