diff --git a/README.md b/README.md
index 7124996..bb9a61d 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,93 @@
 # codebase-analysis
 
-Skill for analyzing open source repositories to extract architectural conventions and patterns. 8-phase methodology: clone, imports, interfaces, quality markers, unique patterns, git archaeology, PR discussions, synthesis.
\ No newline at end of file
+An [OpenClaw](https://github.com/openclaw/openclaw) skill for analyzing
+open source repositories to extract architectural conventions, patterns,
+and the design decisions behind them.
+
+## What It Does
+
+Given a repository URL, this skill produces two artifacts:
+
+- **`analysis.md`** — the full story: repo shape, import hierarchy, key
+  patterns with code examples, PR discussion excerpts, cross-ecosystem
+  comparisons
+- **`conventions.md`** — a reference of extracted patterns with when to
+  use / when NOT to use each one
+
+Output is pushed to a convention repo (`-conventions`).
+
+## The Methodology
+
+8 phases, each building on the previous:
+
+| Phase | Focus | Time |
+|-------|-------|------|
+| 1. Shape | Clone, measure dimensions | 5 min |
+| 2. Values | Most-imported packages reveal priorities | 10 min |
+| 3. Interfaces | Key abstractions and extension points | 10 min |
+| 4. Quality | TODOs, test ratios, discipline markers | 5 min |
+| 5. Unique Patterns | Infrastructure not in stdlib | 15 min |
+| 6. Git Archaeology | Trace WHY decisions were made | 20 min |
+| 7. PR Discussions | Read the actual debates | 20 min |
+| 8. Synthesis | Produce analysis + conventions docs | — |
+
+## Key Principles
+
+- **WHY > WHAT** — a pattern listing is boring; the commit history and
+  PR debate that explains *why* it exists is the real insight
+- **Cross-ecosystem comparison** — note when multiple projects
+  independently invent the same solution (e.g., Temporal's
+  `goro.Handle` in 2021 ↔ CockroachDB's `stop.Handle` in 2025)
+- **Contextualize everything** — "738 TODOs" means nothing without
+  knowing the repo has 20K files and 700 contributors
+- **When NOT to use is where expertise lives** — every pattern has a
+  shadow; documenting it is the real value
+
+## Configuration
+
+Set these in your workspace (TOOLS.md, AGENTS.md, or invocation
+context):
+
+| Parameter | Required | Description |
+|-----------|----------|-------------|
+| `CLONE_DIR` | Yes | Where to clone repos |
+| `GIT_REMOTE` | Yes | Where convention repos are pushed |
+| `CLONE_HOST` | No | Machine to clone on (default: localhost) |
+| `GIT_ORG` | No | Org/user for repos (default: authenticated user) |
+| `GIT_TOKEN_PATH` | No | Auth token path (default: git credential helper) |
+
+## Naming Convention
+
+- `*-patterns` — language-level idioms (how Go/Elixir wants you to write)
+- `*-conventions` — project-specific (how a codebase chose to do it)
+
+## Proven On
+
+This methodology was developed and validated across 5 repos:
+
+- **CockroachDB** (845M, 117K commits) — stopper handles, error
+  wrapping purity, stale TODOs
+- **Prometheus** (39M, 15K commits) — slog migration, AppenderV2,
+  global vars in hot paths
+- **Temporal** (181M, 9K commits) — HSM framework, CHASM, soft
+  assertions, effect buffers
+- **Ecto** (3.8M, 12K commits) — zero TODOs, protocol extensibility,
+  version-gated cleanup
+- **Oban** (2.1M, 3K commits) — engine behaviour, inline testing,
+  2-week TODO cleanup
+
+## Installation
+
+Drop the skill directory into your OpenClaw workspace:
+
+```
+skills/codebase-analysis/
+├── SKILL.md
+└── references/
+    ├── commands.md
+    └── pattern-breaks.md
+```
+
+## License
+
+MIT