feat: codebase-analysis skill
8-phase methodology for extracting architectural conventions from open source repositories: 1. Clone & Shape — measure repo dimensions 2. Import Hierarchy — find foundational packages 3. Interface Contracts — key abstractions 4. Error Handling & Quality Markers — TODOs, style 5. Unique Patterns — project-specific infrastructure 6. Git Archaeology — trace WHY decisions were made 7. PR Discussions — read the actual debates 8. Synthesis — produce analysis.md + conventions.md Includes pattern-breaks reference (4 categories of why conventions are violated, with real examples from CockroachDB, Prometheus, Temporal, Ecto, Oban). Output: Gitea repos named <project>-conventions.
This commit is contained in:
@@ -0,0 +1,193 @@
|
||||
---
|
||||
name: codebase-analysis
|
||||
description: >-
|
||||
Analyze open source repositories to extract architectural conventions,
|
||||
patterns, and design decisions. Produces convention docs and pattern
|
||||
files suitable for Gitea repos. Use when asked to "analyze a repo",
|
||||
"extract patterns from", "what conventions does X use", "add X to the
|
||||
analysis repos", "how does X do Y architecturally", or when adding a
|
||||
new project to the conventions collection. Triggers on: codebase
|
||||
analysis, repo analysis, extract conventions, architectural analysis,
|
||||
project conventions, "analyze this repo", "what patterns does X use",
|
||||
"how is X structured". Do NOT use for: code review of specific PRs
|
||||
(use pr-review), security audits (use vuln-scout), or reading a
|
||||
single file for a quick answer.
|
||||
---
|
||||
|
||||
# Codebase Analysis
|
||||
|
||||
Extract architectural conventions from open source repositories through
|
||||
a structured multi-phase process. Output goes to Gitea convention repos.
|
||||
|
||||
## Naming Convention
|
||||
|
||||
- **Language patterns** (`go-patterns`, `elixir-patterns`): stdlib/language
|
||||
idioms verified from source
|
||||
- **Project conventions** (`<project>-conventions`): how a specific codebase
|
||||
does things — e.g., `temporal-conventions`, `cockroachdb-conventions`
|
||||
|
||||
## Phases
|
||||
|
||||
Execute in order. Each phase builds on the previous.
|
||||
|
||||
### Phase 1: Clone & Shape
|
||||
|
||||
Clone the repo on forge (`~/src/analysis/<name>`). Measure:
|
||||
|
||||
```
|
||||
- Total size, file count, commit count, contributor count
|
||||
- Top-level directory structure
|
||||
- Language breakdown (if polyglot)
|
||||
```
|
||||
|
||||
### Phase 2: Import Hierarchy
|
||||
|
||||
Find the most-depended-upon packages. This reveals what the codebase
|
||||
considers foundational:
|
||||
|
||||
```bash
|
||||
# Go: grep imports, extract internal packages, count
|
||||
grep -rh '"<module>/' --include="*.go" | sed ... | sort | uniq -c | sort -rn
|
||||
|
||||
# Elixir: grep aliases/imports
|
||||
grep -rh "alias\|import\|use" --include="*.ex" | ...
|
||||
```
|
||||
|
||||
The top 5-10 packages are where architectural decisions live.
|
||||
|
||||
### Phase 3: Interface Contracts
|
||||
|
||||
Find the key abstractions:
|
||||
|
||||
```bash
|
||||
# Go
|
||||
grep -rn "type.*interface {" --include="*.go" | grep -v test | grep -v mock
|
||||
|
||||
# Elixir
|
||||
grep -rn "@callback\|@behaviour\|defprotocol" --include="*.ex"
|
||||
```
|
||||
|
||||
### Phase 4: Error Handling & Quality Markers
|
||||
|
||||
```bash
|
||||
# Error style
|
||||
grep -rh "fmt.Errorf.*%w" | wc -l # wrapping
|
||||
grep -rh "errors.New\|errors.Wrap" | wc -l # sentinel/pkg
|
||||
|
||||
# Quality markers
|
||||
grep -rn "TODO\|FIXME\|HACK" --include="*.go" | grep -v test | wc -l
|
||||
```
|
||||
|
||||
Note TODO format — does the project use `TODO(owner):`? Plain `TODO:`?
|
||||
Version-gated TODOs?
|
||||
|
||||
### Phase 5: Unique Patterns
|
||||
|
||||
Look for project-specific infrastructure that's NOT in stdlib:
|
||||
- Custom concurrency primitives (handles, schedulers, pools)
|
||||
- Testing infrastructure (custom assertions, harnesses)
|
||||
- Configuration systems (dynamic config, feature flags)
|
||||
- Plugin/extension systems
|
||||
|
||||
### Phase 6: Git Archaeology (requires full clone)
|
||||
|
||||
This is where the real insight lives. For each interesting pattern:
|
||||
|
||||
```bash
|
||||
# When was it introduced?
|
||||
git log --all --oneline --diff-filter=A -- path/to/pattern
|
||||
|
||||
# Who wrote it and why?
|
||||
git log --format="%an%n%ad%n%s%n%b" <commit> -1
|
||||
|
||||
# What did it replace?
|
||||
git log -p -S "old_pattern_name" -- relevant/path
|
||||
```
|
||||
|
||||
### Phase 7: PR Discussions
|
||||
|
||||
Find the PR where a key pattern was introduced:
|
||||
|
||||
```bash
|
||||
gh api "search/issues?q=repo:<org>/<repo>+<search>+type:pr" \
|
||||
--jq '.items[] | {number, title}'
|
||||
```
|
||||
|
||||
Then read:
|
||||
- PR body (the "why" from the author)
|
||||
- Review comments (the debate)
|
||||
- Issue comments (the resolution)
|
||||
|
||||
Key questions to answer:
|
||||
1. What problem motivated this pattern?
|
||||
2. What alternatives were considered?
|
||||
3. What did reviewers push back on?
|
||||
4. How was it migrated in (big-bang vs gradual)?
|
||||
|
||||
### Phase 8: Synthesis & Output
|
||||
|
||||
Produce two artifacts:
|
||||
|
||||
**1. `analysis.md`** — Full architectural analysis:
|
||||
- Repository shape and import hierarchy
|
||||
- Key patterns with code examples
|
||||
- PR discussion summaries
|
||||
- Cross-ecosystem comparisons
|
||||
- Code quality metrics
|
||||
|
||||
**2. `conventions.md`** — Extracted patterns with:
|
||||
- Pattern name
|
||||
- Location in source
|
||||
- Code example
|
||||
- When to use / when NOT to use
|
||||
- Origin story (from PR discussion if available)
|
||||
|
||||
## Output Repos
|
||||
|
||||
Push to Gitea under `rodin/<project>-conventions`:
|
||||
|
||||
```bash
|
||||
GITEA_TOKEN=$(cat ~/.openclaw/credentials/gitea-rodin)
|
||||
# Create repo if needed:
|
||||
curl -s -X POST "https://gitea.weiker.me/api/v1/user/repos" \
|
||||
-H "Authorization: token $GITEA_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"name": "<project>-conventions", "auto_init": true, "default_branch": "master", "license": "MIT"}'
|
||||
```
|
||||
|
||||
## Quality Criteria
|
||||
|
||||
Before pushing, verify:
|
||||
- Each pattern has a real code example from the repo
|
||||
- "When to use / when NOT to use" sections exist
|
||||
- PR discussion quotes are attributed
|
||||
- Cross-ecosystem comparisons note prior art
|
||||
- Files end with `<!-- PATTERN_COMPLETE -->` sentinel
|
||||
|
||||
## Cross-Ecosystem Observations
|
||||
|
||||
Always note when a pattern exists in multiple repos. Example: Temporal's
|
||||
`goro.Handle` (2021) predates CockroachDB's `stop.Handle` (2025) —
|
||||
same solution, invented independently. These connections are the most
|
||||
valuable findings.
|
||||
|
||||
## The 4 Categories of Pattern Breaks
|
||||
|
||||
When you find impurity (pattern violations), classify them:
|
||||
|
||||
1. **Ship behavior, fix plumbing later** — time pressure, author knows it's
|
||||
wrong (tagged with TODO)
|
||||
2. **Better tooling exposed the limitation** — old pattern worked but was
|
||||
invisible to profiling/tracing
|
||||
3. **Removal cost > carrying cost** — technically debt but zero interest rate
|
||||
4. **Context needs different pattern** — not actually a break
|
||||
|
||||
See `references/pattern-breaks.md` for detailed examples.
|
||||
|
||||
## Execution Notes
|
||||
|
||||
- Clone on **forge** (`host="node", node="forge"`) — has disk space
|
||||
- Use full clones (not `--depth 1`) for archaeology
|
||||
- `gh api` for GitHub PR lookups (authenticated on all nodes)
|
||||
- One repo at a time for focused analysis
|
||||
- Markdownlint all output before pushing
|
||||
Reference in New Issue
Block a user