feat: codebase-analysis skill
8-phase methodology for extracting architectural conventions from open source repositories:

1. Clone & Shape: measure repo dimensions
2. Import Hierarchy: find foundational packages
3. Interface Contracts: key abstractions
4. Error Handling & Quality Markers: TODOs, style
5. Unique Patterns: project-specific infrastructure
6. Git Archaeology: trace WHY decisions were made
7. PR Discussions: read the actual debates
8. Synthesis: produce analysis.md + conventions.md

Includes pattern-breaks reference (4 categories of why conventions are violated, with real examples from CockroachDB, Prometheus, Temporal, Ecto, Oban).

Output: Gitea repos named <project>-conventions.

---
name: codebase-analysis
description: >-
  Analyze open source repositories to extract architectural conventions,
  patterns, and design decisions. Produces convention docs and pattern
  files suitable for Gitea repos. Use when asked to "analyze a repo",
  "extract patterns from", "what conventions does X use", "add X to the
  analysis repos", "how does X do Y architecturally", or when adding a
  new project to the conventions collection. Triggers on: codebase
  analysis, repo analysis, extract conventions, architectural analysis,
  project conventions, "analyze this repo", "what patterns does X use",
  "how is X structured". Do NOT use for: code review of specific PRs
  (use pr-review), security audits (use vuln-scout), or reading a
  single file for a quick answer.
---

# Codebase Analysis

Extract architectural conventions from open source repositories through
a structured multi-phase process. Output goes to Gitea convention repos.

## Naming Convention

- **Language patterns** (`go-patterns`, `elixir-patterns`): stdlib/language
  idioms verified from source
- **Project conventions** (`<project>-conventions`): how a specific codebase
  does things, e.g., `temporal-conventions`, `cockroachdb-conventions`

## Phases

Execute in order. Each phase builds on the previous one.

### Phase 1: Clone & Shape

Clone the repo on forge (`~/src/analysis/<name>`) and measure:

- Total size, file count, commit count, contributor count
- Top-level directory structure
- Language breakdown (if polyglot)

### Phase 2: Import Hierarchy

Find the most-depended-upon packages. This reveals what the codebase
considers foundational:

```bash
# Go: extract internal import paths, then count
grep -rhoE '"<module>/[^"]+"' --include="*.go" . | sort | uniq -c | sort -rn | head -20

# Elixir: extract aliased/imported/used modules, then count
grep -rhoE '^[[:space:]]*(alias|import|use) [A-Z][A-Za-z0-9_.]*' --include="*.ex" . \
  | awk '{print $2}' | sort | uniq -c | sort -rn | head -20
```

The top 5-10 packages are where architectural decisions live.

### Phase 3: Interface Contracts

Find the key abstractions:

```bash
# Go: interface declarations outside test files and mocks
grep -rn "type.*interface {" --include="*.go" --exclude="*_test.go" . | grep -v mock

# Elixir: behaviours, callbacks, and protocols
grep -rn "@callback\|@behaviour\|defprotocol" --include="*.ex" .
```
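The grep finds declarations; ranking them needs their size. A minimal Go sketch using `go/ast` to list named interfaces with their explicit method counts; `findInterfaces` is a hypothetical helper, embedded interfaces are counted as zero methods rather than expanded, and sorting by width is only a rough heuristic for "central abstraction".

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
	"sort"
)

// InterfaceDecl describes one named interface type found in a source file.
type InterfaceDecl struct {
	Name    string
	Pos     string // file:line:column
	Methods int    // explicit methods only; embedded interfaces are not expanded
}

// findInterfaces parses Go source (src may be nil to read filename from disk)
// and returns every named interface type in declaration order.
func findInterfaces(filename string, src any) ([]InterfaceDecl, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var out []InterfaceDecl
	ast.Inspect(f, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		it, ok := ts.Type.(*ast.InterfaceType)
		if !ok {
			return true
		}
		methods := 0
		for _, field := range it.Methods.List {
			if _, isFunc := field.Type.(*ast.FuncType); isFunc {
				methods += len(field.Names)
			}
		}
		out = append(out, InterfaceDecl{
			Name:    ts.Name.Name,
			Pos:     fset.Position(ts.Pos()).String(),
			Methods: methods,
		})
		return true
	})
	return out, nil
}

func main() {
	var all []InterfaceDecl
	for _, path := range os.Args[1:] {
		decls, err := findInterfaces(path, nil)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		all = append(all, decls...)
	}
	// Widest contracts first.
	sort.SliceStable(all, func(i, j int) bool { return all[i].Methods > all[j].Methods })
	for _, d := range all {
		fmt.Printf("%3d methods  %-30s %s\n", d.Methods, d.Name, d.Pos)
	}
}
```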

### Phase 4: Error Handling & Quality Markers

```bash
# Error style (Go)
grep -rh "fmt.Errorf.*%w" --include="*.go" . | wc -l  # wrapping
grep -rh "errors.New\|errors.Wrap" --include="*.go" . | wc -l  # sentinels / pkg/errors-style

# Quality markers
grep -rn "TODO\|FIXME\|HACK" --include="*.go" --exclude="*_test.go" . | wc -l
```

Note the TODO format: does the project use `TODO(owner):` or plain `TODO:`?
Are there version-gated TODOs (e.g., `TODO(ajwerner): Remove in 22.2.`)?

### Phase 5: Unique Patterns

Look for project-specific infrastructure that's NOT in stdlib:

- Custom concurrency primitives (handles, schedulers, pools)
- Testing infrastructure (custom assertions, harnesses)
- Configuration systems (dynamic config, feature flags)
- Plugin/extension systems

### Phase 6: Git Archaeology (requires full clone)

This is where the real insight lives. For each interesting pattern:

```bash
# When was it introduced?
git log --all --oneline --diff-filter=A -- path/to/pattern

# Who wrote it and why?
git log --format="%an%n%ad%n%s%n%b" <commit> -1

# What did it replace?
git log -p -S "old_pattern_name" -- relevant/path
```
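When a pattern's history spans many commits, it helps to capture the who/when/why as structured data for the origin story in `conventions.md`. A minimal sketch that runs the same `git log` format string as the shell version and splits the result; `CommitStory` and `parseCommitStory` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// CommitStory is the who / when / why of one commit.
type CommitStory struct {
	Author, Date, Subject, Body string
}

// parseCommitStory splits output produced by
//
//	git log --format="%an%n%ad%n%s%n%b" <commit> -1
//
// The first three lines are author, date, and subject; everything after
// them is the (possibly multi-paragraph) body.
func parseCommitStory(out string) (CommitStory, error) {
	lines := strings.SplitN(strings.TrimRight(out, "\n"), "\n", 4)
	if len(lines) < 3 {
		return CommitStory{}, fmt.Errorf("unexpected git log output: %q", out)
	}
	s := CommitStory{Author: lines[0], Date: lines[1], Subject: lines[2]}
	if len(lines) == 4 {
		s.Body = strings.TrimSpace(lines[3])
	}
	return s, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: story <commit>")
		os.Exit(2)
	}
	// Run from inside the full clone.
	out, err := exec.Command("git", "log", "--format=%an%n%ad%n%s%n%b", os.Args[1], "-1").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	s, err := parseCommitStory(string(out))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s\n%s, %s\n\n%s\n", s.Subject, s.Author, s.Date, s.Body)
}
```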

### Phase 7: PR Discussions

Find the PR where a key pattern was introduced:

```bash
gh api "search/issues?q=repo:<org>/<repo>+<search>+type:pr" \
  --jq '.items[] | {number, title}'
```

Then read:

- PR body (the "why" from the author)
- Review comments (the debate)
- Issue comments (the resolution)

Key questions to answer:

1. What problem motivated this pattern?
2. What alternatives were considered?
3. What did reviewers push back on?
4. How was it migrated in (big-bang vs gradual)?
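The `--jq` filter is enough for a quick look. If you want to keep the hit list (for example, to record PR numbers in the origin story), a small Go filter can read the raw JSON from `gh api` on stdin instead. A sketch that relies only on the `items`, `number`, `title`, `html_url`, and `pull_request` fields of GitHub's search response; `PR` and `parseSearch` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// PR is one search hit worth reading in full.
type PR struct {
	Number int
	Title  string
	URL    string
}

// searchResponse mirrors the subset of the search/issues response used here.
// pull_request is present only on items that are pull requests.
type searchResponse struct {
	Items []struct {
		Number      int       `json:"number"`
		Title       string    `json:"title"`
		HTMLURL     string    `json:"html_url"`
		PullRequest *struct{} `json:"pull_request"`
	} `json:"items"`
}

// parseSearch extracts PRs from raw search/issues JSON, dropping plain issues.
func parseSearch(data []byte) ([]PR, error) {
	var res searchResponse
	if err := json.Unmarshal(data, &res); err != nil {
		return nil, err
	}
	var prs []PR
	for _, it := range res.Items {
		if it.PullRequest == nil {
			continue
		}
		prs = append(prs, PR{Number: it.Number, Title: it.Title, URL: it.HTMLURL})
	}
	return prs, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	prs, err := parseSearch(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, pr := range prs {
		fmt.Printf("#%d %s\n    %s\n", pr.Number, pr.Title, pr.URL)
	}
}
```

Usage: `gh api "search/issues?q=repo:<org>/<repo>+<search>+type:pr" | go run prsearch.go` (without `--jq`, so the raw JSON reaches the program).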

### Phase 8: Synthesis & Output

Produce two artifacts:

**1. `analysis.md`**: a full architectural analysis covering:

- Repository shape and import hierarchy
- Key patterns with code examples
- PR discussion summaries
- Cross-ecosystem comparisons
- Code quality metrics

**2. `conventions.md`**: extracted patterns, each with:

- Pattern name
- Location in source
- Code example
- When to use / when NOT to use
- Origin story (from PR discussion if available)

## Output Repos

Push to Gitea under `rodin/<project>-conventions`:

```bash
GITEA_TOKEN=$(cat ~/.openclaw/credentials/gitea-rodin)
# Create repo if needed:
curl -s -X POST "https://gitea.weiker.me/api/v1/user/repos" \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "<project>-conventions", "auto_init": true, "default_branch": "master", "license": "MIT"}'
```

## Quality Criteria

Before pushing, verify:

- Each pattern has a real code example from the repo
- "When to use / when NOT to use" sections exist
- PR discussion quotes are attributed
- Cross-ecosystem comparisons note prior art
- Files end with the `<!-- PATTERN_COMPLETE -->` sentinel

## Cross-Ecosystem Observations

Always note when a pattern exists in multiple repos. Example: Temporal's
`goro.Handle` (2021) predates CockroachDB's `stop.Handle` (2025): the same
solution, invented independently. These connections are the most
valuable findings.

## The 4 Categories of Pattern Breaks

When you find impurity (pattern violations), classify each one:

1. **Ship behavior, fix plumbing later**: time pressure; the author knows
   it's wrong (tagged with a TODO)
2. **Better tooling exposed the limitation**: the old pattern worked but was
   invisible to profiling/tracing
3. **Removal cost > carrying cost**: technically debt, but at a zero interest rate
4. **Context needs a different pattern**: not actually a break

See `references/pattern-breaks.md` for detailed examples.
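If classifications are recorded by a script (for example, while drafting review notes), the four categories and the review responses listed in the pattern-breaks summary table fit a small enum. A sketch with hypothetical names; the zero value is deliberately left unclassified so an unset category cannot silently read as "Approve".

```go
package main

import "fmt"

// BreakCategory is one of the four pattern-break categories.
type BreakCategory int

const (
	ShipBehaviorFirst   BreakCategory = iota + 1 // time pressure, TODO-tagged
	ToolingExposedLimit                          // better tooling exposed the limitation
	ZeroInterestDebt                             // removal cost > carrying cost
	DifferentContext                             // context needs a different pattern
)

// ReviewResponse returns the review response for each category.
func (c BreakCategory) ReviewResponse() string {
	switch c {
	case ShipBehaviorFirst:
		return "Note, don't block"
	case ToolingExposedLimit:
		return "Track migration"
	case ZeroInterestDebt:
		return "Leave alone"
	case DifferentContext:
		return "Approve"
	default:
		return "Unclassified: investigate before reviewing"
	}
}

func main() {
	for c := ShipBehaviorFirst; c <= DifferentContext; c++ {
		fmt.Printf("category %d: %s\n", c, c.ReviewResponse())
	}
}
```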

## Execution Notes

- Clone on **forge** (`host="node", node="forge"`), which has the disk space
- Use full clones (not `--depth 1`) so Phase 6 archaeology has the history
- Use `gh api` for GitHub PR lookups (it is authenticated on all nodes)
- Analyze one repo at a time to keep the analysis focused
- Run markdownlint on all output before pushing

# Pattern Breaks: Why Conventions Are Violated

Based on git archaeology across CockroachDB, Prometheus, Temporal,
Ecto, and Oban. Each category comes with real examples.

---

## Category 1: Ship Behavior, Fix Plumbing Later

**Example:** CockroachDB bare goroutine in kvadmission

```go
// TODO(irfansharif): Use a stopper here instead.
go func() {
	ticker := time.NewTicker(10 * time.Minute)
	for {
		select {
		case <-ticker.C:
			// ...
		}
	}
}()
```

**History:** Irfan Sharif, Aug 2022, as part of the elastic CPU limiter
(a 20+ file change introducing a complex new admission control system).
The author tagged the goroutine with a TODO in the SAME commit: ship the
behavior, fix the plumbing later.

**Why it survives:** Nobody has touched the function for any other reason.
The goroutine runs for the lifetime of the process, so nothing leaks. The
TODO is correct but not urgent.

**For review:** Worth noting, not worth blocking the PR.

---

## Category 2: Better Tooling Exposed the Limitation

**Example:** CockroachDB `Handle` replacing `RunAsyncTask`

**History:** Feb 2025, Tobias Grieger. `RunAsyncTask` was *correct* but
made every goroutine look identical in execution traces. The `Handle`
pattern was motivated by profiling/observability, not correctness.

**Key quote from PR:** "It becomes an ordeal to search for a specific
goroutine, since all goroutines started by the Stopper get assigned
the same searchable identity."

**For review:** Track it. This is how progress happens.

---

## Category 3: Removal Cost > Carrying Cost

**Example:** CockroachDB stale "Remove in 22.2" TODO

```go
// TODO(ajwerner): Remove in 22.2.
SystemConfigProvider config.SystemConfigProvider
```

**History:** Feb 2022. A bridge for mixed-version clusters during the
gossip→span-config migration. The struct field costs one pointer of
memory, but removing it means touching `StoreConfig` initialization
across dozens of call sites.

**Why it survives:** Cost of removal >> cost of carrying. The
interest rate on this debt is zero.

**For review:** Leave it alone.

---

## Category 4: Context Needs Different Pattern

**Example:** Prometheus global vars for scrape timestamp tolerance

```go
var ScrapeTimestampTolerance = 2 * time.Millisecond
```

**History:** Started as an "experimental, hidden flag" (Oct 2020). Made
public after production use showed that a 10ms tolerance saves real disk space.

**Why a global:** Set once at startup from a command-line flag. Read
thousands of times per second in the scrape loop. Threading a config
struct would add parameters to every hot-path function for a value
that never changes after init.

**For review:** Approve it. This IS the right pattern for this context.

---

## Summary Table

| Category | Response in Review | Example |
|----------|--------------------|---------|
| Time pressure | Note, don't block | kvadmission goroutine |
| Better tooling | Track migration | Handle API |
| Zero-interest debt | Leave alone | Stale version TODOs |
| Different context | Approve | Hot-path globals |