feat: codebase-analysis skill
8-phase methodology for extracting architectural conventions from open source repositories: 1. Clone & Shape — measure repo dimensions 2. Import Hierarchy — find foundational packages 3. Interface Contracts — key abstractions 4. Error Handling & Quality Markers — TODOs, style 5. Unique Patterns — project-specific infrastructure 6. Git Archaeology — trace WHY decisions were made 7. PR Discussions — read the actual debates 8. Synthesis — produce analysis.md + conventions.md Includes pattern-breaks reference (4 categories of why conventions are violated, with real examples from CockroachDB, Prometheus, Temporal, Ecto, Oban). Output: Gitea repos named <project>-conventions.
This commit is contained in:
@@ -0,0 +1,193 @@
|
|||||||
|
---
|
||||||
|
name: codebase-analysis
|
||||||
|
description: >-
|
||||||
|
Analyze open source repositories to extract architectural conventions,
|
||||||
|
patterns, and design decisions. Produces convention docs and pattern
|
||||||
|
files suitable for Gitea repos. Use when asked to "analyze a repo",
|
||||||
|
"extract patterns from", "what conventions does X use", "add X to the
|
||||||
|
analysis repos", "how does X do Y architecturally", or when adding a
|
||||||
|
new project to the conventions collection. Triggers on: codebase
|
||||||
|
analysis, repo analysis, extract conventions, architectural analysis,
|
||||||
|
project conventions, "analyze this repo", "what patterns does X use",
|
||||||
|
"how is X structured". Do NOT use for: code review of specific PRs
|
||||||
|
(use pr-review), security audits (use vuln-scout), or reading a
|
||||||
|
single file for a quick answer.
|
||||||
|
---
|
||||||
|
|
||||||
|
# Codebase Analysis
|
||||||
|
|
||||||
|
Extract architectural conventions from open source repositories through
|
||||||
|
a structured multi-phase process. Output goes to Gitea convention repos.
|
||||||
|
|
||||||
|
## Naming Convention
|
||||||
|
|
||||||
|
- **Language patterns** (`go-patterns`, `elixir-patterns`): stdlib/language
|
||||||
|
idioms verified from source
|
||||||
|
- **Project conventions** (`<project>-conventions`): how a specific codebase
|
||||||
|
does things — e.g., `temporal-conventions`, `cockroachdb-conventions`
|
||||||
|
|
||||||
|
## Phases
|
||||||
|
|
||||||
|
Execute in order. Each phase builds on the previous.
|
||||||
|
|
||||||
|
### Phase 1: Clone & Shape
|
||||||
|
|
||||||
|
Clone the repo on forge (`~/src/analysis/<name>`). Measure:
|
||||||
|
|
||||||
|
```
|
||||||
|
- Total size, file count, commit count, contributor count
|
||||||
|
- Top-level directory structure
|
||||||
|
- Language breakdown (if polyglot)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 2: Import Hierarchy
|
||||||
|
|
||||||
|
Find the most-depended-upon packages. This reveals what the codebase
|
||||||
|
considers foundational:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Go: grep imports, extract internal packages, count
|
||||||
|
grep -rh '"<module>/' --include="*.go" | sed ... | sort | uniq -c | sort -rn
|
||||||
|
|
||||||
|
# Elixir: grep aliases/imports
|
||||||
|
grep -rh "alias\|import\|use" --include="*.ex" | ...
|
||||||
|
```
|
||||||
|
|
||||||
|
The top 5-10 packages are where architectural decisions live.
|
||||||
|
|
||||||
|
### Phase 3: Interface Contracts
|
||||||
|
|
||||||
|
Find the key abstractions:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Go
|
||||||
|
grep -rn "type.*interface {" --include="*.go" | grep -v test | grep -v mock
|
||||||
|
|
||||||
|
# Elixir
|
||||||
|
grep -rn "@callback\|@behaviour\|defprotocol" --include="*.ex"
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 4: Error Handling & Quality Markers
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Error style
|
||||||
|
grep -rh "fmt.Errorf.*%w" | wc -l # wrapping
|
||||||
|
grep -rh "errors.New\|errors.Wrap" | wc -l # sentinel/pkg
|
||||||
|
|
||||||
|
# Quality markers
|
||||||
|
grep -rn "TODO\|FIXME\|HACK" --include="*.go" | grep -v test | wc -l
|
||||||
|
```
|
||||||
|
|
||||||
|
Note TODO format — does the project use `TODO(owner):`? Plain `TODO:`?
|
||||||
|
Version-gated TODOs?
|
||||||
|
|
||||||
|
### Phase 5: Unique Patterns
|
||||||
|
|
||||||
|
Look for project-specific infrastructure that's NOT in stdlib:
|
||||||
|
- Custom concurrency primitives (handles, schedulers, pools)
|
||||||
|
- Testing infrastructure (custom assertions, harnesses)
|
||||||
|
- Configuration systems (dynamic config, feature flags)
|
||||||
|
- Plugin/extension systems
|
||||||
|
|
||||||
|
### Phase 6: Git Archaeology (requires full clone)
|
||||||
|
|
||||||
|
This is where the real insight lives. For each interesting pattern:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# When was it introduced?
|
||||||
|
git log --all --oneline --diff-filter=A -- path/to/pattern
|
||||||
|
|
||||||
|
# Who wrote it and why?
|
||||||
|
git log --format="%an%n%ad%n%s%n%b" <commit> -1
|
||||||
|
|
||||||
|
# What did it replace?
|
||||||
|
git log -p -S "old_pattern_name" -- relevant/path
|
||||||
|
```
|
||||||
|
|
||||||
|
### Phase 7: PR Discussions
|
||||||
|
|
||||||
|
Find the PR where a key pattern was introduced:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
gh api "search/issues?q=repo:<org>/<repo>+<search>+type:pr" \
|
||||||
|
--jq '.items[] | {number, title}'
|
||||||
|
```
|
||||||
|
|
||||||
|
Then read:
|
||||||
|
- PR body (the "why" from the author)
|
||||||
|
- Review comments (the debate)
|
||||||
|
- Issue comments (the resolution)
|
||||||
|
|
||||||
|
Key questions to answer:
|
||||||
|
1. What problem motivated this pattern?
|
||||||
|
2. What alternatives were considered?
|
||||||
|
3. What did reviewers push back on?
|
||||||
|
4. How was it migrated in (big-bang vs gradual)?
|
||||||
|
|
||||||
|
### Phase 8: Synthesis & Output
|
||||||
|
|
||||||
|
Produce two artifacts:
|
||||||
|
|
||||||
|
**1. `analysis.md`** — Full architectural analysis:
|
||||||
|
- Repository shape and import hierarchy
|
||||||
|
- Key patterns with code examples
|
||||||
|
- PR discussion summaries
|
||||||
|
- Cross-ecosystem comparisons
|
||||||
|
- Code quality metrics
|
||||||
|
|
||||||
|
**2. `conventions.md`** — Extracted patterns with:
|
||||||
|
- Pattern name
|
||||||
|
- Location in source
|
||||||
|
- Code example
|
||||||
|
- When to use / when NOT to use
|
||||||
|
- Origin story (from PR discussion if available)
|
||||||
|
|
||||||
|
## Output Repos
|
||||||
|
|
||||||
|
Push to Gitea under `rodin/<project>-conventions`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
GITEA_TOKEN=$(cat ~/.openclaw/credentials/gitea-rodin)
|
||||||
|
# Create repo if needed:
|
||||||
|
curl -s -X POST "https://gitea.weiker.me/api/v1/user/repos" \
|
||||||
|
-H "Authorization: token $GITEA_TOKEN" \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"name": "<project>-conventions", "auto_init": true, "default_branch": "master", "license": "MIT"}'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Quality Criteria
|
||||||
|
|
||||||
|
Before pushing, verify:
|
||||||
|
- Each pattern has a real code example from the repo
|
||||||
|
- "When to use / when NOT to use" sections exist
|
||||||
|
- PR discussion quotes are attributed
|
||||||
|
- Cross-ecosystem comparisons note prior art
|
||||||
|
- Files end with `<!-- PATTERN_COMPLETE -->` sentinel
|
||||||
|
|
||||||
|
## Cross-Ecosystem Observations
|
||||||
|
|
||||||
|
Always note when a pattern exists in multiple repos. Example: Temporal's
|
||||||
|
`goro.Handle` (2021) predates CockroachDB's `stop.Handle` (2025) —
|
||||||
|
same solution, invented independently. These connections are the most
|
||||||
|
valuable findings.
|
||||||
|
|
||||||
|
## The 4 Categories of Pattern Breaks
|
||||||
|
|
||||||
|
When you find impurity (pattern violations), classify them:
|
||||||
|
|
||||||
|
1. **Ship behavior, fix plumbing later** — time pressure, author knows it's
|
||||||
|
wrong (tagged with TODO)
|
||||||
|
2. **Better tooling exposed the limitation** — old pattern worked but was
|
||||||
|
invisible to profiling/tracing
|
||||||
|
3. **Removal cost > carrying cost** — technically debt but zero interest rate
|
||||||
|
4. **Context needs different pattern** — not actually a break
|
||||||
|
|
||||||
|
See `references/pattern-breaks.md` for detailed examples.
|
||||||
|
|
||||||
|
## Execution Notes
|
||||||
|
|
||||||
|
- Clone on **forge** (`host="node", node="forge"`) — has disk space
|
||||||
|
- Use full clones (not `--depth 1`) for archaeology
|
||||||
|
- `gh api` for GitHub PR lookups (authenticated on all nodes)
|
||||||
|
- One repo at a time for focused analysis
|
||||||
|
- Markdownlint all output before pushing
|
||||||
@@ -0,0 +1,97 @@
|
|||||||
|
# Pattern Breaks: Why Conventions Are Violated
|
||||||
|
|
||||||
|
Based on git archaeology across CockroachDB, Prometheus, Temporal,
|
||||||
|
Ecto, and Oban. Each category with real examples.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Category 1: Ship Behavior, Fix Plumbing Later
|
||||||
|
|
||||||
|
**Example:** CockroachDB bare goroutine in kvadmission
|
||||||
|
|
||||||
|
```go
|
||||||
|
// TODO(irfansharif): Use a stopper here instead.
|
||||||
|
go func() {
|
||||||
|
ticker := time.NewTicker(10 * time.Minute)
|
||||||
|
for { select { case <-ticker.C: ... } }
|
||||||
|
}()
|
||||||
|
```
|
||||||
|
|
||||||
|
**History:** Irfan Sharif, Aug 2022. Part of the elastic CPU limiter
|
||||||
|
(20+ file change). Complex new admission control system. The author
|
||||||
|
tagged it with TODO in the SAME commit. Ship the behavior, fix the
|
||||||
|
plumbing later.
|
||||||
|
|
||||||
|
**Why it survives:** Nobody touched the function for any other reason.
|
||||||
|
The goroutine leaks nowhere (process-lifetime). The TODO is correct
|
||||||
|
but not urgent.
|
||||||
|
|
||||||
|
**For review:** Worth noting, not worth blocking the PR.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Category 2: Better Tooling Exposed the Limitation
|
||||||
|
|
||||||
|
**Example:** CockroachDB Handle replacing RunAsyncTask
|
||||||
|
|
||||||
|
**History:** Feb 2025, Tobias Grieger. RunAsyncTask was *correct* but
|
||||||
|
made every goroutine look identical in execution traces. The Handle
|
||||||
|
pattern was motivated by profiling/observability, not correctness.
|
||||||
|
|
||||||
|
**Key quote from PR:** "It becomes an ordeal to search for a specific
|
||||||
|
goroutine, since all goroutines started by the Stopper get assigned
|
||||||
|
the same searchable identity."
|
||||||
|
|
||||||
|
**For review:** Track it. This is how progress happens.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Category 3: Removal Cost > Carrying Cost
|
||||||
|
|
||||||
|
**Example:** CockroachDB stale "Remove in 22.2" TODO
|
||||||
|
|
||||||
|
```go
|
||||||
|
// TODO(ajwerner): Remove in 22.2.
|
||||||
|
SystemConfigProvider config.SystemConfigProvider
|
||||||
|
```
|
||||||
|
|
||||||
|
**History:** Feb 2022. Bridge for mixed-version clusters during
|
||||||
|
gossip→span-config migration. A struct field that costs one pointer
|
||||||
|
of memory. Removing it means touching StoreConfig initialization
|
||||||
|
across dozens of call sites.
|
||||||
|
|
||||||
|
**Why it survives:** Cost of removal >> cost of carrying. The
|
||||||
|
interest rate on this debt is zero.
|
||||||
|
|
||||||
|
**For review:** Leave it alone.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Category 4: Context Needs Different Pattern
|
||||||
|
|
||||||
|
**Example:** Prometheus global vars for scrape timestamp tolerance
|
||||||
|
|
||||||
|
```go
|
||||||
|
var ScrapeTimestampTolerance = 2 * time.Millisecond
|
||||||
|
```
|
||||||
|
|
||||||
|
**History:** Started as "experimental, hidden flag" (Oct 2020). Made
|
||||||
|
public after production showed 10ms tolerance saves real disk space.
|
||||||
|
|
||||||
|
**Why a global:** Set once at startup from a command-line flag. Read
|
||||||
|
thousands of times per second in the scrape loop. Threading a config
|
||||||
|
struct would add parameters to every hot-path function for a value
|
||||||
|
that never changes after init.
|
||||||
|
|
||||||
|
**For review:** Approve it. This IS the right pattern for this context.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary Table
|
||||||
|
|
||||||
|
| Category | Response in Review | Example |
|
||||||
|
|----------|-------------------|---------|
|
||||||
|
| Time pressure | Note, don't block | kvadmission goroutine |
|
||||||
|
| Better tooling | Track migration | Handle API |
|
||||||
|
| Zero-interest debt | Leave alone | Stale version TODOs |
|
||||||
|
| Different context | Approve | Hot-path globals |
|
||||||
Reference in New Issue
Block a user