diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 0000000..bf7a49a --- /dev/null +++ b/SKILL.md @@ -0,0 +1,193 @@ +--- +name: codebase-analysis +description: >- + Analyze open source repositories to extract architectural conventions, + patterns, and design decisions. Produces convention docs and pattern + files suitable for Gitea repos. Use when asked to "analyze a repo", + "extract patterns from", "what conventions does X use", "add X to the + analysis repos", "how does X do Y architecturally", or when adding a + new project to the conventions collection. Triggers on: codebase + analysis, repo analysis, extract conventions, architectural analysis, + project conventions, "analyze this repo", "what patterns does X use", + "how is X structured". Do NOT use for: code review of specific PRs + (use pr-review), security audits (use vuln-scout), or reading a + single file for a quick answer. +--- + +# Codebase Analysis + +Extract architectural conventions from open source repositories through +a structured multi-phase process. Output goes to Gitea convention repos. + +## Naming Convention + +- **Language patterns** (`go-patterns`, `elixir-patterns`): stdlib/language + idioms verified from source +- **Project conventions** (`-conventions`): how a specific codebase + does things — e.g., `temporal-conventions`, `cockroachdb-conventions` + +## Phases + +Execute in order. Each phase builds on the previous. + +### Phase 1: Clone & Shape + +Clone the repo on forge (`~/src/analysis/`). Measure: + +``` +- Total size, file count, commit count, contributor count +- Top-level directory structure +- Language breakdown (if polyglot) +``` + +### Phase 2: Import Hierarchy + +Find the most-depended-upon packages. This reveals what the codebase +considers foundational: + +```bash +# Go: grep imports, extract internal packages, count +grep -rh '"/' --include="*.go" | sed ... | sort | uniq -c | sort -rn + +# Elixir: grep aliases/imports +grep -rh "alias\|import\|use" --include="*.ex" | ... +``` + +The top 5-10 packages are where architectural decisions live. + +### Phase 3: Interface Contracts + +Find the key abstractions: + +```bash +# Go +grep -rn "type.*interface {" --include="*.go" | grep -v test | grep -v mock + +# Elixir +grep -rn "@callback\|@behaviour\|defprotocol" --include="*.ex" +``` + +### Phase 4: Error Handling & Quality Markers + +```bash +# Error style +grep -rh "fmt.Errorf.*%w" | wc -l # wrapping +grep -rh "errors.New\|errors.Wrap" | wc -l # sentinel/pkg + +# Quality markers +grep -rn "TODO\|FIXME\|HACK" --include="*.go" | grep -v test | wc -l +``` + +Note TODO format — does the project use `TODO(owner):`? Plain `TODO:`? +Version-gated TODOs? + +### Phase 5: Unique Patterns + +Look for project-specific infrastructure that's NOT in stdlib: +- Custom concurrency primitives (handles, schedulers, pools) +- Testing infrastructure (custom assertions, harnesses) +- Configuration systems (dynamic config, feature flags) +- Plugin/extension systems + +### Phase 6: Git Archaeology (requires full clone) + +This is where the real insight lives. For each interesting pattern: + +```bash +# When was it introduced? +git log --all --oneline --diff-filter=A -- path/to/pattern + +# Who wrote it and why? +git log --format="%an%n%ad%n%s%n%b" -1 + +# What did it replace? +git log -p -S "old_pattern_name" -- relevant/path +``` + +### Phase 7: PR Discussions + +Find the PR where a key pattern was introduced: + +```bash +gh api "search/issues?q=repo:/++type:pr" \ + --jq '.items[] | {number, title}' +``` + +Then read: +- PR body (the "why" from the author) +- Review comments (the debate) +- Issue comments (the resolution) + +Key questions to answer: +1. What problem motivated this pattern? +2. What alternatives were considered? +3. What did reviewers push back on? +4. How was it migrated in (big-bang vs gradual)? + +### Phase 8: Synthesis & Output + +Produce two artifacts: + +**1. `analysis.md`** — Full architectural analysis: +- Repository shape and import hierarchy +- Key patterns with code examples +- PR discussion summaries +- Cross-ecosystem comparisons +- Code quality metrics + +**2. `conventions.md`** — Extracted patterns with: +- Pattern name +- Location in source +- Code example +- When to use / when NOT to use +- Origin story (from PR discussion if available) + +## Output Repos + +Push to Gitea under `rodin/-conventions`: + +```bash +GITEA_TOKEN=$(cat ~/.openclaw/credentials/gitea-rodin) +# Create repo if needed: +curl -s -X POST "https://gitea.weiker.me/api/v1/user/repos" \ + -H "Authorization: token $GITEA_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{"name": "-conventions", "auto_init": true, "default_branch": "master", "license": "MIT"}' +``` + +## Quality Criteria + +Before pushing, verify: +- Each pattern has a real code example from the repo +- "When to use / when NOT to use" sections exist +- PR discussion quotes are attributed +- Cross-ecosystem comparisons note prior art +- Files end with `` sentinel + +## Cross-Ecosystem Observations + +Always note when a pattern exists in multiple repos. Example: Temporal's +`goro.Handle` (2021) predates CockroachDB's `stop.Handle` (2025) — +same solution, invented independently. These connections are the most +valuable findings. + +## The 4 Categories of Pattern Breaks + +When you find impurity (pattern violations), classify them: + +1. **Ship behavior, fix plumbing later** — time pressure, author knows it's + wrong (tagged with TODO) +2. **Better tooling exposed the limitation** — old pattern worked but was + invisible to profiling/tracing +3. **Removal cost > carrying cost** — technically debt but zero interest rate +4. **Context needs different pattern** — not actually a break + +See `references/pattern-breaks.md` for detailed examples. + +## Execution Notes + +- Clone on **forge** (`host="node", node="forge"`) — has disk space +- Use full clones (not `--depth 1`) for archaeology +- `gh api` for GitHub PR lookups (authenticated on all nodes) +- One repo at a time for focused analysis +- Markdownlint all output before pushing diff --git a/references/pattern-breaks.md b/references/pattern-breaks.md new file mode 100644 index 0000000..9e74863 --- /dev/null +++ b/references/pattern-breaks.md @@ -0,0 +1,97 @@ +# Pattern Breaks: Why Conventions Are Violated + +Based on git archaeology across CockroachDB, Prometheus, Temporal, +Ecto, and Oban. Each category with real examples. + +--- + +## Category 1: Ship Behavior, Fix Plumbing Later + +**Example:** CockroachDB bare goroutine in kvadmission + +```go +// TODO(irfansharif): Use a stopper here instead. +go func() { + ticker := time.NewTicker(10 * time.Minute) + for { select { case <-ticker.C: ... } } +}() +``` + +**History:** Irfan Sharif, Aug 2022. Part of the elastic CPU limiter +(20+ file change). Complex new admission control system. The author +tagged it with TODO in the SAME commit. Ship the behavior, fix the +plumbing later. + +**Why it survives:** Nobody touched the function for any other reason. +The goroutine leaks nowhere (process-lifetime). The TODO is correct +but not urgent. + +**For review:** Worth noting, not worth blocking the PR. + +--- + +## Category 2: Better Tooling Exposed the Limitation + +**Example:** CockroachDB Handle replacing RunAsyncTask + +**History:** Feb 2025, Tobias Grieger. RunAsyncTask was *correct* but +made every goroutine look identical in execution traces. The Handle +pattern was motivated by profiling/observability, not correctness. + +**Key quote from PR:** "It becomes an ordeal to search for a specific +goroutine, since all goroutines started by the Stopper get assigned +the same searchable identity." + +**For review:** Track it. This is how progress happens. + +--- + +## Category 3: Removal Cost > Carrying Cost + +**Example:** CockroachDB stale "Remove in 22.2" TODO + +```go +// TODO(ajwerner): Remove in 22.2. +SystemConfigProvider config.SystemConfigProvider +``` + +**History:** Feb 2022. Bridge for mixed-version clusters during +gossip→span-config migration. A struct field that costs one pointer +of memory. Removing it means touching StoreConfig initialization +across dozens of call sites. + +**Why it survives:** Cost of removal >> cost of carrying. The +interest rate on this debt is zero. + +**For review:** Leave it alone. + +--- + +## Category 4: Context Needs Different Pattern + +**Example:** Prometheus global vars for scrape timestamp tolerance + +```go +var ScrapeTimestampTolerance = 2 * time.Millisecond +``` + +**History:** Started as "experimental, hidden flag" (Oct 2020). Made +public after production showed 10ms tolerance saves real disk space. + +**Why a global:** Set once at startup from a command-line flag. Read +thousands of times per second in the scrape loop. Threading a config +struct would add parameters to every hot-path function for a value +that never changes after init. + +**For review:** Approve it. This IS the right pattern for this context. + +--- + +## Summary Table + +| Category | Response in Review | Example | +|----------|-------------------|---------| +| Time pressure | Note, don't block | kvadmission goroutine | +| Better tooling | Track migration | Handle API | +| Zero-interest debt | Leave alone | Stale version TODOs | +| Different context | Approve | Hot-path globals |