codebase-analysis/SKILL.md
Rodin 38a9d66072 feat: codebase-analysis skill
8-phase methodology for extracting architectural conventions from
open source repositories:

1. Clone & Shape — measure repo dimensions
2. Import Hierarchy — find foundational packages
3. Interface Contracts — key abstractions
4. Error Handling & Quality Markers — TODOs, style
5. Unique Patterns — project-specific infrastructure
6. Git Archaeology — trace WHY decisions were made
7. PR Discussions — read the actual debates
8. Synthesis — produce analysis.md + conventions.md

Includes pattern-breaks reference (4 categories of why
conventions are violated, with real examples from CockroachDB,
Prometheus, Temporal, Ecto, Oban).

Output: Gitea repos named <project>-conventions.
2026-04-30 11:49:40 -07:00


name: codebase-analysis
description: Analyze open source repositories to extract architectural conventions, patterns, and design decisions. Produces convention docs and pattern files suitable for Gitea repos. Use when asked to "analyze a repo", "extract patterns from", "what conventions does X use", "add X to the analysis repos", "how does X do Y architecturally", or when adding a new project to the conventions collection. Triggers on: codebase analysis, repo analysis, extract conventions, architectural analysis, project conventions, "analyze this repo", "what patterns does X use", "how is X structured". Do NOT use for: code review of specific PRs (use pr-review), security audits (use vuln-scout), or reading a single file for a quick answer.

Codebase Analysis

Extract architectural conventions from open source repositories through a structured multi-phase process. Output goes to Gitea convention repos.

Naming Convention

  • Language patterns (go-patterns, elixir-patterns): stdlib/language idioms verified from source
  • Project conventions (<project>-conventions): how a specific codebase does things — e.g., temporal-conventions, cockroachdb-conventions

Phases

Execute in order. Each phase builds on the previous.

Phase 1: Clone & Shape

Clone the repo on forge (~/src/analysis/<name>). Measure:

- Total size, file count, commit count, contributor count
- Top-level directory structure
- Language breakdown (if polyglot)
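
The measurements above could be collected with a small script like the following. This is a minimal sketch, not part of the skill: the function names `language_breakdown` and `repo_shape` are hypothetical, and counting files per extension is only a rough stand-in for a real language breakdown.

```shell
# Sketch: measure the shape of a cloned repository.
# Both function names are hypothetical helpers, not part of the skill.

# Count files per extension, most common first (rough language breakdown).
# Skips the .git directory and files without a dot in their name.
language_breakdown() {
  find "${1:-.}" -type f ! -path '*/.git/*' -name '*.*' \
    | sed 's/.*\.//' \
    | sort | uniq -c | sort -rn
}

# Print size, tracked file count, commit count, and contributor count.
# Contributors are counted as distinct author names on HEAD's history.
repo_shape() {
  echo "size:         $(du -sh "${1:-.}" | cut -f1)"
  echo "files:        $(git -C "${1:-.}" ls-files | wc -l)"
  echo "commits:      $(git -C "${1:-.}" rev-list --count HEAD)"
  echo "contributors: $(git -C "${1:-.}" shortlog -sn HEAD | wc -l)"
}
```

Running `repo_shape ~/src/analysis/<name>` followed by `language_breakdown ~/src/analysis/<name> | head` gives a first picture before any deeper phase.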

Phase 2: Import Hierarchy

Find the most-depended-upon packages. This reveals what the codebase considers foundational:

# Go: grep imports, extract internal packages, count
grep -rh '"<module>/' --include="*.go" | sed ... | sort | uniq -c | sort -rn

# Elixir: grep aliases/imports
grep -rh "alias\|import\|use" --include="*.ex" | ...

The top 5-10 packages are where architectural decisions live.
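
One way to get a complete ranking for Go is sketched below. It is an assumption, not a reconstruction of the elided `sed` step above: it uses `grep -o` to pull out the quoted import path directly. The helper name `rank_go_imports` is hypothetical, and the module path is whatever `go.mod` declares.

```shell
# Sketch: rank internal Go packages by how often they are imported.
# rank_go_imports is a hypothetical helper name.
#   $1 = module path from go.mod, $2 = repo root (default: current directory)
rank_go_imports() {
  # -o prints only the matched quoted path; -h drops file names.
  # Dots in the module path act as regex wildcards, which can only overmatch.
  # Any string literal that looks like an import path is counted as well.
  grep -rhoE --include='*.go' "\"$1/[^\"]+\"" "${2:-.}" \
    | tr -d '"' \
    | sort | uniq -c | sort -rn
}
```

Piping the result through `head -n 10` yields the short list of candidate foundational packages.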

Phase 3: Interface Contracts

Find the key abstractions:

# Go
grep -rn "type.*interface {" --include="*.go" | grep -v test | grep -v mock

# Elixir
grep -rn "@callback\|@behaviour\|defprotocol" --include="*.ex"

Phase 4: Error Handling & Quality Markers

# Error style
grep -rh 'fmt\.Errorf.*%w' --include="*.go" . | wc -l          # wrapping
grep -rh 'errors\.New\|errors\.Wrap' --include="*.go" . | wc -l  # sentinel/pkg

# Quality markers
grep -rn "TODO\|FIXME\|HACK" --include="*.go" | grep -v test | wc -l

Note TODO format — does the project use TODO(owner):? Plain TODO:? Version-gated TODOs?
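
The TODO format question can be answered with a quick tally. The sketch below assumes Go sources and only distinguishes `TODO(owner):` from plain `TODO:`; version-gated TODOs would need a project-specific pattern. The helper name `todo_styles` is hypothetical.

```shell
# Sketch: tally TODO styles in non-test Go files.
# todo_styles is a hypothetical helper name; $1 = repo root (default: .)
todo_styles() {
  # Extract just the TODO marker, with optional (owner) and optional colon.
  # --exclude comes after --include so that *_test.go files are skipped.
  todos=$(grep -rhoE --include='*.go' --exclude='*_test.go' \
            'TODO(\([^)]*\))?:?' "${1:-.}" || true)
  owned=$(printf '%s\n' "$todos" | grep -c '^TODO(' || true)
  plain=$(printf '%s\n' "$todos" | grep -c '^TODO:' || true)
  echo "TODO(owner): $owned"
  echo "TODO: $plain"
}
```

A strongly skewed ratio suggests the project enforces one format; a mixed result usually means the convention changed over time and is worth tracing in Phase 6.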

Phase 5: Unique Patterns

Look for project-specific infrastructure that's NOT in stdlib:

  • Custom concurrency primitives (handles, schedulers, pools)
  • Testing infrastructure (custom assertions, harnesses)
  • Configuration systems (dynamic config, feature flags)
  • Plugin/extension systems

Phase 6: Git Archaeology (requires full clone)

This is where the real insight lives. For each interesting pattern:

# When was it introduced?
git log --all --oneline --diff-filter=A -- path/to/pattern

# Who wrote it and why?
git log --format="%an%n%ad%n%s%n%b" <commit> -1

# What did it replace?
git log -p -S "old_pattern_name" -- relevant/path

Phase 7: PR Discussions

Find the PR where a key pattern was introduced:

gh api "search/issues?q=repo:<org>/<repo>+<search>+type:pr" \
  --jq '.items[] | {number, title}'

Then read:

  • PR body (the "why" from the author)
  • Review comments (the debate)
  • Issue comments (the resolution)
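
The three sources above live at different GitHub REST endpoints. The sketch below only prints the endpoint paths so they can be fed to `gh api`. The helper name `pr_discussion_endpoints` is hypothetical, and the org, repo, and PR number are supplied by the caller.

```shell
# Sketch: list the GitHub REST endpoints holding each part of a PR discussion.
# pr_discussion_endpoints is a hypothetical helper name.
#   $1 = org, $2 = repo, $3 = PR number
pr_discussion_endpoints() {
  echo "repos/$1/$2/pulls/$3"            # PR body
  echo "repos/$1/$2/pulls/$3/comments"   # inline review comments
  echo "repos/$1/$2/issues/$3/comments"  # conversation comments
}

# Usage (needs an authenticated gh):
#   pr_discussion_endpoints <org> <repo> <number> | while read -r ep; do
#     gh api --paginate "$ep"
#   done
```

Keeping the endpoints separate makes it easier to attribute quotes later, since review comments and conversation comments come back as different record types.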

Key questions to answer:

  1. What problem motivated this pattern?
  2. What alternatives were considered?
  3. What did reviewers push back on?
  4. How was it migrated in (big-bang vs gradual)?

Phase 8: Synthesis & Output

Produce two artifacts:

1. analysis.md — Full architectural analysis:

  • Repository shape and import hierarchy
  • Key patterns with code examples
  • PR discussion summaries
  • Cross-ecosystem comparisons
  • Code quality metrics

2. conventions.md — Extracted patterns with:

  • Pattern name
  • Location in source
  • Code example
  • When to use / when NOT to use
  • Origin story (from PR discussion if available)

Output Repos

Push to Gitea under rodin/<project>-conventions:

GITEA_TOKEN=$(cat ~/.openclaw/credentials/gitea-rodin)
# Create repo if needed:
curl -s -X POST "https://gitea.weiker.me/api/v1/user/repos" \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "<project>-conventions", "auto_init": true, "default_branch": "master", "license": "MIT"}'

Quality Criteria

Before pushing, verify:

  • Each pattern has a real code example from the repo
  • "When to use / when NOT to use" sections exist
  • PR discussion quotes are attributed
  • Cross-ecosystem comparisons note prior art
  • Files end with <!-- PATTERN_COMPLETE --> sentinel
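
The sentinel check is easy to automate before pushing. The sketch below assumes that "files" means every `.md` file under the output directory and that the sentinel must be the last non-empty line; both are assumptions, and the helper name `missing_sentinel` is hypothetical.

```shell
# Sketch: report Markdown files that do not end with the sentinel.
# missing_sentinel is a hypothetical helper name; $1 = directory (default: .)
missing_sentinel() {
  find "${1:-.}" -type f -name '*.md' | sort | while IFS= read -r f; do
    # Take the last non-empty line and compare it to the sentinel exactly.
    last=$(grep -v '^[[:space:]]*$' "$f" | tail -n 1)
    [ "$last" = '<!-- PATTERN_COMPLETE -->' ] || echo "$f"
  done
}
```

Empty output means every file passed; any path printed is a file that was likely truncated or left unfinished.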

Cross-Ecosystem Observations

Always note when a pattern exists in multiple repos. Example: Temporal's goro.Handle (2021) predates CockroachDB's stop.Handle (2025) — same solution, invented independently. These connections are the most valuable findings.

The 4 Categories of Pattern Breaks

When you find a pattern violation (an impurity), classify it:

  1. Ship behavior, fix plumbing later — time pressure, author knows it's wrong (tagged with TODO)
  2. Better tooling exposed the limitation — old pattern worked but was invisible to profiling/tracing
  3. Removal cost > carrying cost — technically debt but zero interest rate
  4. Context needs different pattern — not actually a break

See references/pattern-breaks.md for detailed examples.

Execution Notes

  • Clone on forge (host="node", node="forge") — has disk space
  • Use full clones (not --depth 1) for archaeology
  • gh api for GitHub PR lookups (authenticated on all nodes)
  • One repo at a time for focused analysis
  • Markdownlint all output before pushing