---
name: codebase-analysis
description: >-
  Analyze open source repositories to extract architectural conventions,
  patterns, and design decisions. Produces convention docs and pattern files
  suitable for Gitea repos. Use when asked to "analyze a repo",
  "extract patterns from", "what conventions does X use",
  "add X to the analysis repos", "how does X do Y architecturally", or when
  adding a new project to the conventions collection. Triggers on: codebase
  analysis, repo analysis, extract conventions, architectural analysis,
  project conventions, "analyze this repo", "what patterns does X use",
  "how is X structured". Do NOT use for: code review of specific PRs (use
  pr-review), security audits (use vuln-scout), or reading a single file for
  a quick answer.
---

# Codebase Analysis

Extract architectural conventions from open source repositories through a structured multi-phase process. Output goes to Gitea convention repos.

## Naming Convention

- **Language patterns** (`go-patterns`, `elixir-patterns`): stdlib/language idioms verified from source
- **Project conventions** (`-conventions`): how a specific codebase does things, e.g., `temporal-conventions`, `cockroachdb-conventions`

## Phases

Execute in order. Each phase builds on the previous.

### Phase 1: Clone & Shape

Clone the repo on forge (`~/src/analysis/`). Measure:

```
- Total size, file count, commit count, contributor count
- Top-level directory structure
- Language breakdown (if polyglot)
```

### Phase 2: Import Hierarchy

Find the most-depended-upon packages. This reveals what the codebase considers foundational:

```bash
# Go: grep imports, extract internal packages, count
grep -rh '"/' --include="*.go" | sed ... | sort | uniq -c | sort -rn

# Elixir: grep aliases/imports
grep -rh "alias\|import\|use" --include="*.ex" | ...
```

The top 5-10 packages are where architectural decisions live.
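
The Go import count in Phase 2 can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `count_internal_imports` and its two arguments (a module path prefix and a repo root) are hypothetical, and it uses `grep -o` to pull out the quoted import paths rather than reproducing the elided `sed` step. It also matches any string literal that starts with the module prefix, not only import lines, so treat the counts as approximate.

```bash
# Hypothetical helper (not from the original skill): rank internal Go
# packages by how many times their import path appears in *.go files.
#   $1 = module path prefix, e.g. the value of the `module` line in go.mod
#   $2 = directory to scan (defaults to the current directory)
count_internal_imports() {
  local module="$1" root="${2:-.}"

  # -o prints only the matched quoted path; tr strips the quotes so that
  # identical import paths collapse into a single counted line.
  grep -rhoE --include='*.go' "\"${module}/[^\"]+\"" "$root" \
    | tr -d '"' \
    | sort | uniq -c | sort -rn | head -n 10
}
```

Calling `count_internal_imports example.com/app ~/src/analysis/app` (both values are placeholders) would print up to ten lines of the form `count path`, most-imported first.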

### Phase 3: Interface Contracts

Find the key abstractions:

```bash
# Go
grep -rn "type.*interface {" --include="*.go" | grep -v test | grep -v mock

# Elixir
grep -rn "@callback\|@behaviour\|defprotocol" --include="*.ex"
```

### Phase 4: Error Handling & Quality Markers

```bash
# Error style
grep -rh "fmt.Errorf.*%w" | wc -l           # wrapping
grep -rh "errors.New\|errors.Wrap" | wc -l  # sentinel/pkg

# Quality markers
grep -rn "TODO\|FIXME\|HACK" --include="*.go" | grep -v test | wc -l
```

Note the TODO format: does the project use `TODO(owner):`? Plain `TODO:`? Version-gated TODOs?

### Phase 5: Unique Patterns

Look for project-specific infrastructure that's NOT in stdlib:

- Custom concurrency primitives (handles, schedulers, pools)
- Testing infrastructure (custom assertions, harnesses)
- Configuration systems (dynamic config, feature flags)
- Plugin/extension systems

### Phase 6: Git Archaeology (requires full clone)

This is where the real insight lives. For each interesting pattern:

```bash
# When was it introduced?
git log --all --oneline --diff-filter=A -- path/to/pattern

# Who wrote it and why?
git log --format="%an%n%ad%n%s%n%b" -1

# What did it replace?
git log -p -S "old_pattern_name" -- relevant/path
```

### Phase 7: PR Discussions

Find the PR where a key pattern was introduced:

```bash
gh api "search/issues?q=repo:/++type:pr" \
  --jq '.items[] | {number, title}'
```

Then read:

- PR body (the "why" from the author)
- Review comments (the debate)
- Issue comments (the resolution)

Key questions to answer:

1. What problem motivated this pattern?
2. What alternatives were considered?
3. What did reviewers push back on?
4. How was it migrated in (big-bang vs gradual)?

### Phase 8: Synthesis & Output

Produce two artifacts:

**1. `analysis.md`** - Full architectural analysis:

- Repository shape and import hierarchy
- Key patterns with code examples
- PR discussion summaries
- Cross-ecosystem comparisons
- Code quality metrics

**2. `conventions.md`** - Extracted patterns with:

- Pattern name
- Location in source
- Code example
- When to use / when NOT to use
- Origin story (from PR discussion if available)

## Output Repos

Push to Gitea under `rodin/-conventions`:

```bash
GITEA_TOKEN=$(cat ~/.openclaw/credentials/gitea-rodin)

# Create repo if needed:
curl -s -X POST "https://gitea.weiker.me/api/v1/user/repos" \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "-conventions", "auto_init": true, "default_branch": "master", "license": "MIT"}'
```

## Quality Criteria

Before pushing, verify:

- Each pattern has a real code example from the repo
- "When to use / when NOT to use" sections exist
- PR discussion quotes are attributed
- Cross-ecosystem comparisons note prior art
- Files end with `` sentinel

## Cross-Ecosystem Observations

Always note when a pattern exists in multiple repos. Example: Temporal's `goro.Handle` (2021) predates CockroachDB's `stop.Handle` (2025): same solution, invented independently. These connections are the most valuable findings.

## The 4 Categories of Pattern Breaks

When you find impurity (pattern violations), classify them:

1. **Ship behavior, fix plumbing later**: time pressure, author knows it's wrong (tagged with TODO)
2. **Better tooling exposed the limitation**: old pattern worked but was invisible to profiling/tracing
3. **Removal cost > carrying cost**: technically debt but zero interest rate
4. **Context needs different pattern**: not actually a break

See `references/pattern-breaks.md` for detailed examples.

## Execution Notes

- Clone on **forge** (`host="node", node="forge"`), which has disk space
- Use full clones (not `--depth 1`) for archaeology
- `gh api` for GitHub PR lookups (authenticated on all nodes)
- One repo at a time for focused analysis
- Markdownlint all output before pushing
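
Some of the Quality Criteria can be checked mechanically before pushing. The following is a minimal sketch, not part of the original skill: the function name `check_conventions` is hypothetical, and it assumes that a "real code example" shows up as at least one fenced block and that the guidance sections contain the literal phrases "when to use" and "when NOT to use" (matched case-insensitively). It does not check attribution, prior-art notes, or the sentinel.

```bash
# Hypothetical pre-push check for a conventions file.
# Usage: check_conventions path/to/conventions.md
# Prints one line per failed check; returns 0 if all pass, 1 otherwise.
check_conventions() {
  local file="$1" status=0

  # At least one fenced code block (a line starting with three backticks).
  if ! grep -qE '^`{3}' "$file"; then
    echo "missing: code example"
    status=1
  fi

  # Both halves of the "When to use / when NOT to use" guidance.
  if ! grep -qi 'when to use' "$file"; then
    echo "missing: when to use"
    status=1
  fi
  if ! grep -qi 'when not to use' "$file"; then
    echo "missing: when NOT to use"
    status=1
  fi

  return "$status"
}
```

A passing file produces no output, so the function can gate a push with something like `check_conventions conventions.md && git push`.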