feat: repeatable mechanical method for patterns mode

5 steps: Quantify → Extract one → Decision tree → Cross-refs → Hyperlinks. Delegation strategy (per-entry, not per-file). Discovery greps for Go, Elixir, Rust, Python. Hyperlink scripts per language.
2026-04-30 14:46:41 -07:00
parent 0c51a9334f
commit bd9790caa1
2 changed files with 261 additions and 17 deletions
@@ -242,25 +242,106 @@ Output: `<language>-patterns` or `<ecosystem>-patterns` repo.
 codebase?" Filter everything through: "If I were writing new code
 in this language/ecosystem, what rules does this source teach me?"

-**This is iterative, not one-shot.** Keep extracting until you've
-identified ALL patterns the source demonstrates. A first pass finds
-the obvious ones. Second pass greps for variations and edge cases.
-Third pass finds the patterns that break. You're done when scanning
-the source no longer reveals new rules.
+**This is iterative, not one-shot.** The method produces quality
+through decomposition, not through asking one agent to "write a
+good file." Each step is bounded, mechanical, and verifiable.

-**Process:**
-1. **Discovery pass** — scan the source by topic area, identify every
-   distinct pattern (aim for 15-30+ per topic in a large codebase)
-2. **Deepening pass** — for each pattern, grep for 5-10 real usages
-   across the codebase. Note variations. Find the best example.
-3. **Edge case pass** — find where each pattern DOESN'T apply.
-   Grep for violations — are they bugs, or legitimate exceptions?
-4. **Cross-reference pass** — which patterns interact? Which ones
-   conflict? Document the decision framework for choosing between
-   competing patterns.
+### The Repeatable Method

-Repeat until scanning the source yields no new patterns. A language
-stdlib should produce 50-200+ patterns across all topics.
+**Step 1: Quantify** (5 min per topic)
+
+For each topic area, run grep commands that count how often each
+candidate pattern appears. The goal is COUNTS, not examples yet.
+
+```bash
+# Example: error handling in Go (observed counts shown as comments)
+grep -rn "^var Err" --include="*.go" . | grep -v test | wc -l         # → 55
+grep -rn "fmt\.Errorf.*%w" --include="*.go" . | grep -v test | wc -l  # → 115
+grep -rn "errors\.Is\|errors\.As" --include="*.go" . | wc -l          # → 212
+```
+
+Output: a numbered list of pattern names + counts. This IS the
+table of contents for that topic file.
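
The counting step can also be scripted so that the numbered list falls out directly. What follows is a minimal Python sketch under stated assumptions, not tooling from the method itself: the names `count_matches` and `quantify` are hypothetical, patterns use Python `re` syntax (so the grep alternation `errors\.Is\|errors\.As` becomes `errors\.(Is|As)`), and the `exclude` filter imitates `grep -v test` by dropping hits whose relative path or line content contains the substring.

```python
import re
from pathlib import Path


def count_matches(root, regex, suffix=".go", exclude="test"):
    """Count matching lines under root, roughly `grep -rn REGEX | grep -v test | wc -l`."""
    pattern = re.compile(regex)
    total = 0
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        # Use the path relative to root so the root's own name is never filtered.
        rel = str(path.relative_to(root))
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if not pattern.search(line):
                continue
            # Imitate `grep -v test`: drop hits whose path or content mentions it.
            if exclude and (exclude in rel or exclude in line):
                continue
            total += 1
    return total


def quantify(root, patterns, suffix=".go"):
    """Return (name, count) pairs sorted by descending count."""
    counts = [(name, count_matches(root, rx, suffix)) for name, rx in patterns.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```

Calling `quantify(".", {"sentinel errors": r"^var Err", "error wrapping": r"fmt\.Errorf.*%w"})` and printing the pairs with `enumerate(..., start=1)` would give the numbered list of pattern names and counts that Step 1 asks for.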
+
+**Step 2: Extract one** (5-10 min per pattern)
+
+For EACH pattern from the list, in order:
+1. Find the best example (grep → pick the clearest one)
+2. Read 10 lines of surrounding context (understand WHY)
+3. Write one pattern entry (40-80 lines, all required sections)
+4. Move to the next pattern
+
+The key constraint: **write one pattern entry completely before
+starting the next.** Never read all patterns then write all entries.
+This prevents context exhaustion and ensures each entry is complete.
+
+**Step 3: Decision tree** (5 min per topic)
+
+After all patterns are written, add a decision tree at the end.
+Format: "If X, use pattern A. If Y, use pattern B."
+
+**Step 4: Cross-references** (2 min per topic)
+
+Add `See also:` links to related topic files.
+
+**Step 5: Hyperlinks** (mechanical, scriptable)
+
+Convert all source references to clickable permalinks:
+```bash
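+HEAD=$(git rev-parse HEAD)
+BASE="https://github.com/OWNER/REPO/blob/${HEAD}"
+# path/file\.ext is a placeholder for one source path; repeat per path.
+# GNU sed syntax; BSD/macOS sed needs `-i ''` instead of `-i`.
+sed -i -E "s|\`(path/file\.ext):([0-9]+)\`|[\1#L\2](${BASE}/\1#L\2)|g" file.md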
+```
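
The sed command above rewrites references to one literal path at a time. As a hedged alternative, the same substitution can be sketched in Python so that any backticked `path:line` reference is converted in one pass. The function name `link_source_refs` and the regex for what counts as a path are assumptions for illustration; the output format mirrors the sed replacement (`[path#Lline](BASE/path#Lline)`).

```python
import re

# Matches a backticked reference such as `pkg/errors/wrap.go:42`.
# Assumption: paths contain only word characters, dots, slashes and dashes.
REF = re.compile(r"`([\w./-]+\.\w+):(\d+)`")


def link_source_refs(markdown, base_url):
    """Replace `path:line` references with links to base_url/path#Lline."""
    base = base_url.rstrip("/")

    def to_link(match):
        path, line = match.group(1), match.group(2)
        return f"[{path}#L{line}]({base}/{path}#L{line})"

    return REF.sub(to_link, markdown)
```

Because the generated links no longer contain backticks, running the function a second time over the same text leaves it unchanged.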
+
+### Delegation Strategy
+
+When using sub-agents:
+
+- **DO:** One agent per pattern entry (bounded: read one, write one)
+- **DO:** Give the agent the grep output as input (it doesn't discover,
+  it deepens a known pattern)
+- **DO:** Include one complete example entry in the prompt as the
+  quality reference
+- **DON'T:** Ask one agent to write an entire topic file
+- **DON'T:** Ask agents to "discover patterns" (they'll find 5 obvious
+  ones and miss 10 important ones)
+- **DON'T:** Let agents choose their own structure (give them the
+  template)
+
+**Template for sub-agent task:**
+```
+Write pattern entry for: [PATTERN NAME]
+Source repo: [REPO] at commit [SHA]
+Access: [SSH command to get to the source]
+Permalink base: [URL]
+Grep that found this: [the grep command + sample output]
+Reference quality: [paste ONE complete pattern entry as example]
+Write to: [output path]
+```
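
If the task prompts are generated rather than hand-written, the template can be filled programmatically. This is only a sketch: `TASK_TEMPLATE` copies the field labels from the template above, while the keyword names (`pattern`, `repo`, `sha`, and so on) and the function `render_task` are hypothetical.

```python
TASK_TEMPLATE = """\
Write pattern entry for: {pattern}
Source repo: {repo} at commit {sha}
Access: {access}
Permalink base: {permalink_base}
Grep that found this: {grep}
Reference quality: {reference_entry}
Write to: {output_path}
"""


def render_task(**fields):
    """Fill the sub-agent template; a missing field raises KeyError."""
    return TASK_TEMPLATE.format(**fields)
```

Failing loudly on a missing field keeps an agent from receiving a prompt with a blank slot, which matters because each agent sees only its own task.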
+
+### Parallelism
+
+- Step 1 (quantify): run for ALL topics in parallel (just grep)
+- Step 2 (extract): run per-pattern entries in parallel (max 5 at a time, one agent per entry)
+- Steps 3-5: sequential (need all entries to exist first)
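
The cap of five concurrent entries in Step 2 can be enforced with a bounded worker pool. The sketch below assumes a hypothetical callable `write_entry(pattern)` that dispatches one sub-agent and returns when its entry is written; how that dispatch happens is outside what this document specifies.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all(patterns, write_entry, max_workers=5):
    """Run write_entry(pattern) for every pattern, at most max_workers at a time.

    Results come back in the same order as the Step 1 list, and the call
    returns only after every entry has finished.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(write_entry, patterns))
```

Since `extract_all` blocks until every entry is done, Steps 3-5 can simply follow it in sequence.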
+
+### Done Criteria
+
+A topic file is done when:
+- [ ] Every pattern from Step 1's list has an entry
+- [ ] Each entry has ALL required sections: source, why, when to use
+  (with before/after), and when NOT to use (with over-application)
+- [ ] Decision tree exists at the end
+- [ ] All source refs are hyperlinked
+- [ ] PATTERN_COMPLETE sentinel at EOF
+- [ ] File is 500-1000 lines (if shorter, entries are too shallow)
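
Several of the topic-file criteria are mechanical enough to check with a script. The following sketch covers only those: entry presence (by a plain substring match on the pattern name, which is an assumption about how entries are titled), leftover backticked `path:line` references, the sentinel on the last non-blank line, and the line count. The function name `topic_file_problems` is hypothetical, and the required-sections and decision-tree checks are left out because their exact format is not specified here.

```python
import re

SENTINEL = "PATTERN_COMPLETE"
# A backticked `path:line` reference that has not been turned into a link.
UNLINKED_REF = re.compile(r"`[\w./-]+\.\w+:\d+`")


def topic_file_problems(text, expected_patterns, min_lines=500, max_lines=1000):
    """Return the unmet done criteria for one topic file (empty list means done)."""
    lines = text.rstrip().splitlines()
    problems = [f"missing entry: {name}" for name in expected_patterns if name not in text]
    unlinked = UNLINKED_REF.findall(text)
    if unlinked:
        problems.append(f"{len(unlinked)} unlinked source ref(s)")
    if not lines or SENTINEL not in lines[-1]:
        problems.append("no PATTERN_COMPLETE sentinel at EOF")
    if not min_lines <= len(lines) <= max_lines:
        problems.append(f"{len(lines)} lines, expected {min_lines}-{max_lines}")
    return problems
```

Passing the Step 1 pattern names as `expected_patterns` ties the check back to the list that defined the file's table of contents.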
+
+A language is done when:
+- [ ] 8-12 topic files exist
+- [ ] Each topic has 10-15+ patterns
+- [ ] Total is 5,000-10,000+ lines
+- [ ] No grep scan reveals patterns not yet documented
+- [ ] smells.md covers anti-patterns found in the source

 **Output structure — one file per topic:**