feat: repeatable mechanical method for patterns mode

5 steps: Quantify → Extract one → Decision tree → Cross-refs → Hyperlinks.
Delegation strategy (per-entry, not per-file).
Discovery greps for Go, Elixir, Rust, Python.
Hyperlink scripts per language.
Rodin
2026-04-30 14:46:41 -07:00
parent 0c51a9334f
commit bd9790caa1
2 changed files with 261 additions and 17 deletions
+98 -17
@@ -242,25 +242,106 @@ Output: `<language>-patterns` or `<ecosystem>-patterns` repo.
codebase?" Filter everything through: "If I were writing new code
in this language/ecosystem, what rules does this source teach me?"
**This is iterative, not one-shot.** Keep extracting until you've
identified ALL patterns the source demonstrates. A first pass finds
the obvious ones. Second pass greps for variations and edge cases.
Third pass finds the patterns that break. You're done when scanning
the source no longer reveals new rules.
**This is iterative, not one-shot.** The method produces quality
through decomposition, not through asking one agent to "write a
good file." Each step is bounded, mechanical, and verifiable.
**Process:**
1. **Discovery pass** — scan the source by topic area, identify every
distinct pattern (aim for 15-30+ per topic in a large codebase)
2. **Deepening pass** — for each pattern, grep for 5-10 real usages
across the codebase. Note variations. Find the best example.
3. **Edge case pass** — find where each pattern DOESN'T apply.
Grep for violations — are they bugs, or legitimate exceptions?
4. **Cross-reference pass** — which patterns interact? Which ones
conflict? Document the decision framework for choosing between
competing patterns.
### The Repeatable Method
Repeat until scanning the source yields no new patterns. A language
stdlib should produce 50-200+ patterns across all topics.
**Step 1: Quantify** (5 min per topic)
For each topic area, run frequency grep commands to find patterns.
The goal is COUNTS — how often does this pattern appear?
```
# Example: error handling in Go (run from the repo root).
# The trailing comments are the counts observed in the example source.
grep -rn "^var Err" --include="*.go" . | grep -v test | wc -l          # → 55
grep -rn "fmt\.Errorf.*%w" --include="*.go" . | grep -v test | wc -l   # → 115
grep -rn "errors\.Is\|errors\.As" --include="*.go" . | wc -l           # → 212
```
Output: a numbered list of pattern names + counts. This IS the
table of contents for that topic file.
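The quantify step can also be scripted so the numbered list falls out directly. The following is a minimal sketch, not part of the method itself: the `PATTERNS` table, the function names, and the `_test.go` filename filter are assumptions made for illustration. The filter is also narrower than `grep -v test`, so the counts may differ slightly from the grep pipeline above.

```python
import re
from pathlib import Path

# Hypothetical pattern table: display name -> regex (names are illustrative).
PATTERNS = {
    "Sentinel errors (var Err...)": r"^var Err",
    "Error wrapping (fmt.Errorf with %w)": r"fmt\.Errorf.*%w",
    "errors.Is / errors.As checks": r"errors\.(Is|As)",
}


def count_patterns(root, patterns=PATTERNS, suffix=".go", skip_tests=True):
    """Count matching lines per pattern across source files under root."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = {name: 0 for name in patterns}
    for path in Path(root).rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        # Assumption: test files follow the Go convention *_test.go.
        if skip_tests and path.name.endswith(f"_test{suffix}"):
            continue
        for line in path.read_text(errors="replace").splitlines():
            for name, rx in compiled.items():
                if rx.search(line):
                    counts[name] += 1
    return counts


def table_of_contents(counts):
    """Render counts as a numbered list, most frequent pattern first."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{i}. {name} ({n})" for i, (name, n) in enumerate(ranked, 1)]
```

Calling `table_of_contents(count_patterns("."))` from the repo root yields the list that seeds the topic file.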
**Step 2: Extract one** (5-10 min per pattern)
For EACH pattern from the list, in order:
1. Find the best example (grep → pick the clearest one)
2. Read 10 lines of surrounding context (understand WHY)
3. Write one pattern entry (40-80 lines, all required sections)
4. Move to the next pattern
The key constraint: **write one pattern entry completely before
starting the next.** Never read all patterns then write all entries.
This prevents context exhaustion and ensures each entry is complete.
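Reading a fixed window around the chosen match keeps each extraction bounded. A small sketch of that idea, assuming the file is already loaded as a list of lines; `context_window` and its `radius` parameter are hypothetical names, and a radius of 5 gives roughly the 10 surrounding lines mentioned above:

```python
def context_window(lines, match_index, radius=5):
    """Return (first_line_number, snippet) around a 0-based match index.

    The window is clipped at the start and end of the file, and the
    returned line number is 1-based so it can be used in a source ref.
    """
    start = max(0, match_index - radius)
    end = min(len(lines), match_index + radius + 1)
    return start + 1, lines[start:end]
```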
**Step 3: Decision tree** (5 min per topic)
After all patterns are written, add a decision tree at the end.
Format: "If X, use pattern A. If Y, use pattern B."
**Step 4: Cross-references** (2 min per topic)
Add `See also:` links to related topic files.
**Step 5: Hyperlinks** (mechanical, scriptable)
Convert all source references to clickable permalinks:
```bash
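# Pin links to the current commit so they stay valid as the source changes.
HEAD=$(git rev-parse HEAD)
BASE="https://github.com/OWNER/REPO/blob/${HEAD}"
# Rewrite `path/file.ext:LINE` into [path/file.ext#LLINE](permalink).
# Replace path/file\.ext with the real path pattern; -i as written assumes GNU sed.
sed -i -E "s|\`(path/file\.ext):([0-9]+)\`|[\1#L\2](${BASE}/\1#L\2)|g" file.md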
```
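The same rewrite can be done without hard-coding one path per `sed` call. This is a sketch under assumptions: the `SOURCE_REF` regex (any backticked `path.ext:LINE`) and the function name are illustrative, and the regex may need tightening for paths with unusual characters.

```python
import re

# Assumption: source refs look like `some/path.ext:123` inside backticks.
SOURCE_REF = re.compile(r"`([\w./-]+\.\w+):(\d+)`")


def hyperlink_refs(markdown, base_url):
    """Replace backticked path:line refs with markdown permalinks."""

    def to_link(match):
        path, line = match.group(1), match.group(2)
        return f"[{path}#L{line}]({base_url}/{path}#L{line})"

    return SOURCE_REF.sub(to_link, markdown)
```

Because a converted reference is no longer wrapped in backticks, running the function a second time leaves the text unchanged.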
### Delegation Strategy
When using sub-agents:
- **DO:** One agent per pattern entry (bounded: read one, write one)
- **DO:** Give the agent the grep output as input (it doesn't discover,
it deepens a known pattern)
- **DO:** Include one complete example entry in the prompt as the
quality reference
- **DON'T:** Ask one agent to write an entire topic file
- **DON'T:** Ask agents to "discover patterns" (they'll find 5 obvious
ones and miss 10 important ones)
- **DON'T:** Let agents choose their own structure (give them the
template)
**Template for sub-agent task:**
```
Write pattern entry for: [PATTERN NAME]
Source repo: [REPO] at commit [SHA]
Access: [SSH command to get to the source]
Permalink base: [URL]
Grep that found this: [the grep command + sample output]
Reference quality: [paste ONE complete pattern entry as example]
Write to: [output path]
```
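Filling the template programmatically makes it harder to send a sub-agent an incomplete task. A minimal sketch using the standard library's `string.Template`; the placeholder names (`pattern`, `repo`, `sha`, and so on) are assumptions chosen to mirror the fields above.

```python
from string import Template

# Placeholder names are illustrative; they mirror the template fields.
TASK_TEMPLATE = Template(
    "Write pattern entry for: $pattern\n"
    "Source repo: $repo at commit $sha\n"
    "Access: $access\n"
    "Permalink base: $permalink_base\n"
    "Grep that found this: $grep\n"
    "Reference quality: $reference_entry\n"
    "Write to: $output_path\n"
)


def build_task(**fields):
    """Render one sub-agent task; raises KeyError if any field is missing."""
    return TASK_TEMPLATE.substitute(**fields)
```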
### Parallelism
- Step 1 (quantify): run for ALL topics in parallel (just grep)
- Step 2 (extract): run per-pattern entries in parallel (max 5)
- Steps 3-5: sequential (need all entries to exist first)
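The cap of five concurrent entries in Step 2 maps naturally onto a bounded worker pool. A sketch, assuming a hypothetical `write_entry` callable that handles one pattern (for example, by dispatching one sub-agent task):

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL_ENTRIES = 5  # the "max 5" cap for Step 2


def extract_all(patterns, write_entry, max_workers=MAX_PARALLEL_ENTRIES):
    """Run write_entry(pattern) for every pattern, a few at a time.

    Results come back in the same order as the input list, so they can
    be assembled into the topic file before the sequential steps run.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(write_entry, patterns))
```

Steps 3-5 would then run after `extract_all` returns, since they need every entry to exist.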
### Done Criteria
A topic file is done when:
- [ ] Every pattern from Step 1's list has an entry
- [ ] Each entry has ALL required sections (source, why, when to use
with a before/after example, when NOT to use with an over-application example)
- [ ] Decision tree exists at the end
- [ ] All source refs are hyperlinked
- [ ] PATTERN_COMPLETE sentinel at EOF
- [ ] File is 500-1000 lines (if shorter, entries are too shallow)
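Several of the topic-file criteria are mechanical enough to check with a script. The sketch below covers only those; it assumes each entry mentions its pattern name verbatim and that unlinked refs look like a backticked `path.ext:LINE`. The function name and both assumptions are illustrative, and required sections and the decision tree still need a human or agent review.

```python
import re

SENTINEL = "PATTERN_COMPLETE"
# Assumption: an unlinked source ref is a backticked path.ext:LINE.
BARE_REF = re.compile(r"`[\w./-]+\.\w+:\d+`")


def check_topic_file(text, expected_patterns):
    """Return a list of unmet criteria; an empty list means these checks pass."""
    problems = []
    lines = text.splitlines()
    for name in expected_patterns:
        if name not in text:
            problems.append(f"missing entry: {name}")
    if BARE_REF.search(text):
        problems.append("unlinked source reference")
    # The sentinel must sit on the last non-blank line of the file.
    last = next((ln for ln in reversed(lines) if ln.strip()), "")
    if SENTINEL not in last:
        problems.append("missing PATTERN_COMPLETE sentinel at EOF")
    if not 500 <= len(lines) <= 1000:
        problems.append(f"line count {len(lines)} outside 500-1000")
    return problems
```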
A language is done when:
- [ ] 8-12 topic files exist
- [ ] Each topic has 10-15+ patterns
- [ ] Total is 5,000-10,000+ lines
- [ ] No grep scan reveals patterns not yet documented
- [ ] smells.md covers anti-patterns found in the source
**Output structure — one file per topic:**