- New --doc-map flag (DOC_MAP_FILE env var): path to YAML config mapping source path globs to governing design docs - New --doc-map-max-bytes flag (DOC_MAP_MAX_BYTES env var): cap on total injected doc content, default 100KB - review/docmap.go: DocMapConfig parsing, glob matching with ** support, doc loading via VCS with directory expansion and size guard - budget.Sections: new DesignDocs field, trimmed after conventions - budget.buildResult: injects DesignDocs under ## Design Documents heading - action.yml: doc-map and doc-map-max-bytes inputs wired to env vars - CHANGELOG.md: created with unreleased entry - Tests: ParseDocMapConfig, MatchDocs, globMatch, LoadMatchingDocs
10 KiB
Plan: Issue #137 — doc-map input for path-scoped doc injection
Problem
review-bot currently injects context via patterns-repo (external VCS repos) and conventions-file (a single file from the reviewed repo). There is no mechanism to inject local repo documentation files scoped to the paths changed in a PR.
First consumer: grgl/gargoyle#778 wants a "doc adherence" reviewer that checks code against the module's governing design doc, without injecting every doc in the tree.
Constraints
- Must work with existing
budget.Fitarchitecture (docs go intoSystemBasesection, never trimmed — or added as a new section belowConventions) - Must not fail the review if doc files are missing (warn + skip)
- Context guard: default 100KB total injected doc content (configurable)
- YAML parsing must use
github.com/goccy/go-yaml(the only approved YAML library) - No new third-party dependencies (Go standard library + approved packages only)
- Path security: doc files must be read via VCS API (not local filesystem), so they are always fetched from the PR head ref within the repo workspace — same path used by
conventions-fileloading
Wait — re-reading the issue: the issue says "local repo files". In the CI action context, the action runner has the repo checked out. The design doc says "read each doc file from the local checkout". But review-bot has no local checkout — it runs as a binary and reads files via VCS API. Let me reconcile:
conventions-fileusesvcs.GetFileContent(fetches from VCS API, default branch)- The doc-map docs should also be read via VCS API
- The doc-map config file itself (
doc-mapinput) is a local file in the workspace (likesystem-prompt-file) - The doc paths inside the config ARE relative to the repo root, to be fetched via VCS API
Conclusion: The doc-map YAML file is read from local filesystem (like system-prompt-file). The doc files listed inside are fetched from the VCS API.
Actually, re-reading more carefully: "Read each doc file (or all .md files under a directory) from the local checkout". But review-bot doesn't have a local checkout. Since system-prompt-file and conventions-file are both read locally, I should follow the same approach consistently.
Final decision: The doc-map config file is local (passed via --doc-map flag, read with os.ReadFile after workspace validation). The listed doc paths (and directory expansion) are read via VCS GetFileContent / GetAllFilesInPath — matching the conventions-file pattern for consistency, and enabling it to work on any branch (not just the checked-out one).
Proposed Approach
New files
review/docmap.go—DocMaptype, YAML parsing, glob matching, doc loading logicreview/docmap_test.go— unit tests
Modified files
cmd/review-bot/main.go— add--doc-mapflag, wire up in Step 6c.gitea/actions/review/action.yml— adddoc-mapinput, pass asDOC_MAP_FILEenv varbudget/budget.go— addDesignDocssection (betweenSystemBase/ConventionsandDiff)CHANGELOG.md— update
DocMap types (review/docmap.go)
// DocMapping maps a set of path globs to doc files/directories.
type DocMapping struct {
Paths []string `yaml:"paths"` // glob patterns
Docs []string `yaml:"docs"` // file paths or directories
}
// DocMapConfig is the top-level YAML structure.
type DocMapConfig struct {
Mappings []DocMapping `yaml:"mappings"`
}
// DocMapOptions controls doc loading behavior.
type DocMapOptions struct {
MaxBytes int // default 100*1024
}
Key functions
// ParseDocMapConfig parses the YAML config file from a local path.
func ParseDocMapConfig(path string) (*DocMapConfig, error)
// MatchDocs returns deduplicated doc paths for the given changed files.
func MatchDocs(cfg *DocMapConfig, changedFiles []string) []string
// LoadMatchingDocs fetches doc content via VCS, respecting size limit.
// Returns (content, error). Missing files are warned and skipped.
func LoadMatchingDocs(ctx context.Context, fetcher DocFetcher, owner, repo string, docPaths []string, opts DocMapOptions) (string, error)
DocFetcher interface
// DocFetcher fetches files and directory listings from VCS.
// Subset of vcsClient, defined here to keep review package free of cmd-level deps.
type DocFetcher interface {
GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error)
GetAllFilesInPath(ctx context.Context, owner, repo, path string) (map[string]string, error)
}
Glob matching
Use path.Match from the Go standard library. It matches patterns like lib/gargoyle/engine/signal_risk/**. The ** glob is NOT natively supported by path.Match, so we need either:
a) Use filepath.Match which also doesn't support **
b) Implement simple ** support: ** matches any number of path segments
Decision: Implement minimal ** support: split path on /, split pattern on /, match each segment with filepath.Match. When a pattern segment is **, it consumes any number of remaining segments. This covers the primary use case without a new dependency.
Budget integration
Add DesignDocs field to budget.Sections. Position: after Conventions, before FileContext (trimming order: Patterns → Conventions → DesignDocs → FileContext → Diff). Inject under ## Design Documents heading in system prompt.
Context size guard
Accumulate doc content bytes. If total would exceed MaxBytes, truncate last doc with a notice and stop loading more.
State/Data Model
DocMapConfig
└── []DocMapping
├── Paths []string (glob patterns against changed file paths)
└── Docs []string (local doc paths or directories in target repo)
Flow:
1. Parse doc-map YAML → DocMapConfig
2. GetPullRequestFiles → []string of changed paths
3. MatchDocs(cfg, changedPaths) → deduplicated []string doc paths
4. For each doc path:
- If path ends with "/" or is a "directory" → GetAllFilesInPath, filter .md
- Otherwise → GetFileContent
5. Accumulate, respect size limit
6. Inject into system prompt
Error Cases
| Situation | Behavior |
|---|---|
--doc-map file not found |
Fatal error (like --system-prompt-file) |
--doc-map file invalid YAML |
Fatal error with descriptive message |
| Unknown keys in YAML | Log warning, continue |
| Doc file not found in VCS | Log warning, skip |
| Doc directory empty | Log debug, skip |
| Total size exceeds limit | Truncate with notice, log warning |
| No changed paths match | No docs injected, review runs normally |
paths list empty in a mapping |
Skip that mapping (no match possible) |
docs list empty in a mapping |
Skip that mapping (nothing to inject) |
Edge Cases
- Empty
mappingslist → no docs injected, no error - Same doc matched by multiple mappings → deduplicate by path
- Directory with no
.mdfiles → skip silently (log debug) - Very large single doc file → counts against limit, may truncate
- Symlinks/special files in VCS → GetFileContent handles or errors (warn + skip)
doc-mappath outside workspace → fatal error (validateWorkspacePath)- Directory path specified as
docsentry without trailing/→ check if it's a directory via ListContents or GetAllFilesInPath; if error, try GetFileContent
Testing Strategy
Unit tests (review/docmap_test.go)
- ParseDocMapConfig — valid YAML, invalid YAML, unknown keys (warning), empty file
- MatchDocs — no match, single match, multi-match, deduplication,
**glob, exact match - LoadMatchingDocs — with mock DocFetcher:
- file path → content returned
- missing file → warned + skipped
- directory path → expands .md files
- directory with no .md → empty
- size guard → truncation with notice
- deduplication in combined results
Integration coverage
The existing main_test.go tests cover flag wiring — add a test for --doc-map flag parsing and workspace path validation.
Open Questions
-
Directory detection: The issue says "directory paths expand to all .md files". But review-bot has no local filesystem. When a
docsentry isdocs/domain/contexts/trading/, we can callGetAllFilesInPath. But what if someone writesdocs/domain/contexts/trading(no trailing slash)? We could try GetFileContent first, and if it fails with a 404 or "is directory" error, fall back to GetAllFilesInPath. OR we could just always call GetAllFilesInPath and if it returns content, use it; if it returns empty, try GetFileContent. Decision: Try GetAllFilesInPath first (always). If it returns ≥1 file, treat as directory. If it returns 0 files AND no error, try GetFileContent. If GetAllFilesInPath returns an error, try GetFileContent. -
Budget section placement: The issue says docs go in "system prompt after system-prompt-file content". That means docs are part of the system prompt. Current budget: SystemBase (includes additionalPrompt) → Patterns → Conventions. I'll add DesignDocs after Conventions (trim after Conventions). Docs are injected into system prompt via
buildResult. Decision: DesignDocs section in budget, trimmed after Conventions, before FileContext. -
Configurable size limit: The issue says "configurable". Add
--doc-map-max-bytesflag (default 102400). Pass viaDocMapOptions. Decision: Add flag. Default 100KB (102400 bytes).
Completion Checklist
doc-mapinput added to action.yml with correct env var passthrough--doc-mapand--doc-map-max-bytesflags parsed in main.godoc-mapfile validated withvalidateWorkspacePathbefore reading- YAML parsed with
go-yaml, unknown keys warned not errored - Glob matching handles
**segments - Changed files list from PR drives intersection (not hardcoded)
- Docs deduplicated before fetching
- Missing doc files: warn + skip, not fatal
- Context size guard truncates with notice, logs warning
DesignDocssection added tobudget.SectionsandbuildResult- Tests cover: match, no-match, dedup, missing file, directory expansion, size guard, YAML parse error
go test ./...passesgo vet ./...passes- CHANGELOG updated