Files
review-bot/PLAN-137.md
T
Rodin 9670a5fda3
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 40s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m26s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m27s
feat(#137): add doc-map input for path-scoped doc injection
- New --doc-map flag (DOC_MAP_FILE env var): path to YAML config mapping
  source path globs to governing design docs
- New --doc-map-max-bytes flag (DOC_MAP_MAX_BYTES env var): cap on total
  injected doc content, default 100KB
- review/docmap.go: DocMapConfig parsing, glob matching with ** support,
  doc loading via VCS with directory expansion and size guard
- budget.Sections: new DesignDocs field, trimmed after conventions
- budget.buildResult: injects DesignDocs under ## Design Documents heading
- action.yml: doc-map and doc-map-max-bytes inputs wired to env vars
- CHANGELOG.md: created with unreleased entry
- Tests: ParseDocMapConfig, MatchDocs, globMatch, LoadMatchingDocs
2026-05-15 03:25:54 +00:00

10 KiB

Plan: Issue #137 — doc-map input for path-scoped doc injection

Problem

review-bot currently injects context via patterns-repo (external VCS repos) and conventions-file (a single file from the reviewed repo). There is no mechanism to inject local repo documentation files scoped to the paths changed in a PR.

First consumer: grgl/gargoyle#778 wants a "doc adherence" reviewer that checks code against the module's governing design doc, without injecting every doc in the tree.

Constraints

  • Must work with existing budget.Fit architecture (docs go into SystemBase section, never trimmed — or added as a new section below Conventions)
  • Must not fail the review if doc files are missing (warn + skip)
  • Context guard: default 100KB total injected doc content (configurable)
  • YAML parsing must use github.com/goccy/go-yaml (the only approved YAML library)
  • No new third-party dependencies (Go standard library + approved packages only)
  • Path security: doc files must be read via VCS API (not local filesystem), so they are always fetched from the PR head ref within the repo workspace — same path used by conventions-file loading

Wait — re-reading the issue: the issue says "local repo files". In the CI action context, the action runner has the repo checked out. The design doc says "read each doc file from the local checkout". But review-bot has no local checkout — it runs as a binary and reads files via VCS API. Let me reconcile:

  • conventions-file uses vcs.GetFileContent (fetches from VCS API, default branch)
  • The doc-map docs should also be read via VCS API
  • The doc-map config file itself (doc-map input) is a local file in the workspace (like system-prompt-file)
  • The doc paths inside the config ARE relative to the repo root, to be fetched via VCS API

Conclusion: The doc-map YAML file is read from local filesystem (like system-prompt-file). The doc files listed inside are fetched from the VCS API.

Actually, re-reading more carefully: "Read each doc file (or all .md files under a directory) from the local checkout". But review-bot doesn't have a local checkout. Since system-prompt-file and conventions-file are both read locally, I should follow the same approach consistently.

Final decision: The doc-map config file is local (passed via --doc-map flag, read with os.ReadFile after workspace validation). The listed doc paths (and directory expansion) are read via VCS GetFileContent / GetAllFilesInPath — matching the conventions-file pattern for consistency, and enabling it to work on any branch (not just the checked-out one).

Proposed Approach

New files

  1. review/docmap.goDocMap type, YAML parsing, glob matching, doc loading logic
  2. review/docmap_test.go — unit tests

Modified files

  1. cmd/review-bot/main.go — add --doc-map flag, wire up in Step 6c
  2. .gitea/actions/review/action.yml — add doc-map input, pass as DOC_MAP_FILE env var
  3. budget/budget.go — add DesignDocs section (between SystemBase/Conventions and Diff)
  4. CHANGELOG.md — update

DocMap types (review/docmap.go)

// DocMapping maps a set of path globs to doc files/directories.
type DocMapping struct {
    Paths []string `yaml:"paths"` // glob patterns
    Docs  []string `yaml:"docs"`  // file paths or directories
}

// DocMapConfig is the top-level YAML structure.
type DocMapConfig struct {
    Mappings []DocMapping `yaml:"mappings"`
}

// DocMapOptions controls doc loading behavior.
type DocMapOptions struct {
    MaxBytes int // default 100*1024
}

Key functions

// ParseDocMapConfig parses the YAML config file from a local path.
func ParseDocMapConfig(path string) (*DocMapConfig, error)

// MatchDocs returns deduplicated doc paths for the given changed files.
func MatchDocs(cfg *DocMapConfig, changedFiles []string) []string

// LoadMatchingDocs fetches doc content via VCS, respecting size limit.
// Returns (content, error). Missing files are warned and skipped.
func LoadMatchingDocs(ctx context.Context, fetcher DocFetcher, owner, repo string, docPaths []string, opts DocMapOptions) (string, error)

DocFetcher interface

// DocFetcher fetches files and directory listings from VCS.
// Subset of vcsClient, defined here to keep review package free of cmd-level deps.
type DocFetcher interface {
    GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error)
    GetAllFilesInPath(ctx context.Context, owner, repo, path string) (map[string]string, error)
}

Glob matching

Use path.Match from the Go standard library. It matches patterns like lib/gargoyle/engine/signal_risk/**. The ** glob is NOT natively supported by path.Match, so we need either:

a) Use filepath.Match which also doesn't support ** b) Implement simple ** support: ** matches any number of path segments

Decision: Implement minimal ** support: split path on /, split pattern on /, match each segment with filepath.Match. When a pattern segment is **, it consumes any number of remaining segments. This covers the primary use case without a new dependency.

Budget integration

Add DesignDocs field to budget.Sections. Position: after Conventions, before FileContext (trimming order: Patterns → Conventions → DesignDocs → FileContext → Diff). Inject under ## Design Documents heading in system prompt.

Context size guard

Accumulate doc content bytes. If total would exceed MaxBytes, truncate last doc with a notice and stop loading more.

State/Data Model

DocMapConfig
  └── []DocMapping
        ├── Paths []string  (glob patterns against changed file paths)
        └── Docs  []string  (local doc paths or directories in target repo)

Flow:
  1. Parse doc-map YAML → DocMapConfig
  2. GetPullRequestFiles → []string of changed paths
  3. MatchDocs(cfg, changedPaths) → deduplicated []string doc paths
  4. For each doc path:
       - If path ends with "/" or is a "directory" → GetAllFilesInPath, filter .md
       - Otherwise → GetFileContent
  5. Accumulate, respect size limit
  6. Inject into system prompt

Error Cases

Situation Behavior
--doc-map file not found Fatal error (like --system-prompt-file)
--doc-map file invalid YAML Fatal error with descriptive message
Unknown keys in YAML Log warning, continue
Doc file not found in VCS Log warning, skip
Doc directory empty Log debug, skip
Total size exceeds limit Truncate with notice, log warning
No changed paths match No docs injected, review runs normally
paths list empty in a mapping Skip that mapping (no match possible)
docs list empty in a mapping Skip that mapping (nothing to inject)

Edge Cases

  • Empty mappings list → no docs injected, no error
  • Same doc matched by multiple mappings → deduplicate by path
  • Directory with no .md files → skip silently (log debug)
  • Very large single doc file → counts against limit, may truncate
  • Symlinks/special files in VCS → GetFileContent handles or errors (warn + skip)
  • doc-map path outside workspace → fatal error (validateWorkspacePath)
  • Directory path specified as docs entry without trailing / → check if it's a directory via ListContents or GetAllFilesInPath; if error, try GetFileContent

Testing Strategy

Unit tests (review/docmap_test.go)

  1. ParseDocMapConfig — valid YAML, invalid YAML, unknown keys (warning), empty file
  2. MatchDocs — no match, single match, multi-match, deduplication, ** glob, exact match
  3. LoadMatchingDocs — with mock DocFetcher:
    • file path → content returned
    • missing file → warned + skipped
    • directory path → expands .md files
    • directory with no .md → empty
    • size guard → truncation with notice
    • deduplication in combined results

Integration coverage

The existing main_test.go tests cover flag wiring — add a test for --doc-map flag parsing and workspace path validation.

Open Questions

  1. Directory detection: The issue says "directory paths expand to all .md files". But review-bot has no local filesystem. When a docs entry is docs/domain/contexts/trading/, we can call GetAllFilesInPath. But what if someone writes docs/domain/contexts/trading (no trailing slash)? We could try GetFileContent first, and if it fails with a 404 or "is directory" error, fall back to GetAllFilesInPath. OR we could just always call GetAllFilesInPath and if it returns content, use it; if it returns empty, try GetFileContent. Decision: Try GetAllFilesInPath first (always). If it returns ≥1 file, treat as directory. If it returns 0 files AND no error, try GetFileContent. If GetAllFilesInPath returns an error, try GetFileContent.

  2. Budget section placement: The issue says docs go in "system prompt after system-prompt-file content". That means docs are part of the system prompt. Current budget: SystemBase (includes additionalPrompt) → Patterns → Conventions. I'll add DesignDocs after Conventions (trim after Conventions). Docs are injected into system prompt via buildResult. Decision: DesignDocs section in budget, trimmed after Conventions, before FileContext.

  3. Configurable size limit: The issue says "configurable". Add --doc-map-max-bytes flag (default 102400). Pass via DocMapOptions. Decision: Add flag. Default 100KB (102400 bytes).

Completion Checklist

  1. doc-map input added to action.yml with correct env var passthrough
  2. --doc-map and --doc-map-max-bytes flags parsed in main.go
  3. doc-map file validated with validateWorkspacePath before reading
  4. YAML parsed with go-yaml, unknown keys warned not errored
  5. Glob matching handles ** segments
  6. Changed files list from PR drives intersection (not hardcoded)
  7. Docs deduplicated before fetching
  8. Missing doc files: warn + skip, not fatal
  9. Context size guard truncates with notice, logs warning
  10. DesignDocs section added to budget.Sections and buildResult
  11. Tests cover: match, no-match, dedup, missing file, directory expansion, size guard, YAML parse error
  12. go test ./... passes
  13. go vet ./... passes
  14. CHANGELOG updated