9670a5fda3
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 40s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m26s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m27s
- New --doc-map flag (DOC_MAP_FILE env var): path to YAML config mapping source path globs to governing design docs - New --doc-map-max-bytes flag (DOC_MAP_MAX_BYTES env var): cap on total injected doc content, default 100KB - review/docmap.go: DocMapConfig parsing, glob matching with ** support, doc loading via VCS with directory expansion and size guard - budget.Sections: new DesignDocs field, trimmed after conventions - budget.buildResult: injects DesignDocs under ## Design Documents heading - action.yml: doc-map and doc-map-max-bytes inputs wired to env vars - CHANGELOG.md: created with unreleased entry - Tests: ParseDocMapConfig, MatchDocs, globMatch, LoadMatchingDocs
195 lines
10 KiB
Markdown
195 lines
10 KiB
Markdown
# Plan: Issue #137 — doc-map input for path-scoped doc injection
|
|
|
|
## Problem
|
|
|
|
review-bot currently injects context via `patterns-repo` (external VCS repos) and `conventions-file` (a single file from the reviewed repo). There is no mechanism to inject local repo documentation files scoped to the paths changed in a PR.
|
|
|
|
First consumer: `grgl/gargoyle#778` wants a "doc adherence" reviewer that checks code against the module's governing design doc, without injecting every doc in the tree.
|
|
|
|
## Constraints
|
|
|
|
- Must work with existing `budget.Fit` architecture (docs go into `SystemBase` section, never trimmed — or added as a new section below `Conventions`)
|
|
- Must not fail the review if doc files are missing (warn + skip)
|
|
- Context guard: default 100KB total injected doc content (configurable)
|
|
- YAML parsing must use `github.com/goccy/go-yaml` (the only approved YAML library)
|
|
- No new third-party dependencies (Go standard library + approved packages only)
|
|
- Path security: doc files must be read via VCS API (not local filesystem), so they are always fetched from the PR head ref within the repo workspace — same path used by `conventions-file` loading
|
|
|
|
Wait — re-reading the issue: the issue says "local repo files". In the CI action context, the action runner has the repo checked out. The design doc says "read each doc file from the local checkout". But review-bot has no local checkout — it runs as a binary and reads files via VCS API. Let me reconcile:
|
|
|
|
- `conventions-file` uses `vcs.GetFileContent` (fetches from VCS API, default branch)
|
|
- The doc-map docs should also be read via VCS API
|
|
- The doc-map config file itself (`doc-map` input) is a local file in the workspace (like `system-prompt-file`)
|
|
- The doc paths inside the config ARE relative to the repo root, to be fetched via VCS API
|
|
|
|
**Conclusion:** The `doc-map` YAML file is read from local filesystem (like `system-prompt-file`). The doc files listed inside are fetched from the VCS API.
|
|
|
|
Actually, re-reading more carefully: "Read each doc file (or all .md files under a directory) from the local checkout". But review-bot doesn't have a local checkout. Since `system-prompt-file` and `conventions-file` are both read locally, I should follow the same approach consistently.
|
|
|
|
**Final decision:** The `doc-map` config file is local (passed via `--doc-map` flag, read with `os.ReadFile` after workspace validation). The listed doc paths (and directory expansion) are read via VCS `GetFileContent` / `GetAllFilesInPath` — matching the `conventions-file` pattern for consistency, and enabling it to work on any branch (not just the checked-out one).
|
|
|
|
## Proposed Approach
|
|
|
|
### New files
|
|
|
|
1. `review/docmap.go` — `DocMap` type, YAML parsing, glob matching, doc loading logic
|
|
2. `review/docmap_test.go` — unit tests
|
|
|
|
### Modified files
|
|
|
|
1. `cmd/review-bot/main.go` — add `--doc-map` flag, wire up in Step 6c
|
|
2. `.gitea/actions/review/action.yml` — add `doc-map` input, pass as `DOC_MAP_FILE` env var
|
|
3. `budget/budget.go` — add `DesignDocs` section (between `SystemBase`/`Conventions` and `Diff`)
|
|
4. `CHANGELOG.md` — update
|
|
|
|
### DocMap types (review/docmap.go)
|
|
|
|
```go
|
|
// DocMapping maps a set of path globs to doc files/directories.
|
|
type DocMapping struct {
|
|
Paths []string `yaml:"paths"` // glob patterns
|
|
Docs []string `yaml:"docs"` // file paths or directories
|
|
}
|
|
|
|
// DocMapConfig is the top-level YAML structure.
|
|
type DocMapConfig struct {
|
|
Mappings []DocMapping `yaml:"mappings"`
|
|
}
|
|
|
|
// DocMapOptions controls doc loading behavior.
|
|
type DocMapOptions struct {
|
|
MaxBytes int // default 100*1024
|
|
}
|
|
```
|
|
|
|
### Key functions
|
|
|
|
```go
|
|
// ParseDocMapConfig parses the YAML config file from a local path.
|
|
func ParseDocMapConfig(path string) (*DocMapConfig, error)
|
|
|
|
// MatchDocs returns deduplicated doc paths for the given changed files.
|
|
func MatchDocs(cfg *DocMapConfig, changedFiles []string) []string
|
|
|
|
// LoadMatchingDocs fetches doc content via VCS, respecting size limit.
|
|
// Returns (content, error). Missing files are warned and skipped.
|
|
func LoadMatchingDocs(ctx context.Context, fetcher DocFetcher, owner, repo string, docPaths []string, opts DocMapOptions) (string, error)
|
|
```
|
|
|
|
### DocFetcher interface
|
|
|
|
```go
|
|
// DocFetcher fetches files and directory listings from VCS.
|
|
// Subset of vcsClient, defined here to keep review package free of cmd-level deps.
|
|
type DocFetcher interface {
|
|
GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error)
|
|
GetAllFilesInPath(ctx context.Context, owner, repo, path string) (map[string]string, error)
|
|
}
|
|
```
|
|
|
|
### Glob matching
|
|
|
|
Use `path.Match` from the Go standard library. It matches patterns like `lib/gargoyle/engine/signal_risk/**`. The `**` glob is NOT natively supported by `path.Match`, so we need either:
|
|
|
|
a) Use `filepath.Match` which also doesn't support `**`
|
|
b) Implement simple `**` support: `**` matches any number of path segments
|
|
|
|
**Decision:** Implement minimal `**` support: split path on `/`, split pattern on `/`, match each segment with `filepath.Match`. When a pattern segment is `**`, it consumes any number of remaining segments. This covers the primary use case without a new dependency.
|
|
|
|
### Budget integration
|
|
|
|
Add `DesignDocs` field to `budget.Sections`. Position: after `Conventions`, before `FileContext` (trimming order: Patterns → Conventions → DesignDocs → FileContext → Diff). Inject under `## Design Documents` heading in system prompt.
|
|
|
|
### Context size guard
|
|
|
|
Accumulate doc content bytes. If total would exceed `MaxBytes`, truncate last doc with a notice and stop loading more.
|
|
|
|
## State/Data Model
|
|
|
|
```
|
|
DocMapConfig
|
|
└── []DocMapping
|
|
├── Paths []string (glob patterns against changed file paths)
|
|
└── Docs []string (local doc paths or directories in target repo)
|
|
|
|
Flow:
|
|
1. Parse doc-map YAML → DocMapConfig
|
|
2. GetPullRequestFiles → []string of changed paths
|
|
3. MatchDocs(cfg, changedPaths) → deduplicated []string doc paths
|
|
4. For each doc path:
|
|
- If path ends with "/" or is a "directory" → GetAllFilesInPath, filter .md
|
|
- Otherwise → GetFileContent
|
|
5. Accumulate, respect size limit
|
|
6. Inject into system prompt
|
|
```
|
|
|
|
## Error Cases
|
|
|
|
| Situation | Behavior |
|
|
|-----------|----------|
|
|
| `--doc-map` file not found | Fatal error (like `--system-prompt-file`) |
|
|
| `--doc-map` file invalid YAML | Fatal error with descriptive message |
|
|
| Unknown keys in YAML | Log warning, continue |
|
|
| Doc file not found in VCS | Log warning, skip |
|
|
| Doc directory empty | Log debug, skip |
|
|
| Total size exceeds limit | Truncate with notice, log warning |
|
|
| No changed paths match | No docs injected, review runs normally |
|
|
| `paths` list empty in a mapping | Skip that mapping (no match possible) |
|
|
| `docs` list empty in a mapping | Skip that mapping (nothing to inject) |
|
|
|
|
## Edge Cases
|
|
|
|
- Empty `mappings` list → no docs injected, no error
|
|
- Same doc matched by multiple mappings → deduplicate by path
|
|
- Directory with no `.md` files → skip silently (log debug)
|
|
- Very large single doc file → counts against limit, may truncate
|
|
- Symlinks/special files in VCS → GetFileContent handles or errors (warn + skip)
|
|
- `doc-map` path outside workspace → fatal error (validateWorkspacePath)
|
|
- Directory path specified as `docs` entry without trailing `/` → check if it's a directory via ListContents or GetAllFilesInPath; if error, try GetFileContent
|
|
|
|
## Testing Strategy
|
|
|
|
### Unit tests (review/docmap_test.go)
|
|
|
|
1. **ParseDocMapConfig** — valid YAML, invalid YAML, unknown keys (warning), empty file
|
|
2. **MatchDocs** — no match, single match, multi-match, deduplication, `**` glob, exact match
|
|
3. **LoadMatchingDocs** — with mock DocFetcher:
|
|
- file path → content returned
|
|
- missing file → warned + skipped
|
|
- directory path → expands .md files
|
|
- directory with no .md → empty
|
|
- size guard → truncation with notice
|
|
- deduplication in combined results
|
|
|
|
### Integration coverage
|
|
|
|
The existing `main_test.go` tests cover flag wiring — add a test for `--doc-map` flag parsing and workspace path validation.
|
|
|
|
## Open Questions
|
|
|
|
1. **Directory detection**: The issue says "directory paths expand to all .md files". But review-bot has no local filesystem. When a `docs` entry is `docs/domain/contexts/trading/`, we can call `GetAllFilesInPath`. But what if someone writes `docs/domain/contexts/trading` (no trailing slash)? We could try GetFileContent first, and if it fails with a 404 or "is directory" error, fall back to GetAllFilesInPath. OR we could just always call GetAllFilesInPath and if it returns content, use it; if it returns empty, try GetFileContent.
|
|
**Decision**: Try GetAllFilesInPath first (always). If it returns ≥1 file, treat as directory. If it returns 0 files AND no error, try GetFileContent. If GetAllFilesInPath returns an error, try GetFileContent.
|
|
|
|
2. **Budget section placement**: The issue says docs go in "system prompt after system-prompt-file content". That means docs are part of the system prompt. Current budget: SystemBase (includes additionalPrompt) → Patterns → Conventions. I'll add DesignDocs after Conventions (trim after Conventions). Docs are injected into system prompt via `buildResult`.
|
|
**Decision**: DesignDocs section in budget, trimmed after Conventions, before FileContext.
|
|
|
|
3. **Configurable size limit**: The issue says "configurable". Add `--doc-map-max-bytes` flag (default 102400). Pass via `DocMapOptions`.
|
|
**Decision**: Add flag. Default 100KB (102400 bytes).
|
|
|
|
## Completion Checklist
|
|
|
|
1. `doc-map` input added to action.yml with correct env var passthrough
|
|
2. `--doc-map` and `--doc-map-max-bytes` flags parsed in main.go
|
|
3. `doc-map` file validated with `validateWorkspacePath` before reading
|
|
4. YAML parsed with `go-yaml`, unknown keys warned not errored
|
|
5. Glob matching handles `**` segments
|
|
6. Changed files list from PR drives intersection (not hardcoded)
|
|
7. Docs deduplicated before fetching
|
|
8. Missing doc files: warn + skip, not fatal
|
|
9. Context size guard truncates with notice, logs warning
|
|
10. `DesignDocs` section added to `budget.Sections` and `buildResult`
|
|
11. Tests cover: match, no-match, dedup, missing file, directory expansion, size guard, YAML parse error
|
|
12. `go test ./...` passes
|
|
13. `go vet ./...` passes
|
|
14. CHANGELOG updated
|