feat(#137): add doc-map input for path-scoped doc injection #138
Reference in New Issue
Block a user
Delete Branch "issue-137"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
$PR_BODY
Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
9670a5fd)Sonnet Review
Summary
Well-structured feature addition that follows all established patterns. The doc-map feature is cleanly integrated into the existing architecture with proper error handling, deduplication, size guards, and thorough test coverage.
Findings
review/docmap.goreviewpackage already exists and has a broader purpose. This package-level comment in docmap.go will either be ignored (not the first file) or will conflict with the package doc in other files. The file-level comment should just be a regular comment, not apackagedoc comment, or should be omitted since the package is already documented elsewhere.review/docmap.gotruncateUTF8function is duplicated — an identical implementation already exists inbudget/budget.go. Per the DRY principle and the project's own patterns, this should be extracted to a shared utility. Sincereviewandbudgetare separate packages, the options are: (1) move it to a shared internal package, or (2) accept the duplication with a comment. As-is, future changes to the truncation logic must be made in two places.review/docmap.goloadDocEntries, whenGetAllFilesInPathreturns an error (rather than empty results), the code silently falls through to tryGetFileContent. The plan's decision explicitly states 'If GetAllFilesInPath returns an error, try GetFileContent', which this does — but the directory error is swallowed without logging. If both calls fail, only the file error is returned. Consider logging a debug message for the directory error before the fallback, which would help diagnose unexpected behavior.review/docmap_test.gofakeDocFetcherstruct fields have inconsistent comment styles (inline comments on the struct fields are fine), but thedirsfield typemap[string]map[string]stringcould benefit from a brief comment explaining the structure, similar to thefilesfield comment. Minor readability nit.review/docmap_test.goTestLoadMatchingDocs_ContextSizeGuardusesMaxBytes: 350with content of 200 bytes each and checkslen(content) > 600. The content after metadata (headings, newlines) will be well above 350 bytes due to formatting overhead. The test is validating the right behavior (truncation occurs), but the boundary check (> 600) is loose. Not a bug, just slightly confusing as the actual formatted content with headings for doc a alone is ~220+ bytes.PLAN-137.mddocs/or.design/directory if architecture decision records are desired.Recommendation
APPROVE — APPROVE. CI passes and the implementation is solid. The core feature — doc-map YAML parsing, ** glob matching, VCS-based doc fetching, budget integration, and size guards — is correctly implemented and thoroughly tested. The
truncateUTF8duplication is the most actionable finding but is not a blocker for merging. The package comment collision in docmap.go is a minor issue but won't cause runtime problems. All established patterns (error handling with%wwrapping, interface-based abstraction viaDocFetcher, table-driven tests,sloglogging,t.Helper()usage) are followed correctly.Review by sonnet
Evaluated against
9670a5fd@@ -0,0 +1,194 @@# Plan: Issue #137 — doc-map input for path-scoped doc injection[NIT] Committing a PLAN-137.md file to the repository is unusual — planning/design documents are typically kept in issue trackers or wikis. This couples the implementation repository to an issue-tracking artifact. Consider removing it before merge or moving it to a
docs/or.design/directory if architecture decision records are desired.@@ -0,0 +54,4 @@return nil, fmt.Errorf("read doc-map file %q: %w", localPath, err)}var cfg DocMapConfig[MINOR] The package comment says 'Package review provides doc-map parsing and doc injection...' but the
reviewpackage already exists and has a broader purpose. This package-level comment in docmap.go will either be ignored (not the first file) or will conflict with the package doc in other files. The file-level comment should just be a regular comment, not apackagedoc comment, or should be omitted since the package is already documented elsewhere.@@ -0,0 +198,4 @@continue}for _, entry := range entries {[MINOR] The
truncateUTF8function is duplicated — an identical implementation already exists inbudget/budget.go. Per the DRY principle and the project's own patterns, this should be extracted to a shared utility. Sincereviewandbudgetare separate packages, the options are: (1) move it to a shared internal package, or (2) accept the duplication with a comment. As-is, future changes to the truncation logic must be made in two places.@@ -0,0 +242,4 @@content string}// loadDocEntries returns the doc content for a given path.[MINOR] In
loadDocEntries, whenGetAllFilesInPathreturns an error (rather than empty results), the code silently falls through to tryGetFileContent. The plan's decision explicitly states 'If GetAllFilesInPath returns an error, try GetFileContent', which this does — but the directory error is swallowed without logging. If both calls fail, only the file error is returned. Consider logging a debug message for the directory error before the fallback, which would help diagnose unexpected behavior.@@ -0,0 +11,4 @@// fakeDocFetcher is a mock DocFetcher for tests.type fakeDocFetcher struct {files map[string]string // path -> content[NIT] The
fakeDocFetcherstruct fields have inconsistent comment styles (inline comments on the struct fields are fine), but thedirsfield typemap[string]map[string]stringcould benefit from a brief comment explaining the structure, similar to thefilesfield comment. Minor readability nit.@@ -0,0 +335,4 @@}}func TestLoadMatchingDocs_ContextSizeGuard(t *testing.T) {[NIT] The test
TestLoadMatchingDocs_ContextSizeGuardusesMaxBytes: 350with content of 200 bytes each and checkslen(content) > 600. The content after metadata (headings, newlines) will be well above 350 bytes due to formatting overhead. The test is validating the right behavior (truncation occurs), but the boundary check (> 600) is loose. Not a bug, just slightly confusing as the actual formatted content with headings for doc a alone is ~220+ bytes.Self-review against
9670a5fda3Assessment: ✅ Clean
No issues found — ready for human review.
Checks:
filesslice from Step 3 failure safely handled. ✅conventions-filepattern. Interface in consumer package. ✅**glob without new dependency. Size guard prevents context overflow. ✅Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
9670a5fd)Security Review
Summary
The changes add a path-scoped design doc injection feature with careful workspace path validation and use of the approved YAML parser. CI passed and the implementation avoids obvious SSRF, injection, or auth issues. I identified a few defense-in-depth hardening opportunities around resource limits and LLM prompt injection handling.
Findings
review/docmap.gobudget/budget.goreview/docmap.goRecommendation
APPROVE — Overall, the feature is implemented securely with good input validation for the local doc-map file and appropriate use of the approved YAML library. To strengthen the security posture: (1) Add resource-exhaustion guards to doc fetching by avoiding full directory content retrieval before enforcing limits—prefer a two-phase approach (list metadata first, then fetch files incrementally until the byte budget is reached), and consider caps on number of files and per-file size. (2) When embedding design docs, include explicit prompt-guardrails in the system message that instruct the model to treat design-doc content as reference data and to ignore any instructions found within that content, or move the docs to the user prompt with clear data boundaries to mitigate prompt injection risks. (3) Sanitize doc paths from the YAML (reject absolute paths and any path traversal components) before invoking VCS API calls for defense-in-depth. With these adjustments, the changes would further reduce DoS and LLM prompt manipulation risks while maintaining the intended functionality.
Review by security
Evaluated against
9670a5fd@@ -185,6 +187,10 @@ func buildResult(s Sections, trimmed []string, estTokens int) Result {sys.WriteString("\n\n## Repository Conventions\n\nThe repository has the following coding conventions that must be respected:\n\n")[MINOR] Prompt-injection hardening: Design docs (which are ultimately repository-controlled data) are injected into the system prompt without explicit instruction separation. While content is fetched from the default branch (reducing attacker control via PR), best practice is to clearly treat docs as data and direct the model not to follow any instructions contained within them to mitigate prompt injection.
@@ -0,0 +165,4 @@// a formatted string suitable for injection into the system prompt.//// Behavior:// - Paths that look like directories (end with /, or GetAllFilesInPath returns files)[MINOR] Potential denial-of-service via unbounded directory fetch: LoadMatchingDocs calls GetAllFilesInPath which returns full contents of all files under a path before applying the MaxBytes cap. A malicious or misconfigured doc-map could point to a very large directory, causing excessive memory/network usage prior to truncation.
@@ -0,0 +226,4 @@sb.WriteString("\n> ⚠️ (truncated — size limit reached)\n")}totalBytes += len(content)slog.Debug("doc-map: injected doc", "path", entry.path, "bytes", len(content))[MINOR] Path hardening: doc paths from the local YAML are passed directly to GetAllFilesInPath/GetFileContent. Although the VCS API should constrain paths to the repo, adding local validation (e.g., reject absolute paths, clean and deny any '..' segments) would provide defense-in-depth against potential backend path handling quirks.
Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
9670a5fd)Gpt Review
Summary
Solid implementation of path-scoped design doc injection with clear wiring through flags, action inputs, budgeting, and comprehensive tests. Minor documentation and comment consistency issues noted, but functionally and idiomatically sound.
Findings
budget/budget.goREADME.mdreview/docmap.goreview/docmap.goreview/docmap.goRecommendation
APPROVE — The feature is well-implemented: action inputs are correctly added and passed via environment variables, CLI flags are introduced and validated, documents are matched and loaded via VCS with robust glob matching and a sensible size cap, and the budgeting layer integrates the new 'DesignDocs' section in the appropriate trim order. Tests are comprehensive and cover parsing, matching, directory expansion, size guards, and error handling, aligning with the repository's testing conventions. CI is green. Please address the minor documentation inconsistencies (trim order comment and README inputs table) and consider the small comment and error-reporting nits for clarity. Otherwise, this is good to merge.
Review by gpt
Evaluated against
9670a5fd@@ -0,0 +1,303 @@// Package review provides doc-map parsing and doc injection for path-scoped[NIT] Comment for globMatch mentions 'Standard path.Match patterns' while code uses filepath.Match. Consider aligning the comment to say 'filepath.Match patterns' for accuracy.
@@ -0,0 +50,4 @@// Unknown top-level keys produce a warning but are not fatal.func ParseDocMapConfig(localPath string) (*DocMapConfig, error) {data, err := readFileBytes(localPath)if err != nil {[NIT] In ParseDocMapConfig fallback, the returned error on second unmarshal failure wraps the original strict-mode error (err), potentially obscuring the relaxed unmarshal error (err2). Consider returning or including the second error for clearer diagnostics.
@@ -0,0 +229,4 @@slog.Debug("doc-map: injected doc", "path", entry.path, "bytes", len(content))}}[NIT] sortDocEntries uses a manual insertion sort; consider using sort.Slice for simplicity and readability since the standard library provides it.
Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
f7815b87)Sonnet Review
Summary
A well-structured feature addition implementing path-scoped design document injection. The code follows established Go patterns, is well-tested, and handles security concerns (path traversal) and error cases correctly. A few minor issues worth noting but none are blockers.
Findings
review/docmap.goreview/docmap.goreview/docmap.gosort.Slice(entries, func(i, j int) bool { return entries[i].path < entries[j].path }). The comment says "doc lists are small" which justifies the O(n²) complexity, but the standard library sort is both clearer and handles all sizes correctly. This is a NIT-level style issue per project conventions.review/docmap.go// Package review ...format documented in the Go patterns (documentation.md pattern #3). The comment reads// doc-map parsing and doc injection for path-scoped design document context in AI code reviews.— but this is a file-level comment, not a package comment. The review package presumably has a package comment elsewhere; this file comment is fine as a file-level description, though it's atypical (most Go files don't have pre-package file comments).review/docmap_test.godirs map[string]map[string]string // dir path -> (file path -> content)— the comment is useful but gofmt may reformat the spacing. Not a correctness issue.PLAN-137.mdRecommendation
APPROVE — Approve. The implementation is solid: the DocFetcher interface keeps the review package decoupled from cmd-level dependencies (correct application of the interface pattern), the glob matching correctly handles
**, path traversal is defended at both the config-reading level and before VCS calls, the budget integration is clean with the right trimming priority, and test coverage is thorough (parse, match, load, dir expansion, truncation, path traversal rejection). The duplicated truncateUTF8 is explicitly acknowledged with a justification comment, which is the right approach. The minor findings are cosmetic or informational and do not affect correctness or security.Review by sonnet
Evaluated against
f7815b87@@ -0,0 +1,194 @@# Plan: Issue #137 — doc-map input for path-scoped doc injection[NIT] Committing a PLAN file to the repository is unusual and will persist in the repo history as a planning artifact. Consider whether this file should be in the PR description or a wiki instead. That said, some projects intentionally keep ADRs/plans in the repo, so this is a process note rather than a code issue.
@@ -0,0 +1,332 @@// doc-map parsing and doc injection for path-scoped design document context in AI code reviews.[NIT] The package doc comment is a single-line imperative description rather than the conventional
// Package review ...format documented in the Go patterns (documentation.md pattern #3). The comment reads// doc-map parsing and doc injection for path-scoped design document context in AI code reviews.— but this is a file-level comment, not a package comment. The review package presumably has a package comment elsewhere; this file comment is fine as a file-level description, though it's atypical (most Go files don't have pre-package file comments).@@ -0,0 +54,4 @@}var cfg DocMapConfigif err := yaml.UnmarshalWithOptions(data, &cfg, yaml.Strict()); err != nil {[MINOR] ParseDocMapConfig uses yaml.Strict() first, then re-parses with yaml.Unmarshal on failure to get the 'unknown keys' warning. This double-parse is slightly wasteful but more importantly, it means the error message logged is the strict-mode error (which describes the unknown key) rather than a custom message. The current approach works correctly but could be simplified: just always use non-strict mode and rely on the YAML library's warning mechanisms if available, or accept that users won't get per-key detail. The current approach is acceptable given the go-yaml library's API.
@@ -0,0 +167,4 @@// - Paths that look like directories (end with /, or GetAllFilesInPath returns files)// are expanded to all .md files under them.// - Missing files are logged as warnings and skipped.// - Total content is capped at opts.MaxBytes; truncation is noted inline.[MINOR] LoadMatchingDocs writes a truncation notice ("⚠️ Design document context truncated — size limit reached.") to the string builder when limitReached is set inside the outer loop over entries, but this notice can be written multiple times if the outer loop continues processing additional docPaths after limitReached is already set. The outer loop does break on limitReached, so the notice in the inner loop (line 229) is the concern. Looking more carefully: the inner loop only writes the notice once then breaks, but if the same doc triggers truncation, the outer notice at line 229 fires then the outer loop also has a break due to limitReached. The flow is actually correct — the inner notice fires at most once (truncated flag set, limitReached set, inner break), and the outer loop breaks at the top. No double-write occurs. However, this is subtle enough to warrant a clarifying comment.
@@ -0,0 +299,4 @@}// validateDocPath rejects doc paths that could cause path traversal via the// VCS API (absolute paths, any ".." segment). Defense-in-depth: the VCS API[MINOR] sortDocEntries uses an insertion sort implemented manually. The standard library's sort.Slice would be idiomatic and more readable:
sort.Slice(entries, func(i, j int) bool { return entries[i].path < entries[j].path }). The comment says "doc lists are small" which justifies the O(n²) complexity, but the standard library sort is both clearer and handles all sizes correctly. This is a NIT-level style issue per project conventions.@@ -0,0 +11,4 @@// fakeDocFetcher is a mock DocFetcher for tests.type fakeDocFetcher struct {files map[string]string // path -> content[NIT] The fakeDocFetcher struct fields lack alignment/formatting:
dirs map[string]map[string]string // dir path -> (file path -> content)— the comment is useful but gofmt may reformat the spacing. Not a correctness issue.Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
f7815b87)Security Review
Summary
The PR adds path-scoped design doc injection with careful input validation, prompt-injection mitigation, and secure handling in the composite action. Overall, changes are security-conscious; the only concern is potential resource exhaustion when expanding directories of docs.
Findings
review/docmap.goreview/docmap.goRecommendation
APPROVE — Overall, the implementation is robust: the composite action continues to guard against SSRF, the new doc-map is validated within the workspace, VCS fetches are restricted to repo paths with traversal checks, and design docs are injected with explicit prompt-injection guidance. To harden further against resource exhaustion, avoid fetching entire directories in one call: add a list-only API (e.g., GetFileListInPath) and then fetch file contents one-by-one, stopping once MaxBytes or a reasonable file count cap is reached (e.g., 50 files). Alternatively, enforce a maximum number of files and per-file size limits when expanding directories. For path validation, normalize inputs and reject percent-encoded traversal, backslashes, and non-printable characters. With these adjustments, the feature will be more resilient to accidental or malicious large inputs. Since CI passed and no exploitable vulnerabilities were found, approval is recommended.
Review by security
Evaluated against
f7815b87@@ -0,0 +232,4 @@if sb.Len() == 0 {return "", nil}return sb.String(), nil[MINOR] Potential resource exhaustion: loadDocEntries uses GetAllFilesInPath to retrieve all files and their full contents for a directory, with no cap on number of files or total bytes fetched from the VCS before applying the post-fetch MaxBytes limit. A large directory of .md files could cause high memory/network usage before truncation.
@@ -0,0 +293,4 @@}}// readFileBytes reads the contents of a local file.[MINOR] Path validation defense-in-depth: validateDocPath rejects absolute paths and literal '..' segments but does not normalize or reject percent-encoded traversal (e.g., %2e%2e), backslashes on Windows, or control characters. While doc-map is typically trusted, adding stricter validation would harden against edge cases.
Gpt Review
Summary
Solid addition of path-scoped design doc injection with clear budgeting, secure path validation, and thorough tests. CI passed and the implementation aligns with repo conventions and error-handling patterns.
Findings
cmd/review-bot/main.goreview/docmap.goRecommendation
APPROVE — The feature is well-implemented: action inputs are wired correctly, budgeting gained a dedicated DesignDocs section with proper trim order, the doc-map parser handles unknown keys leniently, path traversal is guarded, glob matching supports **, and tests cover key cases including directory expansion, size limits, and bad paths. CI passed, so approval is appropriate. Consider removing the redundant error check after LoadMatchingDocs in main.go (or have LoadMatchingDocs propagate an aggregated error if that's desired), and adjust the misleading splitPath comment in review/docmap.go. Otherwise, this is ready to merge.
Review by gpt
Evaluated against
f7815b87@@ -353,0 +372,4 @@changedPaths = append(changedPaths, f.Filename)}matchedDocs := review.MatchDocs(docMapCfg, changedPaths)[MINOR] LoadMatchingDocs is checked for a non-nil error, but LoadMatchingDocs currently never returns an error (it logs and skips per-file issues). This check is redundant and can be removed, or LoadMatchingDocs should propagate an aggregated error if intended.
@@ -0,0 +155,4 @@}patParts = patParts[1:]pathParts = pathParts[1:]}[NIT] Comment in splitPath says "Clean and split on "/"" but the implementation does not clean the path, it only splits. Update the comment to avoid confusion or add cleaning if truly intended.
Self-review against
60c6bd9f49Assessment: ✅ Clean
No issues found — ready for human review.
Checks performed:
ParseDocMapConfig,MatchDocs,LoadMatchingDocs,globMatch(viaTestGlobMatch)budget.DesignDocspath throughFit()andbuildResult()covered by 3 new testsvalidateDocPath(absolute paths,..traversal);validateWorkspacePathfor local config filebuildResult()forDesignDocssectionPLAN-137.mdremoved (contained raw thinking stream); replaced with cleandocs/DESIGN-137-doc-map.mdCHANGELOG.mdandREADME.mdupdatedmake precommitclean (go vet,go test ./...)Sonnet Review
Summary
A well-implemented feature that adds path-scoped design doc injection. The implementation is clean, idiomatic Go with good test coverage, proper error handling, and thoughtful security measures (path traversal guards, prompt injection notice). A few minor observations around code structure and edge cases, but nothing blocking.
Findings
review/docmap.goLoadMatchingDocsfunction has a subtle double-write bug for the truncation notice. Whenavailable <= 0is detected at the start of the inner loop, it writes a truncation notice and breaks. But whenlen(content) > availablecauses a mid-content truncation, it writes a different truncation notice inline. Then, on the next outer-loop iteration,limitReachedis true and we hit theif limitReached { break }before writing another notice. This works correctly, but there are two different truncation messages (one at the outer level `Recommendation
APPROVE — APPROVE. The implementation is solid: correct glob semantics with
**support, proper path traversal validation (both for the doc-map YAML file viavalidateWorkspacePathand for doc paths viavalidateDocPath), prompt injection guard in the budget section header, graceful degradation on missing files, context size limiting with inline truncation notices, and comprehensive test coverage. The CI passes.The MINOR findings are worth noting for follow-up but do not block merging: (1) the double truncation-message paths are functionally correct but could confuse future maintainers; (2) the empty-directory fallthrough to single-file-fetch could produce confusing errors for trailing-slash directory paths that exist but are empty. These are edge cases that the test suite doesn't currently exercise. Consider adding a test for an empty directory path with trailing slash to catch the fallthrough behavior.
Review by sonnet
Evaluated against
60c6bd9f@@ -0,0 +19,4 @@)// DocMapping maps a set of path glob patterns to governing doc files/directories.type DocMapping struct {[NIT] The package doc comment is at the file level but not in the
// Package review ...form. Per the documentation pattern, the package-level comment (if this is the first file users will encounter for new functionality) should be// Package review .... However, since this is a new file in an existing package and the primary package doc presumably exists elsewhere, a file-level comment is acceptable. The comment could be more explicit:// Package review provides ...or simply be a file comment. Minor style nit.@@ -0,0 +86,4 @@result = append(result, doc)}}}[MINOR] The
ParseDocMapConfigstrict-mode fallback parses twice on any strict-mode failure, not just unknown-key failures. If the YAML is structurally valid but has a type mismatch (e.g.,paths: "not-a-list"), the strict parse fails, the relaxed parse also fails (returning an error), and the function returns the strict-mode error wrappinglocalPath. But for unknown keys, it succeeds on the relaxed parse and logs a warning. The comment says "Re-parse without strict mode to log which keys are unknown" — but the code doesn't actually log which keys are unknown; it just logs the original strict error. The behavior is correct (lenient for unknown keys, strict for structural issues) but the comment is misleading about what gets logged.@@ -0,0 +159,4 @@// All pattern parts consumed; path must also be consumed.return len(pathParts) == 0}[MINOR] The
loadDocEntriesfunction silently falls through from a successful-but-emptyGetAllFilesInPathcall to aGetFileContentcall. If a user specifies a directory path likedocs/domain/(with trailing slash) and the directory exists but has no files, the function will then try to fetchdocs/domain/as a single file, which will likely fail with an obscure error. The debug log message only fires whendirErr != nil, not when the directory exists but is empty. This means the caller's debug log"doc-map: no .md files found under path"inLoadMatchingDocswill never fire for this case — instead it falls to the file-fetch error path. Consider: ifGetAllFilesInPathreturns nil error with an empty map, and the path ends with/, skip the single-file fallback and return empty entries directly.@@ -0,0 +196,4 @@slog.Debug("doc-map: no .md files found under path", "path", docPath)continue}[MINOR] The
LoadMatchingDocsfunction has a subtle double-write bug for the truncation notice. Whenavailable <= 0is detected at the start of the inner loop, it writes a truncation notice and breaks. But whenlen(content) > availablecauses a mid-content truncation, it writes a different truncation notice inline. Then, on the next outer-loop iteration,limitReachedis true and we hit theif limitReached { break }before writing another notice. This works correctly, but there are two different truncation messages (one at the outer level `@@ -0,0 +250,4 @@}// Try directory expansion first.files, dirErr := fetcher.GetAllFilesInPath(ctx, owner, repo, docPath)[NIT]
sortDocEntriesuses an insertion sort described as "doc lists are small". This is fine, but Go'ssort.Sliceorslices.SortFunc(Go 1.21+) would be more idiomatic and readable with no meaningful performance difference at these sizes. The comment justifying insertion sort is unnecessary ceremony for a standard sort.@@ -0,0 +7,4 @@"path/filepath""strings""testing")[NIT] The
fakeDocFetcherstruct has inconsistent alignment on the field comments —dirshas a trailing space before the comment to align withfiles. This would normally be fixed by gofmt automatically, but gofmt aligns struct field comments differently from map literal comments. Not a real issue, just a minor style note.Security Review
Summary
The feature is implemented thoughtfully with budget limits and path traversal checks, but there is a critical data exposure risk. Loading the doc-map from the PR workspace allows an untrusted PR author to cause arbitrary repository documents (from the default branch) to be sent to external LLM providers.
Findings
cmd/review-bot/main.goRecommendation
REQUEST_CHANGES — The new doc-map functionality is well-integrated and includes defenses against SSRF and path traversal, plus context-size limits and prompt-injection caution. However, because the doc-map file is read from the PR's workspace, a malicious PR author can alter the doc-map to cause the system to fetch and inject arbitrary documents from the repository's default branch into the LLM request, potentially exposing sensitive content to third-party providers or into PR comments. To resolve this, do one or more of the following: (1) Load the doc-map file from a trusted ref (e.g., default/protected branch) instead of the PR branch; (2) Disable or ignore doc-map on untrusted PRs (e.g., from forks) unless explicitly approved; (3) Enforce an allowlist of doc path prefixes (e.g., only under docs/), and reject patterns that match broad directories (like repo root) to limit exposure; (4) Optionally add a policy flag to require maintainer approval before doc-map-derived content is used. After addressing this data exfiltration vector, the feature appears secure and robust.
Review by security
Evaluated against
60c6bd9f@@ -350,6 +352,46 @@ func main() {slog.Debug("loaded system prompt file", "file", *systemPromptFile, "bytes", len(additionalPrompt))[MAJOR] Doc-map configuration is read from the PR workspace and used to fetch arbitrary docs from the repository (default branch) via the VCS API, which are then sent to external LLMs. An untrusted PR author can modify the doc-map file in their PR to map any changed path to sensitive repository documents, exfiltrating content outside the organization and potentially into PR comments via the model's response. Mitigations: fetch the doc-map from a trusted reference (e.g., default/protected branch) rather than the PR branch, or disable doc-map on untrusted PRs; additionally allowlist permissible doc path prefixes (e.g., only under docs/) and/or require maintainer opt-in before using doc-map on a PR.