feat(#141): validate-docmap subcommand — CI hard-fail on missing docmap coverage #142

Merged
rodin merged 11 commits from issue-141 into main 2026-05-15 07:39:22 +00:00
7 changed files with 950 additions and 51 deletions
+34
View File
@@ -296,6 +296,40 @@ review-bot \
--conventions-file CONVENTIONS.md
```
Review

[NIT] The validate-docmap documentation is clear; consider explicitly noting that docs: symlinks are treated as stale (rejected) for safety, matching the implementation and tests, to prevent surprises.

**[NIT]** The validate-docmap documentation is clear; consider explicitly noting that docs: symlinks are treated as stale (rejected) for safety, matching the implementation and tests, to prevent surprises.
## Subcommands
### `validate-docmap`
Verifies that a `doc-map.yml` is consistent before running a review. Two checks:
1. **Coverage**: every changed file is matched by at least one `paths:` glob.
2. **Stale docs**: every `docs:` entry exists on disk under `--repo-root`.
```bash
# Typical CI usage — pipe git diff into the command
git diff --name-only origin/main HEAD | \
review-bot validate-docmap \
--docmap .review-bot/doc-map.yml \
--repo-root .
```
| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--docmap` | Yes | — | Path to doc-map YAML file |
| `--repo-root` | No | `.` (cwd) | Root for resolving `docs:` paths |
Exit codes: `0`=clean, `1`=failures found, `2`=usage/parse error.
### `validate-url`
Resolves a URL and verifies all IPs are publicly routable (used in CI to prevent SSRF).
```bash
review-bot validate-url https://gitea.example.com
```
Exit codes: `0`=safe, `1`=blocked/private IP, `2`=error.
## Environment Variables
All flags have environment variable equivalents:
-42
View File
1
@@ -1,42 +0,0 @@
## Dev Loop Status: 2026-05-15 04:35 UTC
**Repository:** review-bot (rodin/review-bot on Gitea)
**Status:** ✅ OPTIMAL
### Health Check
- **Working tree:** clean
- **Branch:** main (up to date with origin)
- **Build:** ✅ passes (`go build ./cmd/review-bot`)
- **Tests:** ✅ ALL PASS (6/6 packages)
- **Vet:** ✅ clean
- **Open issues:** 0
- **Open PRs:** 0
### Recent Changes
- ✅ Merged PR #140: test coverage improvements (44.6% → 49.3%)
- ✅ PR #137 (doc-map feature) merged successfully with all reviews approved
### Coverage
| Package | Coverage |
|---------|----------|
| cmd/review-bot | 49.3% ↑ |
| gitea | 85.2% |
| github | 86.3% |
| review | 92.0% |
### Status Summary
All recent feature work (doc-map, coverage improvements) successfully integrated. Repository is stable and ready for next development cycle.
### Next Priority
- Continue increasing cmd/review-bot coverage (target: ≥60%)
- Monitor prod logs for edge cases
- VCS integration stable; GitHub + Gitea paths clear
---
_Dev-loop cycle complete at 04:35 UTC._
+2
View File
@@ -64,6 +64,8 @@ func main() {
switch os.Args[1] {
case "validate-url":
os.Exit(runValidateURL(os.Args[2:]))
case "validate-docmap":
os.Exit(runValidateDocmap(os.Args[2:]))
}
}
+263
View File
@@ -0,0 +1,263 @@
package main
import (
"bufio"
"flag"
"fmt"
"io"
"os"
"path/filepath"
"strings"
"gitea.weiker.me/rodin/review-bot/review"
)
Review

[NIT] The maxDocmapBytes constant is typed as int64 solely to compare against fi.Size() (which returns int64). Consider using an untyped constant or keeping it typed but adding a brief comment explaining why int64 was chosen, since the type is not otherwise obvious from context.

**[NIT]** The `maxDocmapBytes` constant is typed as `int64` solely to compare against `fi.Size()` (which returns `int64`). Consider using an untyped constant or keeping it typed but adding a brief comment explaining why `int64` was chosen, since the type is not otherwise obvious from context.
// maxDocmapBytes is the maximum size of the doc-map YAML file that will be
// read. Files larger than this are rejected before reading to prevent memory
// exhaustion from an oversized PR-controlled file.
const maxDocmapBytes int64 = 10 * 1024 * 1024 // 10 MB
// validateDocmapPath checks that localPath is safe to read as the doc-map
// file. It enforces three invariants before the file is opened:
//
// 1. The path resolves to a regular file within resolvedRoot (path
// confinement): prevents a PR-controlled --docmap from reading arbitrary
// host files via absolute paths or ".." traversal.
// 2. The path is not a symlink: prevents denial-of-service via /dev/zero or
// information disclosure via symlinks that point outside the workspace.
// 3. The file does not exceed maxDocmapBytes: prevents memory exhaustion
// from an oversized but legitimately committed doc-map file.
//
// resolvedRoot must already be an absolute, symlink-free path (obtained from
// filepath.Abs + filepath.EvalSymlinks).
func validateDocmapPath(localPath, resolvedRoot string) error {
Review

[MINOR] validateDocmapPath does not verify that the docmap input is a regular file. While symlinks are rejected and the path is confined to the repo root, allowing non-regular files (e.g., FIFOs, sockets, device nodes) could lead to denial-of-service (e.g., blocking reads) if such a special file exists in the workspace. Consider checking fi.Mode().IsRegular() and rejecting otherwise.

**[MINOR]** validateDocmapPath does not verify that the docmap input is a regular file. While symlinks are rejected and the path is confined to the repo root, allowing non-regular files (e.g., FIFOs, sockets, device nodes) could lead to denial-of-service (e.g., blocking reads) if such a special file exists in the workspace. Consider checking fi.Mode().IsRegular() and rejecting otherwise.
// Resolve the docmap path to an absolute path.
absPath, err := filepath.Abs(localPath)
if err != nil {
return fmt.Errorf("cannot resolve path: %w", err)
}
// Lstat: do NOT follow symlinks. We want to inspect the entry itself.
fi, err := os.Lstat(absPath)
Review

[MINOR] validateDocmapPath performs os.Lstat on the provided --docmap path before verifying it is confined within --repo-root. While content is not read and the flag is typically controlled by CI, reordering the confinement check before any filesystem access would avoid even a metadata probe on arbitrary host paths if the path were ever user-controlled.

**[MINOR]** validateDocmapPath performs os.Lstat on the provided --docmap path before verifying it is confined within --repo-root. While content is not read and the flag is typically controlled by CI, reordering the confinement check before any filesystem access would avoid even a metadata probe on arbitrary host paths if the path were ever user-controlled.
if err != nil {
return fmt.Errorf("cannot stat file: %w", err)
Outdated
Review

[MINOR] The EvalSymlinks call on --repo-root (line ~91) will fail with an error if the directory does not yet exist, which gives a confusing error message at startup rather than a clear 'directory not found' message. The error message just says 'failed to resolve --repo-root' without distinguishing a non-existent directory from a permission error. Not a functional bug, but a UX rough edge.

**[MINOR]** The EvalSymlinks call on --repo-root (line ~91) will fail with an error if the directory does not yet exist, which gives a confusing error message at startup rather than a clear 'directory not found' message. The error message just says 'failed to resolve --repo-root' without distinguishing a non-existent directory from a permission error. Not a functional bug, but a UX rough edge.
}
// Reject symlinks outright — a symlink can point to /dev/zero or arbitrary
// host paths, bypassing both the confinement check and the size check.
if fi.Mode()&os.ModeSymlink != 0 {
Outdated
Review

[NIT] The usage message emitted via multiple fmt.Fprintln(errWriter, "") calls (blank line) could use a single fmt.Fprintf with \n\n for readability. Minor style issue only.

**[NIT]** The usage message emitted via multiple `fmt.Fprintln(errWriter, "")` calls (blank line) could use a single `fmt.Fprintf` with `\n\n` for readability. Minor style issue only.
Review

[MINOR] validateDocmapPath rejects symlinks and enforces a size cap but does not verify that the target is a regular file (fi.Mode().IsRegular()). Adding this check would defensively prevent interacting with special files (FIFOs/devices) in non-standard environments and align with the comment that it should be a "regular file."

**[MINOR]** validateDocmapPath rejects symlinks and enforces a size cap but does not verify that the target is a regular file (fi.Mode().IsRegular()). Adding this check would defensively prevent interacting with special files (FIFOs/devices) in non-standard environments and align with the comment that it should be a "regular file."
return fmt.Errorf("symlinks are not allowed")
}
// Confine to resolvedRoot: the cleaned absolute path must be a descendant
// of the repo root. This catches paths that escaped via "..", absolute
// paths that happen to be outside the root, etc.
rel, err := filepath.Rel(resolvedRoot, absPath)
Outdated
Review

[MAJOR] Docmap file path is read directly with review.ParseDocMapConfig without validating that the path is within --repo-root and not a symlink. A malicious PR can create .review-bot/doc-map.yml as a symlink to a large or special file (e.g., /dev/zero) causing resource exhaustion or reading arbitrary host files. Harden by resolving and confining the docmap path to repoRoot and rejecting symlinks before reading.

**[MAJOR]** Docmap file path is read directly with review.ParseDocMapConfig without validating that the path is within --repo-root and not a symlink. A malicious PR can create .review-bot/doc-map.yml as a symlink to a large or special file (e.g., /dev/zero) causing resource exhaustion or reading arbitrary host files. Harden by resolving and confining the docmap path to repoRoot and rejecting symlinks before reading.
Outdated
Review

[MINOR] There is no upper bound on the size of the docmap YAML when reading it (os.ReadFile). Even without symlinks, a large committed doc-map file could cause excessive memory usage. Consider checking file size via os.Lstat before reading and enforcing a reasonable cap.

**[MINOR]** There is no upper bound on the size of the docmap YAML when reading it (os.ReadFile). Even without symlinks, a large committed doc-map file could cause excessive memory usage. Consider checking file size via os.Lstat before reading and enforcing a reasonable cap.
if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
return fmt.Errorf("path must be within --repo-root")
Outdated
Review

[MINOR] The coverage check reports all uncovered files even when the docmap has zero mappings (empty config). This is technically correct behaviour, but an empty docmap + any changed files will fail with a coverage error. There is no documented special case for an empty docmap meaning 'coverage not required'. This is a design choice but could surprise users who have the --docmap flag set but an empty file. Consider documenting this in the usage text or treating an empty docmap as 'coverage not enforced'.

**[MINOR]** The coverage check reports all uncovered files even when the docmap has zero mappings (empty config). This is technically correct behaviour, but an empty docmap + any changed files will fail with a coverage error. There is no documented special case for an empty docmap meaning 'coverage not required'. This is a design choice but could surprise users who have the --docmap flag set but an empty file. Consider documenting this in the usage text or treating an empty docmap as 'coverage not enforced'.
}
// Enforce size cap before reading to prevent memory exhaustion.
Review

[MINOR] validateDocmapPath does not explicitly ensure the target is a regular file (fi.Mode().IsRegular()). A directory will be rejected later by ParseDocMapConfig via os.ReadFile, but an early explicit check would yield a clearer error and avoid attempting to read directories.

**[MINOR]** validateDocmapPath does not explicitly ensure the target is a regular file (fi.Mode().IsRegular()). A directory will be rejected later by ParseDocMapConfig via os.ReadFile, but an early explicit check would yield a clearer error and avoid attempting to read directories.
if fi.Size() > maxDocmapBytes {
return fmt.Errorf("file size %d bytes exceeds %d-byte limit", fi.Size(), maxDocmapBytes)
}
return nil
}
// runValidateDocmap implements the `review-bot validate-docmap` subcommand.
Review

[NIT] The validateDocmapPath function documents that resolvedRoot must be an absolute, symlink-free path, but this is only enforced by convention (the caller must pass the right value). An internal assertion or a brief check if !filepath.IsAbs(resolvedRoot) at the top of the function would make the contract self-enforcing. Low priority since the caller in runValidateDocmap does this correctly.

**[NIT]** The `validateDocmapPath` function documents that `resolvedRoot` must be an absolute, symlink-free path, but this is only enforced by convention (the caller must pass the right value). An internal assertion or a brief check `if !filepath.IsAbs(resolvedRoot)` at the top of the function would make the contract self-enforcing. Low priority since the caller in `runValidateDocmap` does this correctly.
//
// It reads changed file paths from stdin (one per line, as produced by
// `git diff --name-only`), parses a doc-map YAML file, and performs two checks:
//
// 1. Coverage check: every changed file must be matched by at least one
Outdated
Review

[NIT] When matching coverage, changed file paths from stdin are used as-is. For robustness on Windows inputs, normalize backslashes to forward slashes before matching (e.g., f = strings.ReplaceAll(f, "\", "/")).

**[NIT]** When matching coverage, changed file paths from stdin are used as-is. For robustness on Windows inputs, normalize backslashes to forward slashes before matching (e.g., f = strings.ReplaceAll(f, "\\", "/")).
// paths: glob in the docmap. Fails if any file is uncovered.
Outdated
Review

[MINOR] readLines reads from os.Stdin directly rather than accepting an io.Reader parameter. This forces tests to swap os.Stdin (which is fragile and not goroutine-safe), whereas accepting io.Reader would let callers pass any reader. The existing test helper works around this by redirecting os.Stdin, but the function signature itself is less testable than it could be. Compare with validateurl.go pattern — if that also uses os.Stdin directly this is consistent, but it's still worth noting.

**[MINOR]** readLines reads from os.Stdin directly rather than accepting an io.Reader parameter. This forces tests to swap os.Stdin (which is fragile and not goroutine-safe), whereas accepting io.Reader would let callers pass any reader. The existing test helper works around this by redirecting os.Stdin, but the function signature itself is less testable than it could be. Compare with validateurl.go pattern — if that also uses os.Stdin directly this is consistent, but it's still worth noting.
//
// 2. Stale-docs check: every docs: entry in the docmap must exist on disk
// (relative to --repo-root). Fails if any path is missing.
//
// Both checks always run — all failures are reported before exiting.
//
Review

[MINOR] The stale-docs check (checkStaleDocs) runs even when no changed files were provided on stdin (empty stdin case). This means a docmap with stale docs: entries will always fail, even on runs where no files changed — which is arguably the correct behavior, but worth a comment clarifying the intent. Currently the empty-stdin test passes because the test fixture has the doc file present, so it's fine, but the behavior asymmetry (coverage check is vacuously true on empty input, stale-docs check is not) is undocumented.

**[MINOR]** The stale-docs check (`checkStaleDocs`) runs even when no changed files were provided on stdin (empty stdin case). This means a docmap with stale `docs:` entries will always fail, even on runs where no files changed — which is arguably the correct behavior, but worth a comment clarifying the intent. Currently the empty-stdin test passes because the test fixture has the doc file present, so it's fine, but the behavior asymmetry (coverage check is vacuously true on empty input, stale-docs check is not) is undocumented.
// Exit codes:
//
// 0 — clean (all files covered, all docs exist)
// 1 — one or more coverage or stale-doc failures
// 2 — usage error, missing flag, or YAML parse error
func runValidateDocmap(args []string) int {
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[MINOR] repoRoot is only cleaned (filepath.Clean) but not resolved; resolving repoRoot with filepath.Abs + filepath.EvalSymlinks before Rel checks would harden against cases where the root itself is a symlink and reduce ambiguity across platforms.

**[MINOR]** repoRoot is only cleaned (filepath.Clean) but not resolved; resolving repoRoot with filepath.Abs + filepath.EvalSymlinks before Rel checks would harden against cases where the root itself is a symlink and reduce ambiguity across platforms.
Outdated
Review

[MINOR] The stale-docs check calls filepath.EvalSymlinks(absRoot) on the --repo-root path, which will fail with an error if the directory does not yet exist (e.g. a new repo checkout where the root hasn't been created). In practice --repo-root defaults to . (cwd) which always exists, so this is low-risk, but an explicit os.Stat check before EvalSymlinks would give a clearer error message when the root is genuinely missing vs. a symlink resolution failure.

**[MINOR]** The stale-docs check calls `filepath.EvalSymlinks(absRoot)` on the `--repo-root` path, which will fail with an error if the directory does not yet exist (e.g. a new repo checkout where the root hasn't been created). In practice `--repo-root` defaults to `.` (cwd) which always exists, so this is low-risk, but an explicit `os.Stat` check before `EvalSymlinks` would give a clearer error message when the root is genuinely missing vs. a symlink resolution failure.
fs := flag.NewFlagSet("validate-docmap", flag.ContinueOnError)
fs.SetOutput(errWriter)
Outdated
Review

[NIT] The error message 'ERROR: stale docmap docs: entries (paths do not exist):' has an awkward colon after 'docs:' — this reads as if 'docs:' is YAML syntax leaking into the message. Consider 'ERROR: stale docmap entries (docs: paths that do not exist on disk):' or simply 'ERROR: docs: entries not found on disk:'.

**[NIT]** The error message 'ERROR: stale docmap docs: entries (paths do not exist):' has an awkward colon after 'docs:' — this reads as if 'docs:' is YAML syntax leaking into the message. Consider 'ERROR: stale docmap entries (docs: paths that do not exist on disk):' or simply 'ERROR: docs: entries not found on disk:'.
docmapFlag := fs.String("docmap", "", "Path to doc-map YAML file (required)")
repoRootFlag := fs.String("repo-root", ".", "Repo root for resolving docs: paths (default: cwd)")
Review

[MINOR] The stale-docs check runs even when --repo-root resolution fails, but only coverage failures are accumulated before the repo-root resolution block. If filepath.Abs or filepath.EvalSymlinks fails the function returns exit code 2 immediately — meaning a coverage failure that was already accumulated (and stored in failed = true) is silently swallowed, and the user sees exit 2 with only the repo-root error rather than exit 1 with both failures. Consider either: (a) emitting the accumulated coverage failures before returning 2, or (b) resolving repo-root before the coverage check so all errors are consistently collected.

**[MINOR]** The stale-docs check runs even when `--repo-root` resolution fails, but only coverage failures are accumulated before the repo-root resolution block. If `filepath.Abs` or `filepath.EvalSymlinks` fails the function returns exit code 2 immediately — meaning a coverage failure that was already accumulated (and stored in `failed = true`) is silently swallowed, and the user sees exit 2 with only the repo-root error rather than exit 1 with both failures. Consider either: (a) emitting the accumulated coverage failures before returning 2, or (b) resolving repo-root before the coverage check so all errors are consistently collected.
if err := fs.Parse(args); err != nil {
Review

[NIT] The usage message prints flag help manually (fmt.Fprintln chains) rather than calling fs.Usage(). This is inconsistent with how flag.FlagSet normally surfaces help and could get out of sync with the registered flags. Minor since this is a short flag set, but worth noting.

**[NIT]** The usage message prints flag help manually (fmt.Fprintln chains) rather than calling `fs.Usage()`. This is inconsistent with how flag.FlagSet normally surfaces help and could get out of sync with the registered flags. Minor since this is a short flag set, but worth noting.
// flag.ContinueOnError already wrote the error to errWriter.
return 2
}
Outdated
Review

[NIT] The stale-docs check always runs against ALL mappings in the docmap regardless of whether those mappings were triggered by any changed files. This is intentional per the PR description ('all docs: entries must exist on disk'), but the behaviour differs from what a user might expect: a mapping for legacy/** with a stale doc path will fail even when no legacy files changed. A short doc comment on checkStaleDocs clarifying this global-check intent would help future maintainers.

**[NIT]** The stale-docs check always runs against ALL mappings in the docmap regardless of whether those mappings were triggered by any changed files. This is intentional per the PR description ('all docs: entries must exist on disk'), but the behaviour differs from what a user might expect: a mapping for `legacy/**` with a stale doc path will fail even when no legacy files changed. A short doc comment on checkStaleDocs clarifying this global-check intent would help future maintainers.
Outdated
Review

[NIT] The flag is named --docmap (no hyphen between doc and map) but the README, function name, and error messages consistently spell it doc-map. Minor inconsistency — --doc-map would match the established naming convention of --doc-map-max-bytes and the config file name doc-map.yml.

**[NIT]** The flag is named `--docmap` (no hyphen between doc and map) but the README, function name, and error messages consistently spell it `doc-map`. Minor inconsistency — `--doc-map` would match the established naming convention of `--doc-map-max-bytes` and the config file name `doc-map.yml`.
if *docmapFlag == "" {
fmt.Fprintln(errWriter, "Error: --docmap is required")
fmt.Fprintln(errWriter, "")
fmt.Fprintln(errWriter, "usage: review-bot validate-docmap --docmap <path> [--repo-root <dir>]")
Outdated
Review

[MINOR] The error message for stale docs reads "ERROR: stale docmap docs: entries (paths do not exist):" — the docs: YAML key embedded in the prose makes it slightly awkward to read ("stale docmap docs: entries"). Consider "ERROR: stale docs: entries in docmap do not exist on disk:" for clarity.

**[MINOR]** The error message for stale docs reads `"ERROR: stale docmap docs: entries (paths do not exist):"` — the `docs:` YAML key embedded in the prose makes it slightly awkward to read ("stale docmap docs: entries"). Consider `"ERROR: stale docs: entries in docmap do not exist on disk:"` for clarity.
fmt.Fprintln(errWriter, " Changed files are read from stdin, one per line.")
fmt.Fprintln(errWriter, " Example: git diff --name-only origin/main HEAD | review-bot validate-docmap --docmap .review-bot/doc-map.yml")
return 2
}
Outdated
Review

[NIT] Deduplication of docs: entries uses the raw docPath string; consider normalizing (e.g., filepath.Clean and using forward slashes) before inserting into the seen set to avoid duplicate reports for equivalent paths like "docs/./a.md" vs "docs/a.md".

**[NIT]** Deduplication of docs: entries uses the raw docPath string; consider normalizing (e.g., filepath.Clean and using forward slashes) before inserting into the seen set to avoid duplicate reports for equivalent paths like "docs/./a.md" vs "docs/a.md".
// Resolve repoRoot first — the docmap path is validated against it below.
// Use an absolute, symlink-free path so a symlinked --repo-root cannot
// bypass the escape guard in validateDocmapPath or checkStaleDocs.
absRoot, err := filepath.Abs(*repoRootFlag)
if err != nil {
fmt.Fprintf(errWriter, "Error: failed to resolve --repo-root %q: %v\n", *repoRootFlag, err)
return 2
}
resolvedRoot, err := filepath.EvalSymlinks(absRoot)
if err != nil {
if os.IsNotExist(err) {
fmt.Fprintf(errWriter, "Error: --repo-root %q does not exist\n", *repoRootFlag)
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[MAJOR] Path traversal/existence probing: checkStaleDocs joins untrusted docmap paths with --repo-root and calls os.Stat without validating that the doc path is relative and confined to repoRoot. A malicious PR can include entries like "../../etc/passwd" or absolute paths to probe the CI runner filesystem. Validate doc paths (reject absolute and '..' segments) and enforce confinement with filepath.Rel/EvalSymlinks before stat.

**[MAJOR]** Path traversal/existence probing: checkStaleDocs joins untrusted docmap paths with --repo-root and calls os.Stat without validating that the doc path is relative and confined to repoRoot. A malicious PR can include entries like "../../etc/passwd" or absolute paths to probe the CI runner filesystem. Validate doc paths (reject absolute and '..' segments) and enforce confinement with filepath.Rel/EvalSymlinks before stat.
Outdated
Review

[MINOR] checkStaleDocs confines paths via filepath.Clean + filepath.Rel but does not resolve symlinks. A symlink under repo-root could point outside and os.Stat would follow it, potentially leaking host file existence. Consider using filepath.EvalSymlinks on fullPath and re-checking it stays under repoRoot before statting.

**[MINOR]** checkStaleDocs confines paths via filepath.Clean + filepath.Rel but does not resolve symlinks. A symlink under repo-root could point outside and os.Stat would follow it, potentially leaking host file existence. Consider using filepath.EvalSymlinks on fullPath and re-checking it stays under repoRoot before statting.
} else {
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[MAJOR] Path traversal/information disclosure: stale-docs check joins repoRoot with unvalidated doc paths and calls os.Stat without enforcing that the resolved path stays within repoRoot. Absolute paths or ".." segments in doc-map docs: entries can cause os.Stat to probe arbitrary filesystem locations (e.g., /etc/passwd), creating an existence oracle visible in CI logs.

**[MAJOR]** Path traversal/information disclosure: stale-docs check joins repoRoot with unvalidated doc paths and calls os.Stat without enforcing that the resolved path stays within repoRoot. Absolute paths or ".." segments in doc-map docs: entries can cause os.Stat to probe arbitrary filesystem locations (e.g., /etc/passwd), creating an existence oracle visible in CI logs.
fmt.Fprintf(errWriter, "Error: failed to resolve --repo-root %q: %v\n", *repoRootFlag, err)
}
return 2
}
// Harden the docmap file path before reading it. The --docmap flag value
// may reference a PR-controlled file (e.g. .review-bot/doc-map.yml).
// Validate that it:
// 1. Resolves within resolvedRoot (prevent reading arbitrary host files).
Outdated
Review

[NIT] readLines uses bufio.Scanner with the default 64KB token buffer. Very long file paths (unlikely but possible with generated paths) would silently be truncated by scanner.Scan() returning false without error when a line exceeds the buffer. For a CI tool reading git diff output this is practically fine, but a comment noting the 64KB line limit or using scanner.Buffer to increase it would be defensive.

**[NIT]** readLines uses bufio.Scanner with the default 64KB token buffer. Very long file paths (unlikely but possible with generated paths) would silently be truncated by scanner.Scan() returning false without error when a line exceeds the buffer. For a CI tool reading git diff output this is practically fine, but a comment noting the 64KB line limit or using scanner.Buffer to increase it would be defensive.
// 2. Is not a symlink (prevent /dev/zero or symlink-based host probing).
// 3. Does not exceed maxDocmapBytes (prevent memory exhaustion from an
// oversized committed file).
if err := validateDocmapPath(*docmapFlag, resolvedRoot); err != nil {
fmt.Fprintf(errWriter, "Error: --docmap %q is invalid: %v\n", *docmapFlag, err)
return 2
}
// Parse docmap YAML.
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[MAJOR] Symlink traversal allows probing existence of arbitrary host files. checkStaleDocs uses os.Stat on repo-controlled paths after Clean/Rel checks, but does not prevent symlinks under repoRoot from pointing outside the workspace. A malicious PR can add a symlink in the repo and reference it in doc-map to infer presence/absence of sensitive files on the CI runner via pass/fail or logs. Mitigate by using os.Lstat to reject symlinks, or resolve with filepath.EvalSymlinks and re-validate the resolved path remains within repoRoot before stat.

**[MAJOR]** Symlink traversal allows probing existence of arbitrary host files. checkStaleDocs uses os.Stat on repo-controlled paths after Clean/Rel checks, but does not prevent symlinks under repoRoot from pointing outside the workspace. A malicious PR can add a symlink in the repo and reference it in doc-map to infer presence/absence of sensitive files on the CI runner via pass/fail or logs. Mitigate by using os.Lstat to reject symlinks, or resolve with filepath.EvalSymlinks and re-validate the resolved path remains within repoRoot before stat.
cfg, err := review.ParseDocMapConfig(*docmapFlag)
if err != nil {
fmt.Fprintf(errWriter, "Error: failed to parse docmap %q: %v\n", *docmapFlag, err)
return 2
}
// Read changed files from stdin.
changedFiles, err := readLines(os.Stdin)
if err != nil {
fmt.Fprintf(errWriter, "Error: failed to read stdin: %v\n", err)
return 2
}
failed := false
// --- Check 1: Coverage ---
// Note: an empty docmap (no mappings) means every changed file is
Outdated
Review

[MINOR] Path confinement check uses strings.HasPrefix(rel, "..") which may over/under-match on some platforms. Consider tightening to rel == ".." or strings.HasPrefix(rel, ".."+string(os.PathSeparator)) to be precise with path segment boundaries.

**[MINOR]** Path confinement check uses strings.HasPrefix(rel, "..") which may over/under-match on some platforms. Consider tightening to rel == ".." or strings.HasPrefix(rel, ".."+string(os.PathSeparator)) to be precise with path segment boundaries.
// uncovered — there are no patterns to match against. This is intentional:
Review

[MINOR] The readLines function is defined in validatedocmap.go but is a generic utility. If another subcommand file ever needs to read lines from stdin, there will be a naming conflict since both would be in package main. This is fine for now but worth noting — consider a brief comment that this is intentionally scoped to the validate-docmap use case, or move it to a shared helpers file if reuse is anticipated.

**[MINOR]** The `readLines` function is defined in `validatedocmap.go` but is a generic utility. If another subcommand file ever needs to read lines from stdin, there will be a naming conflict since both would be in `package main`. This is fine for now but worth noting — consider a brief comment that this is intentionally scoped to the validate-docmap use case, or move it to a shared helpers file if reuse is anticipated.
// if you declare a doc-map, every changed file must be accounted for.
// On empty stdin the check is vacuously true (no files to cover).
var uncovered []string
for _, f := range changedFiles {
// Normalize Windows-style backslashes to forward slashes so that
// changed-file paths from git on Windows match doc-map globs.
f = strings.ReplaceAll(f, "\\", "/")
if !review.FileCoveredByDocMap(cfg, f) {
uncovered = append(uncovered, f)
}
}
if len(uncovered) > 0 {
Outdated
Review

[NIT] readLines uses bufio.Scanner with the default token limit (~64KB). While file paths are typically short, if extremely long lines were ever piped in, Scanner would error. Consider documenting this assumption or using bufio.Reader with ReadString('\n') for robustness.

**[NIT]** readLines uses bufio.Scanner with the default token limit (~64KB). While file paths are typically short, if extremely long lines were ever piped in, Scanner would error. Consider documenting this assumption or using bufio.Reader with ReadString('\n') for robustness.
failed = true
fmt.Fprintln(errWriter, "ERROR: changed files with no docmap coverage:")
for _, f := range uncovered {
fmt.Fprintf(errWriter, " %s\n", f)
}
}
// --- Check 2: Stale docs ---
// checkStaleDocs validates each path before touching the filesystem; see
// its documentation for the path-traversal hardening applied.
staleDocs := checkStaleDocs(cfg, resolvedRoot)
if len(staleDocs) > 0 {
failed = true
fmt.Fprintln(errWriter, "ERROR: stale docmap docs: entries (paths do not exist):")
for _, d := range staleDocs {
fmt.Fprintf(errWriter, " %s\n", d)
}
}
if failed {
return 1
}
fmt.Fprintln(outWriter, "OK: docmap is valid")
return 0
}
// checkStaleDocs returns deduplicated docs: entries that do not exist under
// repoRoot.
//
// Path-traversal hardening: each docPath is validated with
// review.ValidateDocPath (rejects absolute paths and ".." segments) and then
// confined to repoRoot via filepath.Clean + filepath.Rel before os.Lstat is
// called. Symlinks are treated as stale — a CI tool running against
// PR-controlled content must not follow symlinks that could probe arbitrary
// host paths. Paths that fail any check are treated as invalid (reported as
// stale) without following any symlinks.
func checkStaleDocs(cfg *review.DocMapConfig, repoRoot string) []string {
seen := make(map[string]struct{})
var stale []string
for _, mapping := range cfg.Mappings {
for _, docPath := range mapping.Docs {
if docPath == "" {
continue
}
if _, ok := seen[docPath]; ok {
continue
}
seen[docPath] = struct{}{}
// Guard 1: reject absolute paths and ".." segments sourced from
// PR-controlled YAML before joining with repoRoot.
if err := review.ValidateDocPath(docPath); err != nil {
stale = append(stale, docPath)
continue
}
Review

[NIT] readLines uses bufio.Scanner with the default token size; while fine for filenames, consider documenting or bumping the buffer if you ever accept longer lines via stdin in the future.

**[NIT]** readLines uses bufio.Scanner with the default token size; while fine for filenames, consider documenting or bumping the buffer if you ever accept longer lines via stdin in the future.
// Guard 2: verify the cleaned joined path does not escape repoRoot.
// filepath.Clean resolves any remaining ".." after the join; the
// filepath.Rel check confirms the path is still under repoRoot.
fullPath := filepath.Clean(filepath.Join(repoRoot, filepath.FromSlash(docPath)))
rel, err := filepath.Rel(repoRoot, fullPath)
if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
stale = append(stale, docPath)
continue
}
// Use Lstat (not Stat) so symlinks are never followed. A symlink
// under repoRoot could point anywhere on the host, allowing a
// malicious PR to probe file existence. Treat symlinks as stale.
fi, err := os.Lstat(fullPath)
if err != nil {
stale = append(stale, docPath)
continue
}
if fi.Mode()&os.ModeSymlink != 0 {
stale = append(stale, docPath)
}
}
}
return stale
}
// readLines reads all non-empty trimmed lines from r.
func readLines(r io.Reader) ([]string, error) {
scanner := bufio.NewScanner(r)
var lines []string
for scanner.Scan() {
line := strings.TrimSpace(scanner.Text())
if line != "" {
lines = append(lines, line)
}
}
return lines, scanner.Err()
}
+552
View File
@@ -0,0 +1,552 @@
package main
import (
"bytes"
"os"
"path/filepath"
"strings"
"testing"
)
// makeDocmapYAML writes a YAML string to a temp file and returns its path.
// The file is created in t.TempDir() — use makeDocmapInDir when the docmap
// must be located inside a specific repo-root directory.
func makeDocmapYAML(t *testing.T, content string) string {
t.Helper()
f, err := os.CreateTemp(t.TempDir(), "doc-map-*.yml")
if err != nil {
t.Fatalf("CreateTemp: %v", err)
}
defer f.Close()
if _, err := f.WriteString(content); err != nil {
t.Fatalf("WriteString: %v", err)
}
return f.Name()
}
// makeDocmapInDir writes a YAML string to a file inside dir and returns the
// file path. Use this instead of makeDocmapYAML when also passing --repo-root,
// because validateDocmapPath requires the docmap to be within the repo root.
func makeDocmapInDir(t *testing.T, dir, content string) string {
t.Helper()
if err := os.MkdirAll(filepath.Join(dir, ".review-bot"), 0o755); err != nil {
t.Fatalf("MkdirAll: %v", err)
}
path := filepath.Join(dir, ".review-bot", "doc-map.yml")
if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
t.Fatalf("WriteFile: %v", err)
}
return path
}
// makeDocFile creates a file (and any parent dirs) at the given path relative to dir.
func makeDocFile(t *testing.T, dir, rel string) {
t.Helper()
Review

[NIT] The captureOutput helper mutates package-level outWriter/errWriter variables without any synchronisation. This is fine for sequential tests but would race if tests run in parallel (t.Parallel()). No parallel calls exist currently, but a comment noting this limitation would be defensive documentation.

**[NIT]** The `captureOutput` helper mutates package-level `outWriter`/`errWriter` variables without any synchronisation. This is fine for sequential tests but would race if tests run in parallel (`t.Parallel()`). No parallel calls exist currently, but a comment noting this limitation would be defensive documentation.
full := filepath.Join(dir, rel)
if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
t.Fatalf("MkdirAll: %v", err)
}
if err := os.WriteFile(full, []byte("# doc\n"), 0o644); err != nil {
t.Fatalf("WriteFile: %v", err)
}
}
// captureOutput redirects outWriter/errWriter to buffers for the duration of f.
func captureOutput(f func()) (stdout, stderr string) {
var outBuf, errBuf bytes.Buffer
origOut, origErr := outWriter, errWriter
outWriter = &outBuf
errWriter = &errBuf
defer func() {
outWriter = origOut
errWriter = origErr
}()
f()
return outBuf.String(), errBuf.String()
}
func TestRunValidateDocmap_Clean(t *testing.T) {
dir := t.TempDir()
makeDocFile(t, dir, "docs/foo.md")
docmap := makeDocmapInDir(t, dir, `
mappings:
Review

[NIT] TestRunValidateDocmap_Clean has a long comment block explaining why it relies on process-level stdin instead of the stdinValidateDocmap helper, which is defined later in the same file. The test would be cleaner and more consistent if it simply used stdinValidateDocmap with an empty string — which is exactly what TestRunValidateDocmap_EmptyStdin does. These two tests appear to be testing the same case (empty stdin → clean). Consider removing TestRunValidateDocmap_Clean or consolidating it with TestRunValidateDocmap_EmptyStdin.

**[NIT]** TestRunValidateDocmap_Clean has a long comment block explaining why it relies on process-level stdin instead of the stdinValidateDocmap helper, which is defined later in the same file. The test would be cleaner and more consistent if it simply used stdinValidateDocmap with an empty string — which is exactly what TestRunValidateDocmap_EmptyStdin does. These two tests appear to be testing the same case (empty stdin → clean). Consider removing TestRunValidateDocmap_Clean or consolidating it with TestRunValidateDocmap_EmptyStdin.
- paths:
- "lib/foo/**"
docs:
- docs/foo.md
`)
// A covered file with all docs existing → clean.
code, stdout, _ := stdinValidateDocmap(t,
"lib/foo/bar.ex\n",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 0 {
t.Errorf("expected exit 0 for clean, got %d", code)
}
if !strings.Contains(stdout, "OK") {
t.Errorf("expected 'OK' in stdout, got %q", stdout)
}
}
func TestRunValidateDocmap_MissingDocmapFlag(t *testing.T) {
var code int
_, stderr := captureOutput(func() {
code = runValidateDocmap([]string{})
})
if code != 2 {
t.Errorf("expected exit 2 for missing --docmap, got %d", code)
}
if !strings.Contains(stderr, "--docmap") {
t.Errorf("expected --docmap in stderr, got %q", stderr)
}
}
func TestRunValidateDocmap_BadYAML(t *testing.T) {
dir := t.TempDir()
docmap := makeDocmapInDir(t, dir, "mappings: [{{invalid")
var code int
_, stderr := captureOutput(func() {
code = runValidateDocmap([]string{"--docmap", docmap, "--repo-root", dir})
})
if code != 2 {
t.Errorf("expected exit 2 for bad YAML, got %d", code)
}
if !strings.Contains(stderr, "failed to parse") {
t.Errorf("expected parse error in stderr, got %q", stderr)
}
}
func TestRunValidateDocmap_StaleDocs(t *testing.T) {
dir := t.TempDir()
// docs/foo.md does NOT exist on disk.
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/foo/**"
docs:
- docs/foo.md
`)
var code int
_, stderr := captureOutput(func() {
code = runValidateDocmap([]string{
"--docmap", docmap,
"--repo-root", dir,
})
})
if code != 1 {
t.Errorf("expected exit 1 for stale docs, got %d", code)
}
if !strings.Contains(stderr, "docs/foo.md") {
t.Errorf("expected stale path in stderr, got %q", stderr)
}
if !strings.Contains(stderr, "stale docmap") {
t.Errorf("expected 'stale docmap' in stderr, got %q", stderr)
Review

[NIT] stdinValidateDocmap defers f.Close() but the file is opened read-write and os.Stdin is set to f. The defer will close the file descriptor while it may still be in use by runValidateDocmap if Go's test runner parallelises subtests. In practice this is safe because the function runs synchronously and the defer fires after the return, but it is slightly fragile. The existing validateurl pattern likely has the same structure so this is not a regression, just a note.

**[NIT]** stdinValidateDocmap defers f.Close() but the file is opened read-write and os.Stdin is set to f. The defer will close the file descriptor while it may still be in use by runValidateDocmap if Go's test runner parallelises subtests. In practice this is safe because the function runs synchronously and the defer fires after the return, but it is slightly fragile. The existing validateurl pattern likely has the same structure so this is not a regression, just a note.
}
}
// stdinValidateDocmap runs runValidateDocmap with a synthetic stdin.
//
// Implementation note: we write stdinContent to a temp file and point
// os.Stdin at it. The defer f.Close() fires after stdinValidateDocmap
// returns, which is after runValidateDocmap has finished reading stdin
Outdated
Review

[MINOR] The comment on stdinValidateDocmap says "Tests must not call t.Parallel() while sharing the global os.Stdin" but this constraint is not enforced (e.g. via a test-level mutex or a serial subtest runner). The tests currently do not call t.Parallel(), so this is safe today, but the constraint is easy to violate silently when new tests are added. Consider documenting this with a t.Setenv-style workaround or a note that tests in this file must remain serial.

**[MINOR]** The comment on `stdinValidateDocmap` says "Tests must not call t.Parallel() while sharing the global os.Stdin" but this constraint is not enforced (e.g. via a test-level mutex or a serial subtest runner). The tests currently do not call `t.Parallel()`, so this is safe today, but the constraint is easy to violate silently when new tests are added. Consider documenting this with a `t.Setenv`-style workaround or a note that tests in this file must remain serial.
// synchronously — so the file is not closed while still in use.
// Tests must not call t.Parallel() while sharing the global os.Stdin.
func stdinValidateDocmap(t *testing.T, stdinContent string, args []string) (code int, stdout, stderr string) {
Review

[NIT] stdinValidateDocmap leaks the temp file handle — it calls defer f.Close() but the file is passed to os.Stdin before the function returns. Because os.Stdin is restored in the deferred restore and runValidateDocmap reads synchronously, this works in practice, but a reader of the test may be confused. A comment explaining the ordering would help.

**[NIT]** stdinValidateDocmap leaks the temp file handle — it calls defer f.Close() but the file is passed to os.Stdin before the function returns. Because os.Stdin is restored in the deferred restore and runValidateDocmap reads synchronously, this works in practice, but a reader of the test may be confused. A comment explaining the ordering would help.
t.Helper()
// Write stdin content to a temp file and redirect os.Stdin.
f, err := os.CreateTemp(t.TempDir(), "stdin-*")
if err != nil {
t.Fatalf("CreateTemp for stdin: %v", err)
}
defer f.Close()
if _, err := f.WriteString(stdinContent); err != nil {
t.Fatalf("WriteString for stdin: %v", err)
}
Review

[NIT] The comment on stdinValidateDocmap says 'Tests must not call t.Parallel() while sharing the global os.Stdin' but there is no t.Skip or assertion to enforce this. Since os.Stdin mutation is inherently process-global, this is a latent footgun if someone later adds t.Parallel() to these tests. Consider adding a brief comment directing future maintainers to use TestMain to serialize tests instead, or document that this constraint is enforced by convention.

**[NIT]** The comment on `stdinValidateDocmap` says 'Tests must not call t.Parallel() while sharing the global os.Stdin' but there is no `t.Skip` or assertion to enforce this. Since os.Stdin mutation is inherently process-global, this is a latent footgun if someone later adds t.Parallel() to these tests. Consider adding a brief comment directing future maintainers to use `TestMain` to serialize tests instead, or document that this constraint is enforced by convention.
if _, err := f.Seek(0, 0); err != nil {
t.Fatalf("Seek for stdin: %v", err)
}
origStdin := os.Stdin
os.Stdin = f
defer func() { os.Stdin = origStdin }()
stdout, stderr = captureOutput(func() {
Review

[MINOR] The comment on stdinValidateDocmap says "Tests must not call t.Parallel() while sharing the global os.Stdin" — this is a valid concern, but none of the tests call t.Parallel() anyway, so the guard is informational only. The real risk is if a future contributor adds t.Parallel() without reading this comment. Consider adding a // WARNING: prefix or an explicit // do not add t.Parallel() to callers note to make it more prominent.

**[MINOR]** The comment on `stdinValidateDocmap` says "Tests must not call t.Parallel() while sharing the global os.Stdin" — this is a valid concern, but none of the tests call `t.Parallel()` anyway, so the guard is informational only. The real risk is if a future contributor adds `t.Parallel()` without reading this comment. Consider adding a `// WARNING:` prefix or an explicit `// do not add t.Parallel() to callers` note to make it more prominent.
code = runValidateDocmap(args)
})
return
}
func TestRunValidateDocmap_UncoveredFile(t *testing.T) {
dir := t.TempDir()
makeDocFile(t, dir, "docs/foo.md")
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/foo/**"
docs:
Review

[NIT] TestCheckStaleDocs_PathTraversal constructs the YAML by concatenating tc.docPath directly into the string literal. A path like /etc/passwd happens to be valid YAML, but paths containing YAML-special characters (e.g. :, {, #) could produce a parse error (exit 2) rather than the expected stale-doc error (exit 1). Adding a YAML quoting step or restricting the test vectors would make the test more robust.

**[NIT]** `TestCheckStaleDocs_PathTraversal` constructs the YAML by concatenating `tc.docPath` directly into the string literal. A path like `/etc/passwd` happens to be valid YAML, but paths containing YAML-special characters (e.g. `:`, `{`, `#`) could produce a parse error (exit 2) rather than the expected stale-doc error (exit 1). Adding a YAML quoting step or restricting the test vectors would make the test more robust.
- docs/foo.md
`)
code, _, stderr := stdinValidateDocmap(t,
"lib/bar/uncovered.ex\n",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 1 {
t.Errorf("expected exit 1 for uncovered file, got %d", code)
}
if !strings.Contains(stderr, "lib/bar/uncovered.ex") {
t.Errorf("expected uncovered file in stderr, got %q", stderr)
}
if !strings.Contains(stderr, "no docmap coverage") {
t.Errorf("expected 'no docmap coverage' in stderr, got %q", stderr)
}
}
func TestRunValidateDocmap_BothFailures(t *testing.T) {
dir := t.TempDir()
// docs/foo.md intentionally missing
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/foo/**"
docs:
- docs/foo.md
`)
code, _, stderr := stdinValidateDocmap(t,
"lib/bar/uncovered.ex\n",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 1 {
t.Errorf("expected exit 1 for both failures, got %d", code)
}
if !strings.Contains(stderr, "no docmap coverage") {
t.Errorf("expected coverage error in stderr, got %q", stderr)
}
if !strings.Contains(stderr, "stale docmap") {
t.Errorf("expected stale-docs error in stderr, got %q", stderr)
}
}
func TestRunValidateDocmap_EmptyStdin(t *testing.T) {
dir := t.TempDir()
makeDocFile(t, dir, "docs/foo.md")
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/foo/**"
docs:
- docs/foo.md
`)
code, stdout, _ := stdinValidateDocmap(t,
"",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 0 {
t.Errorf("expected exit 0 for empty stdin, got %d", code)
}
if !strings.Contains(stdout, "OK") {
t.Errorf("expected 'OK' in stdout, got %q", stdout)
}
}
func TestRunValidateDocmap_BlankLinesSkipped(t *testing.T) {
dir := t.TempDir()
makeDocFile(t, dir, "docs/foo.md")
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/foo/**"
docs:
- docs/foo.md
`)
// stdin with only blank lines → effectively empty, should be clean
code, stdout, _ := stdinValidateDocmap(t,
"\n \n\n",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 0 {
t.Errorf("expected exit 0 for blank-only stdin, got %d", code)
}
if !strings.Contains(stdout, "OK") {
t.Errorf("expected 'OK' in stdout for blank-only stdin, got %q", stdout)
}
}
func TestRunValidateDocmap_DuplicateDocsDeduped(t *testing.T) {
dir := t.TempDir()
// docs/shared.md intentionally missing — but it appears in TWO mappings.
// Should appear only once in stale list.
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/foo/**"
docs:
- docs/shared.md
- paths:
- "lib/bar/**"
docs:
- docs/shared.md
`)
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 1 {
t.Errorf("expected exit 1 for stale doc, got %d", code)
}
count := strings.Count(stderr, "docs/shared.md")
if count != 1 {
t.Errorf("expected docs/shared.md to appear exactly once in stderr (deduplicated), got %d occurrences: %q", count, stderr)
}
}
// TestCheckStaleDocs_PathTraversal verifies that checkStaleDocs rejects
// traversal and absolute paths without touching the host filesystem.
func TestCheckStaleDocs_PathTraversal(t *testing.T) {
dir := t.TempDir()
// Baseline: a valid doc that exists.
makeDocFile(t, dir, "docs/valid.md")
Review

[NIT] In TestCheckStaleDocs_PathTraversal, the inner table test creates a docmap using string concatenation with the docPath directly embedded in YAML (- +tc.docPath+``). For paths like /etc/passwd and ../../etc/passwd, this creates valid YAML. However, for paths containing special YAML characters (:, #, {), this could produce invalid YAML that would fail at parse time rather than at the stale-docs check, making the test misleading. The current test cases are safe, but using fmt.Sprintf with YAML quoting or writing the file programmatically would be more robust.

**[NIT]** In `TestCheckStaleDocs_PathTraversal`, the inner table test creates a docmap using string concatenation with the docPath directly embedded in YAML (`- `+tc.docPath+``). For paths like `/etc/passwd` and `../../etc/passwd`, this creates valid YAML. However, for paths containing special YAML characters (`:`, `#`, `{`), this could produce invalid YAML that would fail at parse time rather than at the stale-docs check, making the test misleading. The current test cases are safe, but using `fmt.Sprintf` with YAML quoting or writing the file programmatically would be more robust.
tests := []struct {
name string
docPath string
wantStale bool
}{
{"dot-dot traversal", "../../etc/passwd", true},
{"dot-dot single", "../outside", true},
{"absolute path", "/etc/passwd", true},
{"valid present path", "docs/valid.md", false},
{"valid missing path", "docs/missing.md", true},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/**"
docs:
- `+tc.docPath+`
`)
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if tc.wantStale {
if code != 1 {
t.Errorf("path %q: expected exit 1 (stale/invalid), got %d; stderr: %q", tc.docPath, code, stderr)
}
} else {
if code != 0 {
t.Errorf("path %q: expected exit 0 (valid), got %d; stderr: %q", tc.docPath, code, stderr)
}
}
})
}
}
// TestCheckStaleDocs_SymlinkOutside verifies that a symlink under repoRoot
// pointing outside the repo is treated as stale (not followed).
func TestCheckStaleDocs_SymlinkOutside(t *testing.T) {
dir := t.TempDir()
// Create a symlink inside repoRoot pointing to a file outside the repo.
// We point at /etc/hostname (exists on Linux CI) but the test does not
// depend on that file existing — Lstat must reject the symlink itself.
linkPath := filepath.Join(dir, "docs", "secret.md")
if err := os.MkdirAll(filepath.Dir(linkPath), 0o755); err != nil {
t.Fatalf("MkdirAll: %v", err)
}
if err := os.Symlink("/etc/hostname", linkPath); err != nil {
t.Fatalf("Symlink: %v", err)
}
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/**"
docs:
- docs/secret.md
`)
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 1 {
t.Errorf("expected exit 1 for symlink doc, got %d; stderr: %q", code, stderr)
}
if !strings.Contains(stderr, "docs/secret.md") {
t.Errorf("expected stale path in stderr, got %q", stderr)
}
}
// TestCheckStaleDocs_SymlinkInsideRepo verifies that a symlink pointing to
// another file *within* the repo is also treated as stale. We refuse all
// symlinks regardless of target to keep the check simple and safe.
func TestCheckStaleDocs_SymlinkInsideRepo(t *testing.T) {
dir := t.TempDir()
// Real doc file.
makeDocFile(t, dir, "docs/real.md")
// Symlink inside repo pointing at the real file.
linkPath := filepath.Join(dir, "docs", "link.md")
if err := os.Symlink(filepath.Join(dir, "docs", "real.md"), linkPath); err != nil {
t.Fatalf("Symlink: %v", err)
}
docmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/**"
docs:
- docs/link.md
`)
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", docmap, "--repo-root", dir},
)
if code != 1 {
t.Errorf("expected exit 1 for symlink doc (even intra-repo), got %d; stderr: %q", code, stderr)
}
}
// TestRunValidateDocmap_SymlinkRepoRoot verifies that a --repo-root that is
// itself a symlink to a valid directory resolves correctly.
func TestRunValidateDocmap_SymlinkRepoRoot(t *testing.T) {
realDir := t.TempDir()
makeDocFile(t, realDir, "docs/foo.md")
// Create a symlink pointing at realDir.
symlinkDir := filepath.Join(t.TempDir(), "link-root")
if err := os.Symlink(realDir, symlinkDir); err != nil {
t.Fatalf("Symlink: %v", err)
}
// Place the docmap inside realDir so it passes the confinement check.
// (symlinkDir resolves to realDir, so files inside realDir are also inside
// the resolved repo-root.)
docmap := makeDocmapInDir(t, realDir, `
mappings:
- paths:
- "lib/**"
docs:
- docs/foo.md
`)
// Using the symlinked repo-root: the real doc exists → should be clean.
code, stdout, stderr := stdinValidateDocmap(t,
"lib/foo.go\n",
[]string{"--docmap", docmap, "--repo-root", symlinkDir},
)
if code != 0 {
t.Errorf("expected exit 0 for symlinked repo-root with existing doc, got %d; stderr: %q", code, stderr)
}
if !strings.Contains(stdout, "OK") {
t.Errorf("expected 'OK' in stdout, got %q", stdout)
}
}
// TestValidateDocmapPath_Symlink verifies that --docmap pointing at a symlink
// is rejected before the file is read (prevents /dev/zero DOS or arbitrary
// host-file reads via PR-controlled symlinks).
func TestValidateDocmapPath_Symlink(t *testing.T) {
dir := t.TempDir()
// Create a real docmap file to serve as the symlink target.
realDocmap := makeDocmapInDir(t, dir, `
mappings:
- paths:
- "lib/**"
docs:
- docs/foo.md
`)
// Create a symlink inside dir pointing to the real docmap.
symlinkPath := filepath.Join(dir, ".review-bot", "doc-map-link.yml")
if err := os.Symlink(realDocmap, symlinkPath); err != nil {
t.Fatalf("Symlink: %v", err)
}
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", symlinkPath, "--repo-root", dir},
)
if code != 2 {
t.Errorf("expected exit 2 for symlink docmap, got %d; stderr: %q", code, stderr)
}
if !strings.Contains(stderr, "symlink") && !strings.Contains(stderr, "invalid") {
t.Errorf("expected symlink rejection in stderr, got %q", stderr)
}
}
// TestValidateDocmapPath_OutsideRepoRoot verifies that --docmap pointing
// outside --repo-root is rejected (prevents reading arbitrary host files).
func TestValidateDocmapPath_OutsideRepoRoot(t *testing.T) {
repoDir := t.TempDir()
// Create a docmap in a separate temp dir (outside the repo root).
outside := makeDocmapYAML(t, `
mappings:
- paths:
- "lib/**"
docs:
- docs/foo.md
`)
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", outside, "--repo-root", repoDir},
)
if code != 2 {
t.Errorf("expected exit 2 for docmap outside repo-root, got %d; stderr: %q", code, stderr)
}
if !strings.Contains(stderr, "invalid") && !strings.Contains(stderr, "repo-root") {
t.Errorf("expected confinement rejection in stderr, got %q", stderr)
}
}
// TestValidateDocmapPath_SizeLimit verifies that --docmap files exceeding
// maxDocmapBytes are rejected before reading (prevents memory exhaustion).
func TestValidateDocmapPath_SizeLimit(t *testing.T) {
dir := t.TempDir()
// Write a file larger than maxDocmapBytes.
bigPath := filepath.Join(dir, ".review-bot", "big-doc-map.yml")
if err := os.MkdirAll(filepath.Dir(bigPath), 0o755); err != nil {
t.Fatalf("MkdirAll: %v", err)
}
// Exceed the limit by one byte.
bigContent := make([]byte, maxDocmapBytes+1)
if err := os.WriteFile(bigPath, bigContent, 0o644); err != nil {
t.Fatalf("WriteFile: %v", err)
}
code, _, stderr := stdinValidateDocmap(t,
"",
[]string{"--docmap", bigPath, "--repo-root", dir},
)
if code != 2 {
t.Errorf("expected exit 2 for oversized docmap, got %d; stderr: %q", code, stderr)
}
if !strings.Contains(stderr, "limit") && !strings.Contains(stderr, "size") && !strings.Contains(stderr, "invalid") {
t.Errorf("expected size limit error in stderr, got %q", stderr)
}
}
+22 -6
View File
@@ -66,6 +66,18 @@ func ParseDocMapConfig(localPath string) (*DocMapConfig, error) {
return &cfg, nil
Review

[MINOR] FileCoveredByDocMap assumes the caller passes forward-slash paths. Since it's exported, consider normalizing backslashes to forward slashes internally (as done in the CLI) or documenting this requirement in the function comment.

**[MINOR]** FileCoveredByDocMap assumes the caller passes forward-slash paths. Since it's exported, consider normalizing backslashes to forward slashes internally (as done in the CLI) or documenting this requirement in the function comment.
}
Review

[NIT] FileCoveredByDocMap doc comment says 'It is used by static validation tooling (e.g. the validate-docmap subcommand)'. Per the documentation pattern, the doc comment should start with the function name: // FileCoveredByDocMap reports whether... — the current phrasing already does start correctly but the second sentence adds implementation context that per the patterns belongs in a separate paragraph or omitted from the primary sentence. Very minor.

**[NIT]** `FileCoveredByDocMap` doc comment says 'It is used by static validation tooling (e.g. the validate-docmap subcommand)'. Per the documentation pattern, the doc comment should start with the function name: `// FileCoveredByDocMap reports whether...` — the current phrasing already does start correctly but the second sentence adds implementation context that per the patterns belongs in a separate paragraph or omitted from the primary sentence. Very minor.
// FileCoveredByDocMap reports whether at least one paths: glob in any mapping
// of cfg matches the given file path. It is used by static validation tooling
// (e.g. the validate-docmap subcommand) to check per-file docmap coverage.
func FileCoveredByDocMap(cfg *DocMapConfig, file string) bool {
for _, mapping := range cfg.Mappings {
if mappingMatches(mapping.Paths, []string{file}) {
return true
}
}
return false
}
// MatchDocs returns deduplicated doc paths for the given changed file paths.
// A mapping matches if any of its path globs matches any of the changed files.
func MatchDocs(cfg *DocMapConfig, changedFiles []string) []string {
@@ -245,7 +257,7 @@ type docEntry struct {
// If the path is a directory, all .md files under it are returned.
// If it's a file, a single entry is returned.
func loadDocEntries(ctx context.Context, fetcher DocFetcher, owner, repo, docPath string) ([]docEntry, error) {
if err := validateDocPath(docPath); err != nil {
if err := ValidateDocPath(docPath); err != nil {
return nil, fmt.Errorf("doc path %q rejected: %w", docPath, err)
}
@@ -298,11 +310,15 @@ func readFileBytes(path string) ([]byte, error) {
return os.ReadFile(path)
Review

[NIT] The comment on ValidateDocPath says 'Backslashes are rejected explicitly to prevent Windows platform edge cases.' — the sentence could be clearer that the tool itself may run on Windows and that backslashes in YAML doc paths could be misinterpreted by filepath.Join on that platform. Minor documentation clarity.

**[NIT]** The comment on ValidateDocPath says 'Backslashes are rejected explicitly to prevent Windows platform edge cases.' — the sentence could be clearer that the tool itself may run on Windows and that backslashes in YAML doc paths could be misinterpreted by filepath.Join on that platform. Minor documentation clarity.
}
// validateDocPath rejects doc paths that could cause path traversal via the
// VCS API (absolute paths, any ".." segment). Defense-in-depth: the VCS API
// should already scope paths to the repo, but we validate locally to avoid
// any quirk in backend path handling.
func validateDocPath(p string) error {
// ValidateDocPath rejects doc paths that could cause path traversal
// (absolute paths, any ".." segment, backslashes). Defense-in-depth: callers
// must also confine the joined path to the repo root via filepath.Rel before
Review

[NIT] The comment on ValidateDocPath mentions "Finding #3" which appears to be a reference to an internal issue/finding number that won't be meaningful to future readers of this file. This should be reworded to be self-contained (e.g., remove the parenthetical or rephrase as 'to prevent OS-specific path separator normalization issues').

**[NIT]** The comment on `ValidateDocPath` mentions "Finding #3" which appears to be a reference to an internal issue/finding number that won't be meaningful to future readers of this file. This should be reworded to be self-contained (e.g., remove the parenthetical or rephrase as 'to prevent OS-specific path separator normalization issues').
// any filesystem access. Backslashes are rejected explicitly to prevent
// Windows platform edge cases.
func ValidateDocPath(p string) error {
if strings.Contains(p, "\\") {
Review

[NIT] The comment on ValidateDocPath says "Defense-in-depth: the VCS API should already scope paths to the repo" — this is accurate for the VCS-fetch path, but ValidateDocPath is now also used by the local-filesystem stale-docs check where the VCS API is not involved. The comment is still broadly correct but could be generalized: "Defense-in-depth: callers should also confine the joined path to the repo root via filepath.Rel before filesystem access."

**[NIT]** The comment on `ValidateDocPath` says "Defense-in-depth: the VCS API should already scope paths to the repo" — this is accurate for the VCS-fetch path, but `ValidateDocPath` is now also used by the local-filesystem stale-docs check where the VCS API is not involved. The comment is still broadly correct but could be generalized: "Defense-in-depth: callers should also confine the joined path to the repo root via filepath.Rel before filesystem access."
return fmt.Errorf("backslashes not allowed in doc paths")
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[NIT] ValidateDocPath splits on '/' only. While later checks use filepath.FromSlash and Rel, consider normalizing or explicitly rejecting backslashes in paths to avoid platform-specific edge cases on Windows.

**[NIT]** ValidateDocPath splits on '/' only. While later checks use filepath.FromSlash and Rel, consider normalizing or explicitly rejecting backslashes in paths to avoid platform-specific edge cases on Windows.
}
Review

[NIT] The comment on ValidateDocPath says 'Defense-in-depth: callers must also confine the joined path to the repo root via filepath.Rel before any filesystem access.' This is good documentation but the function's contract doesn't verify the caller satisfies that postcondition. This is fine (it's defense-in-depth by definition), just making sure the comment is understood as advisory rather than enforced.

**[NIT]** The comment on `ValidateDocPath` says 'Defense-in-depth: callers must also confine the joined path to the repo root via filepath.Rel before any filesystem access.' This is good documentation but the function's contract doesn't verify the caller satisfies that postcondition. This is fine (it's defense-in-depth by definition), just making sure the comment is understood as advisory rather than enforced.
if filepath.IsAbs(p) {
return fmt.Errorf("absolute paths not allowed")
}
+76 -2
View File
@@ -384,7 +384,7 @@ func TestValidateDocPath(t *testing.T) {
"a/b/c",
}
for _, p := range valid {
if err := validateDocPath(p); err != nil {
if err := ValidateDocPath(p); err != nil {
t.Errorf("expected valid path %q to pass, got error: %v", p, err)
}
}
@@ -395,9 +395,13 @@ func TestValidateDocPath(t *testing.T) {
"docs/../../../etc/passwd",
"../sibling-repo/file.md",
"a/b/../c",
// Backslashes must be rejected (Finding #3 — Windows platform edge cases).
`docs\foo.md`,
`docs\..\secret`,
`\absolute`,
}
for _, p := range invalid {
if err := validateDocPath(p); err == nil {
if err := ValidateDocPath(p); err == nil {
t.Errorf("expected path %q to be rejected, but it was accepted", p)
}
}
@@ -420,6 +424,27 @@ func TestLoadMatchingDocs_PathTraversalRejected(t *testing.T) {
}
}
// TestValidateDocPath_Backslash verifies that backslash-bearing paths are
// rejected to prevent Windows platform edge cases where a path separator
// could be normalised differently by the host OS or VCS backend.
func TestValidateDocPath_Backslash(t *testing.T) {
backslashPaths := []string{
`docs\foo.md`,
`docs\subdir\file.md`,
`\absolute`,
}
for _, p := range backslashPaths {
if err := ValidateDocPath(p); err == nil {
t.Errorf("expected backslash path %q to be rejected, but it was accepted", p)
}
}
// Sanity: forward-slash path must still be accepted.
if err := ValidateDocPath("docs/foo.md"); err != nil {
t.Errorf("expected forward-slash path to be accepted, got: %v", err)
}
}
// ============================================================
// Helpers
// ============================================================
@@ -436,3 +461,52 @@ func writeTempYAML(t *testing.T, content string) string {
}
return filepath.Clean(f.Name())
}
// ============================================================
// FileCoveredByDocMap
// ============================================================
func TestFileCoveredByDocMap(t *testing.T) {
cfg := &DocMapConfig{
Mappings: []DocMapping{
{
Paths: []string{"lib/foo/**", "lib/bar/*.go"},
Docs: []string{"docs/foo.md"},
},
{
Paths: []string{"cmd/**"},
Docs: []string{"docs/cmd.md"},
},
},
}
cases := []struct {
file string
covered bool
}{
{"lib/foo/baz.ex", true},
{"lib/foo/sub/deep.ex", true},
{"lib/bar/util.go", true},
{"lib/bar/sub/util.go", false}, // *.go only matches one level
{"cmd/main.go", true},
{"cmd/sub/main.go", true},
{"internal/secret.go", false},
{"", false},
}
for _, tc := range cases {
t.Run(tc.file, func(t *testing.T) {
got := FileCoveredByDocMap(cfg, tc.file)
if got != tc.covered {
t.Errorf("FileCoveredByDocMap(%q) = %v, want %v", tc.file, got, tc.covered)
}
})
}
}
func TestFileCoveredByDocMap_EmptyConfig(t *testing.T) {
cfg := &DocMapConfig{}
if FileCoveredByDocMap(cfg, "lib/foo/bar.go") {
t.Error("expected false for empty config, got true")
}
}