After getting the resolved path from validateDocmapPath, Lstat the path
immediately before os.Open, then compare with f.Stat() after open using
os.SameFile. If the file was swapped between validation and open (e.g.,
replaced with a symlink pointing outside the repo), the inode comparison
catches it and returns an error.
Also changes defer f.Close() // nolint:errcheck to
defer func() { _ = f.Close() }() to follow the project convention of
explicit ignores over suppressor comments.
Addresses security bot finding (review 4812) against d6bab7a9.
Finding 5 [NIT] from self-review:
TestValidateDocmapPath_InRepoSymlinkAllowed verifies that a file-level
symlink inside the repo root whose resolved target is also within the
root is accepted by validateDocmapPath. This is the positive case for
the issue #150 behavioral change (commit 4dce8e4): only symlinks whose
resolved destination escapes the root are rejected. Intra-repo symlinks
are permitted and their resolved path is returned to the caller.
The test also asserts that the returned path is the resolved real file,
not the symlink entry itself (i.e., EvalSymlinks did its job).
Finding 4 [MINOR] from self-review:
Previously, validateDocmapPath validated *docmapFlag then returned error
only, leaving the caller to re-open the original (unresolved) path via
ParseDocMapConfig. In theory, the path could change between validation
and use (check-then-use race).
Change validateDocmapPath to return (string, error): on success it
returns the filepath.EvalSymlinks-resolved absolute path. The caller
now passes resolvedDocmap to ParseDocMapConfig instead of the original
*docmapFlag string, eliminating any check-then-use window.
Also update the test for TestValidateDocmapPath_DirSymlinkBypass to use
the new two-value return: _ for the resolved path, err for the error.
Low-risk in ephemeral CI but correct by construction.
Findings 1-3 from self-review (4dce8e4):
Finding 1 [NIT]: remove dead ModeSymlink check and its misleading
'defense-in-depth' comment. After filepath.EvalSymlinks, resolvedPath
is guaranteed symlink-free; fi.Mode()&os.ModeSymlink can never be set.
Dropped the unreachable branch; updated Lstat comment to say so.
Finding 2 [MINOR]: update validateDocmapPath godoc — invariant #2 now
reads 'The resolved path is within resolvedRoot' instead of 'The path
is not a symlink'. In-repo file-level symlinks whose resolved target
stays within the root are allowed; the confinement check enforces the
actual invariant.
Finding 3 [MINOR]: update inline comment in runValidateDocmap — the
bulleted list item now says 'Resolved target stays within the root
(in-repo symlinks allowed...)' instead of 'Is not a symlink'.
"ERROR: stale docmap docs: entries" had a vestigial "docs:" fragment
that reads awkwardly (looks like a YAML reference).
Change to: "ERROR: stale docmap entries (paths do not exist):"
Addresses NIT finding in review #4175.
Non-git tools (e.g. `find`, `ls`) can emit paths with a "./" prefix.
Without stripping this, "./cmd/foo.go" would not match the glob "cmd/**",
producing a false-positive uncovered-file failure.
Fix: add strings.TrimPrefix(f, "./") after backslash normalization.
Test: TestRunValidateDocmap_DotSlashPrefix
Addresses MINOR finding in review #4175.
Add IsRegular() check after Lstat so directories, FIFOs, and device nodes
produce a clear error ("docmap must be a regular file") instead of a
confusing downstream parse error.
Test: TestValidateDocmapPath_NonRegularFile
Addresses MINOR finding in review #4175.
When --doc-map-trusted-ref is provided, the --doc-map value is used as a
VCS API path parameter, not a local filesystem path. The early call to
validateWorkspacePath (which requires the file to exist locally) blocked
the trusted-ref code path when the doc-map did not exist in the local
checkout — defeating the feature's purpose in sparse checkouts or when
the file is only on the default branch.
Fix: guard the early validation with `&& *docMapTrustedRef == ""`.
Also fixes:
- review/docmap.go: correct ParseDocMapConfigContent godoc example to
match actual source format "owner/repo@ref:path"
- cmd/review-bot/main_test.go: add TestMainSubprocess_DocMapTrustedRefSkipsLocalValidation
to prevent regression
The doc-map YAML config was previously read from the local workspace
(the PR branch checkout). A malicious PR author could modify
.review-bot/doc-map.yml to map any path glob to sensitive design docs,
causing review-bot to fetch and inject those docs into the LLM prompt.
Fix: add --doc-map-trusted-ref (DOC_MAP_TRUSTED_REF) flag. When set to
a trusted ref (e.g. 'main'), the doc-map config is fetched from the VCS
API at that ref instead of from local workspace. A 404 from VCS is a
hard error (no silent fallback to local copy).
When unset, the local workspace is used with a security warning in the
logs pointing operators to the new flag.
Changes:
- review/docmap.go: add ParseDocMapConfigContent + parseDocMapBytes
helper to parse from in-memory content (fetched via VCS API)
- cmd/review-bot/main.go: add --doc-map-trusted-ref flag; Step 6c
branches on trusted-ref to fetch vs local-workspace load
- .gitea/actions/review/action.yml: add doc-map-trusted-ref input
- README.md: document new input
- CHANGELOG.md: security and feature entries
Tests:
- TestParseDocMapConfigContent_Valid/Empty/InvalidYAML/UnknownKeys
in review/docmap_test.go
Coverage: 53.0% cmd/review-bot
validateurl.go is VCS-generic but imported gitea.IsBlockedIP, creating an
unexpected generic→Gitea-specific dependency. Extract IsBlockedIP and its
CIDR list to internal/netutil/ipcheck.go (a neutral shared package).
- gitea/ipcheck.go becomes a thin forwarding wrapper (preserves API compat
for callers within the gitea package)
- gitea/ipcheck_test.go replaced with a forwarding smoke test; full coverage
moves to internal/netutil/ipcheck_test.go
- validateurl.go now imports internal/netutil directly
The field was named NewPosition with a misleading comment 'Gitea: absolute
line; GitHub: diff hunk position'. In reality both adapters use it as an
absolute new-file line number (Gitea maps it to new_position, GitHub maps it
to Line+Side:RIGHT). Rename to NewLine to match actual semantics and update
comments to explain per-adapter mapping.
The previous implementation called os.Lstat(absPath) which only avoids
following the *final* path component. A PR committing .review-bot/ as a
directory symlink pointing outside the repo would pass the filepath.Rel
confinement check because the textual path was inside the root while
the resolved destination was not.
Fix: call filepath.EvalSymlinks after filepath.Abs to resolve ALL symlink
components before the confinement check. If EvalSymlinks fails (dangling
symlink, nonexistent target) the path is rejected. The filepath.Rel check
then operates on the fully-resolved path.
Semantic change: file-level in-repo symlinks (target also within root) are
now allowed — the invariant is about where the content lives, not whether
the entry is a symlink. The test TestValidateDocmapPath_Symlink is updated
to test an out-of-repo symlink target, which must still be rejected.
Tests:
- TestValidateDocmapPath_DirSymlinkBypass: reproduces the attack vector
(dir symlink bypassing textual confinement check) and verifies it is
now rejected
- TestValidateDocmapPath_Symlink: updated to test out-of-repo symlink
Coverage: 54.0%
Address security-review-bot REQUEST_CHANGES findings on PR #142:
MAJOR (Finding #1): Docmap file path was read directly without validating it
is within --repo-root or checking for symlinks. A malicious PR could create
.review-bot/doc-map.yml as a symlink to /dev/zero (resource exhaustion) or an
arbitrary host file (information disclosure).
Fix: Add validateDocmapPath() called before ParseDocMapConfig(). It:
- Resolves --repo-root first (filepath.Abs + EvalSymlinks), moved up before
docmap parsing so both checks share the same resolved root
- Uses os.Lstat to detect symlinks and rejects them outright
- Confirms the docmap path is within resolvedRoot via filepath.Rel
- Checks file size against maxDocmapBytes (10 MB) before reading
MINOR (Finding #2): No upper bound on docmap YAML size.
Fix: os.Lstat size check enforces maxDocmapBytes cap before os.ReadFile.
Tests:
- TestValidateDocmapPath_Symlink: docmap is a symlink → exit 2
- TestValidateDocmapPath_OutsideRepoRoot: docmap outside repo-root → exit 2
- TestValidateDocmapPath_SizeLimit: docmap exceeds 10 MB cap → exit 2
- Updated all existing tests to use makeDocmapInDir() so the docmap
lives inside the repo-root, satisfying the new confinement check
Finding #1 [MAJOR]: replace os.Stat with os.Lstat in checkStaleDocs to
prevent symlink traversal. Symlinks under repoRoot could probe arbitrary
host file existence; Lstat never follows them. Symlinked docs are now
treated as stale.
Finding #2 [MINOR]: resolve --repo-root with filepath.Abs +
filepath.EvalSymlinks before passing to checkStaleDocs, so a symlinked
repo-root cannot bypass the filepath.Rel escape guard.
Finding #3 [NIT]: reject backslashes in ValidateDocPath to prevent
Windows platform edge cases where a path separator may be normalised
differently by the host OS or VCS backend.
Tests added:
- TestCheckStaleDocs_SymlinkOutside: symlink inside repo → outside
- TestCheckStaleDocs_SymlinkInsideRepo: intra-repo symlink also rejected
- TestRunValidateDocmap_SymlinkRepoRoot: symlinked --repo-root resolves OK
- TestValidateDocPath_Backslash: backslash paths rejected
- Backslash cases added to TestValidateDocPath invalid slice
All go test ./... pass, go vet ./... clean.
Export review.ValidateDocPath and use it in checkStaleDocs before
calling os.Stat. Add filepath.Clean + filepath.Rel confinement check
as defense-in-depth to ensure doc paths from PR-controlled YAML
cannot probe filesystem locations outside repoRoot.
Also add tests covering: ../../etc/passwd, /etc/passwd, ../outside,
a valid present path, and a valid missing path.
Addresses security finding from security-review-bot on PR #142.
TestRunValidateDocmap_Clean was reading real os.Stdin (fragile in CI).
Switch to stdinValidateDocmap with a covered file and empty-stdin test
already covered by TestRunValidateDocmap_EmptyStdin.
Adds 'review-bot validate-docmap' for CI hard-fail on docmap coverage gaps.
Usage:
git diff --name-only origin/main HEAD | \
review-bot validate-docmap --docmap .review-bot/doc-map.yml --repo-root .
Flags:
--docmap (required) path to doc-map YAML file
--repo-root (optional, default '.') root for resolving docs: paths
Two checks, both always run:
1. Coverage: every stdin file must match at least one paths: glob.
2. Stale docs: every docs: entry must exist on disk under --repo-root.
Exit codes: 0=clean, 1=failures found, 2=usage/parse error.
Tests cover: clean pass, uncovered file, stale doc, both failures,
empty stdin, blank-line stdin, and duplicate docs: deduplication.
- New --doc-map flag (DOC_MAP_FILE env var): path to YAML config mapping
source path globs to governing design docs
- New --doc-map-max-bytes flag (DOC_MAP_MAX_BYTES env var): cap on total
injected doc content, default 100KB
- review/docmap.go: DocMapConfig parsing, glob matching with ** support,
doc loading via VCS with directory expansion and size guard
- budget.Sections: new DesignDocs field, trimmed after conventions
- budget.buildResult: injects DesignDocs under ## Design Documents heading
- action.yml: doc-map and doc-map-max-bytes inputs wired to env vars
- CHANGELOG.md: created with unreleased entry
- Tests: ParseDocMapConfig, MatchDocs, globMatch, LoadMatchingDocs
The test constructs github.Client directly (matching the Gitea integration
test pattern), so setting VCS_TYPE does not affect the code under test.
Remove the setenv call to avoid implying routing is being exercised.
- Strip VCS_TYPE and VCS_URL in cleanEnv() to prevent env leakage in
subprocess tests when VCS_TYPE=github is set in the runner environment
(fixes#135)
- Add TestGithubAPIURL table-driven tests covering:
- Empty string defaults to https://api.github.com
- https://github.com maps to https://api.github.com
- Trailing slash variant maps correctly
- GHES host (ghe.example.com) gets /api/v3 suffix
- GHES concur domain does not map to api.github.com
(fixes#134)
- Add TestIntegration_GitHub_PostAndVerifyReview: exercises the GitHub
adapter end-to-end via VCS_TYPE=github. Skips gracefully when
INTEGRATION_GITHUB_TOKEN, INTEGRATION_GITHUB_REPO, and
INTEGRATION_GITHUB_PR are not set. Verifies GetAuthenticatedUser,
GetPullRequest, PostReview, and ListReviews succeed; notes that
DeleteReview on submitted GitHub reviews is expected to fail (422).
(fixes#133)
MAJOR fixes:
- gitea/ipcheck.go: replace startup panic with init()+error list pattern
Hard-coded CIDRs that fail to parse now recorded in blockedCIDRParseErrors
instead of panicking. TestBlockedCIDRsValid catches programming errors
in CI without violating CONVENTIONS.md 'never panic' rule.
- .gitea/actions/review/action.yml: re-validate SERVER_URL at start of
'Install review-bot' step to close DNS rebinding window between
'Determine version' and install-step curl calls.
MINOR fixes:
- gitea/client.go: add Timeout: 10*time.Second to net.Dialer per PLAN.md spec
- cmd/review-bot/validateurl.go: switch isValidateError to errors.As so
wrapped *validateError values are also detected
- gitea/ipcheck_test.go: clarify 198.51.100.1 (RFC5737 TEST-NET-2) comment;
add TestBlockedCIDRsValid to surface CIDR parse errors as test failures
NIT fixes:
- .gitea/actions/review/action.yml: refactor Python list comprehension in
SSRF check to for-loop (avoids side-effect-only comprehension, runner compat)
- gitea/export_test.go: expand comment explaining white-box test pattern
(why package gitea not gitea_test, Go stdlib precedent)
Remove PLAN.md (implementation complete)
## Changes
### Go: IP-level SSRF protection in gitea.Client (primary defense)
- Add gitea/ipcheck.go with IsBlockedIP() covering all blocked CIDR ranges:
loopback (127.0.0.0/8, ::1), RFC1918 (10/8, 172.16/12, 192.168/16),
link-local (169.254/16, fe80::/10), ULA (fc00::/7), CGN (100.64/10),
multicast, reserved, and unspecified ranges.
- IPv6-mapped IPv4 addresses (::ffff:x.x.x.x) are normalized before checking.
- Add safeDialContext to gitea.Client: resolves DNS, rejects any IP in a
blocked CIDR, then dials the resolved IP directly to narrow the DNS rebinding
window. NewClient now uses this safe transport by default.
- Add WithUnsafeDialer() for test code using httptest.Server (127.0.0.1).
- Update NewTestClient helper in export_test.go for all gitea unit tests.
- Update SetHTTPClient(nil) to restore the safe transport (not the plain one).
### Go: validate-url subcommand (defense-in-depth for future bash callers)
- Add 'review-bot validate-url <url>' subcommand: validates https scheme,
no user-info, resolves hostname, rejects any blocked IP.
- Exit 0=safe, 1=blocked, 2=validation error/dns failure.
- Add outWriter/errWriter vars to main.go for testable output capture.
### action.yml: Python3 IP check in 'Determine version' step
- After the https scheme validation, resolve SERVER_URL hostname with
socket.getaddrinfo and reject any result where
ipaddress.ip_address(ip).is_private/is_loopback/is_link_local/etc. is true.
- python3 is required on ubuntu-* runners (noted in existing comments).
- Covers the version-check curl that sends ACTION_TOKEN to SERVER_URL.
- SERVER_URL for install-step curls is covered by the same pre-check.
### Tests
- gitea/ipcheck_test.go: 30+ cases covering all blocked families + public IPs
- gitea/client_test.go: safe transport presence, WithUnsafeDialer, SSRF blocking
- cmd/review-bot/validateurl_test.go: scheme validation, user-info, exit codes
Closes#123
- Add --vcs-url flag as primary (reads VCS_URL env var)
- Keep --gitea-url and GITEA_URL as deprecated fallbacks with warnings
- Update action.yml: rename gitea-url input to vcs-url, pass VCS_URL to binary
- Update ci.yml: use VCS_URL env var in Run review step
- Update integration tests: INTEGRATION_GITEA_URL -> INTEGRATION_VCS_URL
- Update README: --vcs-url / VCS_URL with fallback note in env var table
Backward compat: existing GITEA_URL users get a deprecation warning and
continue to work unchanged until they migrate to VCS_URL.
Add commitID parameter to gitea.Client.PostReview so the review is
anchored to the specific commit that was evaluated. The caller
(cmd/review-bot) already computes evaluatedSHA from pr.Head.Sha;
this wires it through to the Gitea API payload.
When commitID is empty, omitempty drops it from the JSON and Gitea
defaults to the current PR head (backward-compatible).
Closes#107
When patterns-repo is configured, now logs at Info level:
- File paths loaded from each repo
- Count of files per repo
At Debug level logs skipped files (non-markdown/txt/yaml).
Warns if no pattern files were loaded from a repo (likely
misconfigured patterns-files path).
Closes#64
MAJOR:
- LoadRepoPersonas: add MaxPersonaFileSize check before parsing to
prevent resource exhaustion from oversized YAML files committed
to target repositories
MINOR:
- isNotFoundError: tighten substring match to 'HTTP 404' only to
avoid masking auth/transport errors containing generic 'not found'
- main.go: remove duplicate flag.Parse() call
- main.go: add comment explaining nil map indexing is safe in Go
when LoadRepoPersonas returns an error
Tests updated to reflect the intentional behavior change in
isNotFoundError and added test case for oversized file rejection.
Implements #60.
- Add ParsePersonaBytes() for parsing personas from byte data
- Add LoadRepoPersonas() to fetch personas from repo via Gitea API
- Add MergePersonas() to combine built-in and repo personas
- Add GetBuiltinPersonasMap() helper
- Update main.go to load repo personas first, fall back to built-in
- Add giteaClientAdapter to bridge gitea.Client to review.GiteaClient
When --persona is specified, the bot now:
1. Attempts to fetch personas from .review-bot/personas/*.yaml
2. If the named persona exists in the repo, uses it
3. Otherwise falls back to built-in personas
This allows repos to define domain-specific personas (e.g., trading
experts for gargoyle, crypto experts for kms-lite) without modifying
the review-bot codebase.
Add native SAP AI Core provider that handles OAuth token management and
deployment discovery automatically. This eliminates the need for the
external LLM proxy when running in SAP environments.
Changes:
- Add AICoreClient with OAuth token caching and deployment URL discovery
- Support both Anthropic and OpenAI models via AI Core deployments
- Update CI to use native AI Core provider
- Update action inputs to accept AI Core credentials
- Update README with AI Core configuration examples
Model names must match AI Core deployment names (e.g. anthropic--claude-4.6-sonnet, gpt-5).
MAJOR fixes:
- Remove external YAML dependency (github.com/goccy/go-yaml)
Per project convention: Go standard library only, zero dependencies.
Convert all persona files from YAML to JSON format.
- Fix TestValidateWorkspacePath error expectation
Go 1.21+ filepath.Join normalizes absolute paths differently.
MINOR fixes:
- Remove custom contains helper in persona_test.go (use strings.Contains)
- Add Unicode-safe CapitalizeFirst function for header titles
- ListBuiltinPersonas returns empty slice instead of nil on error
- Fix test comment about filepath.Join behavior
Documentation:
- Update README to reflect JSON-only persona format
- Update design doc with note about JSON decision
- Fix action.yml description for persona-file input
Add persona system for specialized review roles. Each persona defines:
- A specific review focus (security, architecture, documentation)
- Custom system prompt additions
- Personality/tone adjustments
Built-in personas: security, architect, docs
Custom personas: load from JSON via persona-file flag
Includes workspace validation to prevent path traversal attacks.
Closes#51
When a new push arrives while review-bot is processing, the review
would be posted against a stale commit. This causes noise in the
PR timeline with findings that reference code that no longer exists.
Before posting, re-fetch PR metadata and compare HEAD SHA with the
commit we evaluated against. If they differ, log a warning and exit
successfully — a new workflow run should already be processing the
new HEAD.
Fixes#52
Addresses intermittent 'unexpected end of JSON input' failures where the
LLM response body is truncated in transit between the proxy and client.
Root cause: network-level truncation where io.ReadAll returns partial data
(observed in 3/50 CI runs through HAI proxy). The response body reading
was already using io.ReadAll correctly, but transient network issues
between the proxy and client can still cause partial reads.
Changes:
- Add Content-Length validation in doRequest: detect when fewer bytes
arrive than the server declared, triggering a retry
- Add retry logic in Complete: retries once on retryable errors (body
read failures, content-length mismatches) with a 500ms backoff
- Add parse-level retry in main: if ParseResponse fails, re-requests
from the LLM once before giving up (defensive, since retries always
succeed per issue evidence)
- Improve ParseResponse error diagnostics: log raw vs cleaned lengths
and a preview of the cleaned content to aid future debugging
Does NOT retry on API errors (4xx/5xx) or structural issues — only
transient body read problems.
Closes#47
Previously findOwnReview returned only the single most-recent matching
review, so on PRs with multiple force-pushes only the latest old review
got superseded. The rest accumulated as unsuperseded stale reviews.
Changes:
- Add findAllOwnReviews() to collect all non-superseded matching reviews
- Loop over all old reviews in the supersede phase
- Add GetTimelineReviewCommentIDForReview() to find comment IDs by
review ID (fetches review body, matches in timeline by prefix)
- Each old review gets independently superseded and its inline comments
resolved
The old findOwnReview is kept for backward compat (tested, may be
useful as a utility).
Closes#27
After superseding an old review, resolves all its inline comments via
POST /pulls/comments/{id}/resolve. This clears unresolved conversation
markers from the PR timeline and diff view.
New API methods:
- ListReviewComments: paginated GET /repos/.../pulls/{n}/reviews/{id}/comments
- ResolveComment: POST /repos/.../pulls/comments/{id}/resolve
Behavior:
- Only resolves after successful supersede (gated on supersedeOK)
- Aggregates failures and logs at warn level
- Truncates error bodies to 256 bytes (security)
- Non-fatal: review still posts even if resolution fails