Compare commits

...

5 Commits

Author SHA1 Message Date
Rodin eff5b83852 feat(#143): fetch doc-map config from trusted VCS ref
The doc-map YAML config was previously read from the local workspace
(the PR branch checkout). A malicious PR author could modify
.review-bot/doc-map.yml to map any path glob to sensitive design docs,
causing review-bot to fetch and inject those docs into the LLM prompt.

Fix: add --doc-map-trusted-ref (DOC_MAP_TRUSTED_REF) flag. When set to
a trusted ref (e.g. 'main'), the doc-map config is fetched from the VCS
API at that ref instead of from local workspace. A 404 from VCS is a
hard error (no silent fallback to local copy).

When unset, the local workspace is used with a security warning in the
logs pointing operators to the new flag.

Changes:
- review/docmap.go: add ParseDocMapConfigContent + parseDocMapBytes
  helper to parse from in-memory content (fetched via VCS API)
- cmd/review-bot/main.go: add --doc-map-trusted-ref flag; Step 6c
  branches on trusted-ref to fetch vs local-workspace load
- .gitea/actions/review/action.yml: add doc-map-trusted-ref input
- README.md: document new input
- CHANGELOG.md: security and feature entries

Tests:
- TestParseDocMapConfigContent_Valid/Empty/InvalidYAML/UnknownKeys
  in review/docmap_test.go

Coverage: 53.0% cmd/review-bot
2026-05-15 10:39:43 +00:00
Rodin eb3770e18c chore(fmt): align test comments in gitea/ipcheck_test.go
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
2026-05-15 10:23:11 +00:00
Rodin 77a7f667cb refactor(#154): extract baseSubprocessArgs helper in main_test.go subprocess tests
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
2026-05-15 10:18:34 +00:00
Rodin 76b6493628 fix(#150): add EvalSymlinks to validateDocmapPath — close dir-symlink bypass
CI / test (push) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
2026-05-15 10:18:04 +00:00
Rodin 4dce8e4454 fix(#150): add EvalSymlinks to validateDocmapPath — close dir-symlink bypass
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 39s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m3s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m16s
The previous implementation called os.Lstat(absPath) which only avoids
following the *final* path component. A PR committing .review-bot/ as a
directory symlink pointing outside the repo would pass the filepath.Rel
confinement check because the textual path was inside the root while
the resolved destination was not.

Fix: call filepath.EvalSymlinks after filepath.Abs to resolve ALL symlink
components before the confinement check. If EvalSymlinks fails (dangling
symlink, nonexistent target) the path is rejected. The filepath.Rel check
then operates on the fully-resolved path.

Semantic change: file-level in-repo symlinks (target also within root) are
now allowed — the invariant is about where the content lives, not whether
the entry is a symlink. The test TestValidateDocmapPath_Symlink is updated
to test an out-of-repo symlink target, which must still be rejected.

Tests:
- TestValidateDocmapPath_DirSymlinkBypass: reproduces the attack vector
  (dir symlink bypassing textual confinement check) and verifies it is
  now rejected
- TestValidateDocmapPath_Symlink: updated to test out-of-repo symlink

Coverage: 54.0%
2026-05-15 08:37:31 +00:00
10 changed files with 285 additions and 91 deletions
+11
View File
@@ -141,6 +141,16 @@ inputs:
description: 'Maximum bytes of injected doc content from doc-map (default 102400 = 100KB)' description: 'Maximum bytes of injected doc content from doc-map (default 102400 = 100KB)'
required: false required: false
default: '102400' default: '102400'
doc-map-trusted-ref:
description: >-
Git ref (branch, tag, or SHA) from which to fetch the doc-map config file
via VCS API instead of reading it from the local workspace. Recommended
when using doc-map: set this to the default branch (e.g. 'main') so a
malicious PR cannot modify the doc-map config to inject arbitrary design
docs into the LLM prompt. When unset, the config is read from the local
workspace (the PR branch) with a security warning in the logs.
required: false
default: ''
runs: runs:
using: 'composite' using: 'composite'
@@ -507,6 +517,7 @@ runs:
PERSONA_FILE: ${{ inputs.persona-file }} PERSONA_FILE: ${{ inputs.persona-file }}
DOC_MAP_FILE: ${{ inputs.doc-map }} DOC_MAP_FILE: ${{ inputs.doc-map }}
DOC_MAP_MAX_BYTES: ${{ inputs.doc-map-max-bytes }} DOC_MAP_MAX_BYTES: ${{ inputs.doc-map-max-bytes }}
DOC_MAP_TRUSTED_REF: ${{ inputs.doc-map-trusted-ref }}
AICORE_CLIENT_ID: ${{ inputs.aicore-client-id }} AICORE_CLIENT_ID: ${{ inputs.aicore-client-id }}
AICORE_CLIENT_SECRET: ${{ inputs.aicore-client-secret }} AICORE_CLIENT_SECRET: ${{ inputs.aicore-client-secret }}
AICORE_AUTH_URL: ${{ inputs.aicore-auth-url }} AICORE_AUTH_URL: ${{ inputs.aicore-auth-url }}
+15
View File
@@ -2,8 +2,23 @@
## Unreleased ## Unreleased
### Security
- **`validateDocmapPath`: add `EvalSymlinks` to close directory-symlink bypass** ([#150](https://gitea.weiker.me/rodin/review-bot/issues/150)): The previous implementation used `os.Lstat` which only avoids following the *final* path component. An intermediate directory symlink (e.g. `.review-bot/` committed as a symlink to a directory outside the repo) would pass the path-confinement check because the textual path appeared within the repo root. `filepath.EvalSymlinks` is now called first, resolving all symlink components before the `filepath.Rel` confinement check. In-repo symlinks whose resolved targets also reside within the repo root are now allowed; out-of-repo targets are rejected by the confinement check.
- **`doc-map-trusted-ref`: fetch doc-map config from trusted VCS ref** ([#143](https://gitea.weiker.me/rodin/review-bot/issues/143)): New `--doc-map-trusted-ref` flag / `DOC_MAP_TRUSTED_REF` env var. When set, the doc-map YAML config is fetched from the specified VCS ref (e.g. `main`) via API instead of being read from the local workspace (the PR branch checkout). This prevents a malicious PR from modifying `.review-bot/doc-map.yml` to inject arbitrary design docs into the LLM prompt. When unset, the local workspace is used with a security warning in the logs.
### Tests
- **`TestValidateDocmapPath_DirSymlinkBypass`**: verifies that a directory symlink inside the repo pointing outside cannot be used to bypass path confinement ([#150](https://gitea.weiker.me/rodin/review-bot/issues/150)).
- **`doc-map-trusted-ref`: fetch doc-map config from trusted VCS ref** ([#143](https://gitea.weiker.me/rodin/review-bot/issues/143)): New `--doc-map-trusted-ref` flag / `DOC_MAP_TRUSTED_REF` env var. When set, the doc-map YAML config is fetched from the specified VCS ref (e.g. `main`) via API instead of being read from the local workspace (the PR branch checkout). This prevents a malicious PR from modifying `.review-bot/doc-map.yml` to inject arbitrary design docs into the LLM prompt. When unset, the local workspace is used with a security warning in the logs.
>>>>>>> 3222c76 (feat(#143): fetch doc-map config from trusted VCS ref)
### Added ### Added
- **`doc-map-trusted-ref` input** (`--doc-map-trusted-ref` flag / `DOC_MAP_TRUSTED_REF` env var): Git ref (branch, tag, or SHA) from which to fetch the doc-map config via VCS API. Recommended for all `doc-map` users. Example: `doc-map-trusted-ref: main`. ([#143](https://gitea.weiker.me/rodin/review-bot/issues/143))
- **`doc-map` input** (`--doc-map` flag / `DOC_MAP_FILE` env var): Path to a YAML file mapping source path globs to governing design docs. review-bot intersects the map with changed PR paths and injects matching docs into the system prompt under a `## Design Documents` heading. ([#137](https://gitea.weiker.me/rodin/review-bot/issues/137)) - **`doc-map` input** (`--doc-map` flag / `DOC_MAP_FILE` env var): Path to a YAML file mapping source path globs to governing design docs. review-bot intersects the map with changed PR paths and injects matching docs into the system prompt under a `## Design Documents` heading. ([#137](https://gitea.weiker.me/rodin/review-bot/issues/137))
- **`doc-map-max-bytes` input** (`--doc-map-max-bytes` flag / `DOC_MAP_MAX_BYTES` env var): Cap on total injected design doc content in bytes. Default: 102400 (100 KB). Prevents accidental context overflow when a PR touches many modules. - **`doc-map-max-bytes` input** (`--doc-map-max-bytes` flag / `DOC_MAP_MAX_BYTES` env var): Cap on total injected design doc content in bytes. Default: 102400 (100 KB). Prevents accidental context overflow when a PR touches many modules.
- **`DesignDocs` budget section**: Design docs are included in the context budget and trimmed after conventions, before file context, if the total exceeds the model's context limit. - **`DesignDocs` budget section**: Design docs are included in the context budget and trimmed after conventions, before file context, if the total exceeds the model's context limit.
+1
View File
@@ -210,6 +210,7 @@ AI Core handles OAuth token management and deployment discovery automatically. M
| `system-prompt-file` | No | `""` | Local file with additional system prompt instructions | | `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
| `doc-map` | No | `""` | Path to a YAML file mapping source path globs to governing design docs | | `doc-map` | No | `""` | Path to a YAML file mapping source path globs to governing design docs |
| `doc-map-max-bytes` | No | `102400` | Maximum bytes of injected doc content from doc-map (default 100KB) | | `doc-map-max-bytes` | No | `102400` | Maximum bytes of injected doc content from doc-map (default 100KB) |
| `doc-map-trusted-ref` | No | `""` | Git ref (e.g. `main`) to fetch the doc-map config from via VCS API instead of local workspace. **Recommended for security** — prevents a PR from modifying the doc-map config to inject arbitrary docs. |
| `persona` | No | `""` | Built-in persona name (security, architect, docs) | | `persona` | No | `""` | Built-in persona name (security, architect, docs) |
| `persona-file` | No | `""` | Path to persona file (YAML or JSON) with custom review focus | | `persona-file` | No | `""` | Path to persona file (YAML or JSON) with custom review focus |
| `temperature` | No | `0` | LLM temperature (0 = server default) | | `temperature` | No | `0` | LLM temperature (0 = server default) |
+44 -7
View File
@@ -101,6 +101,7 @@ func main() {
aicoreResourceGroup := flag.String("aicore-resource-group", envOrDefault("AICORE_RESOURCE_GROUP", "default"), "SAP AI Core resource group (for provider=aicore)") aicoreResourceGroup := flag.String("aicore-resource-group", envOrDefault("AICORE_RESOURCE_GROUP", "default"), "SAP AI Core resource group (for provider=aicore)")
docMapFile := flag.String("doc-map", envOrDefault("DOC_MAP_FILE", ""), "Path to YAML file mapping source path globs to governing design docs") docMapFile := flag.String("doc-map", envOrDefault("DOC_MAP_FILE", ""), "Path to YAML file mapping source path globs to governing design docs")
docMapMaxBytes := flag.Int("doc-map-max-bytes", envOrDefaultInt("DOC_MAP_MAX_BYTES", review.DefaultDocMapMaxBytes), "Maximum bytes of injected doc content (default 102400)") docMapMaxBytes := flag.Int("doc-map-max-bytes", envOrDefaultInt("DOC_MAP_MAX_BYTES", review.DefaultDocMapMaxBytes), "Maximum bytes of injected doc content (default 102400)")
docMapTrustedRef := flag.String("doc-map-trusted-ref", envOrDefault("DOC_MAP_TRUSTED_REF", ""), "Git ref (e.g. main) to fetch the doc-map config from via VCS API instead of local workspace. Recommended to prevent PR branch from controlling which docs are injected.")
flag.Parse() flag.Parse()
@@ -368,10 +369,45 @@ func main() {
// Step 6c: Load path-scoped design docs if doc-map specified // Step 6c: Load path-scoped design docs if doc-map specified
designDocs := "" designDocs := ""
if *docMapFile != "" { if *docMapFile != "" {
docMapCfg, err := review.ParseDocMapConfig(resolvedDocMapFile) var docMapCfg *review.DocMapConfig
if err != nil {
slog.Error("failed to parse doc-map file", "file", *docMapFile, "error", err) if *docMapTrustedRef != "" {
os.Exit(1) // Fetch doc-map config from a trusted VCS ref (e.g. the default branch).
// This prevents a malicious PR from modifying the doc-map config to
// inject arbitrary docs into the LLM prompt.
slog.Info("doc-map: fetching config from trusted ref",
"path", *docMapFile,
"ref", *docMapTrustedRef)
content, fetchErr := vcs.GetFileContentRef(ctx, owner, repoName, *docMapFile, *docMapTrustedRef)
if fetchErr != nil {
slog.Error("doc-map: failed to fetch config from trusted ref",
"path", *docMapFile,
"ref", *docMapTrustedRef,
"error", fetchErr)
os.Exit(1)
}
source := fmt.Sprintf("%s/%s@%s:%s", owner, repoName, *docMapTrustedRef, *docMapFile)
var parseErr error
docMapCfg, parseErr = review.ParseDocMapConfigContent(content, source)
if parseErr != nil {
slog.Error("doc-map: failed to parse fetched config",
"source", source,
"error", parseErr)
os.Exit(1)
}
} else {
// Local workspace fallback — the doc-map is read from the PR branch checkout.
// SECURITY WARNING: a malicious PR can modify this file to inject arbitrary
// docs. Set --doc-map-trusted-ref (or DOC_MAP_TRUSTED_REF) to a trusted ref
// (e.g. "main") to fetch the config from the default branch instead.
slog.Warn("doc-map: loading config from local workspace (PR branch) — " +
"set --doc-map-trusted-ref to fetch from a trusted ref for security")
var parseErr error
docMapCfg, parseErr = review.ParseDocMapConfig(resolvedDocMapFile)
if parseErr != nil {
slog.Error("failed to parse doc-map file", "file", *docMapFile, "error", parseErr)
os.Exit(1)
}
} }
// Collect changed file paths from the PR for intersection. // Collect changed file paths from the PR for intersection.
@@ -385,10 +421,11 @@ func main() {
if len(matchedDocs) > 0 { if len(matchedDocs) > 0 {
docMapOpts := review.DocMapOptions{MaxBytes: *docMapMaxBytes} docMapOpts := review.DocMapOptions{MaxBytes: *docMapMaxBytes}
designDocs, err = review.LoadMatchingDocs(ctx, vcs, owner, repoName, matchedDocs, docMapOpts) var loadErr error
if err != nil { designDocs, loadErr = review.LoadMatchingDocs(ctx, vcs, owner, repoName, matchedDocs, docMapOpts)
if loadErr != nil {
// Non-fatal: individual missing files are already warned; log and continue. // Non-fatal: individual missing files are already warned; log and continue.
slog.Warn("doc-map: partial failure loading docs", "error", err) slog.Warn("doc-map: partial failure loading docs", "error", loadErr)
} }
if designDocs != "" { if designDocs != "" {
slog.Info("doc-map: injected design docs", "matched", len(matchedDocs), "bytes", len(designDocs)) slog.Info("doc-map: injected design docs", "matched", len(matchedDocs), "bytes", len(designDocs))
+50 -56
View File
@@ -880,16 +880,9 @@ func TestMainSubprocess_MissingFlags(t *testing.T) {
func TestMainSubprocess_InvalidReviewerName(t *testing.T) { func TestMainSubprocess_InvalidReviewerName(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
os.Args = []string{"review-bot", os.Args = append(baseSubprocessArgs(),
"--gitea-url", "http://localhost",
"--repo", "owner/repo",
"--pr", "1",
"--reviewer-name", "invalid name", "--reviewer-name", "invalid name",
"--reviewer-token", "tok", )
"--llm-base-url", "http://localhost",
"--llm-api-key", "key",
"--llm-model", "model",
}
main() main()
return return
} }
@@ -908,15 +901,15 @@ func TestMainSubprocess_InvalidReviewerName(t *testing.T) {
func TestMainSubprocess_InvalidRepo(t *testing.T) { func TestMainSubprocess_InvalidRepo(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
os.Args = []string{"review-bot", args := baseSubprocessArgs()
"--gitea-url", "http://localhost", // Replace the canonical --repo value with an invalid one.
"--repo", "invalidrepo", for i, a := range args {
"--pr", "1", if a == "--repo" && i+1 < len(args) {
"--reviewer-token", "tok", args[i+1] = "invalidrepo"
"--llm-base-url", "http://localhost", break
"--llm-api-key", "key", }
"--llm-model", "model",
} }
os.Args = args
main() main()
return return
} }
@@ -935,15 +928,15 @@ func TestMainSubprocess_InvalidRepo(t *testing.T) {
func TestMainSubprocess_InvalidPRNumber(t *testing.T) { func TestMainSubprocess_InvalidPRNumber(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
os.Args = []string{"review-bot", args := baseSubprocessArgs()
"--gitea-url", "http://localhost", // Replace the canonical --pr value with a non-numeric string.
"--repo", "owner/repo", for i, a := range args {
"--pr", "notanumber", if a == "--pr" && i+1 < len(args) {
"--reviewer-token", "tok", args[i+1] = "notanumber"
"--llm-base-url", "http://localhost", break
"--llm-api-key", "key", }
"--llm-model", "model",
} }
os.Args = args
main() main()
return return
} }
@@ -962,16 +955,9 @@ func TestMainSubprocess_InvalidPRNumber(t *testing.T) {
func TestMainSubprocess_InvalidTemperature(t *testing.T) { func TestMainSubprocess_InvalidTemperature(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
os.Args = []string{"review-bot", os.Args = append(baseSubprocessArgs(),
"--gitea-url", "http://localhost",
"--repo", "owner/repo",
"--pr", "1",
"--reviewer-token", "tok",
"--llm-base-url", "http://localhost",
"--llm-api-key", "key",
"--llm-model", "model",
"--llm-temperature", "5.0", "--llm-temperature", "5.0",
} )
main() main()
return return
} }
@@ -990,16 +976,9 @@ func TestMainSubprocess_InvalidTemperature(t *testing.T) {
func TestMainSubprocess_InvalidProvider(t *testing.T) { func TestMainSubprocess_InvalidProvider(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
os.Args = []string{"review-bot", os.Args = append(baseSubprocessArgs(),
"--gitea-url", "http://localhost",
"--repo", "owner/repo",
"--pr", "1",
"--reviewer-token", "tok",
"--llm-base-url", "http://localhost",
"--llm-api-key", "key",
"--llm-model", "model",
"--llm-provider", "invalid-provider", "--llm-provider", "invalid-provider",
} )
main() main()
return return
} }
@@ -1015,6 +994,25 @@ func TestMainSubprocess_InvalidProvider(t *testing.T) {
} }
} }
// baseSubprocessArgs returns the base set of required flags for subprocess tests
// that need a fully-configured main() invocation. Each test appends its own
// test-specific flags on top of this base.
//
// Using a helper here means that when the set of required flags changes, only
// this function needs updating (instead of every test that passes all flags).
func baseSubprocessArgs() []string {
return []string{
"review-bot",
"--vcs-url", "https://gitea.example.com",
"--repo", "owner/repo",
"--pr", "1",
"--reviewer-token", "tok",
"--llm-base-url", "https://api.example.com",
"--llm-api-key", "key",
"--llm-model", "gpt-4",
}
}
// cleanEnv returns environ without any GITEA/LLM/REVIEWER/VCS env vars that would // cleanEnv returns environ without any GITEA/LLM/REVIEWER/VCS env vars that would
// interfere with testing missing-flag scenarios. // interfere with testing missing-flag scenarios.
func cleanEnv() []string { func cleanEnv() []string {
@@ -1389,13 +1387,14 @@ func TestFetchPatterns_MultipleRepos(t *testing.T) {
func TestMainSubprocess_MissingLLMBaseURL(t *testing.T) { func TestMainSubprocess_MissingLLMBaseURL(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
// Note: cannot use baseSubprocessArgs() here because --llm-base-url and
// --llm-api-key are intentionally omitted to test the missing-URL error.
os.Args = []string{"review-bot", os.Args = []string{"review-bot",
"--vcs-url", "https://gitea.example.com", "--vcs-url", "https://gitea.example.com",
"--repo", "owner/repo", "--repo", "owner/repo",
"--pr", "1", "--pr", "1",
"--reviewer-token", "tok", "--reviewer-token", "tok",
"--llm-model", "gpt-4", "--llm-model", "gpt-4",
// --llm-base-url and --llm-api-key intentionally omitted
} }
main() main()
return return
@@ -1417,6 +1416,8 @@ func TestMainSubprocess_MissingLLMBaseURL(t *testing.T) {
func TestMainSubprocess_MissingAICoreCredentials(t *testing.T) { func TestMainSubprocess_MissingAICoreCredentials(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
// Note: cannot use baseSubprocessArgs() here because aicore provider
// does not require --llm-base-url / --llm-api-key; those are omitted.
os.Args = []string{"review-bot", os.Args = []string{"review-bot",
"--vcs-url", "https://gitea.example.com", "--vcs-url", "https://gitea.example.com",
"--repo", "owner/repo", "--repo", "owner/repo",
@@ -1446,17 +1447,10 @@ func TestMainSubprocess_MissingAICoreCredentials(t *testing.T) {
func TestMainSubprocess_ConflictingPersonaFlags(t *testing.T) { func TestMainSubprocess_ConflictingPersonaFlags(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
os.Args = []string{"review-bot", os.Args = append(baseSubprocessArgs(),
"--vcs-url", "https://gitea.example.com",
"--repo", "owner/repo",
"--pr", "1",
"--reviewer-token", "tok",
"--llm-base-url", "https://api.example.com",
"--llm-api-key", "key",
"--llm-model", "gpt-4",
"--persona", "security", "--persona", "security",
"--persona-file", "custom.json", "--persona-file", "custom.json",
} )
main() main()
return return
} }
@@ -1477,9 +1471,9 @@ func TestMainSubprocess_ConflictingPersonaFlags(t *testing.T) {
func TestMainSubprocess_DeprecatedGiteaURLEnv(t *testing.T) { func TestMainSubprocess_DeprecatedGiteaURLEnv(t *testing.T) {
if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" { if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError) flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
// Set required flags but omit --vcs-url; GITEA_URL should be picked up. // Note: cannot use baseSubprocessArgs() here because --vcs-url must be
// The test will exit with an error after VCS init (no PR to fetch), but // omitted — this test verifies that GITEA_URL env var is picked up as a
// the deprecation warning must appear. // deprecated fallback when --vcs-url is absent.
os.Args = []string{"review-bot", os.Args = []string{"review-bot",
// No --vcs-url: should fall back to GITEA_URL env var // No --vcs-url: should fall back to GITEA_URL env var
"--repo", "owner/repo", "--repo", "owner/repo",
+19 -8
View File
@@ -37,22 +37,33 @@ func validateDocmapPath(localPath, resolvedRoot string) error {
return fmt.Errorf("cannot resolve path: %w", err) return fmt.Errorf("cannot resolve path: %w", err)
} }
// Lstat: do NOT follow symlinks. We want to inspect the entry itself. // Resolve ALL symlink components, not just the final one.
fi, err := os.Lstat(absPath) // os.Lstat only avoids following the *final* path component; intermediate
// directory symlinks are still followed. EvalSymlinks resolves every
// component, closing the directory-symlink bypass: a PR that commits
// .review-bot/ as a directory symlink pointing outside the repo would
// otherwise pass the filepath.Rel confinement check because the textual
// path is inside the root while the actual destination is not.
resolvedPath, err := filepath.EvalSymlinks(absPath)
if err != nil {
return fmt.Errorf("cannot resolve path (symlink): %w", err)
}
// Lstat the resolved path — at this point resolvedPath is symlink-free, so
// ModeSymlink will never be set. We keep the check as defense-in-depth.
fi, err := os.Lstat(resolvedPath)
if err != nil { if err != nil {
return fmt.Errorf("cannot stat file: %w", err) return fmt.Errorf("cannot stat file: %w", err)
} }
// Reject symlinks outright — a symlink can point to /dev/zero or arbitrary // Defense-in-depth: reject any remaining symlink indicator.
// host paths, bypassing both the confinement check and the size check.
if fi.Mode()&os.ModeSymlink != 0 { if fi.Mode()&os.ModeSymlink != 0 {
return fmt.Errorf("symlinks are not allowed") return fmt.Errorf("symlinks are not allowed")
} }
// Confine to resolvedRoot: the cleaned absolute path must be a descendant // Confine to resolvedRoot: use the fully-resolved path so that a directory
// of the repo root. This catches paths that escaped via "..", absolute // symlink inside the repo cannot carry the path outside the root.
// paths that happen to be outside the root, etc. rel, err := filepath.Rel(resolvedRoot, resolvedPath)
rel, err := filepath.Rel(resolvedRoot, absPath)
if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) { if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(os.PathSeparator)) {
return fmt.Errorf("path must be within --repo-root") return fmt.Errorf("path must be within --repo-root")
} }
+64 -15
View File
@@ -465,23 +465,34 @@ mappings:
} }
// TestValidateDocmapPath_Symlink verifies that --docmap pointing at a symlink // TestValidateDocmapPath_Symlink verifies that --docmap pointing at a symlink
// is rejected before the file is read (prevents /dev/zero DOS or arbitrary // whose resolved target is outside --repo-root is rejected (prevents reading
// host-file reads via PR-controlled symlinks). // arbitrary host files via PR-controlled symlinks).
//
// Note: after the EvalSymlinks fix (issue #150), in-repo symlinks whose
// targets also reside within the repo root are now allowed — the confinement
// check is applied to the resolved path, not the symlink entry itself. The
// security invariant is: the resolved destination must be within the root.
func TestValidateDocmapPath_Symlink(t *testing.T) { func TestValidateDocmapPath_Symlink(t *testing.T) {
dir := t.TempDir() dir := t.TempDir()
outside := t.TempDir()
// Create a real docmap file to serve as the symlink target. // Create a docmap file OUTSIDE the repo root to serve as the symlink
realDocmap := makeDocmapInDir(t, dir, ` // target. EvalSymlinks will resolve to this path, which the Rel check
mappings: // must then reject.
- paths: if err := os.MkdirAll(filepath.Join(outside, ".review-bot"), 0o755); err != nil {
- "lib/**" t.Fatalf("MkdirAll: %v", err)
docs: }
- docs/foo.md outsideDocmap := filepath.Join(outside, ".review-bot", "doc-map.yml")
`) if err := os.WriteFile(outsideDocmap, []byte("mappings: []\n"), 0o644); err != nil {
t.Fatalf("WriteFile: %v", err)
}
// Create a symlink inside dir pointing to the real docmap. // Create a symlink inside dir pointing to the file outside the repo.
if err := os.MkdirAll(filepath.Join(dir, ".review-bot"), 0o755); err != nil {
t.Fatalf("MkdirAll: %v", err)
}
symlinkPath := filepath.Join(dir, ".review-bot", "doc-map-link.yml") symlinkPath := filepath.Join(dir, ".review-bot", "doc-map-link.yml")
if err := os.Symlink(realDocmap, symlinkPath); err != nil { if err := os.Symlink(outsideDocmap, symlinkPath); err != nil {
t.Fatalf("Symlink: %v", err) t.Fatalf("Symlink: %v", err)
} }
@@ -490,10 +501,10 @@ mappings:
[]string{"--docmap", symlinkPath, "--repo-root", dir}, []string{"--docmap", symlinkPath, "--repo-root", dir},
) )
if code != 2 { if code != 2 {
t.Errorf("expected exit 2 for symlink docmap, got %d; stderr: %q", code, stderr) t.Errorf("expected exit 2 for out-of-repo symlink docmap, got %d; stderr: %q", code, stderr)
} }
if !strings.Contains(stderr, "symlink") && !strings.Contains(stderr, "invalid") { if !strings.Contains(stderr, "invalid") && !strings.Contains(stderr, "repo-root") {
t.Errorf("expected symlink rejection in stderr, got %q", stderr) t.Errorf("expected confinement rejection in stderr, got %q", stderr)
} }
} }
@@ -550,3 +561,41 @@ func TestValidateDocmapPath_SizeLimit(t *testing.T) {
t.Errorf("expected size limit error in stderr, got %q", stderr) t.Errorf("expected size limit error in stderr, got %q", stderr)
} }
} }
// TestValidateDocmapPath_DirSymlinkBypass verifies that a directory-symlink
// inside the repo pointing outside cannot be used to read arbitrary host files.
//
// Attack vector: a PR commits .review-bot/ as a directory symlink targeting a
// directory outside the repo. The textual path of the docmap file is inside
// the repo root, so the old Rel-only check passed — but the actual file is
// outside. This is closed by calling EvalSymlinks on the full path before the
// confinement check.
func TestValidateDocmapPath_DirSymlinkBypass(t *testing.T) {
repoDir := t.TempDir()
outsideDir := t.TempDir()
// Secret file outside the repo.
secretPath := filepath.Join(outsideDir, "secret.yml")
if err := os.WriteFile(secretPath, []byte("mappings: []\n"), 0o644); err != nil {
t.Fatalf("WriteFile: %v", err)
}
// Create .review-bot/ as a directory symlink pointing outside the repo.
reviewBotDir := filepath.Join(repoDir, ".review-bot")
if err := os.Symlink(outsideDir, reviewBotDir); err != nil {
t.Skipf("cannot create dir symlink (platform may not support it): %v", err)
}
// Textually inside repo — .review-bot/secret.yml — but resolves outside.
attackPath := filepath.Join(repoDir, ".review-bot", "secret.yml")
// Resolve repoDir to a symlink-free path, as runValidateDocmap does.
resolvedRoot, err := filepath.EvalSymlinks(repoDir)
if err != nil {
t.Fatalf("EvalSymlinks(repoDir): %v", err)
}
if err := validateDocmapPath(attackPath, resolvedRoot); err == nil {
t.Error("expected rejection of dir-symlink bypass, got nil error")
}
}
+3 -3
View File
@@ -15,9 +15,9 @@ func TestIsBlockedIPForwarding(t *testing.T) {
ip string ip string
blocked bool blocked bool
}{ }{
{"127.0.0.1", true}, // loopback — must be blocked {"127.0.0.1", true}, // loopback — must be blocked
{"192.168.1.1", true}, // RFC1918 — must be blocked {"192.168.1.1", true}, // RFC1918 — must be blocked
{"8.8.8.8", false}, // public — must not be blocked {"8.8.8.8", false}, // public — must not be blocked
{"2001:4860:4860::8888", false}, // public IPv6 — must not be blocked {"2001:4860:4860::8888", false}, // public IPv6 — must not be blocked
} }
for _, tc := range cases { for _, tc := range cases {
+18 -2
View File
@@ -52,15 +52,31 @@ func ParseDocMapConfig(localPath string) (*DocMapConfig, error) {
if err != nil { if err != nil {
return nil, fmt.Errorf("read doc-map file %q: %w", localPath, err) return nil, fmt.Errorf("read doc-map file %q: %w", localPath, err)
} }
return parseDocMapBytes(data, localPath)
}
// ParseDocMapConfigContent parses a doc-map YAML config from an in-memory
// string. The source parameter is used only for error messages and log entries
// (e.g. "main:main@<ref>").
//
// Use this when the config content has been fetched from a trusted VCS ref
// rather than read from the local workspace.
func ParseDocMapConfigContent(content, source string) (*DocMapConfig, error) {
data := []byte(content)
return parseDocMapBytes(data, source)
}
// parseDocMapBytes is the shared YAML parse implementation used by
// ParseDocMapConfig and ParseDocMapConfigContent.
func parseDocMapBytes(data []byte, source string) (*DocMapConfig, error) {
var cfg DocMapConfig var cfg DocMapConfig
if err := yaml.UnmarshalWithOptions(data, &cfg, yaml.Strict()); err != nil { if err := yaml.UnmarshalWithOptions(data, &cfg, yaml.Strict()); err != nil {
// Re-parse without strict mode to log which keys are unknown. // Re-parse without strict mode to log which keys are unknown.
var relaxed DocMapConfig var relaxed DocMapConfig
if err2 := yaml.Unmarshal(data, &relaxed); err2 != nil { if err2 := yaml.Unmarshal(data, &relaxed); err2 != nil {
return nil, fmt.Errorf("parse doc-map YAML %q: %w", localPath, err) return nil, fmt.Errorf("parse doc-map YAML %q: %w", source, err)
} }
slog.Warn("doc-map YAML contains unknown keys (ignored)", "file", localPath, "error", err) slog.Warn("doc-map YAML contains unknown keys (ignored)", "file", source, "error", err)
cfg = relaxed cfg = relaxed
} }
return &cfg, nil return &cfg, nil
+60
View File
@@ -510,3 +510,63 @@ func TestFileCoveredByDocMap_EmptyConfig(t *testing.T) {
t.Error("expected false for empty config, got true") t.Error("expected false for empty config, got true")
} }
} }
// ============================================================
// ParseDocMapConfigContent
// ============================================================
func TestParseDocMapConfigContent_Valid(t *testing.T) {
content := `
mappings:
- paths:
- "lib/foo/**"
docs:
- docs/foo.md
`
cfg, err := ParseDocMapConfigContent(content, "owner/repo@main:.review-bot/doc-map.yml")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(cfg.Mappings) != 1 {
t.Fatalf("expected 1 mapping, got %d", len(cfg.Mappings))
}
if len(cfg.Mappings[0].Docs) != 1 || cfg.Mappings[0].Docs[0] != "docs/foo.md" {
t.Errorf("unexpected mapping: %+v", cfg.Mappings[0])
}
}
func TestParseDocMapConfigContent_EmptyContent(t *testing.T) {
cfg, err := ParseDocMapConfigContent("", "test-source")
if err != nil {
t.Fatalf("unexpected error for empty content: %v", err)
}
if len(cfg.Mappings) != 0 {
t.Errorf("expected 0 mappings for empty content, got %d", len(cfg.Mappings))
}
}
func TestParseDocMapConfigContent_InvalidYAML(t *testing.T) {
_, err := ParseDocMapConfigContent("mappings: [{{invalid", "test-source")
if err == nil {
t.Fatal("expected error for invalid YAML, got nil")
}
}
func TestParseDocMapConfigContent_UnknownKeys(t *testing.T) {
content := `
mappings:
- paths:
- "lib/**"
docs:
- docs/foo.md
unknown_top_level_key: "should be warned but not fatal"
`
// Unknown top-level keys produce a warning but not an error.
cfg, err := ParseDocMapConfigContent(content, "test-source")
if err != nil {
t.Fatalf("unexpected error for unknown keys: %v", err)
}
if len(cfg.Mappings) == 0 {
t.Error("expected mappings to be parsed despite unknown key")
}
}