feat(#143): fetch doc-map config from trusted VCS ref

The doc-map YAML config was previously read from the local workspace (the PR
branch checkout). A malicious PR author could modify .review-bot/doc-map.yml
to map any path glob to sensitive design docs, causing review-bot to fetch and
inject those docs into the LLM prompt.

Fix: add the --doc-map-trusted-ref flag (DOC_MAP_TRUSTED_REF env var). When set
to a trusted ref (e.g. 'main'), the doc-map config is fetched from the VCS API
at that ref instead of from the local workspace. A 404 from the VCS is a hard
error (no silent fallback to the local copy). When unset, the local workspace
is used and a security warning in the logs points operators to the new flag.

Changes:
- review/docmap.go: add ParseDocMapConfigContent + parseDocMapBytes helper to
  parse from in-memory content (fetched via VCS API)
- cmd/review-bot/main.go: add --doc-map-trusted-ref flag; Step 6c branches on
  trusted-ref to fetch vs local-workspace load
- .gitea/actions/review/action.yml: add doc-map-trusted-ref input
- README.md: document new input
- CHANGELOG.md: security and feature entries

Tests:
- TestParseDocMapConfigContent_Valid/EmptyContent/InvalidYAML/UnknownKeys in
  review/docmap_test.go

Coverage: 53.0% cmd/review-bot
chore(fmt): align test comments in gitea/ipcheck_test.go
2026-05-15 10:39:43 +00:00 · 2026-05-15 10:23:11 +00:00 · 2026-05-15 10:18:34 +00:00
8 changed files with 194 additions and 68 deletions
@@ -141,6 +141,16 @@ inputs:
    description: 'Maximum bytes of injected doc content from doc-map (default 102400 = 100KB)'
    required: false
    default: '102400'
  doc-map-trusted-ref:
    description: >-
      Git ref (branch, tag, or SHA) from which to fetch the doc-map config file
      via VCS API instead of reading it from the local workspace. Recommended
      when using doc-map: set this to the default branch (e.g. 'main') so a
      malicious PR cannot modify the doc-map config to inject arbitrary design
      docs into the LLM prompt. When unset, the config is read from the local
      workspace (the PR branch) with a security warning in the logs.
    required: false
    default: ''
 runs:
  using: 'composite'
@@ -507,6 +517,7 @@ runs:
        PERSONA_FILE: ${{ inputs.persona-file }}
        DOC_MAP_FILE: ${{ inputs.doc-map }}
        DOC_MAP_MAX_BYTES: ${{ inputs.doc-map-max-bytes }}
        DOC_MAP_TRUSTED_REF: ${{ inputs.doc-map-trusted-ref }}
        AICORE_CLIENT_ID: ${{ inputs.aicore-client-id }}
        AICORE_CLIENT_SECRET: ${{ inputs.aicore-client-secret }}
        AICORE_AUTH_URL: ${{ inputs.aicore-auth-url }}
@@ -6,12 +6,19 @@
 - **`validateDocmapPath`: add `EvalSymlinks` to close directory-symlink bypass** ([#150](https://gitea.weiker.me/rodin/review-bot/issues/150)): The previous implementation used `os.Lstat` which only avoids following the *final* path component. An intermediate directory symlink (e.g. `.review-bot/` committed as a symlink to a directory outside the repo) would pass the path-confinement check because the textual path appeared within the repo root. `filepath.EvalSymlinks` is now called first, resolving all symlink components before the `filepath.Rel` confinement check. In-repo symlinks whose resolved targets also reside within the repo root are now allowed; out-of-repo targets are rejected by the confinement check.
 - **`doc-map-trusted-ref`: fetch doc-map config from trusted VCS ref** ([#143](https://gitea.weiker.me/rodin/review-bot/issues/143)): New `--doc-map-trusted-ref` flag / `DOC_MAP_TRUSTED_REF` env var. When set, the doc-map YAML config is fetched from the specified VCS ref (e.g. `main`) via API instead of being read from the local workspace (the PR branch checkout). This prevents a malicious PR from modifying `.review-bot/doc-map.yml` to inject arbitrary design docs into the LLM prompt. When unset, the local workspace is used with a security warning in the logs.
 ### Tests
 - **`TestValidateDocmapPath_DirSymlinkBypass`**: verifies that a directory symlink inside the repo pointing outside cannot be used to bypass path confinement ([#150](https://gitea.weiker.me/rodin/review-bot/issues/150)).
 ### Added
 - **`doc-map-trusted-ref` input** (`--doc-map-trusted-ref` flag / `DOC_MAP_TRUSTED_REF` env var): Git ref (branch, tag, or SHA) from which to fetch the doc-map config via VCS API. Recommended for all `doc-map` users. Example: `doc-map-trusted-ref: main`. ([#143](https://gitea.weiker.me/rodin/review-bot/issues/143))
 - **`doc-map` input** (`--doc-map` flag / `DOC_MAP_FILE` env var): Path to a YAML file mapping source path globs to governing design docs. review-bot intersects the map with changed PR paths and injects matching docs into the system prompt under a `## Design Documents` heading. ([#137](https://gitea.weiker.me/rodin/review-bot/issues/137))
 - **`doc-map-max-bytes` input** (`--doc-map-max-bytes` flag / `DOC_MAP_MAX_BYTES` env var): Cap on total injected design doc content in bytes. Default: 102400 (100 KB). Prevents accidental context overflow when a PR touches many modules.
 - **`DesignDocs` budget section**: Design docs are included in the context budget and trimmed after conventions, before file context, if the total exceeds the model's context limit.
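
The `doc-map` entry above describes intersecting path globs with the changed PR paths. A rough sketch of that intersection follows, assuming a simplified glob rule (a trailing `/**` matches everything under a directory, anything else goes through `path.Match`). The `mapping` type, `matchGlob`, and `matchedDocs` are hypothetical names for illustration; review-bot's actual matcher may behave differently.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mapping mirrors the paths/docs shape of a doc-map entry; the type name is
// hypothetical.
type mapping struct {
	Paths []string
	Docs  []string
}

// matchGlob is a simplified matcher (an assumption of this sketch): a trailing
// "/**" matches everything under that directory, and any other pattern is
// delegated to path.Match.
func matchGlob(pattern, file string) bool {
	if strings.HasSuffix(pattern, "/**") {
		return strings.HasPrefix(file, strings.TrimSuffix(pattern, "**"))
	}
	ok, err := path.Match(pattern, file)
	return err == nil && ok
}

// matchedDocs returns the de-duplicated docs of every mapping with at least
// one glob matching a changed file, in first-match order.
func matchedDocs(mappings []mapping, changed []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, m := range mappings {
		hit := false
		for _, g := range m.Paths {
			for _, f := range changed {
				if matchGlob(g, f) {
					hit = true
				}
			}
		}
		if !hit {
			continue
		}
		for _, d := range m.Docs {
			if !seen[d] {
				seen[d] = true
				out = append(out, d)
			}
		}
	}
	return out
}

func main() {
	cfg := []mapping{{Paths: []string{"lib/foo/**"}, Docs: []string{"docs/foo.md"}}}
	fmt.Println(matchedDocs(cfg, []string{"lib/foo/bar.go", "README.md"})) // prints "[docs/foo.md]"
}
```

Because whoever controls the mapping controls which docs are selected, this is the step the trusted-ref option is meant to protect.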
@@ -210,6 +210,7 @@ AI Core handles OAuth token management and deployment discovery automatically. M
 | `system-prompt-file` | No | `""` | Local file with additional system prompt instructions |
 | `doc-map` | No | `""` | Path to a YAML file mapping source path globs to governing design docs |
 | `doc-map-max-bytes` | No | `102400` | Maximum bytes of injected doc content from doc-map (default 100KB) |
 | `doc-map-trusted-ref` | No | `""` | Git ref (e.g. `main`) to fetch the doc-map config from via VCS API instead of local workspace. **Recommended for security** — prevents a PR from modifying the doc-map config to inject arbitrary docs. |
 | `persona` | No | `""` | Built-in persona name (security, architect, docs) |
 | `persona-file` | No | `""` | Path to persona file (YAML or JSON) with custom review focus |
 | `temperature` | No | `0` | LLM temperature (0 = server default) |
@@ -101,6 +101,7 @@ func main() {
 	aicoreResourceGroup := flag.String("aicore-resource-group", envOrDefault("AICORE_RESOURCE_GROUP", "default"), "SAP AI Core resource group (for provider=aicore)")
 	docMapFile := flag.String("doc-map", envOrDefault("DOC_MAP_FILE", ""), "Path to YAML file mapping source path globs to governing design docs")
 	docMapMaxBytes := flag.Int("doc-map-max-bytes", envOrDefaultInt("DOC_MAP_MAX_BYTES", review.DefaultDocMapMaxBytes), "Maximum bytes of injected doc content (default 102400)")
 	docMapTrustedRef := flag.String("doc-map-trusted-ref", envOrDefault("DOC_MAP_TRUSTED_REF", ""), "Git ref (e.g. main) to fetch the doc-map config from via VCS API instead of local workspace. Recommended to prevent PR branch from controlling which docs are injected.")
 	flag.Parse()
@@ -368,10 +369,45 @@ func main() {
 	// Step 6c: Load path-scoped design docs if doc-map specified
 	designDocs := ""
 	if *docMapFile != "" {
-		docMapCfg, err := review.ParseDocMapConfig(resolvedDocMapFile)
-		if err != nil {
-			slog.Error("failed to parse doc-map file", "file", *docMapFile, "error", err)
-			os.Exit(1)
+		var docMapCfg *review.DocMapConfig
+
+		if *docMapTrustedRef != "" {
+			// Fetch doc-map config from a trusted VCS ref (e.g. the default branch).
 			// This prevents a malicious PR from modifying the doc-map config to
 			// inject arbitrary docs into the LLM prompt.
 			slog.Info("doc-map: fetching config from trusted ref",
 				"path", *docMapFile,
 				"ref", *docMapTrustedRef)
 			content, fetchErr := vcs.GetFileContentRef(ctx, owner, repoName, *docMapFile, *docMapTrustedRef)
 			if fetchErr != nil {
 				slog.Error("doc-map: failed to fetch config from trusted ref",
 					"path", *docMapFile,
 					"ref", *docMapTrustedRef,
 					"error", fetchErr)
 				os.Exit(1)
 			}
 			source := fmt.Sprintf("%s/%s@%s:%s", owner, repoName, *docMapTrustedRef, *docMapFile)
 			var parseErr error
 			docMapCfg, parseErr = review.ParseDocMapConfigContent(content, source)
 			if parseErr != nil {
 				slog.Error("doc-map: failed to parse fetched config",
 					"source", source,
 					"error", parseErr)
 				os.Exit(1)
 			}
 		} else {
 			// Local workspace fallback — the doc-map is read from the PR branch checkout.
 			// SECURITY WARNING: a malicious PR can modify this file to inject arbitrary
 			// docs. Set --doc-map-trusted-ref (or DOC_MAP_TRUSTED_REF) to a trusted ref
 			// (e.g. "main") to fetch the config from the default branch instead.
 			slog.Warn("doc-map: loading config from local workspace (PR branch) — " +
 				"set --doc-map-trusted-ref to fetch from a trusted ref for security")
 			var parseErr error
 			docMapCfg, parseErr = review.ParseDocMapConfig(resolvedDocMapFile)
 			if parseErr != nil {
 				slog.Error("failed to parse doc-map file", "file", *docMapFile, "error", parseErr)
 				os.Exit(1)
 			}
 		}
 		// Collect changed file paths from the PR for intersection.
@@ -385,10 +421,11 @@ func main() {
 		if len(matchedDocs) > 0 {
 			docMapOpts := review.DocMapOptions{MaxBytes: *docMapMaxBytes}
-			designDocs, err = review.LoadMatchingDocs(ctx, vcs, owner, repoName, matchedDocs, docMapOpts)
-			if err != nil {
+			var loadErr error
+			designDocs, loadErr = review.LoadMatchingDocs(ctx, vcs, owner, repoName, matchedDocs, docMapOpts)
+			if loadErr != nil {
 				// Non-fatal: individual missing files are already warned; log and continue.
-				slog.Warn("doc-map: partial failure loading docs", "error", err)
+				slog.Warn("doc-map: partial failure loading docs", "error", loadErr)
 			}
 			if designDocs != "" {
 				slog.Info("doc-map: injected design docs", "matched", len(matchedDocs), "bytes", len(designDocs))
@@ -880,16 +880,9 @@ func TestMainSubprocess_MissingFlags(t *testing.T) {
 func TestMainSubprocess_InvalidReviewerName(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		os.Args = []string{"review-bot",
-			"--gitea-url", "http://localhost",
-			"--repo", "owner/repo",
-			"--pr", "1",
+		os.Args = append(baseSubprocessArgs(),
 			"--reviewer-name", "invalid name",
-			"--reviewer-token", "tok",
-			"--llm-base-url", "http://localhost",
-			"--llm-api-key", "key",
-			"--llm-model", "model",
-		}
+		)
 		main()
 		return
 	}
@@ -908,15 +901,15 @@ func TestMainSubprocess_InvalidReviewerName(t *testing.T) {
 func TestMainSubprocess_InvalidRepo(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		os.Args = []string{"review-bot",
-			"--gitea-url", "http://localhost",
-			"--repo", "invalidrepo",
-			"--pr", "1",
-			"--reviewer-token", "tok",
-			"--llm-base-url", "http://localhost",
-			"--llm-api-key", "key",
-			"--llm-model", "model",
-		}
+		args := baseSubprocessArgs()
+		// Replace the canonical --repo value with an invalid one.
+		for i, a := range args {
+			if a == "--repo" && i+1 < len(args) {
+				args[i+1] = "invalidrepo"
+				break
+			}
+		}
+		os.Args = args
 		main()
 		return
 	}
@@ -935,15 +928,15 @@ func TestMainSubprocess_InvalidRepo(t *testing.T) {
 func TestMainSubprocess_InvalidPRNumber(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		os.Args = []string{"review-bot",
-			"--gitea-url", "http://localhost",
-			"--repo", "owner/repo",
-			"--pr", "notanumber",
-			"--reviewer-token", "tok",
-			"--llm-base-url", "http://localhost",
-			"--llm-api-key", "key",
-			"--llm-model", "model",
-		}
+		args := baseSubprocessArgs()
+		// Replace the canonical --pr value with a non-numeric string.
+		for i, a := range args {
+			if a == "--pr" && i+1 < len(args) {
+				args[i+1] = "notanumber"
+				break
+			}
+		}
+		os.Args = args
 		main()
 		return
 	}
@@ -962,16 +955,9 @@ func TestMainSubprocess_InvalidPRNumber(t *testing.T) {
 func TestMainSubprocess_InvalidTemperature(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		os.Args = []string{"review-bot",
-			"--gitea-url", "http://localhost",
-			"--repo", "owner/repo",
-			"--pr", "1",
-			"--reviewer-token", "tok",
-			"--llm-base-url", "http://localhost",
-			"--llm-api-key", "key",
-			"--llm-model", "model",
+		os.Args = append(baseSubprocessArgs(),
 			"--llm-temperature", "5.0",
-		}
+		)
 		main()
 		return
 	}
@@ -990,16 +976,9 @@ func TestMainSubprocess_InvalidTemperature(t *testing.T) {
 func TestMainSubprocess_InvalidProvider(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		os.Args = []string{"review-bot",
-			"--gitea-url", "http://localhost",
-			"--repo", "owner/repo",
-			"--pr", "1",
-			"--reviewer-token", "tok",
-			"--llm-base-url", "http://localhost",
-			"--llm-api-key", "key",
-			"--llm-model", "model",
+		os.Args = append(baseSubprocessArgs(),
 			"--llm-provider", "invalid-provider",
-		}
+		)
 		main()
 		return
 	}
@@ -1015,6 +994,25 @@ func TestMainSubprocess_InvalidProvider(t *testing.T) {
 	}
 }
 // baseSubprocessArgs returns the base set of required flags for subprocess tests
 // that need a fully-configured main() invocation. Each test appends its own
 // test-specific flags on top of this base.
 //
 // Using a helper here means that when the set of required flags changes, only
 // this function needs updating (instead of every test that passes all flags).
 func baseSubprocessArgs() []string {
 	return []string{
 		"review-bot",
 		"--vcs-url", "https://gitea.example.com",
 		"--repo", "owner/repo",
 		"--pr", "1",
 		"--reviewer-token", "tok",
 		"--llm-base-url", "https://api.example.com",
 		"--llm-api-key", "key",
 		"--llm-model", "gpt-4",
 	}
 }
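
The InvalidRepo and InvalidPRNumber tests above each inline a loop that overwrites one flag value in the base arguments. A possible way to factor that loop out is sketched here; `withFlagValue` is a hypothetical helper that does not exist in this change, and the sketch copies the slice so the base arguments are never mutated.

```go
package main

import "fmt"

// withFlagValue returns a copy of args in which the value following the first
// occurrence of name is replaced by value. If name is not found with a value
// after it, the name/value pair is appended instead.
// Hypothetical helper: the tests in this change inline the loop.
func withFlagValue(args []string, name, value string) []string {
	out := append([]string(nil), args...) // copy so the caller's slice is untouched
	for i, a := range out {
		if a == name && i+1 < len(out) {
			out[i+1] = value
			return out
		}
	}
	return append(out, name, value)
}

func main() {
	base := []string{"review-bot", "--repo", "owner/repo", "--pr", "1"}
	fmt.Println(withFlagValue(base, "--pr", "notanumber"))
	// prints "[review-bot --repo owner/repo --pr notanumber]"
}
```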
 // cleanEnv returns environ without any GITEA/LLM/REVIEWER/VCS env vars that would
 // interfere with testing missing-flag scenarios.
 func cleanEnv() []string {
@@ -1389,13 +1387,14 @@ func TestFetchPatterns_MultipleRepos(t *testing.T) {
 func TestMainSubprocess_MissingLLMBaseURL(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
 		// Note: cannot use baseSubprocessArgs() here because --llm-base-url and
 		// --llm-api-key are intentionally omitted to test the missing-URL error.
 		os.Args = []string{"review-bot",
 			"--vcs-url", "https://gitea.example.com",
 			"--repo", "owner/repo",
 			"--pr", "1",
 			"--reviewer-token", "tok",
 			"--llm-model", "gpt-4",
 			// --llm-base-url and --llm-api-key intentionally omitted
 		}
 		main()
 		return
@@ -1417,6 +1416,8 @@ func TestMainSubprocess_MissingLLMBaseURL(t *testing.T) {
 func TestMainSubprocess_MissingAICoreCredentials(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
 		// Note: cannot use baseSubprocessArgs() here because aicore provider
 		// does not require --llm-base-url / --llm-api-key; those are omitted.
 		os.Args = []string{"review-bot",
 			"--vcs-url", "https://gitea.example.com",
 			"--repo", "owner/repo",
@@ -1446,17 +1447,10 @@ func TestMainSubprocess_MissingAICoreCredentials(t *testing.T) {
 func TestMainSubprocess_ConflictingPersonaFlags(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		os.Args = []string{"review-bot",
-			"--vcs-url", "https://gitea.example.com",
-			"--repo", "owner/repo",
-			"--pr", "1",
-			"--reviewer-token", "tok",
-			"--llm-base-url", "https://api.example.com",
-			"--llm-api-key", "key",
-			"--llm-model", "gpt-4",
+		os.Args = append(baseSubprocessArgs(),
 			"--persona", "security",
 			"--persona-file", "custom.json",
-		}
+		)
 		main()
 		return
 	}
@@ -1477,9 +1471,9 @@ func TestMainSubprocess_ConflictingPersonaFlags(t *testing.T) {
 func TestMainSubprocess_DeprecatedGiteaURLEnv(t *testing.T) {
 	if os.Getenv("TEST_SUBPROCESS_MAIN") == "1" {
 		flag.CommandLine = flag.NewFlagSet(os.Args[0], flag.ExitOnError)
-		// Set required flags but omit --vcs-url; GITEA_URL should be picked up.
-		// The test will exit with an error after VCS init (no PR to fetch), but
-		// the deprecation warning must appear.
+		// Note: cannot use baseSubprocessArgs() here because --vcs-url must be
+		// omitted — this test verifies that GITEA_URL env var is picked up as a
+		// deprecated fallback when --vcs-url is absent.
 		os.Args = []string{"review-bot",
 			// No --vcs-url: should fall back to GITEA_URL env var
 			"--repo", "owner/repo",
@@ -15,9 +15,9 @@ func TestIsBlockedIPForwarding(t *testing.T) {
 		ip      string
 		blocked bool
 	}{
-		{"127.0.0.1", true},        // loopback — must be blocked
+		{"127.0.0.1", true},             // loopback — must be blocked
-		{"192.168.1.1", true},      // RFC1918 — must be blocked
+		{"192.168.1.1", true},           // RFC1918 — must be blocked
-		{"8.8.8.8", false},         // public — must not be blocked
+		{"8.8.8.8", false},              // public — must not be blocked
 		{"2001:4860:4860::8888", false}, // public IPv6 — must not be blocked
 	}
 	for _, tc := range cases {
@@ -52,15 +52,31 @@ func ParseDocMapConfig(localPath string) (*DocMapConfig, error) {
 	if err != nil {
 		return nil, fmt.Errorf("read doc-map file %q: %w", localPath, err)
 	}
 	return parseDocMapBytes(data, localPath)
 }
 // ParseDocMapConfigContent parses a doc-map YAML config from an in-memory
 // string. The source parameter is used only for error messages and log entries
 // (e.g. "owner/repo@main:.review-bot/doc-map.yml").
 //
 // Use this when the config content has been fetched from a trusted VCS ref
 // rather than read from the local workspace.
 func ParseDocMapConfigContent(content, source string) (*DocMapConfig, error) {
 	data := []byte(content)
 	return parseDocMapBytes(data, source)
 }
 // parseDocMapBytes is the shared YAML parse implementation used by
 // ParseDocMapConfig and ParseDocMapConfigContent.
 func parseDocMapBytes(data []byte, source string) (*DocMapConfig, error) {
 	var cfg DocMapConfig
 	if err := yaml.UnmarshalWithOptions(data, &cfg, yaml.Strict()); err != nil {
 		// Re-parse without strict mode to log which keys are unknown.
 		var relaxed DocMapConfig
 		if err2 := yaml.Unmarshal(data, &relaxed); err2 != nil {
-			return nil, fmt.Errorf("parse doc-map YAML %q: %w", localPath, err)
+			return nil, fmt.Errorf("parse doc-map YAML %q: %w", source, err)
 		}
-		slog.Warn("doc-map YAML contains unknown keys (ignored)", "file", localPath, "error", err)
+		slog.Warn("doc-map YAML contains unknown keys (ignored)", "file", source, "error", err)
 		cfg = relaxed
 	}
 	return &cfg, nil
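
The strict-then-relaxed parse in `parseDocMapBytes` can be illustrated with a standalone sketch. Because the Go standard library has no YAML package, this example swaps in `encoding/json` with `DisallowUnknownFields` to show the same pattern; `config` and `parseLenient` are hypothetical names, and the real function uses the YAML library's strict option and logs a warning rather than returning a flag.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// config is a hypothetical JSON analogue of the doc-map structure.
type config struct {
	Mappings []struct {
		Paths []string `json:"paths"`
		Docs  []string `json:"docs"`
	} `json:"mappings"`
}

// parseLenient tries a strict decode first. If the strict decode fails but a
// relaxed decode succeeds, the input is treated as valid with unknown keys:
// the relaxed result is returned and warned is set to true. If both fail, the
// input is malformed and an error is returned.
func parseLenient(data []byte) (cfg config, warned bool, err error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if strictErr := dec.Decode(&cfg); strictErr != nil {
		var relaxed config
		if relaxErr := json.Unmarshal(data, &relaxed); relaxErr != nil {
			return config{}, false, fmt.Errorf("parse config: %w", relaxErr)
		}
		return relaxed, true, nil
	}
	return cfg, false, nil
}

func main() {
	_, warned, err := parseLenient([]byte(`{"mappings":[],"extra":1}`))
	fmt.Println(warned, err) // prints "true <nil>"
}
```

The second pass exists only to tell "unknown key" apart from "malformed input", so a config written for a newer version still loads while genuinely broken input is rejected.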
@@ -510,3 +510,63 @@ func TestFileCoveredByDocMap_EmptyConfig(t *testing.T) {
 		t.Error("expected false for empty config, got true")
 	}
 }
 // ============================================================
 // ParseDocMapConfigContent
 // ============================================================
 func TestParseDocMapConfigContent_Valid(t *testing.T) {
 	content := `
 mappings:
  - paths:
      - "lib/foo/**"
    docs:
      - docs/foo.md
 `
 	cfg, err := ParseDocMapConfigContent(content, "owner/repo@main:.review-bot/doc-map.yml")
 	if err != nil {
 		t.Fatalf("unexpected error: %v", err)
 	}
 	if len(cfg.Mappings) != 1 {
 		t.Fatalf("expected 1 mapping, got %d", len(cfg.Mappings))
 	}
 	if len(cfg.Mappings[0].Docs) != 1 || cfg.Mappings[0].Docs[0] != "docs/foo.md" {
 		t.Errorf("unexpected mapping: %+v", cfg.Mappings[0])
 	}
 }
 func TestParseDocMapConfigContent_EmptyContent(t *testing.T) {
 	cfg, err := ParseDocMapConfigContent("", "test-source")
 	if err != nil {
 		t.Fatalf("unexpected error for empty content: %v", err)
 	}
 	if len(cfg.Mappings) != 0 {
 		t.Errorf("expected 0 mappings for empty content, got %d", len(cfg.Mappings))
 	}
 }
 func TestParseDocMapConfigContent_InvalidYAML(t *testing.T) {
 	_, err := ParseDocMapConfigContent("mappings: [{{invalid", "test-source")
 	if err == nil {
 		t.Fatal("expected error for invalid YAML, got nil")
 	}
 }
 func TestParseDocMapConfigContent_UnknownKeys(t *testing.T) {
 	content := `
 mappings:
  - paths:
      - "lib/**"
    docs:
      - docs/foo.md
 unknown_top_level_key: "should be warned but not fatal"
 `
 	// Unknown top-level keys produce a warning but not an error.
 	cfg, err := ParseDocMapConfigContent(content, "test-source")
 	if err != nil {
 		t.Fatalf("unexpected error for unknown keys: %v", err)
 	}
 	if len(cfg.Mappings) == 0 {
 		t.Error("expected mappings to be parsed despite unknown key")
 	}
 }