The binary detects VCS type from VCS_TYPE env var, but action.yml did not
pass it to the Run review step. This caused the binary to fall back to a
URL heuristic (github.com substring), which misclassifies GitHub Enterprise
Server hosts whose URL does not contain 'github'.
The 'Determine version' step already outputs vcs_type — wire it through to
the Run review env block so explicit VCS_TYPE always takes precedence.
- New --doc-map flag (DOC_MAP_FILE env var): path to YAML config mapping
source path globs to governing design docs
- New --doc-map-max-bytes flag (DOC_MAP_MAX_BYTES env var): cap on total
injected doc content, default 100KB
- review/docmap.go: DocMapConfig parsing, glob matching with ** support,
doc loading via VCS with directory expansion and size guard
- budget.Sections: new DesignDocs field, trimmed after conventions
- budget.buildResult: injects DesignDocs under ## Design Documents heading
- action.yml: doc-map and doc-map-max-bytes inputs wired to env vars
- CHANGELOG.md: created with unreleased entry
- Tests: ParseDocMapConfig, MatchDocs, globMatch, LoadMatchingDocs
Python's ipaddress module does NOT classify 100.64.0.0/10 (RFC6598
carrier-grade NAT) as private/loopback/link_local/multicast/reserved.
This means a SERVER_URL resolving to a CGN address would bypass the
Python SSRF check and reach curl with ACTION_TOKEN.
Add an explicit network membership check for 100.64.0.0/10 to both
Python validation blocks in action.yml:
- _ssrf_check.py (VCS URL pre-flight check)
- _ssrf_check_install.py (binary download URL check)
The Go-level IsBlockedIP already covers this range correctly (ipcheck.go);
this fix closes the gap in the action.yml Python layer.
Also update comments to mention RFC6598 explicitly.
- Clone http.DefaultTransport instead of bare &http.Transport{} to preserve
ProxyFromEnvironment, TLSHandshakeTimeout, IdleConnTimeout, connection
pooling, and HTTP/2 support (fixes transport regression).
- Add IPv6-mapped IPv4 normalization in action.yml Python SSRF checks to
prevent bypass via ::ffff:10.0.0.1 style AAAA records.
- Reject URLs with user-info (user:pass@host) in action.yml Python checks
to match validate-url subcommand behavior.
- Add test verifying DefaultTransport settings are preserved.
MAJOR fixes:
- gitea/ipcheck.go: replace startup panic with init()+error list pattern
Hard-coded CIDRs that fail to parse now recorded in blockedCIDRParseErrors
instead of panicking. TestBlockedCIDRsValid catches programming errors
in CI without violating CONVENTIONS.md 'never panic' rule.
- .gitea/actions/review/action.yml: re-validate SERVER_URL at start of
'Install review-bot' step to close DNS rebinding window between
'Determine version' and install-step curl calls.
MINOR fixes:
- gitea/client.go: add Timeout: 10*time.Second to net.Dialer per PLAN.md spec
- cmd/review-bot/validateurl.go: switch isValidateError to errors.As so
wrapped *validateError values are also detected
- gitea/ipcheck_test.go: clarify 198.51.100.1 (RFC5737 TEST-NET-2) comment;
add TestBlockedCIDRsValid to surface CIDR parse errors as test failures
NIT fixes:
- .gitea/actions/review/action.yml: refactor Python list comprehension in
SSRF check to for-loop (avoids side-effect-only comprehension, runner compat)
- gitea/export_test.go: expand comment explaining white-box test pattern
(why package gitea not gitea_test, Go stdlib precedent)
Remove PLAN.md (implementation complete)
## Changes
### Go: IP-level SSRF protection in gitea.Client (primary defense)
- Add gitea/ipcheck.go with IsBlockedIP() covering all blocked CIDR ranges:
loopback (127.0.0.0/8, ::1), RFC1918 (10/8, 172.16/12, 192.168/16),
link-local (169.254/16, fe80::/10), ULA (fc00::/7), CGN (100.64/10),
multicast, reserved, and unspecified ranges.
- IPv6-mapped IPv4 addresses (::ffff:x.x.x.x) are normalized before checking.
- Add safeDialContext to gitea.Client: resolves DNS, rejects any IP in a
blocked CIDR, then dials the resolved IP directly to narrow the DNS rebinding
window. NewClient now uses this safe transport by default.
- Add WithUnsafeDialer() for test code using httptest.Server (127.0.0.1).
- Update NewTestClient helper in export_test.go for all gitea unit tests.
- Update SetHTTPClient(nil) to restore the safe transport (not the plain one).
### Go: validate-url subcommand (defense-in-depth for future bash callers)
- Add 'review-bot validate-url <url>' subcommand: validates https scheme,
no user-info, resolves hostname, rejects any blocked IP.
- Exit 0=safe, 1=blocked, 2=validation error/dns failure.
- Add outWriter/errWriter vars to main.go for testable output capture.
### action.yml: Python3 IP check in 'Determine version' step
- After the https scheme validation, resolve SERVER_URL hostname with
socket.getaddrinfo and reject any result where
ipaddress.ip_address(ip).is_private/is_loopback/is_link_local/etc. is true.
- python3 is required on ubuntu-* runners (noted in existing comments).
- Covers the version-check curl that sends ACTION_TOKEN to SERVER_URL.
- SERVER_URL for install-step curls is covered by the same pre-check.
### Tests
- gitea/ipcheck_test.go: 30+ cases covering all blocked families + public IPs
- gitea/client_test.go: safe transport presence, WithUnsafeDialer, SSRF blocking
- cmd/review-bot/validateurl_test.go: scheme validation, user-info, exit codes
Closes#123
- Detect VCS host type from github.api_url (present on GitHub/GHES, absent on Gitea)
- Add action-repo input: specifies repo hosting review-bot releases, separate from
the reviewed repo. Defaults to github.action_repository, then rodin/review-bot.
- Add action-repo-token input: auth for release downloads (defaults to github.token
on GitHub, reviewer-token on Gitea).
- GitHub/GHES path: use github.api_url for version resolution and REST API asset
download endpoint (required for private repos; web URLs redirect to S3 and don't
support Authorization headers reliably).
- Gitea path: use validated SERVER_URL with direct download (no -L; prevents
Authorization forwarding on potential CDN redirects).
- Security hardening:
- inputs.vcs-url is IGNORED on GitHub/GHES to prevent token exfiltration
- SERVER_URL validated for https scheme and no whitespace on Gitea path
- action-repo validated against owner/repo pattern (prevent path traversal)
- VERSION validated for no slashes/whitespace
- TOKEN validated for no control characters (header injection defense)
- ACTION_TOKEN passed via ::add-mask:: + GITHUB_ENV (not step output, which
can leak in debug logs)
- set -euo pipefail in both script steps
- Multi-arch support: OS/arch detection via uname (linux/darwin, amd64/arm64)
for cache key and binary name — incorporates changes from #124
- Run review step: passes VCS_URL from step output (server_url) instead of
direct input expression
Closes#120
- Add --vcs-url flag as primary (reads VCS_URL env var)
- Keep --gitea-url and GITEA_URL as deprecated fallbacks with warnings
- Update action.yml: rename gitea-url input to vcs-url, pass VCS_URL to binary
- Update ci.yml: use VCS_URL env var in Run review step
- Update integration tests: INTEGRATION_GITEA_URL -> INTEGRATION_VCS_URL
- Update README: --vcs-url / VCS_URL with fallback note in env var table
Backward compat: existing GITEA_URL users get a deprecation warning and
continue to work unchanged until they migrate to VCS_URL.
- sha256sum is not available on macOS; use shasum -a 256 on darwin.
Select based on steps.version.outputs.os which is already computed.
Fixes MAJOR finding from gpt-review-bot (PR #127 review).
- Anchor checksum grep with ^ to avoid matching on partial lines.
Fixes MINOR finding from gpt-review-bot (PR #127 review).
Add native SAP AI Core provider that handles OAuth token management and
deployment discovery automatically. This eliminates the need for the
external LLM proxy when running in SAP environments.
Changes:
- Add AICoreClient with OAuth token caching and deployment URL discovery
- Support both Anthropic and OpenAI models via AI Core deployments
- Update CI to use native AI Core provider
- Update action inputs to accept AI Core credentials
- Update README with AI Core configuration examples
Model names must match AI Core deployment names (e.g. anthropic--claude-4.6-sonnet, gpt-5).
MAJOR fixes:
- Remove external YAML dependency (github.com/goccy/go-yaml)
Per project convention: Go standard library only, zero dependencies.
Convert all persona files from YAML to JSON format.
- Fix TestValidateWorkspacePath error expectation
Go 1.21+ filepath.Join normalizes absolute paths differently.
MINOR fixes:
- Remove custom contains helper in persona_test.go (use strings.Contains)
- Add Unicode-safe CapitalizeFirst function for header titles
- ListBuiltinPersonas returns empty slice instead of nil on error
- Fix test comment about filepath.Join behavior
Documentation:
- Update README to reflect JSON-only persona format
- Update design doc with note about JSON decision
- Fix action.yml description for persona-file input
Add persona system for specialized review roles. Each persona defines:
- A specific review focus (security, architecture, documentation)
- Custom system prompt additions
- Personality/tone adjustments
Built-in personas: security, architect, docs
Custom personas: load from JSON via persona-file flag
Includes workspace validation to prevent path traversal attacks.
Closes#51
Security (MAJOR):
- Add filepath.EvalSymlinks after Clean for system-prompt-file
- Re-validate resolved path is still within workspace
- Prevents symlink → /etc/shadow exfiltration via malicious repo
Worst-wins:
- Check BEFORE posting (not after) — no delete+repost dance
- Identify sibling bots by <!-- review-bot: prefix in body
- Only escalates for bot reviews, not human REQUEST_CHANGES
- If sibling bot has REQUEST_CHANGES and we would APPROVE → post
REQUEST_CHANGES instead
Addresses security review finding #1 (MAJOR) and sonnet finding #1.
Sentinel-based cleanup:
- Reviews embed <!-- review-bot:NAME --> in body (hidden HTML comment)
- Cleanup matches by sentinel, not token identity
- Each reviewer-name is a logical identity (sonnet, gpt, security)
- Same token can run multiple review types without conflict
- No extra API scopes needed
System prompt file (--system-prompt-file / SYSTEM_PROMPT_FILE):
- Loads a local file with additional review instructions
- Appended to system base as "Additional Review Instructions"
- Enables specialized reviews (security, performance, etc.)
- Partially addresses #5
Security review:
- SECURITY_REVIEW.md prompt focused on vulnerabilities
- 3rd CI matrix entry using same token, different prompt
- Focus: injection, auth, secrets, input validation, crypto, races
CI changes:
- REVIEWER_NAME passed from matrix.name
- SYSTEM_PROMPT_FILE passed from matrix (empty for standard reviews)
- 3 reviewers: sonnet (general), gpt (general), security (focused)
- PostReview now returns *Review (id + user login from response)
- Delete flow: post first, then delete stale reviews by same user
- No read:user scope needed (identity from POST response)
- Removed GetAuthenticatedUser (requires scope we lack)
- ListReviews: full pagination (loops until partial page)
- envOrDefaultBool: case-insensitive, whitespace-trimmed
- action.yml: document accepted boolean values
- Tests updated for new PostReview signature
Before posting a review, the bot now:
1. Calls GET /api/v1/user to identify its own login
2. Lists all reviews on the PR
3. Deletes any existing reviews from itself
4. Posts the fresh review
This keeps PR threads clean — one review per bot at any time.
New Gitea client methods:
- GetAuthenticatedUser() — token self-identification
- ListReviews() — fetch reviews on a PR
- DeleteReview() — delete a review by ID
Flag: --update-existing / UPDATE_EXISTING (default true)
Set to false to preserve old behavior (stack reviews).
All delete failures are non-fatal (logged as warnings).
Closes#6
New flag: --llm-timeout / LLM_TIMEOUT (seconds, default 300)
New builder: llmClient.WithTimeout(duration)
Composite action: new timeout input
Keeps 5 minutes as the sensible default but allows tuning for
larger repos or slower models.
Major improvements to review quality:
1. Full file context: fetch complete content of all modified files from
the PR branch and include as reference. This eliminates false-positive
"missing import" findings since the model sees the entire file.
2. Patterns repo: new --patterns-repo / PATTERNS_REPO flag fetches
language idiom files from a separate Gitea repo (e.g. rodin/elixir-patterns)
and includes them as review criteria.
3. Multi-file patterns: --patterns-files / PATTERNS_FILES accepts
comma-separated file paths to fetch from the patterns repo.
New API methods:
- GetPullRequestFiles: list changed files in a PR
- GetFileContentRef: fetch file content from a specific branch/ref
Prompt changes:
- BuildSystemPrompt now accepts (conventions, patterns)
- BuildUserPrompt now accepts fileContext parameter
- File context displayed before diff for model reference
- Patterns presented as "review criteria" in system prompt
Composite action updated with patterns-repo and patterns-files inputs.
- Add temperature range validation (must be 0-2, fatal on invalid)
- release.yml: use python3 for robust JSON parsing instead of sed
- Composite action: add header comment confirming Gitea Actions compat
- All findings from review #385 addressed
- Composite action: cache to runner.temp instead of /usr/local/bin
(avoids permission issues on runners)
- Document that temperature=0 means server default (omitted from request)
- Note: strconv import already exists (false positive from GPT-5)
- .gitea/actions/review/action.yml: composite action with caching
Consumers just use:
uses: https://gitea.weiker.me/rodin/review-bot/.gitea/actions/review@v0.1.0
No Go toolchain needed, binary cached by version tag.
- Remove install.sh (replaced by composite action)
- CI workflow: use matrix strategy to parallelize reviews
- Self-review still builds from source (pre-release)