- Clone http.DefaultTransport instead of bare &http.Transport{} to preserve
ProxyFromEnvironment, TLSHandshakeTimeout, IdleConnTimeout, connection
pooling, and HTTP/2 support (fixes transport regression).
- Add IPv6-mapped IPv4 normalization in action.yml Python SSRF checks to
prevent bypass via ::ffff:10.0.0.1 style AAAA records.
- Reject URLs with user-info (user:pass@host) in action.yml Python checks
to match validate-url subcommand behavior.
- Add test verifying DefaultTransport settings are preserved.
Previously safeDialContext only dialed the first resolved IP. If the
connection failed, it returned an error without trying other IPs.
Now it iterates all validated IPs and returns the first successful
connection, or the last error if all fail. This matches the resilience
behavior of a plain net.Dialer on multi-IP hostnames.
Addresses review finding: safeDialContext only dials first resolved IP.
All IPs are still validated before any dial attempt is made.
MAJOR fixes:
- gitea/ipcheck.go: replace startup panic with init()+error list pattern
Hard-coded CIDRs that fail to parse now recorded in blockedCIDRParseErrors
instead of panicking. TestBlockedCIDRsValid catches programming errors
in CI without violating CONVENTIONS.md 'never panic' rule.
- .gitea/actions/review/action.yml: re-validate SERVER_URL at start of
'Install review-bot' step to close DNS rebinding window between
'Determine version' and install-step curl calls.
MINOR fixes:
- gitea/client.go: add Timeout: 10*time.Second to net.Dialer per PLAN.md spec
- cmd/review-bot/validateurl.go: switch isValidateError to errors.As so
wrapped *validateError values are also detected
- gitea/ipcheck_test.go: clarify 198.51.100.1 (RFC5737 TEST-NET-2) comment;
add TestBlockedCIDRsValid to surface CIDR parse errors as test failures
NIT fixes:
- .gitea/actions/review/action.yml: refactor Python list comprehension in
SSRF check to for-loop (avoids side-effect-only comprehension, runner compat)
- gitea/export_test.go: expand comment explaining white-box test pattern
(why package gitea not gitea_test, Go stdlib precedent)
Remove PLAN.md (implementation complete)
## Changes
### Go: IP-level SSRF protection in gitea.Client (primary defense)
- Add gitea/ipcheck.go with IsBlockedIP() covering all blocked CIDR ranges:
loopback (127.0.0.0/8, ::1), RFC1918 (10/8, 172.16/12, 192.168/16),
link-local (169.254/16, fe80::/10), ULA (fc00::/7), CGN (100.64/10),
multicast, reserved, and unspecified ranges.
- IPv6-mapped IPv4 addresses (::ffff:x.x.x.x) are normalized before checking.
- Add safeDialContext to gitea.Client: resolves DNS, rejects any IP in a
blocked CIDR, then dials the resolved IP directly to narrow the DNS rebinding
window. NewClient now uses this safe transport by default.
- Add WithUnsafeDialer() for test code using httptest.Server (127.0.0.1).
- Update NewTestClient helper in export_test.go for all gitea unit tests.
- Update SetHTTPClient(nil) to restore the safe transport (not the plain one).
### Go: validate-url subcommand (defense-in-depth for future bash callers)
- Add 'review-bot validate-url <url>' subcommand: validates https scheme,
no user-info, resolves hostname, rejects any blocked IP.
- Exit 0=safe, 1=blocked, 2=validation error/dns failure.
- Add outWriter/errWriter vars to main.go for testable output capture.
### action.yml: Python3 IP check in 'Determine version' step
- After the https scheme validation, resolve SERVER_URL hostname with
socket.getaddrinfo and reject any result where
ipaddress.ip_address(ip).is_private/is_loopback/is_link_local/etc. is true.
- python3 is required on ubuntu-* runners (noted in existing comments).
- Covers the version-check curl that sends ACTION_TOKEN to SERVER_URL.
- SERVER_URL for install-step curls is covered by the same pre-check.
### Tests
- gitea/ipcheck_test.go: 30+ cases covering all blocked families + public IPs
- gitea/client_test.go: safe transport presence, WithUnsafeDialer, SSRF blocking
- cmd/review-bot/validateurl_test.go: scheme validation, user-info, exit codes
Closes#123
This PR addresses issue #120 by adding GitHub/GHES support to the composite
action. Key improvements:
- Detects VCS host type via github.api_url presence (GitHub/GHES vs Gitea)
- Uses correct API paths: /api/v3 for GitHub, /api/v1 for Gitea
- Prevents token exfiltration by ignoring inputs.vcs-url on GitHub runners
- Validates inputs (action-repo format, URL scheme, tokens)
- Adds multi-arch support (uname-derived binary/OS detection)
- Reuses computed values across steps via GITHUB_OUTPUT
- Properly handles private release assets on GitHub via REST API
- Backward compatible with existing Gitea workflows
Reviewed and approved by sonnet-review, gpt-review, and security-review.
- Detect VCS host type from github.api_url (present on GitHub/GHES, absent on Gitea)
- Add action-repo input: specifies repo hosting review-bot releases, separate from
the reviewed repo. Defaults to github.action_repository, then rodin/review-bot.
- Add action-repo-token input: auth for release downloads (defaults to github.token
on GitHub, reviewer-token on Gitea).
- GitHub/GHES path: use github.api_url for version resolution and REST API asset
download endpoint (required for private repos; web URLs redirect to S3 and don't
support Authorization headers reliably).
- Gitea path: use validated SERVER_URL with direct download (no -L; prevents
Authorization forwarding on potential CDN redirects).
- Security hardening:
- inputs.vcs-url is IGNORED on GitHub/GHES to prevent token exfiltration
- SERVER_URL validated for https scheme and no whitespace on Gitea path
- action-repo validated against owner/repo pattern (prevent path traversal)
- VERSION validated for no slashes/whitespace
- TOKEN validated for no control characters (header injection defense)
- ACTION_TOKEN passed via ::add-mask:: + GITHUB_ENV (not step output, which
can leak in debug logs)
- set -euo pipefail in both script steps
- Multi-arch support: OS/arch detection via uname (linux/darwin, amd64/arm64)
for cache key and binary name — incorporates changes from #124
- Run review step: passes VCS_URL from step output (server_url) instead of
direct input expression
Closes#120
- Add --vcs-url flag as primary (reads VCS_URL env var)
- Keep --gitea-url and GITEA_URL as deprecated fallbacks with warnings
- Update action.yml: rename gitea-url input to vcs-url, pass VCS_URL to binary
- Update ci.yml: use VCS_URL env var in Run review step
- Update integration tests: INTEGRATION_GITEA_URL -> INTEGRATION_VCS_URL
- Update README: --vcs-url / VCS_URL with fallback note in env var table
Backward compat: existing GITEA_URL users get a deprecation warning and
continue to work unchanged until they migrate to VCS_URL.
- sha256sum is not available on macOS; use shasum -a 256 on darwin.
Select based on steps.version.outputs.os which is already computed.
Fixes MAJOR finding from gpt-review-bot (PR #127 review).
- Anchor checksum grep with ^ to avoid matching on partial lines.
Fixes MINOR finding from gpt-review-bot (PR #127 review).
- Fix AllowInsecureHTTP doc comment: say '_test.go file in the same
package' instead of 'export_test.go' (MINOR #1)
- Remove dead u.Fragment = "" from redactURL: HTTP requests never
carry fragments over the wire per RFC 7230 §5.1 (MINOR #2)
- Use 127.0.0.1:1 in scheme-rejection tests to make intent clearer
that no network call should occur (NIT #3)
- Restore timer.Stop() no-op in case <-timer.C for symmetry with ctx.Done
- Add missing blank line between TestNoInsecureOption_RejectsHTTP and
TestNoInsecureOption_RejectsUppercaseHTTP
- Remove double blank line before TestAllowInsecureHTTP_WithoutEnvVar_Rejected
Resolves review comments from sonnet-review-bot on PR #113.
Address review feedback on PR #113:
- MAJOR (both reviews): Replace strings.HasPrefix(reqURL, "http://")
with url.Parse + strings.EqualFold for case-insensitive scheme
comparison per RFC 3986. Prevents bypass via HTTP:// or Http://.
- MINOR (security): Enhance redactURL to strip userinfo component
(user:pass@host) in addition to query params, preventing credential
leakage in error messages and logs.
- NIT (gpt): Remove redundant timer.Stop() after timer.C fires —
it's a no-op and the comment was misleading.
- Add tests for uppercase/mixed-case HTTP scheme rejection and
userinfo redaction.
Three-layer defense for the AllowInsecureHTTP client option:
1. Environment gate: AllowInsecureHTTP() requires REVIEW_BOT_ALLOW_INSECURE=1
env var. Without it, the option is silently ignored with a slog.Warn.
2. Warning log on activation: When the env gate IS satisfied, a slog.Warn
fires at client construction time so operators notice in production logs.
3. Test bypass: AllowInsecureHTTPForTest() skips the env gate entirely,
keeping test code clean (no t.Setenv needed in every test).
Additionally, doRequest now rejects HTTP URLs unless allowInsecureHTTP is set
on the client, providing defense-in-depth against credential leakage.
Closes#96
Add commitID parameter to gitea.Client.PostReview so the review is
anchored to the specific commit that was evaluated. The caller
(cmd/review-bot) already computes evaluatedSHA from pr.Head.Sha;
this wires it through to the Gitea API payload.
When commitID is empty, omitempty drops it from the JSON and Gitea
defaults to the current PR head (backward-compatible).
Closes#107
- Replace Unicode arrows (→) with ASCII (->) in error messages and
comments for log compatibility (gpt-review NITs #19626, #19628)
- Improve guard comment to clarify it exists for testability, not
runtime safety (sonnet-review NIT #19619)
- Add cross-reference comments noting intentional duplication between
gitea/client.go and github/client.go (sonnet-review #19618,
gpt-review #19625, #19627)
Pushed back on:
- internal/ package for dedup: structural overhead not warranted for
a single ~25-line function
- strings.EqualFold for scheme: Go's url.Parse normalizes schemes to
lowercase, making case-insensitive comparison unnecessary
Add defaultCheckRedirect to both GitHub and Gitea clients that rejects:
- HTTPS→HTTP protocol downgrades (prevents plaintext leakage)
- Cross-host redirects entirely (prevents consuming untrusted responses)
Same-host, same-or-upgraded-scheme redirects remain allowed.
Both NewClient constructors wire the policy, and SetHTTPClient(nil)
restores it. Callers providing a non-nil client are responsible for
configuring their own safe redirect policy.
Closes#95
- Extract doGetWithReader to share retry/backoff logic between doGet and
doGetLimited, eliminating ~60 lines of duplicated code (addresses MINOR
finding from all reviewers).
- redactURL now strips userinfo credentials (user:pass@host) in addition
to query parameters (addresses security-review-bot finding).
- GetPullRequestDiff treats MaxDiffSize == math.MaxInt64 as disabled,
preventing the silent enforcement bypass where the overflow clamp makes
the size check unreachable (addresses security-review-bot finding).
- Improved error message wording: 'response exceeds N bytes' (NIT fix).
- Add concurrency safety note to MaxDiffSize field documentation,
mirroring the existing note on RetryBackoff
- Consolidate six individual test functions into a single table-driven
test (TestGetPullRequestDiff_SizeLimits) reducing repetition
- Add //nolint:errcheck annotation to test handler w.Write calls
- Clamp maxBytes+1 to prevent integer overflow to negative when
maxBytes == math.MaxInt64 (falls back to math.MaxInt64)
- Update MaxDiffSize doc: 'any negative value' disables the limit,
matching actual behavior of 'maxSize < 0' check
Add a configurable MaxDiffSize field to Client that limits how much
data GetPullRequestDiff will read into memory. The default is 10 MB
(DefaultMaxDiffSize). When the diff exceeds the limit, ErrDiffTooLarge
is returned, allowing callers to skip position translation gracefully.
Implementation uses io.LimitReader to read maxBytes+1, detecting
overflow without buffering the entire response. Setting MaxDiffSize
to -1 disables the limit entirely.
Closes#92
- Add timer.Stop() to the timer.C branch to prevent timer leak on the
normal path (previously only called in ctx.Done branch)
- Rename struct field 'http' to 'httpClient' to avoid shadowing the
net/http import
Addresses self-review findings on PR #110.
- Add TODO comment on unused baseURL field explaining its intended
future use by higher-level exported methods
- Rewrite TestDoRequest_ContextCanceled to actually exercise the
timer-cancel path in the retry select (goroutine cancels context
while client is blocked in timer.C select)
- Change Retry-After: 1 to Retry-After: 0 in IntegerSeconds test
to avoid unnecessary 1s sleep during test runs
- Add doc note on SetRetryBackoff explaining that an empty non-nil
slice silently drops Retry-After delays
- Fix data race: copy retryBackoff slice per-request to prevent
concurrent doRequest calls from racing on shared state
- Fix parseRetryAfter: trim whitespace before parsing for robustness
- Fix parseRetryAfter: treat delta-seconds of 0 as valid per RFC 7231
- Add bounded read on success path (10 MB limit) for defense-in-depth
- Clean up TestDoRequest_429_RetryAfter_IntegerSeconds: remove dead
code block and commented-out reasoning
- Fix import ordering: context before errors (goimports compliance)
Implement the github package client with Retry-After header parsing that
supports both integer seconds (e.g. "Retry-After: 120") and HTTP-date
format (e.g. "Retry-After: Thu, 01 Dec 2025 16:00:00 GMT") per RFC 7231
§7.1.3.
Key design decisions:
- Use http.ParseTime which handles RFC 1123, RFC 850, and ASCTIME formats
- Cap maximum retry delay at 60s (maxRetryAfter) to prevent stalling
- If HTTP-date is in the past, use delay of 0 (retry immediately)
- Inject time.Now via c.now field for deterministic testing
Closes#94
Update the approved dependency table to document go-yaml subpackage
usage (ast, parser) and remove the deviation comment now that the
proper allowlist process is being followed.
Closes#91
- Clarify depth-aware short-circuit comment to unambiguously describe
the relationship between current depth and previous validation depth
- Add comment to MappingValueNode case explaining intentional depth+2
behavior from parent MappingNode perspective
- Restructure unmarshalYAMLWithDepthLimit doc comment as bullet list
covering all three safety checks (depth, multi-doc, strict fields)
- Replace t.Error with t.Fatal in TestYAMLEmptyFileRejection to remove
redundant nil guard on subsequent err.Error() call
- Add explicit case for *ast.MergeKeyNode in checkYAMLDepth switch to
make it clear this is an intentional leaf (no children to recurse)
rather than relying on the default case. Prevents future library
changes from silently bypassing depth checks.
- Add MaxPersonaFileSize bound check at the top of ParsePersonaBytes.
While callers already check size, the public API should defend itself
(defense in depth) against arbitrarily large inputs that could cause
excessive memory/CPU before AST validation runs.
- Add tests for both behaviors.
Addresses review #2879 findings.
- Add safety note on Strict() decoder not expanding aliases recursively,
since alias resolution uses the pre-validated AST (finding #1)
- Document that ast.Node map keys rely on pointer identity, which holds
because all goccy/go-yaml AST types are pointer receivers (finding #2)
- Clarify AnchorNode comment: effective depth budget is reduced for
anchor+alias pairs, not literally halved (finding #3)
- Improve test depth trace comment for accuracy (finding #4)
- Add HTML comment in CONVENTIONS.md referencing #91 for the two-step
process deviation (finding #5)
- Fix validated map comment: says 'minimum depth' but stores the maximum
depth at which a node was validated (overwritten on deeper visits).
- Replace dec.More() with explicit dec.Decode check for trailing JSON
content. More() is documented for use inside arrays/objects; the
explicit EOF check is clearer at the top-level stream.
MINOR fixes:
- docs/DESIGN-57-yaml-persona.md: fix Error Cases table entry to reflect
custom AST walk (checkYAMLDepth) instead of stale library-level reference
- review/persona.go: add EOF check after JSON decode to reject trailing
garbage after a valid JSON object (prevents silent acceptance of malformed
input like '{"name":"x"}garbage')
- review/persona_test.go: add TestJSONTrailingContentRejected test
NIT fixes:
- review/persona.go: add default case to checkYAMLDepth switch with
explanatory comment about scalar leaf nodes
- review/persona.go: document AnchorNode depth+1 conservative asymmetry
- review/persona.go: simplify redundant if-guard in ListBuiltinPersonas
- Move nodeCount increment after cycle detection to avoid over-counting
cyclic references (sonnet #2)
- Use underscores in test case names used as filenames (sonnet #3)
- Fix function comment: 'prevent silent data loss' → 'prevent confusing
behavior where additional documents are silently ignored' (sonnet #4)
- Mark design doc pseudocode as historical since implementation uses
goccy/go-yaml ast.Node, not gopkg.in/yaml.v3 yaml.Node (sonnet #5)
The global 'seen' set allowed anchored subtrees validated at a shallow
depth to be skipped when later referenced via alias at a greater depth.
This could let effective nesting exceed MaxYAMLDepth, enabling DoS.
Fix: replace the single 'seen' set with two tracking maps:
- validated (node -> min depth): only short-circuits when current depth
<= previously validated depth; re-checks at deeper contexts.
- visiting (node -> bool): per-path recursion stack for true cycle
detection (breaks alias loops without suppressing depth checks).
Add TestYAMLAliasDepthBypass that constructs a document with an
anchored 15-level subtree referenced via alias under 6 levels of
nesting, verifying the combined effective depth (22) is rejected.
Addresses security-review-bot findings on review #2774.