validateurl.go is VCS-generic but imported gitea.IsBlockedIP, creating an
unexpected generic→Gitea-specific dependency. Extract IsBlockedIP and its
CIDR list to internal/netutil/ipcheck.go (a neutral shared package).
- gitea/ipcheck.go becomes a thin forwarding wrapper (preserves API compat
for callers within the gitea package)
- gitea/ipcheck_test.go replaced with a forwarding smoke test; full coverage
moves to internal/netutil/ipcheck_test.go
- validateurl.go now imports internal/netutil directly
gitea: Add 4 tests for GetTimelineReviewCommentIDForReview (was 0% coverage):
- Success: find review in timeline by user login + body prefix match
- ReviewFetchError: 404 on review API
- EmptyBody: review with empty body returns error
- NotFoundInTimeline: body matches but user login doesn't
github: Add 3 tests for GetAllFilesInPath (was 0% coverage):
- DirectoryWithFiles: lists directory, fetches base64-encoded file content
- 404FallsBackToFile: 404 on dir path returns error when file also 404s
- DirectoryWithSubdir: recursive directory traversal
Coverage changes:
- gitea: 80.0% → 85.2%
- github: 79.9% → 86.3%
- Clone http.DefaultTransport instead of bare &http.Transport{} to preserve
ProxyFromEnvironment, TLSHandshakeTimeout, IdleConnTimeout, connection
pooling, and HTTP/2 support (fixes transport regression).
- Add IPv6-mapped IPv4 normalization in action.yml Python SSRF checks to
prevent bypass via ::ffff:10.0.0.1 style AAAA records.
- Reject URLs with user-info (user:pass@host) in action.yml Python checks
to match validate-url subcommand behavior.
- Add test verifying DefaultTransport settings are preserved.
Previously safeDialContext only dialed the first resolved IP. If the
connection failed, it returned an error without trying other IPs.
Now it iterates all validated IPs and returns the first successful
connection, or the last error if all fail. This matches the resilience
behavior of a plain net.Dialer on multi-IP hostnames.
Addresses review finding: safeDialContext only dials first resolved IP.
All IPs are still validated before any dial attempt is made.
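
A minimal sketch of the validate-then-iterate dial loop described above. The names `dialValidated`, `dialFunc`, and `tryDial` are hypothetical, and the dialer is injected so the loop can run without a network; the real safeDialContext may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// dialFunc abstracts net.Dialer.DialContext so the loop can run without a network.
type dialFunc func(ctx context.Context, network, addr string) (net.Conn, error)

// dialValidated rejects the whole lookup if any IP is blocked, then dials the
// IPs in order and returns the first successful connection or the last error.
func dialValidated(ctx context.Context, ips []net.IP, port string, blocked func(net.IP) bool, dial dialFunc) (net.Conn, error) {
	if len(ips) == 0 {
		return nil, errors.New("no addresses resolved")
	}
	// Validate every address before any dial attempt is made.
	for _, ip := range ips {
		if blocked(ip) {
			return nil, fmt.Errorf("blocked address %s", ip)
		}
	}
	var lastErr error
	for _, ip := range ips {
		conn, err := dial(ctx, "tcp", net.JoinHostPort(ip.String(), port))
		if err == nil {
			return conn, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

// tryDial drives dialValidated with a fake dialer whose first `failures`
// attempts fail. It returns the address that connected and the attempt count.
func tryDial(addrs []string, failures int) (string, int, error) {
	var ips []net.IP
	for _, a := range addrs {
		ips = append(ips, net.ParseIP(a))
	}
	attempts, connected := 0, ""
	fake := func(_ context.Context, _, addr string) (net.Conn, error) {
		attempts++
		if attempts <= failures {
			return nil, fmt.Errorf("dial %s: connection refused", addr)
		}
		client, server := net.Pipe() // in-memory connection pair
		server.Close()
		connected = addr
		return client, nil
	}
	// Simplified stand-in for the real blocked-IP check.
	blocked := func(ip net.IP) bool { return ip.IsLoopback() || ip.IsPrivate() }
	conn, err := dialValidated(context.Background(), ips, "443", blocked, fake)
	if conn != nil {
		conn.Close()
	}
	return connected, attempts, err
}

func main() {
	addr, attempts, err := tryDial([]string{"203.0.113.10", "203.0.113.11"}, 1)
	fmt.Println(addr, attempts, err)
}
```

Because validation runs over the full list first, one blocked address in the lookup result rejects the host before any connection is attempted.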
MAJOR fixes:
- gitea/ipcheck.go: replace startup panic with init()+error list pattern
Hard-coded CIDRs that fail to parse now recorded in blockedCIDRParseErrors
instead of panicking. TestBlockedCIDRsValid catches programming errors
in CI without violating CONVENTIONS.md 'never panic' rule.
- .gitea/actions/review/action.yml: re-validate SERVER_URL at start of
'Install review-bot' step to close DNS rebinding window between
'Determine version' and install-step curl calls.
MINOR fixes:
- gitea/client.go: add Timeout: 10*time.Second to net.Dialer per PLAN.md spec
- cmd/review-bot/validateurl.go: switch isValidateError to errors.As so
wrapped *validateError values are also detected
- gitea/ipcheck_test.go: clarify 198.51.100.1 (RFC5737 TEST-NET-2) comment;
add TestBlockedCIDRsValid to surface CIDR parse errors as test failures
NIT fixes:
- .gitea/actions/review/action.yml: refactor Python list comprehension in
SSRF check to for-loop (avoids side-effect-only comprehension, runner compat)
- gitea/export_test.go: expand comment explaining white-box test pattern
(why package gitea not gitea_test, Go stdlib precedent)
Remove PLAN.md (implementation complete)
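
A small illustration of why errors.As matters for isValidateError. The `validateError` fields and the sample errors are assumptions made for the sketch, not the actual definitions in validateurl.go.

```go
package main

import (
	"errors"
	"fmt"
)

// validateError is a stand-in for the real type; its fields are assumed.
type validateError struct{ msg string }

func (e *validateError) Error() string { return e.msg }

// isValidateError reports whether err is, or wraps, a *validateError.
func isValidateError(err error) bool {
	var ve *validateError
	return errors.As(err, &ve)
}

// Sample errors used to show the difference between direct and wrapped values.
var (
	direct  error = &validateError{msg: "scheme must be https"}
	wrapped       = fmt.Errorf("validate-url: %w", direct)
	other         = errors.New("dns lookup failed")
)

func main() {
	fmt.Println(isValidateError(direct), isValidateError(wrapped), isValidateError(other))
}
```

A plain type assertion would match only `direct`; errors.As also walks the `%w` chain and finds the wrapped value.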
## Changes
### Go: IP-level SSRF protection in gitea.Client (primary defense)
- Add gitea/ipcheck.go with IsBlockedIP() covering all blocked CIDR ranges:
loopback (127.0.0.0/8, ::1), RFC1918 (10/8, 172.16/12, 192.168/16),
link-local (169.254/16, fe80::/10), ULA (fc00::/7), CGN (100.64/10),
multicast, reserved, and unspecified ranges.
- IPv6-mapped IPv4 addresses (::ffff:x.x.x.x) are normalized before checking.
- Add safeDialContext to gitea.Client: resolves DNS, rejects any IP in a
blocked CIDR, then dials the resolved IP directly to narrow the DNS rebinding
window. NewClient now uses this safe transport by default.
- Add WithUnsafeDialer() for test code using httptest.Server (127.0.0.1).
- Update NewTestClient helper in export_test.go for all gitea unit tests.
- Update SetHTTPClient(nil) to restore the safe transport (not the plain one).
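
A rough sketch of a CIDR-based IsBlockedIP with IPv6-mapped IPv4 normalization. The CIDR list here is an illustrative subset rather than the project's full list, `isBlocked` is a hypothetical string helper, and failing closed on a nil IP is an assumption of this sketch. Parse failures are collected in a slice rather than causing a panic.

```go
package main

import (
	"fmt"
	"net"
)

// blockedCIDRs is an illustrative subset; the real list may differ.
var blockedCIDRs = []string{
	"127.0.0.0/8", "::1/128", // loopback
	"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", // RFC 1918
	"169.254.0.0/16", "fe80::/10", // link-local
	"fc00::/7",               // ULA
	"100.64.0.0/10",          // CGN
	"224.0.0.0/4", "ff00::/8", // multicast
	"240.0.0.0/4",            // reserved
	"0.0.0.0/8", "::/128",    // unspecified
}

var (
	blockedNets            []*net.IPNet
	blockedCIDRParseErrors []error
)

func init() {
	for _, c := range blockedCIDRs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			// Record the failure so a test can surface it; do not panic.
			blockedCIDRParseErrors = append(blockedCIDRParseErrors, err)
			continue
		}
		blockedNets = append(blockedNets, n)
	}
}

// IsBlockedIP reports whether ip falls in any blocked range.
func IsBlockedIP(ip net.IP) bool {
	if ip == nil {
		return true // assumption: fail closed on unparseable input
	}
	// To4 maps ::ffff:a.b.c.d to its 4-byte IPv4 form.
	if v4 := ip.To4(); v4 != nil {
		ip = v4
	}
	for _, n := range blockedNets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// isBlocked is a convenience wrapper for string input.
func isBlocked(s string) bool { return IsBlockedIP(net.ParseIP(s)) }

func main() {
	for _, s := range []string{"127.0.0.1", "::ffff:10.0.0.1", "100.64.1.1", "8.8.8.8"} {
		fmt.Println(s, isBlocked(s))
	}
}
```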
### Go: validate-url subcommand (defense-in-depth for future bash callers)
- Add 'review-bot validate-url <url>' subcommand: validates https scheme,
no user-info, resolves hostname, rejects any blocked IP.
- Exit 0=safe, 1=blocked, 2=validation error/DNS failure.
- Add outWriter/errWriter vars to main.go for testable output capture.
### action.yml: Python3 IP check in 'Determine version' step
- After the https scheme validation, resolve SERVER_URL hostname with
socket.getaddrinfo and reject any result where
ipaddress.ip_address(ip).is_private/is_loopback/is_link_local/etc. is true.
- python3 is required on ubuntu-* runners (noted in existing comments).
- Covers the version-check curl that sends ACTION_TOKEN to SERVER_URL.
- SERVER_URL for install-step curls is covered by the same pre-check.
### Tests
- gitea/ipcheck_test.go: 30+ cases covering all blocked families + public IPs
- gitea/client_test.go: safe transport presence, WithUnsafeDialer, SSRF blocking
- cmd/review-bot/validateurl_test.go: scheme validation, user-info, exit codes
Closes #123
Add commitID parameter to gitea.Client.PostReview so the review is
anchored to the specific commit that was evaluated. The caller
(cmd/review-bot) already computes evaluatedSHA from pr.Head.Sha;
this wires it through to the Gitea API payload.
When commitID is empty, omitempty drops it from the JSON and Gitea
defaults to the current PR head (backward-compatible).
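
To illustrate the omitempty behaviour, here is a minimal sketch. The `reviewPayload` struct and the JSON field names are assumptions for the example; the actual request type in gitea/client.go may carry more fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// reviewPayload is a hypothetical, trimmed-down request body.
type reviewPayload struct {
	Body     string `json:"body"`
	Event    string `json:"event"`
	CommitID string `json:"commit_id,omitempty"` // dropped when empty
}

// encode marshals a payload with the given commit ID.
func encode(commitID string) string {
	b, err := json.Marshal(reviewPayload{Body: "LGTM", Event: "APPROVED", CommitID: commitID})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(encode(""))
	fmt.Println(encode("abc123"))
}
```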
Closes #107
- Replace Unicode arrows (→) with ASCII (->) in error messages and
comments for log compatibility (gpt-review NITs #19626, #19628)
- Improve guard comment to clarify it exists for testability, not
runtime safety (sonnet-review NIT #19619)
- Add cross-reference comments noting intentional duplication between
gitea/client.go and github/client.go (sonnet-review #19618,
gpt-review #19625, #19627)
Pushed back on:
- internal/ package for dedup: structural overhead not warranted for
a single ~25-line function
- strings.EqualFold for scheme: Go's url.Parse normalizes schemes to
lowercase, making case-insensitive comparison unnecessary
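
A quick check of the scheme-normalization point; `scheme` is a hypothetical helper written only for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// scheme returns the parsed scheme, or "" if the URL does not parse.
func scheme(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Scheme
}

func main() {
	fmt.Println(scheme("HTTPS://git.example.com/api/v1"))
}
```

Since the scheme comes back lowercased, a plain `== "https"` comparison is enough after parsing.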
Add defaultCheckRedirect to both GitHub and Gitea clients that rejects:
- HTTPS→HTTP protocol downgrades (prevents plaintext leakage)
- Cross-host redirects entirely (prevents consuming untrusted responses)
Same-host, same-or-upgraded-scheme redirects remain allowed.
Both NewClient constructors wire the policy, and SetHTTPClient(nil)
restores it. Callers providing a non-nil client are responsible for
configuring their own safe redirect policy.
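
One possible shape for such a redirect policy, as a sketch. The 10-hop cap is an addition of this example (a custom CheckRedirect replaces net/http's built-in limit), and `redirectAllowed` is a hypothetical helper; the real defaultCheckRedirect may differ in detail.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

// defaultCheckRedirect rejects HTTPS->HTTP downgrades and cross-host redirects.
func defaultCheckRedirect(req *http.Request, via []*http.Request) error {
	if len(via) == 0 {
		return nil
	}
	if len(via) >= 10 {
		return errors.New("stopped after 10 redirects")
	}
	orig := via[0].URL
	if orig.Scheme == "https" && req.URL.Scheme != "https" {
		return fmt.Errorf("refusing redirect from https to %s", req.URL.Scheme)
	}
	if !strings.EqualFold(orig.Host, req.URL.Host) {
		return fmt.Errorf("refusing cross-host redirect to %s", req.URL.Host)
	}
	return nil
}

// redirectAllowed applies the policy to a single from->to hop.
func redirectAllowed(from, to string) bool {
	orig, err1 := http.NewRequest(http.MethodGet, from, nil)
	next, err2 := http.NewRequest(http.MethodGet, to, nil)
	if err1 != nil || err2 != nil {
		return false
	}
	return defaultCheckRedirect(next, []*http.Request{orig}) == nil
}

func main() {
	client := &http.Client{CheckRedirect: defaultCheckRedirect}
	_ = client // wiring shown for context; no request is sent here
	fmt.Println(redirectAllowed("https://git.example.com/a", "http://git.example.com/a"))
}
```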
Closes #95
- Extract doGetWithReader to share retry/backoff logic between doGet and
doGetLimited, eliminating ~60 lines of duplicated code (addresses MINOR
finding from all reviewers).
- redactURL now strips userinfo credentials (user:pass@host) in addition
to query parameters (addresses security-review-bot finding).
- GetPullRequestDiff treats MaxDiffSize == math.MaxInt64 as disabled,
preventing the silent enforcement bypass where the overflow clamp makes
the size check unreachable (addresses security-review-bot finding).
- Improved error message wording: 'response exceeds N bytes' (NIT fix).
- Add concurrency safety note to MaxDiffSize field documentation,
mirroring the existing note on RetryBackoff
- Consolidate six individual test functions into a single table-driven
test (TestGetPullRequestDiff_SizeLimits) reducing repetition
- Add //nolint:errcheck annotation to test handler w.Write calls
- Clamp maxBytes+1 to prevent integer overflow to negative when
maxBytes == math.MaxInt64 (falls back to math.MaxInt64)
- Update MaxDiffSize doc: 'any negative value' disables the limit,
matching actual behavior of 'maxSize < 0' check
Add a configurable MaxDiffSize field to Client that limits how much
data GetPullRequestDiff will read into memory. The default is 10 MB
(DefaultMaxDiffSize). When the diff exceeds the limit, ErrDiffTooLarge
is returned, allowing callers to skip position translation gracefully.
Implementation uses io.LimitReader to read maxBytes+1, detecting
overflow without buffering the entire response. Setting MaxDiffSize
to -1 disables the limit entirely.
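
The read-one-extra-byte technique can be sketched as follows. `readDiff` and `fits` are hypothetical names; this version follows the later behaviour in which any negative value or math.MaxInt64 disables the limit, which also sidesteps the maxBytes+1 overflow.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"math"
	"strings"
)

var ErrDiffTooLarge = errors.New("diff too large")

// readDiff reads at most maxBytes from r. It asks for maxBytes+1 bytes so that
// an oversized body is detected without buffering the whole response.
func readDiff(r io.Reader, maxBytes int64) ([]byte, error) {
	// Negative or MaxInt64 means "no limit"; this also avoids maxBytes+1 overflow.
	if maxBytes < 0 || maxBytes == math.MaxInt64 {
		return io.ReadAll(r)
	}
	data, err := io.ReadAll(io.LimitReader(r, maxBytes+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > maxBytes {
		return nil, fmt.Errorf("response exceeds %d bytes: %w", maxBytes, ErrDiffTooLarge)
	}
	return data, nil
}

// fits reports whether body can be read within maxBytes.
func fits(body string, maxBytes int64) bool {
	_, err := readDiff(strings.NewReader(body), maxBytes)
	return err == nil
}

func main() {
	_, err := readDiff(strings.NewReader("123456"), 5)
	fmt.Println(errors.Is(err, ErrDiffTooLarge))
}
```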
Closes #92
Add defensive check for empty Name and Path fields when unmarshaling
a single ContentEntry in the fallback path. While Gitea API won't
return empty objects for valid file paths, this guard:
- Explicitly documents the invariant we expect
- Catches potential API behavior changes early
- Costs nothing at runtime
Addresses [MINOR] from sonnet-review-bot on PR #74.
When ListContents is called with a path that points to a file (not a
directory), Gitea returns a single JSON object instead of an array.
Previously this caused json.Unmarshal to fail with:
json: cannot unmarshal object into Go value of type []gitea.ContentEntry
Now ListContents tries array unmarshal first, and falls back to single
object unmarshal, wrapping it in a slice. This allows patterns-files
config to specify individual files like 'README.md' without triggering
a parse error.
Also updates TestGetAllFilesInPath_File to reflect actual Gitea behavior
(single object response, not 404).
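
A sketch of the array-first, object-fallback decoding, including the empty Name/Path guard added later. The `ContentEntry` fields shown and the `decodeContents`/`entryCount` helpers are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ContentEntry is trimmed to the fields this sketch needs.
type ContentEntry struct {
	Name string `json:"name"`
	Path string `json:"path"`
	Type string `json:"type"`
}

// decodeContents accepts either a JSON array (directory listing) or a single
// JSON object (file path) and always returns a slice.
func decodeContents(data []byte) ([]ContentEntry, error) {
	var entries []ContentEntry
	if err := json.Unmarshal(data, &entries); err == nil {
		return entries, nil
	}
	var single ContentEntry
	if err := json.Unmarshal(data, &single); err != nil {
		return nil, fmt.Errorf("decode contents: %w", err)
	}
	// Guard against an empty object slipping through as a valid entry.
	if single.Name == "" || single.Path == "" {
		return nil, errors.New("decode contents: single entry has empty name or path")
	}
	return []ContentEntry{single}, nil
}

// entryCount returns the number of decoded entries, or -1 on error.
func entryCount(body string) int {
	entries, err := decodeContents([]byte(body))
	if err != nil {
		return -1
	}
	return len(entries)
}

func main() {
	fmt.Println(entryCount(`{"name":"README.md","path":"README.md","type":"file"}`))
}
```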
Fixes #73
Gitea API rejects "." with HTTP 500 (malformed path component).
When patterns-files is set to ".", normalize it to empty string
before making the API call.
Fixes #70
Move lastErr assignment outside the retry condition so that both
network errors and HTTP 5xx paths return lastErr consistently.
Previously, on the final retry attempt, a network error would return
the raw err variable instead of lastErr. While they held the same
value in practice, the inconsistency was confusing when reading the
code.
Now both paths:
- Network errors: assign lastErr before checking retry, return lastErr
- HTTP 5xx: assign lastErr before checking retry, return lastErr
Addresses review finding #3 (MINOR) from sonnet review on PR #69.
1. Fix non-deterministic test TestDoGet_RetriesOnTemporaryNetError:
- Replace timing-dependent listener approach with mockTransport
- mockTransport allows controlled injection of net.OpError failures
- Test now makes deterministic assertions: exactly 3 attempts (2 fail + 1 success)
- Added SetHTTPClient() method for test transport injection
2. Sanitize error content in retry warning logs:
- Added sanitizeErrorForLog() helper that omits response body content
- For APIError: logs only 'HTTP <status>' instead of full body
- For other errors: preserves error type information
- Addresses security concern about logging server error content at WARN level
- Full error with body still returned to caller for proper error handling
Both changes have corresponding test coverage.
Addresses security review finding: retry warnings were logging the full
request URL which could inadvertently leak sensitive query parameters
if future callers pass them.
Added redactURL() helper that:
- Strips query parameters from URLs before logging (replaces with [redacted])
- Returns [invalid URL] for unparseable URLs to avoid leaking any data
- Preserves the base path for debugging context
The error itself (lastErr) is kept as-is since APIError.Error() already
truncates response bodies to 200 chars, and network errors don't contain
user-controlled data.
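
A minimal sketch of a redactURL-style helper under the rules listed above. The exact output format is an assumption, and the userinfo stripping reflects the later follow-up change rather than this commit alone.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactURL removes credentials and query parameters before a URL is logged.
func redactURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "[invalid URL]" // never echo unparseable input
	}
	u.User = nil // strip user:pass@
	if u.RawQuery == "" {
		return u.String()
	}
	u.RawQuery = ""
	return u.String() + "?[redacted]"
}

func main() {
	fmt.Println(redactURL("https://user:pass@git.example.com/api/v1/user?token=secret"))
}
```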
Address review feedback on isTemporaryNetError being too broad:
1. RetryBackoff field: Added doc comment clarifying it must be
configured before the first request (addresses concurrency concern).
2. isTemporaryNetError: Now inspects the underlying syscall error
instead of treating all net.OpError as retriable. Only retries on:
- ECONNREFUSED (connection refused)
- ECONNRESET (connection reset)
- ENETUNREACH (network unreachable)
- EHOSTUNREACH (host unreachable)
- ETIMEDOUT (connection timed out)
Permanent errors like EACCES, EPERM are no longer retried.
3. DNS errors: Changed from Temporary() to IsTimeout, since
"no such host" is permanent and shouldn't be retried.
4. Empty backoff slice: Added comment explaining that retry without
delay is intentional when caller explicitly configures it.
Addresses MINOR findings from sonnet-review-bot and gpt-review-bot.
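
A sketch of how the narrowed check could look, assuming errors.As/errors.Is are used to reach the underlying errno. The sample error values are constructed only for demonstration, and the real isTemporaryNetError may be organized differently.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"syscall"
)

// retriableErrnos lists the syscall errors treated as transient.
var retriableErrnos = []syscall.Errno{
	syscall.ECONNREFUSED, syscall.ECONNRESET, syscall.ENETUNREACH,
	syscall.EHOSTUNREACH, syscall.ETIMEDOUT,
}

// isTemporaryNetError reports whether err looks worth retrying.
func isTemporaryNetError(err error) bool {
	// DNS errors: retry only on timeout; "no such host" is permanent.
	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		return dnsErr.IsTimeout
	}
	var opErr *net.OpError
	if !errors.As(err, &opErr) {
		return false
	}
	for _, errno := range retriableErrnos {
		if errors.Is(opErr.Err, errno) {
			return true
		}
	}
	return false
}

// opError wraps inner the way a failed dial typically would.
func opError(inner error) error {
	return &net.OpError{Op: "dial", Net: "tcp", Err: inner}
}

// Sample errors for demonstration.
var (
	errRefused    = opError(os.NewSyscallError("connect", syscall.ECONNREFUSED))
	errDenied     = opError(os.NewSyscallError("connect", syscall.EACCES))
	errDNSTimeout = opError(&net.DNSError{Err: "i/o timeout", Name: "git.example.com", IsTimeout: true})
	errNoSuchHost = opError(&net.DNSError{Err: "no such host", Name: "git.example.com", IsNotFound: true})
)

func main() {
	fmt.Println(isTemporaryNetError(errRefused), isTemporaryNetError(errDenied))
}
```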
Address review feedback:
1. Make backoff delays injectable via Client.RetryBackoff field
- Defaults to {1s, 2s} when nil for production
- Tests can set shorter values for fast execution
- Fixes slow unit tests that previously waited 3+ seconds
2. Add retry on temporary network errors (net.OpError, net.DNSError)
- Connection refused, network unreachable, DNS failures now retry
- Non-temporary network errors still fail immediately
- Context cancellation still respected during backoff
Added isTemporaryNetError helper and TestIsTemporaryNetError test.
Updated existing retry tests to use configurable short backoffs.
- Remove dead backoff[0] element; array now only contains retry delays
- Fix time.After timer leak by using time.NewTimer with timer.Stop()
- Add io.LimitReader (64KB) for error body reads to bound memory allocation
Addresses feedback from sonnet-review-bot, security-review-bot, and gpt-review-bot.
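
The retry loop with injectable backoff and a stoppable timer might look roughly like this. `retry` and `attemptsFor` are hypothetical names, and the sketch retries on any error, whereas the real doGet distinguishes retriable failures.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retry runs op once, then once more per backoff entry, waiting in between.
// It returns nil on the first success, or the last error seen.
func retry(ctx context.Context, backoff []time.Duration, op func() error) error {
	var lastErr error
	for attempt := 0; ; attempt++ {
		lastErr = op()
		if lastErr == nil {
			return nil
		}
		if attempt >= len(backoff) {
			return lastErr
		}
		// NewTimer + Stop releases the timer promptly on cancellation,
		// unlike a bare time.After.
		timer := time.NewTimer(backoff[attempt])
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-timer.C:
		}
	}
}

// attemptsFor counts attempts when the first `failures` calls fail.
func attemptsFor(failures int) (int, error) {
	attempts := 0
	short := []time.Duration{time.Millisecond, 2 * time.Millisecond}
	err := retry(context.Background(), short, func() error {
		attempts++
		if attempts <= failures {
			return errors.New("HTTP 503")
		}
		return nil
	})
	return attempts, err
}

func main() {
	n, err := attemptsFor(2)
	fmt.Println(n, err)
}
```

With two backoff entries the operation runs at most three times, which is why short injected delays keep the tests fast.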
Previously findOwnReview returned only the single most-recent matching
review, so on PRs with multiple force-pushes only the latest old review
got superseded. The rest accumulated as unsuperseded stale reviews.
Changes:
- Add findAllOwnReviews() to collect all non-superseded matching reviews
- Loop over all old reviews in the supersede phase
- Add GetTimelineReviewCommentIDForReview() to find comment IDs by
review ID (fetches review body, matches in timeline by prefix)
- Each old review gets independently superseded and its inline comments
resolved
The old findOwnReview is kept for backward compat (tested, may be
useful as a utility).
Closes #27
After superseding an old review, resolves all its inline comments via
POST /pulls/comments/{id}/resolve. This clears unresolved conversation
markers from the PR timeline and diff view.
New API methods:
- ListReviewComments: paginated GET /repos/.../pulls/{n}/reviews/{id}/comments
- ResolveComment: POST /repos/.../pulls/comments/{id}/resolve
Behavior:
- Only resolves after successful supersede (gated on supersedeOK)
- Aggregates failures and logs at warn level
- Truncates error bodies to 256 bytes (security)
- Non-fatal: review still posts even if resolution fails
- Accept 204 No Content as success (idempotent operations)
- Truncate error response body to 256 bytes (prevent log leakage)
- Add unit tests for GetAuthenticatedUser and RequestReviewer
Closes #35
Before posting a review, the bot:
1. Discovers its own Gitea login via GET /user
2. Calls POST /requested_reviewers to add itself
This ensures the bot appears in the required-reviewers list without
manual configuration on the repo. The call is idempotent (no-op if
already requested).
Both failures are non-fatal (warn + continue) — the review still posts
even if the self-request fails.
Closes #34
- Remove reviewUnchanged() skip logic — every push gets a fresh review
- Remove edit-in-place (PATCH same body) — always POST new
- Supersede old review: PATCH with struck-through banner + collapsed
original body in <details> for historical reference
- Add commit footer to every review: 'Evaluated against <sha>'
- Remove --update-existing flag (no longer needed)
- Add CommitID field to Review struct
- Add TestBuildSupersededBody tests
- Add --log-format flag (text/json) and --verbosity flag (debug/info/warn/error)
- Replace all log.Printf with slog.Info/Debug/Warn with structured key-value attrs
- Replace all log.Fatalf with slog.Error + os.Exit(1)
- Convert gitea/client.go warnings to slog.Warn
- Add comprehensive tests for logger initialization and level filtering
Closes #23
Partially addresses #32
- URL-encode filename in release upload query param (MINOR)
- Truncate APIError.Body to 200 chars in Error() to avoid leaking
verbose server responses into logs (NIT)
- Add APIError type with StatusCode field so callers can inspect HTTP
status codes from Gitea API responses
- Add IsNotFound helper for ergonomic 404 checks
- GetAllFilesInPath now only falls back to single-file fetch on 404;
all other errors (auth failures, server errors, rate limits) propagate
- Release workflow asset uploads are now idempotent: existing assets
with the same name are deleted before re-upload on workflow re-runs
Closes #8
Closes #10
Apply url.PathEscape to owner, repo, and sha path segments in all
methods that were previously interpolating raw values. Methods already
using PathEscape (ListReviews, DeleteReview, GetTimelineReviewCommentID,
EditComment) are unchanged.
This eliminates an inconsistency flagged in PRs #17, #20, and #22 and
prevents potential path-injection bugs for names with special characters.
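
For illustration, escaping each path segment could look like this. `commitPath` and the route shape are hypothetical and not taken from the client code.

```go
package main

import (
	"fmt"
	"net/url"
)

// commitPath builds an API path with every dynamic segment escaped, so that
// characters such as '/', '#', or spaces cannot alter the route.
func commitPath(owner, repo, sha string) string {
	return fmt.Sprintf("/repos/%s/%s/commits/%s",
		url.PathEscape(owner), url.PathEscape(repo), url.PathEscape(sha))
}

func main() {
	fmt.Println(commitPath("my org", "repo#1", "abc/../def"))
}
```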
Closes #24
1. First-run escalation regression (MAJOR): Add post-posting escalation
fallback. After posting APPROVED on first run, check if a sibling
from the same user has REQUEST_CHANGES — if so, mark ours as
superseded and re-post as REQUEST_CHANGES.
2. json.Marshal error handling (MINOR): Return error from EditComment
instead of ignoring it with blank identifier.
3. Redundant condition (NIT): Remove dead assignment in reviewUnchanged
where existingEvent was assigned from r.State then compared to itself.
Replace the delete-and-repost strategy with edit-in-place:
1. No existing review → POST new (first run)
2. Same state, same body → skip entirely (threads preserved)
3. Same state, body changed → PATCH body in place via timeline API
4. State change needed → PATCH old body to "Superseded", POST new
This preserves conversation threads on inline comments. Replies to
findings are never lost. The only time a new review is posted is on
first run or when the state transitions (APPROVED ↔ REQUEST_CHANGES).
New Gitea client methods:
- EditComment: PATCH /repos/{owner}/{repo}/issues/comments/{id}
- GetTimelineReviewCommentID: finds the comment ID for a review body
by scanning the issue timeline for the sentinel
Also simplifies shouldEscalate: removes the login parameter requirement
for pre-posting scenarios (uses findOwnReview to get login from existing
review instead).
Tests: findOwnReview (4 cases), EditComment (2 cases),
GetTimelineReviewCommentID (2 cases), shouldEscalate (8 cases updated).
- Hunk headers without comma ("@@ -1 +1 @@") now parse correctly by
splitting on comma OR space instead of comma only
- Explicit skip for "\ No newline at end of file" lines (was already
safe but now documents intent)
- Tests added for both edge cases (TDD: tests written first, confirmed
failure, then fixed)
Addresses sonnet findings #1 and #2 from PR #26 review.
Findings that reference a file+line within the diff are now posted as
inline comments directly on that line, in addition to appearing in the
summary table. Findings outside the diff range stay in the body only.
Implementation:
- gitea/diff.go: ParseDiffNewLines extracts new-file line numbers from
each hunk in the unified diff
- gitea/client.go: PostReview accepts optional []ReviewComment with
path + new_position + body (omitempty when nil)
- cmd/review-bot/main.go: maps findings → inline comments when the line
exists in the diff, passes them to PostReview
Tests:
- diff parser: multi-hunk, new files, empty diff, boundary lines
- PostReview: with comments, nil comments (omitted from payload)
Sentinel-based cleanup:
- Reviews embed <!-- review-bot:NAME --> in body (hidden HTML comment)
- Cleanup matches by sentinel, not token identity
- Each reviewer-name is a logical identity (sonnet, gpt, security)
- Same token can run multiple review types without conflict
- No extra API scopes needed
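
A small sketch of sentinel matching. The helper names are hypothetical; only the `<!-- review-bot:NAME -->` format comes from the notes above.

```go
package main

import (
	"fmt"
	"strings"
)

// sentinel returns the hidden HTML comment that marks a review's logical owner.
func sentinel(name string) string {
	return "<!-- review-bot:" + name + " -->"
}

// withSentinel appends the marker to a review body.
func withSentinel(body, name string) string {
	return body + "\n\n" + sentinel(name)
}

// ownedBy reports whether a review body carries the marker for name.
func ownedBy(body, name string) bool {
	return strings.Contains(body, sentinel(name))
}

func main() {
	body := withSentinel("No issues found.", "security")
	fmt.Println(ownedBy(body, "security"), ownedBy(body, "sonnet"))
}
```

Matching on the full marker, including the closing ` -->`, keeps a reviewer name that is a prefix of another from matching the wrong review.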
System prompt file (--system-prompt-file / SYSTEM_PROMPT_FILE):
- Loads a local file with additional review instructions
- Appended to system base as "Additional Review Instructions"
- Enables specialized reviews (security, performance, etc.)
- Partially addresses #5
Security review:
- SECURITY_REVIEW.md prompt focused on vulnerabilities
- 3rd CI matrix entry using same token, different prompt
- Focus: injection, auth, secrets, input validation, crypto, races
CI changes:
- REVIEWER_NAME passed from matrix.name
- SYSTEM_PROMPT_FILE passed from matrix (empty for standard reviews)
- 3 reviewers: sonnet (general), gpt (general), security (focused)
- PostReview now returns *Review (id + user login from response)
- Delete flow: post first, then delete stale reviews by same user
- No read:user scope needed (identity from POST response)
- Removed GetAuthenticatedUser (requires scope we lack)
- ListReviews: full pagination (loops until partial page)
- envOrDefaultBool: case-insensitive, whitespace-trimmed
- action.yml: document accepted boolean values
- Tests updated for new PostReview signature
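
The "loop until a partial page" pagination can be sketched as below. `listAll`, the fetch callback, and the fake backend are assumptions for illustration; the real ListReviews issues HTTP requests instead.

```go
package main

import "fmt"

// listAll requests pages starting at 1 and stops at the first page that
// returns fewer than pageSize items.
func listAll(pageSize int, fetch func(page, limit int) ([]int64, error)) ([]int64, error) {
	if pageSize <= 0 {
		return nil, fmt.Errorf("invalid page size %d", pageSize)
	}
	var all []int64
	for page := 1; ; page++ {
		items, err := fetch(page, pageSize)
		if err != nil {
			return nil, err
		}
		all = append(all, items...)
		if len(items) < pageSize {
			return all, nil
		}
	}
}

// countPaged runs listAll against a fake backend holding `total` review IDs
// and returns how many items and how many requests were seen.
func countPaged(total, pageSize int) (items, requests int) {
	fetch := func(page, limit int) ([]int64, error) {
		requests++
		start := (page - 1) * limit
		if start > total {
			start = total
		}
		end := start + limit
		if end > total {
			end = total
		}
		ids := make([]int64, 0, end-start)
		for i := start; i < end; i++ {
			ids = append(ids, int64(i+1))
		}
		return ids, nil
	}
	all, err := listAll(pageSize, fetch)
	if err != nil {
		return -1, requests
	}
	return len(all), requests
}

func main() {
	fmt.Println(countPaged(5, 2))
}
```

When the total is an exact multiple of the page size, this approach costs one extra request that returns an empty page.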
Before posting a review, the bot now:
1. Calls GET /api/v1/user to identify its own login
2. Lists all reviews on the PR
3. Deletes any existing reviews from itself
4. Posts the fresh review
This keeps PR threads clean — one review per bot at any time.
New Gitea client methods:
- GetAuthenticatedUser() — token self-identification
- ListReviews() — fetch reviews on a PR
- DeleteReview() — delete a review by ID
Flag: --update-existing / UPDATE_EXISTING (default true)
Set to false to preserve old behavior (stack reviews).
All delete failures are non-fatal (logged as warnings).
Closes #6
- Add --version flag and log version on startup (closes #9)
- URL-escape ref query parameter in GetFileContentRef (closes #7)
- Add go vet to release workflow (closes #13)
Renamed local url variable to reqURL to avoid shadowing net/url package.
- Fix doc comments: WithTimeout and WithTemperature each get their own doc comment
- Add TestWithTimeout (verifies short timeout causes request failure)
- Log warning on directory recursion failure in GetAllFilesInPath
- Note: unexporting fields is a breaking change; will document in release notes