feat(gitea): add retry logic for 5xx errors #69
Reference in New Issue
Block a user
Delete Branch "issue-68"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Adds retry logic with exponential backoff to
doGet()for handling transient HTTP 500 errors from the Gitea API.Problem
Pattern fetching via Gitea contents API intermittently returns HTTP 500, causing reviewers to run without pattern context.
Changes
doGet(), max 3 attemptsIsServerError()helper to detect 5xx responsesTesting
TestIsServerError: validates 5xx detection across edge casesTestDoGet_RetriesOn500: verifies recovery after transient errorsTestDoGet_FailsAfterMaxRetries: verifies proper failure after exhaustionTestDoGet_NoRetryOn4xx: ensures client errors do not retryTestDoGet_RespectsContextCancellation: validates cancellation during backoffAll existing tests pass.
Closes #68
Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
7279cdd2)Sonnet Review
Summary
The PR correctly adds retry logic with exponential backoff to
doGet()for 5xx errors, with proper context cancellation support, a well-testedIsServerErrorhelper, and clean error propagation for non-retryable status codes. The implementation is idiomatic Go and all CI checks pass.Findings
gitea/client.gobackoffslice is defined inline but only the delay values at indices 1 and 2 are ever used (index 0 is always 0 and theattempt > 0guard skips it). This is fine functionally but slightly misleading —backoff[0]is dead. A minor cleanliness issue, not a bug.gitea/client.goc.http.Do(req)(e.g., connection refused, DNS failure) are returned immediately without retry. Transient network errors are often just as worth retrying as 5xx responses. This is a design choice, but worth noting — the PR description only mentions 5xx so this is intentional, but it limits the usefulness of the retry in some failure modes.gitea/client_test.goTestDoGet_RetriesOn500incurs real 1-second backoff delays (attempt 1→2 waits 1s, attempt 2→3 waits 2s), making the test take ~3 seconds minimum. Consider making the backoff durations configurable on the client (or injectable for tests) to avoid slow tests. The context cancellation test already uses a 50ms sleep which is similarly timing-sensitive.Recommendation
APPROVE — Approve. The implementation is correct, well-tested, and follows the established patterns in the codebase. The two MINOR findings are design trade-offs (not bugs): dead
backoff[0]entry is cosmetic, and skipping retry on network errors is consistent with the stated goal of handling 5xx responses. The NIT about test speed is worth considering for future iteration but doesn't block this PR.Review by sonnet
Evaluated against
7279cdd2@@ -210,24 +216,56 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,return &review, nil}// doGet performs an HTTP GET request with retry on 5xx errors.[MINOR] The
backoffslice is defined inline but only the delay values at indices 1 and 2 are ever used (index 0 is always 0 and theattempt > 0guard skips it). This is fine functionally but slightly misleading —backoff[0]is dead. A minor cleanliness issue, not a bug.@@ -229,0 +239,4 @@req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)if err != nil {return nil, err}[MINOR] Network/transport errors from
c.http.Do(req)(e.g., connection refused, DNS failure) are returned immediately without retry. Transient network errors are often just as worth retrying as 5xx responses. This is a design choice, but worth noting — the PR description only mentions 5xx so this is intentional, but it limits the usefulness of the retry in some failure modes.@@ -746,0 +864,4 @@ctx, cancel := context.WithCancel(context.Background())// Cancel immediately after first attempt would trigger retrygo func() {time.Sleep(50 * time.Millisecond)[NIT]
TestDoGet_RetriesOn500incurs real 1-second backoff delays (attempt 1→2 waits 1s, attempt 2→3 waits 2s), making the test take ~3 seconds minimum. Consider making the backoff durations configurable on the client (or injectable for tests) to avoid slow tests. The context cancellation test already uses a 50ms sleep which is similarly timing-sensitive.Self-review: PASS
No issues found — ready for human review.
Gates: All pass (
make precommitsucceeds)Coherence: Single-purpose change — retry logic for 5xx errors. Every line serves that goal.
Fit:
IsServerErrormirrorsIsNotFoundexactlyllm/client.goconventionsSolution:
doGetis the minimal change covering all API callsCompleteness:
go vetcleanOriginal reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
7279cdd2)Security Review
Summary
The change adds bounded retry with backoff for GET requests, respecting context cancellation and avoiding retries on 4xx. No new security vulnerabilities are introduced, and CI passed. One defense-in-depth improvement is suggested regarding limiting error-body reads.
Findings
gitea/client.goRecommendation
APPROVE — Overall, the retry logic is well-implemented: it retries only on 5xx, uses exponential backoff, respects context cancellation, and avoids leaking secrets in logs. To harden against potential resource exhaustion, cap the number of bytes read from error responses in doGet (e.g., read up to a few KB) before constructing APIError, similar to patterns used elsewhere (e.g., using io.LimitReader). With this minor improvement, the robustness against unexpected server behavior increases without changing functionality.
Review by security
Evaluated against
7279cdd2@@ -225,0 +233,4 @@case <-time.After(backoff[attempt]):case <-ctx.Done():return nil, ctx.Err()}[MINOR] On non-2xx responses, the code reads the entire response body with io.ReadAll and stores it in APIError. This can allow a malicious or misbehaving server to force large allocations. Consider limiting the size for error bodies (e.g., io.LimitReader) since the body is only used for error context.
Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
7279cdd2)Gpt Review
Summary
The retry logic for 5xx responses is implemented correctly and idiomatically, with context-aware backoff and comprehensive tests. CI is passing and the changes improve resilience without altering existing behavior for 2xx/4xx cases.
Findings
gitea/client.gogitea/client.gogitea/client.goRecommendation
APPROVE — The implementation cleanly adds 5xx retries with exponential backoff, respects context cancellation, and includes thorough tests that cover success, max retries, non-retryable 4xx, and cancellation cases. Minor improvements could be made by using a cancellable timer (time.NewTimer) during backoff to avoid unnecessary timers on cancellation, and by reusing the IsServerError helper within doGet for consistency. Overall, the changes are solid and ready to merge.
Review by gpt
Evaluated against
7279cdd2@@ -211,2 +217,4 @@}// doGet performs an HTTP GET request with retry on 5xx errors.// Retries up to 3 times with exponential backoff (1s, 2s delays).[NIT] The backoff slice includes a 0-duration element that is never indexed due to the attempt>0 check. This is harmless but could be simplified by defining delays only for attempts that actually wait (e.g., []time.Duration{1 * time.Second, 2 * time.Second}).
@@ -225,0 +227,4 @@if attempt > 0 {slog.Warn("retrying request after server error","attempt", attempt+1,"url", reqURL,[MINOR] Using time.After in a select with context cancellation can leave a timer to fire after cancellation. Consider using time.NewTimer/backoffTimer := time.NewTimer(d) and deferring backoffTimer.Stop() to avoid unnecessary timers if ctx is canceled before the delay elapses.
@@ -229,0 +257,4 @@return body, nil}lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(body)}[NIT] The retry condition duplicates the 5xx check inline. For consistency and reuse, consider using the new IsServerError helper here (errors.As on APIError or check status) to keep logic centralized.
Review Feedback Addressed
Fixed (commit
23da7ee)Dead backoff[0] element (sonnet-review + gpt-review NIT)
0duration from the backoff slice{1s, 2s}time.After timer leak (gpt-review MINOR)
time.After(delay)withtime.NewTimer(delay)timer.Stop()on context cancellation pathUnbounded io.ReadAll on error body (security-review MINOR)
io.LimitReader(resp.Body, maxErrorBodyBytes)with 64KB limitAcknowledged (no action needed)
Pushed Back
IsServerError()helper operates onerrorvalues usingerrors.As()resp.StatusCode, not an errorstatusCode >= 500 && statusCode < 600checkOriginal reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
23da7eed)Sonnet Review
Summary
The retry logic is well-implemented and correct. The code is clean, idiomatic Go, handles edge cases properly (context cancellation, body size limits, 4xx pass-through), and is thoroughly tested. CI passes.
Findings
gitea/client.go}(end ofPostReview) and the// doGetcomment. The original had a blank line there; the diff accidentally removed it. Minor style issue, butgofmtdoesn't enforce blank lines between methods so it won't fail CI.gitea/client.goc.http.Do(req)are returned immediately without retry. Transient network errors (e.g., connection reset, EOF) are just as likely to be transient as 5xx responses. This is a deliberate design choice and acceptable, but worth a comment explaining why only 5xx is retried and not transport errors.gitea/client_test.goTestDoGet_RetriesOn500andTestDoGet_FailsAfterMaxRetrieswill be slow in CI because the actual 1s and 2s backoff delays fire. The tests don't inject a clock or configurable backoff. Since the convention file notesgo test ./...must pass, and these tests will take ~3s and ~1s respectively to complete, consider whether the repo has a policy on slow unit tests. This is a common trade-off with time-based tests and not a blocker.Recommendation
APPROVE — Approve. The implementation is correct and addresses the stated problem. The retry loop correctly handles: (1) no retry on 4xx, (2) immediate return on transport errors, (3) context cancellation during backoff via select, (4) capped error body reads to prevent unbounded memory use, and (5) proper body closing in all paths. The
IsServerErrorhelper follows the same pattern as the existingIsNotFound. Tests are comprehensive and cover all meaningful branches. The minor nits (missing blank line, no injectable clock for tests) are not blockers.Review by sonnet
Evaluated against
23da7eed@@ -210,3 +216,3 @@return &review, nil}// doGet performs an HTTP GET request with retry on 5xx errors.[NIT] Missing blank line between
}(end ofPostReview) and the// doGetcomment. The original had a blank line there; the diff accidentally removed it. Minor style issue, butgofmtdoesn't enforce blank lines between methods so it won't fail CI.@@ -229,0 +267,4 @@// Error path: limit how much we read from potentially malicious servererrBody, _ := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))resp.Body.Close()[NIT] Network errors from
c.http.Do(req)are returned immediately without retry. Transient network errors (e.g., connection reset, EOF) are just as likely to be transient as 5xx responses. This is a deliberate design choice and acceptable, but worth a comment explaining why only 5xx is retried and not transport errors.@@ -746,0 +837,4 @@client := NewClient(server.URL, "test-token")_, err := client.doGet(context.Background(), server.URL+"/test")if err == nil {[NIT]
TestDoGet_RetriesOn500andTestDoGet_FailsAfterMaxRetrieswill be slow in CI because the actual 1s and 2s backoff delays fire. The tests don't inject a clock or configurable backoff. Since the convention file notesgo test ./...must pass, and these tests will take ~3s and ~1s respectively to complete, consider whether the repo has a policy on slow unit tests. This is a common trade-off with time-based tests and not a blocker.Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
23da7eed)Security Review
Summary
The changes add bounded, context-aware retry logic for 5xx responses with proper error handling and body size limits, improving robustness without introducing security risks. CI passed and the implementation adheres to repository conventions.
Recommendation
APPROVE — The retry implementation is sound: it limits error body reads (mitigating potential resource exhaustion), respects context cancellation, and avoids retrying on 4xx. Logging includes only the URL and delay, and no secrets are exposed. Tests comprehensively cover new behavior. Proceed with merge.
Review by security
Evaluated against
23da7eedOriginal reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
23da7eed)Gpt Review
Summary
The changes introduce robust 5xx retry logic with exponential backoff, context-aware cancellation, and safe body handling, along with comprehensive tests. The implementation is idiomatic and aligns with repository error-handling conventions.
Findings
gitea/client.gogitea/client.gogitea/client.goRecommendation
APPROVE — Overall this is a solid improvement: 5xx retries are implemented safely with limited error-body reads, context cancellation during backoff, and thorough tests covering success, failure, and cancellation scenarios. The code is idiomatic and respects existing error-wrapping conventions. I recommend merging as-is. For a follow-up, consider making backoff parameters injectable to speed up tests and potentially retrying on transient transport errors. Also, evaluate whether WARN-level logging on each retry is the desired default or should be configurable.
Review by gpt
Evaluated against
23da7eed@@ -209,25 +215,69 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,}[MINOR] Hard-coded backoff durations (1s, 2s) make unit tests slow and can increase overall test suite runtime. Consider making backoff configurable (e.g., via a field on Client or a helper function) so tests can use shorter delays while production keeps the intended values.
[MINOR] doGet only retries on 5xx responses; transient transport-level errors from c.http.Do (e.g., temporary network failures) return immediately without retry. If desired, consider retrying on temporary network errors (e.g., timeouts) to further improve resilience.
@@ -219,1 +222,4 @@// backoff[i] is the delay before attempt i+1 (i.e., after attempt i fails).// First attempt (i=0) has no delay; retries wait 1s then 2s.backoff := []time.Duration{1 * time.Second, 2 * time.Second}[NIT] Logging at WARN level on every retry may be noisy for library consumers. Consider lowering to INFO/DEBUG or making retry logging optional/configurable to avoid cluttering logs in normal operation.
Address review feedback: 1. Make backoff delays injectable via Client.RetryBackoff field - Defaults to {1s, 2s} when nil for production - Tests can set shorter values for fast execution - Fixes slow unit tests that previously waited 3+ seconds 2. Add retry on temporary network errors (net.OpError, net.DNSError) - Connection refused, network unreachable, DNS failures now retry - Non-temporary network errors still fail immediately - Context cancellation still respected during backoff Added isTemporaryNetError helper and TestIsTemporaryNetError test. Updated existing retry tests to use configurable short backoffs.Review Feedback Addressed (round 2)
Addressed the two MINOR findings from gpt-review-bot:
1. Configurable Backoff Durations
Added
RetryBackofffield toClientstruct:{1s, 2s}when nil (production behavior unchanged)client.RetryBackoff = []time.Duration{1*time.Millisecond, 1*time.Millisecond}for fast execution2. Retry on Temporary Network Errors
Added
isTemporaryNetError()helper that detects:net.OpError(connection refused, network unreachable)net.DNSErrorwithTemporary() == truenet.ErrorwithTimeout() == truedoGetnow retries on these transient errors, improving resilience against flaky networks.All tests pass (
go test ./...), vet clean (go vet ./...).Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
090ae384)Sonnet Review
Summary
The PR adds well-structured retry logic with exponential backoff to
doGet(), handling 5xx errors and transient network failures. CI passes, the implementation is correct, and tests are comprehensive. There are a few minor issues worth noting but none that warrant blocking the merge.Findings
gitea/client.gogitea/client.goisTemporaryNetError, treating all*net.OpErroras retriable is overly broad. OpError covers connection refused (transient) but also permission denied and other permanent conditions. A more precise check would inspectopErr.Errfor syscall errors. That said, the comment acknowledges this and the practical impact on the Gitea use-case is low.gitea/client.godelay == 0(i.e., the backoff slice is shorter than expected or empty), the retry loop silently skips the warning log and delay. This is a valid edge case but could be surprising — if someone setsRetryBackoff = []time.Duration{}, all retries happen with zero delay and no logging. The code is correct but slightly implicit; a comment noting this would improve clarity.gitea/client_test.goTestDoGet_RetriesOnTemporaryNetErrorhas no meaningful assertions — the comment says 'actual retry behavior depends on timing' and the return values are discarded with_, _. This test exercises the code path but doesn't verify any postcondition, making it closer to a smoke test. It won't cause flakiness but also won't catch regressions. Consider whether it provides sufficient value or should be replaced with a more deterministic approach (e.g., using a custom http.Transport that fails N times then succeeds).gitea/client.golastErroruses camelCase. slog convention (and the existingslog.Warncall at line 323) uses lowercase snake-case or single-word keys. Recommend"last_error"or"error"for consistency with the other log call in this function which uses"error".Recommendation
APPROVE — Approve. The implementation is correct, well-tested, and follows established patterns in the codebase. The retry logic correctly handles 5xx errors, respects context cancellation, skips retries on 4xx, limits error body reads, and provides a clean testability hook via RetryBackoff. The minor findings are nits that don't affect correctness or safety. The
TestDoGet_RetriesOnTemporaryNetErrortest is weak but not harmful. This is a solid, production-ready change.Review by sonnet
Evaluated against
090ae384@@ -46,2 +53,4 @@token stringhttp *http.Client// RetryBackoff defines the delays between retry attempts.[MINOR] RetryBackoff is an exported field on Client, which exposes a testing/configuration hook as part of the public API surface. Per repository conventions and idiomatic Go (nil-opts pattern), this is usable and documented, but it makes the test-configuration concern part of the permanent API contract. A more idiomatic approach would be an unexported field with a package-internal setter, but the current approach is pragmatic and acceptable given the documented intent.
@@ -229,0 +274,4 @@if attempt > 0 {// Determine delay: use backoff slice if available, otherwise no delayvar delay time.Durationif attempt-1 < len(backoff) {[MINOR] In
isTemporaryNetError, treating all*net.OpErroras retriable is overly broad. OpError covers connection refused (transient) but also permission denied and other permanent conditions. A more precise check would inspectopErr.Errfor syscall errors. That said, the comment acknowledges this and the practical impact on the Gitea use-case is low.@@ -229,0 +295,4 @@}}req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)[MINOR] When
delay == 0(i.e., the backoff slice is shorter than expected or empty), the retry loop silently skips the warning log and delay. This is a valid edge case but could be surprising — if someone setsRetryBackoff = []time.Duration{}, all retries happen with zero delay and no logging. The code is correct but slightly implicit; a comment noting this would improve clarity.@@ -229,0 +307,4 @@// We only retry if there are attempts remaining.if attempt < maxAttempts-1 && isTemporaryNetError(err) {slog.Warn("temporary network error, will retry","attempt", attempt+1,[NIT] The log field name
lastErroruses camelCase. slog convention (and the existingslog.Warncall at line 323) uses lowercase snake-case or single-word keys. Recommend"last_error"or"error"for consistency with the other log call in this function which uses"error".@@ -746,0 +904,4 @@ln.Close()// Start a server after a short delay to succeed on retrygo func() {[NIT]
TestDoGet_RetriesOnTemporaryNetErrorhas no meaningful assertions — the comment says 'actual retry behavior depends on timing' and the return values are discarded with_, _. This test exercises the code path but doesn't verify any postcondition, making it closer to a smoke test. It won't cause flakiness but also won't catch regressions. Consider whether it provides sufficient value or should be replaced with a more deterministic approach (e.g., using a custom http.Transport that fails N times then succeeds).Original reviewSuperseded — see current review for up-to-date findings.
Previous findings (commit
090ae384)Security Review
Summary
The changes add bounded, context-aware retry logic for GET requests on 5xx and transient network errors, with sensible backoff and limited error-body reads. No new security vulnerabilities are introduced; in fact, limiting error body reads improves resilience against resource exhaustion.
Recommendation
APPROVE — Approve the PR. The retry implementation respects context cancellation, avoids unbounded error-body reads, and does not log sensitive tokens. No security issues are evident in the changes, and CI has passed.
Review by security
Evaluated against
090ae384Gpt Review
Summary
The retry logic with exponential backoff is implemented correctly and idiomatically, with good context cancellation handling and bounded error-body reads. Tests thoroughly cover the new behavior and CI passed.
Findings
gitea/client.gogitea/client.gogitea/client.gogitea/client.goRecommendation
APPROVE — Overall the change adds robust retry behavior with context-aware backoff and appropriate error handling, and the tests provide solid coverage. To tighten the design, consider making RetryBackoff unexported (or clearly documenting it must be set before concurrent use) and narrowing isTemporaryNetError so permanent network failures don’t trigger unnecessary retries. Minor logging clarity and optional use of IsServerError inside doGet would also improve readability. With CI passing and no major issues found, this is good to merge.
Review by gpt
Evaluated against
090ae384@@ -39,12 +40,23 @@ func IsNotFound(err error) bool {return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound[NIT] IsServerError helper is introduced but not used within doGet; using it where applicable could improve readability.
@@ -213,0 +231,4 @@// Check for common retriable error patterns in the error chain.// Check OpError first since it embeds net.Error, and we want to catch// connection refused, network unreachable, etc. as retriable.[MINOR] isTemporaryNetError treats any net.OpError as retriable, which may cause retries on permanent failures (e.g., no such host). Consider narrowing the check (e.g., prefer net.Error timeouts, DNSError.IsTemporary, or specific syscall errors) to avoid unnecessary retries.
@@ -225,1 +268,4 @@// maxErrorBodyBytes limits how much of an error response body we read// to protect against malicious servers sending unbounded data.const maxErrorBodyBytes = 64 * 1024 // 64 KB[NIT] Retry log 'attempt' field uses attempt+1 during backoff, which may be interpreted ambiguously. Consider logging the retry number explicitly (e.g., 'retry' count) or the next attempt number for clarity.
Addressed Review Feedback (commit
ac53ecf)Addressed all MINOR findings from the previous review:
Fixed
1.
isTemporaryNetErrortoo broad (sonnet line ~277, gpt line ~234)*net.OpErroras retriableECONNREFUSED,ECONNRESET,ENETUNREACH,EHOSTUNREACH,ETIMEDOUTEACCES,EPERMare no longer retriedTemporary()toIsTimeoutsince "no such host" is permanentisRetriableSyscallErrorhelper with comprehensive test coverageDocumented (justified as pragmatic)
2.
RetryBackoffexported (sonnet line ~56, gpt line ~74)3. Empty backoff slice behavior (sonnet line ~298)
delay == 0, we retry immediately without warning. This is reasonable behavior — the retry still happens, just without delay. Adding a warning would spam logs for edge cases someone might intentionally configure.Verification
go test ./...✅go vet ./...✅isRetriableSyscallErrorfunctionTestIsTemporaryNetErrorwith specific syscall error casesSonnet Review
Summary
The PR implements retry logic with exponential backoff for transient 5xx and network errors in
doGet(). The implementation is correct, well-tested, and follows established Go patterns. CI passes. A few minor concerns worth noting but none are blockers.Findings
gitea/client.goRetryBackoffis an exported field onClient, which violates the concurrency safety guarantee documented on the type: "A Client is safe for concurrent use by multiple goroutines." The field's own comment acknowledges "Modifying it while requests are in flight is not safe", but this creates an exception to the advertised contract. Consider making it unexported and configuring it only throughNewClientor a constructor option, or at minimum update the concurrency safety doc comment onClientto note the exception.gitea/client.godelay == 0(becauseattempt-1 >= len(backoff)), the retry proceeds silently without any log message. The WARN log is only emitted whendelay > 0. In the case where someone setsRetryBackoff = []time.Duration{}(empty, not nil), all retries happen silently with no logging at all. This is a minor observability gap — it could be confusing in production if retries happen but nothing is logged. Consider logging even for zero-delay retries, or at least whenattempt > 0.gitea/client_test.goTestDoGet_RetriesOnTemporaryNetErroris timing-dependent and makes no deterministic assertions — it explicitly says "actual retry behavior depends on timing". The test comment acknowledges this. While the test does exercise the code path, it provides weak guarantees. A more reliable approach would be to useisTemporaryNetErrorunit tests (which are present) and a separate test with a mock/stub transport that returnssyscall.ECONNREFUSEDdirectly, avoiding the real TCP listener race.gitea/client.goGetAllFilesInPath_500Propagatestest (already existing) relies on the server always returning 500, but with retry logic now in place, the test server will be hit 3 times before the error propagates. The test still passes (because the server always returns 500), but test execution time increases by ~3ms (with default 0-delay in tests usingRetryBackoff = nil). Tests forGetAllFilesInPathdon't setRetryBackoff, so they'll use the default 1s/2s delays — but sincehttptest.NewServeris local, the server succeeds immediately and returns 500, so these tests hit the server 3 times with 1s+2s delays... Actually this could make existing integration-style tests significantly slower. Worth verifying that existing tests (not the new retry tests) that hit 500 responses set shortRetryBackoff, or accept the 3s slowdown.gitea/client.goReviewstruct reformatting (removing extra spaces to align fields) is an unrelated cosmetic change included in the diff. Not a problem, but slightly noisy.Recommendation
APPROVE — Approve with the minor notes. The core logic is sound: 5xx retry with bounded attempts, context cancellation support, 4xx pass-through, and well-documented backoff configuration for tests. The
isTemporaryNetError/isRetriableSyscallErrordecomposition is clean and well-tested. The most actionable concern is that existing tests calling endpoints that return 500 (e.g.,TestGetAllFilesInPath_500Propagates,TestGetAllFilesInPath_403Propagates) will now trigger 3 real HTTP requests with 3 seconds of total backoff delay because they don't setRetryBackoff. The reviewer should confirm those tests set short backoffs or that CI still finishes in acceptable time. The exportedRetryBackofffield is a reasonable pragmatic choice for test configurability given the project's conventions, but the concurrency doc comment should be updated to note the exception.Review by sonnet
Evaluated against
ac53ecfa@@ -45,3 +53,4 @@baseURL stringtoken stringhttp *http.Client[MINOR]
RetryBackoffis an exported field onClient, which violates the concurrency safety guarantee documented on the type: "A Client is safe for concurrent use by multiple goroutines." The field's own comment acknowledges "Modifying it while requests are in flight is not safe", but this creates an exception to the advertised contract. Consider making it unexported and configuring it only throughNewClientor a constructor option, or at minimum update the concurrency safety doc comment onClientto note the exception.@@ -213,0 +280,4 @@}}// If we can't identify the specific syscall error, be conservative and retry.[NIT] The
GetAllFilesInPath_500Propagatestest (already existing) relies on the server always returning 500, but with retry logic now in place, the test server will be hit 3 times before the error propagates. The test still passes (because the server always returns 500), but test execution time increases by ~3ms (with default 0-delay in tests usingRetryBackoff = nil). Tests forGetAllFilesInPathdon't setRetryBackoff, so they'll use the default 1s/2s delays — but sincehttptest.NewServeris local, the server succeeds immediately and returns 500, so these tests hit the server 3 times with 1s+2s delays... Actually this could make existing integration-style tests significantly slower. Worth verifying that existing tests (not the new retry tests) that hit 500 responses set shortRetryBackoff, or accept the 3s slowdown.@@ -222,3 +301,1 @@return nil, err}defer resp.Body.Close()// maxErrorBodyBytes limits how much of an error response body we read[MINOR] When
delay == 0(becauseattempt-1 >= len(backoff)), the retry proceeds silently without any log message. The WARN log is only emitted whendelay > 0. In the case where someone setsRetryBackoff = []time.Duration{}(empty, not nil), all retries happen silently with no logging at all. This is a minor observability gap — it could be confusing in production if retries happen but nothing is logged. Consider logging even for zero-delay retries, or at least whenattempt > 0.@@ -317,9 +462,9 @@ func (c *Client) GetAllFilesInPath(ctx context.Context, owner, repo, path string[NIT] The
Reviewstruct reformatting (removing extra spaces to align fields) is an unrelated cosmetic change included in the diff. Not a problem, but slightly noisy.@@ -746,0 +894,4 @@func TestDoGet_RetriesOnTemporaryNetError(t *testing.T) {attempts := 0// Create a listener that we can close to simulate connection refused[MINOR]
TestDoGet_RetriesOnTemporaryNetErroris timing-dependent and makes no deterministic assertions — it explicitly says "actual retry behavior depends on timing". The test comment acknowledges this. While the test does exercise the code path, it provides weak guarantees. A more reliable approach would be to useisTemporaryNetErrorunit tests (which are present) and a separate test with a mock/stub transport that returnssyscall.ECONNREFUSEDdirectly, avoiding the real TCP listener race.Security Review
Summary
The changes add bounded retry logic with context-aware backoff and safer error-body handling. No exploitable security vulnerabilities are evident; the implementation limits error body size and avoids infinite retries.
Findings
gitea/client.gogitea/client.goRecommendation
APPROVE — Overall, the retry logic is well-scoped: capped attempts, exponential backoff, context cancellation respected, and error bodies are size-limited to mitigate resource exhaustion and log flooding risks. To harden further, avoid logging full URLs and server-provided error bodies by redacting query parameters and logging only status codes or brief summaries. Otherwise, the changes are safe to merge.
Review by security
Evaluated against
ac53ecfa@@ -210,24 +226,153 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,return &review, nil[MINOR] Retry warnings log the full request URL and the last error (which may include server-provided body text). While bodies are truncated and URLs here do not include auth, logging full URLs or server error content can inadvertently leak sensitive query parameters or details if future callers pass sensitive data in query strings. Consider redacting query parameters and limiting error detail in logs.
[NIT] isRetriableSyscallError returns true for unknown underlying errors, causing retries even on potentially permanent failures. This is bounded and not a security issue, but could slightly increase request attempts against misconfigured endpoints. Consider restricting retries to known transient error classes.
Gpt Review
Summary
The retry logic for 5xx and transient network errors in doGet is well-implemented, idiomatic, and covered by thorough tests. CI passed and changes adhere to repository conventions and Go patterns.
Findings
gitea/client.gogitea/client.goRecommendation
APPROVE — The changes introduce robust retry behavior with context-aware backoff, clear error handling via APIError, and sensible detection of transient network conditions. Tests comprehensively cover the new logic, and CI has passed. Approve the PR. For future hardening, consider adding jitter to the backoff delays and clarify whether IsServerError is intended for external callers or should remain internal.
Review by gpt
Evaluated against
ac53ecfa@@ -39,12 +41,26 @@ func IsNotFound(err error) bool {return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound[NIT] IsServerError is exported but currently only used in tests; this is fine as a helper, but consider documenting intended external usage or keeping it unexported if not part of the public API.
@@ -210,24 +226,153 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,return &review, nil[NIT] Consider adding jitter to the retry backoff to avoid synchronized retries across multiple clients when the server experiences transient issues.
Security Finding Addressed
Finding: Retry warnings log the full request URL and the last error, which could inadvertently leak sensitive query parameters if future callers pass them.
Resolution: Added
redactURL()helper function that:[redacted])[invalid URL]for unparseable URLs to avoid leaking any dataBoth retry warning log calls now use
redactURL(reqURL)instead of the raw URL.Why the error itself is kept as-is:
APIError.Error()already truncates response bodies to 200 charsisTemporaryNetErrorpath) contain system-level error messages, not user-controlled dataTests: Added
TestRedactURLwith coverage for:All tests pass. Commit:
ecfbfddSonnet Review
Summary
Well-implemented retry logic with exponential backoff, proper context cancellation support, URL redaction for safe logging, and comprehensive test coverage. The code follows established patterns and conventions throughout the codebase.
Findings
gitea/client.godelay == 0(i.e.,attempt-1 >= len(backoff)orbackoffis empty), the code skips the timer block entirely and logs nothing. This is correct behavior. However, theslog.Warnlog is only emitted whendelay > 0, meaning silent retries happen with no observability when the backoff slice is shorter thanmaxAttempts-1. Consider logging retries even when delay is zero, or document this intentional behavior in the comment.gitea/client_test.goTestDoGet_RetriesOnTemporaryNetErroris a timing-dependent test that explicitly acknowledges it may not verify actual retry behavior ('The request might succeed or fail depending on timing'). It provides almost no guaranteed assertion —attemptscould be 0 and the test still passes. This test would be more valuable if it used a controlled mechanism (e.g., an atomic counter with a channel to sequence listener startup) rather than relying ontime.Sleep. As-is, it's essentially a no-op assertion that just verifies the code path compiles.gitea/client.go// backoff[i] is the delay before attempt i+1 (i.e., after attempt i fails).appears both as an inline comment and in theRetryBackofffield doc comment. The duplication is harmless but slightly redundant.gitea/client.goRetryBackofffield is exported (capital R), which exposes an implementation detail of the retry mechanism as a public API. The doc comment explicitly warns 'Modifying it while requests are in flight is not safe', which means theClientconcurrency safety documentation ('A Client is safe for concurrent use by multiple goroutines') is now conditionally true. This is a minor design tension worth noting — the concurrency safety claim in the struct comment should probably be qualified, orRetryBackoffshould only be settable before any request is made (which the doc comment does say, but the type system can't enforce it).Recommendation
APPROVE — Approve. The implementation is correct, well-tested, and follows the patterns established in the codebase. The retry logic properly handles: 5xx retries with exponential backoff, context cancellation during backoff waits, no-retry on 4xx errors, temporary network error detection, and URL redaction for safe logging. The
TestDoGet_RetriesOnTemporaryNetErrortest is weak (timing-dependent with no guaranteed assertions), but it doesn't block the PR — it's just low-value test coverage. The exportedRetryBackofffield is an acceptable design choice for test configurability, though the concurrency safety guarantee forClientis now slightly weakened. None of these findings are blocking issues.Review by sonnet
Evaluated against
ecfbfddc@@ -213,0 +293,4 @@parsed, err := url.Parse(rawURL)if err != nil {// If we cannot parse it, return a safe placeholder rather than// potentially logging something sensitive.[NIT] The comment
// backoff[i] is the delay before attempt i+1 (i.e., after attempt i fails).appears both as an inline comment and in theRetryBackofffield doc comment. The duplication is harmless but slightly redundant.@@ -222,3 +317,1 @@return nil, err}defer resp.Body.Close()// maxErrorBodyBytes limits how much of an error response body we read[MINOR] Timer leak on zero-delay path: when
delay == 0(i.e.,attempt-1 >= len(backoff)orbackoffis empty), the code skips the timer block entirely and logs nothing. This is correct behavior. However, theslog.Warnlog is only emitted whendelay > 0, meaning silent retries happen with no observability when the backoff slice is shorter thanmaxAttempts-1. Consider logging retries even when delay is zero, or document this intentional behavior in the comment.@@ -229,0 +351,4 @@return nil, err}req.Header.Set("Authorization", "token "+c.token)[NIT] The
RetryBackofffield is exported (capital R), which exposes an implementation detail of the retry mechanism as a public API. The doc comment explicitly warns 'Modifying it while requests are in flight is not safe', which means theClientconcurrency safety documentation ('A Client is safe for concurrent use by multiple goroutines') is now conditionally true. This is a minor design tension worth noting — the concurrency safety claim in the struct comment should probably be qualified, orRetryBackoffshould only be settable before any request is made (which the doc comment does say, but the type system can't enforce it).@@ -746,0 +890,4 @@t.Errorf("attempts = %d, expected 1 before context cancel during backoff", attempts)}}[MINOR]
TestDoGet_RetriesOnTemporaryNetErroris a timing-dependent test that explicitly acknowledges it may not verify actual retry behavior ('The request might succeed or fail depending on timing'). It provides almost no guaranteed assertion —attemptscould be 0 and the test still passes. This test would be more valuable if it used a controlled mechanism (e.g., an atomic counter with a channel to sequence listener startup) rather than relying ontime.Sleep. As-is, it's essentially a no-op assertion that just verifies the code path compiles.Security Review
Summary
The change adds bounded, context-aware retry logic for transient 5xx and network errors with careful URL redaction and capped error-body reads. CI passed and the implementation respects security best practices, with only a minor logging-related observation.
Findings
gitea/client.goRecommendation
APPROVE — Overall, the retry implementation is sound and improves resilience without introducing security risks. It limits error-body reads to 64KB, redacts query strings in logged URLs, respects context cancellation, and avoids unbounded retries. As a defense-in-depth measure, consider reducing or sanitizing the logged error body content for 5xx responses (e.g., log only status codes or a hash/reference) if logs may be broadly accessible. Otherwise, the changes are acceptable as-is.
Review by security
Evaluated against
ecfbfddc@@ -217,1 +312,4 @@backoff := c.RetryBackoffif backoff == nil {backoff = []time.Duration{1 * time.Second, 2 * time.Second}}[MINOR] Retry warning logs include the error object (lastError) which will render the API error message (truncated to 200 chars) and could expose portions of server error responses in logs. While bodies are size-limited and URLs are redacted, consider the operational sensitivity of logging server error content at WARN level.
Gpt Review
Summary
Well-implemented retry logic with bounded attempts, context-aware backoff, and clear error typing was added. Tests are comprehensive and CI passed. The changes are idiomatic, respect repository conventions, and improve resilience to transient 5xx and network errors.
Recommendation
APPROVE — The retry/backoff implementation in doGet is cautious and well-scoped (max 3 attempts, 1s/2s defaults), respects context cancellation, and avoids leaking sensitive data in logs. Error handling uses a structured APIError and preserves 4xx semantics. The added helpers and tests cover key edge cases. Consider keeping RetryBackoff immutable after client creation to avoid accidental data races in concurrent use, but the current documentation warning is acceptable. Proceed to merge.
Review by gpt
Evaluated against
ecfbfddcAddressed MINOR findings from review round
Finding 1: Non-deterministic test (sonnet-review-bot)
Problem:
TestDoGet_RetriesOnTemporaryNetErrorwas timing-dependent with no guaranteed assertions.Fix: Replaced the listener-based approach with a
mockTransportthat injects controlled failures:SetHTTPClient()method for test transport injectionmockTransportreturnsnet.OpErrorwithECONNREFUSEDfor first N attemptsFinding 2: Retry warning logs include error body (security-review-bot)
Problem:
lastErrorin retry warning logs could include up to 200 chars of server response body.Fix: Added
sanitizeErrorForLog()helper:APIError: logs onlyHTTP <status>(no body content)Both fixes include test coverage. All tests pass (
make precommitclean).Sonnet Review
Summary
This PR adds well-structured retry logic with exponential backoff to the Gitea API client. The implementation is correct, thoughtfully designed with proper context cancellation support, body size limiting, URL redaction for safe logging, and comprehensive test coverage. A few minor issues are worth noting but none are blockers.
Findings
gitea/client.gogitea/client.gogitea/client.gogitea/client_test.gogitea/client_test.gogitea/client_test.goRecommendation
APPROVE — Approve with the noted minor issues. The core logic is correct and well-tested. The most substantive issue is the exported RetryBackoff field on a type documented as concurrently safe — consider addressing this in a follow-up by making it unexported or introducing a constructor option. The timing-based test (TestDoGet_RespectsContextCancellation) is a known pattern but carries some flakiness risk on slow CI machines; worth hardening if flakiness is observed. None of the findings are blockers.
Review by sonnet
Evaluated against
5498dccd@@ -43,4 +51,4 @@// A Client is safe for concurrent use by multiple goroutines.type Client struct {baseURL stringtoken string[MINOR] RetryBackoff is an exported field on Client, which is documented as 'safe for concurrent use by multiple goroutines'. However, the comment on RetryBackoff says 'Modifying it while requests are in flight is not safe.' This creates an inconsistency: the type-level doc says concurrent-safe, but this field is an exception. Consider either making RetryBackoff unexported and configuring it only through a constructor option (e.g., a WithRetryBackoff option function or NewClient parameter), or at minimum adding a prominent warning at the type level that RetryBackoff must be configured before first use.
@@ -213,0 +296,4 @@// This prevents accidental exposure of sensitive data that future callers// might pass via query strings.func redactURL(rawURL string) string {parsed, err := url.Parse(rawURL)[MINOR] When delay is 0 (either because the backoff slice is exhausted or an explicit zero-duration backoff was provided), the retry happens silently with no log entry. This means if maxAttempts > len(backoff)+1, the extra retry attempts are invisible. A log entry at WARN level would help with observability even for zero-delay retries. The comment 'An empty RetryBackoff slice means retry without delay' is correct but the logging gap could be confusing during production incident investigation.
@@ -213,0 +313,4 @@// the status code is included; for other errors, the type is preserved.func sanitizeErrorForLog(err error) string {if err == nil {return "<nil>"[MINOR] When isTemporaryNetError returns true and we set lastErr = err and continue, the delay logic in the next iteration uses attempt-1 as the backoff index. However, unlike HTTP 5xx errors where lastErr is always set before the delay block, network errors also set lastErr. The code is correct (lastErr will be non-nil for the log message), but the flow could be clearer — when a network error occurs on attempt 2 (the last), the code falls through to 'return nil, err' (not lastErr). This is actually correct behavior (returns the raw error), but it is inconsistent with the HTTP error path which always returns lastErr. Minor readability issue.
@@ -746,0 +831,4 @@t.Errorf("status = %d, want 500", apiErr.StatusCode)}if attempts != 3 {t.Errorf("attempts = %d, want 3 (max retries)", attempts)[NIT] Double blank line before the mockTransport type declaration.
@@ -746,0 +840,4 @@server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {attempts++w.WriteHeader(http.StatusForbidden)w.Write([]byte(`{"message":"forbidden"}`))[NIT] TestDoGet_RespectsContextCancellation uses time.Sleep(20 * time.Millisecond) to race against a 100ms backoff timer. This is a time-based race and could theoretically flake if the test machine is under heavy load and the goroutine scheduling delays cause the cancel to fire after the backoff completes. Consider using a channel signal from the server handler to know exactly when the first attempt has finished, rather than relying on wall-clock timing.
@@ -746,0 +855,4 @@}if apiErr.StatusCode != http.StatusForbidden {t.Errorf("status = %d, want 403", apiErr.StatusCode)}[NIT] mockTransport.failCount is declared as int32 with a comment 'atomic' but is accessed with both atomic.AddInt32 and the struct field directly (initial value set in struct literal). The struct literal initialization is safe since it happens before the transport is used, but using atomic.Int32 (a value type from sync/atomic, like attemptsMade) would be more idiomatic and self-documenting.
Security Review
Summary
The changes add bounded, context-aware retry logic for transient 5xx and network errors, introduce safe logging (URL redaction, error sanitization), and cap error-body reads to prevent excessive memory use. No exploitable vulnerabilities are introduced; overall security posture is improved.
Recommendation
APPROVE — CI has passed and the new retry implementation is careful about context cancellation, avoids leaking sensitive data in logs by redacting query parameters and sanitizing API errors, and limits error response body size. The modifications do not introduce injection, auth, or deserialization risks, and they slightly harden against resource exhaustion. Approve the PR.
Review by security
Evaluated against
5498dccdGpt Review
Summary
The retry logic and supporting helpers are well-implemented, idiomatic, and thoroughly tested. The changes respect repository conventions, handle context cancellation, limit error body reads, and avoid panics.
Recommendation
APPROVE — Overall, this is a solid improvement. The doGet retry loop correctly handles 5xx and transient network errors with bounded retries and exponential backoff, logs with redacted URLs, and respects context cancellation. Tests are comprehensive and use httptest and table-driven patterns appropriately. Minor suggestions (non-blocking): consider clarifying the comment on sanitizeErrorForLog to reflect that it returns the error message for non-API errors rather than the type, and optionally document that SetHTTPClient should be configured prior to concurrent use (similar to the note on RetryBackoff). Otherwise, ship it.
Review by gpt
Evaluated against
5498dccdAddressed sonnet review finding #3 (MINOR)
The inconsistency in error returns has been fixed in commit
1b38e6a.The issue: On the final retry attempt, when a network error occurred, the code returned the raw
errvariable instead oflastErr. While they held the same value in practice, the code paths were inconsistent:lastErrerrdirectlyThe fix: Moved
lastErr = erroutside the retry condition, so:lastErrfirstlastErrconsistently✅ All tests pass via
make precommitSonnet Review
Summary
Solid, well-structured implementation of retry logic with exponential backoff. The code follows Go idioms, has thorough test coverage, and correctly handles all the edge cases (context cancellation, 4xx vs 5xx, network errors). CI passes.
Findings
gitea/client.goRetryBackofffield is exported onClient, making it part of the public API surface. This is used as a testing seam, but exporting a mutable slice field on a type documented as 'safe for concurrent use' is a design smell. The field comment itself warns 'Modifying it while requests are in flight is not safe', which contradicts the concurrency safety claim for the type. A cleaner pattern would be to keep it unexported and provide aWithRetryBackoff([]time.Duration) *Clientconstructor option, or accept it only in a constructor. Since it's already exported and the comment addresses the limitation, this is minor but worth noting for future refactors.gitea/client.godelay == 0(either because the backoff slice is exhausted or the caller set a zero duration), the retry happens immediately without any log message. This means silent retries occur with no observability whenattempt-1 >= len(backoff). The logging block is gated ondelay > 0, so a zero-delay retry (e.g.,RetryBackoff = []time.Duration{}) produces no log output at all. Consider logging the retry even at zero delay, or restructuring so the log always fires on retry regardless of delay.gitea/client_test.goTestDoGet_RespectsContextCancellationand themockTransporttype definition. Minor formatting issue.gitea/client.goSetHTTPClientis exported and documented as 'intended for testing'. Exporting test-only methods on production types is generally discouraged — it pollutes the public API and could be misused. Consider using theexport_test.gopattern to expose this only during tests, or making thehttpfield settable via a constructor option.Recommendation
APPROVE — Approve with minor observations. The implementation is correct and well-engineered: retry logic is sound, context cancellation is properly handled, the 4xx vs 5xx distinction is correct, error bodies are capped to prevent resource exhaustion, URLs are redacted before logging, and the test coverage is comprehensive. The
RetryBackoffexport andSetHTTPClientexport are minor design concerns worth tracking but not blocking. The zero-delay logging gap is a minor observability issue.Review by sonnet
Evaluated against
1b38e6ad@@ -56,6 +72,12 @@ func NewClient(baseURL, token string) *Client {}[NIT]
SetHTTPClientis exported and documented as 'intended for testing'. Exporting test-only methods on production types is generally discouraged — it pollutes the public API and could be misused. Consider using theexport_test.gopattern to expose this only during tests, or making thehttpfield settable via a constructor option.@@ -213,0 +304,4 @@}if parsed.RawQuery != "" {parsed.RawQuery = "[redacted]"}[MINOR] The
RetryBackofffield is exported onClient, making it part of the public API surface. This is used as a testing seam, but exporting a mutable slice field on a type documented as 'safe for concurrent use' is a design smell. The field comment itself warns 'Modifying it while requests are in flight is not safe', which contradicts the concurrency safety claim for the type. A cleaner pattern would be to keep it unexported and provide aWithRetryBackoff([]time.Duration) *Clientconstructor option, or accept it only in a constructor. Since it's already exported and the comment addresses the limitation, this is minor but worth noting for future refactors.@@ -225,1 +337,4 @@// maxErrorBodyBytes limits how much of an error response body we read// to protect against malicious servers sending unbounded data.const maxErrorBodyBytes = 64 * 1024 // 64 KB[MINOR] When
delay == 0(either because the backoff slice is exhausted or the caller set a zero duration), the retry happens immediately without any log message. This means silent retries occur with no observability whenattempt-1 >= len(backoff). The logging block is gated ondelay > 0, so a zero-delay retry (e.g.,RetryBackoff = []time.Duration{}) produces no log output at all. Consider logging the retry even at zero delay, or restructuring so the log always fires on retry regardless of delay.@@ -746,0 +838,4 @@func TestDoGet_NoRetryOn4xx(t *testing.T) {attempts := 0server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {attempts++[NIT] Double blank line between
TestDoGet_RespectsContextCancellationand themockTransporttype definition. Minor formatting issue.Security Review
Summary
The added retry logic, error classification helpers, and logging sanitization are implemented carefully and improve resilience without introducing security risks. CI passed and the changes include sensible safeguards like error body size limits and query redaction in logs.
Recommendation
APPROVE — Approve the PR. The retry/backoff implementation respects context cancellation, avoids logging sensitive data by redacting query parameters and sanitizing errors, and limits error response body sizes to prevent potential resource exhaustion. No authentication/authorization, injection, or secrets exposure issues were introduced. Consider, as future hardening, imposing reasonable limits on successful response body size if the API contracts allow, but this is not required for this change.
Review by security
Evaluated against
1b38e6adGpt Review
Summary
Solid addition of retry logic with sensible backoff and context-aware cancellation. Implementation is idiomatic, error-safe, and well-covered by comprehensive tests; CI has passed.
Recommendation
APPROVE — The retry mechanism for 5xx responses and transient network errors is well-designed (limited attempts, exponential backoff, context-aware waits), and helpers like IsServerError, redactURL, and sanitizeErrorForLog improve clarity and safety. Error bodies are bounded, responses are properly closed, and logging avoids leaking sensitive data. Tests thoroughly cover success, failure, 4xx no-retry, network errors, and cancellation, and CI is green. Proceed with merge.
Review by gpt
Evaluated against
1b38e6adSelf-review: PASS
No blocking issues found — ready for human review.
Checklist
Minor Observation (not blocking)
client_test.godetected by gofmt - cosmetic only