feat(gitea): add retry logic for 5xx errors #69
@@ -215,23 +215,32 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,
|
||||
}
|
||||
|
|
||||
return &review, nil
|
||||
}
|
||||
|
||||
// doGet performs an HTTP GET request with retry on 5xx errors.
|
||||
|
sonnet-review-bot
commented
[NIT] Missing blank line between **[NIT]** Missing blank line between `}` (end of `PostReview`) and the `// doGet` comment. The original had a blank line there; the diff accidentally removed it. Minor style issue, but `gofmt` doesn't enforce blank lines between methods so it won't fail CI.
|
||||
// Retries up to 3 times with exponential backoff (1s, 2s delays).
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `backoff` slice is defined inline but only the delay values at indices 1 and 2 are ever used (index 0 is always 0 and the `attempt > 0` guard skips it). This is fine functionally but slightly misleading — `backoff[0]` is dead. A minor cleanliness issue, not a bug.
|
||||
func (c *Client) doGet(ctx context.Context, reqURL string) ([]byte, error) {
|
||||
|
gpt-review-bot
commented
[NIT] The backoff slice includes a 0-duration element that is never indexed due to the attempt>0 check. This is harmless but could be simplified by defining delays only for attempts that actually wait (e.g., []time.Duration{1 * time.Second, 2 * time.Second}). **[NIT]** The backoff slice includes a 0-duration element that is never indexed due to the attempt>0 check. This is harmless but could be simplified by defining delays only for attempts that actually wait (e.g., []time.Duration{1 * time.Second, 2 * time.Second}).
|
||||
const maxAttempts = 3
|
||||
backoff := []time.Duration{0, 1 * time.Second, 2 * time.Second}
|
||||
// backoff[i] is the delay before attempt i+1 (i.e., after attempt i fails).
|
||||
// First attempt (i=0) has no delay; retries wait 1s then 2s.
|
||||
backoff := []time.Duration{1 * time.Second, 2 * time.Second}
|
||||
|
||||
|
gpt-review-bot
commented
[NIT] Logging at WARN level on every retry may be noisy for library consumers. Consider lowering to INFO/DEBUG or making retry logging optional/configurable to avoid cluttering logs in normal operation. **[NIT]** Logging at WARN level on every retry may be noisy for library consumers. Consider lowering to INFO/DEBUG or making retry logging optional/configurable to avoid cluttering logs in normal operation.
|
||||
// maxErrorBodyBytes limits how much of an error response body we read
|
||||
|
[MINOR] Retry warnings log the full request URL and the last error (which may include server-provided body text). While bodies are truncated and URLs here do not include auth, logging full URLs or server error content can inadvertently leak sensitive query parameters or details if future callers pass sensitive data in query strings. Consider redacting query parameters and limiting error detail in logs. **[MINOR]** Retry warnings log the full request URL and the last error (which may include server-provided body text). While bodies are truncated and URLs here do not include auth, logging full URLs or server error content can inadvertently leak sensitive query parameters or details if future callers pass sensitive data in query strings. Consider redacting query parameters and limiting error detail in logs.
[NIT] isRetriableSyscallError returns true for unknown underlying errors, causing retries even on potentially permanent failures. This is bounded and not a security issue, but could slightly increase request attempts against misconfigured endpoints. Consider restricting retries to known transient error classes. **[NIT]** isRetriableSyscallError returns true for unknown underlying errors, causing retries even on potentially permanent failures. This is bounded and not a security issue, but could slightly increase request attempts against misconfigured endpoints. Consider restricting retries to known transient error classes.
gpt-review-bot
commented
[NIT] Consider adding jitter to the retry backoff to avoid synchronized retries across multiple clients when the server experiences transient issues. **[NIT]** Consider adding jitter to the retry backoff to avoid synchronized retries across multiple clients when the server experiences transient issues.
|
||||
// to protect against malicious servers sending unbounded data.
|
||||
const maxErrorBodyBytes = 64 * 1024 // 64 KB
|
||||
|
||||
var lastErr error
|
||||
|
gpt-review-bot
commented
[MINOR] Using time.After in a select with context cancellation can leave a timer to fire after cancellation. Consider using time.NewTimer/backoffTimer := time.NewTimer(d) and deferring backoffTimer.Stop() to avoid unnecessary timers if ctx is canceled before the delay elapses. **[MINOR]** Using time.After in a select with context cancellation can leave a timer to fire after cancellation. Consider using time.NewTimer/backoffTimer := time.NewTimer(d) and deferring backoffTimer.Stop() to avoid unnecessary timers if ctx is canceled before the delay elapses.
|
||||
for attempt := 0; attempt < maxAttempts; attempt++ {
|
||||
if attempt > 0 {
|
||||
delay := backoff[attempt-1]
|
||||
slog.Warn("retrying request after server error",
|
||||
|
gpt-review-bot
commented
[MINOR] isTemporaryNetError treats any net.OpError as retriable, which may cause retries on permanent failures (e.g., no such host). Consider narrowing the check (e.g., prefer net.Error timeouts, DNSError.IsTemporary, or specific syscall errors) to avoid unnecessary retries. **[MINOR]** isTemporaryNetError treats any net.OpError as retriable, which may cause retries on permanent failures (e.g., no such host). Consider narrowing the check (e.g., prefer net.Error timeouts, DNSError.IsTemporary, or specific syscall errors) to avoid unnecessary retries.
|
||||
"attempt", attempt+1,
|
||||
"url", reqURL,
|
||||
|
security-review-bot marked this conversation as resolved
Outdated
[MINOR] On non-2xx responses, the code reads the entire response body with io.ReadAll and stores it in APIError. This can allow a malicious or misbehaving server to force large allocations. Consider limiting the size for error bodies (e.g., io.LimitReader) since the body is only used for error context. **[MINOR]** On non-2xx responses, the code reads the entire response body with io.ReadAll and stores it in APIError. This can allow a malicious or misbehaving server to force large allocations. Consider limiting the size for error bodies (e.g., io.LimitReader) since the body is only used for error context.
|
||||
"delay", backoff[attempt].String())
|
||||
"delay", delay.String())
|
||||
|
||||
timer := time.NewTimer(delay)
|
||||
select {
|
||||
case <-time.After(backoff[attempt]):
|
||||
case <-timer.C:
|
||||
case <-ctx.Done():
|
||||
|
sonnet-review-bot
commented
[MINOR] Network/transport errors from **[MINOR]** Network/transport errors from `c.http.Do(req)` (e.g., connection refused, DNS failure) are returned immediately without retry. Transient network errors are often just as worth retrying as 5xx responses. This is a design choice, but worth noting — the PR description only mentions 5xx so this is intentional, but it limits the usefulness of the retry in some failure modes.
|
||||
timer.Stop()
|
||||
return nil, ctx.Err()
|
||||
}
|
||||
}
|
||||
@@ -247,17 +256,20 @@ func (c *Client) doGet(ctx context.Context, reqURL string) ([]byte, error) {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
body, readErr := io.ReadAll(resp.Body)
|
||||
resp.Body.Close()
|
||||
|
||||
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
|
||||
if readErr != nil {
|
||||
return nil, readErr
|
||||
body, err := io.ReadAll(resp.Body)
|
||||
|
gpt-review-bot
commented
[NIT] The retry condition duplicates the 5xx check inline. For consistency and reuse, consider using the new IsServerError helper here (errors.As on APIError or check status) to keep logic centralized. **[NIT]** The retry condition duplicates the 5xx check inline. For consistency and reuse, consider using the new IsServerError helper here (errors.As on APIError or check status) to keep logic centralized.
|
||||
resp.Body.Close()
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
return body, nil
|
||||
}
|
||||
|
||||
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(body)}
|
||||
// Error path: limit how much we read from potentially malicious server
|
||||
errBody, _ := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
|
||||
resp.Body.Close()
|
||||
|
sonnet-review-bot
commented
[NIT] Network errors from **[NIT]** Network errors from `c.http.Do(req)` are returned immediately without retry. Transient network errors (e.g., connection reset, EOF) are just as likely to be transient as 5xx responses. This is a deliberate design choice and acceptable, but worth a comment explaining why only 5xx is retried and not transport errors.
|
||||
|
||||
|
gpt-review-bot
commented
[NIT] Retry log 'attempt' field uses attempt+1 during backoff, which may be interpreted ambiguously. Consider logging the retry number explicitly (e.g., 'retry' count) or the next attempt number for clarity. **[NIT]** Retry log 'attempt' field uses attempt+1 during backoff, which may be interpreted ambiguously. Consider logging the retry number explicitly (e.g., 'retry' count) or the next attempt number for clarity.
|
||||
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(errBody)}
|
||||
|
||||
// Only retry on 5xx server errors
|
||||
if resp.StatusCode < 500 || resp.StatusCode >= 600 {
|
||||
|
||||
[MINOR] Hard-coded backoff durations (1s, 2s) make unit tests slow and can increase overall test suite runtime. Consider making backoff configurable (e.g., via a field on Client or a helper function) so tests can use shorter delays while production keeps the intended values.
[MINOR] doGet only retries on 5xx responses; transient transport-level errors from c.http.Do (e.g., temporary network failures) return immediately without retry. If desired, consider retrying on temporary network errors (e.g., timeouts) to further improve resilience.