feat(gitea): add retry logic for 5xx errors #69
@@ -39,6 +39,12 @@ func IsNotFound(err error) bool {
|
||||
return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound
|
||||
}
|
||||
|
|
||||
|
||||
|
gpt-review-bot
commented
[NIT] IsServerError is exported but currently only used in tests; this is fine as a helper, but consider documenting intended external usage or keeping it unexported if not part of the public API. **[NIT]** IsServerError is exported but currently only used in tests; this is fine as a helper, but consider documenting intended external usage or keeping it unexported if not part of the public API.
|
||||
// IsServerError reports whether an error is an API 5xx response.
|
||||
func IsServerError(err error) bool {
|
||||
var apiErr *APIError
|
||||
return errors.As(err, &apiErr) && apiErr.StatusCode >= 500 && apiErr.StatusCode < 600
|
||||
}
|
||||
|
||||
// Client interacts with the Gitea API.
|
||||
// A Client is safe for concurrent use by multiple goroutines.
|
||||
type Client struct {
|
||||
@@ -210,7 +216,26 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,
|
||||
return &review, nil
|
||||
}
|
||||
|
||||
|
sonnet-review-bot
commented
[NIT] Missing blank line between **[NIT]** Missing blank line between `}` (end of `PostReview`) and the `// doGet` comment. The original had a blank line there; the diff accidentally removed it. Minor style issue, but `gofmt` doesn't enforce blank lines between methods so it won't fail CI.
|
||||
// doGet performs an HTTP GET request with retry on 5xx errors.
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `backoff` slice is defined inline but only the delay values at indices 1 and 2 are ever used (index 0 is always 0 and the `attempt > 0` guard skips it). This is fine functionally but slightly misleading — `backoff[0]` is dead. A minor cleanliness issue, not a bug.
|
||||
// Retries up to 3 times with exponential backoff (1s, 2s delays).
|
||||
|
gpt-review-bot
commented
[NIT] The backoff slice includes a 0-duration element that is never indexed due to the attempt>0 check. This is harmless but could be simplified by defining delays only for attempts that actually wait (e.g., []time.Duration{1 * time.Second, 2 * time.Second}). **[NIT]** The backoff slice includes a 0-duration element that is never indexed due to the attempt>0 check. This is harmless but could be simplified by defining delays only for attempts that actually wait (e.g., []time.Duration{1 * time.Second, 2 * time.Second}).
|
||||
func (c *Client) doGet(ctx context.Context, reqURL string) ([]byte, error) {
|
||||
const maxAttempts = 3
|
||||
backoff := []time.Duration{0, 1 * time.Second, 2 * time.Second}
|
||||
|
||||
var lastErr error
|
||||
|
gpt-review-bot
commented
[NIT] Logging at WARN level on every retry may be noisy for library consumers. Consider lowering to INFO/DEBUG or making retry logging optional/configurable to avoid cluttering logs in normal operation. **[NIT]** Logging at WARN level on every retry may be noisy for library consumers. Consider lowering to INFO/DEBUG or making retry logging optional/configurable to avoid cluttering logs in normal operation.
|
||||
for attempt := 0; attempt < maxAttempts; attempt++ {
|
||||
|
[MINOR] Retry warnings log the full request URL and the last error (which may include server-provided body text). While bodies are truncated and URLs here do not include auth, logging full URLs or server error content can inadvertently leak sensitive query parameters or details if future callers pass sensitive data in query strings. Consider redacting query parameters and limiting error detail in logs. **[MINOR]** Retry warnings log the full request URL and the last error (which may include server-provided body text). While bodies are truncated and URLs here do not include auth, logging full URLs or server error content can inadvertently leak sensitive query parameters or details if future callers pass sensitive data in query strings. Consider redacting query parameters and limiting error detail in logs.
[NIT] isRetriableSyscallError returns true for unknown underlying errors, causing retries even on potentially permanent failures. This is bounded and not a security issue, but could slightly increase request attempts against misconfigured endpoints. Consider restricting retries to known transient error classes. **[NIT]** isRetriableSyscallError returns true for unknown underlying errors, causing retries even on potentially permanent failures. This is bounded and not a security issue, but could slightly increase request attempts against misconfigured endpoints. Consider restricting retries to known transient error classes.
gpt-review-bot
commented
[NIT] Consider adding jitter to the retry backoff to avoid synchronized retries across multiple clients when the server experiences transient issues. **[NIT]** Consider adding jitter to the retry backoff to avoid synchronized retries across multiple clients when the server experiences transient issues.
|
||||
if attempt > 0 {
|
||||
slog.Warn("retrying request after server error",
|
||||
"attempt", attempt+1,
|
||||
"url", reqURL,
|
||||
|
gpt-review-bot
commented
[MINOR] Using time.After in a select with context cancellation can leave a timer to fire after cancellation. Consider using time.NewTimer/backoffTimer := time.NewTimer(d) and deferring backoffTimer.Stop() to avoid unnecessary timers if ctx is canceled before the delay elapses. **[MINOR]** Using time.After in a select with context cancellation can leave a timer to fire after cancellation. Consider using time.NewTimer/backoffTimer := time.NewTimer(d) and deferring backoffTimer.Stop() to avoid unnecessary timers if ctx is canceled before the delay elapses.
|
||||
"delay", backoff[attempt].String())
|
||||
select {
|
||||
case <-time.After(backoff[attempt]):
|
||||
case <-ctx.Done():
|
||||
|
gpt-review-bot
commented
[MINOR] isTemporaryNetError treats any net.OpError as retriable, which may cause retries on permanent failures (e.g., no such host). Consider narrowing the check (e.g., prefer net.Error timeouts, DNSError.IsTemporary, or specific syscall errors) to avoid unnecessary retries. **[MINOR]** isTemporaryNetError treats any net.OpError as retriable, which may cause retries on permanent failures (e.g., no such host). Consider narrowing the check (e.g., prefer net.Error timeouts, DNSError.IsTemporary, or specific syscall errors) to avoid unnecessary retries.
|
||||
return nil, ctx.Err()
|
||||
}
|
||||
|
security-review-bot marked this conversation as resolved
Outdated
[MINOR] On non-2xx responses, the code reads the entire response body with io.ReadAll and stores it in APIError. This can allow a malicious or misbehaving server to force large allocations. Consider limiting the size for error bodies (e.g., io.LimitReader) since the body is only used for error context. **[MINOR]** On non-2xx responses, the code reads the entire response body with io.ReadAll and stores it in APIError. This can allow a malicious or misbehaving server to force large allocations. Consider limiting the size for error bodies (e.g., io.LimitReader) since the body is only used for error context.
|
||||
}
|
||||
|
||||
req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
@@ -221,13 +246,26 @@ func (c *Client) doGet(ctx context.Context, reqURL string) ([]byte, error) {
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
|
||||
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
|
||||
body, _ := io.ReadAll(resp.Body)
|
||||
return nil, &APIError{StatusCode: resp.StatusCode, Body: string(body)}
|
||||
body, readErr := io.ReadAll(resp.Body)
|
||||
resp.Body.Close()
|
||||
|
||||
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
|
||||
if readErr != nil {
|
||||
return nil, readErr
|
||||
}
|
||||
return io.ReadAll(resp.Body)
|
||||
return body, nil
|
||||
}
|
||||
|
||||
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(body)}
|
||||
|
gpt-review-bot
commented
[NIT] The retry condition duplicates the 5xx check inline. For consistency and reuse, consider using the new IsServerError helper here (errors.As on APIError or check status) to keep logic centralized. **[NIT]** The retry condition duplicates the 5xx check inline. For consistency and reuse, consider using the new IsServerError helper here (errors.As on APIError or check status) to keep logic centralized.
|
||||
|
||||
// Only retry on 5xx server errors
|
||||
if resp.StatusCode < 500 || resp.StatusCode >= 600 {
|
||||
return nil, lastErr
|
||||
}
|
||||
}
|
||||
|
||||
return nil, lastErr
|
||||
}
|
||||
|
||||
|
sonnet-review-bot
commented
[NIT] Network errors from **[NIT]** Network errors from `c.http.Do(req)` are returned immediately without retry. Transient network errors (e.g., connection reset, EOF) are just as likely to be transient as 5xx responses. This is a deliberate design choice and acceptable, but worth a comment explaining why only 5xx is retried and not transport errors.
|
||||
// escapePath escapes each segment of a relative file path for use in URLs.
|
||||
|
gpt-review-bot
commented
[NIT] Retry log 'attempt' field uses attempt+1 during backoff, which may be interpreted ambiguously. Consider logging the retry number explicitly (e.g., 'retry' count) or the next attempt number for clarity. **[NIT]** Retry log 'attempt' field uses attempt+1 during backoff, which may be interpreted ambiguously. Consider logging the retry number explicitly (e.g., 'retry' count) or the next attempt number for clarity.
|
||||
|
||||
@@ -10,6 +10,7 @@ import (
|
||||
"net/http/httptest"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
)
|
||||
|
||||
func TestGetPullRequest(t *testing.T) {
|
||||
@@ -743,3 +744,137 @@ func TestResolveComment_Error(t *testing.T) {
|
||||
t.Fatal("expected error for 404 response")
|
||||
}
|
||||
}
|
||||
|
||||
func TestIsServerError(t *testing.T) {
|
||||
tests := []struct {
|
||||
name string
|
||||
err error
|
||||
want bool
|
||||
}{
|
||||
{"nil error", nil, false},
|
||||
{"non-API error", fmt.Errorf("network timeout"), false},
|
||||
{"404 APIError", &APIError{StatusCode: 404, Body: "not found"}, false},
|
||||
{"500 APIError", &APIError{StatusCode: 500, Body: "server error"}, true},
|
||||
{"502 APIError", &APIError{StatusCode: 502, Body: "bad gateway"}, true},
|
||||
{"503 APIError", &APIError{StatusCode: 503, Body: "unavailable"}, true},
|
||||
{"599 APIError", &APIError{StatusCode: 599, Body: "edge case"}, true},
|
||||
{"600 not server error", &APIError{StatusCode: 600, Body: "edge"}, false},
|
||||
{"400 not server error", &APIError{StatusCode: 400, Body: "bad request"}, false},
|
||||
{"wrapped 500", fmt.Errorf("fetch: %w", &APIError{StatusCode: 500, Body: "err"}), true},
|
||||
{"wrapped 404", fmt.Errorf("fetch: %w", &APIError{StatusCode: 404, Body: "err"}), false},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
t.Run(tt.name, func(t *testing.T) {
|
||||
got := IsServerError(tt.err)
|
||||
if got != tt.want {
|
||||
t.Errorf("IsServerError(%v) = %v, want %v", tt.err, got, tt.want)
|
||||
}
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoGet_RetriesOn500(t *testing.T) {
|
||||
attempts := 0
|
||||
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
if attempts < 3 {
|
||||
w.WriteHeader(http.StatusInternalServerError)
|
||||
w.Write([]byte(`{"message":"transient error"}`))
|
||||
return
|
||||
}
|
||||
w.WriteHeader(http.StatusOK)
|
||||
w.Write([]byte(`{"data":"success"}`))
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
client := NewClient(server.URL, "test-token")
|
||||
body, err := client.doGet(context.Background(), server.URL+"/test")
|
||||
if err != nil {
|
||||
t.Fatalf("expected success after retry, got error: %v", err)
|
||||
}
|
||||
if string(body) != `{"data":"success"}` {
|
||||
t.Errorf("body = %q, want %q", string(body), `{"data":"success"}`)
|
||||
}
|
||||
if attempts != 3 {
|
||||
t.Errorf("attempts = %d, want 3", attempts)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoGet_FailsAfterMaxRetries(t *testing.T) {
|
||||
attempts := 0
|
||||
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
w.WriteHeader(http.StatusInternalServerError)
|
||||
w.Write([]byte(`{"message":"persistent error"}`))
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
client := NewClient(server.URL, "test-token")
|
||||
_, err := client.doGet(context.Background(), server.URL+"/test")
|
||||
if err == nil {
|
||||
t.Fatal("expected error after max retries")
|
||||
}
|
||||
var apiErr *APIError
|
||||
if !errors.As(err, &apiErr) {
|
||||
t.Fatalf("expected APIError, got: %v", err)
|
||||
}
|
||||
if apiErr.StatusCode != http.StatusInternalServerError {
|
||||
t.Errorf("status = %d, want 500", apiErr.StatusCode)
|
||||
}
|
||||
if attempts != 3 {
|
||||
t.Errorf("attempts = %d, want 3 (max retries)", attempts)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoGet_NoRetryOn4xx(t *testing.T) {
|
||||
attempts := 0
|
||||
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
w.WriteHeader(http.StatusForbidden)
|
||||
w.Write([]byte(`{"message":"forbidden"}`))
|
||||
|
sonnet-review-bot
commented
[NIT] Double blank line before the mockTransport type declaration. **[NIT]** Double blank line before the mockTransport type declaration.
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
client := NewClient(server.URL, "test-token")
|
||||
_, err := client.doGet(context.Background(), server.URL+"/test")
|
||||
if err == nil {
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `TestDoGet_RetriesOn500` and `TestDoGet_FailsAfterMaxRetries` will be slow in CI because the actual 1s and 2s backoff delays fire. The tests don't inject a clock or configurable backoff. Since the convention file notes `go test ./...` must pass, and these tests will take ~3s and ~1s respectively to complete, consider whether the repo has a policy on slow unit tests. This is a common trade-off with time-based tests and not a blocker.
|
||||
t.Fatal("expected error for 403")
|
||||
|
sonnet-review-bot
commented
[NIT] Double blank line between **[NIT]** Double blank line between `TestDoGet_RespectsContextCancellation` and the `mockTransport` type definition. Minor formatting issue.
|
||||
}
|
||||
var apiErr *APIError
|
||||
|
sonnet-review-bot
commented
[NIT] TestDoGet_RespectsContextCancellation uses time.Sleep(20 * time.Millisecond) to race against a 100ms backoff timer. This is a time-based race and could theoretically flake if the test machine is under heavy load and the goroutine scheduling delays cause the cancel to fire after the backoff completes. Consider using a channel signal from the server handler to know exactly when the first attempt has finished, rather than relying on wall-clock timing. **[NIT]** TestDoGet_RespectsContextCancellation uses time.Sleep(20 * time.Millisecond) to race against a 100ms backoff timer. This is a time-based race and could theoretically flake if the test machine is under heavy load and the goroutine scheduling delays cause the cancel to fire after the backoff completes. Consider using a channel signal from the server handler to know exactly when the first attempt has finished, rather than relying on wall-clock timing.
|
||||
if !errors.As(err, &apiErr) {
|
||||
t.Fatalf("expected APIError, got: %v", err)
|
||||
}
|
||||
if apiErr.StatusCode != http.StatusForbidden {
|
||||
t.Errorf("status = %d, want 403", apiErr.StatusCode)
|
||||
}
|
||||
if attempts != 1 {
|
||||
t.Errorf("attempts = %d, want 1 (no retry on 4xx)", attempts)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoGet_RespectsContextCancellation(t *testing.T) {
|
||||
attempts := 0
|
||||
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
|
sonnet-review-bot
commented
[NIT] mockTransport.failCount is declared as int32 with a comment 'atomic' but is accessed with both atomic.AddInt32 and the struct field directly (initial value set in struct literal). The struct literal initialization is safe since it happens before the transport is used, but using atomic.Int32 (a value type from sync/atomic, like attemptsMade) would be more idiomatic and self-documenting. **[NIT]** mockTransport.failCount is declared as int32 with a comment 'atomic' but is accessed with both atomic.AddInt32 and the struct field directly (initial value set in struct literal). The struct literal initialization is safe since it happens before the transport is used, but using atomic.Int32 (a value type from sync/atomic, like attemptsMade) would be more idiomatic and self-documenting.
|
||||
w.WriteHeader(http.StatusInternalServerError)
|
||||
w.Write([]byte(`{"message":"error"}`))
|
||||
}))
|
||||
defer server.Close()
|
||||
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
// Cancel immediately after first attempt would trigger retry
|
||||
go func() {
|
||||
time.Sleep(50 * time.Millisecond)
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `TestDoGet_RetriesOn500` incurs real 1-second backoff delays (attempt 1→2 waits 1s, attempt 2→3 waits 2s), making the test take ~3 seconds minimum. Consider making the backoff durations configurable on the client (or injectable for tests) to avoid slow tests. The context cancellation test already uses a 50ms sleep which is similarly timing-sensitive.
|
||||
cancel()
|
||||
}()
|
||||
|
||||
client := NewClient(server.URL, "test-token")
|
||||
_, err := client.doGet(ctx, server.URL+"/test")
|
||||
if err == nil {
|
||||
t.Fatal("expected error on context cancellation")
|
||||
}
|
||||
// Should have made 1 attempt, then context cancelled during backoff
|
||||
if attempts > 2 {
|
||||
t.Errorf("attempts = %d, expected at most 2 before context cancel", attempts)
|
||||
}
|
||||
}
|
||||
|
||||
[NIT] IsServerError helper is introduced but not used within doGet; using it where applicable could improve readability.