fix: retry on transient LLM response body truncation #48

Merged
aweiker merged 1 commits from fix/response-body-truncation into main 2026-05-08 02:32:37 +00:00
4 changed files with 216 additions and 18 deletions
Showing only changes of commit db479d0ff4 - Show all commits
+28 -12
View File
@@ -254,25 +254,41 @@ func main() {
slog.Warn("context trimmed to fit budget", "trimmed", budgetResult.Trimmed)
Review

[MINOR] The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.

**[MINOR]** The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.
Review

[MAJOR] CI is failing across multiple review jobs; per process rules, this requires addressing before merge.

**[MAJOR]** CI is failing across multiple review jobs; per process rules, this requires addressing before merge.
}
Review

[NIT] Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.

**[NIT]** Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.
// Step 8: Call LLM
// Step 8: Call LLM (with retry on parse failure)
slog.Info("sending request to LLM", "model", *llmModel)
messages := []llm.Message{
{Role: "system", Content: budgetResult.SystemPrompt},
{Role: "user", Content: budgetResult.UserPrompt},
}
response, err := llmClient.Complete(ctx, messages)
if err != nil {
slog.Error("LLM request failed", "model", *llmModel, "error", err)
os.Exit(1)
}
slog.Info("LLM response received", "bytes", len(response))
var response string
Review

[MINOR] The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.

**[MINOR]** The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.
var result *review.ReviewResult
for attempt := 1; attempt <= 2; attempt++ {
if attempt > 1 {
slog.Warn("retrying LLM request after parse failure", "attempt", attempt)
time.Sleep(time.Second)
}
// Step 9: Parse response
result, err := review.ParseResponse(response)
if err != nil {
slog.Error("failed to parse LLM response", "error", err)
os.Exit(1)
response, err = llmClient.Complete(ctx, messages)
if err != nil {
slog.Error("LLM request failed", "model", *llmModel, "error", err, "attempt", attempt)
if attempt == 2 {
os.Exit(1)
}
continue
}
slog.Info("LLM response received", "bytes", len(response), "attempt", attempt)
// Step 9: Parse response
result, err = review.ParseResponse(response)
if err != nil {
slog.Error("failed to parse LLM response", "error", err, "attempt", attempt)
if attempt == 2 {
os.Exit(1)
}
continue
}
break
}
slog.Info("review parsed", "verdict", result.Verdict, "findings", len(result.Findings))
Review

[MAJOR] result is declared as var result *review.ReviewResult before the loop. If llmClient.Complete succeeds on attempt 1 but review.ParseResponse fails on attempt 1, and then llmClient.Complete also fails on attempt 2, the code hits continue and the loop ends with result == nil. The code then falls through to slog.Info("review parsed", "verdict", result.Verdict, ...) which will panic with a nil pointer dereference. The os.Exit(1) on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate os.Exit(1), but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.

**[MAJOR]** `result` is declared as `var result *review.ReviewResult` before the loop. If `llmClient.Complete` succeeds on attempt 1 but `review.ParseResponse` fails on attempt 1, and then `llmClient.Complete` also fails on attempt 2, the code hits `continue` and the loop ends with `result == nil`. The code then falls through to `slog.Info("review parsed", "verdict", result.Verdict, ...)` which will panic with a nil pointer dereference. The `os.Exit(1)` on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate `os.Exit(1)`, but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.
+51 -5
View File
@@ -75,12 +75,52 @@ type Message struct {
// Complete sends a chat completion request and returns the assistant's response content.
Review

[MAJOR] The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.

**[MAJOR]** The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.
Review

[MAJOR] The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.

**[MAJOR]** The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.
Review

[MINOR] isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.

**[MINOR]** isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.
// The first message with role "system" is treated as the system prompt.
func (c *Client) Complete(ctx context.Context, messages []Message) (string, error) {
switch c.provider {
case ProviderAnthropic:
return c.completeAnthropic(ctx, messages)
default:
return c.completeOpenAI(ctx, messages)
var result string
var err error
Review

[NIT] In Complete(), the loop uses attempt < 2 with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.

**[NIT]** In Complete(), the loop uses `attempt < 2` with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.
for attempt := 0; attempt < 2; attempt++ {
switch c.provider {
case ProviderAnthropic:
result, err = c.completeAnthropic(ctx, messages)
default:
result, err = c.completeOpenAI(ctx, messages)
Review

[MAJOR] CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.

**[MAJOR]** CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.
Review

[NIT] The loop bound attempt < 2 with attempt starting at 0 is idiomatic but slightly harder to read than maxRetries = 2 with a named constant, especially now that the same retry count (2) appears in both Complete() in llm/client.go and the outer loop in cmd/review-bot/main.go. Consider a named constant to avoid magic numbers and make the retry policy explicit.

**[NIT]** The loop bound `attempt < 2` with `attempt` starting at 0 is idiomatic but slightly harder to read than `maxRetries = 2` with a named constant, especially now that the same retry count (2) appears in both `Complete()` in `llm/client.go` and the outer loop in `cmd/review-bot/main.go`. Consider a named constant to avoid magic numbers and make the retry policy explicit.
}
if err == nil {
return result, nil
}
// Only retry on response body read errors (transient network issues).
// Do not retry on context cancellation, status errors, or parse errors
// that indicate a structural API problem.
if !isRetryableError(err) {
Review

[MINOR] The sleep time.Sleep(500 * time.Millisecond) is guarded by attempt == 0 && ctx.Err() == nil, but since the loop runs from attempt = 0 to attempt < 2, when attempt == 1 (the second/last iteration) fails and is retryable, control falls through to return "", err without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider if attempt < 1 or restructuring to sleep at the top of the loop for attempt > 0.

**[MINOR]** The sleep `time.Sleep(500 * time.Millisecond)` is guarded by `attempt == 0 && ctx.Err() == nil`, but since the loop runs from `attempt = 0` to `attempt < 2`, when `attempt == 1` (the second/last iteration) fails and is retryable, control falls through to `return "", err` without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider `if attempt < 1` or restructuring to sleep at the top of the loop for `attempt > 0`.
return "", err
}
if attempt == 0 && ctx.Err() == nil {
Review

[MINOR] isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).

**[MINOR]** isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).
// Brief pause before retry to allow transient issues to resolve.
time.Sleep(500 * time.Millisecond)
}
}
Review

[MAJOR] isRetryableError classifies errors by substring-matching on err.Error(). This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in doRequest will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (var errBodyTruncated = errors.New(...)) or a custom error type implementing errors.Is/errors.As, then check with errors.Is(err, errBodyTruncated). This is especially important because isRetryableError is tested by constructing errors from raw strings (fmt.Errorf("%s", tt.err)), which means the tests would still pass even if the actual error-wrapping chain changed.

**[MAJOR]** `isRetryableError` classifies errors by substring-matching on `err.Error()`. This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in `doRequest` will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (`var errBodyTruncated = errors.New(...)`) or a custom error type implementing `errors.Is`/`errors.As`, then check with `errors.Is(err, errBodyTruncated)`. This is especially important because `isRetryableError` is tested by constructing errors from raw strings (`fmt.Errorf("%s", tt.err)`), which means the tests would still pass even if the actual error-wrapping chain changed.
return "", err
}
// isRetryableError returns true for transient errors worth retrying.
func isRetryableError(err error) bool {
if err == nil {
return false
}
s := err.Error()
// Body read failures (connection reset, truncation)
if strings.Contains(s, "read response") {
return true
}
// Unexpected body length (our content-length validation)
if strings.Contains(s, "body length mismatch") {
return true
}
return false
}
Review

[NIT] The ParseResponse error wrapping uses %w with additional diagnostic info concatenated via . For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.

**[NIT]** The ParseResponse error wrapping uses `%w` with additional diagnostic info concatenated via ` `. For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.
// --- OpenAI-compatible implementation ---
@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)
return "", fmt.Errorf("read response: %w", err)
Review

[MAJOR] The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.

**[MAJOR]** The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.
Review

[MINOR] The content-length validation cl > 0 && int64(len(body)) < cl uses resp.ContentLength which is already parsed by net/http from the Content-Length header and returns -1 when absent. The check cl > 0 is correct for this, but it is worth noting that resp.ContentLength will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.

**[MINOR]** The content-length validation `cl > 0 && int64(len(body)) < cl` uses `resp.ContentLength` which is already parsed by `net/http` from the `Content-Length` header and returns -1 when absent. The check `cl > 0` is correct for this, but it is worth noting that `resp.ContentLength` will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.
Review

[MAJOR] doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.

**[MAJOR]** doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.
}
// Validate body length against Content-Length header when present.
// A mismatch indicates the response was truncated in transit.
Review

[NIT] The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.

**[NIT]** The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.
if cl := resp.ContentLength; cl > 0 && int64(len(body)) < cl {
return "", fmt.Errorf("body length mismatch: Content-Length=%d, received=%d", cl, len(body))
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return "", fmt.Errorf("LLM API error (status %d): %s", resp.StatusCode, string(body))
}
+129
View File
@@ -3,6 +3,7 @@ package llm
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/http/httptest"
"testing"
@@ -295,3 +296,131 @@ func TestWithProvider(t *testing.T) {
t.Errorf("expected provider anthropic, got %s", client.provider)
Review

[MINOR] A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.

**[MINOR]** A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.
}
}
func TestComplete_RetryOnBodyReadError(t *testing.T) {
Review

[NIT] The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.

**[NIT]** The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
// First attempt: send headers then close connection abruptly
// Simulate by writing partial response and flushing with wrong Content-Length
w.Header().Set("Content-Length", "1000")
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"choices":[{"message":{"con`))
// The test HTTP server will close the connection after handler returns,
// but Content-Length mismatch means client gets fewer bytes than expected
return
}
// Second attempt: succeed
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(ChatResponse{
Choices: []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
}{{Message: struct {
Content string `json:"content"`
}{Content: "success"}}},
})
}))
defer server.Close()
client := NewClient(server.URL, "key", "model")
got, err := client.Complete(context.Background(), []Message{{Role: "user", Content: "Hi"}})
if err != nil {
t.Fatalf("expected retry to succeed, got error: %v", err)
}
if got != "success" {
Review

[MINOR] TestComplete_ContentLengthMismatch does not verify that exactly 2 attempts were made (unlike TestComplete_RetryOnBodyReadError). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding if attempts != 2 { t.Errorf(...) }.

**[MINOR]** `TestComplete_ContentLengthMismatch` does not verify that exactly 2 attempts were made (unlike `TestComplete_RetryOnBodyReadError`). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding `if attempts != 2 { t.Errorf(...) }`.
t.Errorf("expected %q, got %q", "success", got)
}
if attempts != 2 {
t.Errorf("expected 2 attempts, got %d", attempts)
}
}
func TestComplete_ContentLengthMismatch(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
// Claim Content-Length is larger than actual body
w.Header().Set("Content-Length", "500")
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
// Write less than 500 bytes
w.Write([]byte(`{"choices":[{"message":{"content":"partial"}}]}`))
return
}
// Second attempt succeeds
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(ChatResponse{
Choices: []struct {
Message struct {
Content string `json:"content"`
} `json:"message"`
}{{Message: struct {
Content string `json:"content"`
}{Content: "complete"}}},
})
}))
defer server.Close()
client := NewClient(server.URL, "key", "model")
got, err := client.Complete(context.Background(), []Message{{Role: "user", Content: "Hi"}})
if err != nil {
t.Fatalf("expected retry to succeed on content-length mismatch, got: %v", err)
}
if got != "complete" {
t.Errorf("expected %q, got %q", "complete", got)
}
}
func TestComplete_NoRetryOnAPIError(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
w.WriteHeader(http.StatusBadRequest)
w.Write([]byte(`{"error":"bad request"}`))
}))
defer server.Close()
client := NewClient(server.URL, "key", "model")
_, err := client.Complete(context.Background(), []Message{{Role: "user", Content: "Hi"}})
if err == nil {
t.Fatal("expected error for 400, got nil")
}
if attempts != 1 {
t.Errorf("should not retry on API errors, got %d attempts", attempts)
}
}
func TestIsRetryableError(t *testing.T) {
tests := []struct {
name string
err string
expected bool
}{
{"nil formatted", "", false},
{"read response error", "read response: unexpected EOF", true},
{"body length mismatch", "body length mismatch: Content-Length=1000, received=500", true},
{"API error", "LLM API error (status 400): bad request", false},
{"parse error", "parse response: unexpected end of JSON input", false},
{"request error", "LLM request: connection refused", false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if tt.err == "" {
if isRetryableError(nil) {
t.Error("nil error should not be retryable")
}
return
}
err := fmt.Errorf("%s", tt.err)
got := isRetryableError(err)
if got != tt.expected {
t.Errorf("isRetryableError(%q) = %v, want %v", tt.err, got, tt.expected)
}
})
}
}
+8 -1
View File
@@ -33,7 +33,14 @@ func ParseResponse(response string) (*ReviewResult, error) {
// Try to repair before giving up.
Review

[MINOR] Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.

**[MINOR]** Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.
Review

[NIT] The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs.

**[NIT]** The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs.
repaired := repairJSON(cleaned)
if err2 := json.Unmarshal([]byte(repaired), &result); err2 != nil {
return nil, fmt.Errorf("parse LLM response as JSON: %w\nRaw response: %s", err, response)
// Include diagnostic info: lengths help identify truncation
rawLen := len(response)
cleanedLen := len(cleaned)
preview := cleaned
Review

[NIT] On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.

**[NIT]** On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.
if len(preview) > 200 {
preview = preview[:100] + "..." + preview[len(preview)-100:]
Review

[MINOR] On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.

**[MINOR]** On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.
}
return nil, fmt.Errorf("parse LLM response as JSON: %w\nRaw length: %d, cleaned length: %d\nCleaned preview: %s", err, rawLen, cleanedLen, preview)
}
}