fix: retry on transient LLM response body truncation #48

Merged
aweiker merged 1 commits from fix/response-body-truncation into main 2026-05-08 02:32:37 +00:00
Owner

Summary

Fixes intermittent unexpected end of JSON input failures where the LLM response body is truncated in transit between the HAI proxy and review-bot client.

Problem

In ~6% of CI runs, json.Unmarshal fails with "unexpected end of JSON input" even though the response was valid JSON (confirmed by logging). The truncation occurs at the network level between the HAI proxy and the client.

Solution

Three layers of defense:

  1. Content-Length validation (llm/client.go): After io.ReadAll, validate that received bytes match Content-Length header. Mismatch triggers a retryable error.

  2. Transport-level retry (llm/client.go): Complete() retries once on retryable errors (body read failures, content-length mismatches) with 500ms backoff. Does NOT retry API errors.

  3. Parse-level retry (cmd/review-bot/main.go): If ParseResponse fails, re-requests from the LLM once before exiting. Defensive layer since retries always succeed per issue evidence.

Bonus: Better diagnostics (review/parser.go): Parse errors now log raw vs cleaned lengths and a content preview to aid future debugging.

Testing

  • All existing tests pass
  • New tests for retry behavior, content-length mismatch detection, and no-retry on API errors
  • TestIsRetryableError covers the error classification logic

Closes #47

## Summary Fixes intermittent `unexpected end of JSON input` failures where the LLM response body is truncated in transit between the HAI proxy and review-bot client. ## Problem In ~6% of CI runs, `json.Unmarshal` fails with "unexpected end of JSON input" even though the response was valid JSON (confirmed by logging). The truncation occurs at the network level between the HAI proxy and the client. ## Solution **Three layers of defense:** 1. **Content-Length validation** (`llm/client.go`): After `io.ReadAll`, validate that received bytes match `Content-Length` header. Mismatch triggers a retryable error. 2. **Transport-level retry** (`llm/client.go`): `Complete()` retries once on retryable errors (body read failures, content-length mismatches) with 500ms backoff. Does NOT retry API errors. 3. **Parse-level retry** (`cmd/review-bot/main.go`): If `ParseResponse` fails, re-requests from the LLM once before exiting. Defensive layer since retries always succeed per issue evidence. **Bonus: Better diagnostics** (`review/parser.go`): Parse errors now log raw vs cleaned lengths and a content preview to aid future debugging. ## Testing - All existing tests pass - New tests for retry behavior, content-length mismatch detection, and no-retry on API errors - `TestIsRetryableError` covers the error classification logic Closes #47
rodin added 1 commit 2026-05-07 07:44:47 +00:00
fix: retry on transient LLM response body truncation
CI / test (pull_request) Successful in 15s
CI / review (/openai/v1, gpt-4.1, gpt41, openai, GPT_REVIEW_TOKEN) (pull_request) Successful in 25s
CI / review (/openai/v1, gpt-4.1-mini, gpt41-mini, openai, GPT_REVIEW_TOKEN) (pull_request) Successful in 29s
CI / review (/anthropic/v1, claude-sonnet-4-6, sonnet, anthropic, SONNET_REVIEW_TOKEN) (pull_request) Successful in 49s
CI / review (/openai/v1, gpt-5, security, openai, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 50s
CI / review (/openai/v1, gpt-5, gpt, openai, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m15s
CI / review (/openai/v1, gpt-5-mini, gpt5-mini, openai, GPT_REVIEW_TOKEN) (pull_request) Successful in 52s
db479d0ff4
Addresses intermittent 'unexpected end of JSON input' failures where the
LLM response body is truncated in transit between the proxy and client.

Root cause: network-level truncation where io.ReadAll returns partial data
(observed in 3/50 CI runs through HAI proxy). The response body reading
was already using io.ReadAll correctly, but transient network issues
between the proxy and client can still cause partial reads.

Changes:
- Add Content-Length validation in doRequest: detect when fewer bytes
  arrive than the server declared, triggering a retry
- Add retry logic in Complete: retries once on retryable errors (body
  read failures, content-length mismatches) with a 500ms backoff
- Add parse-level retry in main: if ParseResponse fails, re-requests
  from the LLM once before giving up (defensive, since retries always
  succeed per issue evidence)
- Improve ParseResponse error diagnostics: log raw vs cleaned lengths
  and a preview of the cleaned content to aid future debugging

Does NOT retry on API errors (4xx/5xx) or structural issues — only
transient body read problems.

Closes #47
rodin was assigned by aweiker 2026-05-07 14:09:46 +00:00
aweiker requested changes 2026-05-08 01:41:50 +00:00
Dismissed
aweiker left a comment
Contributor

Make it retry up to 5 times. Each time wait 5 seconds between retries.

Make it retry up to 5 times. Each time wait 5 seconds between retries.
gpt-review-bot requested changes 2026-05-08 02:19:39 +00:00
gpt-review-bot left a comment
First-time contributor

Gpt41 Review

Summary

The diff introduces robust retry logic for transient LLM response truncation, adds content-length validation, retryable error detection, defensive diagnostics, and comprehensive tests. However, the CI status is failing, which requires changes before approval.

Findings

# Severity File Line Finding
1 [MAJOR] `` 0 CI is failing on multiple jobs and targets. All CI checks must pass before this change can be merged as per repository conventions.

Recommendation

REQUEST_CHANGES — While the changes introduce good reliability improvements and stronger error diagnostics, the CI failure must be addressed before proceeding. Please investigate the source of the CI failure, ensure all tests pass, and re-submit. Once CI passes, this change will be ready for approval.


Review by gpt41


Evaluated against db479d0f

# Gpt41 Review ## Summary The diff introduces robust retry logic for transient LLM response truncation, adds content-length validation, retryable error detection, defensive diagnostics, and comprehensive tests. However, the CI status is failing, which requires changes before approval. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `` | 0 | CI is failing on multiple jobs and targets. All CI checks must pass before this change can be merged as per repository conventions. | ## Recommendation **REQUEST_CHANGES** — While the changes introduce good reliability improvements and stronger error diagnostics, the CI failure must be addressed before proceeding. Please investigate the source of the CI failure, ensure all tests pass, and re-submit. Once CI passes, this change will be ready for approval. --- *Review by gpt41* <!-- review-bot:gpt41 --> --- *Evaluated against db479d0f*
gpt-review-bot requested changes 2026-05-08 02:19:43 +00:00
gpt-review-bot left a comment
First-time contributor

Gpt41-mini Review

Summary

The PR improves robustness against truncated LLM responses by adding retry logic at multiple layers and enhances diagnostics. However, the CI is failing, which requires addressing. Also, some minor idiomatic improvements and comments on error handling and retry logic could be clarified or improved.

Findings

# Severity File Line Finding
1 [MAJOR] llm/client.go 75 The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.
2 [MAJOR] llm/client.go 271 The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.
3 [MAJOR] llm/client.go 75 The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.
4 [MINOR] cmd/review-bot/main.go 254 The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.
5 [MINOR] review/parser.go 33 Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.
6 [NIT] llm/client_test.go 300 The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.
7 [NIT] llm/client.go 79 In Complete(), the loop uses attempt < 2 with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.
8 [NIT] llm/client.go 124 The ParseResponse error wrapping uses %w with additional diagnostic info concatenated via `
`. For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.

Recommendation

REQUEST_CHANGES — The changes address an important reliability issue with truncated LLM responses by adding retry mechanisms at multiple layers and improves diagnostics. However, the fact that CI is failing means this PR cannot be merged as-is. It is recommended to fix the CI failures first. Additionally, the error handling and retry logic should be refined to use typed or wrapped errors for more robust error classification instead of substring matching. Tests should include both content-length short and long mismatch cases to fully verify handling. Minor code style and documentation improvements will also enhance maintainability. After these changes and successful CI runs, the PR would be acceptable.


Review by gpt41-mini


Evaluated against db479d0f

# Gpt41-mini Review ## Summary The PR improves robustness against truncated LLM responses by adding retry logic at multiple layers and enhances diagnostics. However, the CI is failing, which requires addressing. Also, some minor idiomatic improvements and comments on error handling and retry logic could be clarified or improved. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `llm/client.go` | 75 | The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping. | | 2 | [MAJOR] | `llm/client.go` | 271 | The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies. | | 3 | [MAJOR] | `llm/client.go` | 75 | The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification. | | 4 | [MINOR] | `cmd/review-bot/main.go` | 254 | The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity. | | 5 | [MINOR] | `review/parser.go` | 33 | Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing. | | 6 | [NIT] | `llm/client_test.go` | 300 | The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage. | | 7 | [NIT] | `llm/client.go` | 79 | In Complete(), the loop uses `attempt < 2` with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity. | | 8 | [NIT] | `llm/client.go` | 124 | The ParseResponse error wrapping uses `%w` with additional diagnostic info concatenated via ` `. For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately. | ## Recommendation **REQUEST_CHANGES** — The changes address an important reliability issue with truncated LLM responses by adding retry mechanisms at multiple layers and improves diagnostics. However, the fact that CI is failing means this PR cannot be merged as-is. It is recommended to fix the CI failures first. Additionally, the error handling and retry logic should be refined to use typed or wrapped errors for more robust error classification instead of substring matching. Tests should include both content-length short and long mismatch cases to fully verify handling. Minor code style and documentation improvements will also enhance maintainability. After these changes and successful CI runs, the PR would be acceptable. --- *Review by gpt41-mini* <!-- review-bot:gpt41-mini --> --- *Evaluated against db479d0f*
@@ -254,25 +254,41 @@ func main() {
slog.Warn("context trimmed to fit budget", "trimmed", budgetResult.Trimmed)
First-time contributor

[MINOR] The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.

**[MINOR]** The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.
@@ -75,12 +75,52 @@ type Message struct {
// Complete sends a chat completion request and returns the assistant's response content.
First-time contributor

[MAJOR] The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.

**[MAJOR]** The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.
First-time contributor

[MAJOR] The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.

**[MAJOR]** The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.
@@ -81,2 +78,2 @@
default:
return c.completeOpenAI(ctx, messages)
var result string
var err error
First-time contributor

[NIT] In Complete(), the loop uses attempt < 2 with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.

**[NIT]** In Complete(), the loop uses `attempt < 2` with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.
@@ -84,1 +121,4 @@
return true
}
return false
}
First-time contributor

[NIT] The ParseResponse error wrapping uses %w with additional diagnostic info concatenated via . For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.

**[NIT]** The ParseResponse error wrapping uses `%w` with additional diagnostic info concatenated via ` `. For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.
@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)
return "", fmt.Errorf("read response: %w", err)
First-time contributor

[MAJOR] The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.

**[MAJOR]** The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.
@@ -296,2 +297,4 @@
}
}
func TestComplete_RetryOnBodyReadError(t *testing.T) {
First-time contributor

[NIT] The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.

**[NIT]** The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.
@@ -33,7 +33,14 @@ func ParseResponse(response string) (*ReviewResult, error) {
// Try to repair before giving up.
First-time contributor

[MINOR] Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.

**[MINOR]** Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.
sonnet-review-bot requested changes 2026-05-08 02:20:03 +00:00
sonnet-review-bot left a comment
First-time contributor

Sonnet Review

Summary

CI is failing across all review jobs, which is an automatic REQUEST_CHANGES. Beyond the CI failure, the code has meaningful issues: string-matching-based error classification for retry logic is fragile (violates Go error-handling idioms), the retry loop in Complete() has an off-by-one issue where the sleep never fires on the last attempt, and there is a nil-dereference risk in main.go if both attempts fail on the parse step.

Findings

# Severity File Line Finding
1 [MAJOR] llm/client.go 86 CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.
2 [MAJOR] llm/client.go 105 isRetryableError classifies errors by substring-matching on err.Error(). This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in doRequest will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (var errBodyTruncated = errors.New(...)) or a custom error type implementing errors.Is/errors.As, then check with errors.Is(err, errBodyTruncated). This is especially important because isRetryableError is tested by constructing errors from raw strings (fmt.Errorf("%s", tt.err)), which means the tests would still pass even if the actual error-wrapping chain changed.
3 [MAJOR] cmd/review-bot/main.go 294 result is declared as var result *review.ReviewResult before the loop. If llmClient.Complete succeeds on attempt 1 but review.ParseResponse fails on attempt 1, and then llmClient.Complete also fails on attempt 2, the code hits continue and the loop ends with result == nil. The code then falls through to slog.Info("review parsed", "verdict", result.Verdict, ...) which will panic with a nil pointer dereference. The os.Exit(1) on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate os.Exit(1), but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.
4 [MINOR] llm/client.go 96 The sleep time.Sleep(500 * time.Millisecond) is guarded by attempt == 0 && ctx.Err() == nil, but since the loop runs from attempt = 0 to attempt < 2, when attempt == 1 (the second/last iteration) fails and is retryable, control falls through to return "", err without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider if attempt < 1 or restructuring to sleep at the top of the loop for attempt > 0.
5 [MINOR] llm/client.go 271 The content-length validation cl > 0 && int64(len(body)) < cl uses resp.ContentLength which is already parsed by net/http from the Content-Length header and returns -1 when absent. The check cl > 0 is correct for this, but it is worth noting that resp.ContentLength will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.
6 [MINOR] llm/client_test.go 333 TestComplete_ContentLengthMismatch does not verify that exactly 2 attempts were made (unlike TestComplete_RetryOnBodyReadError). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding if attempts != 2 { t.Errorf(...) }.
7 [NIT] llm/client.go 86 The loop bound attempt < 2 with attempt starting at 0 is idiomatic but slightly harder to read than maxRetries = 2 with a named constant, especially now that the same retry count (2) appears in both Complete() in llm/client.go and the outer loop in cmd/review-bot/main.go. Consider a named constant to avoid magic numbers and make the retry policy explicit.

Recommendation

REQUEST_CHANGES — Do not merge. CI is failing, which is an automatic block. Additionally, before resubmitting: (1) Replace string-matching in isRetryableError with sentinel errors or a custom error type so the retry classification is robust and refactor-safe. (2) Add a nil-guard on result after the retry loop in main.go to prevent a nil pointer dereference when Complete succeeds but Parse fails on both attempts. (3) Investigate whether the production LLM proxy uses chunked transfer encoding — if so, the Content-Length validation in doRequest will never trigger and the transport-level retry layer adds no value for the stated 6% truncation failure mode. The parse-level retry in main.go may be the only effective layer.


Review by sonnet


Evaluated against db479d0f

# Sonnet Review ## Summary CI is failing across all review jobs, which is an automatic REQUEST_CHANGES. Beyond the CI failure, the code has meaningful issues: string-matching-based error classification for retry logic is fragile (violates Go error-handling idioms), the retry loop in `Complete()` has an off-by-one issue where the sleep never fires on the last attempt, and there is a nil-dereference risk in `main.go` if both attempts fail on the parse step. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `llm/client.go` | 86 | CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality. | | 2 | [MAJOR] | `llm/client.go` | 105 | `isRetryableError` classifies errors by substring-matching on `err.Error()`. This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in `doRequest` will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (`var errBodyTruncated = errors.New(...)`) or a custom error type implementing `errors.Is`/`errors.As`, then check with `errors.Is(err, errBodyTruncated)`. This is especially important because `isRetryableError` is tested by constructing errors from raw strings (`fmt.Errorf("%s", tt.err)`), which means the tests would still pass even if the actual error-wrapping chain changed. | | 3 | [MAJOR] | `cmd/review-bot/main.go` | 294 | `result` is declared as `var result *review.ReviewResult` before the loop. If `llmClient.Complete` succeeds on attempt 1 but `review.ParseResponse` fails on attempt 1, and then `llmClient.Complete` also fails on attempt 2, the code hits `continue` and the loop ends with `result == nil`. The code then falls through to `slog.Info("review parsed", "verdict", result.Verdict, ...)` which will panic with a nil pointer dereference. The `os.Exit(1)` on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate `os.Exit(1)`, but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop. | | 4 | [MINOR] | `llm/client.go` | 96 | The sleep `time.Sleep(500 * time.Millisecond)` is guarded by `attempt == 0 && ctx.Err() == nil`, but since the loop runs from `attempt = 0` to `attempt < 2`, when `attempt == 1` (the second/last iteration) fails and is retryable, control falls through to `return "", err` without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider `if attempt < 1` or restructuring to sleep at the top of the loop for `attempt > 0`. | | 5 | [MINOR] | `llm/client.go` | 271 | The content-length validation `cl > 0 && int64(len(body)) < cl` uses `resp.ContentLength` which is already parsed by `net/http` from the `Content-Length` header and returns -1 when absent. The check `cl > 0` is correct for this, but it is worth noting that `resp.ContentLength` will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem. | | 6 | [MINOR] | `llm/client_test.go` | 333 | `TestComplete_ContentLengthMismatch` does not verify that exactly 2 attempts were made (unlike `TestComplete_RetryOnBodyReadError`). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding `if attempts != 2 { t.Errorf(...) }`. | | 7 | [NIT] | `llm/client.go` | 86 | The loop bound `attempt < 2` with `attempt` starting at 0 is idiomatic but slightly harder to read than `maxRetries = 2` with a named constant, especially now that the same retry count (2) appears in both `Complete()` in `llm/client.go` and the outer loop in `cmd/review-bot/main.go`. Consider a named constant to avoid magic numbers and make the retry policy explicit. | ## Recommendation **REQUEST_CHANGES** — Do not merge. CI is failing, which is an automatic block. Additionally, before resubmitting: (1) Replace string-matching in `isRetryableError` with sentinel errors or a custom error type so the retry classification is robust and refactor-safe. (2) Add a nil-guard on `result` after the retry loop in `main.go` to prevent a nil pointer dereference when Complete succeeds but Parse fails on both attempts. (3) Investigate whether the production LLM proxy uses chunked transfer encoding — if so, the Content-Length validation in `doRequest` will never trigger and the transport-level retry layer adds no value for the stated 6% truncation failure mode. The parse-level retry in `main.go` may be the only effective layer. --- *Review by sonnet* <!-- review-bot:sonnet --> --- *Evaluated against db479d0f*
@@ -276,3 +291,4 @@
break
}
slog.Info("review parsed", "verdict", result.Verdict, "findings", len(result.Findings))
First-time contributor

[MAJOR] result is declared as var result *review.ReviewResult before the loop. If llmClient.Complete succeeds on attempt 1 but review.ParseResponse fails on attempt 1, and then llmClient.Complete also fails on attempt 2, the code hits continue and the loop ends with result == nil. The code then falls through to slog.Info("review parsed", "verdict", result.Verdict, ...) which will panic with a nil pointer dereference. The os.Exit(1) on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate os.Exit(1), but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.

**[MAJOR]** `result` is declared as `var result *review.ReviewResult` before the loop. If `llmClient.Complete` succeeds on attempt 1 but `review.ParseResponse` fails on attempt 1, and then `llmClient.Complete` also fails on attempt 2, the code hits `continue` and the loop ends with `result == nil`. The code then falls through to `slog.Info("review parsed", "verdict", result.Verdict, ...)` which will panic with a nil pointer dereference. The `os.Exit(1)` on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate `os.Exit(1)`, but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.
@@ -83,0 +83,4 @@
case ProviderAnthropic:
result, err = c.completeAnthropic(ctx, messages)
default:
result, err = c.completeOpenAI(ctx, messages)
First-time contributor

[MAJOR] CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.

**[MAJOR]** CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.
First-time contributor

[NIT] The loop bound attempt < 2 with attempt starting at 0 is idiomatic but slightly harder to read than maxRetries = 2 with a named constant, especially now that the same retry count (2) appears in both Complete() in llm/client.go and the outer loop in cmd/review-bot/main.go. Consider a named constant to avoid magic numbers and make the retry policy explicit.

**[NIT]** The loop bound `attempt < 2` with `attempt` starting at 0 is idiomatic but slightly harder to read than `maxRetries = 2` with a named constant, especially now that the same retry count (2) appears in both `Complete()` in `llm/client.go` and the outer loop in `cmd/review-bot/main.go`. Consider a named constant to avoid magic numbers and make the retry policy explicit.
@@ -83,0 +93,4 @@
// Only retry on response body read errors (transient network issues).
// Do not retry on context cancellation, status errors, or parse errors
// that indicate a structural API problem.
if !isRetryableError(err) {
First-time contributor

[MINOR] The sleep time.Sleep(500 * time.Millisecond) is guarded by attempt == 0 && ctx.Err() == nil, but since the loop runs from attempt = 0 to attempt < 2, when attempt == 1 (the second/last iteration) fails and is retryable, control falls through to return "", err without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider if attempt < 1 or restructuring to sleep at the top of the loop for attempt > 0.

**[MINOR]** The sleep `time.Sleep(500 * time.Millisecond)` is guarded by `attempt == 0 && ctx.Err() == nil`, but since the loop runs from `attempt = 0` to `attempt < 2`, when `attempt == 1` (the second/last iteration) fails and is retryable, control falls through to `return "", err` without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider `if attempt < 1` or restructuring to sleep at the top of the loop for `attempt > 0`.
@@ -83,1 +102,4 @@
time.Sleep(500 * time.Millisecond)
}
}
First-time contributor

[MAJOR] isRetryableError classifies errors by substring-matching on err.Error(). This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in doRequest will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (var errBodyTruncated = errors.New(...)) or a custom error type implementing errors.Is/errors.As, then check with errors.Is(err, errBodyTruncated). This is especially important because isRetryableError is tested by constructing errors from raw strings (fmt.Errorf("%s", tt.err)), which means the tests would still pass even if the actual error-wrapping chain changed.

**[MAJOR]** `isRetryableError` classifies errors by substring-matching on `err.Error()`. This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in `doRequest` will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (`var errBodyTruncated = errors.New(...)`) or a custom error type implementing `errors.Is`/`errors.As`, then check with `errors.Is(err, errBodyTruncated)`. This is especially important because `isRetryableError` is tested by constructing errors from raw strings (`fmt.Errorf("%s", tt.err)`), which means the tests would still pass even if the actual error-wrapping chain changed.
@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)
return "", fmt.Errorf("read response: %w", err)
First-time contributor

[MINOR] The content-length validation cl > 0 && int64(len(body)) < cl uses resp.ContentLength which is already parsed by net/http from the Content-Length header and returns -1 when absent. The check cl > 0 is correct for this, but it is worth noting that resp.ContentLength will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.

**[MINOR]** The content-length validation `cl > 0 && int64(len(body)) < cl` uses `resp.ContentLength` which is already parsed by `net/http` from the `Content-Length` header and returns -1 when absent. The check `cl > 0` is correct for this, but it is worth noting that `resp.ContentLength` will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.
@@ -298,0 +330,4 @@
if err != nil {
t.Fatalf("expected retry to succeed, got error: %v", err)
}
if got != "success" {
First-time contributor

[MINOR] TestComplete_ContentLengthMismatch does not verify that exactly 2 attempts were made (unlike TestComplete_RetryOnBodyReadError). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding if attempts != 2 { t.Errorf(...) }.

**[MINOR]** `TestComplete_ContentLengthMismatch` does not verify that exactly 2 attempts were made (unlike `TestComplete_RetryOnBodyReadError`). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding `if attempts != 2 { t.Errorf(...) }`.
security-review-bot requested review from security-review-bot 2026-05-08 02:20:05 +00:00
security-review-bot requested changes 2026-05-08 02:20:05 +00:00
security-review-bot left a comment
Collaborator

Security Review

Summary

The changes add robust retry logic and diagnostics; however, CI is failing and must be addressed before merge. From a security perspective, one minor concern is increased logging of LLM response previews which could leak content into logs.

Findings

# Severity File Line Finding
1 [MAJOR] cmd/review-bot/main.go 254 CI is failing across multiple review jobs; per process rules, this requires addressing before merge.
2 [MINOR] review/parser.go 41 On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.

Recommendation

REQUEST_CHANGES — Please fix the failing CI jobs before merging. Functionally, the retry and Content-Length validation are reasonable mitigations for transient truncation and do not introduce obvious security vulnerabilities. For defense-in-depth, consider limiting or redacting the 'Cleaned preview' included in error messages to avoid exposing LLM response content in logs, or emit it only at debug level.


Review by security


Evaluated against db479d0f

# Security Review ## Summary The changes add robust retry logic and diagnostics; however, CI is failing and must be addressed before merge. From a security perspective, one minor concern is increased logging of LLM response previews which could leak content into logs. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `cmd/review-bot/main.go` | 254 | CI is failing across multiple review jobs; per process rules, this requires addressing before merge. | | 2 | [MINOR] | `review/parser.go` | 41 | On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level. | ## Recommendation **REQUEST_CHANGES** — Please fix the failing CI jobs before merging. Functionally, the retry and Content-Length validation are reasonable mitigations for transient truncation and do not introduce obvious security vulnerabilities. For defense-in-depth, consider limiting or redacting the 'Cleaned preview' included in error messages to avoid exposing LLM response content in logs, or emit it only at debug level. --- *Review by security* <!-- review-bot:security --> --- *Evaluated against db479d0f*
@@ -254,25 +254,41 @@ func main() {
slog.Warn("context trimmed to fit budget", "trimmed", budgetResult.Trimmed)
Collaborator

[MAJOR] CI is failing across multiple review jobs; per process rules, this requires addressing before merge.

**[MAJOR]** CI is failing across multiple review jobs; per process rules, this requires addressing before merge.
@@ -37,0 +38,4 @@
cleanedLen := len(cleaned)
preview := cleaned
if len(preview) > 200 {
preview = preview[:100] + "..." + preview[len(preview)-100:]
Collaborator

[MINOR] On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.

**[MINOR]** On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.
gpt-review-bot requested changes 2026-05-08 02:20:29 +00:00
gpt-review-bot left a comment
First-time contributor

Gpt Review

Summary

The changes add sensible retry logic for transient body read/truncation and improve diagnostics. However, CI is currently failing, which mandates requesting changes. There are also a few minor issues around retry messaging, context-aware sleeping, and brittle error classification.

Findings

# Severity File Line Finding
1 [MAJOR] CI 0 CI is failing across multiple review jobs (OpenAI/Anthropic review checks). Per process, failing CI requires changes before merge.
2 [MINOR] cmd/review-bot/main.go 264 The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.
3 [NIT] cmd/review-bot/main.go 256 Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.
4 [MINOR] llm/client.go 100 isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).
5 [NIT] llm/client.go 275 The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.
6 [NIT] review/parser.go 39 On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.

Recommendation

REQUEST_CHANGES — Because CI is failing, please address the CI issues before merging. The core approach (retrying on transient body read failures and validating Content-Length) looks sound and is covered by tests. To improve robustness and clarity: 1) In cmd/review-bot/main.go, avoid sleeping if the context is canceled before retry, and consider a more generic retry log message. 2) In llm/client.go, replace substring-based retry classification with sentinel errors and errors.Is checks (e.g., define ErrBodyLengthMismatch and wrap it via fmt.Errorf("%w: ...", ErrBodyLengthMismatch)). Continue retrying read failures; consider documenting that Content-Length mismatch detection may rarely be hit due to io.ReadAll behavior. 3) In review/parser.go, include the repaired unmarshal error (err2) in the returned diagnostics for more actionable insight. Once CI is green and these minor improvements are addressed, the PR will be in good shape.


Review by gpt


Evaluated against db479d0f

# Gpt Review ## Summary The changes add sensible retry logic for transient body read/truncation and improve diagnostics. However, CI is currently failing, which mandates requesting changes. There are also a few minor issues around retry messaging, context-aware sleeping, and brittle error classification. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `CI` | 0 | CI is failing across multiple review jobs (OpenAI/Anthropic review checks). Per process, failing CI requires changes before merge. | | 2 | [MINOR] | `cmd/review-bot/main.go` | 264 | The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done. | | 3 | [NIT] | `cmd/review-bot/main.go` | 256 | Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity. | | 4 | [MINOR] | `llm/client.go` | 100 | isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is). | | 5 | [NIT] | `llm/client.go` | 275 | The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path. | | 6 | [NIT] | `review/parser.go` | 39 | On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics. | ## Recommendation **REQUEST_CHANGES** — Because CI is failing, please address the CI issues before merging. The core approach (retrying on transient body read failures and validating Content-Length) looks sound and is covered by tests. To improve robustness and clarity: 1) In cmd/review-bot/main.go, avoid sleeping if the context is canceled before retry, and consider a more generic retry log message. 2) In llm/client.go, replace substring-based retry classification with sentinel errors and errors.Is checks (e.g., define ErrBodyLengthMismatch and wrap it via fmt.Errorf("%w: ...", ErrBodyLengthMismatch)). Continue retrying read failures; consider documenting that Content-Length mismatch detection may rarely be hit due to io.ReadAll behavior. 3) In review/parser.go, include the repaired unmarshal error (err2) in the returned diagnostics for more actionable insight. Once CI is green and these minor improvements are addressed, the PR will be in good shape. --- *Review by gpt* <!-- review-bot:gpt --> --- *Evaluated against db479d0f*
@@ -254,25 +254,41 @@ func main() {
slog.Warn("context trimmed to fit budget", "trimmed", budgetResult.Trimmed)
}
First-time contributor

[NIT] Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.

**[NIT]** Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.
@@ -267,3 +264,1 @@
os.Exit(1)
}
slog.Info("LLM response received", "bytes", len(response))
var response string
First-time contributor

[MINOR] The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.

**[MINOR]** The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.
@@ -83,0 +97,4 @@
return "", err
}
if attempt == 0 && ctx.Err() == nil {
First-time contributor

[MINOR] isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).

**[MINOR]** isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).
@@ -232,2 +272,4 @@
}
// Validate body length against Content-Length header when present.
// A mismatch indicates the response was truncated in transit.
First-time contributor

[NIT] The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.

**[NIT]** The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.
@@ -37,0 +36,4 @@
// Include diagnostic info: lengths help identify truncation
rawLen := len(response)
cleanedLen := len(cleaned)
preview := cleaned
First-time contributor

[NIT] On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.

**[NIT]** On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.
gpt-review-bot requested changes 2026-05-08 02:20:32 +00:00
gpt-review-bot left a comment
First-time contributor

Gpt5-mini Review

Summary

The retry-on-truncation approach is a sound defense-in-depth idea, and tests were added to exercise it; however CI is failing and there is a correctness/regression risk where API errors could be retried due to the Content-Length check ordering. Also the retry detection uses brittle string matching instead of wrapped/sentinel errors.

Findings

# Severity File Line Finding
1 [MAJOR] CI 0 The PR CI is failing (multiple review jobs report 'Failing after ~15-17s'). Per repository policy, CI must be green — this blocks approval until investigated and fixed.
2 [MAJOR] llm/client.go 271 doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.
3 [MINOR] llm/client.go 75 isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.
4 [MINOR] llm/client_test.go 296 A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.
5 [NIT] review/parser.go 33 The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs.

Recommendation

REQUEST_CHANGES — Fix the CI failures first and re-run tests. Then address the retry semantics in llm/client.go: either (A) move the HTTP status-code check before the Content-Length validation so non-2xx API errors are returned as API errors (and not classified as retryable), or (B) ensure doRequest returns an error class that distinguishes 'API error' vs 'transient transport/truncation' (for example, return a typed or sentinel error for body-length mismatch). Replace fragile substring matching in isRetryableError with checks against wrapped sentinel/typed errors (errors.Is / errors.As) or explicit error types so retry decisions are robust. Add a unit test that verifies a non-2xx response with a mismatched Content-Length does not trigger retries. After making these changes and confirming tests/CI pass, the PR can be approved.


Review by gpt5-mini


Evaluated against db479d0f

# Gpt5-mini Review ## Summary The retry-on-truncation approach is a sound defense-in-depth idea, and tests were added to exercise it; however CI is failing and there is a correctness/regression risk where API errors could be retried due to the Content-Length check ordering. Also the retry detection uses brittle string matching instead of wrapped/sentinel errors. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `CI` | 0 | The PR CI is failing (multiple review jobs report 'Failing after ~15-17s'). Per repository policy, CI must be green — this blocks approval until investigated and fixed. | | 2 | [MAJOR] | `llm/client.go` | 271 | doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors. | | 3 | [MINOR] | `llm/client.go` | 75 | isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As. | | 4 | [MINOR] | `llm/client_test.go` | 296 | A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried. | | 5 | [NIT] | `review/parser.go` | 33 | The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs. | ## Recommendation **REQUEST_CHANGES** — Fix the CI failures first and re-run tests. Then address the retry semantics in llm/client.go: either (A) move the HTTP status-code check before the Content-Length validation so non-2xx API errors are returned as API errors (and not classified as retryable), or (B) ensure doRequest returns an error class that distinguishes 'API error' vs 'transient transport/truncation' (for example, return a typed or sentinel error for body-length mismatch). Replace fragile substring matching in isRetryableError with checks against wrapped sentinel/typed errors (errors.Is / errors.As) or explicit error types so retry decisions are robust. Add a unit test that verifies a non-2xx response with a mismatched Content-Length does not trigger retries. After making these changes and confirming tests/CI pass, the PR can be approved. --- *Review by gpt5-mini* <!-- review-bot:gpt5-mini --> --- *Evaluated against db479d0f*
@@ -75,12 +75,52 @@ type Message struct {
// Complete sends a chat completion request and returns the assistant's response content.
First-time contributor

[MINOR] isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.

**[MINOR]** isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.
@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)
return "", fmt.Errorf("read response: %w", err)
First-time contributor

[MAJOR] doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.

**[MAJOR]** doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.
@@ -295,3 +296,131 @@ func TestWithProvider(t *testing.T) {
t.Errorf("expected provider anthropic, got %s", client.provider)
First-time contributor

[MINOR] A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.

**[MINOR]** A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.
@@ -33,7 +33,14 @@ func ParseResponse(response string) (*ReviewResult, error) {
// Try to repair before giving up.
First-time contributor

[NIT] The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs.

**[NIT]** The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs.
aweiker approved these changes 2026-05-08 02:32:33 +00:00
aweiker merged commit 2089ca0f2d into main 2026-05-08 02:32:37 +00:00
Sign in to join this conversation.
5 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: rodin/review-bot#48