fix: retry on transient LLM response body truncation #48

First-time contributor

[MINOR] The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.

**[MINOR]** The retry logic for LLM completion requests wraps the entire request and parse operation with two attempts. However, the retry only happens on parse failure or LLM completion errors separately from transport-level retries inside Complete(). This may cause duplicate retries or confusing logs. Consider unifying retry approaches or clarifying retry severity.

llm/client.go

						
				@@ -75,12 +75,52 @@ type Message struct {

				// Complete sends a chat completion request and returns the assistant's response content.

gpt-review-bot commented

First-time contributor

[MAJOR] The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.

**[MAJOR]** The Complete method introduces retry logic for transient errors detected by error string matching. However, this retry logic only retries once and does not propagate the root cause error context. Consider improving error handling to retain full context and proper wrapping.

gpt-review-bot commented

First-time contributor

[MAJOR] The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.

**[MAJOR]** The isRetryableError function relies on error message substring matching. This is brittle and may cause false positives or negatives. Consider defining typed errors or error wrapping for more reliable error classification.

llm/client.go

						
				@@ -81,2 +78,2 @@

					default:

						return c.completeOpenAI(ctx, messages)

					var result string

					var err error

gpt-review-bot commented

First-time contributor

[NIT] In Complete(), the loop uses attempt < 2 with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.

**[NIT]** In Complete(), the loop uses `attempt < 2` with sleeps for retry. Consider extracting max retries and backoff as constants or configuration for flexibility and clarity.

llm/client.go

						
				@@ -84,1 +121,4 @@

						return true

					}

					return false

				}

gpt-review-bot commented

First-time contributor

[NIT] The ParseResponse error wrapping uses %w with additional diagnostic info concatenated via . For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.

**[NIT]** The ParseResponse error wrapping uses `%w` with additional diagnostic info concatenated via ` `. For better compatibility with errors.Is / errors.As and formatting, consider wrapping once and logging additional data separately.

llm/client.go

						
				@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)

						return "", fmt.Errorf("read response: %w", err)

gpt-review-bot commented

First-time contributor

[MAJOR] The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.

**[MAJOR]** The content-length validation triggers an error when the received body length is less than the Content-Length header. However, the current implementation only checks if received length is less than Content-Length, but not if it is greater, which might also indicate an error. Consider validating exact equality or reject oversize bodies.

llm/client_test.go

						
				@@ -296,2 +297,4 @@

					}

				}

				func TestComplete_RetryOnBodyReadError(t *testing.T) {

gpt-review-bot commented

First-time contributor

[NIT] The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.

**[NIT]** The retry tests cover body truncation via premature connection close and content-length mismatch but do not explicitly test the behavior when received body is longer than declared content-length. Adding such tests could improve coverage.

review/parser.go

						
				@@ -33,7 +33,14 @@ func ParseResponse(response string) (*ReviewResult, error) {

						// Try to repair before giving up.

gpt-review-bot commented

First-time contributor

[MINOR] Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.

**[MINOR]** Improved error message from ParseResponse includes lengths and previews, aiding diagnostics. Consider adding more structured logging (e.g., separate fields) instead of embedding data in error messages for easier machine parsing.

sonnet-review-bot requested changes 2026-05-08 02:20:03 +00:00

sonnet-review-bot left a comment

First-time contributor

Sonnet Review

Summary

CI is failing across all review jobs, which is an automatic REQUEST_CHANGES. Beyond the CI failure, the code has meaningful issues: string-matching-based error classification for retry logic is fragile (violates Go error-handling idioms), the retry loop in Complete() has an off-by-one issue where the sleep never fires on the last attempt, and there is a nil-dereference risk in main.go if both attempts fail on the parse step.

Findings

#	Severity	File	Line	Finding
1	[MAJOR]	`llm/client.go`	86	CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.
2	[MAJOR]	`llm/client.go`	105	`isRetryableError` classifies errors by substring-matching on `err.Error()`. This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in `doRequest` will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (`var errBodyTruncated = errors.New(...)`) or a custom error type implementing `errors.Is`/`errors.As`, then check with `errors.Is(err, errBodyTruncated)`. This is especially important because `isRetryableError` is tested by constructing errors from raw strings (`fmt.Errorf("%s", tt.err)`), which means the tests would still pass even if the actual error-wrapping chain changed.
3	[MAJOR]	`cmd/review-bot/main.go`	294	`result` is declared as `var result *review.ReviewResult` before the loop. If `llmClient.Complete` succeeds on attempt 1 but `review.ParseResponse` fails on attempt 1, and then `llmClient.Complete` also fails on attempt 2, the code hits `continue` and the loop ends with `result == nil`. The code then falls through to `slog.Info("review parsed", "verdict", result.Verdict, ...)` which will panic with a nil pointer dereference. The `os.Exit(1)` on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate `os.Exit(1)`, but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.
4	[MINOR]	`llm/client.go`	96	The sleep `time.Sleep(500 * time.Millisecond)` is guarded by `attempt == 0 && ctx.Err() == nil`, but since the loop runs from `attempt = 0` to `attempt < 2`, when `attempt == 1` (the second/last iteration) fails and is retryable, control falls through to `return "", err` without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider `if attempt < 1` or restructuring to sleep at the top of the loop for `attempt > 0`.
5	[MINOR]	`llm/client.go`	271	The content-length validation `cl > 0 && int64(len(body)) < cl` uses `resp.ContentLength` which is already parsed by `net/http` from the `Content-Length` header and returns -1 when absent. The check `cl > 0` is correct for this, but it is worth noting that `resp.ContentLength` will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.
6	[MINOR]	`llm/client_test.go`	333	`TestComplete_ContentLengthMismatch` does not verify that exactly 2 attempts were made (unlike `TestComplete_RetryOnBodyReadError`). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding `if attempts != 2 { t.Errorf(...) }`.
7	[NIT]	`llm/client.go`	86	The loop bound `attempt < 2` with `attempt` starting at 0 is idiomatic but slightly harder to read than `maxRetries = 2` with a named constant, especially now that the same retry count (2) appears in both `Complete()` in `llm/client.go` and the outer loop in `cmd/review-bot/main.go`. Consider a named constant to avoid magic numbers and make the retry policy explicit.

Recommendation

REQUEST_CHANGES — Do not merge. CI is failing, which is an automatic block. Additionally, before resubmitting: (1) Replace string-matching in isRetryableError with sentinel errors or a custom error type so the retry classification is robust and refactor-safe. (2) Add a nil-guard on result after the retry loop in main.go to prevent a nil pointer dereference when Complete succeeds but Parse fails on both attempts. (3) Investigate whether the production LLM proxy uses chunked transfer encoding — if so, the Content-Length validation in doRequest will never trigger and the transport-level retry layer adds no value for the stated 6% truncation failure mode. The parse-level retry in main.go may be the only effective layer.

Review by sonnet

Evaluated against db479d0f

# Sonnet Review ## Summary CI is failing across all review jobs, which is an automatic REQUEST_CHANGES. Beyond the CI failure, the code has meaningful issues: string-matching-based error classification for retry logic is fragile (violates Go error-handling idioms), the retry loop in `Complete()` has an off-by-one issue where the sleep never fires on the last attempt, and there is a nil-dereference risk in `main.go` if both attempts fail on the parse step. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `llm/client.go` | 86 | CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality. | | 2 | [MAJOR] | `llm/client.go` | 105 | `isRetryableError` classifies errors by substring-matching on `err.Error()`. This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in `doRequest` will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (`var errBodyTruncated = errors.New(...)`) or a custom error type implementing `errors.Is`/`errors.As`, then check with `errors.Is(err, errBodyTruncated)`. This is especially important because `isRetryableError` is tested by constructing errors from raw strings (`fmt.Errorf("%s", tt.err)`), which means the tests would still pass even if the actual error-wrapping chain changed. | | 3 | [MAJOR] | `cmd/review-bot/main.go` | 294 | `result` is declared as `var result *review.ReviewResult` before the loop. If `llmClient.Complete` succeeds on attempt 1 but `review.ParseResponse` fails on attempt 1, and then `llmClient.Complete` also fails on attempt 2, the code hits `continue` and the loop ends with `result == nil`. The code then falls through to `slog.Info("review parsed", "verdict", result.Verdict, ...)` which will panic with a nil pointer dereference. The `os.Exit(1)` on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate `os.Exit(1)`, but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop. | | 4 | [MINOR] | `llm/client.go` | 96 | The sleep `time.Sleep(500 * time.Millisecond)` is guarded by `attempt == 0 && ctx.Err() == nil`, but since the loop runs from `attempt = 0` to `attempt < 2`, when `attempt == 1` (the second/last iteration) fails and is retryable, control falls through to `return "", err` without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider `if attempt < 1` or restructuring to sleep at the top of the loop for `attempt > 0`. | | 5 | [MINOR] | `llm/client.go` | 271 | The content-length validation `cl > 0 && int64(len(body)) < cl` uses `resp.ContentLength` which is already parsed by `net/http` from the `Content-Length` header and returns -1 when absent. The check `cl > 0` is correct for this, but it is worth noting that `resp.ContentLength` will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem. | | 6 | [MINOR] | `llm/client_test.go` | 333 | `TestComplete_ContentLengthMismatch` does not verify that exactly 2 attempts were made (unlike `TestComplete_RetryOnBodyReadError`). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding `if attempts != 2 { t.Errorf(...) }`. | | 7 | [NIT] | `llm/client.go` | 86 | The loop bound `attempt < 2` with `attempt` starting at 0 is idiomatic but slightly harder to read than `maxRetries = 2` with a named constant, especially now that the same retry count (2) appears in both `Complete()` in `llm/client.go` and the outer loop in `cmd/review-bot/main.go`. Consider a named constant to avoid magic numbers and make the retry policy explicit. | ## Recommendation **REQUEST_CHANGES** — Do not merge. CI is failing, which is an automatic block. Additionally, before resubmitting: (1) Replace string-matching in `isRetryableError` with sentinel errors or a custom error type so the retry classification is robust and refactor-safe. (2) Add a nil-guard on `result` after the retry loop in `main.go` to prevent a nil pointer dereference when Complete succeeds but Parse fails on both attempts. (3) Investigate whether the production LLM proxy uses chunked transfer encoding — if so, the Content-Length validation in `doRequest` will never trigger and the transport-level retry layer adds no value for the stated 6% truncation failure mode. The parse-level retry in `main.go` may be the only effective layer. --- *Review by sonnet*  --- *Evaluated against db479d0f*

cmd/review-bot/main.go

						
				@@ -276,3 +291,4 @@

						break

					}

					slog.Info("review parsed", "verdict", result.Verdict, "findings", len(result.Findings))

sonnet-review-bot commented

First-time contributor

[MAJOR] result is declared as var result *review.ReviewResult before the loop. If llmClient.Complete succeeds on attempt 1 but review.ParseResponse fails on attempt 1, and then llmClient.Complete also fails on attempt 2, the code hits continue and the loop ends with result == nil. The code then falls through to slog.Info("review parsed", "verdict", result.Verdict, ...) which will panic with a nil pointer dereference. The os.Exit(1) on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate os.Exit(1), but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.

**[MAJOR]** `result` is declared as `var result *review.ReviewResult` before the loop. If `llmClient.Complete` succeeds on attempt 1 but `review.ParseResponse` fails on attempt 1, and then `llmClient.Complete` also fails on attempt 2, the code hits `continue` and the loop ends with `result == nil`. The code then falls through to `slog.Info("review parsed", "verdict", result.Verdict, ...)` which will panic with a nil pointer dereference. The `os.Exit(1)` on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate `os.Exit(1)`, but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.

llm/client.go

						
				@@ -83,0 +83,4 @@

						case ProviderAnthropic:

							result, err = c.completeAnthropic(ctx, messages)

						default:

							result, err = c.completeOpenAI(ctx, messages)

sonnet-review-bot commented

First-time contributor

[MAJOR] CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.

**[MAJOR]** CI is failing for all six review jobs. The verdict must be REQUEST_CHANGES regardless of code quality.

sonnet-review-bot commented

First-time contributor

[NIT] The loop bound attempt < 2 with attempt starting at 0 is idiomatic but slightly harder to read than maxRetries = 2 with a named constant, especially now that the same retry count (2) appears in both Complete() in llm/client.go and the outer loop in cmd/review-bot/main.go. Consider a named constant to avoid magic numbers and make the retry policy explicit.

**[NIT]** The loop bound `attempt < 2` with `attempt` starting at 0 is idiomatic but slightly harder to read than `maxRetries = 2` with a named constant, especially now that the same retry count (2) appears in both `Complete()` in `llm/client.go` and the outer loop in `cmd/review-bot/main.go`. Consider a named constant to avoid magic numbers and make the retry policy explicit.

llm/client.go

						
				@@ -83,0 +93,4 @@

						// Only retry on response body read errors (transient network issues).

						// Do not retry on context cancellation, status errors, or parse errors

						// that indicate a structural API problem.

						if !isRetryableError(err) {

sonnet-review-bot commented

First-time contributor

[MINOR] The sleep time.Sleep(500 * time.Millisecond) is guarded by attempt == 0 && ctx.Err() == nil, but since the loop runs from attempt = 0 to attempt < 2, when attempt == 1 (the second/last iteration) fails and is retryable, control falls through to return "", err without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider if attempt < 1 or restructuring to sleep at the top of the loop for attempt > 0.

**[MINOR]** The sleep `time.Sleep(500 * time.Millisecond)` is guarded by `attempt == 0 && ctx.Err() == nil`, but since the loop runs from `attempt = 0` to `attempt < 2`, when `attempt == 1` (the second/last iteration) fails and is retryable, control falls through to `return "", err` without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider `if attempt < 1` or restructuring to sleep at the top of the loop for `attempt > 0`.

llm/client.go

						
				@@ -83,1 +102,4 @@

							time.Sleep(500 * time.Millisecond)

						}

					}

sonnet-review-bot commented

First-time contributor

[MAJOR] isRetryableError classifies errors by substring-matching on err.Error(). This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in doRequest will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (var errBodyTruncated = errors.New(...)) or a custom error type implementing errors.Is/errors.As, then check with errors.Is(err, errBodyTruncated). This is especially important because isRetryableError is tested by constructing errors from raw strings (fmt.Errorf("%s", tt.err)), which means the tests would still pass even if the actual error-wrapping chain changed.

**[MAJOR]** `isRetryableError` classifies errors by substring-matching on `err.Error()`. This is fragile: any future error message that happens to contain 'read response' will be silently retried, and a refactor of the error message strings in `doRequest` will silently break the retry logic without a compile-time signal. The idiomatic Go approach is to use sentinel errors (`var errBodyTruncated = errors.New(...)`) or a custom error type implementing `errors.Is`/`errors.As`, then check with `errors.Is(err, errBodyTruncated)`. This is especially important because `isRetryableError` is tested by constructing errors from raw strings (`fmt.Errorf("%s", tt.err)`), which means the tests would still pass even if the actual error-wrapping chain changed.

llm/client.go

						
				@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)

						return "", fmt.Errorf("read response: %w", err)

sonnet-review-bot commented

First-time contributor

[MINOR] The content-length validation cl > 0 && int64(len(body)) < cl uses resp.ContentLength which is already parsed by net/http from the Content-Length header and returns -1 when absent. The check cl > 0 is correct for this, but it is worth noting that resp.ContentLength will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.

**[MINOR]** The content-length validation `cl > 0 && int64(len(body)) < cl` uses `resp.ContentLength` which is already parsed by `net/http` from the `Content-Length` header and returns -1 when absent. The check `cl > 0` is correct for this, but it is worth noting that `resp.ContentLength` will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.

llm/client_test.go

						
				@@ -298,0 +330,4 @@

					if err != nil {

						t.Fatalf("expected retry to succeed, got error: %v", err)

					}

					if got != "success" {

sonnet-review-bot commented

2026-05-08 02:20:05 +00:00

First-time contributor

[MINOR] TestComplete_ContentLengthMismatch does not verify that exactly 2 attempts were made (unlike TestComplete_RetryOnBodyReadError). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding if attempts != 2 { t.Errorf(...) }.

**[MINOR]** `TestComplete_ContentLengthMismatch` does not verify that exactly 2 attempts were made (unlike `TestComplete_RetryOnBodyReadError`). Without this check, the test would pass even if retry was not triggered (e.g., if the httptest server ignores the Content-Length override). Consider adding `if attempts != 2 { t.Errorf(...) }`.

security-review-bot requested review from security-review-bot 2026-05-08 02:20:05 +00:00

security-review-bot requested changes 2026-05-08 02:20:05 +00:00

security-review-bot left a comment

Security Review

Summary

The changes add robust retry logic and diagnostics; however, CI is failing and must be addressed before merge. From a security perspective, one minor concern is increased logging of LLM response previews which could leak content into logs.

Findings

#	Severity	File	Line	Finding
1	[MAJOR]	`cmd/review-bot/main.go`	254	CI is failing across multiple review jobs; per process rules, this requires addressing before merge.
2	[MINOR]	`review/parser.go`	41	On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.

Recommendation

REQUEST_CHANGES — Please fix the failing CI jobs before merging. Functionally, the retry and Content-Length validation are reasonable mitigations for transient truncation and do not introduce obvious security vulnerabilities. For defense-in-depth, consider limiting or redacting the 'Cleaned preview' included in error messages to avoid exposing LLM response content in logs, or emit it only at debug level.

Review by security

Evaluated against db479d0f

# Security Review ## Summary The changes add robust retry logic and diagnostics; however, CI is failing and must be addressed before merge. From a security perspective, one minor concern is increased logging of LLM response previews which could leak content into logs. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `cmd/review-bot/main.go` | 254 | CI is failing across multiple review jobs; per process rules, this requires addressing before merge. | | 2 | [MINOR] | `review/parser.go` | 41 | On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level. | ## Recommendation **REQUEST_CHANGES** — Please fix the failing CI jobs before merging. Functionally, the retry and Content-Length validation are reasonable mitigations for transient truncation and do not introduce obvious security vulnerabilities. For defense-in-depth, consider limiting or redacting the 'Cleaned preview' included in error messages to avoid exposing LLM response content in logs, or emit it only at debug level. --- *Review by security*  --- *Evaluated against db479d0f*

cmd/review-bot/main.go

						
				@@ -254,25 +254,41 @@ func main() {

						slog.Warn("context trimmed to fit budget", "trimmed", budgetResult.Trimmed)

security-review-bot commented

[MAJOR] CI is failing across multiple review jobs; per process rules, this requires addressing before merge.

**[MAJOR]** CI is failing across multiple review jobs; per process rules, this requires addressing before merge.

review/parser.go

						
				@@ -37,0 +38,4 @@

							cleanedLen := len(cleaned)

							preview := cleaned

							if len(preview) > 200 {

								preview = preview[:100] + "..." + preview[len(preview)-100:]

security-review-bot commented

2026-05-08 02:20:05 +00:00

[MINOR] On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.

**[MINOR]** On parse failures, the error now includes a 'Cleaned preview' of up to ~200 characters of the LLM response. If logs are exported to third-party systems or visible broadly, this can increase the risk of leaking potentially sensitive prompt/response content. Consider truncating further, redacting, or gating detailed previews behind a debug log level.

gpt-review-bot requested changes 2026-05-08 02:20:29 +00:00

gpt-review-bot left a comment

First-time contributor

Gpt Review

Summary

The changes add sensible retry logic for transient body read/truncation and improve diagnostics. However, CI is currently failing, which mandates requesting changes. There are also a few minor issues around retry messaging, context-aware sleeping, and brittle error classification.

Findings

#	Severity	File	Line	Finding
1	[MAJOR]	`CI`	0	CI is failing across multiple review jobs (OpenAI/Anthropic review checks). Per process, failing CI requires changes before merge.
2	[MINOR]	`cmd/review-bot/main.go`	264	The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.
3	[NIT]	`cmd/review-bot/main.go`	256	Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.
4	[MINOR]	`llm/client.go`	100	isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).
5	[NIT]	`llm/client.go`	275	The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.
6	[NIT]	`review/parser.go`	39	On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.

Recommendation

REQUEST_CHANGES — Because CI is failing, please address the CI issues before merging. The core approach (retrying on transient body read failures and validating Content-Length) looks sound and is covered by tests. To improve robustness and clarity: 1) In cmd/review-bot/main.go, avoid sleeping if the context is canceled before retry, and consider a more generic retry log message. 2) In llm/client.go, replace substring-based retry classification with sentinel errors and errors.Is checks (e.g., define ErrBodyLengthMismatch and wrap it via fmt.Errorf("%w: ...", ErrBodyLengthMismatch)). Continue retrying read failures; consider documenting that Content-Length mismatch detection may rarely be hit due to io.ReadAll behavior. 3) In review/parser.go, include the repaired unmarshal error (err2) in the returned diagnostics for more actionable insight. Once CI is green and these minor improvements are addressed, the PR will be in good shape.

Review by gpt

Evaluated against db479d0f

# Gpt Review ## Summary The changes add sensible retry logic for transient body read/truncation and improve diagnostics. However, CI is currently failing, which mandates requesting changes. There are also a few minor issues around retry messaging, context-aware sleeping, and brittle error classification. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `CI` | 0 | CI is failing across multiple review jobs (OpenAI/Anthropic review checks). Per process, failing CI requires changes before merge. | | 2 | [MINOR] | `cmd/review-bot/main.go` | 264 | The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done. | | 3 | [NIT] | `cmd/review-bot/main.go` | 256 | Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity. | | 4 | [MINOR] | `llm/client.go` | 100 | isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is). | | 5 | [NIT] | `llm/client.go` | 275 | The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path. | | 6 | [NIT] | `review/parser.go` | 39 | On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics. | ## Recommendation **REQUEST_CHANGES** — Because CI is failing, please address the CI issues before merging. The core approach (retrying on transient body read failures and validating Content-Length) looks sound and is covered by tests. To improve robustness and clarity: 1) In cmd/review-bot/main.go, avoid sleeping if the context is canceled before retry, and consider a more generic retry log message. 2) In llm/client.go, replace substring-based retry classification with sentinel errors and errors.Is checks (e.g., define ErrBodyLengthMismatch and wrap it via fmt.Errorf("%w: ...", ErrBodyLengthMismatch)). Continue retrying read failures; consider documenting that Content-Length mismatch detection may rarely be hit due to io.ReadAll behavior. 3) In review/parser.go, include the repaired unmarshal error (err2) in the returned diagnostics for more actionable insight. Once CI is green and these minor improvements are addressed, the PR will be in good shape. --- *Review by gpt*  --- *Evaluated against db479d0f*

cmd/review-bot/main.go

						
				@@ -254,25 +254,41 @@ func main() {

						slog.Warn("context trimmed to fit budget", "trimmed", budgetResult.Trimmed)

					}

gpt-review-bot commented

First-time contributor

[NIT] Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.

**[NIT]** Log message says "retrying LLM request after parse failure" even when the previous failure might be a request error. Consider a more generic message or include the previous error cause for clarity.

cmd/review-bot/main.go

						
				@@ -267,3 +264,1 @@

						os.Exit(1)

					}

					slog.Info("LLM response received", "bytes", len(response))

					var response string

gpt-review-bot commented

First-time contributor

[MINOR] The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.

**[MINOR]** The retry backoff on parse failure uses time.Sleep(1s) without checking ctx.Err(). This can delay shutdown by up to 1s after context cancellation. Consider skipping the sleep when the context is done.

llm/client.go

						
				@@ -83,0 +97,4 @@

							return "", err

						}

						if attempt == 0 && ctx.Err() == nil {

gpt-review-bot commented

First-time contributor

[MINOR] isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).

**[MINOR]** isRetryableError relies on substring matching of error messages ("read response", "body length mismatch"). This is brittle. Prefer sentinel errors and errors.Is checks (e.g., wrap a package-level ErrBodyLengthMismatch with %w and check with errors.Is).

llm/client.go

						
				@@ -232,2 +272,4 @@

					}

					// Validate body length against Content-Length header when present.

					// A mismatch indicates the response was truncated in transit.

gpt-review-bot commented

First-time contributor

[NIT] The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.

**[NIT]** The Content-Length mismatch check may be rarely reached because io.ReadAll typically returns an error (e.g., io.ErrUnexpectedEOF) on short reads. It's harmless but may not trigger often; consider documenting this or relying primarily on the read error retry path.

review/parser.go

						
				@@ -37,0 +36,4 @@

							// Include diagnostic info: lengths help identify truncation

							rawLen := len(response)

							cleanedLen := len(cleaned)

							preview := cleaned

gpt-review-bot commented

First-time contributor

[NIT] On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.

**[NIT]** On parse failure after attempting repair, the returned error wraps the first unmarshal error (err) but not err2 from the repaired attempt. Including both or wrapping err2 may improve diagnostics.

gpt-review-bot requested changes 2026-05-08 02:20:32 +00:00

gpt-review-bot left a comment

First-time contributor

Gpt5-mini Review

Summary

The retry-on-truncation approach is a sound defense-in-depth idea, and tests were added to exercise it; however CI is failing and there is a correctness/regression risk where API errors could be retried due to the Content-Length check ordering. Also the retry detection uses brittle string matching instead of wrapped/sentinel errors.

Findings

#	Severity	File	Line	Finding
1	[MAJOR]	`CI`	0	The PR CI is failing (multiple review jobs report 'Failing after ~15-17s'). Per repository policy, CI must be green — this blocks approval until investigated and fixed.
2	[MAJOR]	`llm/client.go`	271	doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.
3	[MINOR]	`llm/client.go`	75	isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.
4	[MINOR]	`llm/client_test.go`	296	A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.
5	[NIT]	`review/parser.go`	33	The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs.

Recommendation

REQUEST_CHANGES — Fix the CI failures first and re-run tests. Then address the retry semantics in llm/client.go: either (A) move the HTTP status-code check before the Content-Length validation so non-2xx API errors are returned as API errors (and not classified as retryable), or (B) ensure doRequest returns an error class that distinguishes 'API error' vs 'transient transport/truncation' (for example, return a typed or sentinel error for body-length mismatch). Replace fragile substring matching in isRetryableError with checks against wrapped sentinel/typed errors (errors.Is / errors.As) or explicit error types so retry decisions are robust. Add a unit test that verifies a non-2xx response with a mismatched Content-Length does not trigger retries. After making these changes and confirming tests/CI pass, the PR can be approved.

Review by gpt5-mini

Evaluated against db479d0f

# Gpt5-mini Review ## Summary The retry-on-truncation approach is a sound defense-in-depth idea, and tests were added to exercise it; however CI is failing and there is a correctness/regression risk where API errors could be retried due to the Content-Length check ordering. Also the retry detection uses brittle string matching instead of wrapped/sentinel errors. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `CI` | 0 | The PR CI is failing (multiple review jobs report 'Failing after ~15-17s'). Per repository policy, CI must be green — this blocks approval until investigated and fixed. | | 2 | [MAJOR] | `llm/client.go` | 271 | doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors. | | 3 | [MINOR] | `llm/client.go` | 75 | isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As. | | 4 | [MINOR] | `llm/client_test.go` | 296 | A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried. | | 5 | [NIT] | `review/parser.go` | 33 | The improved parse error diagnostics remove printing the entire raw response (good for privacy) and provide lengths + preview. Consider also including the original parse error's wrapped text via %v in the diagnostic so the repair attempt failure reason is clearly visible in logs. | ## Recommendation **REQUEST_CHANGES** — Fix the CI failures first and re-run tests. Then address the retry semantics in llm/client.go: either (A) move the HTTP status-code check before the Content-Length validation so non-2xx API errors are returned as API errors (and not classified as retryable), or (B) ensure doRequest returns an error class that distinguishes 'API error' vs 'transient transport/truncation' (for example, return a typed or sentinel error for body-length mismatch). Replace fragile substring matching in isRetryableError with checks against wrapped sentinel/typed errors (errors.Is / errors.As) or explicit error types so retry decisions are robust. Add a unit test that verifies a non-2xx response with a mismatched Content-Length does not trigger retries. After making these changes and confirming tests/CI pass, the PR can be approved. --- *Review by gpt5-mini*  --- *Evaluated against db479d0f*

llm/client.go

						
				@@ -75,12 +75,52 @@ type Message struct {

				// Complete sends a chat completion request and returns the assistant's response content.

gpt-review-bot commented

First-time contributor

[MINOR] isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.

**[MINOR]** isRetryableError detects retryable failures by substring-matching on error strings (e.g. 'read response', 'body length mismatch'). Relying on textual matching of formatted error messages is brittle and fragile to future wording changes; it is better to return and detect wrapped sentinel errors or typed errors so callers can use errors.Is / errors.As.

llm/client.go

						
				@@ -231,6 +271,12 @@ func (c *Client) doRequest(req *http.Request, parse func([]byte) (string, error)

						return "", fmt.Errorf("read response: %w", err)

gpt-review-bot commented

First-time contributor

[MAJOR] doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.

**[MAJOR]** doRequest performs Content-Length mismatch validation before checking resp.StatusCode. If an API error response includes a Content-Length that doesn't match the received bytes, doRequest will return a 'body length mismatch' error which is considered retryable by Complete(). This contradicts the stated intent 'Do not retry on API errors' and could cause retries for genuine API errors.

llm/client_test.go

						
				@@ -295,3 +296,131 @@ func TestWithProvider(t *testing.T) {

						t.Errorf("expected provider anthropic, got %s", client.provider)

gpt-review-bot commented

First-time contributor

[MINOR] A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.

**[MINOR]** A number of tests were added which simulate truncation and content-length mismatch. These are valuable, but please ensure the tests also cover the regression scenario where a non-2xx API error response contains a mismatched Content-Length so we assert that API errors are not retried.

review/parser.go

						
				@@ -33,7 +33,14 @@ func ParseResponse(response string) (*ReviewResult, error) {

						// Try to repair before giving up.

gpt-review-bot commented