fix: retry on transient LLM response body truncation

Addresses intermittent 'unexpected end of JSON input' failures where the LLM response body is truncated in transit between the proxy and client.

Root cause: network-level truncation where io.ReadAll returns partial data (observed in 3/50 CI runs through HAI proxy). The response body reading was already using io.ReadAll correctly, but transient network issues between the proxy and client can still cause partial reads.

Changes:
- Add Content-Length validation in doRequest: detect when fewer bytes arrive than the server declared, triggering a retry
- Add retry logic in Complete: retries once on retryable errors (body read failures, content-length mismatches) with a 500ms backoff
- Add parse-level retry in main: if ParseResponse fails, re-requests from the LLM once before giving up (defensive, since retries always succeed per issue evidence)
- Improve ParseResponse error diagnostics: log raw vs cleaned lengths and a preview of the cleaned content to aid future debugging

Does NOT retry on API errors (4xx/5xx) or structural issues; only transient body read problems are retried.

Closes #47
2026-05-07 00:44:32 -07:00
parent cabbb5a55a
commit db479d0ff4
4 changed files with 216 additions and 18 deletions
@@ -33,7 +33,14 @@ func ParseResponse(response string) (*ReviewResult, error) {
 		// Try to repair before giving up.
 		repaired := repairJSON(cleaned)
 		if err2 := json.Unmarshal([]byte(repaired), &result); err2 != nil {
-			return nil, fmt.Errorf("parse LLM response as JSON: %w\nRaw response: %s", err, response)
+			// Include diagnostic info: lengths help identify truncation
+			rawLen := len(response)
+			cleanedLen := len(cleaned)
+			preview := cleaned
+			if len(preview) > 200 {
+				preview = preview[:100] + "..." + preview[len(preview)-100:]
+			}
+			return nil, fmt.Errorf("parse LLM response as JSON: %w\nRaw length: %d, cleaned length: %d\nCleaned preview: %s", err, rawLen, cleanedLen, preview)
 		}
 	}