[MINOR] In getDeploymentURL, the token is fetched before acquiring the write lock (correct to avoid deadlock with getToken's own lock), but then fetchDeployments is called while holding the write lock AND passes the token in. If fetchDeployments takes a long time (slow network), all readers are blocked for the duration. This is acceptable for a CLI tool but worth noting — in a high-concurrency server scenario a different approach would be needed.
[MINOR] The field http *http.Client shadows the net/http package name within the struct's method bodies. This is valid Go but can cause confusion — a reader seeing c.http.Do(req) has to remember it's a field, not the package. The existing Client in client.go has the same pattern, so this is consistent with the codebase, but renaming to httpClient would be cleaner.
[NIT] CompleteAnthropic and CompleteOpenAI are public methods on AICoreClient. Since the intended consumer is only completeAICore in client.go (internal routing), exporting these gives callers the ability to bypass the routing logic (calling CompleteOpenAI with an Anthropic model name, for example). Consider making them unexported (completeAnthropic, completeOpenAI) since they're not part of any documented public API.
[MINOR] Deployment cache has no TTL or invalidation mechanism. If a deployment goes from RUNNING to STOPPED (or a new deployment is added), the cached entry will be used until the process restarts. Consider adding a deployment cache TTL (e.g., 5-10 minutes) similar to the token refresh guard, or at least document this limitation. This mirrors the concern raised about the deployment URL being cached indefinitely once fetched.
[NIT] The field name http shadows the imported net/http package name within the struct. While Go allows this (the package is still accessible by its full import path), it's a subtle naming collision. The existing llm/client.go already uses the same pattern so this is consistent with the codebase, but worth noting as a readability concern.
[MINOR] If a model has multiple RUNNING deployments (as the test demonstrates for 'gpt-5' with deploy-456 STOPPED and deploy-789 RUNNING), the final write wins in the loop since the map is overwritten. In the test, deploy-456 is STOPPED so it is filtered out and deploy-789 wins correctly. However, if two deployments for the same model name are both RUNNING, the result is nondeterministic (last one parsed wins). This edge case should either be documented or handled explicitly (e.g., pick the first RUNNING, or error).
[NIT] The variable evaluatedSHA is assigned from pr.Head.Sha here, but pr.Head.Sha was already captured earlier in the function (around line 277) for the CI status check and again for the commit footer. This is fine functionally, but a brief comment or colocation with the earlier SHA usage might make the relationship clearer to future readers. Very minor readability nit.
[NIT] In CompleteAnthropic, getDeploymentURL and then getToken are called sequentially. getDeploymentURL internally already calls getToken (to fetch deployments), so the second explicit getToken call will typically be a cache hit. This is correct and safe, but a comment explaining the double-call or refactoring to return the token from getDeploymentURL would improve readability.
[MINOR] fetchDeployments has an unnecessary blank line after the function signature (func (c *AICoreClient) fetchDeployments(...) error {\n\n). Cosmetic but inconsistent with the rest of the file style.
[MINOR] Deployments are fetched once and cached forever with no TTL or invalidation mechanism. If a deployment goes from RUNNING to STOPPED (or a new deployment is added) between reviews, the cached URL will remain stale until the process restarts. Given this tool runs as a short-lived CI process this is acceptable, but worth a comment explaining the intentional lack of cache expiry.
[MINOR] The deployment.ID field is stored but never read — it's populated in fetchDeployments but getDeploymentURL only returns d.URL. If the ID is truly unused, consider removing it to avoid dead code. If it's intended for future use (e.g., logging), add a comment.
[NIT] TestAICoreClient_TokenExpiry ignores the error return from getToken on lines 390 and 396 (token1, _ := and token2, _ :=). Tests should check errors or use t.Fatal to avoid masking failures.
[NIT] TestAICoreClient_TokenExpiry generates token strings with string(rune('0'+call)) which works for single-digit call counts (1, 2) but would produce incorrect results for call >= 10 (non-digit runes). Since the test only makes 2 calls this is safe, but fmt.Sprintf("token-%d", call) would be clearer and more robust.
[MINOR] The sleep time.Sleep(500 * time.Millisecond) is guarded by attempt == 0 && ctx.Err() == nil, but since the loop runs from attempt = 0 to attempt < 2, when attempt == 1 (the second/last iteration) fails and is retryable, control falls through to return "", err without sleeping. This is fine, but the condition comment says 'brief pause before retry' — actually the sleep only fires before the second attempt, not after it. The logic is correct but the comment/guard could be clearer. More importantly: if a future developer extends the loop to 3 attempts, the sleep will only fire once instead of between each attempt. Consider if attempt < 1 or restructuring to sleep at the top of the loop for attempt > 0.
[MAJOR] result is declared as var result *review.ReviewResult before the loop. If llmClient.Complete succeeds on attempt 1 but review.ParseResponse fails on attempt 1, and then llmClient.Complete also fails on attempt 2, the code hits continue and the loop ends with result == nil. The code then falls through to slog.Info("review parsed", "verdict", result.Verdict, ...) which will panic with a nil pointer dereference. The os.Exit(1) on the second Complete failure is only reached when attempt==2 and Complete itself errors; if Complete succeeds but Parse fails on attempt 2, there is a separate os.Exit(1), but if Complete fails on attempt 2, the loop exits normally with a nil result. This needs a guard after the loop.
[MINOR] The content-length validation cl > 0 && int64(len(body)) < cl uses resp.ContentLength which is already parsed by net/http from the Content-Length header and returns -1 when absent. The check cl > 0 is correct for this, but it is worth noting that resp.ContentLength will be -1 for chunked transfer encoding responses (which many LLM APIs use). This means the check will never fire for chunked responses, which is likely the actual transport path in production — the HAI proxy may be using chunked encoding. The fix addresses the symptom (Content-Length mismatch) but may not cover the real production scenario. This is a correctness concern for the stated problem.