feat(github): implement PRReader + FileReader client (#80) #93
@@ -133,10 +133,23 @@ func NewClient(token, baseURL string, opts ...ClientOption) *Client {
|
||||
|
||||
|
|
||||
// SetHTTPClient sets the underlying HTTP client used for requests.
|
||||
// This is intended for testing to inject mock transports.
|
||||
// Passing nil will restore the default client with a 30s timeout.
|
||||
// Passing nil restores the default client (30s timeout + auth-stripping
|
||||
|
gpt-review-bot
commented
[MINOR] Retry-After is parsed only as delta-seconds; per RFC 7231 it may also be an HTTP-date. Consider falling back to parsing an HTTP-date when Atoi fails to honor server guidance more robustly. **[MINOR]** Retry-After is parsed only as delta-seconds; per RFC 7231 it may also be an HTTP-date. Consider falling back to parsing an HTTP-date when Atoi fails to honor server guidance more robustly.
|
||||
// CheckRedirect policy matching NewClient).
|
||||
|
gpt-review-bot
commented
[MINOR] Non-HTTPS rejection checks strings.HasPrefix(url, "https://"). This is case-sensitive and string-based; safer to parse with net/url and check URL.Scheme case-insensitively (e.g., u.Scheme == "https"). **[MINOR]** Non-HTTPS rejection checks strings.HasPrefix(url, "https://"). This is case-sensitive and string-based; safer to parse with net/url and check URL.Scheme case-insensitively (e.g., u.Scheme == "https").
|
||||
func (c *Client) SetHTTPClient(hc *http.Client) {
|
||||
if hc == nil {
|
||||
hc = &http.Client{Timeout: 30 * time.Second}
|
||||
hc = &http.Client{
|
||||
Timeout: 30 * time.Second,
|
||||
CheckRedirect: func(req *http.Request, via []*http.Request) error {
|
||||
if len(via) >= 10 {
|
||||
|
sonnet-review-bot
commented
[NIT] The **[NIT]** The `doRequest` method signature uses `url` as a parameter name, which shadows the imported `net/url` package. While `net/url` is not imported in `client.go` (it's only in `pr.go` and `files.go`), this naming is a minor style concern since `url` as a local variable name is idiomatic but shadows the package name if ever imported. Not a bug here, just worth noting.
|
||||
return fmt.Errorf("stopped after 10 redirects")
|
||||
}
|
||||
|
[MINOR] SetHTTPClient allows replacing the underlying http.Client without enforcing a safe CheckRedirect policy, which could cause Authorization to be forwarded to untrusted hosts on redirects. The docstring warns about this, but adding a runtime safeguard (e.g., rejecting clients with nil CheckRedirect or wrapping it) would further reduce misuse risk. **[MINOR]** SetHTTPClient allows replacing the underlying http.Client without enforcing a safe CheckRedirect policy, which could cause Authorization to be forwarded to untrusted hosts on redirects. The docstring warns about this, but adding a runtime safeguard (e.g., rejecting clients with nil CheckRedirect or wrapping it) would further reduce misuse risk.
|
||||
prev := via[len(via)-1]
|
||||
|
gpt-review-bot
commented
[NIT] Authorization scheme is hardcoded to "Bearer". While modern GitHub tokens support this, some environments still use the "token" scheme. Consider documenting the expected token type or allowing the scheme to be configurable if needed. **[NIT]** Authorization scheme is hardcoded to "Bearer". While modern GitHub tokens support this, some environments still use the "token" scheme. Consider documenting the expected token type or allowing the scheme to be configurable if needed.
|
||||
if req.URL.Host != prev.URL.Host || (prev.URL.Scheme == "https" && req.URL.Scheme == "http") {
|
||||
req.Header.Del("Authorization")
|
||||
|
gpt-review-bot
commented
[MINOR] Authorization header always uses the "Bearer" scheme. Some GitHub token types (classic PAT) historically use the "token" scheme. Consider making the auth scheme configurable or auto-detectable to maximize compatibility. **[MINOR]** Authorization header always uses the "Bearer" scheme. Some GitHub token types (classic PAT) historically use the "token" scheme. Consider making the auth scheme configurable or auto-detectable to maximize compatibility.
|
||||
}
|
||||
|
[MINOR] Authorization header is sent to whatever baseURL is configured. If baseURL can be influenced by untrusted input, this could leak tokens to an attacker-controlled host (SSRF/token exfiltration). Ensure baseURL is treated as trusted configuration and consider allowlisting expected hosts at a higher layer. **[MINOR]** Authorization header is sent to whatever baseURL is configured. If baseURL can be influenced by untrusted input, this could leak tokens to an attacker-controlled host (SSRF/token exfiltration). Ensure baseURL is treated as trusted configuration and consider allowlisting expected hosts at a higher layer.
|
||||
return nil
|
||||
|
sonnet-review-bot
commented
[MINOR] The CheckRedirect lambda is duplicated verbatim in both NewClient and SetHTTPClient(nil). Extract it to a package-level function (e.g., **[MINOR]** The CheckRedirect lambda is duplicated verbatim in both NewClient and SetHTTPClient(nil). Extract it to a package-level function (e.g., `defaultCheckRedirect`) to eliminate the duplication and ensure both code paths stay in sync when the policy changes.
|
||||
},
|
||||
|
gpt-review-bot
commented
[NIT] Retry-After parsing only handles delta-seconds via Atoi. RFC 7231 allows an HTTP-date format; optionally support parsing HTTP-date to fully respect server guidance. **[NIT]** Retry-After parsing only handles delta-seconds via Atoi. RFC 7231 allows an HTTP-date format; optionally support parsing HTTP-date to fully respect server guidance.
|
||||
}
|
||||
}
|
||||
|
sonnet-review-bot
commented
[MINOR] When **[MINOR]** When `c.http.Do(req)` returns an error (network failure, context cancellation), the function returns immediately without retrying. For transient network errors, a retry could be valuable. The current behavior is reasonable for the stated scope (only retry on 429), but the comment says 'It respects the Retry-After header when present' without mentioning the no-retry-on-transport-error behavior. This is a documentation gap rather than a bug.
|
||||
c.httpClient = hc
|
||||
}
|
||||
|
sonnet-review-bot
commented
[MINOR] The security check **[MINOR]** The security check `if !c.allowInsecureHTTP && req.URL.Scheme != "https"` is performed inside the retry loop, meaning it will fail on every retry attempt rather than being checked once before the loop starts. Since the URL doesn't change between retries, this is wasteful and the error message is slightly misleading (it mentions `req.URL.Host` but the real issue is the scheme). Moving the check before the retry loop or to `NewClient` would be cleaner.
|
||||
@@ -199,6 +212,9 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
|
||||
return nil, fmt.Errorf("create request: %w", err)
|
||||
}
|
||||
if c.token != "" {
|
||||
// Bearer is the OAuth2 standard and is accepted by GitHub for both
|
||||
|
gpt-review-bot
commented
[NIT] Calling timer.Stop() after the timer has already fired is unnecessary. It’s harmless, but can be removed for clarity; only the ctx.Done() path needs Stop before return. **[NIT]** Calling timer.Stop() after the timer has already fired is unnecessary. It’s harmless, but can be removed for clarity; only the ctx.Done() path needs Stop before return.
|
||||
// classic PATs and fine-grained tokens. The alternative "token" scheme
|
||||
// is GitHub-specific and offers no additional compatibility.
|
||||
|
sonnet-review-bot
commented
[MINOR] Timer leak on the happy path of the retry loop. When **[MINOR]** Timer leak on the happy path of the retry loop. When `timer.C` fires normally (case `<-timer.C`), `timer.Stop()` is never called. The timer has already fired so this is not a resource leak in the traditional sense, but idiomatic Go calls `timer.Stop()` on all branches to release the internal runtime resources promptly. The fix: `case <-timer.C: timer.Stop()` or use `defer timer.Stop()` before the select.
|
||||
req.Header.Set("Authorization", "Bearer "+c.token)
|
||||
}
|
||||
req.Header.Set("User-Agent", userAgent)
|
||||
|
sonnet-review-bot
commented
[NIT] The timer pattern **[NIT]** The timer pattern `timer.Stop() // no-op after fire; kept for symmetry` in the `case <-timer.C` branch is slightly misleading. `Stop()` after a channel receive is indeed a no-op since the channel has already fired, but the comment could be clearer: `timer.Stop()` after `<-timer.C` does nothing and could simply be omitted. This is a readability nit, not a correctness issue.
|
||||
@@ -232,7 +248,8 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
|
||||
|
||||
// Retry on 429 rate limit
|
||||
if resp.StatusCode == http.StatusTooManyRequests && attempt < maxAttempts-1 {
|
||||
// Check for Retry-After header and override backoff if present
|
||||
// Check for Retry-After header and override backoff if present.
|
||||
// Supports both integer seconds (common) and HTTP-date format (RFC 7231).
|
||||
if ra := resp.Header.Get("Retry-After"); ra != "" {
|
||||
if seconds, err := strconv.Atoi(ra); err == nil && seconds > 0 {
|
||||
delay := time.Duration(seconds) * time.Second
|
||||
@@ -242,6 +259,17 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
|
||||
if attempt < len(backoff) {
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `int64` cast in `if int64(len(body)) >= maxResponseBytes` is unnecessary since `maxResponseBytes` is an untyped constant and `len()` returns `int`. Both sides of the comparison are `int`-typed. The cast is harmless but adds noise.
|
||||
backoff[attempt] = delay
|
||||
}
|
||||
} else if t, err := http.ParseTime(ra); err == nil {
|
||||
delay := time.Until(t)
|
||||
|
sonnet-review-bot
commented
[NIT] In **[NIT]** In `doRequest`, after the `handleResponse` call, there's a check `if resp.StatusCode == http.StatusTooManyRequests` but `resp` could theoretically have been closed by `handleResponse`. The body is closed, but the `resp.StatusCode` field is still accessible on the struct. This is correct and safe in Go's net/http — just worth being aware of.
|
||||
if delay < 0 {
|
||||
delay = 0
|
||||
}
|
||||
if delay > maxRetryAfter {
|
||||
delay = maxRetryAfter
|
||||
}
|
||||
if attempt < len(backoff) {
|
||||
backoff[attempt] = delay
|
||||
}
|
||||
}
|
||||
}
|
||||
continue
|
||||
|
||||
@@ -263,6 +263,84 @@ func TestDoRequest_RetryAfterDoesNotMutateBackoff(t *testing.T) {
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoRequest_429RetryAfterHTTPDate(t *testing.T) {
|
||||
attempts := 0
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
if attempts == 1 {
|
||||
// Use HTTP-date format (RFC 7231) — a time 2 seconds in the future.
|
||||
future := time.Now().Add(2 * time.Second).UTC()
|
||||
w.Header().Set("Retry-After", future.Format(http.TimeFormat))
|
||||
w.WriteHeader(429)
|
||||
w.Write([]byte(`{"message":"rate limit"}`))
|
||||
return
|
||||
}
|
||||
w.WriteHeader(200)
|
||||
w.Write([]byte(`{"ok":true}`))
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
c := NewClient("token", srv.URL, AllowInsecureHTTP())
|
||||
c.SetHTTPClient(srv.Client())
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `TestDoRequest_LimitsResponseBody` tests a constant value rather than actual behavior. The comment acknowledges this limitation. This is acceptable as a documentation-style test, but it adds no real safety guarantee — if `maxResponseBytes` is set correctly but the `io.LimitReader` call is removed, the test would still pass. Consider removing it or replacing with a test that actually sends a response exceeding the limit.
|
||||
c.SetRetryBackoff([]time.Duration{1 * time.Millisecond, 1 * time.Millisecond})
|
||||
|
||||
start := time.Now()
|
||||
body, err := c.doGet(context.Background(), srv.URL+"/test")
|
||||
elapsed := time.Since(start)
|
||||
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %v", err)
|
||||
}
|
||||
if string(body) != `{"ok":true}` {
|
||||
t.Errorf("unexpected body: %s", body)
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `TestDoRequest_LimitsResponseBody` doesn't actually test the limit behavior — it only checks the constant value. The comment acknowledges this. This is a weak test; consider removing it or replacing it with an actual test using a small limit (e.g., setting `maxResponseBytes` to a small value via a test helper, or accepting that this particular boundary isn't testable without refactoring).
|
||||
}
|
||||
if attempts != 2 {
|
||||
t.Errorf("expected 2 attempts, got %d", attempts)
|
||||
}
|
||||
// HTTP-date was ~2s in the future; by the time client processes it,
|
||||
// time.Until gives ~1-2s. Verify it's meaningfully delayed (not instant).
|
||||
if elapsed < 500*time.Millisecond {
|
||||
t.Errorf("expected meaningful delay from HTTP-date Retry-After, got %v", elapsed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoRequest_429RetryAfterHTTPDateInPast(t *testing.T) {
|
||||
attempts := 0
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
if attempts == 1 {
|
||||
// Use a time in the past — should result in zero/immediate retry.
|
||||
past := time.Now().Add(-10 * time.Second).UTC()
|
||||
w.Header().Set("Retry-After", past.Format(http.TimeFormat))
|
||||
w.WriteHeader(429)
|
||||
w.Write([]byte(`{"message":"rate limit"}`))
|
||||
return
|
||||
}
|
||||
w.WriteHeader(200)
|
||||
w.Write([]byte(`{"ok":true}`))
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
c := NewClient("token", srv.URL, AllowInsecureHTTP())
|
||||
c.SetHTTPClient(srv.Client())
|
||||
c.SetRetryBackoff([]time.Duration{5 * time.Second, 5 * time.Second})
|
||||
|
||||
start := time.Now()
|
||||
_, err := c.doGet(context.Background(), srv.URL+"/test")
|
||||
elapsed := time.Since(start)
|
||||
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %v", err)
|
||||
}
|
||||
if attempts != 2 {
|
||||
t.Errorf("expected 2 attempts, got %d", attempts)
|
||||
}
|
||||
// Past date should override the 5s backoff to ~0
|
||||
if elapsed > 500*time.Millisecond {
|
||||
t.Errorf("expected near-instant retry for past HTTP-date, got %v", elapsed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoRequest_SetsUserAgentHeader(t *testing.T) {
|
||||
var gotUA string
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
@@ -392,4 +470,7 @@ func TestSetHTTPClient_NilRestoresDefault(t *testing.T) {
|
||||
if c.httpClient.Timeout != 30*time.Second {
|
||||
t.Errorf("expected 30s timeout, got %v", c.httpClient.Timeout)
|
||||
}
|
||||
if c.httpClient.CheckRedirect == nil {
|
||||
t.Fatal("expected CheckRedirect policy after SetHTTPClient(nil)")
|
||||
}
|
||||
}
|
||||
|
||||
@@ -22,6 +22,10 @@ func (c *Client) GetFileContent(ctx context.Context, owner, repo, path, ref stri
|
||||
// If the path points to a single file (not a directory), the API returns
|
||||
// a JSON object instead of an array; this is handled by returning a
|
||||
// single-element slice.
|
||||
//
|
||||
// Note: dot-segments ("." and "..") in the path are silently removed to
|
||||
// prevent path traversal. This means a path like "foo/../bar" resolves
|
||||
// to "foo/bar" rather than "bar".
|
||||
func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]vcs.ContentEntry, error) {
|
||||
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
|
||||
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
|
||||
|
||||
@@ -123,6 +123,10 @@ func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, nu
|
||||
|
||||
// GetFileContentAtRef fetches a file at a specific ref from a repo.
|
||||
// If ref is empty, the query parameter is omitted (uses default branch).
|
||||
//
|
||||
// Note: dot-segments ("." and "..") in the path are silently removed to
|
||||
|
sonnet-review-bot
commented
[NIT] The **[NIT]** The `GetFileContentAtRef` method is placed in `pr.go` but it's also the implementation delegate for `GetFileContent` in `files.go`. Conceptually it belongs in `files.go` alongside the other file-related methods. The split works because Go doesn't care which file a method is in, but for navigability (per the `style.md` pattern of organizing by concept) moving it to `files.go` would be cleaner.
|
||||
// prevent path traversal. This means a path like "foo/../bar" resolves
|
||||
// to "foo/bar" rather than "bar".
|
||||
func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) (string, error) {
|
||||
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
|
||||
|
sonnet-review-bot
commented
[NIT] The **[NIT]** The `GetPullRequestFiles` comment says 'Returns nil (not an empty slice) when the PR has no changed files' but if the first page returns an empty array, `allFiles` remains nil — this is correct. However the GitHub API actually returns an empty array for PRs with 0 changed files (valid edge case), so the nil vs empty distinction in the doc comment is accurate but subtle. No code change needed.
|
||||
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
|
||||
|
||||
[NIT] Calling timer.Stop() after the timer has already fired is unnecessary. You could simplify the delay logic with select { case <-time.After(delay): case <-ctx.Done(): } or only Stop in the ctx.Done branch.