feat(github): implement PRReader + FileReader client (#80) #93

Closed
rodin wants to merge 16 commits from review-bot-issue-80 into feature/github-support
4 changed files with 120 additions and 3 deletions
Showing only changes of commit 1fcc0b738a - Show all commits
+31 -3
View File
50
@@ -133,10 +133,23 @@ func NewClient(token, baseURL string, opts ...ClientOption) *Client {
Outdated
Review

[NIT] Calling timer.Stop() after the timer has already fired is unnecessary. You could simplify the delay logic with select { case <-time.After(delay): case <-ctx.Done(): } or only Stop in the ctx.Done branch.

**[NIT]** Calling timer.Stop() after the timer has already fired is unnecessary. You could simplify the delay logic with select { case <-time.After(delay): case <-ctx.Done(): } or only Stop in the ctx.Done branch.
// SetHTTPClient sets the underlying HTTP client used for requests.
// This is intended for testing to inject mock transports.
// Passing nil will restore the default client with a 30s timeout.
// Passing nil restores the default client (30s timeout + auth-stripping
Outdated
Review

[MINOR] Retry-After is parsed only as delta-seconds; per RFC 7231 it may also be an HTTP-date. Consider falling back to parsing an HTTP-date when Atoi fails to honor server guidance more robustly.

**[MINOR]** Retry-After is parsed only as delta-seconds; per RFC 7231 it may also be an HTTP-date. Consider falling back to parsing an HTTP-date when Atoi fails to honor server guidance more robustly.
// CheckRedirect policy matching NewClient).
Outdated
Review

[MINOR] Non-HTTPS rejection checks strings.HasPrefix(url, "https://"). This is case-sensitive and string-based; safer to parse with net/url and check URL.Scheme case-insensitively (e.g., u.Scheme == "https").

**[MINOR]** Non-HTTPS rejection checks strings.HasPrefix(url, "https://"). This is case-sensitive and string-based; safer to parse with net/url and check URL.Scheme case-insensitively (e.g., u.Scheme == "https").
func (c *Client) SetHTTPClient(hc *http.Client) {
if hc == nil {
hc = &http.Client{Timeout: 30 * time.Second}
hc = &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
if len(via) >= 10 {
Outdated
Review

[NIT] The doRequest method signature uses url as a parameter name, which shadows the imported net/url package. While net/url is not imported in client.go (it's only in pr.go and files.go), this naming is a minor style concern since url as a local variable name is idiomatic but shadows the package name if ever imported. Not a bug here, just worth noting.

**[NIT]** The `doRequest` method signature uses `url` as a parameter name, which shadows the imported `net/url` package. While `net/url` is not imported in `client.go` (it's only in `pr.go` and `files.go`), this naming is a minor style concern since `url` as a local variable name is idiomatic but shadows the package name if ever imported. Not a bug here, just worth noting.
return fmt.Errorf("stopped after 10 redirects")
}
Review

[MINOR] SetHTTPClient allows replacing the underlying http.Client without enforcing a safe CheckRedirect policy, which could cause Authorization to be forwarded to untrusted hosts on redirects. The docstring warns about this, but adding a runtime safeguard (e.g., rejecting clients with nil CheckRedirect or wrapping it) would further reduce misuse risk.

**[MINOR]** SetHTTPClient allows replacing the underlying http.Client without enforcing a safe CheckRedirect policy, which could cause Authorization to be forwarded to untrusted hosts on redirects. The docstring warns about this, but adding a runtime safeguard (e.g., rejecting clients with nil CheckRedirect or wrapping it) would further reduce misuse risk.
prev := via[len(via)-1]
Outdated
Review

[NIT] Authorization scheme is hardcoded to "Bearer". While modern GitHub tokens support this, some environments still use the "token" scheme. Consider documenting the expected token type or allowing the scheme to be configurable if needed.

**[NIT]** Authorization scheme is hardcoded to "Bearer". While modern GitHub tokens support this, some environments still use the "token" scheme. Consider documenting the expected token type or allowing the scheme to be configurable if needed.
if req.URL.Host != prev.URL.Host || (prev.URL.Scheme == "https" && req.URL.Scheme == "http") {
req.Header.Del("Authorization")
Outdated
Review

[MINOR] Authorization header always uses the "Bearer" scheme. Some GitHub token types (classic PAT) historically use the "token" scheme. Consider making the auth scheme configurable or auto-detectable to maximize compatibility.

**[MINOR]** Authorization header always uses the "Bearer" scheme. Some GitHub token types (classic PAT) historically use the "token" scheme. Consider making the auth scheme configurable or auto-detectable to maximize compatibility.
}
Outdated
Review

[MINOR] Authorization header is sent to whatever baseURL is configured. If baseURL can be influenced by untrusted input, this could leak tokens to an attacker-controlled host (SSRF/token exfiltration). Ensure baseURL is treated as trusted configuration and consider allowlisting expected hosts at a higher layer.

**[MINOR]** Authorization header is sent to whatever baseURL is configured. If baseURL can be influenced by untrusted input, this could leak tokens to an attacker-controlled host (SSRF/token exfiltration). Ensure baseURL is treated as trusted configuration and consider allowlisting expected hosts at a higher layer.
return nil
Outdated
Review

[MINOR] The CheckRedirect lambda is duplicated verbatim in both NewClient and SetHTTPClient(nil). Extract it to a package-level function (e.g., defaultCheckRedirect) to eliminate the duplication and ensure both code paths stay in sync when the policy changes.

**[MINOR]** The CheckRedirect lambda is duplicated verbatim in both NewClient and SetHTTPClient(nil). Extract it to a package-level function (e.g., `defaultCheckRedirect`) to eliminate the duplication and ensure both code paths stay in sync when the policy changes.
},
Outdated
Review

[NIT] Retry-After parsing only handles delta-seconds via Atoi. RFC 7231 allows an HTTP-date format; optionally support parsing HTTP-date to fully respect server guidance.

**[NIT]** Retry-After parsing only handles delta-seconds via Atoi. RFC 7231 allows an HTTP-date format; optionally support parsing HTTP-date to fully respect server guidance.
}
}
Outdated
Review

[MINOR] When c.http.Do(req) returns an error (network failure, context cancellation), the function returns immediately without retrying. For transient network errors, a retry could be valuable. The current behavior is reasonable for the stated scope (only retry on 429), but the comment says 'It respects the Retry-After header when present' without mentioning the no-retry-on-transport-error behavior. This is a documentation gap rather than a bug.

**[MINOR]** When `c.http.Do(req)` returns an error (network failure, context cancellation), the function returns immediately without retrying. For transient network errors, a retry could be valuable. The current behavior is reasonable for the stated scope (only retry on 429), but the comment says 'It respects the Retry-After header when present' without mentioning the no-retry-on-transport-error behavior. This is a documentation gap rather than a bug.
c.httpClient = hc
}
Outdated
Review

[MINOR] The security check if !c.allowInsecureHTTP && req.URL.Scheme != "https" is performed inside the retry loop, meaning it will fail on every retry attempt rather than being checked once before the loop starts. Since the URL doesn't change between retries, this is wasteful and the error message is slightly misleading (it mentions req.URL.Host but the real issue is the scheme). Moving the check before the retry loop or to NewClient would be cleaner.

**[MINOR]** The security check `if !c.allowInsecureHTTP && req.URL.Scheme != "https"` is performed inside the retry loop, meaning it will fail on every retry attempt rather than being checked once before the loop starts. Since the URL doesn't change between retries, this is wasteful and the error message is slightly misleading (it mentions `req.URL.Host` but the real issue is the scheme). Moving the check before the retry loop or to `NewClient` would be cleaner.
19
@@ -199,6 +212,9 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
return nil, fmt.Errorf("create request: %w", err)
}
if c.token != "" {
// Bearer is the OAuth2 standard and is accepted by GitHub for both
Outdated
Review

[NIT] Calling timer.Stop() after the timer has already fired is unnecessary. It’s harmless, but can be removed for clarity; only the ctx.Done() path needs Stop before return.

**[NIT]** Calling timer.Stop() after the timer has already fired is unnecessary. It’s harmless, but can be removed for clarity; only the ctx.Done() path needs Stop before return.
// classic PATs and fine-grained tokens. The alternative "token" scheme
// is GitHub-specific and offers no additional compatibility.
Outdated
Review

[MINOR] Timer leak on the happy path of the retry loop. When timer.C fires normally (case <-timer.C), timer.Stop() is never called. The timer has already fired so this is not a resource leak in the traditional sense, but idiomatic Go calls timer.Stop() on all branches to release the internal runtime resources promptly. The fix: case <-timer.C: timer.Stop() or use defer timer.Stop() before the select.

**[MINOR]** Timer leak on the happy path of the retry loop. When `timer.C` fires normally (case `<-timer.C`), `timer.Stop()` is never called. The timer has already fired so this is not a resource leak in the traditional sense, but idiomatic Go calls `timer.Stop()` on all branches to release the internal runtime resources promptly. The fix: `case <-timer.C: timer.Stop()` or use `defer timer.Stop()` before the select.
req.Header.Set("Authorization", "Bearer "+c.token)
}
req.Header.Set("User-Agent", userAgent)
Review

[NIT] The timer pattern timer.Stop() // no-op after fire; kept for symmetry in the case <-timer.C branch is slightly misleading. Stop() after a channel receive is indeed a no-op since the channel has already fired, but the comment could be clearer: timer.Stop() after <-timer.C does nothing and could simply be omitted. This is a readability nit, not a correctness issue.

**[NIT]** The timer pattern `timer.Stop() // no-op after fire; kept for symmetry` in the `case <-timer.C` branch is slightly misleading. `Stop()` after a channel receive is indeed a no-op since the channel has already fired, but the comment could be clearer: `timer.Stop()` after `<-timer.C` does nothing and could simply be omitted. This is a readability nit, not a correctness issue.
11
@@ -232,7 +248,8 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
// Retry on 429 rate limit
if resp.StatusCode == http.StatusTooManyRequests && attempt < maxAttempts-1 {
// Check for Retry-After header and override backoff if present
// Check for Retry-After header and override backoff if present.
// Supports both integer seconds (common) and HTTP-date format (RFC 7231).
if ra := resp.Header.Get("Retry-After"); ra != "" {
if seconds, err := strconv.Atoi(ra); err == nil && seconds > 0 {
delay := time.Duration(seconds) * time.Second
@@ -242,6 +259,17 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
if attempt < len(backoff) {
Outdated
Review

[MINOR] The int64 cast in if int64(len(body)) >= maxResponseBytes is unnecessary since maxResponseBytes is an untyped constant and len() returns int. Both sides of the comparison are int-typed. The cast is harmless but adds noise.

**[MINOR]** The `int64` cast in `if int64(len(body)) >= maxResponseBytes` is unnecessary since `maxResponseBytes` is an untyped constant and `len()` returns `int`. Both sides of the comparison are `int`-typed. The cast is harmless but adds noise.
backoff[attempt] = delay
}
} else if t, err := http.ParseTime(ra); err == nil {
delay := time.Until(t)
Review

[NIT] In doRequest, after the handleResponse call, there's a check if resp.StatusCode == http.StatusTooManyRequests but resp could theoretically have been closed by handleResponse. The body is closed, but the resp.StatusCode field is still accessible on the struct. This is correct and safe in Go's net/http — just worth being aware of.

**[NIT]** In `doRequest`, after the `handleResponse` call, there's a check `if resp.StatusCode == http.StatusTooManyRequests` but `resp` could theoretically have been closed by `handleResponse`. The body is closed, but the `resp.StatusCode` field is still accessible on the struct. This is correct and safe in Go's net/http — just worth being aware of.
if delay < 0 {
delay = 0
}
if delay > maxRetryAfter {
delay = maxRetryAfter
}
if attempt < len(backoff) {
backoff[attempt] = delay
}
}
}
continue
+81
View File
4
@@ -263,6 +263,84 @@ func TestDoRequest_RetryAfterDoesNotMutateBackoff(t *testing.T) {
}
}
func TestDoRequest_429RetryAfterHTTPDate(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
// Use HTTP-date format (RFC 7231) — a time 2 seconds in the future.
future := time.Now().Add(2 * time.Second).UTC()
w.Header().Set("Retry-After", future.Format(http.TimeFormat))
w.WriteHeader(429)
w.Write([]byte(`{"message":"rate limit"}`))
return
}
w.WriteHeader(200)
w.Write([]byte(`{"ok":true}`))
}))
defer srv.Close()
c := NewClient("token", srv.URL, AllowInsecureHTTP())
c.SetHTTPClient(srv.Client())
Outdated
Review

[NIT] TestDoRequest_LimitsResponseBody tests a constant value rather than actual behavior. The comment acknowledges this limitation. This is acceptable as a documentation-style test, but it adds no real safety guarantee — if maxResponseBytes is set correctly but the io.LimitReader call is removed, the test would still pass. Consider removing it or replacing with a test that actually sends a response exceeding the limit.

**[NIT]** `TestDoRequest_LimitsResponseBody` tests a constant value rather than actual behavior. The comment acknowledges this limitation. This is acceptable as a documentation-style test, but it adds no real safety guarantee — if `maxResponseBytes` is set correctly but the `io.LimitReader` call is removed, the test would still pass. Consider removing it or replacing with a test that actually sends a response exceeding the limit.
c.SetRetryBackoff([]time.Duration{1 * time.Millisecond, 1 * time.Millisecond})
start := time.Now()
body, err := c.doGet(context.Background(), srv.URL+"/test")
elapsed := time.Since(start)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != `{"ok":true}` {
t.Errorf("unexpected body: %s", body)
Outdated
Review

[NIT] TestDoRequest_LimitsResponseBody doesn't actually test the limit behavior — it only checks the constant value. The comment acknowledges this. This is a weak test; consider removing it or replacing it with an actual test using a small limit (e.g., setting maxResponseBytes to a small value via a test helper, or accepting that this particular boundary isn't testable without refactoring).

**[NIT]** `TestDoRequest_LimitsResponseBody` doesn't actually test the limit behavior — it only checks the constant value. The comment acknowledges this. This is a weak test; consider removing it or replacing it with an actual test using a small limit (e.g., setting `maxResponseBytes` to a small value via a test helper, or accepting that this particular boundary isn't testable without refactoring).
}
if attempts != 2 {
t.Errorf("expected 2 attempts, got %d", attempts)
}
// HTTP-date was ~2s in the future; by the time client processes it,
// time.Until gives ~1-2s. Verify it's meaningfully delayed (not instant).
if elapsed < 500*time.Millisecond {
t.Errorf("expected meaningful delay from HTTP-date Retry-After, got %v", elapsed)
}
}
func TestDoRequest_429RetryAfterHTTPDateInPast(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
// Use a time in the past — should result in zero/immediate retry.
past := time.Now().Add(-10 * time.Second).UTC()
w.Header().Set("Retry-After", past.Format(http.TimeFormat))
w.WriteHeader(429)
w.Write([]byte(`{"message":"rate limit"}`))
return
}
w.WriteHeader(200)
w.Write([]byte(`{"ok":true}`))
}))
defer srv.Close()
c := NewClient("token", srv.URL, AllowInsecureHTTP())
c.SetHTTPClient(srv.Client())
c.SetRetryBackoff([]time.Duration{5 * time.Second, 5 * time.Second})
start := time.Now()
_, err := c.doGet(context.Background(), srv.URL+"/test")
elapsed := time.Since(start)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if attempts != 2 {
t.Errorf("expected 2 attempts, got %d", attempts)
}
// Past date should override the 5s backoff to ~0
if elapsed > 500*time.Millisecond {
t.Errorf("expected near-instant retry for past HTTP-date, got %v", elapsed)
}
}
func TestDoRequest_SetsUserAgentHeader(t *testing.T) {
var gotUA string
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
@@ -392,4 +470,7 @@ func TestSetHTTPClient_NilRestoresDefault(t *testing.T) {
if c.httpClient.Timeout != 30*time.Second {
t.Errorf("expected 30s timeout, got %v", c.httpClient.Timeout)
}
if c.httpClient.CheckRedirect == nil {
t.Fatal("expected CheckRedirect policy after SetHTTPClient(nil)")
}
}
+4
View File
7
@@ -22,6 +22,10 @@ func (c *Client) GetFileContent(ctx context.Context, owner, repo, path, ref stri
// If the path points to a single file (not a directory), the API returns
// a JSON object instead of an array; this is handled by returning a
// single-element slice.
//
// Note: dot-segments ("." and "..") in the path are silently removed to
// prevent path traversal. This means a path like "foo/../bar" resolves
// to "foo/bar" rather than "bar".
func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]vcs.ContentEntry, error) {
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
18
+4
View File
10
@@ -123,6 +123,10 @@ func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, nu
// GetFileContentAtRef fetches a file at a specific ref from a repo.
// If ref is empty, the query parameter is omitted (uses default branch).
//
// Note: dot-segments ("." and "..") in the path are silently removed to
Outdated
Review

[NIT] The GetFileContentAtRef method is placed in pr.go but it's also the implementation delegate for GetFileContent in files.go. Conceptually it belongs in files.go alongside the other file-related methods. The split works because Go doesn't care which file a method is in, but for navigability (per the style.md pattern of organizing by concept) moving it to files.go would be cleaner.

**[NIT]** The `GetFileContentAtRef` method is placed in `pr.go` but it's also the implementation delegate for `GetFileContent` in `files.go`. Conceptually it belongs in `files.go` alongside the other file-related methods. The split works because Go doesn't care which file a method is in, but for navigability (per the `style.md` pattern of organizing by concept) moving it to `files.go` would be cleaner.
// prevent path traversal. This means a path like "foo/../bar" resolves
// to "foo/bar" rather than "bar".
func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) (string, error) {
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
Review

[NIT] The GetPullRequestFiles comment says 'Returns nil (not an empty slice) when the PR has no changed files' but if the first page returns an empty array, allFiles remains nil — this is correct. However the GitHub API actually returns an empty array for PRs with 0 changed files (valid edge case), so the nil vs empty distinction in the doc comment is accurate but subtle. No code change needed.

**[NIT]** The `GetPullRequestFiles` comment says 'Returns nil (not an empty slice) when the PR has no changed files' but if the first page returns an empty array, `allFiles` remains nil — this is correct. However the GitHub API actually returns an empty array for PRs with 0 changed files (valid edge case), so the nil vs empty distinction in the doc comment is accurate but subtle. No code change needed.
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
19