feat(github): implement PRReader + FileReader client (#80) #93

Closed
rodin wants to merge 16 commits from review-bot-issue-80 into feature/github-support
3 changed files with 32 additions and 13 deletions
Showing only changes of commit 80af5037b2 - Show all commits
+20 -8
View File
2
@@ -27,9 +27,10 @@ const (
// It carries the status code so callers can distinguish between
// different failure modes (e.g. 404 vs 500).
//
// Note: Error() includes up to 200 bytes of the response body for debugging.
// Callers should avoid logging raw error messages in production if the upstream
// server may return sensitive details in error responses.
// The Body field stores up to 4 KiB of the raw response for programmatic
// inspection. Error() truncates to 200 bytes for safe logging, but callers
Outdated
Review

[NIT] APIError.Error includes up to 200 bytes of the response body. If callers log errors verbatim, this could leak server-provided details (e.g., repository names). Consider further redaction or requiring callers to log status codes without bodies in production.

**[NIT]** APIError.Error includes up to 200 bytes of the response body. If callers log errors verbatim, this could leak server-provided details (e.g., repository names). Consider further redaction or requiring callers to log status codes without bodies in production.
// should avoid logging or propagating Body directly in production since it may
// contain sensitive details from the upstream server.
Outdated
Review

[NIT] APIError.Error includes up to 200 bytes of the response body. If callers log errors, this could surface sensitive server details. Consider reducing or masking returned body content, or clearly documenting that callers should avoid logging raw error messages in production.

**[NIT]** APIError.Error includes up to 200 bytes of the response body. If callers log errors, this could surface sensitive server details. Consider reducing or masking returned body content, or clearly documenting that callers should avoid logging raw error messages in production.
type APIError struct {
StatusCode int
Body string
24
@@ -87,8 +88,9 @@ func AllowInsecureHTTP() ClientOption {
}
Outdated
Review

[MINOR] NewClient accepts any baseURL without enforcing HTTPS or validating against a trusted allowlist. If a misconfiguration allows an attacker-controlled baseURL, the client could send the Authorization token to an untrusted host or over plaintext HTTP.

**[MINOR]** NewClient accepts any baseURL without enforcing HTTPS or validating against a trusted allowlist. If a misconfiguration allows an attacker-controlled baseURL, the client could send the Authorization token to an untrusted host or over plaintext HTTP.
// Client interacts with the GitHub API.
// A Client is safe for concurrent use by multiple goroutines;
// however, SetHTTPClient and SetRetryBackoff must not be called concurrently with requests.
// A Client is safe for concurrent use by multiple goroutines.
// SetHTTPClient and SetRetryBackoff are intended for test setup only and must
// be called before any goroutines issue requests; they have no synchronization.
type Client struct {
Outdated
Review

[NIT] The SetHTTPClient and SetRetryBackoff methods are exported as test-injection points but documented as unsafe for concurrent use with requests. The convention in the patterns (style.md) suggests using unexported helpers or the functional options pattern for test injection. The current design works but exposes mutation methods on what is documented as a concurrent-safe type. A //nolint or explicit doc note would make the intent clearer.

**[NIT]** The `SetHTTPClient` and `SetRetryBackoff` methods are exported as test-injection points but documented as unsafe for concurrent use with requests. The convention in the patterns (style.md) suggests using unexported helpers or the functional options pattern for test injection. The current design works but exposes mutation methods on what is documented as a concurrent-safe type. A `//nolint` or explicit doc note would make the intent clearer.
baseURL string
token string
23
@@ -141,9 +143,15 @@ func NewClient(token, baseURL string, opts ...ClientOption) *Client {
}
Outdated
Review

[NIT] The doRequest method signature uses url as a parameter name, which shadows the imported net/url package. While net/url is not imported in client.go (it's only in pr.go and files.go), this naming is a minor style concern since url as a local variable name is idiomatic but shadows the package name if ever imported. Not a bug here, just worth noting.

**[NIT]** The `doRequest` method signature uses `url` as a parameter name, which shadows the imported `net/url` package. While `net/url` is not imported in `client.go` (it's only in `pr.go` and `files.go`), this naming is a minor style concern since `url` as a local variable name is idiomatic but shadows the package name if ever imported. Not a bug here, just worth noting.
// SetHTTPClient sets the underlying HTTP client used for requests.
Review

[MINOR] SetHTTPClient allows replacing the underlying http.Client without enforcing a safe CheckRedirect policy, which could cause Authorization to be forwarded to untrusted hosts on redirects. The docstring warns about this, but adding a runtime safeguard (e.g., rejecting clients with nil CheckRedirect or wrapping it) would further reduce misuse risk.

**[MINOR]** SetHTTPClient allows replacing the underlying http.Client without enforcing a safe CheckRedirect policy, which could cause Authorization to be forwarded to untrusted hosts on redirects. The docstring warns about this, but adding a runtime safeguard (e.g., rejecting clients with nil CheckRedirect or wrapping it) would further reduce misuse risk.
// This is intended for testing to inject mock transports.
// This is intended for test setup only to inject mock transports; it must be
Outdated
Review

[NIT] Authorization scheme is hardcoded to "Bearer". While modern GitHub tokens support this, some environments still use the "token" scheme. Consider documenting the expected token type or allowing the scheme to be configurable if needed.

**[NIT]** Authorization scheme is hardcoded to "Bearer". While modern GitHub tokens support this, some environments still use the "token" scheme. Consider documenting the expected token type or allowing the scheme to be configurable if needed.
// called before any goroutines issue requests.
//
Outdated
Review

[MINOR] Authorization header always uses the "Bearer" scheme. Some GitHub token types (classic PAT) historically use the "token" scheme. Consider making the auth scheme configurable or auto-detectable to maximize compatibility.

**[MINOR]** Authorization header always uses the "Bearer" scheme. Some GitHub token types (classic PAT) historically use the "token" scheme. Consider making the auth scheme configurable or auto-detectable to maximize compatibility.
// Passing nil restores the default client (30s timeout + auth-stripping
Outdated
Review

[MINOR] Authorization header is sent to whatever baseURL is configured. If baseURL can be influenced by untrusted input, this could leak tokens to an attacker-controlled host (SSRF/token exfiltration). Ensure baseURL is treated as trusted configuration and consider allowlisting expected hosts at a higher layer.

**[MINOR]** Authorization header is sent to whatever baseURL is configured. If baseURL can be influenced by untrusted input, this could leak tokens to an attacker-controlled host (SSRF/token exfiltration). Ensure baseURL is treated as trusted configuration and consider allowlisting expected hosts at a higher layer.
// CheckRedirect policy matching NewClient).
Outdated
Review

[MINOR] The CheckRedirect lambda is duplicated verbatim in both NewClient and SetHTTPClient(nil). Extract it to a package-level function (e.g., defaultCheckRedirect) to eliminate the duplication and ensure both code paths stay in sync when the policy changes.

**[MINOR]** The CheckRedirect lambda is duplicated verbatim in both NewClient and SetHTTPClient(nil). Extract it to a package-level function (e.g., `defaultCheckRedirect`) to eliminate the duplication and ensure both code paths stay in sync when the policy changes.
//
Outdated
Review

[NIT] Retry-After parsing only handles delta-seconds via Atoi. RFC 7231 allows an HTTP-date format; optionally support parsing HTTP-date to fully respect server guidance.

**[NIT]** Retry-After parsing only handles delta-seconds via Atoi. RFC 7231 allows an HTTP-date format; optionally support parsing HTTP-date to fully respect server guidance.
// Callers providing a non-nil client are responsible for configuring a safe
// CheckRedirect policy. Without one, the default net/http behavior will follow
Outdated
Review

[MINOR] When c.http.Do(req) returns an error (network failure, context cancellation), the function returns immediately without retrying. For transient network errors, a retry could be valuable. The current behavior is reasonable for the stated scope (only retry on 429), but the comment says 'It respects the Retry-After header when present' without mentioning the no-retry-on-transport-error behavior. This is a documentation gap rather than a bug.

**[MINOR]** When `c.http.Do(req)` returns an error (network failure, context cancellation), the function returns immediately without retrying. For transient network errors, a retry could be valuable. The current behavior is reasonable for the stated scope (only retry on 429), but the comment says 'It respects the Retry-After header when present' without mentioning the no-retry-on-transport-error behavior. This is a documentation gap rather than a bug.
// redirects and may forward the Authorization header to untrusted hosts.
func (c *Client) SetHTTPClient(hc *http.Client) {
Outdated
Review

[MINOR] The security check if !c.allowInsecureHTTP && req.URL.Scheme != "https" is performed inside the retry loop, meaning it will fail on every retry attempt rather than being checked once before the loop starts. Since the URL doesn't change between retries, this is wasteful and the error message is slightly misleading (it mentions req.URL.Host but the real issue is the scheme). Moving the check before the retry loop or to NewClient would be cleaner.

**[MINOR]** The security check `if !c.allowInsecureHTTP && req.URL.Scheme != "https"` is performed inside the retry loop, meaning it will fail on every retry attempt rather than being checked once before the loop starts. Since the URL doesn't change between retries, this is wasteful and the error message is slightly misleading (it mentions `req.URL.Host` but the real issue is the scheme). Moving the check before the retry loop or to `NewClient` would be cleaner.
if hc == nil {
hc = &http.Client{
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[MINOR] Retry-After header is applied without an upper bound. If the API (or a malicious endpoint in a misconfigured environment) returns an excessively large value, the client may sleep for a very long time, enabling a denial-of-service style delay. Consider capping the duration to a sane maximum.

**[MINOR]** Retry-After header is applied without an upper bound. If the API (or a malicious endpoint in a misconfigured environment) returns an excessively large value, the client may sleep for a very long time, enabling a denial-of-service style delay. Consider capping the duration to a sane maximum.
4
@@ -155,6 +163,7 @@ func (c *Client) SetHTTPClient(hc *http.Client) {
}
Outdated
Review

[MINOR] The Retry-After handling mutates the backoff slice in-place: backoff[attempt] = time.Duration(seconds) * time.Second. When c.RetryBackoff is non-nil (e.g. in tests), this modifies the caller's slice, which is surprising and could cause test pollution if the same slice is reused. A local copy should be made before mutation, or the mutation should only apply to the local backoff variable (which it does when RetryBackoff is nil since a new slice is allocated, but not when it's non-nil).

**[MINOR]** The Retry-After handling mutates the `backoff` slice in-place: `backoff[attempt] = time.Duration(seconds) * time.Second`. When `c.RetryBackoff` is non-nil (e.g. in tests), this modifies the caller's slice, which is surprising and could cause test pollution if the same slice is reused. A local copy should be made before mutation, or the mutation should only apply to the local `backoff` variable (which it does when `RetryBackoff` is nil since a new slice is allocated, but not when it's non-nil).
// SetRetryBackoff configures the retry backoff durations for testing.
// It must be called before any goroutines issue requests.
// In production the default {1s, 2s} applies.
func (c *Client) SetRetryBackoff(d []time.Duration) {
Outdated
Review

[MINOR] After a successful response is read, resp.Body.Close() is called directly after io.ReadAll. If io.ReadAll returns an error (e.g. partial read), the body is still closed via the subsequent line, which is fine. However, the pattern is slightly inconsistent with the error path below it — consider using defer resp.Body.Close() paired with a drain before close on the error path for symmetry. This is purely stylistic; the current approach is correct.

**[MINOR]** After a successful response is read, `resp.Body.Close()` is called directly after `io.ReadAll`. If `io.ReadAll` returns an error (e.g. partial read), the body is still closed via the subsequent line, which is fine. However, the pattern is slightly inconsistent with the error path below it — consider using `defer resp.Body.Close()` paired with a drain before close on the error path for symmetry. This is purely stylistic; the current approach is correct.
c.retryBackoff = d
5
@@ -175,7 +184,10 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
backoff = []time.Duration{1 * time.Second, 2 * time.Second}
}
Outdated
Review

[NIT] The timer in the retry loop has a comment // Timer already fired; Stop() is a no-op here. — this is slightly inaccurate. Stop() on an already-fired timer is not a no-op; it returns false and the channel has already been drained by the case <-timer.C: branch, so there's no leak. The comment could be clearer: // Channel already drained; no need to call Stop().

**[NIT]** The timer in the retry loop has a comment `// Timer already fired; Stop() is a no-op here.` — this is slightly inaccurate. `Stop()` on an already-fired timer is not a no-op; it returns false and the channel has already been drained by the `case <-timer.C:` branch, so there's no leak. The comment could be clearer: `// Channel already drained; no need to call Stop().`
const maxErrorBodyBytes = 64 * 1024
// maxErrorBodyBytes limits how much of an error response body is stored.
// Kept small (4 KiB) to reduce the risk of sensitive data leakage if callers
Outdated
Review

[MINOR] Authorization header uses the "Bearer" scheme unconditionally. GitHub REST v3 commonly expects the "token" scheme for classic/fine-grained PATs, while "Bearer" is used in some flows (e.g., OAuth or app tokens). Consider supporting both schemes or making the scheme configurable to maximize compatibility.

**[MINOR]** Authorization header uses the "Bearer" scheme unconditionally. GitHub REST v3 commonly expects the "token" scheme for classic/fine-grained PATs, while "Bearer" is used in some flows (e.g., OAuth or app tokens). Consider supporting both schemes or making the scheme configurable to maximize compatibility.
// log APIError.Body directly. Error() further truncates to 200 bytes.
const maxErrorBodyBytes = 4 * 1024
// Reject non-HTTPS URLs early since the URL is immutable across retries.
if c.token != "" && !c.allowInsecureHTTP {
Outdated
Review

[MINOR] Retry-After parsing only supports integer seconds. The HTTP spec allows an HTTP-date as well; consider parsing both forms to be more robust (fallback to date parsing when Atoi fails).

**[MINOR]** Retry-After parsing only supports integer seconds. The HTTP spec allows an HTTP-date as well; consider parsing both forms to be more robust (fallback to date parsing when Atoi fails).
4
@@ -199,7 +211,7 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
timer := time.NewTimer(delay)
select {
case <-timer.C:
// Backoff elapsed, proceed with retry.
timer.Stop() // no-op after fire, releases runtime resources promptly
case <-ctx.Done():
Outdated
Review

[NIT] Calling timer.Stop() after the timer has already fired is unnecessary. It’s harmless, but can be removed for clarity; only the ctx.Done() path needs Stop before return.

**[NIT]** Calling timer.Stop() after the timer has already fired is unnecessary. It’s harmless, but can be removed for clarity; only the ctx.Done() path needs Stop before return.
timer.Stop()
return nil, ctx.Err()
Outdated
Review

[MINOR] Timer leak on the happy path of the retry loop. When timer.C fires normally (case <-timer.C), timer.Stop() is never called. The timer has already fired so this is not a resource leak in the traditional sense, but idiomatic Go calls timer.Stop() on all branches to release the internal runtime resources promptly. The fix: case <-timer.C: timer.Stop() or use defer timer.Stop() before the select.

**[MINOR]** Timer leak on the happy path of the retry loop. When `timer.C` fires normally (case `<-timer.C`), `timer.Stop()` is never called. The timer has already fired so this is not a resource leak in the traditional sense, but idiomatic Go calls `timer.Stop()` on all branches to release the internal runtime resources promptly. The fix: `case <-timer.C: timer.Stop()` or use `defer timer.Stop()` before the select.
14
+8 -5
View File
9
@@ -42,6 +42,8 @@ func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]
// The GitHub contents API returns an array for directories and an object
// for single files. Try array first (common case), then fall back to object.
Outdated
Review

[MINOR] The fallback from array→object JSON parsing uses err from the outer json.Unmarshal in the error return (fmt.Errorf("parse contents JSON: %w", err)), but err here is the array-parse error ('cannot unmarshal object into Go value of type []entry'), which is misleading — the real parse failure is err2. Should return err2 instead: fmt.Errorf("parse contents JSON: %w", err2).

**[MINOR]** The fallback from array→object JSON parsing uses `err` from the outer `json.Unmarshal` in the error return (`fmt.Errorf("parse contents JSON: %w", err)`), but `err` here is the array-parse error ('cannot unmarshal object into Go value of type []entry'), which is misleading — the real parse failure is `err2`. Should return `err2` instead: `fmt.Errorf("parse contents JSON: %w", err2)`.
Outdated
Review

[MINOR] The fallback from array to object JSON parsing is an unusual approach. The error from the first json.Unmarshal (array attempt) is discarded silently. If the response is truly malformed JSON (not just an object vs array mismatch), only err2 from the single-entry unmarshal is surfaced. This is intentional per the comment but it means a corrupted array response that happens to partially parse as an object could go undetected. Consider checking if the body starts with [ vs { to choose the parsing path, which would be more explicit and preserve the original error on genuine parse failures.

**[MINOR]** The fallback from array to object JSON parsing is an unusual approach. The error from the first `json.Unmarshal` (array attempt) is discarded silently. If the response is truly malformed JSON (not just an object vs array mismatch), only `err2` from the single-entry unmarshal is surfaced. This is intentional per the comment but it means a corrupted array response that happens to partially parse as an object could go undetected. Consider checking if the body starts with `[` vs `{` to choose the parsing path, which would be more explicit and preserve the original error on genuine parse failures.
Outdated
Review

[NIT] The fallback from array to single-object JSON unmarshal silently swallows the original array parse error. If the body is valid JSON but neither an array nor an object matching the entry struct (e.g., a JSON number), the second unmarshal will succeed with zero values and return a single empty entry. Consider checking that single.Name != "" or similar before accepting the fallback.

**[NIT]** The fallback from array to single-object JSON unmarshal silently swallows the original array parse error. If the body is valid JSON but neither an array nor an object matching the `entry` struct (e.g., a JSON number), the second unmarshal will succeed with zero values and return a single empty entry. Consider checking that `single.Name != ""` or similar before accepting the fallback.
// An empty array ([]) is valid — it represents an empty directory — and
// results in a zero-length slice returned without error.
var entries []entry
if err := json.Unmarshal(body, &entries); err != nil {
var single entry
10
@@ -69,11 +71,12 @@ func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]
// escapePath escapes each segment of a relative file path for use in URLs.
// Slashes are preserved as path separators; other special characters are escaped.
// Dot-segments ("." and "..") are silently removed to prevent path traversal.
// This is intentional: callers may receive a different path than requested without
// error. The function is package-private, and all callers (GetFileContentAtRef,
// ListContents) already handle missing-file errors from the API if the cleaned
// path doesn't match what the caller intended.
// Dot-segments ("." and "..") and empty segments (from consecutive slashes like
Review

[MINOR] The dual-unmarshal fallback for array vs. object in ListContents is correct, but the error variable shadowing with err and err2 is slightly awkward. The outer err from array unmarshal is passed into the error string but err2 is wrapped with %w — this means only the object-unmarshal error is unwrappable. Consider wrapping both or using a different format. Low impact since callers only check for nil.

**[MINOR]** The dual-unmarshal fallback for array vs. object in `ListContents` is correct, but the error variable shadowing with `err` and `err2` is slightly awkward. The outer `err` from array unmarshal is passed into the error string but `err2` is wrapped with `%w` — this means only the object-unmarshal error is unwrappable. Consider wrapping both or using a different format. Low impact since callers only check for nil.
// "a//b") are silently removed to prevent path traversal and produce canonical
// paths. This is intentional: callers may receive a different path than requested
// without error. The function is package-private, and all callers
// (GetFileContentAtRef, ListContents) already handle missing-file errors from the
Outdated
Review

[NIT] The escapePath function silently drops empty segments produced by consecutive slashes (e.g., a//b becomes a/b). The existing documentation mentions dot-segment removal but not double-slash collapsing. Minor doc gap.

**[NIT]** The `escapePath` function silently drops empty segments produced by consecutive slashes (e.g., `a//b` becomes `a/b`). The existing documentation mentions dot-segment removal but not double-slash collapsing. Minor doc gap.
// API if the cleaned path doesn't match what the caller intended.
Review

[MINOR] The fallback logic for ListContents when the array unmarshal fails uses the error from the object unmarshal (err2) as the returned error, discarding the original array unmarshal error. This could produce a confusing error message if the response was genuinely a malformed array rather than a single-file object. The original err might be more informative in the pure malformed-JSON case.

**[MINOR]** The fallback logic for `ListContents` when the array unmarshal fails uses the error from the object unmarshal (`err2`) as the returned error, discarding the original array unmarshal error. This could produce a confusing error message if the response was genuinely a malformed array rather than a single-file object. The original `err` might be more informative in the pure malformed-JSON case.
func escapePath(p string) string {
parts := strings.Split(p, "/")
var clean []string
1
+4
View File
7
@@ -89,6 +89,8 @@ const maxPages = 100
// GetPullRequestFiles fetches the list of files changed in a PR.
// Paginates through all pages (100 per page) to collect all files.
// Returns nil (not an empty slice) when the PR has no changed files.
// Callers can safely range over or check len() on a nil slice.
func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, number int) ([]vcs.ChangedFile, error) {
var allFiles []vcs.ChangedFile
10
@@ -156,6 +158,8 @@ func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref
// GetCommitStatuses fetches both commit statuses and check runs for a SHA,
// merging them into a unified []vcs.CommitStatus slice.
// If the commit statuses endpoint fails (e.g. 404 for an unknown SHA), the
// function returns immediately without attempting the check-runs endpoint.
func (c *Client) GetCommitStatuses(ctx context.Context, owner, repo, sha string) ([]vcs.CommitStatus, error) {
var result []vcs.CommitStatus
14