feat(github): implement PRReader + FileReader client (#80) #93
@@ -93,11 +93,16 @@ func (c *Client) SetHTTPClient(hc *http.Client) {
|
||||
}
|
||||
|
||||
|
|
||||
// doRequest performs an HTTP request with retry on 429 rate limit responses.
|
||||
// It respects the Retry-After header when present.
|
||||
// It respects the Retry-After header when present (capped at maxRetryAfter).
|
||||
func (c *Client) doRequest(ctx context.Context, method, url string, accept string) ([]byte, error) {
|
||||
const maxAttempts = 3
|
||||
backoff := c.RetryBackoff
|
||||
if backoff == nil {
|
||||
const maxRetryAfter = 120 * time.Second
|
||||
|
[MINOR] CheckRedirect strips Authorization on cross-host or https→http redirects but still follows the redirect. Following cross-host redirects can be an SSRF vector in misconfigured environments; consider blocking cross-host redirects entirely rather than proceeding without Authorization. **[MINOR]** CheckRedirect strips Authorization on cross-host or https→http redirects but still follows the redirect. Following cross-host redirects can be an SSRF vector in misconfigured environments; consider blocking cross-host redirects entirely rather than proceeding without Authorization.
|
||||
|
||||
var backoff []time.Duration
|
||||
|
gpt-review-bot
commented
[NIT] The parameter name 'url' in doRequest shadows the conceptual URL type used elsewhere (e.g., net/url in other files). Renaming to 'reqURL' could improve clarity, though this is purely stylistic. **[NIT]** The parameter name 'url' in doRequest shadows the conceptual URL type used elsewhere (e.g., net/url in other files). Renaming to 'reqURL' could improve clarity, though this is purely stylistic.
|
||||
if c.RetryBackoff != nil {
|
||||
backoff = make([]time.Duration, len(c.RetryBackoff))
|
||||
copy(backoff, c.RetryBackoff)
|
||||
|
sonnet-review-bot
commented
[NIT] The doc comment for **[NIT]** The doc comment for `Client` says `SetHTTPClient` and `SetRetryBackoff` must not be called concurrently with requests, but these are public methods and there's no enforcement or `noCopy` guard. Given the concurrent-use note, a brief comment in `SetHTTPClient` and `SetRetryBackoff` reiterating the constraint would help (though this matches the stdlib's `tls.Config` immutable-after-use convention, so it's acceptable as-is).
|
||||
} else {
|
||||
backoff = []time.Duration{1 * time.Second, 2 * time.Second}
|
||||
}
|
||||
|
[MINOR] The CheckRedirect handler allows following cross-host and HTTPS→HTTP redirects (while stripping Authorization). Although the token isn’t leaked, following cross-host redirects may contact untrusted hosts. Consider restricting redirects to same-host and HTTPS-only or fail on cross-host redirects to reduce SSRF-style risks. **[MINOR]** The CheckRedirect handler allows following cross-host and HTTPS→HTTP redirects (while stripping Authorization). Although the token isn’t leaked, following cross-host redirects may contact untrusted hosts. Consider restricting redirects to same-host and HTTPS-only or fail on cross-host redirects to reduce SSRF-style risks.
|
||||
|
||||
@@ -125,7 +130,9 @@ func (c *Client) doRequest(ctx context.Context, method, url string, accept strin
|
||||
if err != nil {
|
||||
|
sonnet-review-bot
commented
[MINOR] The Retry-After override mutates the **[MINOR]** The Retry-After override mutates the `backoff` slice in-place. When `c.RetryBackoff` is non-nil (set by tests or callers), `backoff` is a reference to the same slice. This means a Retry-After header on attempt 0 will permanently modify `c.RetryBackoff[0]` on the caller's struct, which is surprising shared-state mutation. Either copy the slice at the start of doRequest, or use a local variable for the overridden delay rather than mutating the slice.
[MINOR] Successful responses are read with io.ReadAll without a size limit. If the configured baseURL responds with a very large body, this could lead to memory exhaustion (DoS). Consider bounding successful response sizes or streaming with explicit limits. **[MINOR]** Successful responses are read with io.ReadAll without a size limit. If the configured baseURL responds with a very large body, this could lead to memory exhaustion (DoS). Consider bounding successful response sizes or streaming with explicit limits.
|
||||
return nil, fmt.Errorf("create request: %w", err)
|
||||
|
gpt-review-bot
commented
[NIT] The doRequest parameter is named 'url', which can be confusing since 'url' is also a common package import name. Consider renaming to 'reqURL' for clarity and to avoid future shadowing if net/url is imported. **[NIT]** The doRequest parameter is named 'url', which can be confusing since 'url' is also a common package import name. Consider renaming to 'reqURL' for clarity and to avoid future shadowing if net/url is imported.
|
||||
}
|
||||
req.Header.Set("Authorization", "Bearer "+c.token)
|
||||
if c.token != "" {
|
||||
|
gpt-review-bot
commented
[NIT] Calling timer.Stop() after the timer has already fired is unnecessary. You could simplify the delay logic with select { case <-time.After(delay): case <-ctx.Done(): } or only Stop in the ctx.Done branch. **[NIT]** Calling timer.Stop() after the timer has already fired is unnecessary. You could simplify the delay logic with select { case <-time.After(delay): case <-ctx.Done(): } or only Stop in the ctx.Done branch.
|
||||
req.Header.Set("Authorization", "Bearer "+c.token)
|
||||
}
|
||||
if accept != "" {
|
||||
|
gpt-review-bot
commented
[MINOR] Retry-After is parsed only as delta-seconds; per RFC 7231 it may also be an HTTP-date. Consider falling back to parsing an HTTP-date when Atoi fails to honor server guidance more robustly. **[MINOR]** Retry-After is parsed only as delta-seconds; per RFC 7231 it may also be an HTTP-date. Consider falling back to parsing an HTTP-date when Atoi fails to honor server guidance more robustly.
|
||||
req.Header.Set("Accept", accept)
|
||||
|
gpt-review-bot
commented
[MINOR] Non-HTTPS rejection checks strings.HasPrefix(url, "https://"). This is case-sensitive and string-based; safer to parse with net/url and check URL.Scheme case-insensitively (e.g., u.Scheme == "https"). **[MINOR]** Non-HTTPS rejection checks strings.HasPrefix(url, "https://"). This is case-sensitive and string-based; safer to parse with net/url and check URL.Scheme case-insensitively (e.g., u.Scheme == "https").
|
||||
} else {
|
||||
@@ -156,8 +163,12 @@ func (c *Client) doRequest(ctx context.Context, method, url string, accept strin
|
||||
// Check for Retry-After header and override backoff if present
|
||||
|
sonnet-review-bot
commented
[MINOR] The Retry-After handling mutates the **[MINOR]** The Retry-After handling mutates the `backoff` slice in-place: `backoff[attempt] = time.Duration(seconds) * time.Second`. When `c.RetryBackoff` is non-nil (e.g. in tests), this modifies the caller's slice, which is surprising and could cause test pollution if the same slice is reused. A local copy should be made before mutation, or the mutation should only apply to the local `backoff` variable (which it does when `RetryBackoff` is nil since a new slice is allocated, but not when it's non-nil).
|
||||
if ra := resp.Header.Get("Retry-After"); ra != "" {
|
||||
if seconds, err := strconv.Atoi(ra); err == nil && seconds > 0 {
|
||||
delay := time.Duration(seconds) * time.Second
|
||||
if delay > maxRetryAfter {
|
||||
delay = maxRetryAfter
|
||||
|
sonnet-review-bot
commented
[MINOR] After a successful response is read, **[MINOR]** After a successful response is read, `resp.Body.Close()` is called directly after `io.ReadAll`. If `io.ReadAll` returns an error (e.g. partial read), the body is still closed via the subsequent line, which is fine. However, the pattern is slightly inconsistent with the error path below it — consider using `defer resp.Body.Close()` paired with a drain before close on the error path for symmetry. This is purely stylistic; the current approach is correct.
|
||||
}
|
||||
if attempt < len(backoff) {
|
||||
|
gpt-review-bot
commented
[NIT] Authorization header uses the 'Bearer' scheme. GitHub PATs typically use the 'token' scheme; while 'Bearer' may work for some token types, consider aligning with GitHub’s documented scheme or making it configurable. **[NIT]** Authorization header uses the 'Bearer' scheme. GitHub PATs typically use the 'token' scheme; while 'Bearer' may work for some token types, consider aligning with GitHub’s documented scheme or making it configurable.
|
||||
backoff[attempt] = time.Duration(seconds) * time.Second
|
||||
backoff[attempt] = delay
|
||||
}
|
||||
}
|
||||
|
sonnet-review-bot
commented
[MINOR] The comment **[MINOR]** The comment `// Timer already fired; Stop() is a no-op here.` is misleading. `time.NewTimer` creates a timer that fires after the duration; when the select case `<-timer.C` fires, the timer has already expired so `Stop()` does return false, but the comment conflates 'fired' with 'Stop is no-op'. More importantly, there's no `defer timer.Stop()` before the select — while technically fine here because the goroutine either drains the channel or returns, the pattern is fragile and deviates from the canonical `defer timer.Stop()` idiom. The standard pattern is: `timer := time.NewTimer(delay); defer timer.Stop(); select { case <-timer.C: ... case <-ctx.Done(): ... }`. The current code avoids the leak correctly but via an atypical path.
|
||||
}
|
||||
|
||||
@@ -185,6 +185,9 @@ func TestIsUnauthorized(t *testing.T) {
|
||||
}
|
||||
|
||||
func TestDoRequest_429RetryAfterHeader(t *testing.T) {
|
||||
if testing.Short() {
|
||||
t.Skip("skipping slow retry test in short mode")
|
||||
}
|
||||
attempts := 0
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
@@ -222,3 +225,39 @@ func TestDoRequest_429RetryAfterHeader(t *testing.T) {
|
||||
t.Errorf("expected ~1s delay from Retry-After, got %v", elapsed)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDoRequest_RetryAfterDoesNotMutateBackoff(t *testing.T) {
|
||||
if testing.Short() {
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `TestDoRequest_LimitsResponseBody` allocates `maxResponseBytes+1024` bytes (over 10 MiB) as a string for each test run. The comment acknowledges this but the fix isn't applied — consider using `io.LimitedReader` in the test handler or generating the body lazily to avoid the 10 MiB allocation in tests.
|
||||
t.Skip("skipping slow retry test in short mode")
|
||||
}
|
||||
attempts := 0
|
||||
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
|
||||
attempts++
|
||||
if attempts == 1 {
|
||||
w.Header().Set("Retry-After", "1")
|
||||
w.WriteHeader(429)
|
||||
w.Write([]byte(`{"message":"rate limit"}`))
|
||||
return
|
||||
}
|
||||
w.WriteHeader(200)
|
||||
w.Write([]byte(`{"ok":true}`))
|
||||
}))
|
||||
defer srv.Close()
|
||||
|
||||
c := NewClient("token", srv.URL)
|
||||
c.SetHTTPClient(srv.Client())
|
||||
c.RetryBackoff = []time.Duration{1 * time.Millisecond, 1 * time.Millisecond}
|
||||
|
||||
_, err := c.doGet(context.Background(), srv.URL+"/test")
|
||||
if err != nil {
|
||||
t.Fatalf("unexpected error: %v", err)
|
||||
}
|
||||
|
||||
// Verify the original RetryBackoff slice was not mutated
|
||||
if c.RetryBackoff[0] != 1*time.Millisecond {
|
||||
t.Errorf("RetryBackoff[0] was mutated: got %v, want 1ms", c.RetryBackoff[0])
|
||||
}
|
||||
if c.RetryBackoff[1] != 1*time.Millisecond {
|
||||
t.Errorf("RetryBackoff[1] was mutated: got %v, want 1ms", c.RetryBackoff[1])
|
||||
}
|
||||
}
|
||||
|
||||
@@ -11,8 +11,8 @@ import (
|
||||
"gitea.weiker.me/rodin/review-bot/vcs"
|
||||
)
|
||||
|
||||
// GetFileContent fetches a file from the default branch of a repo.
|
||||
// Delegates to GetFileContentAtRef with an empty ref.
|
||||
// GetFileContent fetches a file from a repo at the given ref.
|
||||
// Delegates to GetFileContentAtRef with the provided ref.
|
||||
|
sonnet-review-bot
commented
[NIT] The **[NIT]** The `GetFileContent` signature accepts a `ref` parameter and delegates to `GetFileContentAtRef`, making `GetFileContent` essentially an alias. The doc comment says 'Fetches a file from the default branch' but it actually passes through whatever ref is given. If the interface requires `GetFileContent` to always use the default branch, the `ref` parameter shouldn't be accepted (or it should be ignored). If it accepts a ref, the doc comment is misleading.
gpt-review-bot
commented
[MINOR] Comment for GetFileContent is misleading: it states delegation with an empty ref, but the function accepts and forwards a ref parameter. Update the comment to reflect behavior (delegates to GetFileContentAtRef; default branch is used only when ref is empty). **[MINOR]** Comment for GetFileContent is misleading: it states delegation with an empty ref, but the function accepts and forwards a ref parameter. Update the comment to reflect behavior (delegates to GetFileContentAtRef; default branch is used only when ref is empty).
|
||||
func (c *Client) GetFileContent(ctx context.Context, owner, repo, path, ref string) (string, error) {
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `GetFileContent` simply delegates to `GetFileContentAtRef`. The doc comment says 'Delegates to GetFileContentAtRef with the provided ref' which is accurate but redundant given that the implementation is a one-liner. The `vcs.FileReader` interface likely has a different signature than `vcs.PRReader`'s method — this delegation is the right approach, but the doc comment could explain *why* this wrapper exists (to satisfy the FileReader interface which has a different signature from GetFileContentAtRef).
|
||||
return c.GetFileContentAtRef(ctx, owner, repo, path, ref)
|
||||
|
sonnet-review-bot
commented
[MINOR] GetFileContent's signature includes a **[MINOR]** GetFileContent's signature includes a `ref` parameter but the doc comment says 'Delegates to GetFileContentAtRef with an empty ref', which contradicts the actual implementation (it passes `ref` through). The doc comment is wrong and should say 'Delegates to GetFileContentAtRef with the provided ref'.
sonnet-review-bot
commented
[MINOR] GetFileContent is defined in files.go but it's a pure delegation to GetFileContentAtRef which lives in pr.go. Both implement the FileReader interface, but splitting the delegation from the implementation across files is mildly confusing. Both could live in files.go or the delegation could be removed and GetFileContentAtRef used directly. Minor organization issue. **[MINOR]** GetFileContent is defined in files.go but it's a pure delegation to GetFileContentAtRef which lives in pr.go. Both implement the FileReader interface, but splitting the delegation from the implementation across files is mildly confusing. Both could live in files.go or the delegation could be removed and GetFileContentAtRef used directly. Minor organization issue.
|
||||
}
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `GetFileContent` doc says 'Delegates to GetFileContentAtRef with the provided ref' — this is accurate but the function adds no value over calling `GetFileContentAtRef` directly. This is presumably required to satisfy a `FileReader` interface. If that's the case, a comment noting it satisfies the interface would help readers understand why this thin wrapper exists.
|
||||
@@ -47,12 +47,17 @@ func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]
|
||||
|
||||
// escapePath escapes each segment of a relative file path for use in URLs.
|
||||
// Slashes are preserved as path separators; other special characters are escaped.
|
||||
// Dot-segments ("." and "..") are removed to prevent path traversal.
|
||||
|
security-review-bot marked this conversation as resolved
Outdated
[MINOR] escapePath preserves dot-segments ("." and "..") in path components. Some servers normalize dot-segments, which could let an untrusted path escape the intended "contents" endpoint (e.g., "/contents/../../pulls"). While this stays on the same host, it could lead to unexpected behavior. Consider rejecting or encoding dot-segments to prevent path traversal in URL construction. **[MINOR]** escapePath preserves dot-segments ("." and "..") in path components. Some servers normalize dot-segments, which could let an untrusted path escape the intended "contents" endpoint (e.g., "/contents/../../pulls"). While this stays on the same host, it could lead to unexpected behavior. Consider rejecting or encoding dot-segments to prevent path traversal in URL construction.
|
||||
func escapePath(p string) string {
|
||||
|
sonnet-review-bot
commented
[MINOR] Silent empty-array handling: when the GitHub Contents API returns an empty JSON array **[MINOR]** Silent empty-array handling: when the GitHub Contents API returns an empty JSON array `[]`, `json.Unmarshal` into `[]entry` succeeds with `len(entries) == 0` and the function returns an empty slice without error. This is probably the correct behavior for an empty directory, but it's undocumented and could silently mask an unexpected server response. A comment clarifying this intent would be valuable.
|
||||
parts := strings.Split(p, "/")
|
||||
for i, part := range parts {
|
||||
parts[i] = url.PathEscape(part)
|
||||
var clean []string
|
||||
for _, part := range parts {
|
||||
if part == "." || part == ".." || part == "" {
|
||||
continue
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `ListContents` fallback from array→object parse uses `err2` for the object parse error but discards the original array parse error `err`. If both fail (e.g., the response is valid JSON but neither an array nor an object matching `entry`), only the object-unmarshal error is returned, which may be less informative. This is a minor diagnostics issue.
|
||||
}
|
||||
|
sonnet-review-bot
commented
[MINOR] **[MINOR]** `escapePath` silently removes `..` and `.` segments rather than returning an error. For a security-sensitive operation (path traversal prevention), silently stripping these is arguably better than erroring, but the caller has no way to know the path was modified. The function is package-private so this is contained, but callers like `GetFileContentAtRef` will fetch a different path than requested without any indication. A comment explaining the deliberate behavior (not a bug) would improve maintainability.
|
||||
clean = append(clean, url.PathEscape(part))
|
||||
|
sonnet-review-bot
commented
[MINOR] The **[MINOR]** The `ListContents` function silently accepts `{}` (empty JSON object) as a valid response — the fallback from array parse failure to single-object parse succeeds for `{}`, and the `single.Name == "" && single.Path == "" && single.Type == ""` guard catches this case correctly. However, `json.Unmarshal(body, &entries)` where body is `{}` (an object, not array) will return an error, and the fallback will parse it as an empty entry — the zero-value check does catch this. This is correct, but the logic is subtle. A brief inline comment explaining that the zero-value check guards against `{}` or unexpected object shapes would help future readers.
|
||||
}
|
||||
|
gpt-review-bot
commented
[MINOR] escapePath silently drops '.' and '..' segments, intentionally returning a potentially different path without error. While documented, this may hide caller mistakes. Consider returning an error on traversal attempts or preserving segments and letting the API respond. **[MINOR]** escapePath silently drops '.' and '..' segments, intentionally returning a potentially different path without error. While documented, this may hide caller mistakes. Consider returning an error on traversal attempts or preserving segments and letting the API respond.
sonnet-review-bot
commented
[NIT] The fallback for **[NIT]** The fallback for `{}` (empty object) guards against `single.Name == "" && single.Path == "" && single.Type == ""`. A partial response with only `type` set (e.g. `{"type":"file"}`) would pass this guard and be returned as a valid entry with empty name/path. This edge case is unlikely from GitHub but the guard could be tightened to `single.Name == ""` alone since Name is the minimal required field.
|
||||
return strings.Join(parts, "/")
|
||||
return strings.Join(clean, "/")
|
||||
}
|
||||
|
gpt-review-bot
commented
[MINOR] decodeBase64Content strips only '\n'. GitHub or proxies may use CRLF line breaks. Consider removing both '\r' and '\n' (or all whitespace) before decoding, or use base64.NewDecoder which tolerates newlines. **[MINOR]** decodeBase64Content strips only '\n'. GitHub or proxies may use CRLF line breaks. Consider removing both '\r' and '\n' (or all whitespace) before decoding, or use base64.NewDecoder which tolerates newlines.
|
||||
|
||||
// decodeBase64Content decodes base64-encoded content from the GitHub contents API.
|
||||
|
gpt-review-bot
commented
[NIT] escapePath silently removes dot segments ("." and ".."), which may surprise callers if their requested path is altered without error. The comment notes this intent; consider documenting this behavior in the public method docs (e.g., ListContents/GetFileContentAtRef) or returning an explicit error when dot-segments are provided. **[NIT]** escapePath silently removes dot segments ("." and ".."), which may surprise callers if their requested path is altered without error. The comment notes this intent; consider documenting this behavior in the public method docs (e.g., ListContents/GetFileContentAtRef) or returning an explicit error when dot-segments are provided.
|
||||
|
||||
@@ -275,3 +275,23 @@ func TestDecodeBase64Content_Invalid(t *testing.T) {
|
||||
t.Fatal("expected error for invalid base64")
|
||||
}
|
||||
}
|
||||
|
||||
func TestEscapePath_RejectsDotSegments(t *testing.T) {
|
||||
tests := []struct {
|
||||
input string
|
||||
want string
|
||||
}{
|
||||
{"src/main.go", "src/main.go"},
|
||||
{"../etc/passwd", "etc/passwd"},
|
||||
{"./src/../main.go", "src/main.go"},
|
||||
{"a/b/c", "a/b/c"},
|
||||
{"file with spaces.go", "file%20with%20spaces.go"},
|
||||
{"a/./b/../c", "a/b/c"},
|
||||
}
|
||||
for _, tt := range tests {
|
||||
got := escapePath(tt.input)
|
||||
if got != tt.want {
|
||||
t.Errorf("escapePath(%q) = %q, want %q", tt.input, got, tt.want)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -205,7 +205,7 @@ func (c *Client) GetCommitStatuses(ctx context.Context, owner, repo, sha string)
|
||||
}
|
||||
|
||||
|
sonnet-review-bot
commented
[NIT] In **[NIT]** In `GetCommitStatuses`, the check run pagination loop uses `checkPage` as the variable name while the PR files loop uses `page`. Naming is internally consistent within each function, but the inconsistency between the two sibling pagination loops is a minor style nit.
|
||||
// mapCheckRunStatus maps a check run conclusion+status to a vcs.CommitStatus status string.
|
||||
|
sonnet-review-bot
commented
[MINOR] mapCheckRunStatus has a dead code path: **[MINOR]** mapCheckRunStatus has a dead code path: `case "in_progress", "queued"` will never be reached because `conclusion == nil` is checked first (returning "pending"), and when conclusion is non-nil it cannot be "in_progress" or "queued" (those are status values, not conclusion values in the GitHub API). This case is unreachable and misleading.
|
||||
func mapCheckRunStatus(conclusion *string, status string) string {
|
||||
func mapCheckRunStatus(conclusion *string, _ string) string {
|
||||
if conclusion == nil {
|
||||
// Still running or queued
|
||||
|
sonnet-review-bot
commented
[MINOR] In **[MINOR]** In `mapCheckRunStatus`, the cases `"in_progress"` and `"queued"` are in the `switch *conclusion` block, but `conclusion` is non-nil when this switch is reached. GitHub's API sets `conclusion` to null for in-progress/queued check runs, not to the strings `"in_progress"` or `"queued"` — those are `status` field values. So these cases are dead code and will never be reached. The nil-conclusion check at the top already handles the pending case correctly.
|
||||
return "pending"
|
||||
@@ -217,8 +217,6 @@ func mapCheckRunStatus(conclusion *string, status string) string {
|
||||
return "failure"
|
||||
case "cancelled", "skipped", "neutral":
|
||||
return "success" // non-blocking
|
||||
case "in_progress", "queued":
|
||||
return "pending"
|
||||
default:
|
||||
return "pending"
|
||||
}
|
||||
|
||||
[NIT] The
SetHTTPClientandSetRetryBackoffmethods are exported as test-injection points but documented as unsafe for concurrent use with requests. The convention in the patterns (style.md) suggests using unexported helpers or the functional options pattern for test injection. The current design works but exposes mutation methods on what is documented as a concurrent-safe type. A//nolintor explicit doc note would make the intent clearer.