feat(github): implement PRReader + FileReader client (#80) #93
@@ -114,6 +114,11 @@ func defaultCheckRedirect(req *http.Request, via []*http.Request) error {
|
||||
if len(via) >= 10 {
|
||||
return fmt.Errorf("stopped after 10 redirects")
|
||||
|
|
||||
}
|
||||
|
gpt-review-bot
commented
[MINOR] Authorization header is always set to "Bearer "+token even when token is empty. Consider only setting the header when token is non-empty to avoid sending an empty bearer token on unauthenticated requests. **[MINOR]** Authorization header is always set to "Bearer "+token even when token is empty. Consider only setting the header when token is non-empty to avoid sending an empty bearer token on unauthenticated requests.
gpt-review-bot
commented
[NIT] GitHub classic PATs typically use the "token" scheme while fine-grained tokens use "Bearer". If supporting both is desired, consider documenting or adapting the auth scheme based on token type. **[NIT]** GitHub classic PATs typically use the "token" scheme while fine-grained tokens use "Bearer". If supporting both is desired, consider documenting or adapting the auth scheme based on token type.
sonnet-review-bot
commented
[MINOR] The doc comment on **[MINOR]** The doc comment on `defaultCheckRedirect` says it "strips the Authorization header on cross-host redirects or protocol downgrades (HTTPS→HTTP) to prevent credential leakage, while still following the redirect." However, a protocol downgrade from HTTPS to HTTP is a genuine security issue — stripping the header and still following is debatable. Consider returning an error on HTTPS→HTTP downgrade rather than silently following. This is a design choice that has security implications, not a bug per se, but worth flagging.
gpt-review-bot
commented
[MINOR] defaultCheckRedirect indexes via[len(via)-1] without guarding for len(via) == 0. net/http currently guarantees at least one prior request in via, but adding a len(via) check would make this more robust against misuse. **[MINOR]** defaultCheckRedirect indexes via[len(via)-1] without guarding for len(via) == 0. net/http currently guarantees at least one prior request in via, but adding a len(via) check would make this more robust against misuse.
|
||||
// Guard: net/http guarantees len(via) >= 1 but this is undocumented;
|
||||
// defend against zero-length to avoid panic on index out of range.
|
||||
|
sonnet-review-bot
commented
[NIT] The concurrency safety doc comment says "SetHTTPClient and SetRetryBackoff must not be called concurrently with requests" but these are exported methods with no enforcement (no mutex, no atomic). This is an acceptable design for a test-support method, but the comment could be stronger: e.g., "These methods are intended for test setup only and must be called before any goroutines issue requests." **[NIT]** The concurrency safety doc comment says "SetHTTPClient and SetRetryBackoff must not be called concurrently with requests" but these are exported methods with no enforcement (no mutex, no atomic). This is an acceptable design for a test-support method, but the comment could be stronger: e.g., "These methods are intended for test setup only and must be called before any goroutines issue requests."
|
||||
if len(via) == 0 {
|
||||
return nil
|
||||
}
|
||||
|
gpt-review-bot
commented
[NIT] Parameter name "url" in doRequest shadows the common concept of URLs and can reduce readability given other files import net/url. Consider renaming the parameter to reqURL for clarity. **[NIT]** Parameter name "url" in doRequest shadows the common concept of URLs and can reduce readability given other files import net/url. Consider renaming the parameter to reqURL for clarity.
[MINOR] Redirects to different hosts or to HTTP are allowed (Authorization is stripped), which can lead to consuming responses from untrusted or downgraded endpoints. While token leakage is prevented, consider rejecting cross-host redirects and HTTPS→HTTP downgrades entirely to avoid integrity/confidentiality risks. **[MINOR]** Redirects to different hosts or to HTTP are allowed (Authorization is stripped), which can lead to consuming responses from untrusted or downgraded endpoints. While token leakage is prevented, consider rejecting cross-host redirects and HTTPS→HTTP downgrades entirely to avoid integrity/confidentiality risks.
|
||||
prev := via[len(via)-1]
|
||||
// Reject protocol downgrade: HTTPS→HTTP leaks request metadata over plaintext.
|
||||
if prev.URL.Scheme == "https" && req.URL.Scheme == "http" {
|
||||
@@ -262,7 +267,10 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
|
||||
return body, nil
|
||||
}
|
||||
|
||||
errBody, _ := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
|
||||
errBody, readErr := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
|
||||
if readErr != nil && len(errBody) == 0 {
|
||||
errBody = []byte(fmt.Sprintf("[error reading response body: %v]", readErr))
|
||||
}
|
||||
resp.Body.Close()
|
||||
|
||||
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(errBody)}
|
||||
|
||||
@@ -17,6 +17,39 @@ func (c *Client) GetFileContent(ctx context.Context, owner, repo, path, ref stri
|
||||
return c.GetFileContentAtRef(ctx, owner, repo, path, ref)
|
||||
|
sonnet-review-bot
commented
[MINOR] GetFileContent's signature includes a **[MINOR]** GetFileContent's signature includes a `ref` parameter but the doc comment says 'Delegates to GetFileContentAtRef with an empty ref', which contradicts the actual implementation (it passes `ref` through). The doc comment is wrong and should say 'Delegates to GetFileContentAtRef with the provided ref'.
sonnet-review-bot
commented
[MINOR] GetFileContent is defined in files.go but it's a pure delegation to GetFileContentAtRef which lives in pr.go. Both implement the FileReader interface, but splitting the delegation from the implementation across files is mildly confusing. Both could live in files.go or the delegation could be removed and GetFileContentAtRef used directly. Minor organization issue. **[MINOR]** GetFileContent is defined in files.go but it's a pure delegation to GetFileContentAtRef which lives in pr.go. Both implement the FileReader interface, but splitting the delegation from the implementation across files is mildly confusing. Both could live in files.go or the delegation could be removed and GetFileContentAtRef used directly. Minor organization issue.
|
||||
}
|
||||
|
sonnet-review-bot
commented
[NIT] **[NIT]** `GetFileContent` doc says 'Delegates to GetFileContentAtRef with the provided ref' — this is accurate but the function adds no value over calling `GetFileContentAtRef` directly. This is presumably required to satisfy a `FileReader` interface. If that's the case, a comment noting it satisfies the interface would help readers understand why this thin wrapper exists.
|
||||
|
||||
// GetFileContentAtRef fetches a file at a specific ref from a repo.
|
||||
|
gpt-review-bot
commented
[MINOR] ListContents assumes the contents API returns a JSON array. The GitHub Contents API returns an object for single-file paths. Consider handling both array and object responses (or return a clearer error) to avoid a JSON unmarshal error for file paths. **[MINOR]** ListContents assumes the contents API returns a JSON array. The GitHub Contents API returns an object for single-file paths. Consider handling both array and object responses (or return a clearer error) to avoid a JSON unmarshal error for file paths.
|
||||
// If ref is empty, the query parameter is omitted (uses default branch).
|
||||
//
|
||||
// Note: dot-segments ("." and "..") in the path are silently removed to
|
||||
// prevent path traversal. This means a path like "foo/../bar" resolves
|
||||
// to "foo/bar" rather than "bar".
|
||||
func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) (string, error) {
|
||||
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
|
||||
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
|
||||
if ref != "" {
|
||||
reqURL += "?ref=" + url.QueryEscape(ref)
|
||||
}
|
||||
body, err := c.doGet(ctx, reqURL)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("fetch file %s: %w", path, err)
|
||||
}
|
||||
var resp struct {
|
||||
Content string `json:"content"`
|
||||
Encoding string `json:"encoding"`
|
||||
|
gpt-review-bot
commented
[MINOR] escapePath drops ".." segments but does not remove the preceding segment, so paths like "a/./b/../c" normalize to "a/b/c" instead of "a/c". Clarify this behavior in the comment or consider implementing proper dot-segment normalization to avoid surprising results. **[MINOR]** escapePath drops ".." segments but does not remove the preceding segment, so paths like "a/./b/../c" normalize to "a/b/c" instead of "a/c". Clarify this behavior in the comment or consider implementing proper dot-segment normalization to avoid surprising results.
|
||||
}
|
||||
if err := json.Unmarshal(body, &resp); err != nil {
|
||||
return "", fmt.Errorf("parse file content JSON: %w", err)
|
||||
|
gpt-review-bot
commented
[NIT] escapePath removes dot-segments and empty segments, which can turn root ("" or ".") into an empty path, resulting in a trailing slash in the constructed URL (…/contents/). This is likely fine for GitHub, but consider documenting the behavior or handling root explicitly for clarity. **[NIT]** escapePath removes dot-segments and empty segments, which can turn root ("" or ".") into an empty path, resulting in a trailing slash in the constructed URL (…/contents/). This is likely fine for GitHub, but consider documenting the behavior or handling root explicitly for clarity.
|
||||
}
|
||||
if resp.Encoding != "base64" {
|
||||
return "", fmt.Errorf("unexpected encoding %q for file %s", resp.Encoding, path)
|
||||
|
sonnet-review-bot
commented
[MINOR] The fallback from array→object JSON parsing uses **[MINOR]** The fallback from array→object JSON parsing uses `err` from the outer `json.Unmarshal` in the error return (`fmt.Errorf("parse contents JSON: %w", err)`), but `err` here is the array-parse error ('cannot unmarshal object into Go value of type []entry'), which is misleading — the real parse failure is `err2`. Should return `err2` instead: `fmt.Errorf("parse contents JSON: %w", err2)`.
sonnet-review-bot
commented
[MINOR] The fallback from array to object JSON parsing is an unusual approach. The error from the first **[MINOR]** The fallback from array to object JSON parsing is an unusual approach. The error from the first `json.Unmarshal` (array attempt) is discarded silently. If the response is truly malformed JSON (not just an object vs array mismatch), only `err2` from the single-entry unmarshal is surfaced. This is intentional per the comment but it means a corrupted array response that happens to partially parse as an object could go undetected. Consider checking if the body starts with `[` vs `{` to choose the parsing path, which would be more explicit and preserve the original error on genuine parse failures.
sonnet-review-bot
commented
[NIT] The fallback from array to single-object JSON unmarshal silently swallows the original array parse error. If the body is valid JSON but neither an array nor an object matching the **[NIT]** The fallback from array to single-object JSON unmarshal silently swallows the original array parse error. If the body is valid JSON but neither an array nor an object matching the `entry` struct (e.g., a JSON number), the second unmarshal will succeed with zero values and return a single empty entry. Consider checking that `single.Name != ""` or similar before accepting the fallback.
|
||||
}
|
||||
decoded, err := decodeBase64Content(resp.Content)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("decode base64 content for %s: %w", path, err)
|
||||
}
|
||||
return decoded, nil
|
||||
|
security-review-bot marked this conversation as resolved
Outdated
[MINOR] escapePath preserves dot-segments ("." and "..") in path components. Some servers normalize dot-segments, which could let an untrusted path escape the intended "contents" endpoint (e.g., "/contents/../../pulls"). While this stays on the same host, it could lead to unexpected behavior. Consider rejecting or encoding dot-segments to prevent path traversal in URL construction. **[MINOR]** escapePath preserves dot-segments ("." and "..") in path components. Some servers normalize dot-segments, which could let an untrusted path escape the intended "contents" endpoint (e.g., "/contents/../../pulls"). While this stays on the same host, it could lead to unexpected behavior. Consider rejecting or encoding dot-segments to prevent path traversal in URL construction.
|
||||
}
|
||||
|
sonnet-review-bot
commented
[MINOR] Silent empty-array handling: when the GitHub Contents API returns an empty JSON array **[MINOR]** Silent empty-array handling: when the GitHub Contents API returns an empty JSON array `[]`, `json.Unmarshal` into `[]entry` succeeds with `len(entries) == 0` and the function returns an empty slice without error. This is probably the correct behavior for an empty directory, but it's undocumented and could silently mask an unexpected server response. A comment clarifying this intent would be valuable.
|
||||
|
||||
// ListContents lists files and directories at a given path in a repo.
|
||||
// Returns the directory listing from the GitHub contents API.
|
||||
// If the path points to a single file (not a directory), the API returns
|
||||
|
||||
@@ -33,7 +33,6 @@ type changedFileResponse struct {
|
||||
|
||||
// commitStatusResponse is the GitHub combined status API response.
|
||||
type commitStatusResponse struct {
|
||||
State string `json:"state"`
|
||||
Statuses []struct {
|
||||
Context string `json:"context"`
|
||||
State string `json:"state"`
|
||||
@@ -123,39 +122,6 @@ func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, nu
|
||||
return allFiles, nil
|
||||
}
|
||||
|
||||
// GetFileContentAtRef fetches a file at a specific ref from a repo.
|
||||
// If ref is empty, the query parameter is omitted (uses default branch).
|
||||
//
|
||||
// Note: dot-segments ("." and "..") in the path are silently removed to
|
||||
// prevent path traversal. This means a path like "foo/../bar" resolves
|
||||
// to "foo/bar" rather than "bar".
|
||||
func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) (string, error) {
|
||||
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
|
||||
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
|
||||
if ref != "" {
|
||||
reqURL += "?ref=" + url.QueryEscape(ref)
|
||||
}
|
||||
body, err := c.doGet(ctx, reqURL)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("fetch file %s: %w", path, err)
|
||||
}
|
||||
var resp struct {
|
||||
Content string `json:"content"`
|
||||
Encoding string `json:"encoding"`
|
||||
}
|
||||
if err := json.Unmarshal(body, &resp); err != nil {
|
||||
return "", fmt.Errorf("parse file content JSON: %w", err)
|
||||
}
|
||||
if resp.Encoding != "base64" {
|
||||
return "", fmt.Errorf("unexpected encoding %q for file %s", resp.Encoding, path)
|
||||
}
|
||||
decoded, err := decodeBase64Content(resp.Content)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("decode base64 content for %s: %w", path, err)
|
||||
}
|
||||
return decoded, nil
|
||||
}
|
||||
|
||||
// GetCommitStatuses fetches both commit statuses and check runs for a SHA,
|
||||
// merging them into a unified []vcs.CommitStatus slice.
|
||||
// Returns nil (not an empty slice) when there are no statuses or check runs.
|
||||
|
sonnet-review-bot
commented
[NIT] The **[NIT]** The `GetFileContentAtRef` method is placed in `pr.go` but it's also the implementation delegate for `GetFileContent` in `files.go`. Conceptually it belongs in `files.go` alongside the other file-related methods. The split works because Go doesn't care which file a method is in, but for navigability (per the `style.md` pattern of organizing by concept) moving it to `files.go` would be cleaner.
|
||||
|
||||
[MINOR] defaultCheckRedirect follows HTTPS→HTTP redirects after stripping Authorization. While credentials are protected, this still permits plaintext requests to proceed, which can leak metadata and expands attack surface if a misconfigured or compromised server issues such redirects. Prefer failing closed on protocol downgrades.