feat(github): implement PRReader + FileReader client (#80) #93

Closed
rodin wants to merge 16 commits from review-bot-issue-80 into feature/github-support
3 changed files with 42 additions and 35 deletions
Showing only changes of commit 6e8e744816 - Show all commits
+9 -1
View File
37
@@ -114,6 +114,11 @@ func defaultCheckRedirect(req *http.Request, via []*http.Request) error {
if len(via) >= 10 {
return fmt.Errorf("stopped after 10 redirects")
Review

[MINOR] defaultCheckRedirect follows HTTPS→HTTP redirects after stripping Authorization. While credentials are protected, this still permits plaintext requests to proceed, which can leak metadata and expands attack surface if a misconfigured or compromised server issues such redirects. Prefer failing closed on protocol downgrades.

**[MINOR]** defaultCheckRedirect follows HTTPS→HTTP redirects after stripping Authorization. While credentials are protected, this still permits plaintext requests to proceed, which can leak metadata and expands attack surface if a misconfigured or compromised server issues such redirects. Prefer failing closed on protocol downgrades.
}
Outdated
Review

[MINOR] Authorization header is always set to "Bearer "+token even when token is empty. Consider only setting the header when token is non-empty to avoid sending an empty bearer token on unauthenticated requests.

**[MINOR]** Authorization header is always set to "Bearer "+token even when token is empty. Consider only setting the header when token is non-empty to avoid sending an empty bearer token on unauthenticated requests.
Outdated
Review

[NIT] GitHub classic PATs typically use the "token" scheme while fine-grained tokens use "Bearer". If supporting both is desired, consider documenting or adapting the auth scheme based on token type.

**[NIT]** GitHub classic PATs typically use the "token" scheme while fine-grained tokens use "Bearer". If supporting both is desired, consider documenting or adapting the auth scheme based on token type.
Review

[MINOR] The doc comment on defaultCheckRedirect says it "strips the Authorization header on cross-host redirects or protocol downgrades (HTTPS→HTTP) to prevent credential leakage, while still following the redirect." However, a protocol downgrade from HTTPS to HTTP is a genuine security issue — stripping the header and still following is debatable. Consider returning an error on HTTPS→HTTP downgrade rather than silently following. This is a design choice that has security implications, not a bug per se, but worth flagging.

**[MINOR]** The doc comment on `defaultCheckRedirect` says it "strips the Authorization header on cross-host redirects or protocol downgrades (HTTPS→HTTP) to prevent credential leakage, while still following the redirect." However, a protocol downgrade from HTTPS to HTTP is a genuine security issue — stripping the header and still following is debatable. Consider returning an error on HTTPS→HTTP downgrade rather than silently following. This is a design choice that has security implications, not a bug per se, but worth flagging.
Review

[MINOR] defaultCheckRedirect indexes via[len(via)-1] without guarding for len(via) == 0. net/http currently guarantees at least one prior request in via, but adding a len(via) check would make this more robust against misuse.

**[MINOR]** defaultCheckRedirect indexes via[len(via)-1] without guarding for len(via) == 0. net/http currently guarantees at least one prior request in via, but adding a len(via) check would make this more robust against misuse.
// Guard: net/http guarantees len(via) >= 1 but this is undocumented;
// defend against zero-length to avoid panic on index out of range.
Outdated
Review

[NIT] The concurrency safety doc comment says "SetHTTPClient and SetRetryBackoff must not be called concurrently with requests" but these are exported methods with no enforcement (no mutex, no atomic). This is an acceptable design for a test-support method, but the comment could be stronger: e.g., "These methods are intended for test setup only and must be called before any goroutines issue requests."

**[NIT]** The concurrency safety doc comment says "SetHTTPClient and SetRetryBackoff must not be called concurrently with requests" but these are exported methods with no enforcement (no mutex, no atomic). This is an acceptable design for a test-support method, but the comment could be stronger: e.g., "These methods are intended for test setup only and must be called before any goroutines issue requests."
if len(via) == 0 {
return nil
}
Outdated
Review

[NIT] Parameter name "url" in doRequest shadows the common concept of URLs and can reduce readability given other files import net/url. Consider renaming the parameter to reqURL for clarity.

**[NIT]** Parameter name "url" in doRequest shadows the common concept of URLs and can reduce readability given other files import net/url. Consider renaming the parameter to reqURL for clarity.
Outdated
Review

[MINOR] Redirects to different hosts or to HTTP are allowed (Authorization is stripped), which can lead to consuming responses from untrusted or downgraded endpoints. While token leakage is prevented, consider rejecting cross-host redirects and HTTPS→HTTP downgrades entirely to avoid integrity/confidentiality risks.

**[MINOR]** Redirects to different hosts or to HTTP are allowed (Authorization is stripped), which can lead to consuming responses from untrusted or downgraded endpoints. While token leakage is prevented, consider rejecting cross-host redirects and HTTPS→HTTP downgrades entirely to avoid integrity/confidentiality risks.
prev := via[len(via)-1]
// Reject protocol downgrade: HTTPS→HTTP leaks request metadata over plaintext.
if prev.URL.Scheme == "https" && req.URL.Scheme == "http" {
52
@@ -262,7 +267,10 @@ func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept st
return body, nil
}
errBody, _ := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
errBody, readErr := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
if readErr != nil && len(errBody) == 0 {
errBody = []byte(fmt.Sprintf("[error reading response body: %v]", readErr))
}
resp.Body.Close()
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(errBody)}
+33
View File
3
@@ -17,6 +17,39 @@ func (c *Client) GetFileContent(ctx context.Context, owner, repo, path, ref stri
return c.GetFileContentAtRef(ctx, owner, repo, path, ref)
Review

[MINOR] GetFileContent's signature includes a ref parameter but the doc comment says 'Delegates to GetFileContentAtRef with an empty ref', which contradicts the actual implementation (it passes ref through). The doc comment is wrong and should say 'Delegates to GetFileContentAtRef with the provided ref'.

**[MINOR]** GetFileContent's signature includes a `ref` parameter but the doc comment says 'Delegates to GetFileContentAtRef with an empty ref', which contradicts the actual implementation (it passes `ref` through). The doc comment is wrong and should say 'Delegates to GetFileContentAtRef with the provided ref'.
Review

[MINOR] GetFileContent is defined in files.go but it's a pure delegation to GetFileContentAtRef which lives in pr.go. Both implement the FileReader interface, but splitting the delegation from the implementation across files is mildly confusing. Both could live in files.go or the delegation could be removed and GetFileContentAtRef used directly. Minor organization issue.

**[MINOR]** GetFileContent is defined in files.go but it's a pure delegation to GetFileContentAtRef which lives in pr.go. Both implement the FileReader interface, but splitting the delegation from the implementation across files is mildly confusing. Both could live in files.go or the delegation could be removed and GetFileContentAtRef used directly. Minor organization issue.
}
Review

[NIT] GetFileContent doc says 'Delegates to GetFileContentAtRef with the provided ref' — this is accurate but the function adds no value over calling GetFileContentAtRef directly. This is presumably required to satisfy a FileReader interface. If that's the case, a comment noting it satisfies the interface would help readers understand why this thin wrapper exists.

**[NIT]** `GetFileContent` doc says 'Delegates to GetFileContentAtRef with the provided ref' — this is accurate but the function adds no value over calling `GetFileContentAtRef` directly. This is presumably required to satisfy a `FileReader` interface. If that's the case, a comment noting it satisfies the interface would help readers understand why this thin wrapper exists.
// GetFileContentAtRef fetches a file at a specific ref from a repo.
Outdated
Review

[MINOR] ListContents assumes the contents API returns a JSON array. The GitHub Contents API returns an object for single-file paths. Consider handling both array and object responses (or return a clearer error) to avoid a JSON unmarshal error for file paths.

**[MINOR]** ListContents assumes the contents API returns a JSON array. The GitHub Contents API returns an object for single-file paths. Consider handling both array and object responses (or return a clearer error) to avoid a JSON unmarshal error for file paths.
// If ref is empty, the query parameter is omitted (uses default branch).
//
// Note: dot-segments ("." and "..") in the path are silently removed to
// prevent path traversal. This means a path like "foo/../bar" resolves
// to "foo/bar" rather than "bar".
func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) (string, error) {
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
if ref != "" {
reqURL += "?ref=" + url.QueryEscape(ref)
}
body, err := c.doGet(ctx, reqURL)
if err != nil {
return "", fmt.Errorf("fetch file %s: %w", path, err)
}
var resp struct {
Content string `json:"content"`
Encoding string `json:"encoding"`
Outdated
Review

[MINOR] escapePath drops ".." segments but does not remove the preceding segment, so paths like "a/./b/../c" normalize to "a/b/c" instead of "a/c". Clarify this behavior in the comment or consider implementing proper dot-segment normalization to avoid surprising results.

**[MINOR]** escapePath drops ".." segments but does not remove the preceding segment, so paths like "a/./b/../c" normalize to "a/b/c" instead of "a/c". Clarify this behavior in the comment or consider implementing proper dot-segment normalization to avoid surprising results.
}
if err := json.Unmarshal(body, &resp); err != nil {
return "", fmt.Errorf("parse file content JSON: %w", err)
Outdated
Review

[NIT] escapePath removes dot-segments and empty segments, which can turn root ("" or ".") into an empty path, resulting in a trailing slash in the constructed URL (…/contents/). This is likely fine for GitHub, but consider documenting the behavior or handling root explicitly for clarity.

**[NIT]** escapePath removes dot-segments and empty segments, which can turn root ("" or ".") into an empty path, resulting in a trailing slash in the constructed URL (…/contents/). This is likely fine for GitHub, but consider documenting the behavior or handling root explicitly for clarity.
}
if resp.Encoding != "base64" {
return "", fmt.Errorf("unexpected encoding %q for file %s", resp.Encoding, path)
Outdated
Review

[MINOR] The fallback from array→object JSON parsing uses err from the outer json.Unmarshal in the error return (fmt.Errorf("parse contents JSON: %w", err)), but err here is the array-parse error ('cannot unmarshal object into Go value of type []entry'), which is misleading — the real parse failure is err2. Should return err2 instead: fmt.Errorf("parse contents JSON: %w", err2).

**[MINOR]** The fallback from array→object JSON parsing uses `err` from the outer `json.Unmarshal` in the error return (`fmt.Errorf("parse contents JSON: %w", err)`), but `err` here is the array-parse error ('cannot unmarshal object into Go value of type []entry'), which is misleading — the real parse failure is `err2`. Should return `err2` instead: `fmt.Errorf("parse contents JSON: %w", err2)`.
Outdated
Review

[MINOR] The fallback from array to object JSON parsing is an unusual approach. The error from the first json.Unmarshal (array attempt) is discarded silently. If the response is truly malformed JSON (not just an object vs array mismatch), only err2 from the single-entry unmarshal is surfaced. This is intentional per the comment but it means a corrupted array response that happens to partially parse as an object could go undetected. Consider checking if the body starts with [ vs { to choose the parsing path, which would be more explicit and preserve the original error on genuine parse failures.

**[MINOR]** The fallback from array to object JSON parsing is an unusual approach. The error from the first `json.Unmarshal` (array attempt) is discarded silently. If the response is truly malformed JSON (not just an object vs array mismatch), only `err2` from the single-entry unmarshal is surfaced. This is intentional per the comment but it means a corrupted array response that happens to partially parse as an object could go undetected. Consider checking if the body starts with `[` vs `{` to choose the parsing path, which would be more explicit and preserve the original error on genuine parse failures.
Outdated
Review

[NIT] The fallback from array to single-object JSON unmarshal silently swallows the original array parse error. If the body is valid JSON but neither an array nor an object matching the entry struct (e.g., a JSON number), the second unmarshal will succeed with zero values and return a single empty entry. Consider checking that single.Name != "" or similar before accepting the fallback.

**[NIT]** The fallback from array to single-object JSON unmarshal silently swallows the original array parse error. If the body is valid JSON but neither an array nor an object matching the `entry` struct (e.g., a JSON number), the second unmarshal will succeed with zero values and return a single empty entry. Consider checking that `single.Name != ""` or similar before accepting the fallback.
}
decoded, err := decodeBase64Content(resp.Content)
if err != nil {
return "", fmt.Errorf("decode base64 content for %s: %w", path, err)
}
return decoded, nil
security-review-bot marked this conversation as resolved Outdated
Outdated
Review

[MINOR] escapePath preserves dot-segments ("." and "..") in path components. Some servers normalize dot-segments, which could let an untrusted path escape the intended "contents" endpoint (e.g., "/contents/../../pulls"). While this stays on the same host, it could lead to unexpected behavior. Consider rejecting or encoding dot-segments to prevent path traversal in URL construction.

**[MINOR]** escapePath preserves dot-segments ("." and "..") in path components. Some servers normalize dot-segments, which could let an untrusted path escape the intended "contents" endpoint (e.g., "/contents/../../pulls"). While this stays on the same host, it could lead to unexpected behavior. Consider rejecting or encoding dot-segments to prevent path traversal in URL construction.
}
Outdated
Review

[MINOR] Silent empty-array handling: when the GitHub Contents API returns an empty JSON array [], json.Unmarshal into []entry succeeds with len(entries) == 0 and the function returns an empty slice without error. This is probably the correct behavior for an empty directory, but it's undocumented and could silently mask an unexpected server response. A comment clarifying this intent would be valuable.

**[MINOR]** Silent empty-array handling: when the GitHub Contents API returns an empty JSON array `[]`, `json.Unmarshal` into `[]entry` succeeds with `len(entries) == 0` and the function returns an empty slice without error. This is probably the correct behavior for an empty directory, but it's undocumented and could silently mask an unexpected server response. A comment clarifying this intent would be valuable.
// ListContents lists files and directories at a given path in a repo.
// Returns the directory listing from the GitHub contents API.
// If the path points to a single file (not a directory), the API returns
12
-34
View File
@@ -33,7 +33,6 @@ type changedFileResponse struct {
// commitStatusResponse is the GitHub combined status API response.
type commitStatusResponse struct {
State string `json:"state"`
Statuses []struct {
Context string `json:"context"`
State string `json:"state"`
10
@@ -123,39 +122,6 @@ func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, nu
return allFiles, nil
}
// GetFileContentAtRef fetches a file at a specific ref from a repo.
// If ref is empty, the query parameter is omitted (uses default branch).
//
// Note: dot-segments ("." and "..") in the path are silently removed to
// prevent path traversal. This means a path like "foo/../bar" resolves
// to "foo/bar" rather than "bar".
func (c *Client) GetFileContentAtRef(ctx context.Context, owner, repo, path, ref string) (string, error) {
reqURL := fmt.Sprintf("%s/repos/%s/%s/contents/%s",
c.baseURL, url.PathEscape(owner), url.PathEscape(repo), escapePath(path))
if ref != "" {
reqURL += "?ref=" + url.QueryEscape(ref)
}
body, err := c.doGet(ctx, reqURL)
if err != nil {
return "", fmt.Errorf("fetch file %s: %w", path, err)
}
var resp struct {
Content string `json:"content"`
Encoding string `json:"encoding"`
}
if err := json.Unmarshal(body, &resp); err != nil {
return "", fmt.Errorf("parse file content JSON: %w", err)
}
if resp.Encoding != "base64" {
return "", fmt.Errorf("unexpected encoding %q for file %s", resp.Encoding, path)
}
decoded, err := decodeBase64Content(resp.Content)
if err != nil {
return "", fmt.Errorf("decode base64 content for %s: %w", path, err)
}
return decoded, nil
}
// GetCommitStatuses fetches both commit statuses and check runs for a SHA,
// merging them into a unified []vcs.CommitStatus slice.
// Returns nil (not an empty slice) when there are no statuses or check runs.
Outdated
Review

[NIT] The GetFileContentAtRef method is placed in pr.go but it's also the implementation delegate for GetFileContent in files.go. Conceptually it belongs in files.go alongside the other file-related methods. The split works because Go doesn't care which file a method is in, but for navigability (per the style.md pattern of organizing by concept) moving it to files.go would be cleaner.

**[NIT]** The `GetFileContentAtRef` method is placed in `pr.go` but it's also the implementation delegate for `GetFileContent` in `files.go`. Conceptually it belongs in `files.go` alongside the other file-related methods. The split works because Go doesn't care which file a method is in, but for navigability (per the `style.md` pattern of organizing by concept) moving it to `files.go` would be cleaner.
20