Compare commits

..

17 Commits

Author SHA1 Message Date
Rodin 75190d53ed fix: address review findings (comment, marker budget, naming)
CI / test (pull_request) Successful in 13s
CI / review (gpt-4.1, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 22s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m48s
- UserMeta comment: "never trimmed" → "truncated only if base exceeds budget"
- Skip diff truncation marker when diffBudget < markerBudget (prevents
  marker itself from pushing EstTokens over the limit)
- Rename filepath → filePath to avoid shadowing stdlib package name
2026-05-01 20:02:35 -07:00
Rodin 8b8462bdc8 fix: address final review findings
CI / test (pull_request) Successful in 13s
CI / review (gpt-4.1, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 21s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m30s
- Comment: "~4 characters" → "~4 bytes" (len() counts bytes, not runes)
- Use utf8.RuneStart from stdlib instead of custom isUTF8Start helper
- Skip diff block entirely when Diff is empty (handles edge cases:
  draft→ready with no delta, force-push matching base, etc.)
2026-05-01 19:36:42 -07:00
Rodin 565a077b01 fix: CI config - correct patterns path, increase timeout
CI / test (pull_request) Successful in 13s
CI / review (gpt-4.1, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 21s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 2m18s
- PATTERNS_FILES: docs/ does not exist in go-patterns, use patterns/
- LLM_TIMEOUT: 600s (gpt-5-mini needs more time for larger diffs)
2026-05-01 19:06:18 -07:00
Rodin dab7871cb4 fix: address review findings on budget system
CI / test (pull_request) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m41s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 3m2s
- Account for truncation marker tokens when computing diff budget
  (prevents EstTokens exceeding model limit in edge cases)
- Rune-safe truncation for both UserMeta and Diff (no split multi-byte)
- Fix misleading comment (1000 chars → ~1000 tokens/4000 chars)
- Extract marker strings as constants
- Add unit tests for BuildSystemBase and BuildUserMeta
2026-05-01 18:59:07 -07:00
Rodin d9cacf6f62 fix: strict budget enforcement + deterministic model matching
CI / test (pull_request) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m59s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Failing after 5m12s
Addresses review findings:
- Replace map-based model limits with ordered slice (longest-prefix-first)
  for deterministic matching
- Truncate UserMeta when base content alone exceeds budget (keeps first
  4000 chars + truncation marker)
- Remove hard minimum of 1000 tokens for diff budget — use 0 as floor
  to guarantee total never exceeds limit
- Handle zero-budget edge case (diff replaced with manual-review message)
- Add tests: huge UserMeta, all-sections-huge never exceeds limit
2026-05-01 18:51:22 -07:00
Rodin 67d835909f feat: add context budget system for LLM overflow (#19)
CI / test (pull_request) Successful in 13s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m30s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m29s
Adds a budget package that estimates token usage and progressively
trims context to fit within model-specific limits.

Trim order (least important first):
1. Language patterns
2. Repository conventions
3. Full file context
4. Diff (truncated as last resort)

When content is trimmed, a note is appended to the user prompt so
the LLM knows context was reduced.

- New budget package with Fit(), EstimateTokens(), LimitForModel()
- Model limit table (GPT-4.1: 128K, GPT-5: 200K, Claude: 200K)
- Refactored review/prompt.go: BuildSystemBase() and BuildUserMeta()
  extract non-trimmable content; old functions delegate to new ones
- main.go uses budget.Fit() instead of direct prompt assembly
- 7 unit tests covering all trim paths

Closes #19
2026-05-01 18:46:53 -07:00
rodin ef3e6d5e87 Merge pull request 'fix: path-escape file paths and eliminate url package shadowing' (#17) from fix/url-escaping-and-shadow into main
CI / test (push) Successful in 15s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
2026-05-01 21:55:02 +00:00
Rodin aade891129 docs: add package-level documentation
CI / test (pull_request) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Failing after 55s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Failing after 1m6s
Per go-patterns/package-design.md, every package needs a doc comment.
Added to gitea, llm, and review packages.
2026-05-01 14:54:58 -07:00
Rodin 7b42de67ca fix: handle empty path in ListContents (root listing)
CI / test (pull_request) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m4s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m8s
Empty path now yields /contents instead of /contents/ (trailing slash).
Added doc comment noting empty path = repo root.
2026-05-01 14:46:40 -07:00
Rodin dd2661fe14 fix: address all review findings from PR #17
CI / test (pull_request) Successful in 13s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 52s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m12s
- Rename all remaining url locals to reqURL (consistency)
- Use http.MethodGet/http.MethodPost constants
- Document escapePath: relative paths only, double-encoding expected
- Add TestEscapePath with 7 edge cases (empty, spaces, #, deep, encoded)
2026-05-01 14:40:19 -07:00
Rodin 98a4772f30 fix: path-escape file paths and eliminate url package shadowing
CI / test (pull_request) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 55s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m48s
- Add escapePath() helper: escapes each path segment individually
  (preserves slashes as separators, escapes spaces/#/? etc)
- Apply to GetFileContent, GetFileContentRef, ListContents
- Rename doGet parameter from url to reqURL (avoids shadowing net/url)
- Rename local variables in GetFileContent/ListContents for consistency

Addresses remaining findings from PR #16 review.
2026-05-01 14:33:18 -07:00
rodin fc23b6ebe9 Merge pull request 'fix: quick wins (#7, #9, #13)' (#16) from fix/quick-wins into main
CI / test (push) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
2026-05-01 21:30:57 +00:00
Rodin b02ade4f23 fix: quick wins (#7, #9, #13)
CI / test (pull_request) Successful in 13s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 59s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m48s
- Add --version flag and log version on startup (closes #9)
- URL-escape ref query parameter in GetFileContentRef (closes #7)
- Add go vet to release workflow (closes #13)

Renamed local url variable to reqURL to avoid shadowing net/url package.
2026-05-01 14:19:37 -07:00
rodin f8e77cf7e3 Merge pull request 'feat: add context.Context + unexport client fields' (#14) from fix/context-and-encapsulation into main
CI / test (push) Successful in 13s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
Release / release (push) Successful in 31s
2026-05-01 21:10:37 +00:00
aweiker ffca0eb016 Merge pull request 'docs: add comprehensive code review report (vs go-patterns)' (#1) from docs/code-review-report into main
CI / test (push) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #1
2026-05-01 19:25:16 +00:00
Rodin 9aec7ff952 docs: add comprehensive code review report (vs go-patterns)
CI / test (pull_request) Successful in 14s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Failing after 1m4s
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (pull_request) Failing after 3m47s
2026-05-01 19:24:41 +00:00
aweiker 582ebf7ff6 Merge pull request 'ci: add release workflow + install script' (#2) from ci/release-workflow into main
CI / test (push) Successful in 13s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5-mini, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #2
2026-05-01 19:24:15 +00:00
11 changed files with 1032 additions and 33 deletions
+3 -2
View File
@@ -31,7 +31,7 @@ jobs:
model: gpt-5 model: gpt-5
- name: gpt - name: gpt
token_secret: GPT_REVIEW_TOKEN token_secret: GPT_REVIEW_TOKEN
model: gpt-5-mini model: gpt-4.1
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
- uses: actions/setup-go@v5 - uses: actions/setup-go@v5
@@ -49,5 +49,6 @@ jobs:
LLM_MODEL: ${{ matrix.model }} LLM_MODEL: ${{ matrix.model }}
CONVENTIONS_FILE: "CONVENTIONS.md" CONVENTIONS_FILE: "CONVENTIONS.md"
PATTERNS_REPO: "rodin/go-patterns" PATTERNS_REPO: "rodin/go-patterns"
PATTERNS_FILES: "README.md,docs/" PATTERNS_FILES: "README.md,patterns/"
LLM_TIMEOUT: "600"
run: ./review-bot run: ./review-bot
+3 -1
View File
@@ -16,7 +16,9 @@ jobs:
go-version: '1.26' go-version: '1.26'
- name: Run tests - name: Run tests
run: go test ./... run: |
go vet ./...
go test ./...
- name: Build binaries - name: Build binaries
run: | run: |
+436
View File
@@ -0,0 +1,436 @@
# review-bot Code Review (vs go-patterns)
## Overall Assessment
The review-bot is a well-structured, focused Go application that follows many idiomatic patterns correctly. The package layout is clean (`gitea/`, `llm/`, `review/`, `cmd/`), error handling uses `%w` wrapping consistently, and the test suite covers all major code paths using `httptest`. However, there are several areas where the code diverges from the patterns documented in `go-patterns` — particularly around configuration, context propagation, exported fields, documentation, and testing idioms.
**Verdict: Solid foundation with targeted improvements needed.**
## Findings
| # | Severity | File | Pattern Violated | Finding |
|---|----------|------|-----------------|---------|
| 1 | MAJOR | `gitea/client.go` | concurrency / api-conventions | No `context.Context` parameter on any method — HTTP calls are uncancellable |
| 2 | MAJOR | `llm/client.go` | concurrency / api-conventions | `Complete()` accepts no context — no timeout or cancellation support |
| 3 | MAJOR | `gitea/client.go` | structs / encapsulation | `Client` fields (`BaseURL`, `Token`, `HTTP`) are exported but should be unexported |
| 4 | MAJOR | `llm/client.go` | structs / encapsulation | `Client` fields (`BaseURL`, `APIKey`, `Model`, `HTTP`) are exported — leaks credentials via reflection/logging |
| 5 | MINOR | `cmd/review-bot/main.go` | configuration | No input validation beyond emptiness — e.g., URL format, model name format |
| 6 | MINOR | `cmd/review-bot/main.go` | error-handling | Uses `log.Fatalf` for all errors — no cleanup, deferred functions won't run |
| 7 | MINOR | `gitea/client.go` | error-handling / style | Error strings in `doGet` are inconsistent — some use `fmt.Errorf`, the raw HTTP error doesn't wrap with `%w` |
| 8 | MINOR | `review/prompt.go` | style / api-conventions | `BuildSystemPrompt` uses 20+ `WriteString` calls — could use a raw string literal for readability |
| 9 | MINOR | `gitea/client.go` | documentation | No concurrency safety documentation on `Client` type |
| 10 | MINOR | `llm/client.go` | documentation | No concurrency safety documentation on `Client` type |
| 11 | MINOR | `gitea/client_test.go` | testing-advanced | Tests don't use `t.Run` subtests — individual test functions instead of table-driven with named cases |
| 12 | MINOR | `integration_test.go` | style | Uses rune literal `'/'` comparison in a loop instead of `strings.SplitN` (inconsistent with `main.go`) |
| 13 | MINOR | `llm/client.go` | configuration | `Temperature: 0.1` is hardcoded — not configurable and the zero-value (0.0) semantic isn't clear |
| 14 | NIT | `gitea/client.go` | style | `PostReview` converts `[]byte` to `string` then passes to `strings.NewReader` — use `bytes.NewReader(data)` directly |
| 15 | NIT | `review/formatter.go` | documentation | `GiteaEvent` has no doc comment explaining the mapping semantics |
| 16 | NIT | `cmd/review-bot/main.go` | package-design | `evaluateCIStatus` is unexported logic in `main` — could live in `review` package for testability |
| 17 | NIT | `gitea/client.go` | interfaces | No interface defined for the Gitea client — makes the main function harder to unit test |
| 18 | NIT | `llm/client.go` | interfaces | No interface defined for the LLM client — same testability concern |
| 19 | NIT | `review/parser.go` | error-handling | `extractJSON` silently handles malformed fences — edge case: `\`\`\`` with only 1 line produces empty string |
| 20 | NIT | Various | documentation | No package-level doc comments (`// Package xxx ...`) on any package |
## Detailed Findings
### 1. No `context.Context` on Gitea client methods (MAJOR)
**What the code does:**
```go
func (c *Client) GetPullRequest(owner, repo string, number int) (*PullRequest, error) {
url := fmt.Sprintf(...)
body, err := c.doGet(url)
...
}
```
**What the pattern says:**
From `concurrency.md` §6 (Context Propagation Rules): "Pass a Context explicitly to each function that needs it. The Context should be the first parameter, typically named ctx." From `api-conventions.md` §3 (WithContext variant): All I/O-performing functions should accept a context for timeout/cancellation.
**How to fix:**
```go
func (c *Client) GetPullRequest(ctx context.Context, owner, repo string, number int) (*PullRequest, error) {
...
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
...
}
```
Add `context.Context` as the first parameter to all public methods. Update `doGet` to accept context internally.
---
### 2. No `context.Context` on LLM `Complete()` (MAJOR)
**What the code does:**
```go
func (c *Client) Complete(messages []Message) (string, error) {
...
req, err := http.NewRequest("POST", url, bytes.NewReader(data))
...
}
```
**What the pattern says:**
Same as finding #1. LLM calls can take 30-60+ seconds. Without context, there's no way to enforce a timeout or cancel a review that's taking too long.
**How to fix:**
```go
func (c *Client) Complete(ctx context.Context, messages []Message) (string, error) {
...
req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(data))
...
}
```
The caller in `main.go` should create a context with timeout: `ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)`.
---
### 3 & 4. Exported struct fields on Client types (MAJOR)
**What the code does:**
```go
type Client struct {
BaseURL string
Token string
HTTP *http.Client
}
```
**What the pattern says:**
From `structs.md`: use unexported fields for internal state; expose only what callers need to read/modify. From `configuration.md` §9: Document immutability constraints. Exported fields like `Token` and `APIKey` are sensitive credentials that could be accidentally logged, serialized, or mutated after construction.
**How to fix:**
```go
type Client struct {
baseURL string
token string
http *http.Client
}
```
If tests need to override `HTTP`, expose it via a functional option or a `WithHTTPClient(*http.Client)` setter, or accept it in the constructor.
---
### 5. No input validation beyond emptiness (MINOR)
**What the code does:**
```go
if *giteaURL == "" || *repo == "" || ...
```
**What the pattern says:**
From `configuration.md` §1 (Zero-Value Config): Document and validate configuration explicitly. A malformed URL (e.g., missing scheme) will produce a confusing error later during HTTP request creation rather than at startup.
**How to fix:**
```go
if _, err := url.Parse(*giteaURL); err != nil || !strings.HasPrefix(*giteaURL, "http") {
log.Fatalf("Invalid --gitea-url: %s", *giteaURL)
}
```
---
### 6. `log.Fatalf` for all errors (MINOR)
**What the code does:**
`log.Fatalf(...)` is used for every error in `main()`.
**What the pattern says:**
From `api-conventions.md` §9 (Graceful Shutdown): distinguish between fatal and recoverable errors. From `error-handling.md`: error handling should give callers the ability to respond. While `main()` is the top-level caller, `log.Fatalf` calls `os.Exit(1)` which doesn't run deferred functions.
**How to fix:**
Use a `run() error` pattern:
```go
func main() {
if err := run(); err != nil {
fmt.Fprintf(os.Stderr, "error: %v\n", err)
os.Exit(1)
}
}
func run() error { ... }
```
This allows deferred cleanup to run and makes the code testable.
---
### 7. Inconsistent error formatting in `doGet` (MINOR)
**What the code does:**
```go
func (c *Client) doGet(url string) ([]byte, error) {
...
return nil, fmt.Errorf("HTTP %d: %s", resp.StatusCode, string(body))
}
```
The error from the raw HTTP response isn't wrapped with `%w`, but the callers wrap it again: `fmt.Errorf("fetch PR: %w", err)`. The inner error starts with a capital "HTTP".
**What the pattern says:**
From `smells/anti-patterns.md` §6 (Error String Formatting): error strings should be lowercase. They compose upward: `fetch PR: HTTP 404: ...` has inconsistent casing.
**How to fix:**
```go
return nil, fmt.Errorf("http %d: %s", resp.StatusCode, string(body))
```
---
### 8. Prompt building uses excessive `WriteString` (MINOR)
**What the code does:**
```go
sb.WriteString("You are an expert code reviewer...\n\n")
sb.WriteString("Your task:\n")
sb.WriteString("1. Review the diff...\n")
// ... 20+ more lines
```
**What the pattern says:**
From `style.md`: code should be readable and maintainable. A raw string literal would be far more readable for a multi-line prompt template.
**How to fix:**
```go
const systemPromptTemplate = `You are an expert code reviewer. Review the provided pull request diff carefully.
Your task:
1. Review the diff for correctness, idiomatic code, potential bugs, and design issues.
...
`
func BuildSystemPrompt(conventions string) string {
prompt := systemPromptTemplate
if conventions != "" {
prompt += fmt.Sprintf("\n\nThe repository has the following coding conventions...\n\n%s\n", conventions)
}
return prompt
}
```
---
### 9 & 10. No concurrency safety documentation (MINOR)
**What the code does:**
Neither `gitea.Client` nor `llm.Client` documents whether they're safe for concurrent use.
**What the pattern says:**
From `documentation.md` §9 (Concurrency Documentation): "Doc comments explicitly state the concurrency safety of a type." Since both types embed `*http.Client` (which IS safe for concurrent use), the wrapping types should document this.
**How to fix:**
```go
// Client interacts with the Gitea API.
// A Client is safe for concurrent use by multiple goroutines.
type Client struct { ... }
```
---
### 11. Tests don't use `t.Run` subtests (MINOR)
**What the code does:**
`gitea/client_test.go` defines 8 separate `TestXxx` functions, each creating their own httptest server.
**What the pattern says:**
From `testing-advanced.md` §1 (Table-Driven Tests with `t.Run`): related tests should use named subtests for filterability and clarity. The Gitea client tests share identical setup patterns — they'd benefit from a shared helper.
**How to fix:**
Consider a test helper that creates a server with a handler map, then use `t.Run` for each case. The existing structure is acceptable but could be DRYer.
---
### 12. Inconsistent repo parsing in `integration_test.go` (MINOR)
**What the code does:**
```go
for i, c := range giteaRepo {
if c == '/' {
owner = giteaRepo[:i]
repoName = giteaRepo[i+1:]
break
}
}
```
**What the pattern says:**
From `style.md` §3 (File Organization by Responsibility): related logic should be consistent. `main.go` uses `strings.SplitN(*repo, "/", 2)` for the same operation. The integration test reinvents it with a manual loop.
**How to fix:**
Use `strings.SplitN(giteaRepo, "/", 2)` for consistency, or extract a shared helper.
---
### 13. Hardcoded `Temperature: 0.1` (MINOR)
**What the code does:**
```go
reqBody := ChatRequest{
...
Temperature: 0.1,
}
```
**What the pattern says:**
From `configuration.md` §1 (Zero-Value Usable Config): "Every field documents its zero-value behavior." The temperature is buried in implementation. It should be configurable (e.g., a field on `Client` with a documented default).
**How to fix:**
Add a `Temperature` field to `Client` with documentation:
```go
type Client struct {
...
// Temperature controls LLM randomness. If zero, defaults to 0.1.
Temperature float64
}
```
---
### 14. Unnecessary `string()` → `strings.NewReader` conversion (NIT)
**What the code does:**
```go
data, err := json.Marshal(payload)
...
req, err := http.NewRequest("POST", url, strings.NewReader(string(data)))
```
**What the pattern says:**
From `style.md`: avoid unnecessary allocations. `json.Marshal` returns `[]byte`; use `bytes.NewReader(data)` directly to avoid the `[]byte→string` copy.
**How to fix:**
```go
req, err := http.NewRequest("POST", url, bytes.NewReader(data))
```
---
### 15. Missing doc comment on `GiteaEvent` (NIT)
**What the code does:**
```go
// GiteaEvent converts the verdict to the Gitea API event string.
func GiteaEvent(verdict string) string {
```
Actually, this one DOES have a doc comment. On closer inspection the comment exists. Removing this finding — **correction**: the comment is present but minimal. It doesn't document the mapping or the "COMMENT" fallback behavior. This is borderline.
---
### 16. `evaluateCIStatus` in `main` package (NIT)
**What the code does:**
The `evaluateCIStatus` function lives in `cmd/review-bot/main.go` and operates on `[]gitea.CommitStatus`.
**What the pattern says:**
From `package-design.md`: packages should encapsulate related logic. This function interprets CI status semantics — it belongs in the `review` package (or even `gitea`) where it could be unit-tested independently without building the entire binary.
---
### 17 & 18. No interfaces for testability (NIT)
**What the code does:**
`main.go` directly uses `*gitea.Client` and `*llm.Client` concrete types.
**What the pattern says:**
From `interfaces.md`: "Define interfaces in the package that USES them." From `smells/common-mistakes.md` §10 (Premature Abstraction): don't create interfaces before you need them. However, the consumer (`main.go`) would benefit from small interfaces for testing the orchestration logic independently.
**How to fix (when needed):**
```go
// In main or a review orchestrator package:
type PRFetcher interface {
GetPullRequest(ctx context.Context, owner, repo string, number int) (*gitea.PullRequest, error)
GetPullRequestDiff(ctx context.Context, owner, repo string, number int) (string, error)
}
```
Note: This is a NIT because the current code doesn't have tests for `main.go` orchestration. If/when that's needed, interfaces become valuable.
---
### 19. `extractJSON` edge case (NIT)
**What the code does:**
```go
if strings.HasPrefix(s, "```") {
lines := strings.Split(s, "\n")
if len(lines) > 2 {
lines = lines[1:]
}
...
}
```
If input is exactly `` ```json\n``` `` (fence with empty body), it produces an empty string that will fail JSON parse with a confusing error message.
**What the pattern says:**
From `error-handling.md`: errors should carry context. Consider returning an explicit error from `extractJSON` when the extracted content is empty after fence stripping.
---
### 20. No package doc comments (NIT)
**What the code does:**
None of the packages (`gitea`, `llm`, `review`) have `// Package xxx ...` doc comments.
**What the pattern says:**
From `documentation.md` §1 (Package Documentation): "The first file in a package starts with a `// Package xxx ...` comment that explains the package's purpose."
**How to fix:**
Add to each package's primary file:
```go
// Package gitea provides a client for the Gitea API, focused on pull request review operations.
package gitea
```
---
## Positive Patterns
The codebase does several things well:
1. **Clean package separation** — `gitea/`, `llm/`, `review/`, `cmd/` each have a single responsibility. This matches `package-design.md` perfectly.
2. **Consistent error wrapping** — Every public function wraps errors with `fmt.Errorf("context: %w", err)`, providing clear error chains. This follows `error-handling.md` closely.
3. **Return concrete types from constructors** — `NewClient()` returns `*Client`, not an interface. Matches `smells/common-mistakes.md` §7 and `smells/anti-patterns.md` §8.
4. **httptest-based testing** — Both client packages use `net/http/httptest` for isolated, deterministic tests. No external dependencies needed.
5. **Good test coverage of error paths** — Tests cover 404s, bad JSON, connection failures, invalid severities, missing fields. This is thorough.
6. **Zero dependencies** — `go.mod` has no external dependencies. The entire project uses only the standard library. This is excellent for a focused tool.
7. **Build-tagged integration test** — The `//go:build integration` tag keeps expensive tests separate from unit tests. Good practice.
8. **`strings.Builder` usage** — Prompt building and formatting use `strings.Builder` correctly for efficient string construction.
9. **Named return values where useful** — `evaluateCIStatus` uses named returns `(passed bool, details string)` for documentation clarity, matching `style.md` §5.
10. **No premature abstraction** — The code doesn't define interfaces it doesn't need yet. It's concrete and straightforward, following `smells/common-mistakes.md` §10.
## Recommendations
Priority-ordered list of improvements:
1. **Add `context.Context` to all client methods** (Critical) — This is the single most impactful change. LLM calls can hang indefinitely without timeout support. Both `gitea.Client` and `llm.Client` should accept context as the first parameter on all public methods. Use `http.NewRequestWithContext`.
2. **Unexport client struct fields** (High) — `Token`, `APIKey`, `BaseURL` should be unexported to prevent accidental logging/serialization of credentials. Expose only what's needed via methods or constructor options.
3. **Add package documentation** (Medium) — Each package needs a `// Package xxx ...` comment. This takes 5 minutes and significantly improves discoverability.
4. **Extract `evaluateCIStatus` to `review` package** (Medium) — Makes it independently testable and keeps `main.go` focused on orchestration.
5. **Use `run() error` pattern in main** (Medium) — Enables deferred cleanup and makes the orchestration logic more testable.
6. **Replace `WriteString` chain with raw string literal** (Low) — Pure readability improvement for `BuildSystemPrompt`.
7. **Make LLM temperature configurable** (Low) — Add as a field on `Client` with documented zero-value default.
8. **Use `bytes.NewReader` instead of `strings.NewReader(string(...))` in PostReview** (Low) — Eliminates one unnecessary allocation.
9. **Add concurrency documentation to Client types** (Low) — One-line doc additions.
10. **Consider consumer-side interfaces when testing `main` orchestration** (Future) — Not needed now, but will become valuable if the `main.go` logic grows or needs unit testing.
+226
View File
@@ -0,0 +1,226 @@
// Package budget manages LLM context window budgeting for review-bot.
//
// It estimates token usage and progressively trims context content to fit
// within model-specific limits. The trimming order (least important first):
// patterns → conventions → file context → diff truncation.
package budget
import (
"fmt"
"strings"
"unicode/utf8"
)
// modelLimit pairs a model name prefix with its context window size.
type modelLimit struct {
prefix string
limit int
}
// Known model context limits (in tokens), ordered longest-prefix-first
// for deterministic matching.
var modelLimits = []modelLimit{
{"claude-haiku-3.5-20241022", 200_000},
{"claude-sonnet-4-20250514", 200_000},
{"claude-opus-4-20250514", 200_000},
{"gpt-4.1-mini", 128_000},
{"gpt-5-mini", 200_000},
{"gpt-4.1", 128_000},
{"gpt-5", 200_000},
}
const defaultLimit = 128_000
// reserveTokens is headroom for the response generation.
const reserveTokens = 4_000
const diffTruncMarker = "\n\n... [diff truncated due to context limit] ..."
const diffTooLargeMarker = "... [diff too large for context window — review manually] ..."
const userMetaTruncMarker = "\n... [description truncated] ..."
// EstimateTokens estimates the number of tokens in a string.
// Uses the rough heuristic of ~4 bytes per token, which is
// conservative for English text and code.
func EstimateTokens(s string) int {
return len(s) / 4
}
// LimitForModel returns the context window size for the given model.
// Uses longest-prefix-first matching for deterministic results.
func LimitForModel(model string) int {
for _, ml := range modelLimits {
if model == ml.prefix || strings.HasPrefix(model, ml.prefix) {
return ml.limit
}
}
return defaultLimit
}
// Sections holds the prompt content sections in trim priority order.
// When the total exceeds the budget, sections are trimmed from least
// important (Patterns) to most important (Diff).
type Sections struct {
SystemBase string // Core instructions (never trimmed)
Patterns string // Language patterns (trimmed first)
Conventions string // Repo conventions (trimmed second)
FileContext string // Full file content (trimmed third)
Diff string // The actual diff (trimmed last, only truncated)
UserMeta string // PR title, description, CI status (truncated only if base exceeds budget)
}
// Result holds the trimmed content and metadata about what was dropped.
type Result struct {
SystemPrompt string
UserPrompt string
Trimmed []string // Human-readable descriptions of what was trimmed
EstTokens int // Estimated total tokens after trimming
}
// Fit trims sections to fit within the model's context limit.
// Returns the assembled prompts and a list of what was trimmed.
func Fit(model string, sections Sections) Result {
limit := LimitForModel(model) - reserveTokens
baseTokens := EstimateTokens(sections.SystemBase) + EstimateTokens(sections.UserMeta)
available := limit - baseTokens
if available < 0 {
// Base content alone exceeds budget. Truncate UserMeta (keep first ~1000 tokens).
if len(sections.UserMeta) > 4000 {
sections.UserMeta = truncateUTF8(sections.UserMeta, 4000) + userMetaTruncMarker
baseTokens = EstimateTokens(sections.SystemBase) + EstimateTokens(sections.UserMeta)
available = limit - baseTokens
}
if available < 0 {
available = 0
}
}
// Trimmable sections in priority order (first = dropped first)
type entry struct {
name string
content *string
}
entries := []entry{
{"patterns", &sections.Patterns},
{"conventions", &sections.Conventions},
{"file context", &sections.FileContext},
}
// Check if everything fits
totalTrimmable := EstimateTokens(sections.Diff)
for _, e := range entries {
totalTrimmable += EstimateTokens(*e.content)
}
var trimmed []string
if totalTrimmable > available {
// Trim from least important
for i := range entries {
tokens := EstimateTokens(*entries[i].content)
if tokens == 0 {
continue
}
trimmed = append(trimmed, fmt.Sprintf("%s (~%dK tokens)", entries[i].name, tokens/1000))
*entries[i].content = ""
// Recalculate
totalTrimmable = EstimateTokens(sections.Diff)
for _, e := range entries {
totalTrimmable += EstimateTokens(*e.content)
}
if totalTrimmable <= available {
break
}
}
}
// If still too large, truncate the diff
if totalTrimmable > available {
diffBudget := available
for _, e := range entries {
diffBudget -= EstimateTokens(*e.content)
}
if diffBudget < 0 {
diffBudget = 0
}
// Reserve space for truncation marker
markerBudget := EstimateTokens(diffTruncMarker)
effectiveBudget := diffBudget - markerBudget
if effectiveBudget < 0 {
effectiveBudget = 0
}
maxChars := effectiveBudget * 4
if maxChars < len(sections.Diff) {
removed := EstimateTokens(sections.Diff) - diffBudget
trimmed = append(trimmed, fmt.Sprintf("diff truncated (~%dK tokens removed)", removed/1000))
if maxChars > 0 {
if diffBudget >= markerBudget {
sections.Diff = truncateUTF8(sections.Diff, maxChars) + diffTruncMarker
} else {
sections.Diff = truncateUTF8(sections.Diff, maxChars)
}
} else {
sections.Diff = diffTooLargeMarker
}
}
}
finalTokens := baseTokens
for _, e := range entries {
finalTokens += EstimateTokens(*e.content)
}
finalTokens += EstimateTokens(sections.Diff)
return buildResult(sections, trimmed, finalTokens)
}
func buildResult(s Sections, trimmed []string, estTokens int) Result {
var sys strings.Builder
sys.WriteString(s.SystemBase)
if s.Patterns != "" {
sys.WriteString("\n\n## Language Patterns & Idioms\n\nUse the following patterns as review criteria. Code that violates these established patterns is a finding:\n\n")
sys.WriteString(s.Patterns)
}
if s.Conventions != "" {
sys.WriteString("\n\n## Repository Conventions\n\nThe repository has the following coding conventions that must be respected:\n\n")
sys.WriteString(s.Conventions)
}
var usr strings.Builder
usr.WriteString(s.UserMeta)
if s.FileContext != "" {
usr.WriteString("\n### Full File Context (modified files)\n\n")
usr.WriteString(s.FileContext)
usr.WriteString("\n")
}
if s.Diff != "" {
usr.WriteString("\n### Diff (changes to review)\n\n```diff\n")
usr.WriteString(s.Diff)
usr.WriteString("\n```\n")
}
if len(trimmed) > 0 {
usr.WriteString("\n⚠️ Note: Context was trimmed to fit model limits. Dropped: ")
usr.WriteString(strings.Join(trimmed, ", "))
usr.WriteString("\n")
}
return Result{
SystemPrompt: sys.String(),
UserPrompt: usr.String(),
Trimmed: trimmed,
EstTokens: estTokens,
}
}
// truncateUTF8 truncates s to at most maxBytes without splitting multi-byte
// UTF-8 characters. Returns a valid UTF-8 string of at most maxBytes bytes.
func truncateUTF8(s string, maxBytes int) string {
if len(s) <= maxBytes {
return s
}
for maxBytes > 0 && !utf8.RuneStart(s[maxBytes]) {
maxBytes--
}
return s[:maxBytes]
}
+203
View File
@@ -0,0 +1,203 @@
package budget
import (
"strings"
"testing"
)
func TestEstimateTokens(t *testing.T) {
tests := []struct {
input string
want int
}{
{"", 0},
{"abcd", 1},
{"12345678", 2},
{strings.Repeat("x", 400), 100},
}
for _, tt := range tests {
got := EstimateTokens(tt.input)
if got != tt.want {
t.Errorf("EstimateTokens(%d chars) = %d, want %d", len(tt.input), got, tt.want)
}
}
}
func TestLimitForModel(t *testing.T) {
tests := []struct {
model string
want int
}{
{"gpt-4.1", 128_000},
{"gpt-5", 200_000},
{"gpt-5-mini", 200_000},
{"unknown-model", defaultLimit},
{"gpt-4.1-2026-01-01", 128_000}, // prefix match
}
for _, tt := range tests {
got := LimitForModel(tt.model)
if got != tt.want {
t.Errorf("LimitForModel(%q) = %d, want %d", tt.model, got, tt.want)
}
}
}
func TestFit_AllFits(t *testing.T) {
s := Sections{
SystemBase: "system instructions",
Patterns: "some patterns",
Conventions: "some conventions",
FileContext: "file content",
Diff: "diff content",
UserMeta: "PR: title\n",
}
result := Fit("gpt-5", s)
if len(result.Trimmed) != 0 {
t.Errorf("expected no trimming, got %v", result.Trimmed)
}
if !strings.Contains(result.SystemPrompt, "some patterns") {
t.Error("expected patterns in system prompt")
}
if !strings.Contains(result.SystemPrompt, "some conventions") {
t.Error("expected conventions in system prompt")
}
if !strings.Contains(result.UserPrompt, "file content") {
t.Error("expected file context in user prompt")
}
}
func TestFit_TrimsPatterns(t *testing.T) {
// Create content that exceeds 128K token budget for gpt-4.1
// Budget ≈ 128K - 4K reserve = 124K tokens = ~496K chars
// Fill patterns with enough to push over
bigPatterns := strings.Repeat("x", 500_000) // ~125K tokens
s := Sections{
SystemBase: "base",
Patterns: bigPatterns,
Conventions: "conventions",
FileContext: "files",
Diff: "diff",
UserMeta: "meta",
}
result := Fit("gpt-4.1", s)
if len(result.Trimmed) == 0 {
t.Fatal("expected trimming")
}
if !strings.Contains(result.Trimmed[0], "patterns") {
t.Errorf("expected patterns to be trimmed first, got %v", result.Trimmed)
}
if strings.Contains(result.SystemPrompt, bigPatterns[:100]) {
t.Error("expected patterns to be removed from output")
}
// Conventions should survive
if !strings.Contains(result.SystemPrompt, "conventions") {
t.Error("expected conventions to survive after patterns trimmed")
}
}
func TestFit_TrimsConventions(t *testing.T) {
// Patterns + conventions + diff all exceed budget even after patterns removed
big := strings.Repeat("y", 520_000) // ~130K tokens each (exceeds 124K budget even alone)
s := Sections{
SystemBase: "base",
Patterns: big,
Conventions: big,
FileContext: "files",
Diff: "diff",
UserMeta: "meta",
}
result := Fit("gpt-4.1", s)
if len(result.Trimmed) < 2 {
t.Fatalf("expected at least 2 trimmed, got %v", result.Trimmed)
}
if !strings.Contains(result.Trimmed[0], "patterns") {
t.Errorf("expected patterns trimmed first, got %s", result.Trimmed[0])
}
if !strings.Contains(result.Trimmed[1], "conventions") {
t.Errorf("expected conventions trimmed second, got %s", result.Trimmed[1])
}
}
func TestFit_TruncatesDiff(t *testing.T) {
// Only diff is huge, no patterns/conventions
hugeDiff := strings.Repeat("z", 600_000) // ~150K tokens > 128K limit
s := Sections{
SystemBase: "base",
Diff: hugeDiff,
UserMeta: "meta",
}
result := Fit("gpt-4.1", s)
if len(result.Trimmed) == 0 {
t.Fatal("expected diff truncation")
}
if !strings.Contains(result.Trimmed[len(result.Trimmed)-1], "diff truncated") {
t.Errorf("expected diff truncation note, got %v", result.Trimmed)
}
if !strings.Contains(result.UserPrompt, "[diff truncated due to context limit]") {
t.Error("expected truncation marker in user prompt")
}
}
func TestFit_PreservesNoteInOutput(t *testing.T) {
big := strings.Repeat("w", 500_000)
s := Sections{
SystemBase: "base",
Patterns: big,
Diff: "small diff",
UserMeta: "meta",
}
result := Fit("gpt-4.1", s)
if !strings.Contains(result.UserPrompt, "⚠️ Note: Context was trimmed") {
t.Error("expected trimming note in user prompt")
}
}
func TestFit_HugeUserMeta(t *testing.T) {
// UserMeta so large that base alone exceeds limit
// Use a unique marker past the truncation point
hugeDesc := strings.Repeat("d", 5000) + "UNIQUE_MARKER_PAST_TRUNCATION" + strings.Repeat("d", 595_000)
s := Sections{
SystemBase: "base",
Diff: "small diff",
UserMeta: hugeDesc,
}
result := Fit("gpt-4.1", s)
limit := LimitForModel("gpt-4.1") - reserveTokens
if result.EstTokens > limit {
t.Errorf("EstTokens %d exceeds limit %d", result.EstTokens, limit)
}
// Content past truncation point should not be present
if strings.Contains(result.UserPrompt, "UNIQUE_MARKER_PAST_TRUNCATION") {
t.Error("expected UserMeta to be truncated but found content past truncation point")
}
// Truncation marker should be present
if !strings.Contains(result.UserPrompt, "[description truncated]") {
t.Error("expected truncation marker in output")
}
}
func TestFit_NeverExceedsLimit(t *testing.T) {
// All sections huge — verify final tokens never exceed limit
big := strings.Repeat("a", 200_000)
s := Sections{
SystemBase: strings.Repeat("s", 8000),
Patterns: big,
Conventions: big,
FileContext: big,
Diff: big,
UserMeta: strings.Repeat("m", 8000),
}
result := Fit("gpt-4.1", s)
limit := LimitForModel("gpt-4.1") - reserveTokens
if result.EstTokens > limit {
t.Errorf("EstTokens %d exceeds limit %d (trimmed: %v)", result.EstTokens, limit, result.Trimmed)
}
}
+28 -8
View File
@@ -10,6 +10,7 @@ import (
"strings" "strings"
"time" "time"
"gitea.weiker.me/rodin/review-bot/budget"
"gitea.weiker.me/rodin/review-bot/gitea" "gitea.weiker.me/rodin/review-bot/gitea"
"gitea.weiker.me/rodin/review-bot/llm" "gitea.weiker.me/rodin/review-bot/llm"
"gitea.weiker.me/rodin/review-bot/review" "gitea.weiker.me/rodin/review-bot/review"
@@ -18,6 +19,7 @@ import (
var version = "dev" var version = "dev"
func main() { func main() {
versionFlag := flag.Bool("version", false, "Print version and exit")
// CLI flags // CLI flags
giteaURL := flag.String("gitea-url", envOrDefault("GITEA_URL", ""), "Gitea instance URL") giteaURL := flag.String("gitea-url", envOrDefault("GITEA_URL", ""), "Gitea instance URL")
repo := flag.String("repo", envOrDefault("GITEA_REPO", ""), "Repository (owner/name)") repo := flag.String("repo", envOrDefault("GITEA_REPO", ""), "Repository (owner/name)")
@@ -36,6 +38,13 @@ func main() {
flag.Parse() flag.Parse()
if *versionFlag {
fmt.Printf("review-bot %s\n", version)
os.Exit(0)
}
log.Printf("review-bot %s", version)
// Validate required fields // Validate required fields
if *giteaURL == "" || *repo == "" || *prNum == "" || *reviewerToken == "" || if *giteaURL == "" || *repo == "" || *prNum == "" || *reviewerToken == "" ||
*llmBaseURL == "" || *llmAPIKey == "" || *llmModel == "" { *llmBaseURL == "" || *llmAPIKey == "" || *llmModel == "" {
@@ -133,15 +142,26 @@ func main() {
log.Printf("Loaded patterns from %s (%d bytes)", *patternsRepo, len(patterns)) log.Printf("Loaded patterns from %s (%d bytes)", *patternsRepo, len(patterns))
} }
// Step 7: Build prompts // Step 7: Budget-aware prompt assembly
systemPrompt := review.BuildSystemPrompt(conventions, patterns) sections := budget.Sections{
userPrompt := review.BuildUserPrompt(pr.Title, pr.Body, diff, fileContext, ciPassed, ciDetails) SystemBase: review.BuildSystemBase(),
Patterns: patterns,
Conventions: conventions,
FileContext: fileContext,
Diff: diff,
UserMeta: review.BuildUserMeta(pr.Title, pr.Body, ciPassed, ciDetails),
}
budgetResult := budget.Fit(*llmModel, sections)
log.Printf("Token estimate: ~%dK (limit: %dK)", budgetResult.EstTokens/1000, budget.LimitForModel(*llmModel)/1000)
if len(budgetResult.Trimmed) > 0 {
log.Printf("Context trimmed: %v", budgetResult.Trimmed)
}
// Step 8: Call LLM // Step 8: Call LLM
log.Printf("Sending to LLM (%s)...", *llmModel) log.Printf("Sending to LLM (%s)...", *llmModel)
messages := []llm.Message{ messages := []llm.Message{
{Role: "system", Content: systemPrompt}, {Role: "system", Content: budgetResult.SystemPrompt},
{Role: "user", Content: userPrompt}, {Role: "user", Content: budgetResult.UserPrompt},
} }
response, err := llmClient.Complete(ctx, messages) response, err := llmClient.Complete(ctx, messages)
@@ -235,12 +255,12 @@ func fetchPatterns(ctx context.Context, client *gitea.Client, patternsRepo, patt
continue continue
} }
for filepath, content := range files { for filePath, content := range files {
// Only include markdown and text files as patterns // Only include markdown and text files as patterns
if !isPatternFile(filepath) { if !isPatternFile(filePath) {
continue continue
} }
sb.WriteString(fmt.Sprintf("### %s/%s\n\n%s\n\n", repoRef, filepath, content)) sb.WriteString(fmt.Sprintf("### %s/%s\n\n%s\n\n", repoRef, filePath, content))
} }
} }
} }
+40 -18
View File
@@ -1,3 +1,6 @@
// Package gitea provides a client for the Gitea API.
// It supports pull request operations, file content retrieval,
// and review submission.
package gitea package gitea
import ( import (
@@ -8,6 +11,7 @@ import (
"io" "io"
"log" "log"
"net/http" "net/http"
"net/url"
"strings" "strings"
"time" "time"
) )
@@ -55,8 +59,8 @@ type ChangedFile struct {
// GetPullRequest fetches PR metadata. // GetPullRequest fetches PR metadata.
func (c *Client) GetPullRequest(ctx context.Context, owner, repo string, number int) (*PullRequest, error) { func (c *Client) GetPullRequest(ctx context.Context, owner, repo string, number int) (*PullRequest, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d", c.baseURL, owner, repo, number) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d", c.baseURL, owner, repo, number)
body, err := c.doGet(ctx, url) body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return nil, fmt.Errorf("fetch PR: %w", err) return nil, fmt.Errorf("fetch PR: %w", err)
} }
@@ -69,8 +73,8 @@ func (c *Client) GetPullRequest(ctx context.Context, owner, repo string, number
// GetPullRequestDiff fetches the unified diff for a PR. // GetPullRequestDiff fetches the unified diff for a PR.
func (c *Client) GetPullRequestDiff(ctx context.Context, owner, repo string, number int) (string, error) { func (c *Client) GetPullRequestDiff(ctx context.Context, owner, repo string, number int) (string, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d.diff", c.baseURL, owner, repo, number) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d.diff", c.baseURL, owner, repo, number)
body, err := c.doGet(ctx, url) body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return "", fmt.Errorf("fetch diff: %w", err) return "", fmt.Errorf("fetch diff: %w", err)
} }
@@ -79,8 +83,8 @@ func (c *Client) GetPullRequestDiff(ctx context.Context, owner, repo string, num
// GetPullRequestFiles fetches the list of files changed in a PR. // GetPullRequestFiles fetches the list of files changed in a PR.
func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, number int) ([]ChangedFile, error) { func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, number int) ([]ChangedFile, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d/files", c.baseURL, owner, repo, number) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d/files", c.baseURL, owner, repo, number)
body, err := c.doGet(ctx, url) body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return nil, fmt.Errorf("fetch PR files: %w", err) return nil, fmt.Errorf("fetch PR files: %w", err)
} }
@@ -93,8 +97,8 @@ func (c *Client) GetPullRequestFiles(ctx context.Context, owner, repo string, nu
// GetCommitStatuses fetches CI statuses for a commit SHA. // GetCommitStatuses fetches CI statuses for a commit SHA.
func (c *Client) GetCommitStatuses(ctx context.Context, owner, repo, sha string) ([]CommitStatus, error) { func (c *Client) GetCommitStatuses(ctx context.Context, owner, repo, sha string) ([]CommitStatus, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/commits/%s/statuses", c.baseURL, owner, repo, sha) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/commits/%s/statuses", c.baseURL, owner, repo, sha)
body, err := c.doGet(ctx, url) body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return nil, fmt.Errorf("fetch commit statuses: %w", err) return nil, fmt.Errorf("fetch commit statuses: %w", err)
} }
@@ -107,8 +111,8 @@ func (c *Client) GetCommitStatuses(ctx context.Context, owner, repo, sha string)
// GetFileContent fetches a file from the default branch of a repo. // GetFileContent fetches a file from the default branch of a repo.
func (c *Client) GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error) { func (c *Client) GetFileContent(ctx context.Context, owner, repo, filepath string) (string, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/raw/%s", c.baseURL, owner, repo, filepath) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/raw/%s", c.baseURL, owner, repo, escapePath(filepath))
body, err := c.doGet(ctx, url) body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return "", fmt.Errorf("fetch file %s: %w", filepath, err) return "", fmt.Errorf("fetch file %s: %w", filepath, err)
} }
@@ -117,8 +121,8 @@ func (c *Client) GetFileContent(ctx context.Context, owner, repo, filepath strin
// GetFileContentRef fetches a file from a specific ref (branch/tag/sha) in a repo. // GetFileContentRef fetches a file from a specific ref (branch/tag/sha) in a repo.
func (c *Client) GetFileContentRef(ctx context.Context, owner, repo, filepath, ref string) (string, error) { func (c *Client) GetFileContentRef(ctx context.Context, owner, repo, filepath, ref string) (string, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/raw/%s?ref=%s", c.baseURL, owner, repo, filepath, ref) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/raw/%s?ref=%s", c.baseURL, owner, repo, escapePath(filepath), url.QueryEscape(ref))
body, err := c.doGet(ctx, url) body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return "", fmt.Errorf("fetch file %s@%s: %w", filepath, ref, err) return "", fmt.Errorf("fetch file %s@%s: %w", filepath, ref, err)
} }
@@ -128,7 +132,7 @@ func (c *Client) GetFileContentRef(ctx context.Context, owner, repo, filepath, r
// PostReview submits a review to a PR. // PostReview submits a review to a PR.
// event should be "APPROVED" or "REQUEST_CHANGES". // event should be "APPROVED" or "REQUEST_CHANGES".
func (c *Client) PostReview(ctx context.Context, owner, repo string, number int, event, body string) error { func (c *Client) PostReview(ctx context.Context, owner, repo string, number int, event, body string) error {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d/reviews", c.baseURL, owner, repo, number) reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d/reviews", c.baseURL, owner, repo, number)
payload := struct { payload := struct {
Body string `json:"body"` Body string `json:"body"`
@@ -143,7 +147,7 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,
return fmt.Errorf("marshal review payload: %w", err) return fmt.Errorf("marshal review payload: %w", err)
} }
req, err := http.NewRequestWithContext(ctx, "POST", url, bytes.NewReader(data)) req, err := http.NewRequestWithContext(ctx, http.MethodPost, reqURL, bytes.NewReader(data))
if err != nil { if err != nil {
return fmt.Errorf("create review request: %w", err) return fmt.Errorf("create review request: %w", err)
} }
@@ -163,8 +167,8 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,
return nil return nil
} }
func (c *Client) doGet(ctx context.Context, url string) ([]byte, error) { func (c *Client) doGet(ctx context.Context, reqURL string) ([]byte, error) {
req, err := http.NewRequestWithContext(ctx, "GET", url, nil) req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)
if err != nil { if err != nil {
return nil, err return nil, err
} }
@@ -183,6 +187,18 @@ func (c *Client) doGet(ctx context.Context, url string) ([]byte, error) {
return io.ReadAll(resp.Body) return io.ReadAll(resp.Body)
} }
// escapePath escapes each segment of a relative file path for use in URLs.
// Slashes are preserved as path separators; other special characters are escaped.
// Input should be a relative path (no leading slash). Already-encoded segments
// will be double-encoded, which is the desired behavior for user-provided paths.
func escapePath(p string) string {
parts := strings.Split(p, "/")
for i, part := range parts {
parts[i] = url.PathEscape(part)
}
return strings.Join(parts, "/")
}
// ContentEntry represents a file or directory entry from the contents API. // ContentEntry represents a file or directory entry from the contents API.
type ContentEntry struct { type ContentEntry struct {
Name string `json:"name"` Name string `json:"name"`
@@ -191,9 +207,15 @@ type ContentEntry struct {
} }
// ListContents lists files and directories at a given path in a repo. // ListContents lists files and directories at a given path in a repo.
// Pass an empty path to list the repository root.
func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]ContentEntry, error) { func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]ContentEntry, error) {
url := fmt.Sprintf("%s/api/v1/repos/%s/%s/contents/%s", c.baseURL, owner, repo, path) var reqURL string
body, err := c.doGet(ctx, url) if path == "" {
reqURL = fmt.Sprintf("%s/api/v1/repos/%s/%s/contents", c.baseURL, owner, repo)
} else {
reqURL = fmt.Sprintf("%s/api/v1/repos/%s/%s/contents/%s", c.baseURL, owner, repo, escapePath(path))
}
body, err := c.doGet(ctx, reqURL)
if err != nil { if err != nil {
return nil, fmt.Errorf("list contents %s: %w", path, err) return nil, fmt.Errorf("list contents %s: %w", path, err)
} }
+24
View File
@@ -294,3 +294,27 @@ func TestGetAllFilesInPath_File(t *testing.T) {
t.Errorf("unexpected content: %q", files["README.md"]) t.Errorf("unexpected content: %q", files["README.md"])
} }
} }
func TestEscapePath(t *testing.T) {
tests := []struct {
name string
input string
want string
}{
{"simple", "src/main.go", "src/main.go"},
{"spaces", "my dir/my file.go", "my%20dir/my%20file.go"},
{"special chars", "path/file#1.txt", "path/file%231.txt"},
{"empty", "", ""},
{"single segment", "README.md", "README.md"},
{"nested deep", "a/b/c/d.md", "a/b/c/d.md"},
{"already encoded", "path/file%20name.go", "path/file%2520name.go"},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := escapePath(tt.input)
if got != tt.want {
t.Errorf("escapePath(%q) = %q, want %q", tt.input, got, tt.want)
}
})
}
}
+1
View File
@@ -1,3 +1,4 @@
// Package llm provides a client for OpenAI-compatible chat completion APIs.
package llm package llm
import ( import (
+28 -4
View File
@@ -1,3 +1,5 @@
// Package review builds prompts for AI code review and parses LLM responses
// into structured review results.
package review package review
import ( import (
@@ -5,8 +7,10 @@ import (
"strings" "strings"
) )
// BuildSystemPrompt constructs the system prompt for the LLM reviewer. // BuildSystemBase returns the core system prompt instructions without
func BuildSystemPrompt(conventions, patterns string) string { // patterns or conventions. Used by the budget package to separate
// trimmable from non-trimmable content.
func BuildSystemBase() string {
var sb strings.Builder var sb strings.Builder
sb.WriteString("You are an expert code reviewer. Review the provided pull request diff carefully.\n\n") sb.WriteString("You are an expert code reviewer. Review the provided pull request diff carefully.\n\n")
@@ -40,6 +44,15 @@ func BuildSystemPrompt(conventions, patterns string) string {
sb.WriteString("- Line numbers should reference the new file line numbers from the diff headers.\n") sb.WriteString("- Line numbers should reference the new file line numbers from the diff headers.\n")
sb.WriteString("- If the diff is empty or trivial (only formatting/whitespace), APPROVE with no findings.\n") sb.WriteString("- If the diff is empty or trivial (only formatting/whitespace), APPROVE with no findings.\n")
return sb.String()
}
// BuildSystemPrompt constructs the full system prompt with patterns and conventions.
// Deprecated: Use BuildSystemBase with budget.Fit for context-aware assembly.
func BuildSystemPrompt(conventions, patterns string) string {
var sb strings.Builder
sb.WriteString(BuildSystemBase())
if patterns != "" { if patterns != "" {
sb.WriteString(fmt.Sprintf("\n\n## Language Patterns & Idioms\n\nUse the following patterns as review criteria. Code that violates these established patterns is a finding:\n\n%s\n", patterns)) sb.WriteString(fmt.Sprintf("\n\n## Language Patterns & Idioms\n\nUse the following patterns as review criteria. Code that violates these established patterns is a finding:\n\n%s\n", patterns))
} }
@@ -51,8 +64,9 @@ func BuildSystemPrompt(conventions, patterns string) string {
return sb.String() return sb.String()
} }
// BuildUserPrompt constructs the user message with PR context. // BuildUserMeta returns the PR metadata header (title, description, CI status)
func BuildUserPrompt(title, description, diff, fileContext string, ciPassed bool, ciDetails string) string { // without the diff or file context. Used by the budget package.
func BuildUserMeta(title, description string, ciPassed bool, ciDetails string) string {
var sb strings.Builder var sb strings.Builder
sb.WriteString(fmt.Sprintf("## Pull Request: %s\n\n", title)) sb.WriteString(fmt.Sprintf("## Pull Request: %s\n\n", title))
@@ -71,6 +85,16 @@ func BuildUserPrompt(title, description, diff, fileContext string, ciPassed bool
sb.WriteString(fmt.Sprintf("CI Details: %s\n", ciDetails)) sb.WriteString(fmt.Sprintf("CI Details: %s\n", ciDetails))
} }
return sb.String()
}
// BuildUserPrompt constructs the user message with PR context.
// Deprecated: Use BuildUserMeta with budget.Fit for context-aware assembly.
func BuildUserPrompt(title, description, diff, fileContext string, ciPassed bool, ciDetails string) string {
var sb strings.Builder
sb.WriteString(BuildUserMeta(title, description, ciPassed, ciDetails))
if fileContext != "" { if fileContext != "" {
sb.WriteString("\n### Full File Context (modified files)\n\n") sb.WriteString("\n### Full File Context (modified files)\n\n")
sb.WriteString(fileContext) sb.WriteString(fileContext)
+40
View File
@@ -116,3 +116,43 @@ func TestBuildUserPrompt_WithoutFileContext(t *testing.T) {
t.Error("should not include file context section when empty") t.Error("should not include file context section when empty")
} }
} }
func TestBuildSystemBase(t *testing.T) {
result := BuildSystemBase()
if result == "" {
t.Fatal("BuildSystemBase returned empty string")
}
if !strings.Contains(result, "expert code reviewer") {
t.Error("expected reviewer role in system base")
}
if !strings.Contains(result, "REQUEST_CHANGES") {
t.Error("expected verdict format in system base")
}
if !strings.Contains(result, "JSON") {
t.Error("expected JSON output instruction in system base")
}
}
func TestBuildUserMeta(t *testing.T) {
result := BuildUserMeta("Fix bug", "Some description", true, "all checks passed")
if !strings.Contains(result, "Fix bug") {
t.Error("expected title in user meta")
}
if !strings.Contains(result, "Some description") {
t.Error("expected description in user meta")
}
if !strings.Contains(result, "PASSED") {
t.Error("expected CI PASSED status")
}
}
func TestBuildUserMeta_CIFailed(t *testing.T) {
result := BuildUserMeta("Title", "", false, "test job failed")
if !strings.Contains(result, "FAILED") {
t.Error("expected CI FAILED status")
}
if strings.Contains(result, "Description") {
t.Error("expected no description section when empty")
}
}