feat: add context budget system for LLM overflow (#19) #20

Merged
rodin merged 6 commits from fix/19-context-overflow into main 2026-05-02 03:07:17 +00:00
Owner

Adds a budget package that estimates token usage and progressively trims context to fit within model-specific limits.

Trim Order (least important first)

  1. Language patterns
  2. Repository conventions
  3. Full file context
  4. Diff (truncated as last resort)

When content is trimmed, a note is appended to the user prompt so the LLM knows context was reduced.

Changes

  • New budget package: Fit(), EstimateTokens(), LimitForModel()
  • Model limit table (GPT-4.1: 128K, GPT-5: 200K, Claude: 200K)
  • Refactored review/prompt.go: BuildSystemBase() and BuildUserMeta() extract non-trimmable content
  • main.go uses budget.Fit() instead of direct prompt assembly
  • 7 unit tests covering all trim paths

Closes #19

Adds a `budget` package that estimates token usage and progressively trims context to fit within model-specific limits. ## Trim Order (least important first) 1. Language patterns 2. Repository conventions 3. Full file context 4. Diff (truncated as last resort) When content is trimmed, a note is appended to the user prompt so the LLM knows context was reduced. ## Changes - New `budget` package: `Fit()`, `EstimateTokens()`, `LimitForModel()` - Model limit table (GPT-4.1: 128K, GPT-5: 200K, Claude: 200K) - Refactored `review/prompt.go`: `BuildSystemBase()` and `BuildUserMeta()` extract non-trimmable content - `main.go` uses `budget.Fit()` instead of direct prompt assembly - 7 unit tests covering all trim paths Closes #19
gpt-review-bot approved these changes 2026-05-02 01:49:46 +00:00
Dismissed
gpt-review-bot left a comment
First-time contributor

Summary

The new budget package and prompt refactor integrate cleanly and the trimming logic is reasonable and well-tested. I found only minor issues and suggestions for robustness and clarity; CI passed so no blocking problems were detected.

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 34 LimitForModel uses map iteration to perform prefix matching. Iteration order of a Go map is non-deterministic, which could yield different prefix matches for overlapping keys. This is unlikely to be a problem with the current keys but is brittle if future keys overlap.
2 [MINOR] budget/budget.go 29 EstimateTokens uses a simple len(s)/4 heuristic. This is acceptable as a rough estimate, but it can be inaccurate for non-English text or for tokenizers that differ from the assumption. Consider exposing a pluggable tokenizer or documenting the heuristic limits more explicitly.
3 [MINOR] budget/budget.go 70 When the available budget is negative, the code falls back to available = limit / 2. This heuristic is arbitrary and may be surprising; a clearer policy (e.g. clamp to 0 or use a small constant minimum) or a comment explaining the choice would improve maintainability.
4 [NIT] budget/budget.go 119 The diff truncation enforces a hard minimum diffBudget of 1000 tokens. Consider making this constant configurable or adding a comment explaining why 1000 was chosen to help future maintainers.
5 [NIT] budget/budget.go 163 The trimming note includes an emoji (⚠️). This is fine for most outputs, but if the user of the library expects purely ASCII output it could be unexpected. Consider documenting that the note contains emoji or allowing a plain-text mode.

Recommendation

APPROVE — Overall this change is well implemented: the new budget package cleanly separates trimmable vs non-trimmable content, integrates into main, and is covered by unit tests that exercise trim and truncation paths. I recommend merging with minor follow-ups: (1) make LimitForModel prefix matching deterministic (iterate over a sorted list of prefixes or compute longest-prefix match), (2) document or provide a way to plug in a more accurate tokenizer than the 1 token ≈ 4 chars heuristic, (3) clarify the fallback when available < 0 (comment or adjust the heuristic), and (4) consider making the diffBudget minimum configurable or documented. These are non-blocking improvements — given CI passed and tests cover the behavior, this can be approved.

## Summary The new budget package and prompt refactor integrate cleanly and the trimming logic is reasonable and well-tested. I found only minor issues and suggestions for robustness and clarity; CI passed so no blocking problems were detected. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 34 | LimitForModel uses map iteration to perform prefix matching. Iteration order of a Go map is non-deterministic, which could yield different prefix matches for overlapping keys. This is unlikely to be a problem with the current keys but is brittle if future keys overlap. | | 2 | [MINOR] | `budget/budget.go` | 29 | EstimateTokens uses a simple len(s)/4 heuristic. This is acceptable as a rough estimate, but it can be inaccurate for non-English text or for tokenizers that differ from the assumption. Consider exposing a pluggable tokenizer or documenting the heuristic limits more explicitly. | | 3 | [MINOR] | `budget/budget.go` | 70 | When the available budget is negative, the code falls back to available = limit / 2. This heuristic is arbitrary and may be surprising; a clearer policy (e.g. clamp to 0 or use a small constant minimum) or a comment explaining the choice would improve maintainability. | | 4 | [NIT] | `budget/budget.go` | 119 | The diff truncation enforces a hard minimum diffBudget of 1000 tokens. Consider making this constant configurable or adding a comment explaining why 1000 was chosen to help future maintainers. | | 5 | [NIT] | `budget/budget.go` | 163 | The trimming note includes an emoji (⚠️). This is fine for most outputs, but if the user of the library expects purely ASCII output it could be unexpected. Consider documenting that the note contains emoji or allowing a plain-text mode. | ## Recommendation **APPROVE** — Overall this change is well implemented: the new budget package cleanly separates trimmable vs non-trimmable content, integrates into main, and is covered by unit tests that exercise trim and truncation paths. I recommend merging with minor follow-ups: (1) make LimitForModel prefix matching deterministic (iterate over a sorted list of prefixes or compute longest-prefix match), (2) document or provide a way to plug in a more accurate tokenizer than the 1 token ≈ 4 chars heuristic, (3) clarify the fallback when available < 0 (comment or adjust the heuristic), and (4) consider making the diffBudget minimum configurable or documented. These are non-blocking improvements — given CI passed and tests cover the behavior, this can be approved.
sonnet-review-bot requested changes 2026-05-02 01:53:37 +00:00
Dismissed
sonnet-review-bot left a comment
First-time contributor

Summary

The new budget system is well-structured and the refactor in main.go looks clean, with solid test coverage for most paths. However, there is a critical edge case where diff truncation can still exceed the model budget due to not accounting for the appended truncation marker, which defeats the primary purpose of this feature.

Findings

# Severity File Line Finding
1 [MAJOR] budget/budget.go 144 When truncating the diff, the code appends a truncation marker after slicing to the exact diff budget without accounting for the marker's tokens. This can make EstTokens exceed the model limit in near-budget cases.
2 [MINOR] budget/budget.go 80 Comment says "keep first 1000 chars" when truncating UserMeta, but the code keeps 4000 characters. Update the comment or the logic for consistency.
3 [MINOR] budget/budget.go 82 UserMeta is truncated by byte index, which may split multi-byte UTF-8 characters and produce invalid UTF-8. Prefer rune-safe truncation using utf8.DecodeRune or by slicing runes.
4 [MINOR] budget/budget.go 144 Diff is truncated by byte index, which may split multi-byte UTF-8 characters. Use rune-safe truncation to avoid producing invalid UTF-8 strings.
5 [MINOR] review/prompt.go 13 New exported function BuildSystemBase lacks unit tests. Repository conventions require tests for every exported function.
6 [MINOR] review/prompt.go 69 New exported function BuildUserMeta lacks unit tests. Repository conventions require tests for every exported function.

Recommendation

REQUEST_CHANGES — Address the budget overshoot by accounting for the truncation marker when slicing the diff: define the marker string as a constant, subtract its estimated token cost (or exact length) from diffBudget before computing maxChars, and ensure finalTokens never exceed the limit (e.g., recompute and adjust if necessary). Add a unit test that exercises the diff-truncation path and asserts EstTokens <= limit. Update the misleading comment about "keep first 1000 chars" to match the 4000-character logic (or change the logic to 1000 if that was intended). For robustness, switch to rune-safe truncation for both UserMeta and Diff to avoid cutting multi-byte UTF-8 characters. Finally, add tests for the new exported functions BuildSystemBase and BuildUserMeta to comply with repository conventions.

## Summary The new budget system is well-structured and the refactor in main.go looks clean, with solid test coverage for most paths. However, there is a critical edge case where diff truncation can still exceed the model budget due to not accounting for the appended truncation marker, which defeats the primary purpose of this feature. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MAJOR] | `budget/budget.go` | 144 | When truncating the diff, the code appends a truncation marker after slicing to the exact diff budget without accounting for the marker's tokens. This can make EstTokens exceed the model limit in near-budget cases. | | 2 | [MINOR] | `budget/budget.go` | 80 | Comment says "keep first 1000 chars" when truncating UserMeta, but the code keeps 4000 characters. Update the comment or the logic for consistency. | | 3 | [MINOR] | `budget/budget.go` | 82 | UserMeta is truncated by byte index, which may split multi-byte UTF-8 characters and produce invalid UTF-8. Prefer rune-safe truncation using utf8.DecodeRune or by slicing runes. | | 4 | [MINOR] | `budget/budget.go` | 144 | Diff is truncated by byte index, which may split multi-byte UTF-8 characters. Use rune-safe truncation to avoid producing invalid UTF-8 strings. | | 5 | [MINOR] | `review/prompt.go` | 13 | New exported function BuildSystemBase lacks unit tests. Repository conventions require tests for every exported function. | | 6 | [MINOR] | `review/prompt.go` | 69 | New exported function BuildUserMeta lacks unit tests. Repository conventions require tests for every exported function. | ## Recommendation **REQUEST_CHANGES** — Address the budget overshoot by accounting for the truncation marker when slicing the diff: define the marker string as a constant, subtract its estimated token cost (or exact length) from diffBudget before computing maxChars, and ensure finalTokens never exceed the limit (e.g., recompute and adjust if necessary). Add a unit test that exercises the diff-truncation path and asserts EstTokens <= limit. Update the misleading comment about "keep first 1000 chars" to match the 4000-character logic (or change the logic to 1000 if that was intended). For robustness, switch to rune-safe truncation for both UserMeta and Diff to avoid cutting multi-byte UTF-8 characters. Finally, add tests for the new exported functions BuildSystemBase and BuildUserMeta to comply with repository conventions.
sonnet-review-bot approved these changes 2026-05-02 02:01:03 +00:00
Dismissed
sonnet-review-bot left a comment
First-time contributor

Summary

Solid addition of a budget-aware context fitting system with good test coverage. The trimming logic and refactor to separate non-trimmable content are well-executed and CI passes.

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 136 The 'removed' token count for diff truncation uses diffBudget instead of effectiveBudget (which reserves space for the truncation marker). This can produce negative or misleading values in the 'diff truncated' note.
2 [MINOR] budget/budget.go 146 EstTokens does not account for additional constant overhead added in buildResult (section headers, code fences, and the trimming note). This slightly underestimates the final token count.
3 [MINOR] budget/budget.go 75 If baseTokens (SystemBase + truncated UserMeta) still exceed the model limit after the fixed 4000-byte truncation, available is set to 0 and no further reduction is attempted. This can lead to final tokens exceeding the limit in extreme cases.
4 [NIT] budget/budget.go 155 buildResult uses a single-letter parameter name 's' for Sections. Repository conventions recommend meaningful names outside loop indices; consider 'sec' or 'sections' for clarity.

Recommendation

APPROVE — Overall, the feature is well-designed and tested, and CI has passed. Recommend merging with minor tweaks: (1) compute the 'removed' tokens message using effectiveBudget to avoid negative or misleading counts; (2) include a small fixed overhead in EstTokens for the static wrappers (headers, fences, and trimming note) or subtract it from reserveTokens; (3) in Fit, if baseTokens still exceed the limit after initial UserMeta truncation, iteratively reduce UserMeta further to ensure final tokens never exceed the limit even in pathological cases; (4) rename the 's' parameter in buildResult for readability. These are refinements and do not block merging.

## Summary Solid addition of a budget-aware context fitting system with good test coverage. The trimming logic and refactor to separate non-trimmable content are well-executed and CI passes. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 136 | The 'removed' token count for diff truncation uses diffBudget instead of effectiveBudget (which reserves space for the truncation marker). This can produce negative or misleading values in the 'diff truncated' note. | | 2 | [MINOR] | `budget/budget.go` | 146 | EstTokens does not account for additional constant overhead added in buildResult (section headers, code fences, and the trimming note). This slightly underestimates the final token count. | | 3 | [MINOR] | `budget/budget.go` | 75 | If baseTokens (SystemBase + truncated UserMeta) still exceed the model limit after the fixed 4000-byte truncation, available is set to 0 and no further reduction is attempted. This can lead to final tokens exceeding the limit in extreme cases. | | 4 | [NIT] | `budget/budget.go` | 155 | buildResult uses a single-letter parameter name 's' for Sections. Repository conventions recommend meaningful names outside loop indices; consider 'sec' or 'sections' for clarity. | ## Recommendation **APPROVE** — Overall, the feature is well-designed and tested, and CI has passed. Recommend merging with minor tweaks: (1) compute the 'removed' tokens message using effectiveBudget to avoid negative or misleading counts; (2) include a small fixed overhead in EstTokens for the static wrappers (headers, fences, and trimming note) or subtract it from reserveTokens; (3) in Fit, if baseTokens still exceed the limit after initial UserMeta truncation, iteratively reduce UserMeta further to ensure final tokens never exceed the limit even in pathological cases; (4) rename the 's' parameter in buildResult for readability. These are refinements and do not block merging.
sonnet-review-bot approved these changes 2026-05-02 02:02:09 +00:00
Dismissed
sonnet-review-bot left a comment
First-time contributor

Summary

Well-structured addition of a budget package with sensible trimming order, clear separation of trimmable/essential prompt sections, and thorough tests. CI passed and the integration into main is clean and backwards-compatible via deprecated helpers.

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 19 modelLimits relies on manual longest-prefix-first ordering. Consider enforcing this by sorting by prefix length or adding a test to prevent regressions when adding new models.
2 [MINOR] budget/budget.go 214 truncateUTF8 uses byte-boundary checks; while logic appears correct, there are no tests with multi-byte UTF-8 content. Add tests covering multi-byte characters (e.g., emojis, CJK) to ensure correctness under non-ASCII input.
3 [NIT] budget/budget.go 131 The message 'diff truncated (~%dK tokens removed)' uses diffBudget rather than effectiveBudget (which accounts for the truncation marker). This may slightly overstate the reported removed tokens.
4 [NIT] budget/budget.go 201 A diff fenced block is always emitted even if the diff ends up empty. Consider skipping the code fence when Diff is empty to avoid an empty block.

Recommendation

APPROVE — The implementation is solid: token estimation, deterministic model limit matching, and progressive trimming align with the stated trim order. The system/user prompt split is clean and main.go integration respects timeouts and logging. CI is green and tests cover key paths, including over-budget scenarios and ensuring final estimates remain within limits. To further improve robustness, consider: (1) enforcing or testing model limit ordering so new entries don’t accidentally match broader prefixes; (2) adding UTF-8 multi-byte truncation tests; and (3) polishing minor UX details in truncation reporting and empty diff handling. None of these are blockers — approve as is.

## Summary Well-structured addition of a budget package with sensible trimming order, clear separation of trimmable/essential prompt sections, and thorough tests. CI passed and the integration into main is clean and backwards-compatible via deprecated helpers. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 19 | modelLimits relies on manual longest-prefix-first ordering. Consider enforcing this by sorting by prefix length or adding a test to prevent regressions when adding new models. | | 2 | [MINOR] | `budget/budget.go` | 214 | truncateUTF8 uses byte-boundary checks; while logic appears correct, there are no tests with multi-byte UTF-8 content. Add tests covering multi-byte characters (e.g., emojis, CJK) to ensure correctness under non-ASCII input. | | 3 | [NIT] | `budget/budget.go` | 131 | The message 'diff truncated (~%dK tokens removed)' uses diffBudget rather than effectiveBudget (which accounts for the truncation marker). This may slightly overstate the reported removed tokens. | | 4 | [NIT] | `budget/budget.go` | 201 | A diff fenced block is always emitted even if the diff ends up empty. Consider skipping the code fence when Diff is empty to avoid an empty block. | ## Recommendation **APPROVE** — The implementation is solid: token estimation, deterministic model limit matching, and progressive trimming align with the stated trim order. The system/user prompt split is clean and main.go integration respects timeouts and logging. CI is green and tests cover key paths, including over-budget scenarios and ensuring final estimates remain within limits. To further improve robustness, consider: (1) enforcing or testing model limit ordering so new entries don’t accidentally match broader prefixes; (2) adding UTF-8 multi-byte truncation tests; and (3) polishing minor UX details in truncation reporting and empty diff handling. None of these are blockers — approve as is.
gpt-review-bot approved these changes 2026-05-02 02:02:25 +00:00
Dismissed
gpt-review-bot left a comment
First-time contributor

Summary

The new budget package and prompt refactors are well-structured, tests cover trimming paths, and CI passed. A few minor issues and suggestions around heuristics, magic constants, and a subtle UTF-8 boundary check are noted below.

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 87 UserMeta truncation uses a magic byte limit (4000) via len(sections.UserMeta) > 4000 and truncateUTF8(..., 4000). This mixes bytes and the token heuristic (tokens ≈ len/4) and the constant is unexplained; consider documenting the choice or using a token-based truncation for consistency.
2 [MINOR] budget/budget.go 43 EstimateTokens uses a coarse heuristic (len(s)/4). This is acceptable as a conservative estimate, but it may under/over-estimate for some languages. Consider adding a comment that this is intentionally approximate and may be replaced with a real tokenizer if available.
3 [NIT] budget/budget.go 215 truncateUTF8 indexes s[maxBytes] in the loop (isUTF8Start(s[maxBytes])). The current logic is safe because the function returns early when len(s) <= maxBytes, but it's a subtle boundary condition. Consider adding an explanatory comment or rewriting the loop to make the invariants explicit (e.g., using r := []byte(s) or checking maxBytes-1).
4 [MINOR] budget/budget.go 21 modelLimits relies on ordering for longest-prefix-first matching. While the comment notes this, adding a test or sorting/validation at init time would make this more robust if the list is later modified.
5 [NIT] cmd/review-bot/main.go 142 The main flow now logs token estimates using integer division (/1000). This is fine, but the log might be clearer with explicit units (e.g., '≈ %dK tokens') or using floats to avoid truncation in the log message.

Recommendation

APPROVE — Approve: The implementation is correct, tests are comprehensive, and CI passed. The suggestions above are low-risk improvements: document the 4000-byte truncation choice or perform truncation in token units; clarify the EstimateTokens heuristic and add a comment that it's intentionally approximate; add a clarifying comment in truncateUTF8 to make the boundary logic explicit; and optionally add validation for the modelLimits ordering. None of these prevent merging, but addressing them will improve clarity and robustness.

## Summary The new budget package and prompt refactors are well-structured, tests cover trimming paths, and CI passed. A few minor issues and suggestions around heuristics, magic constants, and a subtle UTF-8 boundary check are noted below. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 87 | UserMeta truncation uses a magic byte limit (4000) via len(sections.UserMeta) > 4000 and truncateUTF8(..., 4000). This mixes bytes and the token heuristic (tokens ≈ len/4) and the constant is unexplained; consider documenting the choice or using a token-based truncation for consistency. | | 2 | [MINOR] | `budget/budget.go` | 43 | EstimateTokens uses a coarse heuristic (len(s)/4). This is acceptable as a conservative estimate, but it may under/over-estimate for some languages. Consider adding a comment that this is intentionally approximate and may be replaced with a real tokenizer if available. | | 3 | [NIT] | `budget/budget.go` | 215 | truncateUTF8 indexes s[maxBytes] in the loop (isUTF8Start(s[maxBytes])). The current logic is safe because the function returns early when len(s) <= maxBytes, but it's a subtle boundary condition. Consider adding an explanatory comment or rewriting the loop to make the invariants explicit (e.g., using r := []byte(s) or checking maxBytes-1). | | 4 | [MINOR] | `budget/budget.go` | 21 | modelLimits relies on ordering for longest-prefix-first matching. While the comment notes this, adding a test or sorting/validation at init time would make this more robust if the list is later modified. | | 5 | [NIT] | `cmd/review-bot/main.go` | 142 | The main flow now logs token estimates using integer division (/1000). This is fine, but the log might be clearer with explicit units (e.g., '≈ %dK tokens') or using floats to avoid truncation in the log message. | ## Recommendation **APPROVE** — Approve: The implementation is correct, tests are comprehensive, and CI passed. The suggestions above are low-risk improvements: document the 4000-byte truncation choice or perform truncation in token units; clarify the EstimateTokens heuristic and add a comment that it's intentionally approximate; add a clarifying comment in truncateUTF8 to make the boundary logic explicit; and optionally add validation for the modelLimits ordering. None of these prevent merging, but addressing them will improve clarity and robustness.
gpt-review-bot approved these changes 2026-05-02 02:05:07 +00:00
Dismissed
gpt-review-bot left a comment
First-time contributor

Summary

The PR cleanly adds a budget package to trim LLM prompt context and integrates it into prompt assembly. Tests and CI pass; the implementation is sensible and well-tested. I have a few minor suggestions to improve robustness and clarity (UTF-8 truncation, token-estimate documentation, and an edge-case when untrimmable base exceeds the budget).

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 200 truncateUTF8 implements UTF-8-safe truncation by scanning bytes but uses a manual loop over byte indices. This approach is fragile/less readable than using the standard unicode/utf8 helpers (e.g. utf8.DecodeLastRuneInString or utf8.RuneStart). Consider switching to the unicode/utf8 package to make intent and correctness obvious and avoid subtle off-by-one issues.
2 [MINOR] budget/budget.go 35 EstimateTokens uses a fixed heuristic (len(s) / 4). This is acceptable as a simple approximation, but it's worth documenting its limitations (languages with multibyte scripts, code with lots of symbols, or tokenizers used by specific LLMs). Consider making the heuristic configurable or adding a comment that this is intentionally conservative and approximate.
3 [MINOR] budget/budget.go 72 Fit treats SystemBase as permanently non-trimmable. If SystemBase alone exceeds the model budget (after reserve), the function only truncates UserMeta and otherwise leaves available=0. There is no explicit error or warning when the non-trimmable base is larger than the budget; this can produce a Result whose EstTokens still exceeds the limit. Consider returning an explicit indicator/error or logging that the system prompt is too large and cannot be reduced.
4 [MINOR] budget/budget.go 111 Trimmed summary messages format token counts with integer thousands ("~%dK tokens"). This is fine but imprecise. If you plan to expose these numbers to users or logs, consider reporting absolute token counts or formatting more precisely (e.g. "%d tokens"), or document that these are rough estimates.
5 [NIT] review/prompt.go 7 BuildSystemPrompt and BuildUserPrompt are retained as deprecated wrappers and tests updated to use the new BuildSystemBase/BuildUserMeta. This is reasonable for backward compatibility — consider marking these functions with a go doc 'Deprecated:' directive (you already added a comment) and/or planning their removal in a follow-up to keep the public API tidy.
6 [NIT] budget/budget.go 161 The UTF-8 handling helper isUTF8Start uses magic constants (0xC0/0x80). While correct, consider calling out intent in a short comment or using utf8.RuneStart for readability.
7 [NIT] .gitea/workflows/ci.yml 1 The CI workflow was updated to set LLM_TIMEOUT in the matrix run environment. Looks fine; consider documenting the timeout relation to the tool's overall LLM timeout flag to avoid confusion.

Recommendation

APPROVE — Approve. The new budget package is a solid addition: it cleanly separates trimmable content from core instructions, integrates with the existing prompt builders, and is covered by comprehensive unit tests. Before shipping, I recommend small improvements: (1) replace the custom UTF-8 truncation with the standard unicode/utf8 utilities (safer and clearer), (2) document the token-estimation heuristic and consider making it configurable, and (3) add an explicit handling/notice when the non-trimmable SystemBase exceeds the model budget (so callers can be alerted rather than silently producing an oversized prompt). These are low-risk enhancements — they do not block merging but will make the feature more robust and maintainable.

## Summary The PR cleanly adds a budget package to trim LLM prompt context and integrates it into prompt assembly. Tests and CI pass; the implementation is sensible and well-tested. I have a few minor suggestions to improve robustness and clarity (UTF-8 truncation, token-estimate documentation, and an edge-case when untrimmable base exceeds the budget). ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 200 | truncateUTF8 implements UTF-8-safe truncation by scanning bytes but uses a manual loop over byte indices. This approach is fragile/less readable than using the standard unicode/utf8 helpers (e.g. utf8.DecodeLastRuneInString or utf8.RuneStart). Consider switching to the unicode/utf8 package to make intent and correctness obvious and avoid subtle off-by-one issues. | | 2 | [MINOR] | `budget/budget.go` | 35 | EstimateTokens uses a fixed heuristic (len(s) / 4). This is acceptable as a simple approximation, but it's worth documenting its limitations (languages with multibyte scripts, code with lots of symbols, or tokenizers used by specific LLMs). Consider making the heuristic configurable or adding a comment that this is intentionally conservative and approximate. | | 3 | [MINOR] | `budget/budget.go` | 72 | Fit treats SystemBase as permanently non-trimmable. If SystemBase alone exceeds the model budget (after reserve), the function only truncates UserMeta and otherwise leaves available=0. There is no explicit error or warning when the non-trimmable base is larger than the budget; this can produce a Result whose EstTokens still exceeds the limit. Consider returning an explicit indicator/error or logging that the system prompt is too large and cannot be reduced. | | 4 | [MINOR] | `budget/budget.go` | 111 | Trimmed summary messages format token counts with integer thousands ("~%dK tokens"). This is fine but imprecise. If you plan to expose these numbers to users or logs, consider reporting absolute token counts or formatting more precisely (e.g. "%d tokens"), or document that these are rough estimates. | | 5 | [NIT] | `review/prompt.go` | 7 | BuildSystemPrompt and BuildUserPrompt are retained as deprecated wrappers and tests updated to use the new BuildSystemBase/BuildUserMeta. This is reasonable for backward compatibility — consider marking these functions with a go doc 'Deprecated:' directive (you already added a comment) and/or planning their removal in a follow-up to keep the public API tidy. | | 6 | [NIT] | `budget/budget.go` | 161 | The UTF-8 handling helper isUTF8Start uses magic constants (0xC0/0x80). While correct, consider calling out intent in a short comment or using utf8.RuneStart for readability. | | 7 | [NIT] | `.gitea/workflows/ci.yml` | 1 | The CI workflow was updated to set LLM_TIMEOUT in the matrix run environment. Looks fine; consider documenting the timeout relation to the tool's overall LLM timeout flag to avoid confusion. | ## Recommendation **APPROVE** — Approve. The new budget package is a solid addition: it cleanly separates trimmable content from core instructions, integrates with the existing prompt builders, and is covered by comprehensive unit tests. Before shipping, I recommend small improvements: (1) replace the custom UTF-8 truncation with the standard unicode/utf8 utilities (safer and clearer), (2) document the token-estimation heuristic and consider making it configurable, and (3) add an explicit handling/notice when the non-trimmable SystemBase exceeds the model budget (so callers can be alerted rather than silently producing an oversized prompt). These are low-risk enhancements — they do not block merging but will make the feature more robust and maintainable.
rodin added 1 commit 2026-05-02 02:06:21 +00:00
fix: CI config - correct patterns path, increase timeout
CI / test (pull_request) Successful in 13s
CI / review (gpt-4.1, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 21s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 2m18s
565a077b01
- PATTERNS_FILES: docs/ does not exist in go-patterns, use patterns/
- LLM_TIMEOUT: 600s (gpt-5-mini needs more time for larger diffs)
rodin force-pushed fix/19-context-overflow from e7bbd55dae to 565a077b01 2026-05-02 02:06:21 +00:00 Compare
gpt-review-bot approved these changes 2026-05-02 02:06:57 +00:00
Dismissed
gpt-review-bot left a comment
First-time contributor

Summary

The pull request adds a robust context budget system for prompt assembly, ensuring LLM input safety. Implementation follows idiomatic Go patterns, includes thorough unit tests, and integrates seamlessly into the main pipeline.

Recommendation

APPROVE — This PR provides a well-designed budgeting mechanism that prevents LLM prompt overruns and progressively degrades non-essential context. The approach is idiomatic, tested, and easily extensible to other LLMs. No major or minor issues were found. All CI checks have passed. Merge as-is.

## Summary The pull request adds a robust context budget system for prompt assembly, ensuring LLM input safety. Implementation follows idiomatic Go patterns, includes thorough unit tests, and integrates seamlessly into the main pipeline. ## Recommendation **APPROVE** — This PR provides a well-designed budgeting mechanism that prevents LLM prompt overruns and progressively degrades non-essential context. The approach is idiomatic, tested, and easily extensible to other LLMs. No major or minor issues were found. All CI checks have passed. Merge as-is.
sonnet-review-bot approved these changes 2026-05-02 02:08:55 +00:00
Dismissed
sonnet-review-bot left a comment
First-time contributor

Summary

Solid addition of a context budgeting system with sensible trim order, clean integration, and good test coverage. CI is green and the changes are idiomatic and cohesive.

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 35 EstimateTokens comment says '~4 characters per token' but the implementation divides byte length by 4. Consider clarifying the comment to reflect bytes-per-token or using utf8.RuneCountInString if you truly intend characters.
2 [NIT] budget/budget.go 193 truncateUTF8 manually checks UTF-8 boundaries using a custom isUTF8Start helper. Consider using the stdlib utf8.RuneStart (or related helpers) for clarity and to align with standard practices.
3 [NIT] budget/budget.go 171 buildResult always emits a fenced diff block, even when the Diff is empty. Consider omitting the section entirely if Diff is empty to reduce prompt noise.

Recommendation

APPROVE — The budgeting logic is well thought out, the trim order aligns with stated priorities, and tests cover the primary paths (including trimming and diff truncation). The CI workflow adjustments are appropriate. Minor polish recommendations: clarify the EstimateTokens comment to reflect byte-based estimation, consider using utf8.RuneStart in truncateUTF8 for clarity, and optionally skip the diff block when empty. Otherwise, this is good to merge.

## Summary Solid addition of a context budgeting system with sensible trim order, clean integration, and good test coverage. CI is green and the changes are idiomatic and cohesive. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 35 | EstimateTokens comment says '~4 characters per token' but the implementation divides byte length by 4. Consider clarifying the comment to reflect bytes-per-token or using utf8.RuneCountInString if you truly intend characters. | | 2 | [NIT] | `budget/budget.go` | 193 | truncateUTF8 manually checks UTF-8 boundaries using a custom isUTF8Start helper. Consider using the stdlib utf8.RuneStart (or related helpers) for clarity and to align with standard practices. | | 3 | [NIT] | `budget/budget.go` | 171 | buildResult always emits a fenced diff block, even when the Diff is empty. Consider omitting the section entirely if Diff is empty to reduce prompt noise. | ## Recommendation **APPROVE** — The budgeting logic is well thought out, the trim order aligns with stated priorities, and tests cover the primary paths (including trimming and diff truncation). The CI workflow adjustments are appropriate. Minor polish recommendations: clarify the EstimateTokens comment to reflect byte-based estimation, consider using utf8.RuneStart in truncateUTF8 for clarity, and optionally skip the diff block when empty. Otherwise, this is good to merge.
rodin added 1 commit 2026-05-02 02:36:44 +00:00
fix: address final review findings
CI / test (pull_request) Successful in 13s
CI / review (gpt-4.1, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 21s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m30s
8b8462bdc8
- Comment: "~4 characters" → "~4 bytes" (len() counts bytes, not runes)
- Use utf8.RuneStart from stdlib instead of custom isUTF8Start helper
- Skip diff block entirely when Diff is empty (handles edge cases:
  draft→ready with no delta, force-push matching base, etc.)
gpt-review-bot approved these changes 2026-05-02 02:37:18 +00:00
Dismissed
gpt-review-bot left a comment
First-time contributor

Summary

The PR introduces a well-designed context budget system for LLM prompt assembly, with correct trimming logic and thorough test coverage. All changes adhere to repository conventions and Go idioms.

Recommendation

APPROVE — The context budget system (budget package) is robust, testable, and modular. The main prompt assembly flow is now token-aware, prioritizes important information, and never risks LLM context window overflows. All critical paths and edge trimming scenarios are tested. The refactor respects existing boundaries (logic separated from presentation), and CI passes. Merging is recommended.

## Summary The PR introduces a well-designed context budget system for LLM prompt assembly, with correct trimming logic and thorough test coverage. All changes adhere to repository conventions and Go idioms. ## Recommendation **APPROVE** — The context budget system (`budget` package) is robust, testable, and modular. The main prompt assembly flow is now token-aware, prioritizes important information, and never risks LLM context window overflows. All critical paths and edge trimming scenarios are tested. The refactor respects existing boundaries (logic separated from presentation), and CI passes. Merging is recommended.
sonnet-review-bot approved these changes 2026-05-02 02:38:29 +00:00
Dismissed
sonnet-review-bot left a comment
First-time contributor

Summary

Well-structured addition of a budget package with sensible trim order and thorough tests. Integration into main and prompt refactor look clean and idiomatic, and CI has passed.

Findings

# Severity File Line Finding
1 [MINOR] budget/budget.go 62 Sections comment states UserMeta is "never trimmed", but Fit() truncates UserMeta when base content exceeds the budget. Update the comment to reflect actual behavior or adjust logic to truly never trim.
2 [MINOR] budget/budget.go 140 When truncating the diff, if diffBudget < markerBudget, the truncation marker itself can push the final estimated tokens over the limit. Consider omitting the marker when there's no budget for it, or trimming an additional small section to ensure final EstTokens ≤ limit.
3 [NIT] cmd/review-bot/main.go 217 Variable name "filepath" in the fetchPatterns loop may be confusing since it shares a name with the standard library package. Consider renaming to "path" or "filePath" for clarity.

Recommendation

APPROVE — The new budgeting system is a solid improvement with clear trim semantics and good test coverage across scenarios (patterns/conventions trimming, diff truncation, and large UserMeta). The integration in main.go is clean and the prompt refactor properly separates non-trimmable base content. Address the minor inconsistencies and edge case noted: (1) update the Sections.UserMeta comment to reflect that it may be truncated when base content alone would exceed the budget (or change the implementation to match the "never trimmed" promise), and (2) ensure that when remaining diff budget is smaller than the truncation marker, the final estimated tokens cannot exceed the limit (e.g., skip the marker in that case or trim an additional small section). The variable naming nit in fetchPatterns is optional. With these small adjustments, the change set is ready to merge.

## Summary Well-structured addition of a budget package with sensible trim order and thorough tests. Integration into main and prompt refactor look clean and idiomatic, and CI has passed. ## Findings | # | Severity | File | Line | Finding | |---|----------|------|------|--------| | 1 | [MINOR] | `budget/budget.go` | 62 | Sections comment states UserMeta is "never trimmed", but Fit() truncates UserMeta when base content exceeds the budget. Update the comment to reflect actual behavior or adjust logic to truly never trim. | | 2 | [MINOR] | `budget/budget.go` | 140 | When truncating the diff, if diffBudget < markerBudget, the truncation marker itself can push the final estimated tokens over the limit. Consider omitting the marker when there's no budget for it, or trimming an additional small section to ensure final EstTokens ≤ limit. | | 3 | [NIT] | `cmd/review-bot/main.go` | 217 | Variable name "filepath" in the fetchPatterns loop may be confusing since it shares a name with the standard library package. Consider renaming to "path" or "filePath" for clarity. | ## Recommendation **APPROVE** — The new budgeting system is a solid improvement with clear trim semantics and good test coverage across scenarios (patterns/conventions trimming, diff truncation, and large UserMeta). The integration in main.go is clean and the prompt refactor properly separates non-trimmable base content. Address the minor inconsistencies and edge case noted: (1) update the Sections.UserMeta comment to reflect that it may be truncated when base content alone would exceed the budget (or change the implementation to match the "never trimmed" promise), and (2) ensure that when remaining diff budget is smaller than the truncation marker, the final estimated tokens cannot exceed the limit (e.g., skip the marker in that case or trim an additional small section). The variable naming nit in fetchPatterns is optional. With these small adjustments, the change set is ready to merge.
rodin added 1 commit 2026-05-02 03:02:37 +00:00
fix: address review findings (comment, marker budget, naming)
CI / test (pull_request) Successful in 13s
CI / review (gpt-4.1, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 22s
CI / review (gpt-5, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 1m48s
75190d53ed
- UserMeta comment: "never trimmed" → "truncated only if base exceeds budget"
- Skip diff truncation marker when diffBudget < markerBudget (prevents
  marker itself from pushing EstTokens over the limit)
- Rename filepath → filePath to avoid shadowing stdlib package name
gpt-review-bot approved these changes 2026-05-02 03:03:13 +00:00
gpt-review-bot left a comment
First-time contributor

Summary

This PR introduces a robust context window budgeting system for LLM prompt construction, with well-structured and well-tested logic and no regressions to existing functionality. All changes follow Go idioms and explicit repository conventions.

Recommendation

APPROVE — The PR creates a modular budget package that accurately estimates token usage and progressively trims less important context sections while preserving essential content for LLM prompts. The trimming strategy is well implemented and thoroughly tested, with clear prioritization and test coverage for all key trim scenarios. Refactors to prompt assembly separate non-trimmable from trimmable content cleanly, allowing easy extension. All CI checks pass and no style, correctness, or design issues are present. I recommend merging as-is.

## Summary This PR introduces a robust context window budgeting system for LLM prompt construction, with well-structured and well-tested logic and no regressions to existing functionality. All changes follow Go idioms and explicit repository conventions. ## Recommendation **APPROVE** — The PR creates a modular `budget` package that accurately estimates token usage and progressively trims less important context sections while preserving essential content for LLM prompts. The trimming strategy is well implemented and thoroughly tested, with clear prioritization and test coverage for all key trim scenarios. Refactors to prompt assembly separate non-trimmable from trimmable content cleanly, allowing easy extension. All CI checks pass and no style, correctness, or design issues are present. I recommend merging as-is.
sonnet-review-bot approved these changes 2026-05-02 03:04:40 +00:00
sonnet-review-bot left a comment
First-time contributor

Summary

Well-scoped introduction of a budget-aware prompt assembly layer with good separation of non-trimmable vs trimmable content and comprehensive tests. CI is green, and the refactor integrates cleanly into main without disrupting existing behavior.

Recommendation

APPROVE — The new budget package is thoughtfully designed and documented, integrates safely into the prompt-building flow, and includes solid unit tests covering key trim paths. The CI workflow updates look correct, and the small refactors in review/prompt.go keep backward compatibility via deprecated helpers while enabling context-aware assembly. Consider future enhancements such as making model limits configurable and slightly biasing token estimation toward overestimation (e.g., ceiling division) to further reduce any risk of overfilling the context window, but there are no blocking issues for this change.

## Summary Well-scoped introduction of a budget-aware prompt assembly layer with good separation of non-trimmable vs trimmable content and comprehensive tests. CI is green, and the refactor integrates cleanly into main without disrupting existing behavior. ## Recommendation **APPROVE** — The new budget package is thoughtfully designed and documented, integrates safely into the prompt-building flow, and includes solid unit tests covering key trim paths. The CI workflow updates look correct, and the small refactors in review/prompt.go keep backward compatibility via deprecated helpers while enabling context-aware assembly. Consider future enhancements such as making model limits configurable and slightly biasing token estimation toward overestimation (e.g., ceiling division) to further reduce any risk of overfilling the context window, but there are no blocking issues for this change.
rodin merged commit aee903caa2 into main 2026-05-02 03:07:17 +00:00
Sign in to join this conversation.
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: rodin/review-bot#20