Implement the github package client with Retry-After header parsing that
supports both integer seconds (e.g. "Retry-After: 120") and HTTP-date
format (e.g. "Retry-After: Thu, 01 Dec 2025 16:00:00 GMT") per RFC 7231
§7.1.3.
Key design decisions:
- Use http.ParseTime which handles RFC 1123, RFC 850, and ASCTIME formats
- Cap maximum retry delay at 60s (maxRetryAfter) to prevent stalling
- If HTTP-date is in the past, use delay of 0 (retry immediately)
- Inject time.Now via c.now field for deterministic testing
Closes#94
Update the approved dependency table to document go-yaml subpackage
usage (ast, parser) and remove the deviation comment now that the
proper allowlist process is being followed.
Closes#91
- Clarify depth-aware short-circuit comment to unambiguously describe
the relationship between current depth and previous validation depth
- Add comment to MappingValueNode case explaining intentional depth+2
behavior from parent MappingNode perspective
- Restructure unmarshalYAMLWithDepthLimit doc comment as bullet list
covering all three safety checks (depth, multi-doc, strict fields)
- Replace t.Error with t.Fatal in TestYAMLEmptyFileRejection to remove
redundant nil guard on subsequent err.Error() call
- Add explicit case for *ast.MergeKeyNode in checkYAMLDepth switch to
make it clear this is an intentional leaf (no children to recurse)
rather than relying on the default case. Prevents future library
changes from silently bypassing depth checks.
- Add MaxPersonaFileSize bound check at the top of ParsePersonaBytes.
While callers already check size, the public API should defend itself
(defense in depth) against arbitrarily large inputs that could cause
excessive memory/CPU before AST validation runs.
- Add tests for both behaviors.
Addresses review #2879 findings.
- Add safety note on Strict() decoder not expanding aliases recursively,
since alias resolution uses the pre-validated AST (finding #1)
- Document that ast.Node map keys rely on pointer identity, which holds
because all goccy/go-yaml AST types are pointer receivers (finding #2)
- Clarify AnchorNode comment: effective depth budget is reduced for
anchor+alias pairs, not literally halved (finding #3)
- Improve test depth trace comment for accuracy (finding #4)
- Add HTML comment in CONVENTIONS.md referencing #91 for the two-step
process deviation (finding #5)
- Fix validated map comment: says 'minimum depth' but stores the maximum
depth at which a node was validated (overwritten on deeper visits).
- Replace dec.More() with explicit dec.Decode check for trailing JSON
content. More() is documented for use inside arrays/objects; the
explicit EOF check is clearer at the top-level stream.
MINOR fixes:
- docs/DESIGN-57-yaml-persona.md: fix Error Cases table entry to reflect
custom AST walk (checkYAMLDepth) instead of stale library-level reference
- review/persona.go: add EOF check after JSON decode to reject trailing
garbage after a valid JSON object (prevents silent acceptance of malformed
input like '{"name":"x"}garbage')
- review/persona_test.go: add TestJSONTrailingContentRejected test
NIT fixes:
- review/persona.go: add default case to checkYAMLDepth switch with
explanatory comment about scalar leaf nodes
- review/persona.go: document AnchorNode depth+1 conservative asymmetry
- review/persona.go: simplify redundant if-guard in ListBuiltinPersonas
- Move nodeCount increment after cycle detection to avoid over-counting
cyclic references (sonnet #2)
- Use underscores in test case names used as filenames (sonnet #3)
- Fix function comment: 'prevent silent data loss' → 'prevent confusing
behavior where additional documents are silently ignored' (sonnet #4)
- Mark design doc pseudocode as historical since implementation uses
goccy/go-yaml ast.Node, not gopkg.in/yaml.v3 yaml.Node (sonnet #5)
The global 'seen' set allowed anchored subtrees validated at a shallow
depth to be skipped when later referenced via alias at a greater depth.
This could let effective nesting exceed MaxYAMLDepth, enabling DoS.
Fix: replace the single 'seen' set with two tracking maps:
- validated (node -> min depth): only short-circuits when current depth
<= previously validated depth; re-checks at deeper contexts.
- visiting (node -> bool): per-path recursion stack for true cycle
detection (breaks alias loops without suppressing depth checks).
Add TestYAMLAliasDepthBypass that constructs a document with an
anchored 15-level subtree referenced via alias under 6 levels of
nesting, verifying the combined effective depth (22) is rejected.
Addresses security-review-bot findings on review #2774.
Fixes#87.
PR #58 incorrectly added gopkg.in/yaml.v3 (abandoned library) instead of
github.com/goccy/go-yaml as required by issue #57.
Changes:
- Replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml v1.19.2
- Update review/persona.go to use goccy/go-yaml API:
- parser.ParseBytes for AST-based depth/node count checking
- yaml.Strict() decoder option instead of KnownFields(true)
- ast.Node types instead of yaml.Node for tree walking
- Update review/persona_test.go to use ast types for cycle tests
- Remove gopkg.in/yaml.v3 from go.mod and go.sum
All existing YAML tests pass with the new library.
Add defensive check for empty Name and Path fields when unmarshaling
a single ContentEntry in the fallback path. While Gitea API won't
return empty objects for valid file paths, this guard:
- Explicitly documents the invariant we expect
- Catches potential API behavior changes early
- Costs nothing at runtime
Addresses [MINOR] from sonnet-review-bot on PR #74.
When ListContents is called with a path that points to a file (not a
directory), Gitea returns a single JSON object instead of an array.
Previously this caused json.Unmarshal to fail with:
json: cannot unmarshal object into Go value of type []gitea.ContentEntry
Now ListContents tries array unmarshal first, and falls back to single
object unmarshal, wrapping it in a slice. This allows patterns-files
config to specify individual files like 'README.md' without triggering
a parse error.
Also updates TestGetAllFilesInPath_File to reflect actual Gitea behavior
(single object response, not 404).
Fixes#73
Gitea API rejects "." with HTTP 500 (malformed path component).
When patterns-files is set to ".", normalize it to empty string
before making the API call.
Fixes#70
Move lastErr assignment outside the retry condition so that both
network errors and HTTP 5xx paths return lastErr consistently.
Previously, on the final retry attempt, a network error would return
the raw err variable instead of lastErr. While they held the same
value in practice, the inconsistency was confusing when reading the
code.
Now both paths:
- Network errors: assign lastErr before checking retry, return lastErr
- HTTP 5xx: assign lastErr before checking retry, return lastErr
Addresses review finding #3 (MINOR) from sonnet review on PR #69.
1. Fix non-deterministic test TestDoGet_RetriesOnTemporaryNetError:
- Replace timing-dependent listener approach with mockTransport
- mockTransport allows controlled injection of net.OpError failures
- Test now makes deterministic assertions: exactly 3 attempts (2 fail + 1 success)
- Added SetHTTPClient() method for test transport injection
2. Sanitize error content in retry warning logs:
- Added sanitizeErrorForLog() helper that omits response body content
- For APIError: logs only 'HTTP <status>' instead of full body
- For other errors: preserves error type information
- Addresses security concern about logging server error content at WARN level
- Full error with body still returned to caller for proper error handling
Both changes have corresponding test coverage.
Addresses security review finding: retry warnings were logging the full
request URL which could inadvertently leak sensitive query parameters
if future callers pass them.
Added redactURL() helper that:
- Strips query parameters from URLs before logging (replaces with [redacted])
- Returns [invalid URL] for unparseable URLs to avoid leaking any data
- Preserves the base path for debugging context
The error itself (lastErr) is kept as-is since APIError.Error() already
truncates response bodies to 200 chars, and network errors don't contain
user-controlled data.
Address review feedback on isTemporaryNetError being too broad:
1. RetryBackoff field: Added doc comment clarifying it must be
configured before the first request (addresses concurrency concern).
2. isTemporaryNetError: Now inspects the underlying syscall error
instead of treating all net.OpError as retriable. Only retries on:
- ECONNREFUSED (connection refused)
- ECONNRESET (connection reset)
- ENETUNREACH (network unreachable)
- EHOSTUNREACH (host unreachable)
- ETIMEDOUT (connection timed out)
Permanent errors like EACCES, EPERM are no longer retried.
3. DNS errors: Changed from Temporary() to IsTimeout, since
"no such host" is permanent and shouldn't be retried.
4. Empty backoff slice: Added comment explaining that retry without
delay is intentional when caller explicitly configures it.
Addresses MINOR findings from sonnet-review-bot and gpt-review-bot.
Address review feedback:
1. Make backoff delays injectable via Client.RetryBackoff field
- Defaults to {1s, 2s} when nil for production
- Tests can set shorter values for fast execution
- Fixes slow unit tests that previously waited 3+ seconds
2. Add retry on temporary network errors (net.OpError, net.DNSError)
- Connection refused, network unreachable, DNS failures now retry
- Non-temporary network errors still fail immediately
- Context cancellation still respected during backoff
Added isTemporaryNetError helper and TestIsTemporaryNetError test.
Updated existing retry tests to use configurable short backoffs.
- Remove dead backoff[0] element; array now only contains retry delays
- Fix time.After timer leak by using time.NewTimer with timer.Stop()
- Add io.LimitReader (64KB) for error body reads to bound memory allocation
Addresses feedback from sonnet-review-bot, security-review-bot, and gpt-review-bot.
The read:user scope is needed for the bot to self-request as a
reviewer on PRs. Without it, the bot still functions but cannot
add itself to the reviewer list.
Closes#66
When patterns-repo is configured, now logs at Info level:
- File paths loaded from each repo
- Count of files per repo
At Debug level logs skipped files (non-markdown/txt/yaml).
Warns if no pattern files were loaded from a repo (likely
misconfigured patterns-files path).
Closes#64
MAJOR:
- LoadRepoPersonas: add MaxPersonaFileSize check before parsing to
prevent resource exhaustion from oversized YAML files committed
to target repositories
MINOR:
- isNotFoundError: tighten substring match to 'HTTP 404' only to
avoid masking auth/transport errors containing generic 'not found'
- main.go: remove duplicate flag.Parse() call
- main.go: add comment explaining nil map indexing is safe in Go
when LoadRepoPersonas returns an error
Tests updated to reflect the intentional behavior change in
isNotFoundError and added test case for oversized file rejection.
Implements #60.
- Add ParsePersonaBytes() for parsing personas from byte data
- Add LoadRepoPersonas() to fetch personas from repo via Gitea API
- Add MergePersonas() to combine built-in and repo personas
- Add GetBuiltinPersonasMap() helper
- Update main.go to load repo personas first, fall back to built-in
- Add giteaClientAdapter to bridge gitea.Client to review.GiteaClient
When --persona is specified, the bot now:
1. Attempts to fetch personas from .review-bot/personas/*.yaml
2. If the named persona exists in the repo, uses it
3. Otherwise falls back to built-in personas
This allows repos to define domain-specific personas (e.g., trading
experts for gargoyle, crypto experts for kms-lite) without modifying
the review-bot codebase.
1. Remove dead JSON fallback in LoadBuiltinPersona
- The embed directive only includes *.yaml files
- JSON fallback code could never succeed
- Simplified function to only try YAML
2. JSON parsing now rejects unknown fields
- Switched from json.Unmarshal to json.Decoder
- DisallowUnknownFields() matches YAML's KnownFields(true)
- Added test coverage for JSON unknown field rejection
3. Documented symlink support in LoadPersona
- os.Stat follows symlinks, so symlinks to regular files work
- Added doc comment explaining the behavior
- Added test for symlink support
Address security review findings:
MAJOR: Add cycle detection to checkYAMLDepth using a visited set
(seen map[*yaml.Node]struct{}) to prevent infinite recursion from
crafted YAML with self-referential aliases.
MINOR fixes:
- Add MaxYAMLNodes (1000) limit as defense-in-depth against
wide-but-shallow structures that bypass depth limits
- Increment depth when following alias targets (was incorrectly
passing same depth, allowing alias chains to bypass depth limit)
- Reject multi-document YAML files instead of silently ignoring
additional documents (prevents confusing silent data loss)
Tests added:
- TestYAMLAliasCycleDetection: Direct test of cycle detection logic
- TestYAMLMultiDocumentRejection: Verifies multi-doc files rejected
- TestYAMLNodeCountLimit: Verifies wide structures are rejected
- TestCheckYAMLDepthCycleDetectionDirect: Unit test with artificial cycle
Addresses PR #58 MINOR finding: YAML decoder now rejects unknown fields.
- Enable KnownFields(true) on YAML decoder to catch typos like
'focuss' or 'identiy' in persona files
- Since yaml.Node.Decode() doesn't support KnownFields, we now
do a two-pass decode: first pass checks depth limits, second
pass decodes with strict field checking
- Add tests for unknown field rejection at top-level and nested levels
MAJOR fixes:
- Remove false security claim about gopkg.in/yaml.v3 having built-in depth protection
- Add explicit YAML depth limiting via yaml.Node API (MaxYAMLDepth=20)
- Add file size limit for persona files (MaxPersonaFileSize=64KB)
- Add test for deeply nested YAML rejection
MINOR fixes:
- Add sort.Strings to ListBuiltinPersonas for deterministic ordering
- Update design doc to reflect actual library used (gopkg.in/yaml.v3)
- Update README: 'Zero dependencies' → 'Minimal dependencies'
- Add test for file size limit
- Add test for sorted persona list
- Add gopkg.in/yaml.v3 dependency (approved in CONVENTIONS.md)
- Update parsePersona to detect format by file extension
- Support both .yaml and .yml extensions (case-insensitive)
- Convert built-in personas to YAML format
- Add comprehensive tests for YAML parsing
- Update README with YAML examples and documentation
YAML provides cleaner multi-line strings via literal block scalars
and supports comments, making persona definitions more readable.
JSON remains supported for backwards compatibility.
Closes#57
Addresses GPT review feedback:
1. MAJOR - Test deps now validated: All direct module deps (from go.mod)
are checked against the allowlist, whether used in prod or tests.
2. MINOR - Prefix match: Uses grep -E with word boundary (^pkg(/|$|$))
to avoid false positives on similarly-prefixed modules.
3. MINOR - Bash version check: Script now fails early with helpful
message if Bash < 4 (macOS default). Added shebang: #!/usr/bin/env bash
4. NIT - Removed redundant grep -v '_test' (go list -deps already
excludes test-only deps without -test flag).
Addresses review feedback:
1. MAJOR - Scope enforcement: Script now parses the Scope column and
ensures 'test only' packages don't appear in non-test code. Uses
'go list -deps' to check production imports.
2. MINOR - Portability: Replaced 'grep -P' (GNU-only) with awk-based
parsing that works on macOS/BSD.
3. MINOR - Robustness: Table parsing uses awk to split on '|' and
extract columns properly, handling whitespace variations.
4. MINOR - Glob safety: Prefix matching now uses parameter expansion
instead of glob patterns to prevent metacharacter issues.