Compare commits

...

47 Commits

Author SHA1 Message Date
aweiker dc2e1ca5de Merge pull request 'feat: reject cross-host redirects and HTTPS→HTTP downgrades (#95)' (#111) from review-bot-issue-95 into main
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #111
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-13 16:58:49 +00:00
claw 7de6fdd9ec fix: address review feedback on redirect policy
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 31s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 52s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m24s
- Replace Unicode arrows (→) with ASCII (->) in error messages and
  comments for log compatibility (gpt-review NITs #19626, #19628)
- Improve guard comment to clarify it exists for testability, not
  runtime safety (sonnet-review NIT #19619)
- Add cross-reference comments noting intentional duplication between
  gitea/client.go and github/client.go (sonnet-review #19618,
  gpt-review #19625, #19627)

Pushed back on:
- internal/ package for dedup: structural overhead not warranted for
  a single ~25-line function
- strings.EqualFold for scheme: Go's url.Parse normalizes schemes to
  lowercase, making case-insensitive comparison unnecessary
2026-05-13 09:28:46 -07:00
claw 1e0959b077 feat: reject cross-host redirects and HTTPS→HTTP downgrades (#95)
CI / test (pull_request) Successful in 16s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 37s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m19s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m47s
Add defaultCheckRedirect to both GitHub and Gitea clients that rejects:
- HTTPS→HTTP protocol downgrades (prevents plaintext leakage)
- Cross-host redirects entirely (prevents consuming untrusted responses)

Same-host, same-or-upgraded-scheme redirects remain allowed.

Both NewClient constructors wire the policy, and SetHTTPClient(nil)
restores it. Callers providing a non-nil client are responsible for
configuring their own safe redirect policy.

Closes #95
2026-05-13 08:51:36 -07:00
aweiker 67c3db70cb Merge pull request 'feat(github): support HTTP-date format in Retry-After header' (#110) from review-bot-issue-94 into main
CI / test (push) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #110
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-13 15:34:25 +00:00
aweiker a845ce32eb Merge pull request 'feat(gitea): harden GetPullRequestDiff against unbounded diff size' (#109) from review-bot-issue-92 into main
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #109
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-13 14:01:22 +00:00
claw 49d6ca77a3 refactor(gitea): eliminate retry duplication, harden redactURL and MaxInt64 edge case
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 32s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m2s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m21s
- Extract doGetWithReader to share retry/backoff logic between doGet and
  doGetLimited, eliminating ~60 lines of duplicated code (addresses MINOR
  finding from all reviewers).
- redactURL now strips userinfo credentials (user:pass@host) in addition
  to query parameters (addresses security-review-bot finding).
- GetPullRequestDiff treats MaxDiffSize == math.MaxInt64 as disabled,
  preventing the silent enforcement bypass where the overflow clamp makes
  the size check unreachable (addresses security-review-bot finding).
- Improved error message wording: 'response exceeds N bytes' (NIT fix).
2026-05-13 13:39:37 +00:00
claw 6ebf66aefb fix(gitea): address review feedback on diff size limiting
- Add concurrency safety note to MaxDiffSize field documentation,
  mirroring the existing note on RetryBackoff
- Consolidate six individual test functions into a single table-driven
  test (TestGetPullRequestDiff_SizeLimits) reducing repetition
- Add //nolint:errcheck annotation to test handler w.Write calls
2026-05-13 13:39:37 +00:00
claw 004343d05f fix(gitea): address review findings — clamp overflow, clarify maxSize doc
- Clamp maxBytes+1 to prevent integer overflow to negative when
  maxBytes == math.MaxInt64 (falls back to math.MaxInt64)
- Update MaxDiffSize doc: 'any negative value' disables the limit,
  matching actual behavior of 'maxSize < 0' check
2026-05-13 13:39:37 +00:00
claw 92b84976cf feat(gitea): harden GetPullRequestDiff against unbounded diff size
Add a configurable MaxDiffSize field to Client that limits how much
data GetPullRequestDiff will read into memory. The default is 10 MB
(DefaultMaxDiffSize). When the diff exceeds the limit, ErrDiffTooLarge
is returned, allowing callers to skip position translation gracefully.

Implementation uses io.LimitReader to read maxBytes+1, detecting
overflow without buffering the entire response. Setting MaxDiffSize
to -1 disables the limit entirely.

Closes #92
2026-05-13 13:39:37 +00:00
claw 9f8e9aa8d3 fix: timer leak and http field shadowing in github client
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 46s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m57s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 2m17s
- Add timer.Stop() to the timer.C branch to prevent timer leak on the
  normal path (previously only called in ctx.Done branch)
- Rename struct field 'http' to 'httpClient' to avoid shadowing the
  net/http import

Addresses self-review findings on PR #110.
2026-05-13 06:22:30 -07:00
aweiker 881ce232eb Merge pull request 'docs(deps): update CONVENTIONS.md allowlist for go-yaml' (#108) from review-bot-issue-91 into main
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #108
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-13 13:16:40 +00:00
claw 31a28b1dd5 address review feedback: baseURL TODO, timer-cancel test, fast retry test, doc note
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 21s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 49s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m47s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 2m14s
- Add TODO comment on unused baseURL field explaining its intended
  future use by higher-level exported methods
- Rewrite TestDoRequest_ContextCanceled to actually exercise the
  timer-cancel path in the retry select (goroutine cancels context
  while client is blocked in timer.C select)
- Change Retry-After: 1 to Retry-After: 0 in IntegerSeconds test
  to avoid unnecessary 1s sleep during test runs
- Add doc note on SetRetryBackoff explaining that an empty non-nil
  slice silently drops Retry-After delays
2026-05-13 06:08:51 -07:00
claw e414471a16 fix(github): address review feedback on Retry-After implementation
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 23s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 43s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m21s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m22s
- Fix data race: copy retryBackoff slice per-request to prevent
  concurrent doRequest calls from racing on shared state
- Fix parseRetryAfter: trim whitespace before parsing for robustness
- Fix parseRetryAfter: treat delta-seconds of 0 as valid per RFC 7231
- Add bounded read on success path (10 MB limit) for defense-in-depth
- Clean up TestDoRequest_429_RetryAfter_IntegerSeconds: remove dead
  code block and commented-out reasoning
- Fix import ordering: context before errors (goimports compliance)
2026-05-13 05:54:06 -07:00
claw 41e1d48b54 feat(github): support HTTP-date format in Retry-After header
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 39s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m35s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m18s
Implement the github package client with Retry-After header parsing that
supports both integer seconds (e.g. "Retry-After: 120") and HTTP-date
format (e.g. "Retry-After: Thu, 01 Dec 2025 16:00:00 GMT") per RFC 7231
§7.1.3.

Key design decisions:
- Use http.ParseTime which handles RFC 1123, RFC 850, and ASCTIME formats
- Cap maximum retry delay at 60s (maxRetryAfter) to prevent stalling
- If HTTP-date is in the past, use delay of 0 (retry immediately)
- Inject time.Now via c.now field for deterministic testing

Closes #94
2026-05-13 05:10:52 -07:00
claw bf52fceea0 docs(deps): update CONVENTIONS.md allowlist for go-yaml
CI / test (pull_request) Successful in 20s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 20s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 48s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 51s
Update the approved dependency table to document go-yaml subpackage
usage (ast, parser) and remove the deviation comment now that the
proper allowlist process is being followed.

Closes #91
2026-05-13 02:56:06 -07:00
aweiker d722035629 Merge pull request 'fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml' (#89) from review-bot-issue-87 into main
CI / test (push) Successful in 20s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #89
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-13 03:47:01 +00:00
claw b9b7be3b4e fix: address review #2888 findings (comment clarity, test cleanup)
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 37s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m0s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m11s
- Clarify depth-aware short-circuit comment to unambiguously describe
  the relationship between current depth and previous validation depth
- Add comment to MappingValueNode case explaining intentional depth+2
  behavior from parent MappingNode perspective
- Restructure unmarshalYAMLWithDepthLimit doc comment as bullet list
  covering all three safety checks (depth, multi-doc, strict fields)
- Replace t.Error with t.Fatal in TestYAMLEmptyFileRejection to remove
  redundant nil guard on subsequent err.Error() call
2026-05-12 19:06:52 -07:00
claw baa917f228 fix: handle MergeKeyNode explicitly in depth check, add size limit to ParsePersonaBytes
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 35s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 58s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m16s
- Add explicit case for *ast.MergeKeyNode in checkYAMLDepth switch to
  make it clear this is an intentional leaf (no children to recurse)
  rather than relying on the default case. Prevents future library
  changes from silently bypassing depth checks.

- Add MaxPersonaFileSize bound check at the top of ParsePersonaBytes.
  While callers already check size, the public API should defend itself
  (defense in depth) against arbitrarily large inputs that could cause
  excessive memory/CPU before AST validation runs.

- Add tests for both behaviors.

Addresses review #2879 findings.
2026-05-12 18:45:48 -07:00
claw b0352ba1c9 docs: address review findings on YAML depth validation
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 34s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m20s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 2m49s
- Add safety note on Strict() decoder not expanding aliases recursively,
  since alias resolution uses the pre-validated AST (finding #1)
- Document that ast.Node map keys rely on pointer identity, which holds
  because all goccy/go-yaml AST types are pointer receivers (finding #2)
- Clarify AnchorNode comment: effective depth budget is reduced for
  anchor+alias pairs, not literally halved (finding #3)
- Improve test depth trace comment for accuracy (finding #4)
- Add HTML comment in CONVENTIONS.md referencing #91 for the two-step
  process deviation (finding #5)
2026-05-12 17:39:38 -07:00
claw 0b16c4143a test: use per-subtest TempDir in TestYAMLEmptyFileRejection
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 44s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m42s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m9s
Move t.TempDir() inside each subtest for idiomatic test isolation,
as suggested by reviewers.
2026-05-12 15:22:27 -07:00
claw 493349e11a fix: correct comment accuracy and improve trailing-content check clarity
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 19s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 34s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m10s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m47s
- Fix validated map comment: says 'minimum depth' but stores the maximum
  depth at which a node was validated (overwritten on deeper visits).
- Replace dec.More() with explicit dec.Decode check for trailing JSON
  content. More() is documented for use inside arrays/objects; the
  explicit EOF check is clearer at the top-level stream.
2026-05-12 14:51:49 -07:00
claw 5cedeee9f4 address self-review findings on PR #89
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 39s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m12s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m33s
MINOR fixes:
- docs/DESIGN-57-yaml-persona.md: fix Error Cases table entry to reflect
  custom AST walk (checkYAMLDepth) instead of stale library-level reference
- review/persona.go: add EOF check after JSON decode to reject trailing
  garbage after a valid JSON object (prevents silent acceptance of malformed
  input like '{"name":"x"}garbage')
- review/persona_test.go: add TestJSONTrailingContentRejected test

NIT fixes:
- review/persona.go: add default case to checkYAMLDepth switch with
  explanatory comment about scalar leaf nodes
- review/persona.go: document AnchorNode depth+1 conservative asymmetry
- review/persona.go: simplify redundant if-guard in ListBuiltinPersonas
2026-05-12 14:42:22 -07:00
claw 01b6af03a8 fix(review): address review 2792 feedback
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 33s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m11s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m53s
- Document nodeCount overcounting as intentional conservative behavior
  (bounds total validation work, not unique nodes)
- Improve TestYAMLDeeplyNestedRejection comment with concrete depth trace
- Replace outdated gopkg.in/yaml.v3 pseudocode in design doc with
  reference to authoritative implementation
- Update PR description to clarify pre-approval via issue #57
2026-05-12 14:24:06 -07:00
claw 80091fb080 fix(review): address feedback from reviews 2788, 2789, 2791
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 23s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 39s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m45s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 2m7s
- Move nodeCount increment after cycle detection to avoid over-counting
  cyclic references (sonnet #2)
- Use underscores in test case names used as filenames (sonnet #3)
- Fix function comment: 'prevent silent data loss' → 'prevent confusing
  behavior where additional documents are silently ignored' (sonnet #4)
- Mark design doc pseudocode as historical since implementation uses
  goccy/go-yaml ast.Node, not gopkg.in/yaml.v3 yaml.Node (sonnet #5)
2026-05-12 14:13:59 -07:00
claw b5f17ddfc4 fix(security): prevent alias depth bypass in YAML validator
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 38s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m18s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m20s
The global 'seen' set allowed anchored subtrees validated at a shallow
depth to be skipped when later referenced via alias at a greater depth.
This could let effective nesting exceed MaxYAMLDepth, enabling DoS.

Fix: replace the single 'seen' set with two tracking maps:
- validated (node -> min depth): only short-circuits when current depth
  <= previously validated depth; re-checks at deeper contexts.
- visiting (node -> bool): per-path recursion stack for true cycle
  detection (breaks alias loops without suppressing depth checks).

Add TestYAMLAliasDepthBypass that constructs a document with an
anchored 15-level subtree referenced via alias under 6 levels of
nesting, verifying the combined effective depth (22) is rejected.

Addresses security-review-bot findings on review #2774.
2026-05-12 14:07:05 -07:00
rodin 144a36a2a7 docs: update DESIGN-57 to reflect goccy/go-yaml as the supported YAML library
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 31s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m19s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 2m57s
2026-05-12 20:52:37 +00:00
rodin 12f5f5a5e4 docs: update YAML library to github.com/goccy/go-yaml in CONVENTIONS.md
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 16s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 28s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m4s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 2m16s
2026-05-12 20:52:31 +00:00
claw 45d009dd06 fix(review): address review feedback on persona YAML handling
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 30s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m5s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m8s
- Reorder empty doc check before multi-doc check for natural flow
- Detect nil-body docs (whitespace-only, comment-only input)
- Add explanatory comment on pointer identity for cycle detection map
- Improve depth-counting test comment with AST walker specifics
- Add TestYAMLEmptyFileRejection covering empty/whitespace/comment inputs

Addresses MINOR and NIT findings from sonnet, gpt, and security reviews.
MAJOR (allowlist violation) tracked in issue #91.
2026-05-12 13:38:48 -07:00
claw 8991260333 fix(deps): replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 46s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m38s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m49s
Fixes #87.

PR #58 incorrectly added gopkg.in/yaml.v3 (abandoned library) instead of
github.com/goccy/go-yaml as required by issue #57.

Changes:
- Replace gopkg.in/yaml.v3 with github.com/goccy/go-yaml v1.19.2
- Update review/persona.go to use goccy/go-yaml API:
  - parser.ParseBytes for AST-based depth/node count checking
  - yaml.Strict() decoder option instead of KnownFields(true)
  - ast.Node types instead of yaml.Node for tree walking
- Update review/persona_test.go to use ast types for cycle tests
- Remove gopkg.in/yaml.v3 from go.mod and go.sum

All existing YAML tests pass with the new library.
2026-05-12 13:27:30 -07:00
rodin 6f86e66943 fix(patterns): default patterns-files to empty (fetch all) (#77)
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
2026-05-11 17:45:19 +00:00
aweiker 019b815280 Merge pull request 'fix(gitea): handle single-object response in ListContents' (#74) from issue-73 into main
CI / test (push) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #74
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
2026-05-11 15:38:59 +00:00
Rodin c27dfd0f08 fix(gitea): guard against empty response in ListContents fallback
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 22s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 26s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 40s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 58s
Add defensive check for empty Name and Path fields when unmarshaling
a single ContentEntry in the fallback path. While Gitea API won't
return empty objects for valid file paths, this guard:
- Explicitly documents the invariant we expect
- Catches potential API behavior changes early
- Costs nothing at runtime

Addresses [MINOR] from sonnet-review-bot on PR #74.
2026-05-11 07:47:03 -07:00
Rodin 1b6c37605f fix(gitea): handle single-object response in ListContents
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 26s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 35s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m9s
When ListContents is called with a path that points to a file (not a
directory), Gitea returns a single JSON object instead of an array.
Previously this caused json.Unmarshal to fail with:
  json: cannot unmarshal object into Go value of type []gitea.ContentEntry

Now ListContents tries array unmarshal first, and falls back to single
object unmarshal, wrapping it in a slice. This allows patterns-files
config to specify individual files like 'README.md' without triggering
a parse error.

Also updates TestGetAllFilesInPath_File to reflect actual Gitea behavior
(single object response, not 404).

Fixes #73
2026-05-11 07:21:15 -07:00
aweiker 036e96d9b7 Merge pull request 'fix(gitea): normalize "." path to empty string in ListContents' (#72) from issue-70 into main
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #72
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-11 14:16:22 +00:00
Rodin ea74f7e088 ci: use rodin/security-patterns with '.' path for security reviewer
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 23s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 46s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 52s
Tests the dot path normalization fix end-to-end.
2026-05-11 07:12:25 -07:00
Rodin e6b1840ffc fix(gitea): normalize "." path to empty string in ListContents
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 26s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 34s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 43s
Gitea API rejects "." with HTTP 500 (malformed path component).
When patterns-files is set to ".", normalize it to empty string
before making the API call.

Fixes #70
2026-05-11 06:24:05 -07:00
aweiker 1ca9250e4a Merge pull request 'feat(gitea): add retry logic for 5xx errors' (#69) from issue-68 into main
CI / test (push) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #69
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
Reviewed-by: Aaron Weiker <aaron@weiker.org>
2026-05-11 12:59:50 +00:00
Rodin 1b38e6ad00 fix(gitea): ensure consistent lastErr return for network errors
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 35s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 58s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m37s
Move lastErr assignment outside the retry condition so that both
network errors and HTTP 5xx paths return lastErr consistently.

Previously, on the final retry attempt, a network error would return
the raw err variable instead of lastErr. While they held the same
value in practice, the inconsistency was confusing when reading the
code.

Now both paths:
- Network errors: assign lastErr before checking retry, return lastErr
- HTTP 5xx: assign lastErr before checking retry, return lastErr

Addresses review finding #3 (MINOR) from sonnet review on PR #69.
2026-05-11 05:31:39 -07:00
Rodin 5498dccd60 fix(gitea): address MINOR review findings on retry logic
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 39s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 50s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m28s
1. Fix non-deterministic test TestDoGet_RetriesOnTemporaryNetError:
   - Replace timing-dependent listener approach with mockTransport
   - mockTransport allows controlled injection of net.OpError failures
   - Test now makes deterministic assertions: exactly 3 attempts (2 fail + 1 success)
   - Added SetHTTPClient() method for test transport injection

2. Sanitize error content in retry warning logs:
   - Added sanitizeErrorForLog() helper that omits response body content
   - For APIError: logs only 'HTTP <status>' instead of full body
   - For other errors: preserves error type information
   - Addresses security concern about logging server error content at WARN level
   - Full error with body still returned to caller for proper error handling

Both changes have corresponding test coverage.
2026-05-11 05:15:07 -07:00
Aaron Weiker ecfbfddc7c fix(gitea): redact query params from retry warning logs
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 34s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m14s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m17s
Addresses security review finding: retry warnings were logging the full
request URL which could inadvertently leak sensitive query parameters
if future callers pass them.

Added redactURL() helper that:
- Strips query parameters from URLs before logging (replaces with [redacted])
- Returns [invalid URL] for unparseable URLs to avoid leaking any data
- Preserves the base path for debugging context

The error itself (lastErr) is kept as-is since APIError.Error() already
truncates response bodies to 200 chars, and network errors don't contain
user-controlled data.
2026-05-11 04:52:41 -07:00
Rodin ac53ecfa5d fix(gitea): improve retry logic precision for net.OpError
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 46s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m11s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m23s
Address review feedback on isTemporaryNetError being too broad:

1. RetryBackoff field: Added doc comment clarifying it must be
   configured before the first request (addresses concurrency concern).

2. isTemporaryNetError: Now inspects the underlying syscall error
   instead of treating all net.OpError as retriable. Only retries on:
   - ECONNREFUSED (connection refused)
   - ECONNRESET (connection reset)
   - ENETUNREACH (network unreachable)
   - EHOSTUNREACH (host unreachable)
   - ETIMEDOUT (connection timed out)

   Permanent errors like EACCES, EPERM are no longer retried.

3. DNS errors: Changed from Temporary() to IsTimeout, since
   "no such host" is permanent and shouldn't be retried.

4. Empty backoff slice: Added comment explaining that retry without
   delay is intentional when caller explicitly configures it.

Addresses MINOR findings from sonnet-review-bot and gpt-review-bot.
2026-05-11 04:32:15 -07:00
Rodin 090ae3848c fix(gitea): make retry backoff configurable and retry temp net errors
PR Ready Gate / clear-labels (pull_request) Successful in 1s
CI / test (pull_request) Successful in 17s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 34s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 45s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m34s
Address review feedback:

1. Make backoff delays injectable via Client.RetryBackoff field
   - Defaults to {1s, 2s} when nil for production
   - Tests can set shorter values for fast execution
   - Fixes slow unit tests that previously waited 3+ seconds

2. Add retry on temporary network errors (net.OpError, net.DNSError)
   - Connection refused, network unreachable, DNS failures now retry
   - Non-temporary network errors still fail immediately
   - Context cancellation still respected during backoff

Added isTemporaryNetError helper and TestIsTemporaryNetError test.
Updated existing retry tests to use configurable short backoffs.
2026-05-11 04:23:27 -07:00
Rodin 23da7eedf5 fix(gitea): address review feedback on retry logic
PR Ready Gate / clear-labels (pull_request) Successful in 2s
CI / test (pull_request) Successful in 23s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 30s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 45s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m30s
- Remove dead backoff[0] element; array now only contains retry delays
- Fix time.After timer leak by using time.NewTimer with timer.Stop()
- Add io.LimitReader (64KB) for error body reads to bound memory allocation

Addresses feedback from sonnet-review-bot, security-review-bot, and gpt-review-bot.
2026-05-11 01:08:01 -07:00
Rodin 7279cdd216 feat(gitea): add retry logic for 5xx errors in doGet
CI / test (pull_request) Successful in 28s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 30s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m27s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m33s
Addresses transient HTTP 500 errors from Gitea API during pattern fetches.

Changes:
- Add retry with exponential backoff (1s, 2s) to doGet(), max 3 attempts
- Add IsServerError() helper to detect 5xx responses
- No retry on 4xx errors (client errors should propagate immediately)
- Respects context cancellation during backoff waits
- Logs retries at WARN level for observability

All existing tests pass. New tests:
- TestIsServerError: validates 5xx detection across edge cases
- TestDoGet_RetriesOn500: verifies recovery after transient errors
- TestDoGet_FailsAfterMaxRetries: verifies proper failure after exhaustion
- TestDoGet_NoRetryOn4xx: ensures client errors don't retry
- TestDoGet_RespectsContextCancellation: validates cancellation during backoff

Closes #68
2026-05-11 00:59:47 -07:00
aweiker 4c327b61d4 Merge pull request 'feat: log loaded pattern files for debugging' (#65) from issue-64 into main
CI / test (push) Successful in 15s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #65
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
2026-05-11 06:53:00 +00:00
aweiker 877dbf9999 Merge pull request 'docs: add read:user to required token scopes' (#67) from issue-66 into main
CI / test (push) Successful in 16s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (push) Has been skipped
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (push) Has been skipped
Reviewed-on: #67
Reviewed-by: security-review-bot <10+security-review-bot@noreply.gitea.weiker.me>
2026-05-11 06:52:21 +00:00
Rodin deade3c5a0 feat: log loaded pattern files for debugging
CI / test (pull_request) Successful in 19s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 29s
CI / review (gpt-5, security, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 42s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 1m5s
When patterns-repo is configured, now logs at Info level:
- File paths loaded from each repo
- Count of files per repo

At Debug level logs skipped files (non-markdown/txt/yaml).

Warns if no pattern files were loaded from a repo (likely
misconfigured patterns-files path).

Closes #64
2026-05-10 23:37:14 -07:00
14 changed files with 2232 additions and 152 deletions
+4 -2
View File
@@ -38,6 +38,8 @@ jobs:
- name: security
token_secret: SECURITY_REVIEW_TOKEN
model: gpt-5
patterns_repo: rodin/security-patterns
patterns_files: "."
system_prompt_file: SECURITY_REVIEW.md
steps:
- uses: actions/checkout@v4
@@ -60,8 +62,8 @@ jobs:
AICORE_API_URL: ${{ secrets.AICORE_API_URL }}
AICORE_RESOURCE_GROUP: ${{ secrets.AICORE_RESOURCE_GROUP }}
CONVENTIONS_FILE: "CONVENTIONS.md"
PATTERNS_REPO: "rodin/go-patterns"
PATTERNS_FILES: "README.md,patterns/"
PATTERNS_REPO: ${{ matrix.patterns_repo || 'rodin/go-patterns' }}
PATTERNS_FILES: ${{ matrix.patterns_files || 'README.md,patterns/' }}
LLM_TIMEOUT: "600"
SYSTEM_PROMPT_FILE: ${{ matrix.system_prompt_file }}
run: ./review-bot
+1 -1
View File
@@ -9,7 +9,7 @@
| Package | Use Case | Scope |
|---------|----------|-------|
| `gopkg.in/yaml.v3` | YAML parsing (persona files, config) | production |
| `github.com/goccy/go-yaml` | YAML parsing and AST inspection (subpkgs: `ast`, `parser`) | production |
| `github.com/google/go-cmp` | Test comparisons (`cmp.Diff`) | test only |
**Any import not in this table or the Go standard library is forbidden.**
+30 -7
View File
@@ -65,7 +65,7 @@ func main() {
conventionsFile := flag.String("conventions-file", envOrDefault("CONVENTIONS_FILE", ""), "Conventions file path in repo (e.g. CLAUDE.md)")
systemPromptFile := flag.String("system-prompt-file", envOrDefault("SYSTEM_PROMPT_FILE", ""), "Local file with additional system prompt instructions")
patternsRepo := flag.String("patterns-repo", envOrDefault("PATTERNS_REPO", ""), "Repo with language patterns (e.g. rodin/elixir-patterns)")
patternsFiles := flag.String("patterns-files", envOrDefault("PATTERNS_FILES", "README.md"), "Comma-separated file paths to fetch from patterns repo")
patternsFiles := flag.String("patterns-files", envOrDefault("PATTERNS_FILES", ""), "Comma-separated file paths to fetch from patterns repo (empty = all files)")
dryRun := flag.Bool("dry-run", false, "Print review to stdout instead of posting")
llmTemp := flag.Float64("llm-temperature", envOrDefaultFloat("LLM_TEMPERATURE", 0), "LLM temperature (0 = server default)")
llmTimeout := flag.Int("llm-timeout", envOrDefaultInt("LLM_TIMEOUT", 300), "LLM request timeout in seconds (default 300)")
@@ -523,11 +523,25 @@ func fetchFileContext(ctx context.Context, client *gitea.Client, owner, repo, re
// patternsRepo is comma-separated list of owner/name repos.
// patternsFiles is comma-separated list of file paths or directories.
// If a path ends with / or is a directory, all files within it are fetched recursively.
// If patternsFiles is empty, all files from the repo root are fetched.
func fetchPatterns(ctx context.Context, client *gitea.Client, patternsRepo, patternsFiles string) string {
var sb strings.Builder
repos := strings.Split(patternsRepo, ",")
paths := strings.Split(patternsFiles, ",")
// Build the list of paths to fetch
var paths []string
if patternsFiles == "" {
// Empty patternsFiles means "fetch all files from repo root"
paths = []string{""}
} else {
for _, p := range strings.Split(patternsFiles, ",") {
p = strings.TrimSpace(p)
if p != "" {
paths = append(paths, p)
}
}
}
for _, repoRef := range repos {
if ctx.Err() != nil {
@@ -544,12 +558,10 @@ func fetchPatterns(ctx context.Context, client *gitea.Client, patternsRepo, patt
}
owner, repo := parts[0], parts[1]
for _, path := range paths {
path = strings.TrimSpace(path)
if path == "" {
continue
}
var repoLoadedFiles []string
var repoSkippedFiles []string
for _, path := range paths {
files, err := client.GetAllFilesInPath(ctx, owner, repo, path)
if err != nil {
slog.Warn("could not fetch patterns", "path", path, "repo", repoRef, "error", err)
@@ -559,11 +571,22 @@ func fetchPatterns(ctx context.Context, client *gitea.Client, patternsRepo, patt
for filePath, content := range files {
// Only include markdown and text files as patterns
if !isPatternFile(filePath) {
repoSkippedFiles = append(repoSkippedFiles, filePath)
continue
}
repoLoadedFiles = append(repoLoadedFiles, filePath)
sb.WriteString(fmt.Sprintf("### %s/%s\n\n%s\n\n", repoRef, filePath, content))
}
}
if len(repoLoadedFiles) > 0 {
slog.Info("loaded pattern files", "repo", repoRef, "count", len(repoLoadedFiles), "files", repoLoadedFiles)
} else {
slog.Warn("no pattern files loaded", "repo", repoRef, "paths", paths)
}
if len(repoSkippedFiles) > 0 {
slog.Debug("skipped non-pattern files", "repo", repoRef, "count", len(repoSkippedFiles), "files", repoSkippedFiles)
}
}
return sb.String()
}
+46
View File
@@ -504,6 +504,52 @@ func TestIsPatternFile(t *testing.T) {
}
}
// TestBuildPatternPaths verifies the path-building logic for fetchPatterns.
// Empty patternsFiles means "fetch all from root" (represented as [""]).
func TestBuildPatternPaths(t *testing.T) {
buildPaths := func(patternsFiles string) []string {
if patternsFiles == "" {
return []string{""}
}
var paths []string
for _, p := range strings.Split(patternsFiles, ",") {
p = strings.TrimSpace(p)
if p != "" {
paths = append(paths, p)
}
}
return paths
}
tests := []struct {
name string
input string
want []string
}{
{"empty fetches root", "", []string{""}},
{"single file", "README.md", []string{"README.md"}},
{"multiple files", "README.md,PATTERNS.md", []string{"README.md", "PATTERNS.md"}},
{"trims whitespace", " foo.md , bar.md ", []string{"foo.md", "bar.md"}},
{"skips empty between commas", "foo.md,,bar.md", []string{"foo.md", "bar.md"}},
{"directory path", "patterns/", []string{"patterns/"}},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
got := buildPaths(tc.input)
if len(got) != len(tc.want) {
t.Errorf("buildPaths(%q) = %v, want %v", tc.input, got, tc.want)
return
}
for i := range got {
if got[i] != tc.want[i] {
t.Errorf("buildPaths(%q)[%d] = %q, want %q", tc.input, i, got[i], tc.want[i])
}
}
})
}
}
func TestEvaluateCIStatus(t *testing.T) {
tests := []struct {
name string
+10 -31
View File
@@ -9,7 +9,7 @@ JSON is awkward for persona files that contain multi-line text (identity, severi
- Backwards compatibility: existing JSON personas must continue to work
- Security: protect against DoS via deeply nested YAML (AIKIDO-2024-10486)
- Consistency: use `.yaml` extension (not `.yml`)
- Library: use `gopkg.in/yaml.v3` (approved in CONVENTIONS.md) with explicit depth limiting
- Library: use `github.com/goccy/go-yaml` v1.16.0+ (approved in CONVENTIONS.md); we implement custom AST-based depth/node-count checks for precise alias-aware validation
## Proposed Approach
@@ -33,37 +33,16 @@ func parsePersona(data []byte, source string) (*Persona, error) {
### YAML Parsing with Depth Protection
```go
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
var node yaml.Node
dec := yaml.NewDecoder(bytes.NewReader(data))
if err := dec.Decode(&node); err != nil {
return err
}
if err := checkYAMLDepth(&node, 0, maxDepth); err != nil {
return err
}
return node.Decode(out)
}
We implement a custom AST-based depth/node-count walk (`checkYAMLDepth` in
`review/persona.go`) rather than relying on library decoder options. Key design
decisions:
func checkYAMLDepth(node *yaml.Node, depth, maxDepth int) error {
if depth > maxDepth {
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Handle alias nodes by following the Alias pointer
if node.Kind == yaml.AliasNode && node.Alias != nil {
return checkYAMLDepth(node.Alias, depth, maxDepth)
}
for _, child := range node.Content {
if err := checkYAMLDepth(child, depth+1, maxDepth); err != nil {
return err
}
}
return nil
}
```
- **Library:** `github.com/goccy/go-yaml` with `ast.Node`-based traversal
- **Dual-map tracking:** `validated` (depth-aware short-circuit) + `visiting` (cycle detection)
- **Node-count limit:** Conservative overcounting bounds total validation work
- **Alias-aware depth:** Aliases increment depth and are re-checked when encountered at greater depths
The `gopkg.in/yaml.v3` library does not have built-in depth protection, so we implement explicit depth checking by first decoding into a `yaml.Node`, walking the tree to verify depth (including alias resolution), then decoding into the target struct.
See `review/persona.go:checkYAMLDepth` for the authoritative implementation.
## State/Data Model
@@ -74,7 +53,7 @@ No new state. Same `Persona` struct, just different parsing.
| Error | Handling |
|-------|----------|
| Invalid YAML syntax | Return parse error with source file |
| Deeply nested YAML | Library rejects (v1.16.0+ fix) |
| Deeply nested YAML | Custom AST walk (`checkYAMLDepth`) rejects before decode |
| Unknown extension | Fall back to JSON parsing |
| Missing required fields | Validation rejects after parse |
+332 -22
View File
@@ -11,9 +11,12 @@ import (
"fmt"
"io"
"log/slog"
"math"
"net"
"net/http"
"net/url"
"strings"
"syscall"
"time"
)
@@ -39,12 +42,68 @@ func IsNotFound(err error) bool {
return errors.As(err, &apiErr) && apiErr.StatusCode == http.StatusNotFound
}
// IsServerError reports whether an error is an API 5xx response.
func IsServerError(err error) bool {
var apiErr *APIError
return errors.As(err, &apiErr) && apiErr.StatusCode >= 500 && apiErr.StatusCode < 600
}
// DefaultMaxDiffSize is the default maximum diff size in bytes (10 MB).
const DefaultMaxDiffSize = 10 * 1024 * 1024
// ErrDiffTooLarge is returned when a PR diff exceeds the configured MaxDiffSize.
var ErrDiffTooLarge = errors.New("diff size exceeds maximum allowed size")
// Client interacts with the Gitea API.
// A Client is safe for concurrent use by multiple goroutines.
type Client struct {
baseURL string
token string
http *http.Client
// RetryBackoff defines the delays between retry attempts.
// RetryBackoff[i] is the delay before attempt i+1 (after attempt i fails).
// If nil, defaults to {1s, 2s}. Set to shorter durations in tests.
//
// This field must be configured before the first request is made.
// Modifying it while requests are in flight is not safe.
RetryBackoff []time.Duration
// MaxDiffSize is the maximum number of bytes allowed when fetching a PR diff.
// If zero, defaults to DefaultMaxDiffSize (10 MB). Set to any negative value
// (or math.MaxInt64) to disable the limit.
//
// This field must be configured before the first request is made.
// Modifying it while requests are in flight is not safe.
MaxDiffSize int64
}
// defaultCheckRedirect is the redirect policy used by NewClient.
// NOTE: This function is intentionally duplicated in github/client.go (and vice versa)
// because the packages are separate. Changes here must be mirrored there.
// It rejects HTTPS->HTTP protocol downgrades (to prevent plaintext leakage)
// and cross-host redirects (to prevent following responses from untrusted
// endpoints). Same-host, same-or-upgraded-scheme redirects are allowed.
func defaultCheckRedirect(req *http.Request, via []*http.Request) error {
if len(via) >= 10 {
return fmt.Errorf("stopped after 10 redirects")
}
// Guard for direct invocation in tests and any future callers;
// net/http guarantees len(via) >= 1 during actual redirects.
if len(via) == 0 {
return nil
}
prev := via[len(via)-1]
// Reject protocol downgrade: HTTPS->HTTP leaks request metadata over plaintext.
if prev.URL.Scheme == "https" && req.URL.Scheme == "http" {
return fmt.Errorf("refusing redirect: HTTPS to HTTP downgrade (%s -> %s)", prev.URL.Host, req.URL.Host)
}
// Reject cross-host redirect entirely to avoid consuming responses
// from untrusted endpoints.
if req.URL.Host != prev.URL.Host {
return fmt.Errorf("refusing redirect: cross-host (%s -> %s)", prev.URL.Host, req.URL.Host)
}
return nil
}
// NewClient creates a new Gitea API client.
@@ -52,10 +111,33 @@ func NewClient(baseURL, token string) *Client {
return &Client{
baseURL: strings.TrimRight(baseURL, "/"),
token: token,
http: &http.Client{Timeout: 30 * time.Second},
http: &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: defaultCheckRedirect,
},
}
}
// SetHTTPClient sets the underlying HTTP client used for requests.
// This is intended for test setup only to inject mock transports; it must be
// called before any goroutines issue requests.
//
// Passing nil restores the default client (30s timeout + redirect-rejecting
// CheckRedirect policy matching NewClient).
//
// Callers providing a non-nil client are responsible for configuring a safe
// CheckRedirect policy. Without one, the default net/http behavior will follow
// redirects and may forward the Authorization header to untrusted hosts.
func (c *Client) SetHTTPClient(hc *http.Client) {
if hc == nil {
hc = &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: defaultCheckRedirect,
}
}
c.http = hc
}
// PullRequest holds relevant PR metadata.
type PullRequest struct {
Title string `json:"title"`
@@ -103,9 +185,28 @@ func (c *Client) GetPullRequest(ctx context.Context, owner, repo string, number
}
// GetPullRequestDiff fetches the unified diff for a PR.
// It enforces MaxDiffSize to prevent unbounded memory allocation.
// Returns ErrDiffTooLarge if the diff exceeds the configured limit.
func (c *Client) GetPullRequestDiff(ctx context.Context, owner, repo string, number int) (string, error) {
reqURL := fmt.Sprintf("%s/api/v1/repos/%s/%s/pulls/%d.diff", c.baseURL, url.PathEscape(owner), url.PathEscape(repo), number)
body, err := c.doGet(ctx, reqURL)
maxSize := c.MaxDiffSize
if maxSize == 0 {
maxSize = DefaultMaxDiffSize
}
// When the limit is disabled (negative) or set to math.MaxInt64 (which
// would overflow the +1 detection and silently disable enforcement),
// use the standard unlimited doGet path.
if maxSize < 0 || maxSize == math.MaxInt64 {
body, err := c.doGet(ctx, reqURL)
if err != nil {
return "", fmt.Errorf("fetch diff: %w", err)
}
return string(body), nil
}
body, err := c.doGetLimited(ctx, reqURL, maxSize)
if err != nil {
return "", fmt.Errorf("fetch diff: %w", err)
}
@@ -210,24 +311,218 @@ func (c *Client) PostReview(ctx context.Context, owner, repo string, number int,
return &review, nil
}
// isTemporaryNetError reports whether err is a temporary network error worth retrying.
// This includes connection refused, network unreachable, connection reset, and DNS
// timeouts. It explicitly excludes permanent errors like permission denied or
// "no such host" DNS failures.
func isTemporaryNetError(err error) bool {
if err == nil {
return false
}
// Check for OpError and inspect the underlying syscall error.
// Not all OpErrors are transient — permission denied, for example, is permanent.
var opErr *net.OpError
if errors.As(err, &opErr) {
return isRetriableSyscallError(opErr.Err)
}
// DNS errors: only retry on timeout, not on "no such host" which is permanent.
var dnsErr *net.DNSError
if errors.As(err, &dnsErr) {
return dnsErr.IsTimeout
}
// Check for net.Error with Timeout() (Temporary is deprecated)
var netErr net.Error
if errors.As(err, &netErr) {
return netErr.Timeout()
}
return false
}
// isRetriableSyscallError reports whether the underlying error from a net.OpError
// is a transient syscall error worth retrying.
func isRetriableSyscallError(err error) bool {
if err == nil {
return false
}
// Check for syscall.Errno directly or wrapped
var errno syscall.Errno
if errors.As(err, &errno) {
switch errno {
case syscall.ECONNREFUSED, // connection refused — server not listening
syscall.ECONNRESET, // connection reset by peer
syscall.ENETUNREACH, // network unreachable
syscall.EHOSTUNREACH, // host unreachable
syscall.ETIMEDOUT: // connection timed out
return true
default:
// EACCES, EPERM, etc. are permanent — don't retry
return false
}
}
// If we can't identify the specific syscall error, be conservative and retry.
// This handles wrapped errors or platform-specific error types.
// The retry count is limited, so erring on the side of retrying is safe.
return true
}
// redactURL strips query parameters and userinfo credentials from a URL for
// safe logging. This prevents accidental exposure of sensitive data (tokens in
// query strings, or user:pass in the authority) in log output.
func redactURL(rawURL string) string {
parsed, err := url.Parse(rawURL)
if err != nil {
// If we cannot parse it, return a safe placeholder rather than
// potentially logging something sensitive.
return "[invalid URL]"
}
if parsed.User != nil {
parsed.User = url.User("REDACTED")
}
if parsed.RawQuery != "" {
parsed.RawQuery = "[redacted]"
}
return parsed.String()
}
// sanitizeErrorForLog returns a loggable version of an error that omits
// potentially sensitive content like response bodies. For APIError, only
// the status code is included; for other errors, the type is preserved.
func sanitizeErrorForLog(err error) string {
if err == nil {
return "<nil>"
}
var apiErr *APIError
if errors.As(err, &apiErr) {
return fmt.Sprintf("HTTP %d", apiErr.StatusCode)
}
return err.Error()
}
// doGetWithReader performs an HTTP GET request with retry on 5xx errors and
// temporary network errors. Retries up to 3 times with exponential backoff
// (1s, 2s delays by default; configurable via Client.RetryBackoff for testing).
// The readBody function is called with the response body on success (2xx) and
// is responsible for reading and closing it.
func (c *Client) doGetWithReader(ctx context.Context, reqURL string, readBody func(io.ReadCloser) ([]byte, error)) ([]byte, error) {
const maxAttempts = 3
// backoff[i] is the delay before attempt i+1 (i.e., after attempt i fails).
// First attempt (i=0) has no delay; retries wait 1s then 2s by default.
backoff := c.RetryBackoff
if backoff == nil {
backoff = []time.Duration{1 * time.Second, 2 * time.Second}
}
// maxErrorBodyBytes limits how much of an error response body we read
// to protect against malicious servers sending unbounded data.
const maxErrorBodyBytes = 64 * 1024 // 64 KB
var lastErr error
for attempt := 0; attempt < maxAttempts; attempt++ {
if attempt > 0 {
// Determine delay: use backoff slice if available, otherwise retry immediately.
// An empty RetryBackoff slice means "retry without delay" — this is intentional
// as the caller explicitly configured no delays.
var delay time.Duration
if attempt-1 < len(backoff) {
delay = backoff[attempt-1]
}
if delay > 0 {
slog.Warn("retrying request after error",
"attempt", attempt+1,
"url", redactURL(reqURL),
"delay", delay.String(),
"lastError", sanitizeErrorForLog(lastErr))
timer := time.NewTimer(delay)
select {
case <-timer.C:
case <-ctx.Done():
timer.Stop()
return nil, ctx.Err()
}
}
}
req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)
if err != nil {
return nil, err
}
req.Header.Set("Authorization", "token "+c.token)
resp, err := c.http.Do(req)
if err != nil {
// Always capture the error for consistent return at loop end.
// This ensures both network errors and HTTP 5xx return lastErr.
lastErr = err
// Only retry temporary network errors when attempts remain.
if attempt < maxAttempts-1 && isTemporaryNetError(err) {
slog.Warn("temporary network error, will retry",
"attempt", attempt+1,
"url", redactURL(reqURL),
"error", err)
continue
}
// Non-retryable network error or final attempt exhausted.
return nil, lastErr
}
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
return readBody(resp.Body)
}
// Error path: limit how much we read from potentially malicious server
errBody, _ := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
resp.Body.Close()
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(errBody)}
// Only retry on 5xx server errors
if resp.StatusCode < 500 || resp.StatusCode >= 600 {
return nil, lastErr
}
}
return nil, lastErr
}
// doGet performs an HTTP GET request with retry, reading the full response body.
func (c *Client) doGet(ctx context.Context, reqURL string) ([]byte, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, reqURL, nil)
if err != nil {
return nil, err
}
req.Header.Set("Authorization", "token "+c.token)
return c.doGetWithReader(ctx, reqURL, func(body io.ReadCloser) ([]byte, error) {
defer body.Close()
return io.ReadAll(body)
})
}
resp, err := c.http.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
body, _ := io.ReadAll(resp.Body)
return nil, &APIError{StatusCode: resp.StatusCode, Body: string(body)}
}
return io.ReadAll(resp.Body)
// doGetLimited performs an HTTP GET request with retry but enforces a maximum
// response body size. Returns ErrDiffTooLarge if the response exceeds maxBytes.
// It reads maxBytes+1 (clamped to avoid overflow) to detect truncation without
// buffering the entire body.
func (c *Client) doGetLimited(ctx context.Context, reqURL string, maxBytes int64) ([]byte, error) {
return c.doGetWithReader(ctx, reqURL, func(body io.ReadCloser) ([]byte, error) {
defer body.Close()
// Read up to maxBytes+1 to detect overflow.
// Clamp to prevent integer overflow when maxBytes == math.MaxInt64.
limitBytes := maxBytes + 1
if limitBytes <= 0 {
limitBytes = math.MaxInt64
}
limited := io.LimitReader(body, limitBytes)
data, err := io.ReadAll(limited)
if err != nil {
return nil, err
}
if int64(len(data)) > maxBytes {
return nil, fmt.Errorf("%w: response exceeds %d bytes", ErrDiffTooLarge, maxBytes)
}
return data, nil
})
}
// escapePath escapes each segment of a relative file path for use in URLs.
@@ -251,7 +546,13 @@ type ContentEntry struct {
// ListContents lists files and directories at a given path in a repo.
// Pass an empty path to list the repository root.
// If the path points to a file (not a directory), Gitea returns a single
// object instead of an array; this method normalizes both cases to a slice.
func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]ContentEntry, error) {
// Normalize "." to empty string — Gitea API rejects "." with 500
if path == "." {
path = ""
}
var reqURL string
if path == "" {
reqURL = fmt.Sprintf("%s/api/v1/repos/%s/%s/contents", c.baseURL, url.PathEscape(owner), url.PathEscape(repo))
@@ -264,7 +565,16 @@ func (c *Client) ListContents(ctx context.Context, owner, repo, path string) ([]
}
var entries []ContentEntry
if err := json.Unmarshal(body, &entries); err != nil {
return nil, fmt.Errorf("parse contents JSON: %w", err)
// Gitea returns a single object (not an array) when path is a file
var single ContentEntry
if err2 := json.Unmarshal(body, &single); err2 != nil {
return nil, fmt.Errorf("parse contents JSON: %w", err)
}
// Guard against empty/malformed responses
if single.Name == "" && single.Path == "" {
return nil, fmt.Errorf("parse contents JSON: empty response for path %q", path)
}
entries = []ContentEntry{single}
}
return entries, nil
}
@@ -317,9 +627,9 @@ func (c *Client) GetAllFilesInPath(ctx context.Context, owner, repo, path string
// Review represents a pull request review from the Gitea API.
type Review struct {
ID int64 `json:"id"`
Body string `json:"body"`
User struct {
ID int64 `json:"id"`
Body string `json:"body"`
User struct {
Login string `json:"login"`
} `json:"user"`
State string `json:"state"`
+523 -5
View File
@@ -6,10 +6,15 @@ import (
"errors"
"fmt"
"io"
"net"
"net/http"
"net/http/httptest"
"net/url"
"strings"
"sync/atomic"
"syscall"
"testing"
"time"
)
func TestGetPullRequest(t *testing.T) {
@@ -276,11 +281,64 @@ func TestListContents(t *testing.T) {
}
}
func TestListContents_DotPath(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// "." should be normalized to empty path, which hits the root contents endpoint
if r.URL.Path != "/api/v1/repos/owner/repo/contents" {
t.Errorf("expected root contents path, got: %s", r.URL.Path)
}
w.Header().Set("Content-Type", "application/json")
fmt.Fprintf(w, `[{"name":"README.md","path":"README.md","type":"file"}]`)
}))
defer server.Close()
client := NewClient(server.URL, "test-token")
entries, err := client.ListContents(context.Background(), "owner", "repo", ".")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(entries) != 1 {
t.Fatalf("expected 1 entry, got %d", len(entries))
}
if entries[0].Name != "README.md" {
t.Errorf("expected README.md, got %s", entries[0].Name)
}
}
func TestListContents_FilePath(t *testing.T) {
// Gitea returns a single object (not an array) when path is a file
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/api/v1/repos/owner/repo/contents/README.md" {
t.Errorf("unexpected path: %s", r.URL.Path)
}
w.Header().Set("Content-Type", "application/json")
// Single object, not an array
fmt.Fprintf(w, `{"name":"README.md","path":"README.md","type":"file"}`)
}))
defer server.Close()
client := NewClient(server.URL, "test-token")
entries, err := client.ListContents(context.Background(), "owner", "repo", "README.md")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(entries) != 1 {
t.Fatalf("expected 1 entry, got %d", len(entries))
}
if entries[0].Name != "README.md" {
t.Errorf("expected README.md, got %s", entries[0].Name)
}
if entries[0].Type != "file" {
t.Errorf("expected type file, got %s", entries[0].Type)
}
}
func TestGetAllFilesInPath_File(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path == "/api/v1/repos/owner/repo/contents/README.md" {
// Gitea returns 404 for contents API on files (it's not a dir)
http.NotFound(w, r)
// Gitea returns a single object (not array) when path is a file
w.Header().Set("Content-Type", "application/json")
fmt.Fprintf(w, `{"name":"README.md","path":"README.md","type":"file"}`)
return
}
if r.URL.Path == "/api/v1/repos/owner/repo/raw/README.md" {
@@ -584,9 +642,9 @@ func TestGetAllFilesInPath_403Propagates(t *testing.T) {
func TestIsNotFound(t *testing.T) {
tests := []struct {
name string
err error
want bool
name string
err error
want bool
}{
{"nil error", nil, false},
{"non-API error", fmt.Errorf("network timeout"), false},
@@ -743,3 +801,463 @@ func TestResolveComment_Error(t *testing.T) {
t.Fatal("expected error for 404 response")
}
}
func TestIsServerError(t *testing.T) {
tests := []struct {
name string
err error
want bool
}{
{"nil error", nil, false},
{"non-API error", fmt.Errorf("network timeout"), false},
{"404 APIError", &APIError{StatusCode: 404, Body: "not found"}, false},
{"500 APIError", &APIError{StatusCode: 500, Body: "server error"}, true},
{"502 APIError", &APIError{StatusCode: 502, Body: "bad gateway"}, true},
{"503 APIError", &APIError{StatusCode: 503, Body: "unavailable"}, true},
{"599 APIError", &APIError{StatusCode: 599, Body: "edge case"}, true},
{"600 not server error", &APIError{StatusCode: 600, Body: "edge"}, false},
{"400 not server error", &APIError{StatusCode: 400, Body: "bad request"}, false},
{"wrapped 500", fmt.Errorf("fetch: %w", &APIError{StatusCode: 500, Body: "err"}), true},
{"wrapped 404", fmt.Errorf("fetch: %w", &APIError{StatusCode: 404, Body: "err"}), false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := IsServerError(tt.err)
if got != tt.want {
t.Errorf("IsServerError(%v) = %v, want %v", tt.err, got, tt.want)
}
})
}
}
func TestDoGet_RetriesOn500(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts < 3 {
w.WriteHeader(http.StatusInternalServerError)
w.Write([]byte(`{"message":"transient error"}`))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"data":"success"}`))
}))
defer server.Close()
client := NewClient(server.URL, "test-token")
// Use short backoff for fast tests
client.RetryBackoff = []time.Duration{1 * time.Millisecond, 1 * time.Millisecond}
body, err := client.doGet(context.Background(), server.URL+"/test")
if err != nil {
t.Fatalf("expected success after retry, got error: %v", err)
}
if string(body) != `{"data":"success"}` {
t.Errorf("body = %q, want %q", string(body), `{"data":"success"}`)
}
if attempts != 3 {
t.Errorf("attempts = %d, want 3", attempts)
}
}
func TestDoGet_FailsAfterMaxRetries(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
w.WriteHeader(http.StatusInternalServerError)
w.Write([]byte(`{"message":"persistent error"}`))
}))
defer server.Close()
client := NewClient(server.URL, "test-token")
// Use short backoff for fast tests
client.RetryBackoff = []time.Duration{1 * time.Millisecond, 1 * time.Millisecond}
_, err := client.doGet(context.Background(), server.URL+"/test")
if err == nil {
t.Fatal("expected error after max retries")
}
var apiErr *APIError
if !errors.As(err, &apiErr) {
t.Fatalf("expected APIError, got: %v", err)
}
if apiErr.StatusCode != http.StatusInternalServerError {
t.Errorf("status = %d, want 500", apiErr.StatusCode)
}
if attempts != 3 {
t.Errorf("attempts = %d, want 3 (max retries)", attempts)
}
}
func TestDoGet_NoRetryOn4xx(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
w.WriteHeader(http.StatusForbidden)
w.Write([]byte(`{"message":"forbidden"}`))
}))
defer server.Close()
client := NewClient(server.URL, "test-token")
_, err := client.doGet(context.Background(), server.URL+"/test")
if err == nil {
t.Fatal("expected error for 403")
}
var apiErr *APIError
if !errors.As(err, &apiErr) {
t.Fatalf("expected APIError, got: %v", err)
}
if apiErr.StatusCode != http.StatusForbidden {
t.Errorf("status = %d, want 403", apiErr.StatusCode)
}
if attempts != 1 {
t.Errorf("attempts = %d, want 1 (no retry on 4xx)", attempts)
}
}
func TestDoGet_RespectsContextCancellation(t *testing.T) {
attempts := 0
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
w.WriteHeader(http.StatusInternalServerError)
w.Write([]byte(`{"message":"error"}`))
}))
defer server.Close()
ctx, cancel := context.WithCancel(context.Background())
client := NewClient(server.URL, "test-token")
// Use longer backoff to give us time to cancel during the wait
client.RetryBackoff = []time.Duration{100 * time.Millisecond, 100 * time.Millisecond}
// Cancel after first attempt returns and retry begins
go func() {
time.Sleep(20 * time.Millisecond)
cancel()
}()
_, err := client.doGet(ctx, server.URL+"/test")
if err == nil {
t.Fatal("expected error on context cancellation")
}
// Should have made 1 attempt, then context cancelled during backoff
if attempts != 1 {
t.Errorf("attempts = %d, expected 1 before context cancel during backoff", attempts)
}
}
// mockTransport is a test helper that returns errors for the first N calls,
// then delegates to a real server.
type mockTransport struct {
failCount int32 // number of failures remaining (atomic)
failErr error // error to return on failure
realServer *httptest.Server
attemptsMade atomic.Int32 // tracks total attempts
}
func (m *mockTransport) RoundTrip(req *http.Request) (*http.Response, error) {
m.attemptsMade.Add(1)
remaining := atomic.AddInt32(&m.failCount, -1)
if remaining >= 0 {
// Still have failures to return
return nil, m.failErr
}
// Redirect to real server
req.URL.Host = m.realServer.Listener.Addr().String()
req.URL.Scheme = "http"
return http.DefaultTransport.RoundTrip(req)
}
func TestDoGet_RetriesOnTemporaryNetError(t *testing.T) {
// Real server that will handle successful requests
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"status":"ok"}`))
}))
defer server.Close()
// Mock transport: fail twice with ECONNREFUSED, then succeed
mt := &mockTransport{
failCount: 2,
failErr: &net.OpError{Op: "dial", Net: "tcp", Err: syscall.ECONNREFUSED},
realServer: server,
}
client := NewClient("http://fake-host/", "test-token")
client.SetHTTPClient(&http.Client{Transport: mt})
client.RetryBackoff = []time.Duration{1 * time.Millisecond, 1 * time.Millisecond}
body, err := client.doGet(context.Background(), "http://fake-host/test")
if err != nil {
t.Fatalf("expected success after retries, got error: %v", err)
}
if string(body) != `{"status":"ok"}` {
t.Errorf("body = %q, want %q", string(body), `{"status":"ok"}`)
}
// Should have made exactly 3 attempts: 2 failures + 1 success
if got := mt.attemptsMade.Load(); got != 3 {
t.Errorf("attempts = %d, want 3 (2 failures + 1 success)", got)
}
}
func TestIsTemporaryNetError(t *testing.T) {
tests := []struct {
name string
err error
want bool
}{
{"nil error", nil, false},
{"plain error", fmt.Errorf("some error"), false},
// OpError with retriable syscall errors
{"OpError ECONNREFUSED", &net.OpError{Op: "dial", Err: syscall.ECONNREFUSED}, true},
{"OpError ECONNRESET", &net.OpError{Op: "read", Err: syscall.ECONNRESET}, true},
{"OpError ENETUNREACH", &net.OpError{Op: "dial", Err: syscall.ENETUNREACH}, true},
{"OpError EHOSTUNREACH", &net.OpError{Op: "dial", Err: syscall.EHOSTUNREACH}, true},
{"OpError ETIMEDOUT", &net.OpError{Op: "dial", Err: syscall.ETIMEDOUT}, true},
// OpError with permanent syscall errors — should NOT retry
{"OpError EACCES", &net.OpError{Op: "dial", Err: syscall.EACCES}, false},
{"OpError EPERM", &net.OpError{Op: "dial", Err: syscall.EPERM}, false},
// OpError with unknown inner error — conservative retry
{"OpError unknown inner", &net.OpError{Op: "dial", Err: fmt.Errorf("unknown")}, true},
// DNS errors
{"DNS timeout", &net.DNSError{IsTimeout: true}, true},
{"DNS no such host", &net.DNSError{IsTimeout: false, Name: "bad.host"}, false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := isTemporaryNetError(tt.err)
if got != tt.want {
t.Errorf("isTemporaryNetError(%v) = %v, want %v", tt.err, got, tt.want)
}
})
}
}
func TestIsRetriableSyscallError(t *testing.T) {
tests := []struct {
name string
err error
want bool
}{
{"nil", nil, false},
{"ECONNREFUSED", syscall.ECONNREFUSED, true},
{"ECONNRESET", syscall.ECONNRESET, true},
{"ENETUNREACH", syscall.ENETUNREACH, true},
{"EHOSTUNREACH", syscall.EHOSTUNREACH, true},
{"ETIMEDOUT", syscall.ETIMEDOUT, true},
{"EACCES (permanent)", syscall.EACCES, false},
{"EPERM (permanent)", syscall.EPERM, false},
{"ENOENT (permanent)", syscall.ENOENT, false},
{"unknown error", fmt.Errorf("something"), true}, // conservative retry
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := isRetriableSyscallError(tt.err)
if got != tt.want {
t.Errorf("isRetriableSyscallError(%v) = %v, want %v", tt.err, got, tt.want)
}
})
}
}
func TestRedactURL(t *testing.T) {
tests := []struct {
name string
input string
want string
}{
{
name: "no query params",
input: "https://gitea.example.com/api/v1/repos/owner/repo/pulls/1",
want: "https://gitea.example.com/api/v1/repos/owner/repo/pulls/1",
},
{
name: "with query params - redacts",
input: "https://gitea.example.com/api/v1/repos/owner/repo/raw/file?ref=main",
want: "https://gitea.example.com/api/v1/repos/owner/repo/raw/file?[redacted]",
},
{
name: "multiple query params",
input: "https://example.com/path?token=secret&page=1",
want: "https://example.com/path?[redacted]",
},
{
name: "invalid URL",
input: "://invalid",
want: "[invalid URL]",
},
{
name: "empty string",
input: "",
want: "",
},
{
name: "with userinfo - redacts credentials",
input: "https://admin:secret@gitea.example.com/api/v1/repos",
want: "https://REDACTED@gitea.example.com/api/v1/repos",
},
{
name: "with userinfo and query params",
input: "https://user:pass@example.com/path?token=abc",
want: "https://REDACTED@example.com/path?[redacted]",
},
{
name: "username only - no password",
input: "https://user@example.com/path",
want: "https://REDACTED@example.com/path",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := redactURL(tt.input)
if got != tt.want {
t.Errorf("redactURL(%q) = %q, want %q", tt.input, got, tt.want)
}
})
}
}
func TestSanitizeErrorForLog(t *testing.T) {
tests := []struct {
name string
err error
want string
}{
{
name: "nil error",
err: nil,
want: "<nil>",
},
{
name: "APIError omits body",
err: &APIError{StatusCode: 500, Body: "internal error: database connection failed"},
want: "HTTP 500",
},
{
name: "APIError with large body still only shows status",
err: &APIError{StatusCode: 502, Body: strings.Repeat("x", 1000)},
want: "HTTP 502",
},
{
name: "non-API error preserved",
err: fmt.Errorf("connection refused"),
want: "connection refused",
},
{
name: "wrapped APIError",
err: fmt.Errorf("request failed: %w", &APIError{StatusCode: 503, Body: "service unavailable"}),
want: "HTTP 503",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := sanitizeErrorForLog(tt.err)
if got != tt.want {
t.Errorf("sanitizeErrorForLog() = %q, want %q", got, tt.want)
}
})
}
}
func TestNewClient_HasCheckRedirect(t *testing.T) {
c := NewClient("https://gitea.example.com", "token")
if c.http.CheckRedirect == nil {
t.Fatal("expected CheckRedirect to be set")
}
}
func TestDefaultCheckRedirect_RejectsHTTPSToHTTP(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "http", Host: "gitea.example.com", Path: "/foo"},
Header: http.Header{"Authorization": []string{"token abc"}},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err == nil {
t.Fatal("expected error on HTTPS->HTTP redirect")
}
if !strings.Contains(err.Error(), "HTTPS to HTTP downgrade") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestDefaultCheckRedirect_RejectsCrossHost(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "https", Host: "cdn.example.com", Path: "/bar"},
Header: http.Header{"Authorization": []string{"token abc"}},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err == nil {
t.Fatal("expected error on cross-host redirect")
}
if !strings.Contains(err.Error(), "cross-host") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestDefaultCheckRedirect_AllowsSameHost(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/bar"},
Header: http.Header{"Authorization": []string{"token abc"}},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if auth := req.Header.Get("Authorization"); auth != "token abc" {
t.Errorf("expected Authorization to be preserved, got %q", auth)
}
}
func TestDefaultCheckRedirect_AllowsSameHostHTTPToHTTP(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "http", Host: "localhost:3000", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "http", Host: "localhost:3000", Path: "/bar"},
Header: http.Header{},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
func TestDefaultCheckRedirect_RejectsTooManyRedirects(t *testing.T) {
via := make([]*http.Request, 10)
for i := range via {
via[i] = &http.Request{URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/"}}
}
req := &http.Request{URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/final"}}
err := defaultCheckRedirect(req, via)
if err == nil {
t.Fatal("expected error after 10 redirects")
}
if !strings.Contains(err.Error(), "10 redirects") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestDefaultCheckRedirect_EmptyViaAllowed(t *testing.T) {
req := &http.Request{URL: &url.URL{Scheme: "https", Host: "gitea.example.com", Path: "/foo"}}
err := defaultCheckRedirect(req, nil)
if err != nil {
t.Fatalf("unexpected error with empty via: %v", err)
}
}
func TestSetHTTPClient_NilRestoresDefault(t *testing.T) {
c := NewClient("https://gitea.example.com", "token")
c.SetHTTPClient(nil)
if c.http == nil {
t.Fatal("expected non-nil http client after SetHTTPClient(nil)")
}
if c.http.Timeout != 30*time.Second {
t.Errorf("expected 30s timeout, got %v", c.http.Timeout)
}
if c.http.CheckRedirect == nil {
t.Fatal("expected CheckRedirect policy after SetHTTPClient(nil)")
}
}
+97
View File
@@ -0,0 +1,97 @@
package gitea
import (
"context"
"errors"
"math"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
)
func TestGetPullRequestDiff_SizeLimits(t *testing.T) {
tests := []struct {
name string
diff string
maxDiffSize int64
wantErr error
wantDiff string
}{
{
name: "exceeds max size",
diff: strings.Repeat("+ added line\n", 1000), // ~13 KB
maxDiffSize: 100,
wantErr: ErrDiffTooLarge,
},
{
name: "within max size",
diff: "diff --git a/f.go b/f.go\n--- a/f.go\n+++ b/f.go\n@@ -1 +1 @@\n-old\n+new\n",
maxDiffSize: 1024,
wantDiff: "diff --git a/f.go b/f.go\n--- a/f.go\n+++ b/f.go\n@@ -1 +1 @@\n-old\n+new\n",
},
{
name: "exactly at limit",
diff: strings.Repeat("x", 50),
maxDiffSize: 50,
wantDiff: strings.Repeat("x", 50),
},
{
name: "one byte over limit",
diff: strings.Repeat("x", 51),
maxDiffSize: 50,
wantErr: ErrDiffTooLarge,
},
{
name: "disabled limit",
diff: strings.Repeat("x", 10000),
maxDiffSize: -1,
wantDiff: strings.Repeat("x", 10000),
},
{
name: "math.MaxInt64 treated as disabled",
diff: strings.Repeat("x", 10000),
maxDiffSize: math.MaxInt64,
wantDiff: strings.Repeat("x", 10000),
},
{
name: "default limit",
diff: "diff content",
maxDiffSize: 0, // zero means use DefaultMaxDiffSize
wantDiff: "diff content",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Write([]byte(tt.diff)) //nolint:errcheck // test handler
}))
defer server.Close()
client := NewClient(server.URL, "test-token")
client.MaxDiffSize = tt.maxDiffSize
client.RetryBackoff = []time.Duration{}
got, err := client.GetPullRequestDiff(context.Background(), "owner", "repo", 1)
if tt.wantErr != nil {
if err == nil {
t.Fatal("expected error, got nil")
}
if !errors.Is(err, tt.wantErr) {
t.Errorf("expected %v, got: %v", tt.wantErr, err)
}
return
}
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if got != tt.wantDiff {
t.Errorf("diff mismatch: got length %d, want length %d", len(got), len(tt.wantDiff))
}
})
}
}
+305
View File
@@ -0,0 +1,305 @@
// Package github provides a client for the GitHub API.
// It supports pull request operations, file content retrieval,
// and review submission for both github.com and GitHub Enterprise.
package github
import (
"context"
"errors"
"fmt"
"io"
"net/http"
"strconv"
"strings"
"time"
)
const (
defaultBaseURL = "https://api.github.com"
// maxRetryAttempts is the number of times doRequest will attempt a request.
maxRetryAttempts = 3
// maxRetryAfter caps the maximum delay from a Retry-After header to prevent
// a server from stalling the client indefinitely.
maxRetryAfter = 60 * time.Second
// maxErrorBodyBytes limits how much of an error response body we read
// to protect against malicious servers sending unbounded data.
maxErrorBodyBytes = 64 * 1024 // 64 KB
// maxResponseBodyBytes limits how much of a successful response body we read
// for defense-in-depth against servers returning excessively large payloads.
maxResponseBodyBytes = 10 * 1024 * 1024 // 10 MB
)
// APIError represents an HTTP error response from the GitHub API.
// It carries the status code so callers can distinguish between
// different failure modes (e.g. 404 vs 500).
//
// The Body field stores up to 64 KiB of the raw response for programmatic
// inspection. Error() truncates to 200 bytes for safe logging, but callers
// should avoid logging or propagating Body directly in production since it may
// contain sensitive details from the upstream server.
type APIError struct {
StatusCode int
Body string
}
func (e *APIError) Error() string {
body := e.Body
if len(body) > 200 {
body = body[:200] + "...(truncated)"
}
// Sanitize newlines to prevent log injection from upstream response bodies.
body = strings.ReplaceAll(body, "\n", " ")
body = strings.ReplaceAll(body, "\r", " ")
return fmt.Sprintf("HTTP %d: %s", e.StatusCode, body)
}
// IsNotFound reports whether an error is an API 404 response.
func IsNotFound(err error) bool {
if apiErr, ok := asAPIError(err); ok {
return apiErr.StatusCode == http.StatusNotFound
}
return false
}
// IsUnauthorized reports whether an error is an API 401 response.
func IsUnauthorized(err error) bool {
if apiErr, ok := asAPIError(err); ok {
return apiErr.StatusCode == http.StatusUnauthorized
}
return false
}
func asAPIError(err error) (*APIError, bool) {
if err == nil {
return nil, false
}
var target *APIError
if errors.As(err, &target) {
return target, true
}
return nil, false
}
// Client interacts with the GitHub API.
// A Client is safe for concurrent use by multiple goroutines.
// SetHTTPClient and SetRetryBackoff are intended for test setup only and must
// be called before any goroutines issue requests; they have no synchronization.
type Client struct {
// TODO: baseURL is populated by NewClient but not yet consumed by doRequest/doGet.
// Higher-level exported methods (GetPullRequest, etc.) will use it to
// construct request URLs; remove this field if those methods end up
// accepting full URLs instead.
baseURL string
token string
httpClient *http.Client
// retryBackoff defines the delays between retry attempts for 429 responses.
// retryBackoff[i] is the delay before attempt i+1 (after attempt i fails).
// If nil, defaults to {1s, 2s}.
retryBackoff []time.Duration
// now returns the current time. Defaults to time.Now.
// Override in tests to control HTTP-date Retry-After calculations.
now func() time.Time
}
// defaultCheckRedirect is the redirect policy used by NewClient.
// NOTE: This function is intentionally duplicated in gitea/client.go (and vice versa)
// because the packages are separate. Changes here must be mirrored there.
// It rejects HTTPS->HTTP protocol downgrades (to prevent plaintext leakage)
// and cross-host redirects (to prevent following responses from untrusted
// endpoints). Same-host, same-or-upgraded-scheme redirects are allowed.
func defaultCheckRedirect(req *http.Request, via []*http.Request) error {
if len(via) >= 10 {
return fmt.Errorf("stopped after 10 redirects")
}
// Guard for direct invocation in tests and any future callers;
// net/http guarantees len(via) >= 1 during actual redirects.
if len(via) == 0 {
return nil
}
prev := via[len(via)-1]
// Reject protocol downgrade: HTTPS->HTTP leaks request metadata over plaintext.
if prev.URL.Scheme == "https" && req.URL.Scheme == "http" {
return fmt.Errorf("refusing redirect: HTTPS to HTTP downgrade (%s -> %s)", prev.URL.Host, req.URL.Host)
}
// Reject cross-host redirect entirely to avoid consuming responses
// from untrusted endpoints.
if req.URL.Host != prev.URL.Host {
return fmt.Errorf("refusing redirect: cross-host (%s -> %s)", prev.URL.Host, req.URL.Host)
}
return nil
}
// NewClient creates a new GitHub API client.
// If baseURL is empty, it defaults to https://api.github.com.
// For GitHub Enterprise, pass the API base URL (e.g. https://github.concur.com/api/v3).
func NewClient(token, baseURL string) *Client {
if baseURL == "" {
baseURL = defaultBaseURL
}
return &Client{
baseURL: strings.TrimRight(baseURL, "/"),
token: token,
httpClient: &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: defaultCheckRedirect,
},
now: time.Now,
}
}
// SetHTTPClient sets the underlying HTTP client used for requests.
// This is intended for test setup only to inject mock transports; it must be
// called before any goroutines issue requests.
//
// Passing nil restores the default client (30s timeout + redirect-rejecting
// CheckRedirect policy matching NewClient).
//
// Callers providing a non-nil client are responsible for configuring a safe
// CheckRedirect policy. Without one, the default net/http behavior will follow
// redirects and may forward the Authorization header to untrusted hosts.
func (c *Client) SetHTTPClient(hc *http.Client) {
if hc == nil {
hc = &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: defaultCheckRedirect,
}
}
c.httpClient = hc
}
// SetRetryBackoff sets the delays between retry attempts.
// This is intended for testing to speed up retry tests.
//
// Note: if an empty non-nil slice is provided, Retry-After delays parsed from
// server responses will be computed and capped but not applied (because
// attempt < len(backoff) is always false). This is acceptable for the
// test-only use case but callers should be aware of this edge case.
func (c *Client) SetRetryBackoff(backoff []time.Duration) {
c.retryBackoff = backoff
}
// parseRetryAfter parses a Retry-After header value, supporting both integer
// seconds (e.g. "120") and HTTP-date format (e.g. "Thu, 01 Dec 2025 16:00:00 GMT")
// as specified in RFC 7231 §7.1.3.
//
// For integer values, it returns the duration directly.
// For HTTP-date values, it computes the delay as the difference between the
// parsed time and now. If the date is in the past, it returns 0.
//
// Returns (0, false) if the value cannot be parsed as either format.
func (c *Client) parseRetryAfter(value string) (time.Duration, bool) {
value = strings.TrimSpace(value)
// Try integer seconds first (most common from GitHub).
// RFC 7231 allows delta-seconds of 0 to indicate immediate retry.
if seconds, err := strconv.Atoi(value); err == nil && seconds >= 0 {
return time.Duration(seconds) * time.Second, true
}
// Try HTTP-date format (RFC 7231 §7.1.3).
// http.ParseTime handles RFC 1123, RFC 850, and ASCTIME formats.
if retryAt, err := http.ParseTime(value); err == nil {
delay := retryAt.Sub(c.now())
if delay < 0 {
delay = 0
}
return delay, true
}
return 0, false
}
// doRequest performs an HTTP request with retry on 429 rate limit responses.
// It respects the Retry-After header when present, supporting both integer
// seconds and HTTP-date formats (capped at maxRetryAfter).
func (c *Client) doRequest(ctx context.Context, method, reqURL string, accept string) ([]byte, error) {
var backoff []time.Duration
if c.retryBackoff != nil {
backoff = append([]time.Duration(nil), c.retryBackoff...)
} else {
backoff = []time.Duration{1 * time.Second, 2 * time.Second}
}
var lastErr error
for attempt := 0; attempt < maxRetryAttempts; attempt++ {
if attempt > 0 {
var delay time.Duration
if attempt-1 < len(backoff) {
delay = backoff[attempt-1]
}
if delay > 0 {
timer := time.NewTimer(delay)
select {
case <-timer.C:
timer.Stop()
case <-ctx.Done():
timer.Stop()
return nil, ctx.Err()
}
}
}
req, err := http.NewRequestWithContext(ctx, method, reqURL, nil)
if err != nil {
return nil, fmt.Errorf("create request: %w", err)
}
req.Header.Set("Authorization", "Bearer "+c.token)
if accept != "" {
req.Header.Set("Accept", accept)
} else {
req.Header.Set("Accept", "application/vnd.github+json")
}
resp, err := c.httpClient.Do(req)
if err != nil {
return nil, fmt.Errorf("do request: %w", err)
}
if resp.StatusCode >= 200 && resp.StatusCode < 300 {
body, err := io.ReadAll(io.LimitReader(resp.Body, maxResponseBodyBytes))
resp.Body.Close()
if err != nil {
return nil, fmt.Errorf("read response body: %w", err)
}
return body, nil
}
errBody, _ := io.ReadAll(io.LimitReader(resp.Body, maxErrorBodyBytes))
resp.Body.Close()
lastErr = &APIError{StatusCode: resp.StatusCode, Body: string(errBody)}
// Retry on 429 rate limit
if resp.StatusCode == http.StatusTooManyRequests && attempt < maxRetryAttempts-1 {
// Check for Retry-After header and override backoff if present.
// Supports both integer seconds (common) and HTTP-date format (RFC 7231).
if ra := resp.Header.Get("Retry-After"); ra != "" {
if delay, ok := c.parseRetryAfter(ra); ok {
if delay > maxRetryAfter {
delay = maxRetryAfter
}
if attempt < len(backoff) {
backoff[attempt] = delay
}
}
}
continue
}
// Don't retry other errors
return nil, lastErr
}
return nil, lastErr
}
// doGet is a convenience wrapper for GET requests with the default Accept header.
func (c *Client) doGet(ctx context.Context, url string) ([]byte, error) {
return c.doRequest(ctx, http.MethodGet, url, "")
}
+513
View File
@@ -0,0 +1,513 @@
package github
import (
"context"
"errors"
"net/http"
"net/http/httptest"
"net/url"
"strings"
"testing"
"time"
)
func TestNewClient_DefaultBaseURL(t *testing.T) {
c := NewClient("tok", "")
if c.baseURL != defaultBaseURL {
t.Errorf("baseURL = %q, want %q", c.baseURL, defaultBaseURL)
}
}
func TestNewClient_CustomBaseURL(t *testing.T) {
c := NewClient("tok", "https://github.concur.com/api/v3/")
if c.baseURL != "https://github.concur.com/api/v3" {
t.Errorf("baseURL = %q, want trailing slash stripped", c.baseURL)
}
}
func TestDoRequest_Success(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if got := r.Header.Get("Authorization"); got != "Bearer test-token" {
t.Errorf("Authorization = %q, want Bearer test-token", got)
}
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"ok":true}`))
}))
defer srv.Close()
c := NewClient("test-token", srv.URL)
body, err := c.doGet(context.Background(), srv.URL+"/test")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != `{"ok":true}` {
t.Errorf("body = %q, want %q", body, `{"ok":true}`)
}
}
func TestDoRequest_429_RetryAfter_IntegerSeconds(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
w.Header().Set("Retry-After", "0")
w.WriteHeader(http.StatusTooManyRequests)
w.Write([]byte("rate limited"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("success"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
c.SetRetryBackoff([]time.Duration{0, 0})
body, err := c.doGet(context.Background(), srv.URL+"/test")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != "success" {
t.Errorf("body = %q, want %q", body, "success")
}
if attempts != 2 {
t.Errorf("attempts = %d, want 2", attempts)
}
}
func TestDoRequest_429_RetryAfter_HTTPDate(t *testing.T) {
// Fix "now" to a known time for deterministic testing.
fixedNow := time.Date(2025, 12, 1, 15, 59, 59, 0, time.UTC)
retryAt := "Mon, 01 Dec 2025 16:00:00 GMT" // 1 second in the future
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
w.Header().Set("Retry-After", retryAt)
w.WriteHeader(http.StatusTooManyRequests)
w.Write([]byte("rate limited"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("success"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
c.now = func() time.Time { return fixedNow }
// Initial backoff is 0; the HTTP-date parser will compute 1s and override.
c.SetRetryBackoff([]time.Duration{0, 0})
body, err := c.doGet(context.Background(), srv.URL+"/test")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != "success" {
t.Errorf("body = %q, want %q", body, "success")
}
if attempts != 2 {
t.Errorf("attempts = %d, want 2", attempts)
}
}
func TestDoRequest_429_RetryAfter_HTTPDate_InPast(t *testing.T) {
// If the HTTP-date is in the past, delay should be 0 (retry immediately).
fixedNow := time.Date(2025, 12, 1, 17, 0, 0, 0, time.UTC)
retryAt := "Mon, 01 Dec 2025 16:00:00 GMT" // 1 hour in the past
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
w.Header().Set("Retry-After", retryAt)
w.WriteHeader(http.StatusTooManyRequests)
w.Write([]byte("rate limited"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("success"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
c.now = func() time.Time { return fixedNow }
c.SetRetryBackoff([]time.Duration{0, 0})
body, err := c.doGet(context.Background(), srv.URL+"/test")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != "success" {
t.Errorf("body = %q, want %q", body, "success")
}
}
func TestDoRequest_429_NoRetryAfter_UsesDefaultBackoff(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
w.WriteHeader(http.StatusTooManyRequests)
w.Write([]byte("rate limited"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("success"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
c.SetRetryBackoff([]time.Duration{0, 0})
body, err := c.doGet(context.Background(), srv.URL+"/test")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != "success" {
t.Errorf("body = %q, want %q", body, "success")
}
if attempts != 2 {
t.Errorf("attempts = %d, want 2", attempts)
}
}
func TestDoRequest_429_InvalidRetryAfter_UsesDefaultBackoff(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
if attempts == 1 {
w.Header().Set("Retry-After", "not-a-number-or-date")
w.WriteHeader(http.StatusTooManyRequests)
w.Write([]byte("rate limited"))
return
}
w.WriteHeader(http.StatusOK)
w.Write([]byte("success"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
c.SetRetryBackoff([]time.Duration{0, 0})
body, err := c.doGet(context.Background(), srv.URL+"/test")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if string(body) != "success" {
t.Errorf("body = %q, want %q", body, "success")
}
}
func TestDoRequest_404_NoRetry(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
w.WriteHeader(http.StatusNotFound)
w.Write([]byte("not found"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
_, err := c.doGet(context.Background(), srv.URL+"/test")
if err == nil {
t.Fatal("expected error, got nil")
}
if !IsNotFound(err) {
t.Errorf("expected IsNotFound, got %v", err)
}
if attempts != 1 {
t.Errorf("attempts = %d, want 1 (no retry on 404)", attempts)
}
}
func TestDoRequest_401_NoRetry(t *testing.T) {
attempts := 0
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
attempts++
w.WriteHeader(http.StatusUnauthorized)
w.Write([]byte("unauthorized"))
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
_, err := c.doGet(context.Background(), srv.URL+"/test")
if err == nil {
t.Fatal("expected error, got nil")
}
if !IsUnauthorized(err) {
t.Errorf("expected IsUnauthorized, got %v", err)
}
if attempts != 1 {
t.Errorf("attempts = %d, want 1 (no retry on 401)", attempts)
}
}
func TestDoRequest_ContextCanceled(t *testing.T) {
// This test exercises the timer-cancel path in the retry select:
// select { case <-timer.C; case <-ctx.Done() }
// The server returns 429 with a long Retry-After, and we cancel the
// context shortly after the first response so that cancellation races
// against the timer rather than preventing the initial HTTP round-trip.
requestReceived := make(chan struct{}, 1)
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
select {
case requestReceived <- struct{}{}:
default:
}
w.Header().Set("Retry-After", "10")
w.WriteHeader(http.StatusTooManyRequests)
}))
defer srv.Close()
c := NewClient("tok", srv.URL)
c.SetRetryBackoff([]time.Duration{10 * time.Second, 10 * time.Second})
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
// Cancel the context after the first request completes, while the
// client is blocked in the retry timer select.
go func() {
<-requestReceived
// Small delay to ensure we're inside the timer select.
time.Sleep(50 * time.Millisecond)
cancel()
}()
_, err := c.doGet(ctx, srv.URL+"/test")
if err == nil {
t.Fatal("expected error, got nil")
}
if !errors.Is(err, context.Canceled) {
t.Errorf("err = %v, want context.Canceled", err)
}
}
func TestParseRetryAfter_IntegerSeconds(t *testing.T) {
c := NewClient("tok", "")
delay, ok := c.parseRetryAfter("42")
if !ok {
t.Fatal("expected ok=true")
}
if delay != 42*time.Second {
t.Errorf("delay = %v, want 42s", delay)
}
}
func TestParseRetryAfter_ZeroSeconds(t *testing.T) {
c := NewClient("tok", "")
delay, ok := c.parseRetryAfter("0")
if !ok {
t.Fatal("expected ok=true for zero seconds (RFC 7231 allows immediate retry)")
}
if delay != 0 {
t.Errorf("delay = %v, want 0", delay)
}
}
func TestParseRetryAfter_NegativeSeconds(t *testing.T) {
c := NewClient("tok", "")
_, ok := c.parseRetryAfter("-5")
if ok {
t.Error("expected ok=false for negative seconds")
}
}
func TestParseRetryAfter_HTTPDate_Future(t *testing.T) {
fixedNow := time.Date(2025, 12, 1, 15, 59, 50, 0, time.UTC)
c := NewClient("tok", "")
c.now = func() time.Time { return fixedNow }
delay, ok := c.parseRetryAfter("Mon, 01 Dec 2025 16:00:00 GMT")
if !ok {
t.Fatal("expected ok=true")
}
// Should be 10 seconds in the future.
if delay != 10*time.Second {
t.Errorf("delay = %v, want 10s", delay)
}
}
func TestParseRetryAfter_HTTPDate_Past(t *testing.T) {
fixedNow := time.Date(2025, 12, 1, 17, 0, 0, 0, time.UTC)
c := NewClient("tok", "")
c.now = func() time.Time { return fixedNow }
delay, ok := c.parseRetryAfter("Mon, 01 Dec 2025 16:00:00 GMT")
if !ok {
t.Fatal("expected ok=true")
}
if delay != 0 {
t.Errorf("delay = %v, want 0 (past date)", delay)
}
}
func TestParseRetryAfter_RFC850_Format(t *testing.T) {
fixedNow := time.Date(2025, 12, 1, 15, 59, 50, 0, time.UTC)
c := NewClient("tok", "")
c.now = func() time.Time { return fixedNow }
// RFC 850 format
delay, ok := c.parseRetryAfter("Monday, 01-Dec-25 16:00:00 GMT")
if !ok {
t.Fatal("expected ok=true for RFC 850 format")
}
if delay != 10*time.Second {
t.Errorf("delay = %v, want 10s", delay)
}
}
func TestParseRetryAfter_Invalid(t *testing.T) {
c := NewClient("tok", "")
_, ok := c.parseRetryAfter("not-valid")
if ok {
t.Error("expected ok=false for invalid value")
}
}
func TestParseRetryAfter_EmptyString(t *testing.T) {
c := NewClient("tok", "")
_, ok := c.parseRetryAfter("")
if ok {
t.Error("expected ok=false for empty string")
}
}
func TestParseRetryAfter_MaxCap(t *testing.T) {
// Verify that parseRetryAfter returns the raw value (capping is done by caller).
c := NewClient("tok", "")
delay, ok := c.parseRetryAfter("3600")
if !ok {
t.Fatal("expected ok=true")
}
if delay != 3600*time.Second {
t.Errorf("delay = %v, want 3600s (caller is responsible for capping)", delay)
}
}
func TestAPIError_Error_Truncation(t *testing.T) {
longBody := make([]byte, 300)
for i := range longBody {
longBody[i] = 'x'
}
apiErr := &APIError{StatusCode: 500, Body: string(longBody)}
msg := apiErr.Error()
if len(msg) > 250 {
// "HTTP 500: " (10) + 200 + "...(truncated)" (14) = 224
t.Errorf("error message too long: %d chars", len(msg))
}
}
func TestAPIError_Error_NewlineSanitized(t *testing.T) {
apiErr := &APIError{StatusCode: 400, Body: "line1\nline2\rline3"}
msg := apiErr.Error()
for _, c := range msg {
if c == '\n' || c == '\r' {
t.Errorf("error message contains unsanitized newline: %q", msg)
break
}
}
}
func TestNewClient_HasCheckRedirect(t *testing.T) {
c := NewClient("secret-token", "https://api.github.com")
if c.httpClient.CheckRedirect == nil {
t.Fatal("expected CheckRedirect to be set")
}
}
func TestDefaultCheckRedirect_RejectsHTTPSToHTTP(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "http", Host: "api.github.com", Path: "/foo"},
Header: http.Header{"Authorization": []string{"Bearer token"}},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err == nil {
t.Fatal("expected error on HTTPS->HTTP redirect")
}
if !strings.Contains(err.Error(), "HTTPS to HTTP downgrade") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestDefaultCheckRedirect_RejectsCrossHost(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "https", Host: "objects.githubusercontent.com", Path: "/bar"},
Header: http.Header{"Authorization": []string{"Bearer token"}},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err == nil {
t.Fatal("expected error on cross-host redirect")
}
if !strings.Contains(err.Error(), "cross-host") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestDefaultCheckRedirect_AllowsSameHost(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/bar"},
Header: http.Header{"Authorization": []string{"Bearer token"}},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
// Auth should be preserved on same-host redirect
if auth := req.Header.Get("Authorization"); auth != "Bearer token" {
t.Errorf("expected Authorization to be preserved, got %q", auth)
}
}
func TestDefaultCheckRedirect_AllowsSameHostHTTPToHTTP(t *testing.T) {
prev := &http.Request{URL: &url.URL{Scheme: "http", Host: "localhost:8080", Path: "/foo"}}
req := &http.Request{
URL: &url.URL{Scheme: "http", Host: "localhost:8080", Path: "/bar"},
Header: http.Header{},
}
err := defaultCheckRedirect(req, []*http.Request{prev})
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
func TestDefaultCheckRedirect_RejectsTooManyRedirects(t *testing.T) {
via := make([]*http.Request, 10)
for i := range via {
via[i] = &http.Request{URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/"}}
}
req := &http.Request{URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/final"}}
err := defaultCheckRedirect(req, via)
if err == nil {
t.Fatal("expected error after 10 redirects")
}
if !strings.Contains(err.Error(), "10 redirects") {
t.Errorf("unexpected error message: %v", err)
}
}
func TestDefaultCheckRedirect_EmptyViaAllowed(t *testing.T) {
req := &http.Request{URL: &url.URL{Scheme: "https", Host: "api.github.com", Path: "/foo"}}
err := defaultCheckRedirect(req, nil)
if err != nil {
t.Fatalf("unexpected error with empty via: %v", err)
}
}
func TestSetHTTPClient_NilRestoresDefault(t *testing.T) {
c := NewClient("token", "https://api.github.com")
c.SetHTTPClient(nil)
if c.httpClient == nil {
t.Fatal("expected non-nil httpClient after SetHTTPClient(nil)")
}
if c.httpClient.Timeout != 30*time.Second {
t.Errorf("expected 30s timeout, got %v", c.httpClient.Timeout)
}
if c.httpClient.CheckRedirect == nil {
t.Fatal("expected CheckRedirect policy after SetHTTPClient(nil)")
}
}
+1 -1
View File
@@ -2,4 +2,4 @@ module gitea.weiker.me/rodin/review-bot
go 1.26.2
require gopkg.in/yaml.v3 v3.0.1
require github.com/goccy/go-yaml v1.19.2
+2 -4
View File
@@ -1,4 +1,2 @@
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
github.com/goccy/go-yaml v1.19.2 h1:PmFC1S6h8ljIz6gMRBopkjP1TVT7xuwrButHID66PoM=
github.com/goccy/go-yaml v1.19.2/go.mod h1:XBurs7gK8ATbW4ZPGKgcbrY1Br56PdM69F7LkFRi1kA=
+146 -38
View File
@@ -5,12 +5,15 @@ import (
"embed"
"encoding/json"
"fmt"
"io"
"os"
"sort"
"strings"
"unicode/utf8"
"gopkg.in/yaml.v3"
"github.com/goccy/go-yaml"
"github.com/goccy/go-yaml/ast"
"github.com/goccy/go-yaml/parser"
)
//go:embed personas/*.yaml
@@ -118,9 +121,7 @@ func ListBuiltinPersonas() []string {
default:
continue
}
if !seen[personaName] {
seen[personaName] = true
}
seen[personaName] = true
}
names := make([]string, 0, len(seen))
for name := range seen {
@@ -142,10 +143,19 @@ func parsePersona(data []byte, source string) (*Persona, error) {
err = unmarshalYAMLWithDepthLimit(data, &p, MaxYAMLDepth)
} else {
// Use json.Decoder with DisallowUnknownFields for consistency with
// YAML's KnownFields(true) - both reject unknown fields to catch typos.
// YAML's Strict() - both reject unknown fields to catch typos.
dec := json.NewDecoder(bytes.NewReader(data))
dec.DisallowUnknownFields()
err = dec.Decode(&p)
if err == nil {
// Reject trailing content after the first valid JSON object.
// Without this check, input like `{"name":"x"}garbage` would
// silently succeed because Decoder stops after one object.
var dummy json.RawMessage
if err2 := dec.Decode(&dummy); err2 != io.EOF {
err = fmt.Errorf("unexpected trailing content after JSON object")
}
}
}
if err != nil {
return nil, fmt.Errorf("parse persona %s: %w", source, err)
@@ -156,70 +166,164 @@ func parsePersona(data []byte, source string) (*Persona, error) {
return &p, nil
}
// unmarshalYAMLWithDepthLimit unmarshals YAML data with explicit depth limiting
// and strict field checking. This protects against stack exhaustion from deeply
// nested structures and catches typos in field names.
// Multi-document YAML files are rejected to prevent silent data loss.
// unmarshalYAMLWithDepthLimit unmarshals YAML data with three safety checks:
// - Depth limiting: rejects AST trees exceeding maxDepth to prevent stack exhaustion.
// - Multi-document rejection: prevents silent data loss from ignored extra documents.
// - Strict field checking: rejects unknown YAML keys to catch typos early.
func unmarshalYAMLWithDepthLimit(data []byte, out any, maxDepth int) error {
// First pass: decode into a yaml.Node to check depth limits and node counts.
// This prevents stack exhaustion before we attempt to decode into structs.
var node yaml.Node
dec := yaml.NewDecoder(bytes.NewReader(data))
if err := dec.Decode(&node); err != nil {
// First pass: parse into AST to check depth limits, node counts, and
// multi-document rejection. This prevents stack exhaustion before we
// attempt to decode into structs.
file, err := parser.ParseBytes(data, 0)
if err != nil {
return err
}
// Reject empty YAML input (whitespace-only, comment-only, or truly empty files).
// The parser returns a single doc with nil body for these cases.
if len(file.Docs) == 0 || file.Docs[0].Body == nil {
return fmt.Errorf("empty YAML document")
}
// Reject multi-document YAML files - silently ignoring additional documents
// could lead to confusing behavior where users think their changes take effect.
var extra yaml.Node
if dec.Decode(&extra) == nil {
if len(file.Docs) > 1 {
return fmt.Errorf("multi-document YAML is not supported; only single-document files are allowed")
}
nodeCount := 0
if err := checkYAMLDepth(&node, 0, maxDepth, MaxYAMLNodes, make(map[*yaml.Node]struct{}), &nodeCount); err != nil {
if err := checkYAMLDepth(file.Docs[0].Body, 0, maxDepth, MaxYAMLNodes, make(map[ast.Node]int), make(map[ast.Node]bool), &nodeCount); err != nil {
return err
}
// Second pass: decode with strict field checking enabled.
// KnownFields(true) rejects unknown keys, catching typos like "focuss" or "identiy".
// We must re-decode from the original data because yaml.Node.Decode() doesn't
// support the KnownFields option.
strictDec := yaml.NewDecoder(bytes.NewReader(data))
strictDec.KnownFields(true)
return strictDec.Decode(out)
// Strict() rejects unknown keys, catching typos like "focuss" or "identiy".
//
// Safety note: goccy/go-yaml's decoder does not expand YAML aliases
// recursively — it resolves them via the pre-built AST, which our first
// pass already depth-checked. Alias chains that would exceed depth limits
// are caught above; the decoder merely reads the resolved scalar values.
dec := yaml.NewDecoder(bytes.NewReader(data), yaml.Strict())
return dec.Decode(out)
}
// checkYAMLDepth recursively checks that YAML nodes don't exceed the depth limit
// or the total node count limit. It also detects alias cycles to prevent infinite
// recursion from crafted YAML with self-referential aliases.
func checkYAMLDepth(node *yaml.Node, depth, maxDepth, maxNodes int, seen map[*yaml.Node]struct{}, nodeCount *int) error {
// checkYAMLDepth recursively checks that YAML AST nodes don't exceed the depth
// limit or the total node count limit. It uses two tracking maps:
// - validated: maps each node to the maximum depth at which it was previously
// checked. If a node is revisited at a deeper depth (e.g., via an alias),
// we re-check it to ensure the combined effective depth doesn't exceed limits.
// - visiting: per-path recursion stack for true cycle detection. A node on the
// current path is a cycle (alias loop); we return nil to avoid infinite recursion.
//
// This design prevents the alias depth bypass where an anchored subtree validated
// at a shallow depth could be referenced via alias at a greater depth, effectively
// exceeding MaxYAMLDepth.
func checkYAMLDepth(node ast.Node, depth, maxDepth, maxNodes int, validated map[ast.Node]int, visiting map[ast.Node]bool, nodeCount *int) error {
if node == nil {
return nil
}
if depth > maxDepth {
return fmt.Errorf("YAML nesting depth exceeds maximum (%d)", maxDepth)
}
// Cycle detection: if we're currently visiting this node on the current
// recursion path, it's a cycle (e.g., alias pointing to an ancestor).
// Return nil to break the cycle without error — cycles are a structural
// property, not a depth violation.
if visiting[node] {
return nil
}
// Track total nodes visited as defense-in-depth against wide-but-shallow attacks.
// Placed after cycle detection but before the depth-aware short-circuit. This means
// nodes revisited at shallower depths (via aliases) are counted each time they are
// encountered — intentional conservative overcounting. This bounds the total work
// performed during validation rather than tracking unique nodes, which is the safer
// security posture for untrusted YAML input.
*nodeCount++
if *nodeCount > maxNodes {
return fmt.Errorf("YAML node count exceeds maximum (%d)", maxNodes)
}
// Cycle detection: if we've seen this node before, we're in a cycle.
if _, ok := seen[node]; ok {
return nil // Already validated this subtree, skip to avoid infinite recursion.
// Depth-aware short-circuit: skip re-validation only when the current visit
// depth is the same or shallower than the depth at which this node was
// previously validated. A shallower (or equal) current depth means the
// prior, deeper validation already covered any subtree depth violations.
// If the current depth exceeds the previous validation depth (e.g., an alias
// references this node deeper in the tree), we must re-traverse to ensure
// the combined effective depth doesn't exceed maxDepth.
//
// Note: using ast.Node (interface) as map key relies on pointer identity,
// which is correct because all goccy/go-yaml AST node types are pointer
// receivers (*MappingNode, *SequenceNode, etc.), never value types.
if prevDepth, ok := validated[node]; ok && depth <= prevDepth {
return nil
}
seen[node] = struct{}{}
validated[node] = depth
// Handle alias nodes: follow the alias to its anchor target.
// Increment depth when following aliases since they expand the effective structure.
if node.Kind == yaml.AliasNode && node.Alias != nil {
return checkYAMLDepth(node.Alias, depth+1, maxDepth, maxNodes, seen, nodeCount)
}
// Mark as visiting (on the current recursion path) for cycle detection.
visiting[node] = true
defer func() { visiting[node] = false }()
for _, child := range node.Content {
if err := checkYAMLDepth(child, depth+1, maxDepth, maxNodes, seen, nodeCount); err != nil {
// Walk children based on node type.
switch n := node.(type) {
case *ast.MappingNode:
for _, value := range n.Values {
if err := checkYAMLDepth(value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
}
case *ast.MappingValueNode:
// Both Key and Value are visited at depth+1 relative to this
// MappingValueNode. Since MappingNode visits its MappingValueNode
// children at depth+1 as well, keys and values end up at depth+2
// from the parent MappingNode. This is intentional: it mirrors the
// actual nesting structure (mapping → key-value pair → key/value).
if err := checkYAMLDepth(n.Key, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
case *ast.SequenceNode:
for _, value := range n.Values {
if err := checkYAMLDepth(value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
}
case *ast.AliasNode:
// Follow alias to its target, incrementing depth since aliases expand
// the effective structure.
if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
case *ast.AnchorNode:
// Increment depth for anchor values as a conservative measure: the
// anchor definition itself is structural, and treating it as a depth
// level ensures that deeply nested anchors are caught at definition
// time rather than only when referenced via alias. This +1 is
// asymmetric with alias (which also increments) — by design, the
// effective depth budget for anchored-then-aliased content is reduced
// because both the definition site and the reference site each consume
// a level, making deeply nested anchor/alias pairs hit the limit sooner.
if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
case *ast.TagNode:
if err := checkYAMLDepth(n.Value, depth+1, maxDepth, maxNodes, validated, visiting, nodeCount); err != nil {
return err
}
case *ast.MergeKeyNode:
// MergeKeyNode represents the literal "<<" merge key token. It has no
// child nodes — the value side of a merge (e.g., *alias) lives in the
// parent MappingValueNode.Value, which is already recursed into above.
// Explicitly listed here (rather than in the default case) to prevent
// future library changes from silently bypassing depth checks.
default:
// Scalar leaf nodes (StringNode, IntegerNode, FloatNode, BoolNode,
// NullNode, InfinityNode, NanNode, LiteralNode) have no children to
// recurse into.
}
return nil
}
@@ -227,7 +331,11 @@ func checkYAMLDepth(node *yaml.Node, depth, maxDepth, maxNodes int, seen map[*ya
// ParsePersonaBytes parses persona data from bytes with a source label for errors.
// This is useful for parsing personas fetched from external sources (e.g., Gitea API)
// without requiring filesystem access. Format is detected by source extension.
// Input is bounded by MaxPersonaFileSize to prevent resource exhaustion.
func ParsePersonaBytes(data []byte, source string) (*Persona, error) {
if len(data) > MaxPersonaFileSize {
return nil, fmt.Errorf("persona data from %s exceeds maximum size (%d bytes, limit %d)", source, len(data), MaxPersonaFileSize)
}
return parsePersona(data, source)
}
+222 -41
View File
@@ -7,7 +7,7 @@ import (
"strings"
"testing"
"gopkg.in/yaml.v3"
"github.com/goccy/go-yaml/ast"
)
func TestLoadBuiltinPersona(t *testing.T) {
@@ -459,7 +459,14 @@ func TestYAMLDeeplyNestedRejection(t *testing.T) {
path := filepath.Join(dir, "deeply-nested.yaml")
// Build a deeply nested YAML structure that exceeds MaxYAMLDepth (20).
// Each level adds 2 to the depth count (key + value mapping).
// Depth accumulation trace for "nested: \n level0: \n level1: ...":
// - Document root parsed at depth 0
// - Root MappingNode children (MappingValueNodes) visited at depth 1
// - "nested" MappingValueNode: key at depth 2, value at depth 2
// - Each levelN adds depth via MappingValueNode traversal (key + value)
// - Exact depth per level depends on AST structure (MappingNode wrapping),
// but 25 levels reliably exceeds MaxYAMLDepth (20) with comfortable margin.
// The test uses 25 levels rather than exactly 21 to avoid brittleness.
var sb strings.Builder
sb.WriteString("name: test\nidentity: test\nnested:\n")
indent := " "
@@ -483,6 +490,35 @@ func TestYAMLDeeplyNestedRejection(t *testing.T) {
}
}
func TestYAMLEmptyFileRejection(t *testing.T) {
tests := []struct {
name string
content string
}{
{"completely_empty", ""},
{"whitespace_only", " \n\n "},
{"comment_only", "# just a comment\n"},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, tc.name+".yaml")
if err := os.WriteFile(path, []byte(tc.content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Fatal("expected error for empty YAML input, got nil")
}
if !strings.Contains(err.Error(), "empty YAML document") {
t.Errorf("expected error containing %q, got: %v", "empty YAML document", err)
}
})
}
}
func TestYAMLFileSizeLimit(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "huge.yaml")
@@ -504,41 +540,41 @@ func TestYAMLFileSizeLimit(t *testing.T) {
func TestYAMLAliasCycleDetection(t *testing.T) {
// Test that our checkYAMLDepth function handles alias cycles gracefully
// by using the seen map to prevent infinite recursion.
// We test this directly because go-yaml's parser handles most cycles
// at parse time, but we need to ensure our checker is robust.
// by using the visiting map to prevent infinite recursion.
// Create a node structure where an alias points to a parent node,
// simulating what could happen with malicious input that bypasses
// go-yaml's cycle detection.
parent := &yaml.Node{
Kind: yaml.MappingNode,
Content: []*yaml.Node{
{Kind: yaml.ScalarNode, Value: "name"},
{Kind: yaml.ScalarNode, Value: "test"},
{Kind: yaml.ScalarNode, Value: "nested"},
// simulating what could happen with crafted input.
parent := &ast.MappingNode{
Values: []*ast.MappingValueNode{
{
Key: &ast.StringNode{Value: "name"},
Value: &ast.StringNode{Value: "test"},
},
},
}
// Create a child that aliases back to the parent (artificial cycle)
aliasToParent := &yaml.Node{
Kind: yaml.AliasNode,
Alias: parent,
aliasToParent := &ast.AliasNode{
Value: parent,
}
parent.Content = append(parent.Content, aliasToParent)
parent.Values = append(parent.Values, &ast.MappingValueNode{
Key: &ast.StringNode{Value: "nested"},
Value: aliasToParent,
})
nodeCount := 0
seen := make(map[*yaml.Node]struct{})
validated := make(map[ast.Node]int)
visiting := make(map[ast.Node]bool)
// This should NOT hang or stack overflow - the seen map prevents infinite recursion
err := checkYAMLDepth(parent, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount)
// This should NOT hang or stack overflow - cycle detection prevents infinite recursion
err := checkYAMLDepth(parent, 0, MaxYAMLDepth, MaxYAMLNodes, validated, visiting, &nodeCount)
if err != nil {
t.Errorf("unexpected error traversing cyclic structure: %v", err)
}
// Verify we tracked the parent in the seen map
if _, ok := seen[parent]; !ok {
t.Error("parent node not tracked in seen map")
// Verify we tracked the parent in the validated map
if _, ok := validated[parent]; !ok {
t.Error("parent node not tracked in validated map")
}
}
@@ -594,36 +630,82 @@ func TestYAMLNodeCountLimit(t *testing.T) {
func TestCheckYAMLDepthCycleDetectionDirect(t *testing.T) {
// Direct test of cycle detection in checkYAMLDepth by creating
// a node structure with an artificial cycle.
// This tests the seen map logic independent of go-yaml's parsing.
node := &yaml.Node{
Kind: yaml.MappingNode,
Content: []*yaml.Node{
{Kind: yaml.ScalarNode, Value: "key"},
{Kind: yaml.ScalarNode, Value: "value"},
node := &ast.MappingNode{
Values: []*ast.MappingValueNode{
{
Key: &ast.StringNode{Value: "key"},
Value: &ast.StringNode{Value: "value"},
},
},
}
// Create a cycle by making a child reference the parent
cycleChild := &yaml.Node{
Kind: yaml.AliasNode,
Alias: node, // Points back to the parent
cycleChild := &ast.AliasNode{
Value: node, // Points back to the parent
}
node.Content = append(node.Content,
&yaml.Node{Kind: yaml.ScalarNode, Value: "cyclic"},
cycleChild,
)
node.Values = append(node.Values, &ast.MappingValueNode{
Key: &ast.StringNode{Value: "cyclic"},
Value: cycleChild,
})
nodeCount := 0
seen := make(map[*yaml.Node]struct{})
err := checkYAMLDepth(node, 0, MaxYAMLDepth, MaxYAMLNodes, seen, &nodeCount)
validated := make(map[ast.Node]int)
visiting := make(map[ast.Node]bool)
err := checkYAMLDepth(node, 0, MaxYAMLDepth, MaxYAMLNodes, validated, visiting, &nodeCount)
// Should complete without infinite recursion due to cycle detection
if err != nil {
t.Errorf("unexpected error: %v", err)
}
// The seen map should contain multiple entries
if len(seen) < 2 {
t.Errorf("seen map has %d entries, expected at least 2", len(seen))
// The validated map should contain multiple entries
if len(validated) < 2 {
t.Errorf("validated map has %d entries, expected at least 2", len(validated))
}
}
func TestYAMLAliasDepthBypass(t *testing.T) {
// Test that an anchored subtree first validated at a shallow depth is
// re-checked when referenced via alias at a deeper position. Without the
// depth-aware validated map, the alias reference would skip re-checking
// and allow the effective nesting to exceed MaxYAMLDepth.
dir := t.TempDir()
path := filepath.Join(dir, "alias-depth-bypass.yaml")
// Build YAML with an anchor at shallow depth containing a subtree near the limit,
// then reference it via alias deep enough that effective depth exceeds MaxYAMLDepth.
var sb strings.Builder
sb.WriteString("name: test\nidentity: test\n")
// Create the anchored subtree at depth 1 (key level) that nests 15 levels deep.
sb.WriteString("anchor_key: &deep_anchor\n")
for i := 0; i < 15; i++ {
sb.WriteString(strings.Repeat(" ", i+1))
sb.WriteString(fmt.Sprintf("level%d:\n", i))
}
sb.WriteString(strings.Repeat(" ", 16))
sb.WriteString("leaf: value\n")
// Create a wrapper that nests 6 levels deep, then references the anchor.
// Effective depth at alias target = 6 (wrapper nesting) + 1 (alias) + 15 (subtree) = 22 > 20
sb.WriteString("wrapper:\n")
for i := 0; i < 6; i++ {
sb.WriteString(strings.Repeat(" ", i+1))
sb.WriteString(fmt.Sprintf("n%d:\n", i))
}
sb.WriteString(strings.Repeat(" ", 7))
sb.WriteString("alias_ref: *deep_anchor\n")
if err := os.WriteFile(path, []byte(sb.String()), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Fatal("expected error for alias depth bypass, got nil")
}
if !strings.Contains(err.Error(), "nesting depth exceeds") {
t.Errorf("error = %q, want containing 'nesting depth exceeds'", err.Error())
}
}
@@ -776,3 +858,102 @@ identity: test identity
t.Errorf("Name = %q, want %q", p.Name, "test")
}
}
func TestJSONTrailingContentRejected(t *testing.T) {
tests := []struct {
name string
content string
}{
{
name: "trailing garbage after object",
content: `{"name":"test","identity":"test identity"}garbage`,
},
{
name: "two JSON objects",
content: `{"name":"test","identity":"test identity"}{"name":"other"}`,
},
{
name: "trailing array",
content: `{"name":"test","identity":"test identity"}[]`,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
dir := t.TempDir()
path := filepath.Join(dir, "test.json")
if err := os.WriteFile(path, []byte(tt.content), 0644); err != nil {
t.Fatalf("failed to write test file: %v", err)
}
_, err := LoadPersona(path)
if err == nil {
t.Fatal("expected error for trailing content, got nil")
}
if !strings.Contains(err.Error(), "trailing content") {
t.Errorf("error = %q, want to contain 'trailing content'", err.Error())
}
})
}
}
func TestParsePersonaBytesSizeLimit(t *testing.T) {
// ParsePersonaBytes should reject input exceeding MaxPersonaFileSize
oversized := make([]byte, MaxPersonaFileSize+1)
for i := range oversized {
oversized[i] = 'x'
}
_, err := ParsePersonaBytes(oversized, "oversized.yaml")
if err == nil {
t.Fatal("expected error for oversized input, got nil")
}
if !strings.Contains(err.Error(), "exceeds maximum size") {
t.Errorf("error = %q, want to contain 'exceeds maximum size'", err.Error())
}
// Just under the limit should not trigger size error (may fail parse, but not size)
underLimit := []byte("name: test\nidentity: test persona\n")
p, err := ParsePersonaBytes(underLimit, "valid.yaml")
if err != nil {
t.Fatalf("unexpected error for valid input: %v", err)
}
if p.Name != "test" {
t.Errorf("Name = %q, want %q", p.Name, "test")
}
}
func TestYAMLMergeKeyDepthCheck(t *testing.T) {
// Verify that YAML merge keys (<<: *alias) are properly handled by the
// depth checker. The merge key content is in the MappingValueNode.Value
// (an AliasNode), not in the MergeKeyNode itself.
p, err := ParsePersonaBytes([]byte("name: merge-test\nidentity: test\n"), "merge.yaml")
if err != nil {
t.Fatalf("basic parse failed: %v", err)
}
if p.Name != "merge-test" {
t.Errorf("Name = %q, want %q", p.Name, "merge-test")
}
// Test that deeply nested merge keys still hit depth limit.
// Build YAML with merge key content nested beyond MaxYAMLDepth.
var sb strings.Builder
sb.WriteString("name: deep-merge\nidentity: deep merge persona\n")
sb.WriteString("anchor: &deep\n")
indent := " "
for i := 0; i < MaxYAMLDepth+5; i++ {
sb.WriteString(indent)
sb.WriteString(fmt.Sprintf("level%d:\n", i))
indent += " "
}
sb.WriteString(indent + "leaf: value\n")
sb.WriteString("target:\n <<: *deep\n")
_, err = ParsePersonaBytes([]byte(sb.String()), "deep-merge.yaml")
if err == nil {
t.Fatal("expected error for deeply nested merge key content, got nil")
}
if !strings.Contains(err.Error(), "depth") {
t.Errorf("error = %q, want to contain 'depth'", err.Error())
}
}