Files
review-bot/PLAN.md
T
aweiker 8c8f3ab4b3
CI / test (pull_request) Successful in 18s
CI / review (anthropic--claude-4.6-sonnet, sonnet, SONNET_REVIEW_TOKEN) (pull_request) Successful in 44s
CI / review (gpt-5, security, ., rodin/security-patterns, SECURITY_REVIEW.md, SECURITY_REVIEW_TOKEN) (pull_request) Successful in 1m57s
CI / review (gpt-5, gpt, GPT_REVIEW_TOKEN) (pull_request) Successful in 2m21s
feat(#123): add IP-level SSRF defense to Gitea client and action
## Changes

### Go: IP-level SSRF protection in gitea.Client (primary defense)
- Add gitea/ipcheck.go with IsBlockedIP() covering all blocked CIDR ranges:
  loopback (127.0.0.0/8, ::1), RFC1918 (10/8, 172.16/12, 192.168/16),
  link-local (169.254/16, fe80::/10), ULA (fc00::/7), CGN (100.64/10),
  multicast, reserved, and unspecified ranges.
- IPv6-mapped IPv4 addresses (::ffff:x.x.x.x) are normalized before checking.
- Add safeDialContext to gitea.Client: resolves DNS, rejects any IP in a
  blocked CIDR, then dials the resolved IP directly to narrow the DNS rebinding
  window. NewClient now uses this safe transport by default.
- Add WithUnsafeDialer() for test code using httptest.Server (127.0.0.1).
- Update NewTestClient helper in export_test.go for all gitea unit tests.
- Update SetHTTPClient(nil) to restore the safe transport (not the plain one).

### Go: validate-url subcommand (defense-in-depth for future bash callers)
- Add 'review-bot validate-url <url>' subcommand: validates https scheme,
  no user-info, resolves hostname, rejects any blocked IP.
- Exit 0=safe, 1=blocked, 2=validation error/dns failure.
- Add outWriter/errWriter vars to main.go for testable output capture.

### action.yml: Python3 IP check in 'Determine version' step
- After the https scheme validation, resolve SERVER_URL hostname with
  socket.getaddrinfo and reject any result where
  ipaddress.ip_address(ip).is_private/is_loopback/is_link_local/etc. is true.
- python3 is required on ubuntu-* runners (noted in existing comments).
- Covers the version-check curl that sends ACTION_TOKEN to SERVER_URL.
- SERVER_URL for install-step curls is covered by the same pre-check.

### Tests
- gitea/ipcheck_test.go: 30+ cases covering all blocked families + public IPs
- gitea/client_test.go: safe transport presence, WithUnsafeDialer, SSRF blocking
- cmd/review-bot/validateurl_test.go: scheme validation, user-info, exit codes

Closes #123
2026-05-14 07:44:51 +00:00

14 KiB

Plan: Issue #123 — SSRF Defense: IP-Level URL Validation

Problem

When running on a self-hosted Gitea runner, action.yml accepts a user-supplied vcs-url input and uses it to make HTTP requests (version lookup, binary download) from within the runner environment. An attacker who controls workflow inputs could point this URL at internal services (10.x.x.x, 169.254.x.x, 127.0.0.1, etc.) and exfiltrate data.

Current mitigations (scheme check + whitespace rejection in bash) are insufficient because they operate on the URL string, not on the resolved IP addresses. DNS rebinding and creative hostnames can bypass string-level checks.

The existing Go binary (review-bot) also makes requests to vcs-url directly via gitea.NewClient. That client has redirect protections but no IP-level SSRF defense.

What This Issue Does NOT Cover

  • LLM base URLs / AI Core auth/API URLs — those are future work, out of scope for this issue.
  • GitHub Actions path (VCS_TYPE=github) — already uses github.api_url (trusted, not user-controlled), so SSRF is not applicable there.

Constraints

  • No new dependencies. CONVENTIONS.md has a strict allowlist; net is stdlib.
  • Must not break existing functionality for legitimate Gitea instances.
  • Must handle DNS resolution correctly (some CIDRs block connection but resolve fine — we must check before connecting).
  • Must work on both linux/amd64 and linux/arm64 runner environments.
  • Go stdlib net package provides everything needed.

Acceptance Criteria (from issue)

  • Decide on approach → Go custom DialContext in gitea.Client
  • Implement URL validation that blocks requests to RFC1918/loopback/link-local IPs
  • Block HTTP redirects to non-https or different hosts (already done in defaultCheckRedirect)
  • Add tests

Approach Decision: Custom DialContext in gitea.Client

Three options were considered:

Option Pros Cons
Bash dig/getent + IP checks No Go change Fragile, platform-dependent, bypass-prone, hard to test
review-bot validate-url subcommand Reusable from bash Two phases (validate-then-dial) has TOCTOU; adding a subcommand complicates main.go
Custom DialContext in gitea.Client Atomic: resolves and validates IP in one step; no TOCTOU; testable in Go; no new deps Defense-in-depth only in Go path, not bash curl calls

Decision: Custom DialContext + bash-side pre-check for the install step.

Rationale:

  • The review-bot binary already makes the sensitive Gitea API calls. Protecting them at the DialContext level is the most robust defense — it's atomic and cannot be bypassed.
  • The bash curl in the "Determine version" and "Install review-bot" steps makes requests to user-supplied URLs. The Go binary isn't involved there. Options:
    • Accept residual risk for the install-step curl calls (the token used is reviewer-token, not a secrets-level token).
    • Add a review-bot validate-url subcommand that the bash calls before each curl.
    • We add review-bot validate-url as a lightweight subcommand to protect the install-step curl calls too.

Both defenses together (DialContext + validate-url subcommand) give full coverage. The validate-url path only needs to resolve the domain and check the resulting IPs — no additional state.

Design: validate-url Subcommand

review-bot validate-url <url> exits 0 if safe, non-zero + error message if blocked.

usage: review-bot validate-url <url>

Resolves <url> and checks that all resolved IP addresses are
routable (not RFC1918, loopback, link-local, or special-use).

Exit codes:
  0 — URL is safe to use
  1 — URL resolves to a blocked address
  2 — URL validation error (bad scheme, malformed URL, DNS failure)

This command is tiny — no LLM logic, no Gitea auth. The bash in action.yml can call it as:

"${{ runner.temp }}/review-bot" validate-url "$SERVER_URL" || {
  echo "Error: SERVER_URL resolves to a blocked/private IP" >&2
  exit 1
}

Design: IP Validation Logic (Shared)

A single internal/ipcheck package (or unexported function in gitea) will:

  1. Parse the URL: extract host + port.
  2. Call net.DefaultResolver.LookupIPAddr(ctx, host) to resolve all IPs.
  3. For each IP, call isBlockedIP(ip):
    • Loopback: 127.0.0.0/8, ::1/128
    • RFC1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
    • Link-local: 169.254.0.0/16, fe80::/10
    • Unique local (IPv6 ULA): fc00::/7
    • Multicast: 224.0.0.0/4, ff00::/8
    • Loopback/unspecified: 0.0.0.0/8, ::
  4. If any resolved IP is blocked → reject.
  5. If zero IPs are returned → reject (NXDOMAIN shouldn't be allowed through).
  6. Also validate: scheme must be https, no user info in URL.

This logic lives in a standalone file (gitea/ipcheck.go or cmd/review-bot/ipcheck.go) so it can be used from both the DialContext interceptor and the validate-url subcommand.

Why reject if ANY IP is blocked: An attacker using split-horizon DNS can return both a public IP and a private IP. If we only block "all IPs private", one real IP clears the check. We must block if any resolved IP is private.

Design: Custom DialContext in gitea.Client

// safeDialer wraps net.Dialer.DialContext with IP-level SSRF protection.
// It resolves the hostname and checks all IPs before connecting.
func safeDialContext(ctx context.Context, network, addr string) (net.Conn, error) {
    host, port, err := net.SplitHostPort(addr)
    if err != nil {
        return nil, fmt.Errorf("invalid address %q: %w", addr, err)
    }
    addrs, err := net.DefaultResolver.LookupIPAddr(ctx, host)
    if err != nil {
        return nil, fmt.Errorf("DNS lookup %q: %w", host, err)
    }
    for _, a := range addrs {
        if isBlockedIP(a.IP) {
            return nil, fmt.Errorf("blocked: %q resolves to private/reserved IP %s", host, a.IP)
        }
    }
    // Re-dial using the first resolved IP directly to avoid double-resolve
    d := &net.Dialer{Timeout: 10 * time.Second}
    return d.DialContext(ctx, network, net.JoinHostPort(addrs[0].IP.String(), port))
}

NewClient applies this as the transport's DialContext:

transport := &http.Transport{
    DialContext: safeDialContext,
}
return &Client{
    ...
    http: &http.Client{
        Timeout:       30 * time.Second,
        Transport:     transport,
        CheckRedirect: defaultCheckRedirect,
    },
}

Note: This only protects Gitea client calls. The llm and github clients are not changed in this PR (out of scope per issue).

Files to Change

File Change
gitea/ipcheck.go (new) isBlockedIP(ip net.IP) bool + blocked CIDR list
gitea/ipcheck_test.go (new) Tests for all blocked/allowed CIDRs
gitea/client.go Add safeDialContext, apply to NewClient transport; add WithUnsafeDialer for tests
gitea/client_test.go Tests for SSRF blocking in NewClient
cmd/review-bot/main.go Add validate-url subcommand routing
cmd/review-bot/validateurl.go (new) runValidateURL(args []string) error
cmd/review-bot/validateurl_test.go (new) Unit tests for validate-url subcommand
.gitea/actions/review/action.yml Call validate-url before the version-check and download curl calls

State / Data Model

No persistent state. This is stateless request-time validation.

CIDR list is a compile-time constant slice — no runtime config.

Error Cases

Case Behavior
DNS resolution fails (NXDOMAIN) Reject — do not connect
DNS returns empty IP list Reject — do not connect
Any IP in blocked CIDR Reject with log message showing hostname + IP
URL has http:// scheme Reject at string level (validate-url) / irrelevant for DialContext (scheme is checked at bash level already)
User info in URL (user:pass@host) Reject at string level in validate-url
IPv6 mapped IPv4 (::ffff:10.0.0.1) Blocked via ip.To4() normalization before CIDR check
Unresolvable port Handled by net.SplitHostPort error

Edge Cases

  • IPv6 mapped IPv4 addresses (::ffff:192.168.1.1): net.IP.To4() returns the IPv4 form, which must be checked against IPv4 CIDRs. We normalize before checking.
  • Multiple A/AAAA records: One public + one private = blocked. Zero records = blocked.
  • Default port (443 for https): URL may not include :443. The validate-url subcommand must use the URL's scheme to infer port for resolution (the hostname is enough for DNS check; port isn't needed for IP validation).
  • DialContext addr format: net/http passes "host:port" (already split), so net.SplitHostPort works without parsing the original URL.
  • Test isolation: Tests that hit local httptest servers need to bypass the safe dialer. Existing SetHTTPClient method allows injecting a plain HTTP client in tests. For the DialContext specifically, we add WithUnsafeDialer() *Client for test-only use.
  • DNS rebinding: The safeDialContext checks IPs after resolution, immediately before connection. This is not 100% immune to DNS rebinding (an attacker can change DNS response between LookupIPAddr and DialContext), but combined with the validate-url pre-check in bash (which also resolves), the window is extremely narrow. Full immunity requires pinning the IP for the duration of the request, which is not practical here without a larger refactor.

Testing Strategy

Unit tests (gitea/ipcheck_test.go)

  • isBlockedIP: test each CIDR (10.0.0.1, 172.16.0.1, 192.168.0.1, 169.254.1.1, 127.0.0.1, ::1, fe80::1, fc00::1)
  • Public IPs: 8.8.8.8, 1.1.1.1, 2001:4860:4860::8888 → not blocked
  • IPv6-mapped IPv4: net.IP{0,0,0,0,0,0,0,0,0,0,0xff,0xff,192,168,1,1} → blocked

Unit tests (gitea/client_test.go)

  • TestSafeDialerBlocksPrivateIP: mock resolver returns 127.0.0.1 → client returns error, no connection made
  • TestSafeDialerBlocksRFC1918: mock resolver returns 10.0.0.1
  • TestSafeDialerAllowsPublicIP: public IP passes (connect to httptest server via 127.0.0.1 bypassed by WithUnsafeDialer)

Unit tests (cmd/review-bot/validateurl_test.go)

  • Table-driven: various URLs (http://, https with private hostname, malformed) → expected exit/error
  • DNS resolution mocked via dependency injection or by testing against known-bad/good URLs

Integration tests

  • Existing tests use SetHTTPClient with httptest — they pass a plain HTTP client that bypasses safe dialing. No change needed there.

Backward Compatibility

  • No flag changes, no env var changes, no action input changes.

  • NewClient now uses a safe HTTP transport by default — this is a breaking change in behavior for callers that were previously able to point the client at localhost. But callers that do so are tests — they already use SetHTTPClient to inject a test transport. The WithUnsafeDialer escape hatch for tests makes this clean.

  • The validate-url subcommand in action.yml runs before the binary is installed (in the "Determine version" step). We call it after the binary is installed (cache step) or at the start of "Install review-bot". Actually — the binary is installed in the "Install review-bot" step, but "Determine version" runs first and that's where the sensitive version-check curl happens. Therefore:

    • "Determine version" keeps existing bash scheme validation (pre-install, no binary available)
    • "Install review-bot" calls validate-url before any curl
    • "Run review" is protected by the Go binary's DialContext
    • This means version-check curl is not fully IP-protected — it uses the existing scheme-only validation. This is an acceptable residual risk: the attacker would only leak the Gitea API version response, not any secrets (no token is sent in the unauthenticated path; authenticated path sends reviewer-token which is the user's own token for their own server).

    Actually, re-reading action.yml: the version check step DOES send ACTION_TOKEN to API_URL. So we should protect it. Options:

    • Embed a lightweight IP check in bash using python3 (available on ubuntu runners per comments in action.yml)
    • Or restructure to do validate-url after install, skipping the cache step for the first call

    Decision: Add a python3 IP check helper for the version-check curl. Python3's socket.getaddrinfo can resolve and check IPs without any external dep. Keep it minimal (just loopback + RFC1918 + link-local check). This is one more line in an already-bash-heavy step, documented clearly.

Open Questions

  1. The DNS rebinding window in safeDialContext can't be fully closed without IP pinning (binding the connection to the resolved IP). Is pinning worth the added complexity? Currently we dial addrs[0] directly after resolution which is a close approximation. This is our approach — worth noting in code comments.

  2. Should we also protect the llm client? The issue says "action.yml Gitea path" so explicitly not in scope, but llm-base-url is also user-supplied. Suggest filing a follow-up. Not addressed here.

  3. The python3 bash helper for the version-check step: should it also check IPv6? Yes — add ipaddress.ip_address(ip).is_private which handles both in Python 3.3+.

Completion Checklist

Generated for this specific task:

  1. isBlockedIP covers all CIDR families: loopback, RFC1918, link-local, ULA, multicast, unspecified
  2. IPv6-mapped IPv4 addresses are normalized before CIDR check
  3. ANY blocked IP in the resolved set rejects (not just all-blocked)
  4. Zero resolved IPs rejects
  5. safeDialContext dials the resolved IP directly (avoids second DNS lookup that could be different)
  6. Test isolation: existing tests using httptest still pass (via WithUnsafeDialer or SetHTTPClient)
  7. validate-url subcommand exits non-zero for private IPs, malformed URLs, http:// scheme
  8. action.yml calls validate-url before each curl that uses user-supplied URL
  9. go test ./... passes
  10. go vet ./... passes
  11. scripts/check-deps.sh passes (no new deps)