# Open Redirect

## Rule

Never redirect to a user-controlled URL without validating it against an allowlist of permitted destinations.

**Source:** [CWE-601: URL Redirection to Untrusted Site](https://cwe.mitre.org/data/definitions/601.html)
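
The strictest way to follow the rule is to never accept a URL from the client at all. The request carries a short key, and the server maps it to a fixed path. This is a minimal sketch, and `DESTINATIONS` and `resolve_destination` are hypothetical names, not part of any framework:

```python
from typing import Optional

# Hypothetical server-side table: the client sends a key, never a URL
DESTINATIONS = {
    "dashboard": "/dashboard",
    "profile": "/profile",
    "settings": "/settings",
}


def resolve_destination(key: Optional[str], default: str = "/") -> str:
    """Map a client-supplied key to a fixed path; unknown keys get default."""
    if key is None:
        return default
    return DESTINATIONS.get(key, default)
```

With this approach, `?next=profile` resolves to `/profile`, while `?next=https://evil.com` falls back to `/`. The trade-off is flexibility, because every new destination requires a server-side change.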

## Why It's Dangerous

- **Phishing**: Victim trusts your domain, clicks link, lands on attacker site
- **OAuth token theft**: Redirect URI manipulation steals auth codes
- **Credential harvesting**: Fake login page after "session expired" redirect
- **Malware distribution**: Your domain reputation used to bypass filters

## Correct Pattern

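```python
from urllib.parse import urlparse

from flask import Flask, redirect, request

app = Flask(__name__)

ALLOWED_HOSTS = {"example.com", "app.example.com"}
ALLOWED_PATHS = {"/dashboard", "/profile", "/settings"}


def _has_unsafe_chars(url: str) -> bool:
    """True if url contains characters that browsers rewrite or drop."""
    # Browsers treat a backslash like "/" and strip tab, CR, LF and edge
    # whitespace, so such input can turn into a protocol-relative "//evil.com"
    return (
        "\\" in url
        or url != url.strip()
        or any(ord(c) < 0x20 or ord(c) == 0x7F for c in url)
    )


def safe_redirect(url: str, default: str = "/") -> str:
    """Validate redirect URL, return safe destination."""
    if not url or _has_unsafe_chars(url):
        return default

    # Protocol-relative ("//evil.com") means "same scheme, other host"
    if url.startswith("//"):
        return default

    parsed = urlparse(url)

    # Option 1: only allow relative paths (safest). Reject any host and any
    # scheme, which also covers "https:evil.com" and "javascript:..."
    if parsed.scheme or parsed.netloc:
        return default

    # Validate path against allowlist (if applicable)
    if ALLOWED_PATHS and parsed.path not in ALLOWED_PATHS:
        return default

    return url


def safe_redirect_with_hosts(url: str, default: str = "/") -> str:
    """Option 2: allow relative paths plus specific external hosts."""
    if not url or _has_unsafe_chars(url):
        return default

    parsed = urlparse(url)

    # Relative URL: no scheme and no host. "https:evil.com" has a scheme
    # and "//evil.com" has a host, so neither takes this branch
    if not parsed.scheme and not parsed.netloc:
        return url

    # Absolute URL: http(s) only, and the host must match exactly.
    # netloc includes userinfo and port, so "example.com@evil.com" is rejected
    if parsed.scheme not in ("http", "https"):
        return default

    if parsed.netloc not in ALLOWED_HOSTS:
        return default

    return url


@app.route("/login")
def login():
    next_url = request.args.get("next", "/dashboard")
    # ... authenticate user ...
    return redirect(safe_redirect(next_url))
```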

## Incorrect Pattern

```python
from urllib.parse import urljoin

from flask import Flask, redirect, request

app = Flask(__name__)

# Wrong: direct redirect from parameter
@app.route("/redirect")
def bad_redirect():
    url = request.args.get("url")
    return redirect(url)  # Attacker: ?url=https://evil.com

# Wrong: checking only prefix
def bad_validate(url):
    return url.startswith("https://example.com")
    # Bypassed by: https://example.com.evil.com or https://example.com@evil.com

# Wrong: checking only domain presence
def bad_validate_2(url):
    return "example.com" in url
    # Bypassed by: https://evil.com/example.com

# Wrong: assuming urljoin keeps the trusted base host
def bad_redirect_2(path):
    base = "https://example.com"
    return redirect(urljoin(base, path))
    # urljoin("https://example.com", "//evil.com") = "https://evil.com"

# Wrong: trusting Referer header
@app.route("/back")
def go_back():
    return redirect(request.referrer)  # Attacker-controlled!
```
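
The string checks above fail because they inspect text, while the browser acts on the parsed host. A quick way to see the gap is to print the hostname that Python's `urllib.parse` extracts from each bypass. `host_of` is a hypothetical helper for this demo:

```python
from typing import Optional
from urllib.parse import urljoin, urlparse


def host_of(url: str) -> Optional[str]:
    """Host as urlparse sees it: lowercased, without userinfo or port."""
    return urlparse(url).hostname


# Each URL passes one of the string checks above, yet none points at example.com
for url in [
    "https://example.com.evil.com",  # passes startswith("https://example.com")
    "https://example.com@evil.com",  # passes the same prefix check (userinfo)
    "https://evil.com/example.com",  # passes '"example.com" in url'
]:
    print(url, "->", host_of(url))

# A protocol-relative "path" replaces the base host entirely
print(urljoin("https://example.com", "//evil.com"))  # https://evil.com
```

Comparing the parsed host against an allowlist, as `safe_redirect_with_hosts` does, closes all three gaps at once.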

## Bypass Techniques

```python
from urllib.parse import urlparse

# Common bypass attempts to defend against. Encoded entries such as %09
# typically arrive decoded (a real tab) once the framework parses the query.

bypasses = [
    "//evil.com",                    # Protocol-relative
    "https://evil.com",              # Absolute URL
    "//evil.com/example.com",        # Domain in path
    "https://example.com@evil.com",  # Userinfo: real host is evil.com
    "https://example.com.evil.com",  # Trusted name as attacker subdomain
    "/\\evil.com",                   # Backslash (browsers treat it as /)
    "/%09/evil.com",                 # Tab character (stripped by browsers)
    "/%0d/evil.com",                 # Carriage return (stripped by browsers)
    "https:evil.com",                # Missing slashes
    "javascript:alert(1)",           # JavaScript URI
    "data:text/html,<script>",       # Data URI
    "\x00https://evil.com",          # Null byte (leading control character)
]


def robust_validate(url: str) -> bool:
    """Accept only plain relative paths; reject the bypasses above."""
    if not url:
        return False

    # Reject (rather than strip) edge whitespace and any control character,
    # including tab, CR, LF, NUL and DEL; browsers silently drop some of them
    if url != url.strip() or any(ord(c) < 0x20 or ord(c) == 0x7F for c in url):
        return False

    # Block dangerous schemes (the scheme check below also covers these)
    if url.lower().startswith(("javascript:", "data:", "vbscript:")):
        return False

    # Block protocol-relative
    if url.startswith("//"):
        return False

    # Block backslash tricks
    if "\\" in url:
        return False

    # Only allow relative paths: no scheme ("https:evil.com") and no host
    parsed = urlparse(url)
    if parsed.scheme or parsed.netloc:
        return False

    return True
```
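
Several of these inputs get past a check that looks only at `netloc`, because `urlparse` reports an empty host for them even though a browser may leave the site or run script. That is why `robust_validate` also rejects any scheme and any backslash. This is a small standard-library demonstration, and `split_view` is a hypothetical helper:

```python
from urllib.parse import urlparse


def split_view(url: str) -> tuple:
    """(scheme, netloc, path) as urlparse sees them."""
    p = urlparse(url)
    return p.scheme, p.netloc, p.path


# Only the last input has a non-empty netloc
for url in ["https:evil.com", "javascript:alert(1)", "/\\evil.com", "//evil.com"]:
    print(repr(url), "->", split_view(url))
```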

## OAuth Redirect URI

```python
# OAuth redirect URIs need EXACT matching
REGISTERED_REDIRECT_URIS = {
    "https://app.example.com/oauth/callback",
    "https://app.example.com/auth/complete",
}

def validate_redirect_uri(uri: str) -> bool:
    """Exact match only - no partial matching!"""
    return uri in REGISTERED_REDIRECT_URIS

# Wrong approaches:
def bad_oauth_validate(uri):
    return uri.startswith("https://app.example.com/")
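    # Attacker: https://app.example.com/oauth/callback/../../../evil
    # Passes the prefix check, but the browser resolves it to
    # https://app.example.com/evil, so the auth code is delivered to a page
    # other than the registered callback (e.g., one with its own open redirect)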
```
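
To see why the prefix check is not enough, compare how both checks treat the attack string and where its path ends up once the `..` segments are resolved. This sketch uses `posixpath.normpath` only as a rough stand-in for the browser's dot-segment removal, and it differs in details such as trailing slashes. `prefix_ok`, `exact_ok`, and `resolved` are hypothetical helpers:

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

REGISTERED_REDIRECT_URIS = {"https://app.example.com/oauth/callback"}
ATTACK = "https://app.example.com/oauth/callback/../../../evil"


def prefix_ok(uri: str) -> bool:
    return uri.startswith("https://app.example.com/")


def exact_ok(uri: str) -> bool:
    return uri in REGISTERED_REDIRECT_URIS


def resolved(uri: str) -> str:
    """Approximate where the path points after '..' removal (illustration only)."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=posixpath.normpath(parts.path)))


print(prefix_ok(ATTACK), exact_ok(ATTACK))  # True False
print(resolved(ATTACK))                     # https://app.example.com/evil
```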

## Edge Cases

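- URL encoding: `%2f` decoded to `/` by a second decoding pass after validation (`%2f%2fevil.com` becomes `//evil.com`)
- Case sensitivity: schemes and hostnames are case-insensitive, so `HTTPS://EXAMPLE.COM` and `https://example.com` name the same origin
- IPv6 URLs: `http://[::1]/`; compare the parsed hostname (`::1`), not the raw string
- Port numbers: `https://example.com:443` and `https://example.com` name the same origin
- Fragment identifiers: `#` portions are not sent to the server, but client-side code can read them and redirect
- Meta refresh: `<meta http-equiv="refresh" content="0;url=https://evil.com">` redirects just like a `Location` header
- JavaScript redirects: `window.location = userInput` needs the same validation on the client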
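
The case, port, and IPv6 items come down to comparing the parsed origin rather than the raw string. Below is a minimal sketch that assumes only http(s) destinations. `origin`, `is_allowed_origin`, and `ALLOWED_ORIGINS` are hypothetical names. It does not cover double decoding or client-side redirects, which have to be handled where the decoding or navigation happens.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
# Hypothetical allowlist of (scheme, host, port) origins
ALLOWED_ORIGINS = {("https", "example.com", 443), ("https", "app.example.com", 443)}


def origin(url: str) -> Optional[Tuple[str, str, int]]:
    """Normalize an absolute http(s) URL to (scheme, host, port), or None."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        return None
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return None
    if port is None:
        port = DEFAULT_PORTS[parts.scheme]
    # urlsplit lowercases the scheme; .hostname is lowercased, has IPv6
    # brackets removed, and excludes userinfo and port
    return parts.scheme, parts.hostname, port


def is_allowed_origin(url: str) -> bool:
    return origin(url) in ALLOWED_ORIGINS
```

With this normalization, `HTTPS://EXAMPLE.COM` and `https://example.com:443/` compare equal, while `https://example.com@evil.com` normalizes to the `evil.com` origin and is rejected.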