security-patterns/xxe.md
Rodin 17c535bc61 Add session management, CORS, XXE patterns
Complete the security patterns collection:
- session-management.md: fixation, hijacking, secure cookies, concurrent sessions
- cors.md: origin validation, reflected origin attacks, preflight caching
- xxe.md: external entities, DTD attacks, language-specific fixes

Now 19 patterns covering comprehensive web application security.
2026-05-10 23:20:36 -07:00

XML External Entities (XXE)

Rule

Disable external entity processing. Disable DTDs. Use safe parser defaults.

Source: OWASP XXE Prevention Cheat Sheet

What XXE Can Do

  • File disclosure: Read /etc/passwd, config files, source code
  • SSRF: Make requests to internal services
  • DoS: Billion laughs attack (exponential entity expansion)
  • Port scanning: Error-based probing of internal ports
  • RCE: Possible in some configurations (e.g. PHP with the expect:// wrapper available)

Attack Payloads

<!-- File disclosure -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

<!-- SSRF to cloud metadata -->
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

<!-- Billion laughs DoS -->
<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!-- ... continues exponentially -->
]>
<lolz>&lol9;</lolz>
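To see why the last payload is dangerous, the expansion can be worked out directly. The sketch below assumes the elided entities follow the same pattern as lol2 and lol3 (each level references the previous one ten times, up to lol9); the function name is illustrative only.

```python
def expanded_length(base: str, fanout: int, levels: int) -> int:
    """Length of the text produced by expanding the top-level entity.

    levels counts the entities defined in terms of a previous one
    (lol2..lol9 is 8 levels in the payload above).
    """
    return len(base) * fanout ** levels


# lol = "lol" (3 characters); lol2..lol9 each repeat the previous entity 10 times.
size = expanded_length("lol", fanout=10, levels=8)
print(f"{size:,} characters")  # 300,000,000 characters
```

A payload of roughly a kilobyte therefore asks the parser to materialize about 300 MB of text, and each additional level multiplies that by ten.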

Correct Pattern

# Python - defusedxml (recommended)
import defusedxml.ElementTree as ET

def parse_xml_safe(xml_string: str):
    """Parse XML with XXE protection."""
    return ET.fromstring(xml_string)

# Python - standard library
import xml.etree.ElementTree as StdET  # separate alias so it does not shadow defusedxml's ET

def parse_xml_manual(xml_string: str):
    """Standard-library parsing; external entities are not resolved."""
    parser = StdET.XMLParser()
    # ElementTree does not resolve external entities and raises ParseError
    # when it meets one. Protection against entity-expansion DoS depends on
    # the bundled Expat version, so always verify your specific library!
    return StdET.fromstring(xml_string, parser=parser)

# lxml with safe settings
from lxml import etree

def parse_xml_lxml(xml_string: str):
    """lxml with XXE disabled."""
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        dtd_validation=False,
        load_dtd=False,
    )
    return etree.fromstring(xml_string.encode(), parser=parser)
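One way to act on the advice to verify a specific library is a small regression check that feeds a parser an external-entity payload pointing at a throwaway file. This is a minimal sketch: `expands_external_entities` and `CANARY` are hypothetical names, and the check assumes the parse function takes a string and returns an element with an `itertext()` method (true for ElementTree and lxml elements).

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable

CANARY = "xxe-canary-do-not-leak"  # hypothetical marker value


def expands_external_entities(parse: Callable[[str], ET.Element]) -> bool:
    """Return True if parse() pulls a local file into the parsed tree."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text(CANARY)
        payload = (
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>'
            "<data>&xxe;</data>"
        )
        try:
            root = parse(payload)
        except Exception:
            return False  # parser refused the payload outright
        # If the entity was resolved, the file content appears in the text.
        return CANARY in "".join(root.itertext())


print(expands_external_entities(ET.fromstring))
```

Running the same check against each parser configuration in a test suite catches a regression if a dependency upgrade or a settings change re-enables entity resolution. It only covers local file disclosure; network access and entity-expansion DoS need their own checks.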

Incorrect Pattern

from lxml import etree

# Wrong: relies on library defaults; lxml releases before 5.0 resolve entities by default
def bad_parse(xml_string: str):
    return etree.fromstring(xml_string)

# Wrong: explicitly enabling dangerous features
def bad_parse_2(xml_string: str):
    parser = etree.XMLParser(resolve_entities=True)
    return etree.fromstring(xml_string, parser=parser)

# Wrong: using xml.dom.minidom without protection
from xml.dom.minidom import parseString
def bad_parse_3(xml_string: str):
    return parseString(xml_string)  # Open to entity-expansion DoS on older Expat builds

# Wrong: SAX parser without disabling features
import xml.sax
def bad_parse_4(xml_string: str):
    handler = xml.sax.ContentHandler()  # stand-in for an application handler
    # External general entities are only off by default since Python 3.7.1
    xml.sax.parseString(xml_string, handler)
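Where defusedxml cannot be added as a dependency, the standard-library parsers above can be fronted by a check that refuses any document carrying a DOCTYPE. The following is a minimal sketch of that idea using Expat directly; `DoctypeForbidden` and `parse_without_doctype` are hypothetical names, and the document is parsed twice (once to scan, once to build the DOM).

```python
from xml.dom import minidom
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # Expat calls this at the start of a DOCTYPE, before the internal
    # subset (and therefore any entity declaration) is processed.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_without_doctype(xml_string: str):
    """Reject documents with a DOCTYPE, then parse with minidom."""
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _forbid_doctype
    scanner.Parse(xml_string, True)  # the handler's exception propagates from here
    return minidom.parseString(xml_string)
```

Rejecting the DOCTYPE outright removes both external entities and entity-expansion payloads, at the cost of refusing legitimate documents that declare a DTD.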

Language-Specific Fixes

Java

// DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

// SAXParserFactory
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

.NET

// XmlReader (safe by default in .NET 4.5.2+)
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);

// XmlDocument
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null;  // Disable external resources
doc.LoadXml(xmlString);

PHP

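// PHP < 8.0: disable external entity loading globally.
// Deprecated since PHP 8.0, where libxml 2.9.0+ disables it by default.
libxml_disable_entity_loader(true);

// Use LIBXML options
$doc = new DOMDocument();
// Wrong: LIBXML_NOENT and LIBXML_DTDLOAD turn on entity substitution and DTD loading
// $doc->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);
// Right: leave those flags out and block network access
$doc->loadXML($xml, LIBXML_NONET);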

When You Need DTDs

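# If you absolutely need DTD validation (rare):
# 1. Allowlist specific DTDs
# 2. Fetch DTDs from the local filesystem only
# 3. Never allow user-controlled DTD URLs
from lxml import etree

ALLOWED_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "/path/to/local/xhtml1-strict.dtd"
}

class SafeResolver(etree.Resolver):
    """Map allowlisted public IDs to local DTD files; reject everything else."""
    def resolve(self, system_url, public_id, context):
        if public_id in ALLOWED_DTDS:
            return self.resolve_filename(ALLOWED_DTDS[public_id], context)
        raise ValueError(f"DTD not allowed: {public_id}")

# The resolver only takes effect once it is registered on the parser in use
parser = etree.XMLParser(load_dtd=True, no_network=True)
parser.resolvers.add(SafeResolver())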

Edge Cases

  • SVG files are XML — validate uploads!
  • SOAP/XML-RPC endpoints are XXE targets
  • Office documents (DOCX, XLSX) contain XML
  • Configuration files (Maven pom.xml, Spring beans.xml)
  • RSS/Atom feeds
  • SAML assertions
  • Blind XXE (out-of-band data exfiltration via DNS/HTTP)
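For the Office document case, the XML lives inside a ZIP container, so each XML part has to be pulled out and sent through the same hardened parser as any other untrusted XML. The sketch below shows one way to enumerate those parts with the standard library; `iter_xml_members` and `MAX_MEMBER_BYTES` are hypothetical names, and the size cap is an arbitrary assumption rather than a recommended value.

```python
import io
import zipfile
from typing import Iterator, Tuple

MAX_MEMBER_BYTES = 10 * 1024 * 1024  # hypothetical cap per XML part


def iter_xml_members(blob: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) for each XML part of a ZIP-based document."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith((".xml", ".rels")):
                continue  # images, fonts, and other binary parts
            # file_size is the uncompressed size recorded in the archive
            if info.file_size > MAX_MEMBER_BYTES:
                raise ValueError(f"XML part too large: {info.filename}")
            yield info.filename, archive.read(info)
```

Each yielded part would then go to a parser configured as in the Correct Pattern section, so an uploaded DOCX or XLSX gets the same treatment as a raw XML request body.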