# XML External Entities (XXE)

## Rule

Disable external entity processing. Disable DTDs. Use safe parser defaults.

**Source:** [OWASP XXE Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html)

## What XXE Can Do

- **File disclosure**: Read `/etc/passwd`, config files, source code
- **SSRF**: Make requests to internal services
- **DoS**: Billion laughs attack (exponential entity expansion)
- **Port scanning**: Error-based probing of internal ports
- **RCE**: In some configurations (PHP with the `expect://` wrapper enabled)

## Attack Payloads

```xml
<!-- File disclosure -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<foo>&xxe;</foo>

<!-- SSRF (placeholder internal host) -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://internal.example/">
]>
<foo>&xxe;</foo>

<!-- Billion laughs -->
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!-- lol3 through lol8 each expand the previous entity ten times -->
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
```

## Correct Pattern

```python
# Python - defusedxml (recommended)
# Aliased as SafeET so it is not shadowed by the stdlib import below
import defusedxml.ElementTree as SafeET

def parse_xml_safe(xml_string: str):
    """Parse XML with XXE protection."""
    return SafeET.fromstring(xml_string)

# Python - standard library with safe settings
import xml.etree.ElementTree as ET

def parse_xml_manual(xml_string: str):
    """Manual safe configuration."""
    parser = ET.XMLParser()
    # Python's ElementTree doesn't resolve external entities by default,
    # but always verify your specific library!
    return ET.fromstring(xml_string, parser=parser)

# lxml with safe settings
from lxml import etree

def parse_xml_lxml(xml_string: str):
    """lxml with XXE disabled."""
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        dtd_validation=False,
        load_dtd=False,
    )
    return etree.fromstring(xml_string.encode(), parser=parser)
```

## Incorrect Pattern

```python
from lxml import etree

# Wrong: relying on lxml defaults (before lxml 5.0, external entities are resolved)
def bad_parse(xml_string: str):
    return etree.fromstring(xml_string)

# Wrong: explicitly enabling dangerous features
def bad_parse_2(xml_string: str):
    parser = etree.XMLParser(resolve_entities=True)
    return etree.fromstring(xml_string, parser=parser)

# Wrong: using xml.dom.minidom without protection
from xml.dom.minidom import parseString

def bad_parse_3(xml_string: str):
    return parseString(xml_string)  # May be vulnerable (e.g. entity-expansion DoS)

# Wrong: SAX parser without disabling features
# (Python before 3.7.1 processes external general entities by default)
import xml.sax

class MyHandler(xml.sax.ContentHandler):
    """Placeholder handler; the unprotected parser call is the problem."""

def bad_parse_4(xml_string: str):
    handler = MyHandler()
    xml.sax.parseString(xml_string, handler)
```

## Language-Specific Fixes

### Java

```java
// DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
dbf.setXIncludeAware(false);
dbf.setExpandEntityReferences(false);

// SAXParserFactory
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
```

### .NET

```csharp
// XmlReader (safe by default in .NET 4.5.2+)
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
XmlReader reader = XmlReader.Create(stream, settings);

// XmlDocument
XmlDocument doc = new XmlDocument();
doc.XmlResolver = null; // Disable external resources
doc.LoadXml(xmlString);
```

### PHP

```php
// Disable entity loading globally.
// Only needed before PHP 8.0; the function is deprecated from 8.0 on,
// where libxml 2.9.0+ already disables external entity loading by default.
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}

$doc = new DOMDocument();

// Wrong: these flags turn ON entity substitution and DTD loading
// $doc->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD | LIBXML_DTDATTR);

// Better: omit those flags and block network access
$doc->loadXML($xml, LIBXML_NONET);
```

## When You Need DTDs

```python
# If you absolutely need DTD validation (rare):
# 1. Allowlist specific DTDs
# 2. Fetch DTDs from local filesystem only
# 3. Never allow user-controlled DTD URLs
from lxml import etree

ALLOWED_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "/path/to/local/xhtml1-strict.dtd"
}

class SafeResolver(etree.Resolver):
    def resolve(self, system_url, public_id, context):
        if public_id in ALLOWED_DTDS:
            return self.resolve_filename(ALLOWED_DTDS[public_id], context)
        raise ValueError(f"DTD not allowed: {public_id}")

# Register the resolver on a parser that loads DTDs but never uses the network
parser = etree.XMLParser(load_dtd=True, no_network=True)
parser.resolvers.add(SafeResolver())
```

## Edge Cases

- SVG files are XML, so validate uploads!
- SOAP/XML-RPC endpoints are XXE targets
- Office documents (DOCX, XLSX) contain XML
- Configuration files (Maven pom.xml, Spring beans.xml)
- RSS/Atom feeds
- SAML assertions
- Blind XXE (out-of-band data exfiltration via DNS/HTTP)
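
For upload-style inputs such as SVG files or feeds, one conservative option is to reject any document that declares a DOCTYPE before it reaches the real parser. The following is a minimal sketch using only the Python standard library. The names `DoctypeForbidden` and `parse_upload_without_dtd` are hypothetical, not part of any library, and the sketch assumes the input never legitimately needs a DTD.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document declares a DOCTYPE (hypothetical helper type)."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Expat calls this as soon as it sees a DOCTYPE declaration,
    # before any entity defined in it can be expanded.
    raise DoctypeForbidden(f"DOCTYPE not allowed: {name!r}")


def parse_upload_without_dtd(data: bytes) -> ET.Element:
    """Reject any document with a DOCTYPE, then parse it normally."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    # Raises DoctypeForbidden for DTDs, expat.ExpatError for malformed XML.
    checker.Parse(data, True)
    return ET.fromstring(data)
```

A caller might use `parse_upload_without_dtd(uploaded_bytes)` and treat `DoctypeForbidden` as a validation failure. This parses the input twice, trading some speed for a check that does not depend on the downstream parser's defaults; where installing a dependency is possible, `defusedxml` remains the simpler choice.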