CVE-2026-41675
EPSS 0.02%
xmldom has XML node injection through unvalidated processing instruction serialization
Description
## Summary

The package allows attacker-controlled processing instruction data to be serialized into XML without validating or neutralizing the PI-closing sequence `?>`. As a result, an attacker can terminate the processing instruction early and inject arbitrary XML nodes into the serialized output.

---

## Details

The issue is in the DOM construction and serialization flow for processing instruction nodes. When `createProcessingInstruction(target, data)` is called, the supplied `data` string is stored directly on the node without validation. Later, when the document is serialized, the serializer writes PI nodes by concatenating `<?`, the target, a space, `node.data`, and `?>` directly.

That behavior is unsafe because processing instructions are a syntax-sensitive context. The closing delimiter `?>` terminates the PI. If attacker-controlled input contains `?>`, the serializer does not preserve it as literal PI content. Instead, it emits output where the remainder of the payload is treated as live XML markup.

The same class of vulnerability was previously addressed for CDATA sections (GHSA-wh4c-j3r5-mjhp / CVE-2026-34601), where `]]>` in CDATA data was handled by splitting. The serializer applies no equivalent protection to processing instruction data.

---

## Affected code

**`lib/dom.js`, `createProcessingInstruction` (lines 2240–2246):**

```js
createProcessingInstruction: function (target, data) {
	var node = new ProcessingInstruction(PDC);
	node.ownerDocument = this;
	node.childNodes = new NodeList();
	node.nodeName = node.target = target;
	node.nodeValue = node.data = data;
	return node;
},
```

No validation is performed on `data`. Any string including `?>` is stored as-is.

**`lib/dom.js`, serializer PI case (line 2966):**

```js
case PROCESSING_INSTRUCTION_NODE:
	return buf.push('<?', node.target, ' ', node.data, '?>');
```

`node.data` is emitted verbatim.
If it contains `?>`, that sequence terminates the PI in the output stream and the remainder appears as active XML markup.

**Contrast: CDATA (line 2945, patched):**

```js
case CDATA_SECTION_NODE:
	return buf.push(g.CDATA_START, node.data.replace(/]]>/g, ']]]]><
```

[…], which was subsequently closed without being merged.

---

## Fix Applied

> **⚠ Opt-in required.** Protection is not automatic. Existing serialization calls remain
> vulnerable unless `{ requireWellFormed: true }` is explicitly passed. Applications that pass
> untrusted data to `createProcessingInstruction()` or mutate PI nodes with untrusted input
> (via `.data =` or `CharacterData` mutation methods) should audit all `serializeToString()`
> call sites and add the option.

`XMLSerializer.serializeToString()` now accepts an options object as a second argument. When `{ requireWellFormed: true }` is passed, the serializer throws `InvalidStateError` before emitting any ProcessingInstruction node whose `.data` contains `?>`. This check applies regardless of how `?>` entered the node, whether via `createProcessingInstruction` directly or a subsequent mutation (`.data =`, `CharacterData` methods).

On `@xmldom/xmldom` ≥ 0.9.10, the serializer additionally applies the full W3C DOM Parsing §3.2.1.7 checks when `requireWellFormed: true`:

1. **Target check**: throws `InvalidStateError` if the PI target contains a `:` character or is an ASCII case-insensitive match for `"xml"`.
2. **Data Char check**: throws `InvalidStateError` if the PI data contains characters outside the XML Char production.
3. **Data sequence check**: throws `InvalidStateError` if the PI data contains `?>`.

On `@xmldom/xmldom` ≥ 0.8.13 (LTS), only the `?>` data check (check 3) is applied. The target and XML Char checks are not included in the LTS fix.
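
The three checks can be approximated outside the library, for example to audit PI inputs on a version that only applies check 3. The following is a minimal sketch, not the xmldom implementation: `checkPIWellFormed` and `XML_CHARS_ONLY` are hypothetical names, and the regular expression is a hand transcription of the XML 1.0 Char production.

```js
// Hypothetical helper, NOT part of @xmldom/xmldom.
// XML 1.0 Char production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// The `u` flag makes the regex operate on code points, so lone surrogates fail.
const XML_CHARS_ONLY =
  /^[\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]*$/u;

// Returns the names of the checks that would fail; an empty array means the
// PI passes all three checks described above.
function checkPIWellFormed(target, data) {
  const failed = [];
  // 1. Target check: no ':' and not an ASCII case-insensitive match for "xml".
  if (target.includes(':') || /^xml$/i.test(target)) failed.push('target');
  // 2. Data Char check: every code point must match the Char production.
  if (!XML_CHARS_ONLY.test(data)) failed.push('char');
  // 3. Data sequence check: the PI-closing delimiter must not appear.
  if (data.includes('?>')) failed.push('sequence');
  return failed;
}
```

Returning a list rather than throwing is a choice made for this sketch so that an audit can report every failed check at once; the serializer itself throws `InvalidStateError` as described above.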

### PoC: fixed path

```js
const { DOMImplementation, XMLSerializer } = require('@xmldom/xmldom');

const doc = new DOMImplementation().createDocument(null, 'r', null);
doc.documentElement.appendChild(doc.createProcessingInstruction('a', '?><z/><?q '));

// Default (unchanged): verbatim, injection present
const unsafe = new XMLSerializer().serializeToString(doc);
console.log(unsafe);
// <r><?a ?><z/><?q ?></r>

// Opt-in guard: throws InvalidStateError before serializing
try {
  new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
} catch (e) {
  console.log(e.name, e.message);
  // InvalidStateError: The ProcessingInstruction data contains "?>"
}
```

The guard catches `?>` regardless of when it was introduced:

```js
// Post-creation mutation: also caught at serialization time
const pi = doc.createProcessingInstruction('target', 'safe data');
doc.documentElement.appendChild(pi);
pi.data = 'safe?><injected/>';
new XMLSerializer().serializeToString(doc, { requireWellFormed: true });
// InvalidStateError: The ProcessingInstruction data contains "?>"
```

### Why the default stays verbatim

The W3C DOM Parsing and Serialization spec §3.2.1.3 defines a `require well-formed` flag whose **default value is `false`**. With the flag unset, the spec explicitly permits serializing PI data verbatim. This matches browser behavior: Chrome, Firefox, and Safari all emit `?>` in PI data verbatim by default without error.

Unconditionally throwing would be a behavioral breaking change with no spec justification. The opt-in `requireWellFormed: true` flag allows applications that require injection safety to enable strict mode without breaking existing code.

### Residual limitation

`createProcessingInstruction(target, data)` does not validate `data` at creation time.
The WHATWG DOM spec (§4.5 step 2) mandates an `InvalidCharacterError` when `data` contains `?>`; enforcing this check unconditionally at creation time is a breaking change and is deferred to a future breaking release. When the default serialization path is used (without `requireWellFormed: true`), PI data containing `?>` is still emitted verbatim. Applications that do not pass `requireWellFormed: true` remain exposed.
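
Because creation-time validation is deferred, callers that build PI nodes from untrusted input can add their own check before the node exists. A minimal sketch, assuming any `Document`-like object that exposes `createProcessingInstruction`; `createSafePI` is a hypothetical wrapper, not an xmldom API, and the error name only mirrors the `InvalidCharacterError` the WHATWG spec calls for.

```js
// Hypothetical wrapper, NOT part of @xmldom/xmldom.
// Rejects PI data containing "?>" before the node is created.
function createSafePI(doc, target, data) {
  if (String(data).includes('?>')) {
    const err = new Error('ProcessingInstruction data must not contain "?>"');
    err.name = 'InvalidCharacterError'; // mirrors the spec's error name (assumption)
    throw err;
  }
  return doc.createProcessingInstruction(target, data);
}
```

This guard covers only the creation path. Later mutation through `.data =` or `CharacterData` methods bypasses it, so passing `{ requireWellFormed: true }` at serialization time is still needed.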
Affected packages (3)
- Debian/node-xmldom: from 0
- npm/xmldom: from 0, <= 0.6.0
- npm/@xmldom/xmldom: from 0, < 0.8.13
CVSS scores
| Source | Version | Severity | Vector |
|---|---|---|---|
| osv | CVSS 4.0 | — | CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:H/VA:N/SC:N/SI:N/SA:N |
References (7)
- ADVISORY: https://nvd.nist.gov/vuln/detail/CVE-2026-41675
- ADVISORY: https://security-tracker.debian.org/tracker/CVE-2026-41675
- PATCH: https://github.com/xmldom/xmldom
- WEB: https://github.com/xmldom/xmldom/commit/7207a4b0e0bcc228868075ed991665ef9f73b1c2
- WEB: https://github.com/xmldom/xmldom/releases/tag/0.8.13
- WEB: https://github.com/xmldom/xmldom/releases/tag/0.9.10
- WEB: https://github.com/xmldom/xmldom/security/advisories/GHSA-x6wf-f3px-wcqx