Hi everyone,
I’ve been using libxml in one of my projects for 10+ years (since about v2.6), so some (or a lot) of what I’m doing might be old school by now, but fortunately I’ve never had any issues with new libxml versions.
Until recently, when my distro got the package update from 2.13.8 to 2.14.2.
I have my DTD compiled into my project in one of my header files:
#define KC_DTD "\
<!ELEMENT kc (keychain)*> \
\
<!ELEMENT keychain (key)*> \
<!ATTLIST keychain \
name CDATA #REQUIRED \
description CDATA #REQUIRED \
created CDATA #REQUIRED \
modified CDATA #REQUIRED> \
\
<!ELEMENT key EMPTY> \
<!ATTLIST key \
name CDATA #REQUIRED \
value CDATA #REQUIRED \
created CDATA #REQUIRED \
modified CDATA #REQUIRED> \
"
I’m loading this like so - more or less:
xmlParserInputBufferPtr buf = NULL;
xmlDtdPtr dtd = NULL;

buf = xmlParserInputBufferCreateMem(KC_DTD, sizeof(KC_DTD), XML_CHAR_ENCODING_NONE);
dtd = xmlIOParseDTD(NULL, buf, XML_CHAR_ENCODING_NONE);
if (!dtd) {
    xmlGenericError(xmlGenericErrorContext, "ERROR: Could not parse kc DTD.\n");
    return(0);
}
A couple of printf()s and other shenanigans are missing from the example, but the culprit turned out to be xmlIOParseDTD().
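For completeness, here is a minimal standalone version of the failing calls (the file name and build command are only for illustration, and I’ve reflowed the DTD macro with string-literal concatenation here; the content is the same):

/* repro.c - minimal standalone version of the failing calls.
 * Build (illustrative): cc repro.c $(xml2-config --cflags --libs) -o repro
 */
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/xmlIO.h>
#include <libxml/tree.h>

/* same DTD content as the KC_DTD header macro above, reflowed for readability */
#define KC_DTD \
    "<!ELEMENT kc (keychain)*> " \
    "<!ELEMENT keychain (key)*> " \
    "<!ATTLIST keychain name CDATA #REQUIRED description CDATA #REQUIRED " \
    "created CDATA #REQUIRED modified CDATA #REQUIRED> " \
    "<!ELEMENT key EMPTY> " \
    "<!ATTLIST key name CDATA #REQUIRED value CDATA #REQUIRED " \
    "created CDATA #REQUIRED modified CDATA #REQUIRED>"

int main(void)
{
    xmlParserInputBufferPtr buf;
    xmlDtdPtr dtd;

    /* same length argument as in my project: sizeof(KC_DTD) */
    buf = xmlParserInputBufferCreateMem(KC_DTD, sizeof(KC_DTD),
                                        XML_CHAR_ENCODING_NONE);
    dtd = xmlIOParseDTD(NULL, buf, XML_CHAR_ENCODING_NONE);
    if (dtd == NULL) {
        fprintf(stderr, "ERROR: Could not parse kc DTD.\n");
        return 1;
    }
    xmlFreeDtd(dtd);
    return 0;
}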
The error output is:
Entity: line 1: parser error : Content error in the external subset
IRED value CDATA #REQUIRED created CDATA #REQUIRED modified CDATA #REQUIRED>
^
The only thing I could link this to in the libxml changelog was this entry:
- xmlIOParseDTD doesn’t allow null bytes at the end of the input anymore.
As far as I could trace, this relates to issue #868 and, in turn, commit 05bd1720.
At first I was at a loss as to how null bytes could be related in my case. I tried reformatting my embedded DTD to iron out possible concatenation/newline errors introduced by the preprocessor, to no avail.
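The only trailing null byte I can think of is the string literal’s own terminator: I pass sizeof(KC_DTD) rather than strlen(KC_DTD) to xmlParserInputBufferCreateMem(), and sizeof on a string literal counts the terminating NUL. A quick standalone check (shortened DTD, just to show the off-by-one):

#include <stdio.h>
#include <string.h>

#define KC_DTD "<!ELEMENT kc (keychain)*>"  /* shortened here; the same holds for the full macro */

int main(void)
{
    /* sizeof includes the literal's terminating NUL, strlen does not */
    printf("sizeof = %zu, strlen = %zu\n", sizeof(KC_DTD), strlen(KC_DTD));
    return 0;
}

If the new code rejects a trailing null byte in the input, that extra byte would presumably be what trips it up, but I may be reading this wrong.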
Here is a sample XML file and the DTD, checked with xmllint(1):
test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<kc>
<keychain name="default" created="1746443463" modified="1746443463" description="">
</keychain>
</kc>
kc.dtd:
<!ELEMENT kc (keychain)*>
<!ELEMENT keychain (key)*>
<!ATTLIST keychain
name CDATA #REQUIRED
description CDATA #REQUIRED
created CDATA #REQUIRED
modified CDATA #REQUIRED>
<!ELEMENT key EMPTY>
<!ATTLIST key
name CDATA #REQUIRED
value CDATA #REQUIRED
created CDATA #REQUIRED
modified CDATA #REQUIRED>
Test (doesn’t report any issues):
$ xmllint --noout --dtdvalid kc.dtd test.xml
To double-check: when I deliberately break the XML or the DTD, the validation failure is reported as expected:
$ xmllint --noout --dtdvalid kc.dtd regress/test.xml
Document test.xml does not validate against kc.dtd
Am I on the right track that this change ("xmlIOParseDTD doesn’t allow null bytes at the end of the input anymore") is what introduced this? Or have I been using the API badly the whole time, and it only started to show now that this has been fixed?
Context:
I’ve had a related issue with xmlIOParseDTD() before: it did not appear to free() its input buffer, although the documentation states "input will be freed by the function in any case", so I’ve been freeing it myself with xmlFreeParserInputBuffer(buf) since forever. With this new libxml version the buffer apparently is freed by the function, and my extra call now segfaults on a double free(). That part is fine, I’ll adjust the code; I’m just wondering whether I’ve been using this API in a way that was tolerated before, and now that things have been fixed in libxml, my application is showing the symptoms.
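For reference, the adjusted cleanup I have in mind looks like this (just a sketch, relying on the documented behaviour that the input buffer is always consumed by xmlIOParseDTD()):

buf = xmlParserInputBufferCreateMem(KC_DTD, sizeof(KC_DTD), XML_CHAR_ENCODING_NONE);
dtd = xmlIOParseDTD(NULL, buf, XML_CHAR_ENCODING_NONE);
/* buf is freed by xmlIOParseDTD() in any case, so no
 * xmlFreeParserInputBuffer(buf) here anymore */
if (!dtd) {
    xmlGenericError(xmlGenericErrorContext, "ERROR: Could not parse kc DTD.\n");
    return(0);
}
/* ... use the dtd, then ... */
xmlFreeDtd(dtd);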
Any help or suggestions would be appreciated,
Daniel