Intended use of xmlEncodeEntitiesReentrant vs xmlEncodeSpecialChars

narwahl · January 30, 2024, 3:52am

These two functions look very similar with respect to encoding special characters, except that xmlEncodeEntitiesReentrant handles non-ASCII characters explicitly and checks whether the doc is HTML. Also, the documentation entry for xmlEncodeSpecialChars seems to be missing punctuation and possibly missing some words:

Do a global encoding of a string, replacing the predefined entities this routine is reentrant, and result must be deallocated.

What is the intended use case for each of these two functions? Or, phrased differently, why do we have both?

Related question: Why doesn’t xmlEncodeEntitiesReentrant replace '"' and '\'' (double quote and apostrophe)? Both have predefined entities (&quot; and &apos;). Likewise, xmlEncodeSpecialChars replaces '"' (double quote) but not '\'' (apostrophe). An unescaped quote character can break an attribute whose value is enclosed in that same kind of quote, for example an apostrophe inside a single-quoted value.

nwellnhof · January 30, 2024, 12:17pm

Regarding the intention, you’d have to ask the person who wrote these functions 20+ years ago if they remember. From what I can tell, UTF-8 wasn’t as ubiquitous back then and there was a need to generate ASCII-only output.

All we can do now is to document the exact behavior of these functions more clearly.
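
To illustrate what "ASCII-only output" means in practice, here is a minimal sketch that rewrites every non-ASCII character of a UTF-8 string as a hexadecimal character reference. The function name `ascii_only` is hypothetical and is not part of libxml2. The sketch does not reproduce libxml2's actual implementation, it assumes well-formed UTF-8 input, and it does not escape '<', '>' or '&'.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT a libxml2 function.
 * Returns a newly malloc'd copy of `in` in which every non-ASCII UTF-8
 * sequence is replaced by a hex character reference such as "&#xE9;".
 * Returns NULL on malformed UTF-8 or allocation failure.
 * The caller frees the result with free(). */
char *ascii_only(const char *in)
{
    /* Worst case is a 2-byte sequence becoming 7 characters ("&#x7FF;"),
     * so 4 output bytes per input byte is always enough. */
    char *out = malloc(strlen(in) * 4 + 1);
    if (out == NULL)
        return NULL;

    char *q = out;
    const unsigned char *p = (const unsigned char *)in;

    while (*p != 0) {
        if (*p < 0x80) {            /* plain ASCII: copy through */
            *q++ = (char)*p++;
            continue;
        }

        unsigned long cp;           /* decoded code point */
        int extra;                  /* continuation bytes expected */
        if ((*p & 0xE0) == 0xC0)      { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else { free(out); return NULL; }
        p++;

        while (extra-- > 0) {
            /* A NUL or any non-continuation byte here is malformed. */
            if ((*p & 0xC0) != 0x80) { free(out); return NULL; }
            cp = (cp << 6) | (unsigned long)(*p++ & 0x3F);
        }

        q += sprintf(q, "&#x%lX;", cp);
    }
    *q = '\0';
    return out;
}
```

With this sketch, the UTF-8 string "café" would come out as "caf&#xE9;", which any ASCII-only channel can carry and any XML parser can read back.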

narwahl · January 30, 2024, 7:02pm

That makes sense, and I can certainly sympathize with the unclear intent of inherited code. I think both functions are fine for text node content, but neither is generally safe for escaping an attribute value that may contain single or double quotes in addition to '<', '>', and '&'. It’s unfortunate if there’s no public function that does that. Are you aware of any?

To clarify, I’m less concerned with “why” things are the way they are, and more concerned with “what” I should be doing to sanitize my XML. Luckily the project I work on rolled their own sanitizer 11 years ago, but I’d prefer to use library functions where possible.
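
For reference, a hand-rolled escaper of this kind can be quite small. The following is a minimal sketch, not the project's actual sanitizer and not a libxml2 API: the names `entity_for` and `escape_xml_attr` are hypothetical. It replaces all five characters that have predefined XML entities, so the result should be usable inside either a single-quoted or a double-quoted attribute value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT a libxml2 function.
 * Returns the predefined XML entity for `c`, or NULL if `c` needs no escaping. */
static const char *entity_for(char c)
{
    switch (c) {
    case '&':  return "&amp;";
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '"':  return "&quot;";
    case '\'': return "&apos;";
    default:   return NULL;
    }
}

/* Hypothetical helper, NOT a libxml2 function.
 * Returns a newly malloc'd, escaped copy of `in`, or NULL on allocation
 * failure. The caller frees the result with free(). */
char *escape_xml_attr(const char *in)
{
    /* First pass: compute the exact output length. */
    size_t len = 0;
    for (const char *p = in; *p != '\0'; p++) {
        const char *ent = entity_for(*p);
        len += (ent != NULL) ? strlen(ent) : 1;
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, substituting entities where needed. */
    char *q = out;
    for (const char *p = in; *p != '\0'; p++) {
        const char *ent = entity_for(*p);
        if (ent != NULL) {
            size_t n = strlen(ent);
            memcpy(q, ent, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

Escaping both quote characters unconditionally costs a few extra bytes but means the caller does not need to know which delimiter the serializer will use. Note that this sketch does nothing about literal tabs or newlines, which an XML parser normalizes to spaces inside attribute values.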

nwellnhof · January 31, 2024, 12:20pm

> Are you aware of any?

No, the closest is xmlEncodeSpecialChars. I think it’s a good idea to make this function escape single quotes as well.

There’s also xmlEncodeAttributeEntities, but for some reason, this function doesn’t escape quotes at all.

narwahl · January 31, 2024, 8:06pm

> No, the closest is xmlEncodeSpecialChars. I think it’s a good idea to make this function escape single quotes as well.

Agreed… also, a comment in xmlEncodeEntitiesReentrant says that it escapes double quotes, but it looks like it doesn’t do that.

> There’s also xmlEncodeAttributeEntities, but for some reason, this function doesn’t escape quotes at all.

That one’s private, and it treats attributes specially only if the doc is an HTML doc.
