“xmlReadFile” allocates 600M of memory but “xmlFreeDoc” does not release it

I have a 50M xml file, which I parse with libxml2 using the following code:

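#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(void)
{
    const char *scdpath = "/home/sunri/work40/parsexml/c/parsescd/scdxml/JKZSub002.xml";
    xmlDocPtr doc;

    /* Build the whole document tree in memory */
    doc = xmlReadFile(scdpath, "UTF-8", XML_PARSE_RECOVER);
    if (NULL == doc) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return -1;
    }
    /* Free the tree; the process stays alive so "top" can be checked */
    xmlFreeDoc(doc);
    getchar();

    return 0;
}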

After “xmlReadFile” returns, the “top” command shows that the process's memory has grown by 650M, but after “xmlFreeDoc” returns, the figure shown by “top” does not change.

I have the following questions:

  1. The xml file is only 50M, so why does “xmlReadFile” take up 650M of memory? Is there any way to reduce the memory usage?
  2. I want to release all the memory used for parsing once I have finished with the xml file. For example, if “xmlReadFile” allocates 650M, I would like that 650M to be released after I am done. How should I do this?

Judging from the code, the memory allocated by “xmlReadFile” is on the stack, and may not be released until the program ends.
Is there a way to allocate all the memory required for parsing xml on the heap, so that users can control when it is released? That way, the allocation and release of memory might be clearly visible when single-stepping through the code in a debugger.

I also tested the following code:

for (i = 0; i < 5; i++) {
  doc = xmlReadFile(scdpath,"UTF-8",XML_PARSE_RECOVER);
  if (NULL == doc) {
    fprintf(stderr,"Document not parsed successfully.");
    return -1;
  }
  xmlFreeDoc(doc);
  xmlCleanupParser();
}

When i=1, after executing xmlFreeDoc, the memory displayed by the top command is reclaimed.

I’m not entirely sure what you’re looking at here, but I’d note that libc doesn’t necessarily return memory to the system when it is free()'d (which would potentially be an “expensive” operation, and of course it’s quite likely you’re about to allocate memory again, so it just makes sense to cache and repurpose what you’ve already got). You can try messing around with something like malloc_trim to force the runtime to give up memory.
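As a rough sketch of the malloc_trim idea, assuming glibc on Linux (malloc_trim is a glibc extension, not standard C): churn_small_blocks and release_unused_heap are made-up helper names, and the first one only imitates a parse followed by xmlFreeDoc by allocating and freeing many small blocks.

```c
#include <malloc.h>  /* malloc_trim() is a glibc extension, not standard C */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "parse, then xmlFreeDoc": allocate many small
 * blocks, as a DOM tree would, touch them so they become resident, then
 * free every one. Returns the number of blocks allocated and freed. */
size_t churn_small_blocks(size_t count, size_t block_size)
{
    void **blocks = malloc(count * sizeof *blocks);
    size_t made = 0;

    if (blocks == NULL)
        return 0;
    for (; made < count; made++) {
        blocks[made] = malloc(block_size);
        if (blocks[made] == NULL)
            break;
        memset(blocks[made], 0, block_size);
    }
    for (size_t i = 0; i < made; i++)
        free(blocks[i]);
    free(blocks);
    return made;
}

/* Ask glibc to hand unused heap pages back to the kernel.
 * Returns 1 if some memory was released, 0 otherwise. */
int release_unused_heap(void)
{
    return malloc_trim(0);
}
```

In the program from the question, the equivalent would be to call malloc_trim(0) right after xmlFreeDoc(doc) and then check “top” again. Whether the figure drops, and by how much, depends on the allocator's state.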

Is it? It gives you an xmlDocPtr, which you then store in doc on the stack, but a pointer to data isn’t the same thing as the data.
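To illustrate the distinction without libxml2, here is a small sketch using a made-up struct fake_doc (not libxml2's real xmlDoc): the variable holding the pointer is only pointer-sized, while the data it points to lives on the heap and is released with free().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a parsed document; NOT libxml2's xmlDoc. */
struct fake_doc {
    size_t size;     /* number of payload bytes */
    char  *payload;  /* heap-allocated content */
};

/* Both the struct and its payload come from malloc(), i.e. the heap. */
struct fake_doc *fake_doc_new(size_t size)
{
    struct fake_doc *d = malloc(sizeof *d);

    if (d == NULL)
        return NULL;
    d->payload = malloc(size);
    if (d->payload == NULL) {
        free(d);
        return NULL;
    }
    d->size = size;
    return d;
}

/* Releases the heap memory; the caller's pointer variable is unaffected. */
void fake_doc_free(struct fake_doc *d)
{
    if (d != NULL) {
        free(d->payload);
        free(d);
    }
}
```

A local variable such as struct fake_doc *d = fake_doc_new(1 << 20); occupies sizeof(void *) bytes on the stack, however large the payload is.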

For one thing, you’ve added xmlCleanupParser:

This function name is somewhat misleading. It does not clean up parser state, it cleans up memory allocated by the library itself. It is a cleanup function for the XML library. It tries to reclaim all related global memory allocated for the library processing. It doesn’t deallocate any document related memory. One should call xmlCleanupParser() only when the process has finished using the library and all XML/HTML documents built with it. See also xmlInitParser() which has the opposite function of preparing the library for operations. WARNING: if your application is multithreaded or has plugin support calling this may crash the application if another thread or a plugin is still using libxml2. It’s sometimes very hard to guess if libxml2 is in use in the application, some libraries or plugins may use it without notice. In case of doubt abstain from calling this function or do it just before calling exit() to avoid leak reports from valgrind !

So it sounds like libxml2 caches various things, which of course aren’t necessarily part of the document (and thus not cleared with xmlFreeDoc). Perhaps your file references a DTD or something?

Thank you.
Typically, once the xml has been parsed, the application keeps its own copy of the text it needs from the file. It no longer needs libxml resources such as xml nodes, and it will not parse the xml again, so malloc_trim can be called to force the freed heap memory to be released. I think this is the case in many scenarios: after libxml2 finishes parsing the document, the application saves the information it needs and has no further use for the xml resources. libxml2 should provide a function that releases the heap memory requested by xmlReadFile.
