XML documents come in every format and size. Some are only a few lines long, while others run to several megabytes. You may wonder whether you need to know the size of an XML document at all. When performance becomes the primary concern, you do.
From a performance perspective, there are two ways to process XML documents. Batch processing parses a whole group of documents in one run. Real-time processing handles each document as it arrives. Batch performance is measured by how many documents are processed over a given period of time (throughput). Real-time performance is measured by how long it takes to process a single document (latency).
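The two measurements can be taken side by side. The following is a minimal sketch, assuming Python's standard `xml.dom.minidom` parser; the `make_order` helper and its order schema are invented here purely for illustration.

```python
import time
import xml.dom.minidom


def make_order(order_id):
    """Build a tiny, made-up order document (hypothetical schema)."""
    items = "".join(f"<item sku='{n}'/>" for n in range(10))
    return f"<order id='{order_id}'>{items}</order>"


def measure(documents):
    """Parse every document with DOM and return (throughput, worst_latency)."""
    latencies = []
    start = time.perf_counter()
    for doc in documents:
        t0 = time.perf_counter()
        xml.dom.minidom.parseString(doc)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    throughput = len(documents) / elapsed  # documents per second: the batch view
    return throughput, max(latencies)      # slowest single document: the real-time view


docs = [make_order(i) for i in range(200)]
throughput, worst = measure(docs)
print(f"{throughput:.0f} docs/s, worst latency {worst * 1000:.3f} ms")
```

A batch system would watch the first number, while a real-time system would watch the second.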
Scenarios
Imagine that you have a system that works in real time, such as a Web server. The system receives orders from customers and must respond to each order immediately.
Such a system obviously cannot work in batch mode. As a rough estimate, suppose each order is very simple, with only 10 items, so the resulting XML document is relatively small, perhaps 4KB. In this case, DOM is used to parse each incoming document.
If you receive only a few orders per hour, system performance is not a problem. In the long run, however, the number of orders may grow large enough that performance has to be improved.
Now you start thinking about how to improve performance to accommodate the increased load. Your order documents are already very small, and merging them into larger documents has little practical value in a real-time system. You can scale vertically, by increasing the processing capacity of the existing system, or horizontally, by adding more systems to spread the load.
Now consider a completely different area: a large data warehouse. Unlike the Web server, this system uses FTP to transfer XML documents with an average size of 300MB. If you still use DOM to parse these documents, you will soon be in big trouble, because DOM builds the entire document tree in memory. If you use SAX instead, you can parse the incoming documents as a stream without loading them into memory.
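The difference between the two parsers shows up even on a toy document. The sketch below uses Python's standard `xml.sax` and `xml.dom.minidom` modules; the `<order>` and `<item>` element names are assumptions made for the example, not part of any real schema.

```python
import io
import xml.dom.minidom
import xml.sax


class ItemCounter(xml.sax.ContentHandler):
    """SAX handler that counts <item> elements as they stream past."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1


def count_items_sax(stream):
    # SAX reads the stream in pieces and keeps only the counter in memory.
    handler = ItemCounter()
    xml.sax.parse(stream, handler)
    return handler.count


def count_items_dom(stream):
    # DOM builds the whole tree first, so memory grows with document size.
    document = xml.dom.minidom.parse(stream)
    return len(document.getElementsByTagName("item"))


sample = b"<order><item/><item/><item/></order>"
print(count_items_sax(io.BytesIO(sample)))  # 3
print(count_items_dom(io.BytesIO(sample)))  # 3
```

Both functions return the same answer. For a 300MB document, only the SAX version avoids holding the full tree in memory.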
Changing the Document Size
Sometimes a special situation calls for changing the size of the XML documents. Imagine the same Web server as before, processing XML documents in real time, except that every document is now 400MB instead of 4KB. You can't use DOM because it takes up too much memory, yet because this is a real-time system, performance matters. You can use SAX, but parsing a document that large takes time and a powerful processor.
In this case, you can improve system performance by changing the size of the document. For example, you can split a 400MB document into ten 40MB documents or forty 10MB documents, which is more efficient than processing a single 400MB document. You can then use DOM to read each file into memory and respond to the request for each document in a timely manner. You can also discard irrelevant documents.
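One way to perform such a split is sketched below. It uses `xml.etree.ElementTree.iterparse` from the Python standard library rather than DOM or SAX, and it assumes the large document is a flat list of record elements directly under the root. The `split_records` function and the `orders`/`order` tag names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def split_records(stream, record_tag, per_chunk, root_tag="orders"):
    """Yield small XML documents (bytes), each holding up to per_chunk records.

    Assumes every record is a direct child of the source root element.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, source_root = next(context)  # the first event is the start of the root
    chunk = ET.Element(root_tag)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            chunk.append(elem)
            source_root.clear()  # drop finished records so the source tree stays small
            if len(chunk) == per_chunk:
                yield ET.tostring(chunk)
                chunk = ET.Element(root_tag)
    if len(chunk):
        yield ET.tostring(chunk)  # the final, possibly shorter, chunk


big = "<orders>" + "".join(f"<order id='{i}'/>" for i in range(5)) + "</orders>"
for small in split_records(io.BytesIO(big.encode()), "order", per_chunk=2):
    print(small.decode())
```

Each yielded document is small enough to load with DOM, and a caller could skip any chunk it considers irrelevant.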
A similar situation arises in batch mode. Imagine that you are processing thousands of 4KB documents in batch mode with DOM. A better approach is to combine 1,000 files into a single 4MB file, because loading each document costs system time (whether you use DOM or SAX). By merging 1,000 documents into one, you load only one document, so the loading overhead drops to roughly one thousandth of what it was.
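The merge itself can be quite simple. The following is a minimal sketch using `xml.etree.ElementTree`; the `merge_documents` function and the `batch` wrapper element are names invented for this example, and the code assumes each small document is a well-formed XML string.

```python
import xml.etree.ElementTree as ET


def merge_documents(documents, root_tag="batch"):
    """Wrap the root element of every small document in one new root element."""
    root = ET.Element(root_tag)
    for doc in documents:
        root.append(ET.fromstring(doc))
    return ET.tostring(root)


orders = [f"<order id='{i}'><item sku='a'/></order>" for i in range(3)]
merged = merge_documents(orders)
print(merged.decode())
```

Because this sketch parses each small document in order to merge it, the saving only materializes if the merge happens once upstream, for example where the documents are produced, and the batch job then loads the single combined file.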