Document Object Model (DOM)
Quoted from an IBM developerWorks introductory tutorial
The Document Object Model (commonly called the DOM) defines a set of interfaces for the parsed version of an XML document. The parser reads the entire document and builds an in-memory tree, which your code can then manipulate through the DOM interfaces. You can traverse the tree to see what the original document contains, delete sections of the tree, rearrange it, add new branches, and so on.
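As a rough illustration of those operations, here is a minimal Java sketch using the JDK's built-in JAXP and W3C DOM classes. The class name `DomTreeSketch` and the `<book>`, `<chapter>`, and `<draft>` elements are hypothetical, chosen only for the example. The code parses a small document into a tree, traverses it, deletes branches, and adds a new one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTreeSketch {

    // Read the whole document and build the in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Delete every <draft> element, wherever it sits in the tree.
    static void removeDrafts(Document doc) {
        NodeList drafts = doc.getElementsByTagName("draft");
        // The list is live, so walk it backwards while removing nodes.
        for (int i = drafts.getLength() - 1; i >= 0; i--) {
            Node draft = drafts.item(i);
            draft.getParentNode().removeChild(draft);
        }
    }

    // Add a new <chapter> branch under the root element.
    static void addChapter(Document doc, String title) {
        Element chapter = doc.createElement("chapter");
        chapter.setAttribute("title", title);
        doc.getDocumentElement().appendChild(chapter);
    }

    // Traverse the root's children and collect each chapter title.
    static List<String> chapterTitles(Document doc) {
        List<String> titles = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "chapter".equals(child.getNodeName())) {
                titles.add(((Element) child).getAttribute("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        Document doc = parse("<book><chapter title='One'/><draft/>"
                + "<chapter title='Two'><draft/></chapter></book>");
        removeDrafts(doc);
        addChapter(doc, "Three");
        System.out.println(chapterTitles(doc)); // [One, Two, Three]
        System.out.println(doc.getElementsByTagName("draft").getLength()); // 0
    }
}
```

Because the whole tree is in memory, every one of these calls can run in any order, as many times as needed, after a single parse.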
The DOM was created by the World Wide Web Consortium (W3C) and is an official W3C Recommendation.
The DOM provides a rich set of functions that you can use to interpret and manipulate an XML document, but those functions come at a price. As the original DOM for XML documents was being developed, many people on the xml-dev mailing list raised several concerns about it. First, the DOM builds an in-memory tree of the entire document; if the document is very large, this requires a great deal of memory. Second, the DOM creates objects that represent everything in the original document, including elements, text, attributes, and whitespace; if you only care about a small part of the document, creating objects that will never be used is extremely wasteful. Third, a DOM parser has to read the entire document before your code gets control; for very large documents, this can cause a significant delay.
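To make the second concern concrete, the following Java sketch (the class name `DomNodeCensus` and the `<order>`/`<item>` sample are hypothetical) parses a small indented document with the JDK's default DOM parser and counts the nodes it created. It assumes no DTD and default factory settings, in which case the indentation between tags is kept as text nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomNodeCensus {

    // Parse the document and tally every node the DOM created below the root.
    static Map<String, Integer> census(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        Map<String, Integer> tally = new TreeMap<>();
        visit(doc.getDocumentElement(), tally);
        return tally;
    }

    private static void visit(Node node, Map<String, Integer> tally) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            tally.merge("element", 1, Integer::sum);
            // Attributes are not child nodes, but each one is still an object in memory.
            int attrs = node.getAttributes().getLength();
            if (attrs > 0) {
                tally.merge("attribute", attrs, Integer::sum);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            // Line breaks and indentation between tags become text nodes too.
            String kind = node.getNodeValue().trim().isEmpty() ? "whitespace text" : "text";
            tally.merge(kind, 1, Integer::sum);
        } else {
            tally.merge("other", 1, Integer::sum);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            visit(children.item(i), tally);
        }
    }

    public static void main(String[] args) {
        String xml = "<order>\n  <item sku='A1'>Pen</item>\n  <item sku='B2'>Ink</item>\n</order>";
        System.out.println(census(xml));
        // {attribute=2, element=3, text=2, whitespace text=3}
    }
}
```

Even in this tiny document, more nodes hold formatting whitespace than hold real text, and the DOM builds all of them whether or not your code ever looks at them.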
These problems stem purely from the design of the Document Object Model; apart from them, the DOM API is a very useful way to parse XML documents.
Simple API for XML (SAX)
To address the DOM's problems, the xml-dev participants (led by David Megginson) created the SAX interface. Several features of SAX address these problems. First, a SAX parser sends events to your code: it tells you when it finds the start of an element, the end of an element, text, the start or end of the document, and so on. You decide which events matter to you and what kind of data structure to build to hold the information from them; if you do not explicitly save the data from an event, it is discarded. Second, a SAX parser does not create any objects at all; it simply delivers events to your application. If you want objects built from those events, creating them is up to you. Third, a SAX parser starts sending events as soon as parsing begins. Your code receives an event when the parser finds the start of the document, the start of an element, text, and so on, so your application can start producing results immediately instead of waiting for the whole document to be parsed. Even better, if you are only looking for certain things in the document, your code can throw an exception as soon as it has found what it needs. The exception stops the SAX parser, and your code can then use the data it has collected.
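The event model and the early-exit trick described above can be sketched in Java with the JDK's `SAXParserFactory` and `DefaultHandler`. The class names `SaxFirstMatch`, `TitleHandler`, and `FoundIt`, and the `<library>`/`<book>`/`<title>` sample, are hypothetical. The handler keeps only the text of the first `<title>` and throws a private `SAXException` subclass to stop the parser the moment it has that text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxFirstMatch {

    // The search result, plus how many start tags the parser reported before stopping.
    static final class Result {
        final String title;
        final int elementsSeen;
        Result(String title, int elementsSeen) {
            this.title = title;
            this.elementsSeen = elementsSeen;
        }
    }

    // Hypothetical marker exception, thrown only to stop the parser early.
    static final class FoundIt extends SAXException {
        final String value;
        FoundIt(String value) {
            super("match found");
            this.value = value;
        }
    }

    // Reacts to events as they arrive and keeps only the data it needs.
    static final class TitleHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inTitle = false;
        int elementsSeen = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
            if ("title".equals(qName)) {
                inTitle = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so accumulate it.
            if (inTitle) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if ("title".equals(qName)) {
                // Throwing here stops the parser; the rest of the document is never read.
                throw new FoundIt(text.toString());
            }
        }
    }

    static Result firstTitle(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return new Result(null, handler.elementsSeen); // no <title> anywhere
        } catch (FoundIt found) {
            return new Result(found.value, handler.elementsSeen);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Result r = firstTitle("<library><book><title>First</title></book>"
                + "<book><title>Second</title></book></library>");
        System.out.println(r.title + " after " + r.elementsSeen + " start tags"); // First after 3 start tags
    }
}
```

No tree is built at any point: the only data kept is one `StringBuilder` and a counter, and the second `<book>` is never reported.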
To be fair, SAX parsers have drawbacks of their own. SAX events are stateless: when a SAX parser finds text in an XML document, it sends an event to your code, but that event only gives you the text; it does not tell you which element contains it. If you want to know that, you have to write the state-management code yourself. SAX events are not persistent: if your application needs a data structure that models the XML document, you have to write that code yourself, and if you need data from a SAX event that your code did not store, you have to parse the document again. Finally, SAX is not controlled by a centralized organization. Although this has not been a problem so far, some developers would feel more comfortable if SAX were managed by an organization such as the W3C.
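A common way to write that state-management code is to keep your own stack of open elements. The Java sketch below is one such approach, assuming a simple document without mixed content; the class names `SaxElementStack` and `PathHandler` and the sample elements are hypothetical. Each non-blank run of text is reported together with the path of the elements that enclose it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxElementStack {

    static final class PathHandler extends DefaultHandler {
        // Our own state: names of the currently open elements, outermost first.
        private final Deque<String> openElements = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        final List<String> found = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            flushText();
            openElements.addLast(qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            openElements.removeLast();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The event carries only the characters, not the enclosing element.
            text.append(ch, start, length);
        }

        // One text run may span several characters() calls, so buffer until the next tag.
        private void flushText() {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                found.add(String.join("/", openElements) + " = " + value);
            }
            text.setLength(0);
        }
    }

    static List<String> textWithPaths(String xml) {
        PathHandler handler = new PathHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.found;
    }

    public static void main(String[] args) {
        String xml = "<order><customer>Ada</customer><item><name>Pen</name></item></order>";
        textWithPaths(xml).forEach(System.out::println);
        // order/customer = Ada
        // order/item/name = Pen
    }
}
```

The stack is exactly the context the DOM would have given you for free; with SAX you decide how much of it to keep.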
Which interface is right for you?
To decide which interface is right for you, you need to understand the design points of each interface and what your application will do with the XML documents it works on. The following questions should help you find the right approach.
Are you writing your application in Java? JAXP works with DOM, SAX, and JDOM; if you are writing Java code, you should use JAXP to isolate your code from the details of the various parser implementations.
How will your application be deployed? If it will be deployed as a Java applet and you want to minimize the amount of code to download, keep in mind that SAX parsers are smaller than DOM parsers. Also be aware that JDOM requires a small amount of code in addition to a SAX or DOM parser.
Once you parse the XML document, will you need to access that data more than once? If you need to go back to the parsed version of the XML file, DOM is probably the right choice. When a SAX event fires, it is up to you (the developer) to save it somehow if you need it later; if you need an event you did not save, you have to parse the file again. The DOM, by contrast, saves all of the data automatically.
Do you need only a small amount of content from the XML source? If so, SAX is probably the right choice. SAX does not create objects for everything in the source document; you decide what is important. With SAX, you can check each event to see whether it is relevant to your needs and then process it accordingly. Even better, once you have found what you are looking for, your code can throw an exception that stops the SAX parser completely.
Are you working on a machine with little memory? If so, SAX is your best choice, whatever other factors you might consider.
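The Java and data-access questions can be illustrated together. In the sketch below (the class name `JaxpChoice` and the `<order>`/`<item>`/`<note>` sample are hypothetical), the code touches only the standard `javax.xml.parsers`, `org.w3c.dom`, and `org.xml.sax` packages, so the concrete parser is whatever JAXP finds at runtime; the JDK also lets you select an implementation through the `javax.xml.parsers.DocumentBuilderFactory` system property without changing this code. It contrasts asking two questions of one DOM tree with answering each question through a fresh SAX pass.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class JaxpChoice {

    // DOM through JAXP: the whole tree stays in memory and can be queried repeatedly.
    static Document parseWithDom(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // SAX through JAXP: one pass that keeps nothing but a counter.
    static int countWithSax(String xml, String elementName) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (elementName.equals(qName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<order><item/><item/><note/></order>";

        // Two questions, one parse: the DOM kept everything.
        Document doc = parseWithDom(xml);
        System.out.println(doc.getElementsByTagName("item").getLength()); // 2
        System.out.println(doc.getElementsByTagName("note").getLength()); // 1

        // With SAX, each new question means another pass over the source.
        System.out.println(countWithSax(xml, "item")); // 2
        System.out.println(countWithSax(xml, "note")); // 1

        // The implementation JAXP selected; swapping it needs no change above.
        System.out.println(DocumentBuilderFactory.newInstance().getClass().getName());
    }
}
```

For a small document either approach is cheap; the trade-off only starts to matter when the source is large or the machine has little memory.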