Chapter 2: XML
Parser: a program that reads a file, confirms that it has the correct format, and breaks it down into its constituent elements so that programmers can access them.
The Java library provides two kinds of XML parsers: DOM (Document Object Model) parsers, which read the document into a tree, and SAX (Simple API for XML) streaming parsers, which generate events as they read.
DOM is not well suited to very long XML documents, because the whole tree is held in memory.
If you are interested in only some of the elements in the XML and do not care about their context, consider using SAX.
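To make the SAX side concrete, here is a minimal sketch of a streaming parse that only records element names and never builds a tree. The class name SaxElementNames and the sample XML string are assumptions made up for this illustration, not code from the book.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxElementNames {
    // Collects the name of every element in document order, without building a DOM tree.
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    // Called once for each opening tag the parser encounters.
                    names.add(qName);
                }
            });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<font><name>Zhangsan</name><size>33</size></font>";
        System.out.println(elementNames(xml));
    }
}
```

The handler sees each element as it streams past, so memory use does not grow with the size of the document.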
The interface of the DOM parser has been standardized: the org.w3c.dom package contains the definitions of the interface types, such as Document and Element. Different organizations, such as Apache and IBM, provide DOM parsers that implement these interfaces. Through the JAXP (Java API for XML Processing) library, any of these parsers can be plugged in. The JDK also ships with its own DOM parser, which is the one used in this chapter. As a result, we can use the parser simply by programming against the interfaces and classes above.
Here's how to read in an XML document:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder(); // This is the parser object.
The book lists three kinds of XML sources: File, URL, and InputStream:
File f = ...
Document doc = builder.parse(f);
URL u = ...
Document doc = builder.parse(u.toString()); // DocumentBuilder.parse accepts a URI string, not a URL object
InputStream in = ...
Document doc = builder.parse(in);
It is important to note that if you use an InputStream as the input source, the parser cannot locate other files that the XML refers to by a path relative to the document's own location, such as a DTD. You need to install an entity resolver to fix this problem.
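As a rough sketch of what installing an entity resolver can look like: the resolver below hands the parser the DTD text directly whenever the document asks for font.dtd. The class name ResolverDemo, the file name font.dtd, and the DTD content are all assumptions invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class ResolverDemo {
    // Made-up DTD text standing in for the contents of a font.dtd file.
    static final String FONT_DTD =
        "<!ELEMENT font (name,size)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT size (#PCDATA)>\n";

    // Parses XML from an InputStream and returns the root tag name.
    static String parseRootName(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The stream carries no location, so we tell the parser where to find the DTD.
        builder.setEntityResolver((publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("font.dtd")) {
                return new InputSource(new StringReader(FONT_DTD));
            }
            return null; // fall back to the parser's default lookup
        });
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = builder.parse(in);
            return doc.getDocumentElement().getTagName();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE font SYSTEM \"font.dtd\">"
            + "<font><name>Zhangsan</name><size>33</size></font>";
        System.out.println(parseRootName(xml));
    }
}
```

In real code the resolver would more likely open the DTD from a known directory or from the classpath; a string is used here only to keep the sketch self-contained.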
The next step is to examine the individual parts of the Document object (see the figure).
For example, suppose the following document is processed:
<?xml version="1.0"?>
<font>
<name>Zhangsan</name>
<size>33</size>
</font>
Element root = doc.getDocumentElement(); // Returns the root element, font
root.getTagName(); // Returns the string "font"
To get the child nodes of an element:
NodeList children = root.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    ...
}
Here getLength() returns 5 rather than 2. Why? Because the whitespace between the root element and its child elements, and between the child elements themselves, is also counted: each run of whitespace becomes a child node. If you only want the child elements, you can do this:
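The count is easy to check. The following is a small sketch, assuming the sample document is written with line breaks and indentation between the tags as in the notes; the class name ChildCount and its method are invented for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class ChildCount {
    // Returns the number of child nodes (of any type) directly under the root element.
    static int countChildNodes(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Indented form: whitespace text nodes surround the two child elements.
        String indented = "<?xml version=\"1.0\"?>\n<font>\n  <name>Zhangsan</name>\n  <size>33</size>\n</font>";
        // Compact form: no whitespace between tags, so only the two elements remain.
        String compact = "<font><name>Zhangsan</name><size>33</size></font>";
        System.out.println(countChildNodes(indented));
        System.out.println(countChildNodes(compact));
    }
}
```

The indented form yields whitespace, name, whitespace, size, whitespace; the compact form yields only name and size.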
NodeList children = root.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    if (child instanceof Element) {
        Element childElement = (Element) child;
        ...
    }
}
This processing is still cumbersome, which is why DTDs are introduced later: a DTD can be used to constrain the content of the XML and so remove some of these unnecessary checks.
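The notes do not show the DTD approach at this point. As a hedged sketch of the idea, assuming the document carries a DTD that declares font as containing only elements, the parser can be asked to drop the whitespace between them. The class name DtdWhitespaceDemo and the internal DTD below are assumptions written for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
class DtdWhitespaceDemo {
    // Sample document with a made-up internal DTD declaring the element content.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE font [\n"
        + "<!ELEMENT font (name,size)>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT size (#PCDATA)>\n"
        + "]>\n"
        + "<font>\n  <name>Zhangsan</name>\n  <size>33</size>\n</font>";

    // Counts the root's child nodes with validation and whitespace suppression enabled.
    static int countChildNodes(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);                        // use the DTD
        factory.setIgnoringElementContentWhitespace(true);  // drop whitespace between child elements
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler());      // avoids the "no ErrorHandler set" warning
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countChildNodes(XML));
    }
}
```

With the whitespace nodes gone, the instanceof Element test becomes unnecessary for documents that are known to conform to the DTD.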
Look again at the figure above. The whitespace children are of type Text. Notice also that the values of name and size are Text nodes as well. So how do we get the two values Zhangsan and 33? Naturally, through objects of type Text:
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    if (child instanceof Element) {
        Element childElement = (Element) child;
        Text textNode = (Text) childElement.getFirstChild();
        String text = textNode.getData().trim();
        if (childElement.getTagName().equals("name")) {
            name = text;
        } else if (childElement.getTagName().equals("size")) {
            size = Integer.parseInt(text);
        }
    }
}
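Putting the pieces together, the loop above can be wrapped into a small self-contained program. This is only a sketch: the class names FontReader and FontInfo are invented here, and the guard against a missing text child is an addition that the notes do not include.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class FontReader {
    // Simple holder for the two values read from the document.
    static final class FontInfo {
        final String name;
        final int size;
        FontInfo(String name, int size) { this.name = name; this.size = size; }
    }

    static FontInfo read(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList children = doc.getDocumentElement().getChildNodes();
        String name = null;
        int size = 0;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (!(child instanceof Element)) continue;   // skip whitespace text nodes
            Element childElement = (Element) child;
            Node first = childElement.getFirstChild();
            if (!(first instanceof Text)) continue;      // added guard: empty element or non-text child
            String text = ((Text) first).getData().trim();
            if (childElement.getTagName().equals("name")) {
                name = text;
            } else if (childElement.getTagName().equals("size")) {
                size = Integer.parseInt(text);
            }
        }
        return new FontInfo(name, size);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n<font>\n  <name>Zhangsan</name>\n  <size>33</size>\n</font>";
        FontInfo info = read(xml);
        System.out.println(info.name + " " + info.size);
    }
}
```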
The call to trim() above guards against whitespace introduced by formatting such as the following:
<size>
33
</size>
In this case, the parser includes all the line breaks and spaces in the text node's data.
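A quick way to see the effect is to compare the raw data with the trimmed data. The sketch below assumes a size element laid out across several lines; the class name TrimDemo and its method are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class TrimDemo {
    // Returns the untrimmed data of the root element's first text child.
    static String rawText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Text textNode = (Text) doc.getDocumentElement().getFirstChild();
        return textNode.getData();
    }

    public static void main(String[] args) throws Exception {
        String raw = rawText("<size>\n  33\n</size>");
        // The raw data still contains the line breaks and indentation.
        System.out.println("[" + raw + "]");
        // After trim() only the digits remain, so parseInt succeeds.
        System.out.println(Integer.parseInt(raw.trim()));
    }
}
```

Calling Integer.parseInt on the untrimmed data would throw a NumberFormatException, which is why the trim() matters.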
The last part is getting an element's attributes. Straight to the code:
NamedNodeMap attributes = element.getAttributes();
for (int i = 0; i < attributes.getLength(); i++) {
    Node attribute = attributes.item(i);
    String name = attribute.getNodeName(); // Attribute name
    String value = attribute.getNodeValue(); // Attribute value
}
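The sample font document has no attributes, so here is a sketch with a made-up element that does. The class name ReadAttributes and the unit and value attributes are assumptions for this example only. The attributes are copied into a TreeMap because a NamedNodeMap does not promise any particular order.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
class ReadAttributes {
    // Returns the root element's attributes as a name-to-value map sorted by name.
    static Map<String, String> attributesOfRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NamedNodeMap attributes = doc.getDocumentElement().getAttributes();
        Map<String, String> result = new TreeMap<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            result.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up element with two attributes.
        System.out.println(attributesOfRoot("<size unit=\"pt\" value=\"33\"/>"));
    }
}
```

When only one attribute is needed, Element.getAttribute(name) is a shorter route than walking the whole map.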
Java Core Technology II reading notes