dom4j is an open-source XML parsing library produced by dom4j.org. Its website describes it as follows:
dom4j is an easy to use, open source library for working with XML, XPath and
XSLT on the Java platform using the Java Collections Framework and with full
support for DOM, SAX and JAXP.
dom4j is easy to use; a basic understanding of the XML DOM model is enough to get started. Its own guide is only a single HTML page, but it is quite complete. Little documentation is available in Chinese, so I am writing this short tutorial for everyone's convenience. This article covers only basic usage. If you need to go deeper, please ...... find other material yourself.
An article from the IBM developer community (see the appendix) compared the performance of several XML parsing packages.
dom4j performed very well and ranked at or near the top in several of the tests. (In fact, dom4j's official documentation also cites this comparison.)
So in this project I used dom4j as the XML parsing tool.
JDOM is the more popular parser in China. Each has its strengths, but dom4j's most distinctive feature is its heavy use of interfaces, which is the main reason it is considered more flexible than JDOM. As the masters put it: "program to an interface". dom4j is being adopted more and more widely. If you are comfortable with JDOM, you may as well keep using it and read this article just for comparison. If you are about to choose a parser, consider dom4j.
Its main interfaces are defined in the org.dom4j package:
Attribute
Attribute defines an XML attribute.
Branch
Branch defines the common behavior of nodes that can contain child nodes, such as XML elements (Element) and documents (Document).
CDATA
CDATA defines an XML CDATA section.
CharacterData
CharacterData is a marker interface that identifies character-based nodes, such as CDATA, Comment, and Text.
Comment
Comment defines the behavior of an XML comment.
Document
Document defines an XML document.
DocumentType
DocumentType defines an XML DOCTYPE declaration.
Element
Element defines an XML element.
ElementHandler
ElementHandler defines a handler for Element objects.
ElementPath
ElementPath is used by ElementHandler to obtain information about the path level currently being processed.
Entity
Entity defines an XML entity.
Node
Node defines the polymorphic behavior of all XML nodes in dom4j.
NodeFilter
NodeFilter defines the behavior of a filter or predicate applied to dom4j nodes.
ProcessingInstruction
ProcessingInstruction defines an XML processing instruction.
Text
Text defines an XML text node.
Visitor
Visitor is used to implement the Visitor pattern.
XPath
XPath represents an XPath expression obtained by parsing a string.
The names are largely self-explanatory.
To understand these interfaces, you need to know how they inherit from one another:
interface java.lang.Cloneable
    interface org.dom4j.Node
        interface org.dom4j.Attribute
        interface org.dom4j.Branch
            interface org.dom4j.Document
            interface org.dom4j.Element
        interface org.dom4j.CharacterData
            interface org.dom4j.CDATA
            interface org.dom4j.Comment
            interface org.dom4j.Text
        interface org.dom4j.DocumentType
        interface org.dom4j.Entity
        interface org.dom4j.ProcessingInstruction
With this, much becomes clear at a glance: most of the interfaces inherit from Node. If you know these relationships, you will avoid ClassCastException when you write programs later.
The following are some examples (some taken from the documentation provided by dom4j).
1. Read and parse the XML document:
Reading and writing XML documents relies mainly on the org.dom4j.io package, which provides two different readers, DOMReader and SAXReader.
Both produce the same dom4j Document, so the code that follows is the same whichever you use. This is the benefit of programming to interfaces.
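    import java.io.File;
    import java.net.MalformedURLException;
    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.io.SAXReader;

    // Read XML from a file: takes the file name and returns the parsed Document
    public Document read(String fileName) throws MalformedURLException, DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(new File(fileName));
        return document;
    }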
The reader's read method is overloaded and can read from several kinds of sources, such as InputStream, File, and URL. The resulting Document object represents the entire XML document.
In my experience, the characters are decoded according to the encoding declared in the XML file header. Make sure the encoding names are consistent everywhere, in particular that the encoding named in the header matches the encoding the file is actually saved in.
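The point about the header encoding can be seen in a small sketch. To stay runnable without extra dependencies, it deliberately uses the JDK's built-in JAXP parser rather than dom4j, so it only illustrates the general XML behavior, not dom4j's own API. The class name EncodingDemo and the method rootText are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingDemo {
    // Parses raw bytes and returns the text content of the root element.
    // The parser receives bytes, not characters, so it has to choose the
    // charset itself, and it takes it from the XML declaration.
    static String rootText(byte[] xmlBytes) {
        try {
            org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>";
        // The bytes are produced with the same charset that the header declares.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(bytes));
    }
}
```

If the bytes were written in a different charset than the one the header names, the same call would typically return garbled text or fail with a parse error, which is why the names need to agree.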
2. Get the root node
The second step after reading is to get the root node. Anyone familiar with XML knows that all XML processing starts from the root element.
    import org.dom4j.Document;
    import org.dom4j.Element;

    public Element getRootElement(Document doc) {
        return doc.getRootElement();
    }
3. Traverse the XML tree
dom4j provides at least three ways to traverse nodes:
1) Iterator
    import java.util.Iterator;
    import org.dom4j.Attribute;
    import org.dom4j.Element;

    // Iterate over all child elements
    for (Iterator i = root.elementIterator(); i.hasNext();) {
        Element element = (Element) i.next();
        // do something
    }
    // Iterate over child elements named "foo"
    for (Iterator i = root.elementIterator("foo"); i.hasNext();) {
        Element foo = (Element) i.next();
        // do something
    }
    // Iterate over attributes
    for (Iterator i = root.attributeIterator(); i.hasNext();) {
        Attribute attribute = (Attribute) i.next();
        // do something
    }
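2) Recursion
Recursion can also use an Iterator as its means of enumeration. The example here uses index-based access instead.
    import org.dom4j.Element;
    import org.dom4j.Node;

    // getRootElement() is assumed to be provided by the enclosing class
    public void treeWalk() {
        treeWalk(getRootElement());
    }
    public void treeWalk(Element element) {
        for (int i = 0, size = element.nodeCount(); i < size; i++) {
            Node node = element.node(i);
            if (node instanceof Element) {
                // Descend into child elements
                treeWalk((Element) node);
            } else {
                // do something ...
            }
        }
    }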
3) Visitor pattern
The most exciting feature is dom4j's support for Visitor, which can greatly reduce the amount of code and keeps it easy to understand. Anyone who knows design patterns knows that Visitor is one of the GoF design patterns. Its main principle is that two kinds of classes hold references to each other, and one of them acts as the Visitor that visits many Visitable objects. Let's take a look at the Visitor pattern in dom4j (the quick-start documentation does not cover it).
You only need to write a class that implements the Visitor interface.
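    import org.dom4j.Attribute;
    import org.dom4j.Element;
    import org.dom4j.VisitorSupport;

    public class MyVisitor extends VisitorSupport {
        // Called for every element that is visited
        public void visit(Element element) {
            System.out.println(element.getName());
        }
        // Called for every attribute that is visited
        public void visit(Attribute attr) {
            System.out.println(attr.getName());
        }
    }
Call: root.accept(new MyVisitor());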
The Visitor interface provides several overloads of visit(), so each kind of XML object is visited in its own way.
The example above is a simple implementation for Element and Attribute, which are the two most commonly used. VisitorSupport is the default adapter that dom4j provides for the Visitor interface (the Default Adapter pattern). It supplies empty implementations of all the visit(*) methods to simplify the code.
Note that this visitor traverses all child nodes automatically: calling root.accept(new MyVisitor()) also visits the child nodes. The first time I used it, I assumed I had to traverse the tree myself and called the visitor from inside a recursion. You can imagine the result.
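To make the mechanics concrete, here is a minimal plain-Java sketch of the same idea. It does not use dom4j at all: Elem, Attr, Visitor, VisitorAdapter, and elementNames are hypothetical stand-ins invented for this illustration, and the visit order shown (element, then its attributes, then its children) is an assumption of the sketch. It shows two things described above: a default adapter with empty visit methods, and an accept method that walks the children itself so the caller does not recurse by hand.

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {
    // One visit overload per node type (hypothetical, modeled on the pattern).
    interface Visitor {
        void visit(Elem element);
        void visit(Attr attribute);
    }

    // Default adapter: empty implementations, so a subclass overrides
    // only the overloads it cares about.
    static class VisitorAdapter implements Visitor {
        public void visit(Elem element) { }
        public void visit(Attr attribute) { }
    }

    static class Attr {
        final String name;
        Attr(String name) { this.name = name; }
        void accept(Visitor v) { v.visit(this); }
    }

    static class Elem {
        final String name;
        final List<Attr> attrs = new ArrayList<>();
        final List<Elem> children = new ArrayList<>();
        Elem(String name) { this.name = name; }

        // Visits this element, then passes the visitor on to its
        // attributes and child elements.
        void accept(Visitor v) {
            v.visit(this);
            for (Attr a : attrs) a.accept(v);
            for (Elem c : children) c.accept(v);
        }
    }

    // Collects element names in visit order; attributes are ignored
    // because the adapter's empty visit(Attr) is left in place.
    static List<String> elementNames(Elem root) {
        List<String> names = new ArrayList<>();
        root.accept(new VisitorAdapter() {
            @Override
            public void visit(Elem element) { names.add(element.name); }
        });
        return names;
    }

    public static void main(String[] args) {
        Elem root = new Elem("root");
        Elem author = new Elem("author");
        author.attrs.add(new Attr("name"));
        root.children.add(author);
        System.out.println(elementNames(root)); // prints [root, author]
    }
}
```

A single accept call on the root reaches the nested author element. Wrapping that call in a hand-written recursion, as I did at first, would visit the nested nodes more than once.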
4. XPath support
dom4j has good support for XPath. To access a node, you can select it directly with an XPath expression.
    import java.util.List;
    import org.dom4j.Document;
    import org.dom4j.Node;

    public void bar(Document document) {
        List list = document.selectNodes("//foo/bar");
        Node node = document.selectSingleNode("//foo/bar/author");
        String name = node.valueOf("@name");
    }
For example, if you want to find all the hyperlinks in an XHTML document, the following code does it:
    import java.util.Iterator;
    import java.util.List;
    import org.dom4j.Attribute;
    import org.dom4j.Document;
    import org.dom4j.DocumentException;

    public void findLinks(Document document) throws DocumentException {
        // Select the href attribute of every <a> element
        List list = document.selectNodes("//a/@href");
        for (Iterator iter = list.iterator(); iter.hasNext();) {
            Attribute attribute = (Attribute) iter.next();
            String url = attribute.getValue();
        }
    }
5. Converting between strings and XML
You often need to convert a string to XML or vice versa:
    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;

    // Convert XML to a string
    Document document = ...;
    String text = document.asXML();

    // Convert a string to XML (parseText throws DocumentException on malformed input)
    String text = "<person> <name>James</name> </person>";
    Document document = DocumentHelper.parseText(text);
6. Use XSLT to convert XML
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamSource;
    import org.dom4j.Document;
    import org.dom4j.io.DocumentResult;
    import org.dom4j.io.DocumentSource;

    public Document styleDocument(
        Document document,
        String stylesheet
    ) throws Exception {
        // Load the transformer using JAXP
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(stylesheet)
        );
        // Now let's style the given document
        DocumentSource source = new DocumentSource(document);
        DocumentResult result = new DocumentResult();
        transformer.transform(source, result);
        // Return the transformed document
        Document transformedDoc = result.getDocument();
        return transformedDoc;
    }
7. Create XML
You usually create the XML before writing it to a file. This is as easy as using a StringBuffer.
    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;

    public Document createDocument() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("root");
        Element author1 =
            root
                .addElement("author")
                .addAttribute("name", "James")
                .addAttribute("location", "UK")
                .addText("James Strachan");
        Element author2 =
            root
                .addElement("author")
                .addAttribute("name", "Bob")
                .addAttribute("location", "US")
                .addText("Bob McWhirter");
        return document;
    }
8. File output
A simple way to produce output is to write a Document or any Node through its write method.
    import java.io.FileWriter;

    FileWriter out = new FileWriter("foo.xml");
    document.write(out);
    out.close(); // flush and release the file
If you want to change the output format, for example to pretty-print the output or to make it compact, you can use the XMLWriter class.
    import java.io.FileWriter;
    import java.io.IOException;
    import org.dom4j.Document;
    import org.dom4j.io.OutputFormat;
    import org.dom4j.io.XMLWriter;

    public void write(Document document) throws IOException {
        // Write to the specified file
        XMLWriter writer = new XMLWriter(
            new FileWriter("output.xml")
        );
        writer.write(document);
        writer.close();
        // Pretty-print the document to System.out
        OutputFormat format = OutputFormat.createPrettyPrint();
        writer = new XMLWriter(System.out, format);
        writer.write(document);
        writer.flush();
        // Compact format to System.out
        format = OutputFormat.createCompactFormat();
        writer = new XMLWriter(System.out, format);
        writer.write(document);
        writer.flush();
    }
dom4j is simple enough, isn't it? Of course, some more advanced uses, such as ElementHandler, have not been covered here. If you are tempted, come and use dom4j with us.