Java XML Processing Technology, Part One (Parsing and Generating XML)


XML technology grew up alongside Java. Before XML, simple data was usually stored in text files such as INI configuration files, and complex data was stored in custom file formats, so every file format needed its own dedicated parser. XML solved this problem: a program now deals with XML files in a fixed format and can process them through a standard API.

XML files are widely used in the case system. For example, Clientconfig.xml and serverconfig.xml use XML as their configuration file format, and the metadata files and the metadata loader also depend on XML. This chapter therefore gives a systematic explanation of XML file processing techniques.

1.1 Comparison of XML Processing Technologies

XML processing technologies in the Java world fall broadly into two categories: XML APIs and OXMapping. XML APIs are the foundation of XML processing; the options include JDOM, dom4j, and others. OXMapping is short for Object-XML Mapping. It hides the details of low-level XML operations: you can map an XML file to a JavaBean object, or save a JavaBean object as an XML file. The options include XStream, Digester, Castor, and others. The relationship between XML APIs and OXMapping resembles that between JDBC and ORMapping: OXMapping is implemented internally on top of an XML API, and the two approach XML processing at different levels.

XML API

The most popular of these XML processing technologies are JDOM and dom4j, which are used in very similar ways. However, dom4j has clear advantages over JDOM:

dom4j makes heavy use of interfaces, which makes it more flexible and extensible than JDOM;

dom4j performs better than JDOM;

dom4j supports advanced features such as XPath.

Because of these advantages, many open source projects have started to use dom4j for XML parsing, and this book also uses dom4j as its first choice for XML processing.

OXMapping

Parsing with an XML API is somewhat cumbersome. Inspired by ORMapping, people invented OXMapping. With OXMapping we can map an XML file to a JavaBean object, or save a JavaBean object as an XML file. This greatly reduces development effort and lets developers focus on application-level concerns.
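To make the idea of mapping between XML and a JavaBean concrete, here is a minimal hand-written sketch that uses only the JDK's built-in DOM parser. It is not how XStream, Digester, or Castor work internally; the class names EmployeeMapping and Employee are hypothetical, and the point is only to show the kind of boilerplate an OXMapping framework takes over.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of any OXMapping framework.
class EmployeeMapping {
    /** A plain JavaBean standing in for the mapped object. */
    static class Employee {
        private String number;
        private String name;
        public String getNumber() { return number; }
        public void setNumber(String number) { this.number = number; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** XML to object: copy the attributes of an employee element into a bean. */
    static Employee fromXml(String xml) {
        try {
            Element e = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Employee emp = new Employee();
            emp.setNumber(e.getAttribute("number"));
            emp.setName(e.getAttribute("name"));
            return emp;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Cannot parse employee XML", ex);
        }
    }

    /** Object to XML: no escaping is done, so this is only safe for the demo data. */
    static String toXml(Employee emp) {
        return "<employee number=\"" + emp.getNumber()
                + "\" name=\"" + emp.getName() + "\"/>";
    }

    public static void main(String[] args) {
        Employee emp = fromXml("<employee number=\"001\" name=\"Tom\"/>");
        System.out.println(emp.getName() + " / " + emp.getNumber());
        System.out.println(toXml(emp));
    }
}
```

Every new field would need another line in both fromXml and toXml; an OXMapping framework derives that mapping for you.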

Many OXMapping frameworks have emerged in the open source world, including XStream, Digester, and Castor. XStream and Digester define the mapping process in code, whereas Castor requires a mapping configuration file similar to cfg.xml in Hibernate. Compared with Digester, XStream's main advantage is that it is more compact and more convenient to use. Digester, however, is a subproject of Apache, a well-known open source brand, so more reference material is available online for it than for XStream. Fortunately XStream is concise enough that this does not hamper its use much.

1.2 Using dom4j

dom4j is an easy-to-use, open source library for working with XML, XPath, and XSLT. It runs on the Java platform, uses the Java Collections Framework, and fully supports DOM, SAX, and JAXP. dom4j is an open source project on SourceForge.net at http://sourceforge.net/projects/dom4j.

Interface-based programming is a significant advantage of dom4j. The inheritance diagram of its main interfaces is shown below:

Figure 5.1

Most of these interfaces are defined in the package org.dom4j. The meaning of each is briefly described below:

Table 5.1 Main dom4j interfaces

Interface              Description
Node                   Base interface for all XML nodes in dom4j
Attribute              Defines an XML attribute
Branch                 Defines common behavior for nodes that can contain child nodes, such as XML elements (Element) and documents (Document)
Document               Defines an XML document
Element                Defines an XML element
DocumentType           Defines an XML DOCTYPE declaration
Entity                 Defines an XML entity
CharacterData          Marker interface that identifies character-based nodes such as CDATA, Comment, and Text
CDATA                  Defines an XML CDATA section
Comment                Defines the behavior of an XML comment
Text                   Defines an XML text node
ProcessingInstruction  Defines an XML processing instruction

Reading an XML file

The most common task in XML applications is reading and parsing XML files. dom4j provides several ways to read an XML document, including DOM tree traversal, the Visitor pattern, and XPath.

Whichever way we choose, we start by constructing a Document object from an XML file:

    SAXReader reader = new SAXReader();
    // xmlPath is a placeholder for the path of the XML file to read
    Document document = reader.read(new File(xmlPath));

Here we use SAXReader as the XML reader. If the XML has already been parsed into a W3C DOM tree (org.w3c.dom.Document), DOMReader can convert that tree into a dom4j Document instead:

    DOMReader reader = new DOMReader();
    // w3cDocument is a placeholder for an existing org.w3c.dom.Document
    Document document = reader.read(w3cDocument);

The read method of SAXReader has several overloads, which can read XML documents from sources such as an InputStream, a File, or a URL.

(1) DOM tree traversal

This approach treats the document as an ordinary tree: to read the value of a node in the XML, use a standard tree traversal to locate the node you want.

To traverse the DOM tree, first get the root node of the tree:

    Element root = document.getRootElement();

Once you have the root node, you can read downward one level at a time:

    // Iterate over all child elements
    for (Iterator i = root.elementIterator(); i.hasNext();)
    {
        Element element = (Element) i.next();
        // do something
    }

    // Iterate over the child elements named "foo"
    for (Iterator i = root.elementIterator("foo"); i.hasNext();)
    {
        Element foo = (Element) i.next();
        // do something
    }

    // Iterate over the attributes
    for (Iterator i = root.attributeIterator(); i.hasNext();)
    {
        Attribute attribute = (Attribute) i.next();
        // do something
    }

(2) Visitor pattern

DOM tree traversal is the most common way of reading XML, and other XML parsing engines such as JDOM read XML the same way. dom4j offers another way of reading: the Visitor approach. It implements the Visitor pattern, so the caller only has to write a Visitor. The Visitor pattern makes it easy to add new operations, and it lets a visitor gather related operations in one place while keeping unrelated ones apart.

A custom Visitor must implement the org.dom4j.Visitor interface. dom4j also provides a default adapter, org.dom4j.VisitorSupport, in the style of the Adapter pattern.

    import org.dom4j.Attribute;
    import org.dom4j.Element;
    import org.dom4j.VisitorSupport;

    public class DemoVisitor extends VisitorSupport
    {
        // Called once for every element in the tree
        public void visit(Element element)
        {
            System.out.println(element.getName());
        }

        // Called once for every attribute in the tree
        public void visit(Attribute attr)
        {
            System.out.println(attr.getName());
        }
    }

The Visitor is then passed to a node's accept method to start the traversal:

    root.accept(new DemoVisitor());

This approach visits every node in the tree, so it is somewhat slower.
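The mechanism behind accept can be sketched without dom4j. The following is a minimal, self-contained illustration of the Visitor pattern on a hypothetical two-type node model (Elem and Attr are invented for this sketch and are far simpler than dom4j's interfaces). VisitorAdapter plays the role that VisitorSupport plays in dom4j: it supplies empty methods so a concrete visitor overrides only what it needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature node model, for illustration only.
class VisitorSketch {
    interface Visitor {
        void visit(Elem element);
        void visit(Attr attribute);
    }

    /** Adapter with empty methods so visitors override only what they need. */
    static class VisitorAdapter implements Visitor {
        public void visit(Elem element) { }
        public void visit(Attr attribute) { }
    }

    static class Attr {
        final String name;
        Attr(String name) { this.name = name; }
        void accept(Visitor v) { v.visit(this); }
    }

    static class Elem {
        final String name;
        final List<Attr> attributes = new ArrayList<>();
        final List<Elem> children = new ArrayList<>();
        Elem(String name) { this.name = name; }

        /** Visits this element, then its attributes, then its children (depth first). */
        void accept(Visitor v) {
            v.visit(this);
            for (Attr a : attributes) a.accept(v);
            for (Elem c : children) c.accept(v);
        }
    }

    /** Collects element names; attributes fall through to the adapter's empty method. */
    static List<String> elementNames(Elem root) {
        final List<String> names = new ArrayList<>();
        root.accept(new VisitorAdapter() {
            @Override public void visit(Elem element) { names.add(element.name); }
        });
        return names;
    }

    public static void main(String[] args) {
        Elem dept = new Elem("department");
        Elem emp = new Elem("employee");
        emp.attributes.add(new Attr("name"));
        dept.children.add(new Elem("name"));
        dept.children.add(emp);
        System.out.println(elementNames(dept)); // prints [department, name, employee]
    }
}
```

The traversal logic lives in accept, so adding a new operation means writing a new visitor rather than touching the node classes. It also shows why the approach touches every node: accept has no way to skip subtrees the visitor does not care about.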

(3) XPath

One of dom4j's most attractive features is its integrated XPath support. Not every XML parsing engine supports XPath, but it is a very useful feature.

XPath is a language for addressing, searching, and matching parts of a document. It uses path notation, similar to the paths used in file systems and URLs, to specify and match parts of the document. For example, the XPath /x/y/z starts at the document's root element x, descends to its child elements y, and selects their child elements z. An expression returns all nodes that match the specified path. /x/y/* returns every child element of the y elements whose parent is x. /x/y[@name='a'] matches the y elements under x that have an attribute called name whose value is a.
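The three expressions above can be tried out directly. The sketch below uses the JDK's built-in javax.xml.xpath package rather than dom4j so that it runs without extra libraries; the sample document and the class name XPathExpressions are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class using the JDK XPath API, not dom4j.
class XPathExpressions {
    // Sample document: root x with two y children, holding three grandchildren in total.
    static final String XML =
          "<x>"
        + "<y name=\"a\"><z/><w/></y>"
        + "<y name=\"b\"><z/></y>"
        + "</x>";

    /** Returns how many nodes of the sample document match the expression. */
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    expression, new InputSource(new StringReader(XML)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/x/y/z"));           // prints 2
        System.out.println(count("/x/y/*"));           // prints 3
        System.out.println(count("/x/y[@name='a']"));  // prints 1
    }
}
```

Note that the attribute value must be quoted: without the quotes, a would be read as the name of a child element rather than as a string.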

XPath greatly simplifies XML handling: the user tells the engine which part of the document to match with an expression, and the XPath engine does the actual matching. This is much closer to how people naturally think. Let's look at a practical example:

There is an XML file that records basic information about a department:

    <?xml version="1.0" encoding="GB2312"?>
    <department>
        <name>Development Department</name>
        <level>2</level>
        <employeeList>
            <employee number="001" name="Tom"/>
            <employee number="002" name="Jim"/>
            <employee number="003" name="Lily"/>
        </employeeList>
    </department>

Here name is the department name, level is the department level, and employeeList is the department's list of employees. The program below reads this file and prints the department's information.

Code 5.1 XPath demo

    // This fragment runs inside a method that declares "throws DocumentException".
    InputStream inStream = null;
    try
    {
        inStream = Dom4jDemo01.class.getResourceAsStream(
                "/com/cownew/char0502/department01.xml");
        SAXReader reader = new SAXReader();
        // Pass the stream itself so that the parser honors the encoding
        // declared in the XML header (GB2312 here).
        Document doc = reader.read(inStream);

        Node nameNode = doc.selectSingleNode("//department/name");
        System.out.println("Department name: " + nameNode.getText());

        Node levelNode = doc.selectSingleNode("//department/level");
        System.out.println("Department level: " + levelNode.getText());

        // XPath is case-sensitive: the element is "employeeList".
        List employeeNodeList = doc
                .selectNodes("//department/employeeList/employee");
        System.out.println("Department employees:");
        for (int i = 0, n = employeeNodeList.size(); i < n; i++)
        {
            Element employeeElement = (Element) employeeNodeList.get(i);
            String name = employeeElement.attributeValue("name");
            String number = employeeElement.attributeValue("number");
            System.out.println(name + ", employee number: " + number);
        }
    } finally
    {
        // ResourceUtils is not part of dom4j or the JDK; presumably a
        // helper from the case system that closes the stream quietly.
        ResourceUtils.close(inStream);
    }

Output:

    Department name: Development Department
    Department level: 2
    Department employees:
    Tom, employee number: 001
    Jim, employee number: 002
    Lily, employee number: 003

With XPath we can navigate directly to a specific node with a clear expression such as "//department/name". Use the selectSingleNode method to locate a single node and the selectNodes method to locate multiple nodes.

All XML files in the case system are parsed with XPath, including in Clientconfig.java, Serverconfig.java, Entitymetadataparser.java, and so on.

Creating XML files

Creating an XML file in dom4j works much as it does in other XML engines: first build a tree of nodes starting from the root node of a Document, then call the appropriate I/O classes to save the XML to the target medium.

The following demonstrates generating the department XML file shown above:

Code 5.2 XML creation demo

    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.dom4j.Document;
    import org.dom4j.DocumentHelper;
    import org.dom4j.Element;
    import org.dom4j.io.OutputFormat;
    import org.dom4j.io.XMLWriter;

    public class Dom4jDemo02
    {
        public static void main(String[] args)
        {
            // Create a Document object
            Document document = DocumentHelper.createDocument();

            // Add the root node "department"
            Element departElement = document.addElement("department");

            // Add the "name" node
            Element departNameElement = DocumentHelper.createElement("name");
            departNameElement.setText("Development Department");
            departElement.add(departNameElement);

            // Add the "level" node
            Element departLevelElement = DocumentHelper.createElement("level");
            departLevelElement.setText("2");
            departElement.add(departLevelElement);

            // Add the employee list node "employeeList"
            Element employeeElementList = DocumentHelper
                    .createElement("employeeList");
            departElement.add(employeeElementList);

            // Add the "employee" nodes and their attributes
            Element emp1Element = DocumentHelper.createElement("employee");
            emp1Element.addAttribute("number", "001");
            emp1Element.addAttribute("name", "Tom");
            employeeElementList.add(emp1Element);

            Element emp2Element = DocumentHelper.createElement("employee");
            emp2Element.addAttribute("number", "002");
            emp2Element.addAttribute("name", "Jim");
            employeeElementList.add(emp2Element);

            Element emp3Element = DocumentHelper.createElement("employee");
            emp3Element.addAttribute("number", "003");
            emp3Element.addAttribute("name", "Lily");
            employeeElementList.add(emp3Element);

            try
            {
                writeToFile(document, "C:/department.xml");
            } catch (IOException e)
            {
                e.printStackTrace();
            }
        }

        private static void writeToFile(Document document, String file)
                throws IOException
        {
            // Pretty-print format
            OutputFormat format = OutputFormat.createPrettyPrint();
            format.setEncoding("GB2312");
            XMLWriter writer = null;
            try
            {
                // Writing to an OutputStream lets XMLWriter encode the bytes
                // with the encoding set on the format; a FileWriter would use
                // the platform default charset instead.
                writer = new XMLWriter(new FileOutputStream(file), format);
                writer.write(document);
            } finally
            {
                if (writer != null)
                    writer.close();
            }
        }
    }

After running the program, you will find department.xml under C:/ with the same contents as the department file shown above.

There are two points to keep in mind here:

(1) OutputFormat format = OutputFormat.createPrettyPrint()

XML often needs to be read by people. By default dom4j writes a compact format, which saves space but is hard to read, so we use the pretty-print format for output.

(2) format.setEncoding("GB2312")

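dom4j's default encoding is "UTF-8". If the encoding declared in the XML header does not match the charset actually used to write the bytes (for example, when a FileWriter writes with a GBK platform default), Chinese characters come out garbled, so here we set the encoding to "GB2312".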
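The effect of such a mismatch can be reproduced with plain JDK strings, independent of dom4j. In this sketch the class name EncodingMismatch is hypothetical, and ISO-8859-1 merely stands in for "some charset other than the one the bytes were written in", because it is guaranteed to be available on every JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; it does not use dom4j.
class EncodingMismatch {
    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        return new String(text.getBytes(writeAs), readAs);
    }

    public static void main(String[] args) {
        // Three Chinese characters meaning "development department",
        // written as Unicode escapes to keep this source file ASCII-only.
        String name = "\u5f00\u53d1\u90e8";

        // Same charset on both sides: the text survives.
        System.out.println(name.equals(
                roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));      // prints true

        // Written as UTF-8 but read as ISO-8859-1: the text is garbled.
        System.out.println(name.equals(
                roundTrip(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1))); // prints false
    }
}
```

An XML parser trusts the encoding named in the header, so the header and the bytes on disk must agree, whichever encoding is chosen.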

This example creates nodes with the createElement method of dom4j's DocumentHelper utility class. DocumentHelper also has methods such as public static CDATA createCDATA(String text), public static Comment createComment(String text), and public static Entity createEntity(String name, String text), which help us create nodes more quickly. It also provides a parseText method that parses a string directly into a Document object.
