A Brief Introduction to Java XML Programming

Source: Internet
Author: User

As I understand it, XML is a standard format for holding data. XML and HTML are quite different things; people like to compare them mainly because both use tags to mark up a document. For me, XML is a simple and convenient kind of data file. It differs from a relational database, which views data as two-dimensional tables and retrieves it through basic relational operations. XML simply treats the data as a document, and the data is obtained by parsing that document. So, in my view, all you need to manipulate an XML document is a parser that can interpret it: the parsed content is converted into the data the program needs, and when the program has finished, the data can be written back. This article introduces two kinds of XML parser API for Java, DOM and SAX, together with a third related API, JDOM.


For convenience, this article uses one simple XML document as a running example. It stores information about my books: the title, the author, the ISBN, and the price of each book.

The XML example is as follows:

<?xml version="1.0" encoding="gb2312"?>
<books>
  <book id="1">
    <bookName>
      Programming Pearls
    </bookName>
    <bookAuthor>
      Jon Bentley
    </bookAuthor>
    <bookISBN>
      7-5083-1914-1
    </bookISBN>
    <bookPrice>
      28.0
    </bookPrice>
  </book>

  <book id="2">
    <bookName>
      Thinking in Java (2nd Edition)
    </bookName>
    <bookAuthor>
      Bruce Eckel
    </bookAuthor>
    <bookISBN>
      7-111-10441-2
    </bookISBN>
    <bookPrice>
      99.0
    </bookPrice>
  </book>

  <book id="3">
    <bookName>
      Inside VCL (Deep core VCL architecture analysis)
    </bookName>
    <bookAuthor>
      Levar
    </bookAuthor>
    <bookISBN>
      7-5053-9489-4
    </bookISBN>
    <bookPrice>
      80.0
    </bookPrice>
  </book>
</books>


The XML records three books and their related information.

Working with an XML document can generally be divided into three steps:

1. Create an XML parser.

2. Associate the parser with the XML file.

3. Use the parser to interpret the XML tags.


XML parsers can broadly be divided into the following types:

• Validating and non-validating parsers

• Parsers that support one or more XML schema languages

• Parsers that support the Document Object Model (DOM)

• Parsers that support the Simple API for XML (SAX)
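The difference between the first kind of parser in the list, validating and non-validating, can be seen with a small experiment. The following is only a sketch under stated assumptions: the class name ValidationDemo and the sample document are made up for illustration, and the document carries its own tiny DTD that allows only book elements inside books. The same text is parsed twice through the standard JAXP DocumentBuilderFactory, once with setValidating(true) and once without, counting the errors reported to the error handler.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo {
    // Hypothetical sample: the DTD allows only <book> inside <books>,
    // but the document contains a <magazine> element.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE books [<!ELEMENT books (book*)><!ELEMENT book (#PCDATA)>]>"
        + "<books><magazine/></books>";

    // Parses the text and returns how many validity errors the parser reported.
    static int countValidityErrors(String xml, boolean validating) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final int[] errors = {0};
        // DefaultHandler also implements ErrorHandler; only error() is overridden
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors[0]++;
            }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("validating:     " + countValidityErrors(XML, true));
        System.out.println("non-validating: " + countValidityErrors(XML, false));
    }
}
```

The document is well-formed in both runs, so both parses complete; only the validating parser checks it against the DTD and reports the mismatch.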


This article is mainly about the last two kinds of parser.


Document Object Model (DOM) parser:

DOM is a standard parser API developed by the World Wide Web Consortium (W3C). Any programming interface that conforms to this standard can be used to manipulate XML. The standard currently has three main levels: Level 1, Level 2, and Level 3. Only Level 2 is discussed here. The DOM model transforms the data in an XML file into a tree in memory, built roughly from Document, Node, NodeList, and Element objects. A program works with the XML document by traversing and manipulating this tree. One consequence of working on a tree is that some details of the original text are not visible to the program. Take these ways of writing an attribute value:

id="1"

id=
"1"

id='1'


To DOM these are exactly the same. DOM does not record how the document was formatted; it only transforms the data in the document into a tree, and the original formatting does not appear in the DOM.
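The point about attribute formatting can be checked directly. The sketch below parses the three spellings with the standard JAXP DOM parser and reads the attribute back; the class name AttributeFormatDemo and the one-element sample documents are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttributeFormatDemo {
    // Parses a one-element document and returns the value of its id attribute.
    static String idOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        String[] variants = {
            "<book id=\"1\"/>",      // double quotes
            "<book id=\n\"1\"/>",    // line break after the equals sign
            "<book id='1'/>"         // single quotes
        };
        for (String xml : variants) {
            // Each variant yields the same attribute value
            System.out.println(idOf(xml));
        }
    }
}
```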


 

Simple API for XML (SAX) parser

SAX is a programming model in which the parser does only the parsing and leaves the actual processing to the programmer. Unlike DOM, it does not load the entire XML document into memory. It reads the document sequentially and notifies the program through events; the program decides what to do with each event, much like writing event-driven code. SAX therefore has the advantage in memory usage and parsing speed. DOM and SAX each suit particular situations.

The main SAX events are:

startDocument, at the start of the document

startElement and endElement, for elements

characters, for character text

endDocument, at the end of the document
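To see the order in which these events arrive, a handler can simply record them. The following is a minimal sketch, not a complete application: the class name SaxEventTrace and the tiny sample document are assumptions, and the text from characters events is collected in a separate buffer because a parser is allowed to deliver it in more than one call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several pieces, so it is kept apart from the event list
        text.append(ch, start, length);
    }

    // Parses the given XML text and returns the handler holding the recorded events.
    static SaxEventTrace trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        SaxEventTrace t = trace("<book><bookName>Sample</bookName></book>");
        System.out.println(t.events);
        System.out.println(t.text);
    }
}
```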


JDOM API

DOM and SAX provide very rich functionality, but that richness is also a burden for developers. Because of this complexity, two Java experts from the open source community, Jason Hunter and Brett McLaughlin, launched JDOM as a simpler API. JDOM provides adapters so that a specific XML parser can be selected behind it. It also handles XML as a tree, and JDOM's tree is simpler than DOM's. It borrows a little from both DOM and SAX, at the cost of some flexibility. If you only need simple processing, JDOM is a good choice.


When to use each API


DOM model:

You need an overall understanding of the entire document structure

You need to modify the document structure

You need to refer to the document multiple times

SAX model:

Memory is limited

You only need to read some of the elements in the XML document

You only need to go through the XML document once


Introduction to the XML Document Object Model (DOM):

DOM was introduced by the W3C to provide a well-defined structure for manipulating the markup in XML, similar to the model used for analyzing HTML tags. It defines a set of interfaces that each platform can implement. An application written against the DOM interfaces can therefore use any parser that implements them, and it is easy to migrate to other platforms that implement the DOM programming interface.

The interface provided by standard DOM is a set of objects that describe the tree structure of an XML document, together with the methods through which those objects manipulate the document. The application simply calls these objects and their methods, which is equivalent to manipulating the XML file and makes working with it much simpler. Becoming familiar with the DOM programming model therefore means becoming familiar with the objects and methods that DOM provides.


Let's look at the objects and methods that DOM provides:

Document:

This object describes the entire XML document, that is, the whole tree. The tree is a single object that contains many nodes, and each of these nodes is a Node.


Node:

Node is a fairly abstract concept, so in Java it is defined as an interface. Several sub-interfaces extend it:

Element: represents an element in the document

Attr: represents an attribute in the XML file

Text: represents the text in a node

Other node types include Comment, which represents a comment in an XML file, ProcessingInstruction, which represents a processing instruction, and CDATASection, which represents a CDATA section. Interested readers can refer to the relevant documentation; they are not explained in detail here.
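As a rough illustration of these node types, the sketch below parses a small document that contains a comment, some text, a processing instruction, and a CDATA section, and reports which interface each child of the root belongs to. The class name NodeTypesDemo and the sample document are assumptions made for this example; the node type constants come from org.w3c.dom.Node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypesDemo {
    // Hypothetical sample with one child of each kind discussed above
    static final String XML =
        "<book id=\"1\"><!--note-->Title<?app hint?><![CDATA[<raw>]]></book>";

    // Maps a node to the name of the DOM interface that represents it.
    static String kind(Node n) {
        switch (n.getNodeType()) {
            case Node.ELEMENT_NODE: return "Element";
            case Node.ATTRIBUTE_NODE: return "Attr";
            case Node.TEXT_NODE: return "Text";
            case Node.COMMENT_NODE: return "Comment";
            case Node.CDATA_SECTION_NODE: return "CDATASection";
            case Node.PROCESSING_INSTRUCTION_NODE: return "ProcessingInstruction";
            default: return "Other";
        }
    }

    // Parses the text and returns its root element.
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Lists the kinds of the root element's children in document order.
    static List<String> childKinds(String xml) throws Exception {
        List<String> kinds = new ArrayList<>();
        for (Node c = root(xml).getFirstChild(); c != null; c = c.getNextSibling()) {
            kinds.add(kind(c));
        }
        return kinds;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childKinds(XML));
        // Attributes are nodes too, but they are not children of the element
        System.out.println(kind(root(XML).getAttributeNode("id")));
    }
}
```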


The following methods are often used when working with DOM:


Document.getDocumentElement(): returns the root element of the DOM tree. (This method is defined on the Document interface only; the other Node subtypes do not define it.)

Node.getFirstChild() and Node.getLastChild(): return the first and last child of a given Node.

Node.getNextSibling() and Node.getPreviousSibling(): return the next and previous sibling of the given Node.

Element.getAttribute(String attrName): returns the value of the attribute named attrName for a given Element. If you need the value of the id attribute, you can use Element.getAttribute("id"). If the attribute does not exist, the method returns an empty string ("").
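These methods can be tried out on a small document. The following is a sketch only: the class name DomNavigationDemo and the sample XML are assumptions. It walks the children of the root with getFirstChild() and getNextSibling(), and it also shows that the whitespace between tags becomes Text nodes, which is why the loop filters for Element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNavigationDemo {
    // Hypothetical sample; the line breaks and indentation are part of the document
    static final String XML =
        "<book id=\"1\">\n  <bookName>Sample</bookName>\n  <bookPrice>1.0</bookPrice>\n</book>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the tag names of the root's element children, in document order.
    static List<String> childElementNames(Document doc) {
        List<String> names = new ArrayList<>();
        Element root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {          // skip whitespace Text nodes
                names.add(((Element) n).getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element root = doc.getDocumentElement();
        System.out.println(childElementNames(doc));
        System.out.println("id = '" + root.getAttribute("id") + "'");
        // A missing attribute yields an empty string, not null
        System.out.println("lang = '" + root.getAttribute("lang") + "'");
        // The first child is the whitespace before <bookName>, a Text node
        System.out.println(root.getFirstChild().getNodeType() == Node.TEXT_NODE);
    }
}
```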


Let's do a basic DOM example that simply displays the contents of the bookxml.xml document above. This is the most basic thing DOM does.


The code is as follows:


import javax.xml.parsers.*;
import org.w3c.dom.*;

public class DomXml {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parse the file into an in-memory DOM tree
        Document doc = builder.parse("bookxml.xml");
        // Collect every <book> element in document order
        NodeList nl = doc.getElementsByTagName("book");
        for (int i = 0; i < nl.getLength(); i++) {
            Element node = (Element) nl.item(i);
            // Each field element has a single text child; trim() removes
            // the line breaks around the text
            System.out.println("bookName is: " +
                node.getElementsByTagName("bookName").item(0).getFirstChild().getNodeValue().trim());
            System.out.println("bookAuthor is: " +
                node.getElementsByTagName("bookAuthor").item(0).getFirstChild().getNodeValue().trim());
            System.out.println("bookISBN is: " +
                node.getElementsByTagName("bookISBN").item(0).getFirstChild().getNodeValue().trim());
            // With a single text child, getLastChild() returns the same node as getFirstChild()
            System.out.println("bookPrice is: " +
                node.getElementsByTagName("bookPrice").item(0).getLastChild().getNodeValue().trim() + "\n");
        }
    }
}


The code is simple: it just prints the results.


 

 

The first thing to notice in this code is these three lines:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

DocumentBuilder builder = factory.newDocumentBuilder();

Document doc = builder.parse("bookxml.xml");


The DOM model provides a factory that is responsible for loading the XML parser. This frees the programmer from the specific programming environment: there is no need to know which parser is used. DocumentBuilder then provides the way to create an XML document object. It also spares the program from making the underlying I/O system calls itself. Through the parser, it loads the XML into memory and generates the correct document tree.

The second step is to read the document content, which amounts to working with the tree: first traverse to the nodes you need, then read their data.

So, reading the entire contents of an XML document through DOM comes down to the following steps:

1. Obtain a DocumentBuilderFactory, which selects the XML parser.

2. Use the factory to create a DocumentBuilder that can load XML through the parser.

3. Use the DocumentBuilder to load the XML and generate the tree, an instance of the Document object.

4. Through the Document, traverse the tree and read the contents of the nodes you need.
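The four steps above can also end in program data rather than printed lines. The sketch below is one possible way to do that, under assumptions: the class name BookLoader is made up, the XML is passed in as a string instead of being read from a file, and each book is stored as a map from field name to text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookLoader {
    // Converts every <book> element into a map of field name to trimmed text.
    static List<Map<String, String>> load(String xml) throws Exception {
        // Steps 1 and 2: factory, then builder
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Step 3: build the Document tree
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // Step 4: traverse the tree and read the nodes
        List<Map<String, String>> books = new ArrayList<>();
        NodeList nl = doc.getElementsByTagName("book");
        for (int i = 0; i < nl.getLength(); i++) {
            Element book = (Element) nl.item(i);
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("id", book.getAttribute("id"));
            for (Node n = book.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    fields.put(n.getNodeName(), n.getTextContent().trim());
                }
            }
            books.add(fields);
        }
        return books;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"1\"><bookName> A </bookName>"
                + "<bookPrice>28.0</bookPrice></book></books>";
        System.out.println(load(xml));
    }
}
```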


The classes for loading the parser and creating the builder are in the javax.xml.parsers package. Besides the DOM classes, the package also contains the SAX parser and its factory. SAX is discussed later in this article.


Creating an XML document with the DOM model

The DOM programming interface lets you create XML documents as well as read them. DOM provides the objects for building a tree structure that describes an XML document, and the tree built in memory is then usually written to an output stream. The example below does this with the write method of XmlDocument, a class from the Apache Crimson parser (org.apache.crimson.tree), which writes the XML document to an output stream.

Example code:

import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.apache.crimson.tree.XmlDocument;
import java.io.*;

public class DomXml {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Start from an empty document instead of parsing a file
        Document doc = builder.newDocument();
        Element books = doc.createElement("books");

        Element book = doc.createElement("book");
        book.setAttribute("id", "001");
        book.appendChild(doc.createTextNode("The Java Book"));

        books.appendChild(book);
        doc.appendChild(books);
        // XmlDocument is Crimson's Document implementation, so this cast
        // only works when Crimson is the parser behind the factory
        ((XmlDocument) doc).write(new FileOutputStream("Dom.xml"));
        ((XmlDocument) doc).write(System.out);
    }
}

The program creates an XML document, builds its structure, and then outputs the result. Two output streams are used: one writes to a file and the other to the screen.
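The XmlDocument class belongs to the Crimson parser, so the cast fails when a different parser is behind the factory. As an alternative that relies only on the standard javax.xml.transform API, the same tree can be serialized with a Transformer. This is a sketch of that swapped-in technique, not the method used in the program above; the class name DomWriteDemo is an assumption, and the result is returned as a string so it is easy to inspect.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWriteDemo {
    // Builds the same small tree as the program above and serializes it to a string.
    static String build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element books = doc.createElement("books");
        Element book = doc.createElement("book");
        book.setAttribute("id", "001");
        book.appendChild(doc.createTextNode("The Java Book"));
        books.appendChild(book);
        doc.appendChild(books);

        // A Transformer created without a stylesheet copies the source to the result
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```

Passing a StreamResult built around a FileOutputStream or System.out instead of the StringWriter would reproduce the two outputs of the original program.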

Modifying an XML document through the DOM

In JAXP, the tree produced by DOM can be treated as a source, and the modified source can be written back to the XML document through an output stream, so that the file is kept in step with the DOM tree parsed from it. The process starts with the parser: it analyzes the original XML document and produces the tree structure, the application manipulates the tree, and the result is written back to the XML file as a source. This requires a DOMSource object, which wraps the tree described by the Document as a source, and a Transformer, which writes the source out to a result. An example program is given below.

The XML file being manipulated is the familiar bookxml.xml.

The example code is as follows:

import javax.xml.parsers.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.*;
import org.w3c.dom.*;

public class ModifyXml {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("bookxml.xml");

        Element book;
        Element bookName;
        Element bookAuthor;
        Element bookISBN;
        Element bookPrice;

        // Insert
        book = doc.createElement("book");
        book.setAttribute("id", "4");

        bookName = doc.createElement("bookName");
        bookName.appendChild(doc.createTextNode("Java Programme begin book"));
        book.appendChild(bookName);

        bookAuthor = doc.createElement("bookAuthor");
        bookAuthor.appendChild(doc.createTextNode("HESJ"));
        book.appendChild(bookAuthor);

        bookISBN = doc.createElement("bookISBN");
        bookISBN.appendChild(doc.createTextNode("7-145-10241-3"));
        book.appendChild(bookISBN);

        bookPrice = doc.createElement("bookPrice");
        bookPrice.appendChild(doc.createTextNode("77.8"));
        book.appendChild(bookPrice);

        // The root element of bookxml.xml holds all the books
        Node books = doc.getDocumentElement();
        books.appendChild(book);

        // Modify: find the book whose id is "4" and change its author
        NodeList allBook = doc.getElementsByTagName("book");
        Element opBook = null;
        for (int i = 0; i < allBook.getLength(); i++) {
            Element candidate = (Element) allBook.item(i);
            if (candidate.getAttribute("id").equals("4")) {
                opBook = candidate;
                break;
            }
        }
        if (opBook != null) {
            opBook.getElementsByTagName("bookAuthor").item(0).getFirstChild().setNodeValue("Bluce");
        }

        // Delete: uncomment to remove the book inserted above. It is left
        // disabled so that the inserted and modified node appears in the output.
        // books.removeChild(book);

        // Write the modified tree to standard output
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer tr = tf.newTransformer();
        DOMSource ds = new DOMSource(doc);
        StreamResult sr = new StreamResult(System.out);
        tr.transform(ds, sr);
    }
}

The program first creates a book node and adds it to the document; the node can then be modified or deleted.
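The program above sends its result to System.out. To write the modified tree back to the file itself, the StreamResult can wrap an output stream opened on that file. The following is a minimal sketch under assumptions: the class name WriteBackDemo and the helper updateAuthor are hypothetical, and the demonstration works on a temporary file with made-up content rather than on bookxml.xml.

```java
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class WriteBackDemo {
    // Changes the author of the book with the given id and overwrites the file.
    // Returns false, leaving the file untouched, when no such book exists.
    static boolean updateAuthor(Path file, String id, String author) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(file.toFile());
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            if (id.equals(book.getAttribute("id"))) {
                book.getElementsByTagName("bookAuthor").item(0).setTextContent(author);
                // Parsing has finished, so the same file can now be rewritten
                try (OutputStream out = Files.newOutputStream(file)) {
                    TransformerFactory.newInstance().newTransformer()
                            .transform(new DOMSource(doc), new StreamResult(out));
                }
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("books", ".xml");
        String xml = "<books><book id=\"4\"><bookAuthor>HESJ</bookAuthor></book></books>";
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(updateAuthor(file, "4", "Bluce"));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```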


Simple API for XML (SAX)

SAX was proposed to solve the problem that a DOM parser needs a large amount of memory. Instead of loading the entire XML document, a SAX parser reads the document piece by piece and reports what it finds as events. SAX does not generate a tree structure, and it saves memory by not creating any objects to describe the XML. It simply parses the XML file.

Programming with SAX requires understanding its events. The parser raises the events and the application handles them, so implementing the event interface and filling in the event code is our job. An event can be handled or ignored. However, because SAX does not create any objects, it also does not maintain any state between events. If you need to keep state across several events, that too is the application's job.
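One common way for the application to keep state across events is a stack of the elements that are currently open. The sketch below is an illustration under assumptions: the class name PathTrackingHandler and the sample document are made up. It pushes a name in startElement, pops it in endElement, and uses the stack to label each piece of text with the path of the element it was found in.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();   // currently open elements
    private final StringBuilder buffer = new StringBuilder(); // text seen since the last tag
    final List<String> found = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flush();
        open.addLast(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
        open.removeLast();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    // Records the buffered text, if any, under the path of the open elements.
    private void flush() {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            found.add(String.join("/", open) + "=" + text);
        }
        buffer.setLength(0);
    }

    static List<String> paths(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(paths(
            "<books><book><bookName>A</bookName><bookPrice>1.0</bookPrice></book></books>"));
    }
}
```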


SAX events:

In SAX, events are the core of programming. There are five commonly used events.

startDocument()

This event indicates that the parser has begun parsing the document. It passes no arguments, so there is no XML document data to process here.

endDocument()

This event indicates that the parser has finished parsing the XML document. Neither of these two events usually does much practical work.

startElement(...)

Tells you that the parser has found a start tag. This event gives you the name of the element, the names and values of all the attributes of that element, and some information about the namespace.

characters(...)

Tells you that the parser has found some text. You get an array of characters, an offset into the array, and a length, and you use these three values to access the text found by the parser.

endElement(...)

Tells you that the parser has found an end tag. This event gives you the name of the element and the associated namespace information.

startElement(), characters(), and endElement() are the most important events, and the rest of this section focuses on them. All of these events belong to the ContentHandler interface; other SAX interfaces are defined to handle errors, entities, and other less frequently used content.


 

startElement() event

The startElement() event tells you that the SAX parser has found the start tag of an element. The event has four parameters:


String uri

The namespace URI. The XML document in this article does not use namespaces, so you can ignore this parameter here. For more about namespaces, refer to other XML material.

String localName

The element name without the namespace prefix.

String qualifiedName

The qualified name of the element, which combines the namespace prefix and the element's local name.

org.xml.sax.Attributes attributes

A collection that contains all the attributes of the element. This object provides several methods for getting the names and values of the attributes and the number of attributes the element has.

If your XML application looks for the content of a particular element, the startElement() event tells you when that element starts.
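The four parameters are easier to tell apart on a document that does use a namespace. The following sketch makes several assumptions: the class name StartElementDemo, the prefix b, and the namespace URI urn:example:books are invented for illustration. It records what startElement receives for the first element, once with a namespace-aware parser and once with the default settings.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StartElementDemo extends DefaultHandler {
    // Hypothetical sample that binds the prefix "b" to a made-up namespace URI
    static final String XML = "<b:book xmlns:b=\"urn:example:books\" id=\"1\"/>";

    // What startElement received for the first element in the document
    String uri;
    String localName;
    String qName;
    String id;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (this.qName != null) {
            return;                       // only the first element is recorded
        }
        this.uri = uri;
        this.localName = localName;
        this.qName = qName;
        this.id = attributes.getValue("id");   // look up an attribute by its qualified name
    }

    static StartElementDemo parse(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        StartElementDemo handler = new StartElementDemo();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        for (boolean aware : new boolean[] {true, false}) {
            StartElementDemo h = parse(XML, aware);
            System.out.println("namespaceAware=" + aware + " uri=" + h.uri
                    + " localName=" + h.localName + " qName=" + h.qName + " id=" + h.id);
        }
    }
}
```

A factory obtained from SAXParserFactory.newInstance() is not namespace-aware unless setNamespaceAware(true) is called, which is why handlers written for that default, like the one later in this article, compare against qName.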


 

 

characters() event

The characters() event delivers the characters the parser found in the source file. In the spirit of minimizing memory footprint, the event passes an array of characters, which is much lighter than a Java String object. The parameters of the characters() event are:


char[] characters

The array of characters found by the parser.

int start

The index in the characters array of the first character that belongs to this event.

int length

The number of characters in this event.

If your XML application needs to store the content of a particular element, put the code that stores it in the characters() event handler.
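One detail matters when storing content this way: a parser may report the text of a single element through several characters() calls, so the pieces should be appended to a buffer rather than assigned. The sketch below shows that pattern; the class name TextCollector and the sample document are assumptions. Only the slice of the array described by start and length is copied.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextCollector extends DefaultHandler {
    private final String target;                              // element whose text is wanted
    private final StringBuilder buffer = new StringBuilder();
    private boolean inside;
    final List<String> values = new ArrayList<>();

    TextCollector(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals(target)) {
            inside = true;
            buffer.setLength(0);          // start a fresh value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inside) {
            buffer.append(ch, start, length);   // append, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(target)) {
            values.add(buffer.toString().trim());
            inside = false;
        }
    }

    // Returns the trimmed text of every element with the given name.
    static List<String> collect(String xml, String elementName) throws Exception {
        TextCollector handler = new TextCollector(elementName);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><bookName>\n Tom &amp; Jerry \n</bookName></book>"
                + "<book><bookName>Other</bookName></book></books>";
        System.out.println(collect(xml, "bookName"));
    }
}
```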


endElement() event

The endElement() event tells you that the parser has found the end tag of an element. It has three parameters:

String uri, String localName, String qualifiedName

These three parameters are the same as those of startElement().

A typical response to this event is to change the state information in the XML application. In my program, for example, it is where the data that has been read is printed to the screen.


Now let's write a SAX application. Like the DOM program earlier, it prints the information parsed from the XML document.

The code is as follows:

import java.io.File;
import javax.xml.parsers.*;
import org.xml.sax.helpers.*;
import org.xml.sax.*;

public class SaxXml extends DefaultHandler {
    private String elementName;
    private int id;
    private String bookName;
    private String bookAuthor;
    private String bookISBN;
    private String bookPrice;

    public SaxXml() {
        this.elementName = "";
        this.id = 0;
        this.bookName = "";
        this.bookAuthor = "";
        this.bookISBN = "";
        this.bookPrice = "";
    }

    public void startDocument() {
        System.out.println("Document begin");
    }

    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Remember the current element so that characters() knows where the text belongs
        this.elementName = qName;
        if (qName.equals("book"))
            System.out.println(attributes.getValue(0));
    }

    public void characters(char[] ch, int start, int length) {
        String str = new String(ch, start, length);
        // Ignore whitespace-only text; store the rest without the surrounding line breaks
        if (this.elementName.equals("bookName") && !str.trim().equals("")) {
            this.bookName = str.trim();
        }
        if (this.elementName.equals("bookAuthor") && !str.trim().equals("")) {
            this.bookAuthor = str.trim();
        }
        if (this.elementName.equals("bookISBN") && !str.trim().equals("")) {
            this.bookISBN = str.trim();
        }
        if (this.elementName.equals("bookPrice") && !str.trim().equals("")) {
            this.bookPrice = str.trim();
        }
    }

    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("bookName")) {
            System.out.println(this.bookName);
        }
        if (qName.equals("bookAuthor")) {
            System.out.println(this.bookAuthor);
        }
        if (qName.equals("bookISBN")) {
            System.out.println(this.bookISBN);
        }
        if (qName.equals("bookPrice")) {
            System.out.println(this.bookPrice);
        }
        if (qName.equals("book")) {
            System.out.println();
        }
        // Leaving the element: clear the state
        this.elementName = "";
    }

    public void endDocument() {
        System.out.println("Document end");
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        parser.parse(new File("bookxml.xml"), new SaxXml());
    }
}


The code reads the XML document through events. One point that needs attention is the handling of whitespace, which I solved with the String trim method; other approaches work too. The program uses startElement to record the element name (elementName) as the current state, and characters acts according to that state, storing the text it reads in the matching field of our object. The endElement event then reads that field.


The steps for SAX development are as follows:

1. Implement the ContentHandler interface and fill in the event code (here I extend the DefaultHandler class, which implements ContentHandler)

2. Create a SAX parser factory

3. Create a SAX parser through the factory

4. Load the XML document with the SAX parser, passing it the instance of the class that implements ContentHandler

5. The parser calls back into your event methods
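The same five steps can also end in program data instead of printed lines. The following sketch is one possible arrangement, under assumptions: the class name SaxBookCollector is made up, the XML is supplied as a string, and each book is stored as a map from field name to text. The numbered comments refer to the steps above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Step 1: extend DefaultHandler and fill in the event code
class SaxBookCollector extends DefaultHandler {
    final List<Map<String, String>> books = new ArrayList<>();
    private Map<String, String> current;                 // the book being read, if any
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("book")) {
            current = new LinkedHashMap<>();
            current.put("id", attrs.getValue("id"));
        }
        buffer.setLength(0);                             // text belongs to the newest element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("book")) {
            books.add(current);
            current = null;
        } else if (current != null) {
            current.put(qName, buffer.toString().trim()); // a field inside a book
        }
    }

    static List<Map<String, String>> load(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();   // step 2
        SAXParser parser = factory.newSAXParser();                   // step 3
        SaxBookCollector handler = new SaxBookCollector();
        // Steps 4 and 5: the parser reads the input and calls back into the handler
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.books;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"1\"><bookName>\n A \n</bookName>"
                + "<bookPrice>28.0</bookPrice></book></books>";
        System.out.println(load(xml));
    }
}
```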


 


