Parsing and processing XML, and other notes

Source: Internet
Author: User
For XML, DOM and SAX are the two mainstream parsing models; JDOM and dom4j are also good choices.

A DOM parser converts an XML document into an in-memory tree of its contents, which can then be traversed. The advantage of the DOM model is that it is easy to program against: the developer only needs to call the build methods and then use the navigation APIs to reach the required tree nodes. Elements in the tree are easy to add and modify. However, because a DOM parser has to load the entire XML document, its time and memory costs are high, especially for large XML files. Because the tree can be traversed and edited freely, DOM parsers are often used in services where XML documents change frequently.

Example:

import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class MyXMLReader {

    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("Data_10k.xml");
            // Build the whole document into an in-memory tree.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(f);
            // One VALUE element per record; NO and ADDR are looked up by the same index.
            NodeList nl = doc.getElementsByTagName("VALUE");
            for (int i = 0; i < nl.getLength(); i++) {
                System.out.print("License plate number:" + doc.getElementsByTagName("NO").item(i).getFirstChild().getNodeValue());
                System.out.println("Owner address:" + doc.getElementsByTagName("ADDR").item(i).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
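The DOM description above says that elements in the tree are easy to add and modify. The following is a minimal sketch of that idea, not part of the original example. It assumes the same VALUE/NO/ADDR layout; the class name DomEditSketch, the method addRecord, the RESULT root element, and the sample data are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomEditSketch {

    // Parses the XML text, appends one VALUE record to the root element,
    // and returns the modified document as text.
    static String addRecord(String xml, String no, String addr) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Build the new subtree: <VALUE><NO>..</NO><ADDR>..</ADDR></VALUE>
        Element value = doc.createElement("VALUE");
        Element noEl = doc.createElement("NO");
        noEl.setTextContent(no);
        Element addrEl = doc.createElement("ADDR");
        addrEl.setTextContent(addr);
        value.appendChild(noEl);
        value.appendChild(addrEl);
        doc.getDocumentElement().appendChild(value);

        // Serialize the modified tree back to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data in the assumed layout.
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>Road 1</ADDR></VALUE></RESULT>";
        System.out.println(addRecord(xml, "B5678", "Road 2"));
    }
}
```

Because the whole tree is in memory, the edit is a few node operations followed by one serialization step; the cost is that the full document must be loaded first.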

The SAX parser uses an event-based model: as it reads an XML document it fires a series of events, and when it encounters a given tag it invokes a callback method to report that the tag has been found. SAX typically needs less memory, because it does not build the whole tree and leaves it to the developer to decide which tags to handle. It scales especially well when only part of the data in the document is needed. The trade-off is that SAX code is harder to write, and it is awkward to work with several different pieces of data from the same document at the same time.

Example:

import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class MyXMLReader extends DefaultHandler {

    // Names of the elements currently open; the top is the innermost one.
    java.util.Stack<String> tags = new java.util.Stack<>();

    public MyXMLReader() {
        super();
    }

    public static void main(String[] args) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new InputSource("Data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time: " + (System.currentTimeMillis() - lasting) + " milliseconds");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String tag = tags.peek();
        if (tag.equals("NO")) {
            System.out.print("License plate number:" + new String(ch, start, length));
        }
        if (tag.equals("ADDR")) {
            System.out.println("Address:" + new String(ch, start, length));
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        tags.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Pop on close so whitespace after </NO> or </ADDR> is not printed as data.
        tags.pop();
    }
}
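The difficulty of working with several pieces of data at once in SAX is commonly handled by keeping state in the handler between callbacks. The sketch below is one way to do that under stated assumptions: it pairs each NO with its ADDR inside a VALUE element, and it buffers text in a StringBuilder because a parser is allowed to deliver one element's text in several characters() calls. The class name SaxRecordsSketch, the RESULT root element, and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxRecordsSketch extends DefaultHandler {

    final List<String[]> records = new ArrayList<>(); // each entry is {NO, ADDR}
    private final StringBuilder text = new StringBuilder();
    private String no;
    private String addr;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the newly opened element
        if (qName.equals("VALUE")) {
            no = null;
            addr = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("NO")) {
            no = text.toString();
        } else if (qName.equals("ADDR")) {
            addr = text.toString();
        } else if (qName.equals("VALUE")) {
            records.add(new String[] {no, addr}); // record is complete
        }
        text.setLength(0);
    }

    static List<String[]> parse(String xml) throws Exception {
        SaxRecordsSketch handler = new SaxRecordsSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data in the assumed layout.
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>Road 1</ADDR></VALUE></RESULT>";
        for (String[] r : parse(xml)) {
            System.out.println("License plate number: " + r[0] + ", address: " + r[1]);
        }
    }
}
```

Only the fields of the record currently being read are held in memory, which keeps the memory advantage of SAX while still relating two values to each other.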

Note: When XML is used as a format for passing data between systems, DOM is the more suitable choice. It places higher demands on the system (memory, performance, and so on), but a typical server can generally cope with XML documents in the gigabyte range.

SAX is appropriate when only certain parts of an XML document are needed, when specific nodes must be accessed, or when content should be handled as soon as it is read. It is built on an event-processing mechanism: in code, you override the event methods you care about to process the XML document.
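For access to one specific node, a common idiom is to stop the parse as soon as the wanted value has been read, by throwing an exception from the handler. The sketch below illustrates that idiom under stated assumptions; the class name SaxFindFirstSketch, the method firstText, and the sample data are hypothetical, and nested elements with the same name are not handled.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxFindFirstSketch {

    // Returns the text of the first element named tag, or null if there is none.
    static String firstText(String xml, String tag) throws Exception {
        StringBuilder text = new StringBuilder();
        boolean[] inside = {false}; // true while the wanted element is open
        boolean[] found = {false};  // true once the wanted element has closed

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    inside[0] = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside[0]) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (inside[0] && qName.equals(tag)) {
                    found[0] = true;
                    // Abort the parse; the rest of the document is not processed.
                    throw new SAXException("stop: value found");
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!found[0]) {
                throw e; // a real parse error, not our early stop
            }
        }
        return found[0] ? text.toString() : null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data.
        String xml = "<RESULT><VALUE><NO>A1234</NO></VALUE><VALUE><NO>B5678</NO></VALUE></RESULT>";
        System.out.println(firstText(xml, "NO"));
    }
}
```

The found flag distinguishes the deliberate stop from a genuine parse error, so malformed input before the wanted element is still reported.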

On XML encoding and the relationship between InputStreamReader and XMLReader:

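DOM and SAX are usually applied to ASCII-encoded documents. Once an XML document has been read through an InputStreamReader, its bytes have been decoded into Unicode characters; an InputStreamReader created without an explicit charset decodes with the JVM's default charset, which may not match the file. The XMLReader then cannot process the result, and the error reported is that an invalid Unicode character was encountered. (Printing the same text with System.out.println() shows no problem, because the output is automatically converted to the local machine's encoding.)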

Solution:

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f), "ISO-8859-1"));

This fixes the encoding explicitly, which avoids the problem.
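To show how a reader with an explicit charset can be handed to a parser, here is a minimal sketch. It relies on the documented behavior of org.xml.sax.InputSource: when a character stream is supplied, the parser reads it directly instead of detecting the encoding from the bytes. The class name EncodingSketch, the method parse, and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodingSketch {

    // Decodes the bytes with the given charset, then hands characters to the parser.
    static Document parse(InputStream in, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(in, charset);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document with no XML declaration; the accented letter
        // is the single byte 0xE9 in ISO-8859-1.
        byte[] bytes = "<ADDR>Caf\u00e9</ADDR>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(bytes), StandardCharsets.ISO_8859_1);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Passing the charset as a parameter keeps the decoding decision in one visible place rather than depending on the JVM default.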

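String length: the String type has no separately configured length limit, but its length is an int, so a String can hold at most about 2^31 - 1 chars (roughly 4 GB when each char takes two bytes). In practice the available heap is usually the real limit.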

String versus StringBuffer: when the work does not involve heavy string manipulation, a plain String is usually used. StringBuffer has the advantage when a large amount of string processing is involved.
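As a small illustration of the buffer approach, the sketch below builds one result from many pieces with java.lang.StringBuffer, appending into a single growing buffer instead of creating a new String on every concatenation. The class name StringBufferSketch and the method join are hypothetical. In single-threaded code, StringBuilder offers the same methods without synchronization.

```java
class StringBufferSketch {

    // Joins the parts with commas using one growing buffer.
    static String join(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"NO", "ADDR", "VALUE"}));
    }
}
```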


