Notes on parsing XML with SAXParser

Source: Internet
Author: User

SAXParser reads XML as a stream and processes its content through events and callback methods. It uses little memory and is fast. It is suitable when 1) you only read the XML content and never change it, and 2) you process the content only once, for example when searching the XML for some particular content.
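As a minimal sketch of this callback style (the class name CountElements and the sample XML are made up for illustration), the handler below never sees the document as a whole: the parser streams through it and calls startElement once per element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CountElements {
    // Counts the elements named `name` without building a tree in memory.
    static int count(String xml, String name) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] counter = {0}; // mutable holder that the anonymous handler can update
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        // Namespace awareness is off by default, so compare qName.
                        if (qName.equals(name)) {
                            counter[0]++;
                        }
                    }
                });
        return counter[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book>A</book><book>B</book><magazine>C</magazine></books>";
        System.out.println(count(xml, "book")); // prints 2
    }
}
```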

Create the factory

To parse XML, you first need to create a factory. The factory creates the parser and holds the settings that are applied to every parser it creates.

 
SAXParserFactory spfactory = SAXParserFactory.newInstance(); // get a factory instance

spfactory.setSchema(schema); // parsers from this factory validate the document with a validator created from this schema before passing events to the handler
spfactory.setValidating(false); // whether to validate the XML while parsing; this means DTD validation. The default is false.

spfactory.setNamespaceAware(false); // affects what the handler receives. The default is false. If true, the parser resolves each prefix to its namespace URI and passes the URI to the handler as a parameter; otherwise the URI passed to the handler is an empty string.
spfactory.setXIncludeAware(false); // whether to process XInclude elements in the XML. The default is false.
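The settings above can be read back through the matching getters, which makes the defaults easy to check. A small sketch (the class name FactorySettings is hypothetical; no Schema is set, so no schema validation takes place):

```java
import javax.xml.parsers.SAXParserFactory;

public class FactorySettings {
    // A factory configured with the options discussed above.
    static SAXParserFactory configuredFactory() {
        SAXParserFactory spfactory = SAXParserFactory.newInstance();
        spfactory.setValidating(false);     // no DTD validation
        spfactory.setNamespaceAware(true);  // resolve prefixes to namespace URIs
        spfactory.setXIncludeAware(false);  // do not process XInclude elements
        return spfactory;
    }

    public static void main(String[] args) {
        SAXParserFactory defaults = SAXParserFactory.newInstance();
        // Defaults of a freshly created factory.
        System.out.println(defaults.isValidating());     // false
        System.out.println(defaults.isNamespaceAware()); // false
        System.out.println(defaults.isXIncludeAware());  // false
        System.out.println(defaults.getSchema());        // null
        System.out.println(configuredFactory().isNamespaceAware()); // true
    }
}
```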

Another extension point is setFeature. As the name suggests, it customizes the features of the parsers the factory generates.

For example, the following line of code:

spfactory.setFeature("http://xml.org/sax/features/namespace-prefixes", true);

When setNamespaceAware is set to true, the handler by default does not receive namespace-related attributes such as xmlns. In most cases this makes sense: the parser has already resolved the namespaces, and the handler knows the namespace of every element. However, if you still need the complete set of attributes, this can drive you crazy: you can see the attributes in the debugger, but they have actually been filtered out. In that case, you need to set this feature before you can get them.
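To see the effect of this feature, the sketch below (class name and sample XML are made up) parses the same namespace-aware document twice and counts the attributes the handler sees on the root element. With the feature off, the xmlns:a declaration is filtered out; with it on, it is reported alongside a:x.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespacePrefixesDemo {
    static final String XML = "<root xmlns:a=\"urn:a\" a:x=\"1\"/>";

    // Returns how many attributes the handler sees on <root>.
    static int rootAttributeCount(boolean namespacePrefixes) throws Exception {
        SAXParserFactory spfactory = SAXParserFactory.newInstance();
        spfactory.setNamespaceAware(true);
        spfactory.setFeature("http://xml.org/sax/features/namespace-prefixes", namespacePrefixes);

        int[] count = {0};
        spfactory.newSAXParser().parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0] = atts.getLength();
                for (int i = 0; i < atts.getLength(); i++) {
                    System.out.println("  attribute: " + atts.getQName(i));
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("feature off: " + rootAttributeCount(false)); // only a:x
        System.out.println("feature on:  " + rootAttributeCount(true));  // xmlns:a and a:x
    }
}
```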

Create the parser

After setting the corresponding properties, you can use the factory instance to create a parser.

SAXParser saxparser = spfactory.newSAXParser(); // create a SAXParser with the factory's settings

With the parser, you can register different handlers to process the XML. Call saxparser.getXMLReader() to get the XMLReader on which the handlers are registered.

The most commonly used handler is ContentHandler. ContentHandler is an interface: after creating an instance that implements it, you register it on the XMLReader. During parsing, the parser calls back the matching ContentHandler method for each event.

xmlreader.setContentHandler(contenthandler);
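Putting the steps together (factory, parser, XMLReader, ContentHandler), a minimal sketch might look like this. NameCollector is a hypothetical handler; it extends DefaultHandler, which implements ContentHandler with empty methods, so only the callbacks of interest need overriding.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ReaderPipeline {
    // Hypothetical ContentHandler that records every element name in document order.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.add(qName);
        }
    }

    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory spfactory = SAXParserFactory.newInstance();
        SAXParser saxparser = spfactory.newSAXParser();  // parser built from the factory settings
        XMLReader xmlreader = saxparser.getXMLReader();   // the SAX2 reader behind the parser
        NameCollector contenthandler = new NameCollector();
        xmlreader.setContentHandler(contenthandler);      // register the callbacks
        xmlreader.parse(new InputSource(new StringReader(xml)));
        return contenthandler.names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c><d/></c></a>")); // prints [a, b, c, d]
    }
}
```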

In addition, there are setEntityResolver(EntityResolver), setDTDHandler(DTDHandler) and setErrorHandler(ErrorHandler). I have never used these = _ =. Have you? Any use cases?
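One use case for setErrorHandler is DTD validation: with setValidating(true), validity violations are reported to ErrorHandler.error() and parsing continues, so a custom ErrorHandler is how you collect them. A minimal sketch, assuming a made-up inline DTD and a hypothetical class name:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class ValidationErrors {
    // Returns the validation problems found in `xml`; an empty list means the document is valid.
    static List<String> validate(String xml) throws Exception {
        SAXParserFactory spfactory = SAXParserFactory.newInstance();
        spfactory.setValidating(true); // DTD validation
        XMLReader xmlreader = spfactory.newSAXParser().getXMLReader();
        List<String> errors = new ArrayList<>();
        xmlreader.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                errors.add("warning: " + e.getMessage());
            }

            @Override
            public void error(SAXParseException e) {
                // Recoverable: record it and let parsing continue.
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // not well-formed: stop parsing
            }
        });
        xmlreader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = "<!DOCTYPE root [<!ELEMENT root EMPTY>]><root/>";
        String invalid = "<!DOCTYPE root [<!ELEMENT root EMPTY>]><root><child/></root>";
        System.out.println(validate(valid).isEmpty());   // true
        validate(invalid).forEach(System.out::println);  // the EMPTY / undeclared-element errors
    }
}
```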

ContentHandler

ContentHandler is the most important interface for processing XML documents. Each of its methods is triggered when a specific event occurs.

startDocument()
Triggered when parsing of the document begins.

endDocument()
Triggered after parsing of the document is complete.

startPrefixMapping(String prefix, String uri)
Triggered only when setNamespaceAware is set to true, once for every xmlns attribute encountered. prefix is what remains after removing "xmlns:" (an empty string for a default xmlns declaration), and uri is the namespace bound to that prefix. So if you want these attributes while setNamespaceAware is true, then besides using setFeature you can also handle this event and save the prefix-URI mapping yourself.

endPrefixMapping(String prefix)
Triggered after the element that carries the xmlns attribute ends. For WSDL, it is triggered after endElement for wsdl:definitions.

startElement(String uri, String localName, String qName, Attributes atts)
Triggered at the start of an element. If setNamespaceAware is true, uri holds the resolved namespace URI; otherwise it is empty. localName is the element name without its prefix: for wsdl:definitions it is definitions. qName is the full name of the element, for example wsdl:definitions. atts holds all attributes of the element, but at this point it contains no xmlns attributes. If setNamespaceAware is false, nothing is resolved: uri and localName are empty, qName is the element name as written, such as xs:sequence, and atts contains all of the element's attributes as name-value pairs.

endElement(String uri, String localName, String qName)
Triggered after the element has been processed. The parameters have the same values as in startElement.

characters(char[] ch, int start, int length)
Triggered when there is character data between the start and end of an element. For example, for <nodea>some character</nodea>, characters is called after startElement for nodea and before endElement for nodea, and new String(ch, start, length) gives "some character".
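To see these callbacks in order, the sketch below (class name and sample document are made up) records every event as a line of text and runs the same WSDL-like document with and without namespace awareness. With namespaces on, expect startPrefixMapping before the element and endPrefixMapping after its endElement; with namespaces off, uri and localName are empty and only qName is filled in.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventTrace {
    // Records each ContentHandler callback as a line of text.
    static List<String> trace(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory spfactory = SAXParserFactory.newInstance();
        spfactory.setNamespaceAware(namespaceAware);
        List<String> events = new ArrayList<>();
        spfactory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startPrefixMapping(String prefix, String uri) {
                events.add("startPrefixMapping " + prefix + "=" + uri);
            }
            @Override public void endPrefixMapping(String prefix) {
                events.add("endPrefixMapping " + prefix);
            }
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement uri=[" + uri + "] localName=[" + localName + "] qName=[" + qName + "]");
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("characters [" + new String(ch, start, length) + "]");
            }
        });
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<wsdl:definitions xmlns:wsdl=\"urn:example\">text</wsdl:definitions>";
        trace(xml, true).forEach(System.out::println);
        System.out.println("---");
        trace(xml, false).forEach(System.out::println);
    }
}
```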
Note that if the content of an element spans multiple lines, this method may be called multiple times, possibly once per line (I have not verified this). Note also that the resulting strings already contain the newlines. In fact, the SAX specification allows a parser to split any run of character data into several chunks, so it is safest to collect the text in a buffer and only use it in endElement.

ignorableWhitespace(char[] ch, int start, int length)
The documentation says "Receive notification of ignorable whitespace in element content." I have not tried it, but it is related to validation: validating parsers must report whitespace inside element-only content (as declared in the DTD) through this method rather than through characters.

processingInstruction(String target, String data)
Triggered when the XML contains a processing instruction (PI). The declaration at the start of the document, <?xml version="1.0" encoding="UTF-8"?>, does not trigger it. Other PIs in the XML are reported: for example, <?testpi key1="value1" key2="value2"?> triggers it with target testpi and data key1="value1" key2="value2".

skippedEntity(String name)
The documentation says "Receive notification of a skipped entity." The parser calls it for each entity it does not expand, for example an external entity when the parser is configured not to load external entities.

setDocumentLocator(Locator locator)
This method is also useful. The parser calls it before processing the document. If you need to know the line and column currently being processed, implement this method. The implementation does not need to do anything except store the passed locator in a member variable of your own ContentHandler. (If you extend DefaultHandler, calling super.setDocumentLocator does no harm, but its default implementation does nothing.) The parser updates the values in the locator as it goes, so getLineNumber and getColumnNumber return the line and column currently being processed. There are also getPublicId and getSystemId, which return the public and system identifiers of the entity being parsed; I have not needed them.
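Several of the points above can be combined in one handler: save the Locator in setDocumentLocator, buffer characters until endElement, and record processing instructions. The sketch below uses a hypothetical ItemHandler and a made-up sample document; the second item's text spans two lines, and the XML declaration is not reported as a PI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LocatorAndText {
    // Hypothetical handler: records "line: text" for every <item>, plus every PI.
    static class ItemHandler extends DefaultHandler {
        private Locator locator;                          // kept from setDocumentLocator; the parser updates it
        private final StringBuilder text = new StringBuilder();
        private int itemLine;
        final List<String> items = new ArrayList<>();
        final List<String> pis = new ArrayList<>();

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;                       // just keep a reference
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);                            // fresh buffer for this element's text
            if (qName.equals("item") && locator != null) {
                itemLine = locator.getLineNumber();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);               // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("item")) {
                items.add(itemLine + ": " + text.toString().trim());
            }
        }

        @Override
        public void processingInstruction(String target, String data) {
            pis.add(target + " -> " + data);
        }
    }

    static ItemHandler parse(String xml) throws Exception {
        ItemHandler handler = new ItemHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<list>\n"
            + "<?testpi key1=\"value1\" key2=\"value2\"?>\n"
            + "<item>first</item>\n"
            + "<item>second\nline</item>\n"
            + "</list>";

    public static void main(String[] args) throws Exception {
        ItemHandler h = parse(SAMPLE);
        h.items.forEach(System.out::println);
        h.pis.forEach(System.out::println);
    }
}
```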

Basically, this content is enough for parsing XML.

Open question

Besides the few points above that are still unclear, there is one more question I kept wondering about. When SAXParser processes XML, you are usually looking for something specific, and once it is found there is no need to process the remaining half (or more) of the document. Is there any way to stop the parser? Setting a boolean flag does not help, because the parser still reads through the rest of the XML. Stopping early would be most useful when the XML is large or remote.

I found the answer on Stack Overflow: http://stackoverflow.com/questions/2964315/java-sax-parser-how-to-manually-break-parsing

Simply put: once you have what you need and no further processing is required, throw an exception directly from the handler, and catch it around the parse call.
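A minimal sketch of that approach, with hypothetical names (StopParsingException, FirstMatchHandler): the handler stores the result, then throws a SAXException subclass to abort the parse. Deciding by the handler's own state in the catch block keeps real parse errors distinguishable from the deliberate stop.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StopEarly {
    // Hypothetical marker exception, thrown only to abort the parse.
    static class StopParsingException extends SAXException {
        StopParsingException() { super("target found, stopping parse"); }
    }

    // Finds the text of the first <wanted> element, then stops the parser.
    static class FirstMatchHandler extends DefaultHandler {
        private final String wanted;
        private final StringBuilder text = new StringBuilder();
        private boolean inside;
        String result;        // null until the element has been found
        int elementsSeen;     // how many start tags the parser reported

        FirstMatchHandler(String wanted) { this.wanted = wanted; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elementsSeen++;
            if (qName.equals(wanted)) {
                inside = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inside) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (inside && qName.equals(wanted)) {
                result = text.toString();
                throw new StopParsingException(); // nothing more to read
            }
        }
    }

    static FirstMatchHandler search(String xml, String wanted) throws Exception {
        FirstMatchHandler handler = new FirstMatchHandler(wanted);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Rethrow only errors that did not come from our deliberate stop.
            if (handler.result == null) {
                throw e;
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        FirstMatchHandler h = search("<root><a>1</a><target>hit</target><b/><c/></root>", "target");
        System.out.println(h.result + " after " + h.elementsSeen + " elements"); // hit after 3 elements
    }
}
```

Elements after the match (b and c here) are never reported, which is the point for large or remote documents.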
