MicroLark, developed by John Cowan, is an open source MicroXML parser for the Java™ environment. In this article, we'll use sample code to learn how to work with MicroLark.
MicroXML is a new specification for a simplified, backward-compatible version of XML. In Part 1 of this series, we explored the basics of MicroXML and explained how it differs from XML 1.x and related standards. James Clark proposed MicroXML, and John Cowan created its first parser, MicroLark, which helped drive the development of the specification. MicroLark is an open source tool (Apache 2.0 license) written in Java that implements several parsing modes: push, pull, and tree.
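The three parsing modes differ mainly in who drives the parse. MicroLark's own API is not shown here, so as a stand-in this sketch uses the JDK's built-in XML APIs: SAX for push, StAX for pull, and DOM for tree. It relies on the assumption that a small MicroXML-style fragment with no namespaces or DOCTYPE is also well-formed XML, which lets a standard XML parser read it. Each mode collects the same element names, `p` and `a`, by a different route.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class ParsingModes {
    // Hypothetical sample fragment (JDK parsers stand in for MicroLark here).
    static final String DOC = "<p>Welcome to <a href=\"ibm.com\">developerWorks</a></p>";

    // Push: the parser drives and calls back into our handler for each event.
    static List<String> push() throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(DOC.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    names.add(qName);
                }
            });
        return names;
    }

    // Pull: our loop drives, asking the parser for the next event.
    static List<String> pull() throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(DOC));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(r.getLocalName());
            }
        }
        r.close();
        return names;
    }

    // Tree: the whole document is built in memory first, then navigated.
    static List<String> tree() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(DOC.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        collect(doc.getDocumentElement(), names);
        return names;
    }

    // Depth-first walk over element children, in document order.
    static void collect(Element e, List<String> names) {
        names.add(e.getTagName());
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) kids.item(i), names);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("push: " + push());
        System.out.println("pull: " + pull());
        System.out.println("tree: " + tree());
    }
}
```

Push and pull both stream through the input without holding the whole document. The tree mode trades memory for the ability to revisit any part of the document.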
In this article, we'll learn how to parse MicroXML. We'll explore the main aspects of the MicroLark parser API, both from the command line and through sample code.
Getting started
To follow along with the examples in this article, you need to download:
Microlark.jar (if you wish, you can also download its source code)
The open source Jython interpreter
First, run MicroLark on the command line with a MicroXML file as input. Listing 1 is a slightly modified version of the simple file used in Part 1 of this series.
Save the sample as listing1.xml and run it through MicroLark with the command shown in Listing 2.
Listing 2. Running MicroLark
java -jar Microlark.jar listing1.xml
You should see the output shown in Listing 3.
Listing 3. Output results
(html
Alang en
-\n
-\n
(head
-\n
(title
-Welcome page
)title
-\n
)head
-\n
(body
-\n
(p
-Welcome to
(a
Ahref ibm.com/developerworks/
-IBM developerWorks
)a
)p
-\n
)body
-\n
)html
Does this look a little unfamiliar? Listing 3 uses a format called PYX, a line-oriented representation of XML documents that grew out of a notation used to represent parsed SGML documents. PYX presents the information in an XML document in a way that keeps the burden of parsing to a minimum. It is a very useful tool, but unfortunately XML developers often overlook it.
By default, MicroLark converts a MicroXML document to PYX, or more precisely to a subset of PYX, because MicroXML is a subset of XML.
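To make the event-to-line mapping concrete, here is a minimal sketch of an XML-to-PYX conversion. It does not use MicroLark. It swaps in the JDK's StAX pull parser (`XMLStreamReader`) and emits one PYX line per event. The simplifications are assumptions of this sketch: only newlines are escaped (as `\n`, matching Listing 3), and comments are dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ToPyx {
    // Escape newlines as the two characters '\' and 'n' so each event fits on one line.
    // Other escapes (backslashes, tabs) are deliberately not handled in this sketch.
    static String escape(String s) {
        return s.replace("\n", "\\n");
    }

    static List<String> toPyx(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text chunks so one text run becomes one "-" line.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT -> {
                    out.add("(" + r.getLocalName());
                    // One "A" line per attribute: name, a single space, unquoted value.
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        out.add("A" + r.getAttributeLocalName(i) + " " + escape(r.getAttributeValue(i)));
                    }
                }
                case XMLStreamConstants.CHARACTERS -> out.add("-" + escape(r.getText()));
                case XMLStreamConstants.END_ELEMENT -> out.add(")" + r.getLocalName());
                case XMLStreamConstants.PROCESSING_INSTRUCTION -> {
                    String data = r.getPIData();
                    out.add("?" + r.getPITarget() + (data == null || data.isEmpty() ? "" : " " + data));
                }
                default -> { } // comments, document start/end, etc. produce no PYX line
            }
        }
        r.close();
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        toPyx("<p lang=\"en\">Hi\n<b>there</b></p>").forEach(System.out::println);
    }
}
```

The processing-instruction case is included only for general XML input. MicroXML has no processing instructions, which is one reason MicroLark's output needs just a subset of PYX.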
The PYX format is very simple. The first character of each line indicates the type of content on that line. Content never spans lines (a line break inside text is written as the escape \n, as in Listing 3), although several consecutive lines can have the same content type. For attributes, the attribute name and value are separated by a single space, and no quotes are used. Listing 4 shows the prefix characters.
Listing 4. Prefix characters
(   Start-tag
)   End-tag
A   Attribute
-   Character data (content)
?   Processing instruction
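Because the first character alone identifies each line, reading PYX back is a single dispatch on that character. The following is a minimal sketch of such a reader, assuming Java 16 or later for the record type. The `Kind` and `Event` names are hypothetical and are not part of MicroLark. Unescaping is limited to `\n`.

```java
import java.util.List;

public class PyxLine {
    // One kind per prefix character in Listing 4.
    enum Kind { START_TAG, END_TAG, ATTRIBUTE, TEXT, PROCESSING_INSTRUCTION }

    // Hypothetical decoded line: element/attribute/PI name plus value or text ("" when unused).
    record Event(Kind kind, String name, String value) {}

    static Event parse(String line) {
        if (line.isEmpty()) {
            throw new IllegalArgumentException("Empty PYX line");
        }
        String rest = line.substring(1);
        return switch (line.charAt(0)) {
            case '(' -> new Event(Kind.START_TAG, rest, "");
            case ')' -> new Event(Kind.END_TAG, rest, "");
            // Turn the two-character escape \n back into a real newline (only escape handled).
            case '-' -> new Event(Kind.TEXT, "", rest.replace("\\n", "\n"));
            case 'A' -> split(Kind.ATTRIBUTE, rest);
            case '?' -> split(Kind.PROCESSING_INSTRUCTION, rest);
            default -> throw new IllegalArgumentException("Unknown PYX prefix in: " + line);
        };
    }

    // "name value": split at the first space; the value is unquoted and may contain spaces.
    static Event split(Kind kind, String rest) {
        int sp = rest.indexOf(' ');
        return sp < 0
            ? new Event(kind, rest, "")
            : new Event(kind, rest.substring(0, sp), rest.substring(sp + 1));
    }

    public static void main(String[] args) {
        for (String line : List.of("(html", "Alang en", "-\\n", ")html")) {
            System.out.println(parse(line));
        }
    }
}
```

Splitting an attribute line only at the first space is what makes quoting unnecessary. Everything after the name, spaces included, belongs to the value.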
These prefixes correspond to the output in Listing 3. The biggest advantage of PYX is that it works with the long-established and extremely useful UNIX® text-processing commands, such as grep, sort, sed, and awk.
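With the UNIX tools, a pipeline such as grep '^(' | sort | uniq -c tallies how often each element starts in a PYX stream. The following is a minimal Java sketch of the same line-filtering idea. The class and method names are hypothetical, and it assumes PYX lines arrive on standard input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CountElements {
    // Same idea as: grep '^(' | sort | uniq -c
    // Keep only start-tag lines and tally each element name.
    static Map<String, Integer> countStartTags(List<String> pyxLines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys play the role of `sort`
        for (String line : pyxLines) {
            if (line.startsWith("(")) {
                counts.merge(line.substring(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read PYX lines from standard input, then print "count name" like uniq -c.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            countStartTags(lines).forEach((name, n) -> System.out.println(n + " " + name));
        }
    }
}
```

If MicroLark writes its PYX to standard output, as the console output in Listing 3 suggests, you could pipe it in with java -jar Microlark.jar listing1.xml | java CountElements.java. Launching a single source file this way requires Java 11 or later. No real parsing is needed because the prefix character does all the work.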