Using MicroLark to process MicroXML

Source: Internet
Author: User
Keywords: MicroLark, MicroXML

MicroLark, developed by John Cowan, is an open source MicroXML parser for the Java™ environment. In this article, we'll use sample code to learn MicroLark.

MicroXML is a new specification for a simplified, backward-compatible subset of XML. In Part 1 of this series, "Exploring the basics of MicroXML," we introduced the basics of MicroXML and explained how it differs from XML 1.x and related standards. MicroXML was proposed by James Clark, and John Cowan, who has led the development of the specification, created its first parser, MicroLark. MicroLark is an open source tool (Apache 2.0 license) written in the Java language that implements several parsing modes: push, pull, and tree.

In this article, we'll learn to parse the MicroXML format. We'll explore the various aspects of the MicroLark parser API using the command line and sample code.

Getting started

To follow along with the examples in this article, you need to download:

  • Microlark.jar (if you wish, you can also download its source code)
  • The open source Jython interpreter

First, you can run MicroLark from the command line with a MicroXML file as input. Listing 1 is a slightly modified version of the simple file used in Part 1, which explored the basics of MicroXML:

Listing 1. A simple file

<!DOCTYPE html>
<html lang="en">
 <!-- A comment -->
 <head>
  <title>welcome page</title>
 </head>
 <body>
  <p>welcome to <a href="http://ibm.com/developerworks/">IBM developerworks</a>.</p>
 </body>
</html>

Save the sample as listing1.xml and pass it to MicroLark using the command shown in Listing 2.

Listing 2. Running MicroLark

java -jar Microlark.jar listing1.xml

You should see the output shown in Listing 3.

Listing 3. Output results

(html
Alang en
-\n
- 
-\n
- 
(head
-\n
-  
(title
-welcome page
)title
-\n
- 
)head
-\n
- 
(body
-\n
-  
(p
-welcome to 
(a
Ahref http://ibm.com/developerworks/
-IBM developerworks
)a
-.
)p
-\n
- 
)body
-\n
)html

Does this look a little unfamiliar? Listing 3 uses a format called PYX, a line-oriented representation of an XML document that derives from ESIS, the line-oriented output format used by SGML parsers. PYX presents all the information in an XML document in a way that minimizes the burden of parsing. It is a very useful tool, but unfortunately XML developers often overlook it.

By default, MicroLark converts a MicroXML document to PYX, or more precisely to a subset of PYX, because MicroXML is itself a subset of XML.
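To make this kind of conversion concrete, here is a minimal sketch in Python that emits PYX-style lines for a small XML fragment. It uses Python's standard xml.etree.ElementTree module as a stand-in parser, not MicroLark or its API, and the helper names escape and to_pyx are hypothetical. The escaping of backslash, newline, and tab follows the usual PYX convention, which is an assumption here rather than something taken from MicroLark. ElementTree also drops comments and groups whitespace differently, so its output will not match MicroLark's line for line.

```python
import xml.etree.ElementTree as ET


def escape(text):
    """Escape characters that would break the one-item-per-line layout."""
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


def to_pyx(element, lines=None):
    """Return PYX-style lines for an ElementTree element (hypothetical helper)."""
    if lines is None:
        lines = []
    lines.append("(" + element.tag)                     # start-tag
    for name, value in element.attrib.items():
        lines.append("A" + name + " " + escape(value))  # attribute: name, space, value
    if element.text:
        lines.append("-" + escape(element.text))        # character data before the first child
    for child in element:
        to_pyx(child, lines)
        if child.tail:
            lines.append("-" + escape(child.tail))      # character data after the child
    lines.append(")" + element.tag)                     # end-tag
    return lines


if __name__ == "__main__":
    sample = '<p>welcome to <a href="http://ibm.com/developerworks/">IBM developerworks</a>.</p>'
    print("\n".join(to_pyx(ET.fromstring(sample))))
```

Each line of the result begins with a single character that identifies what kind of item the line holds.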

The PYX format is very simple. The first character of each line indicates the type of content on that line. Content never spans lines directly, but several consecutive lines may carry the same content type. For attributes, the name and the value are separated by a single space, and no quotation marks are used. Listing 4 shows the prefix characters.

Listing 4. Prefix characters

(    start-tag
)    end-tag
A    attribute
-    character data (content)
?    processing instruction
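Because the type of every line is determined by its first character, decoding PYX takes only a few lines of code. The following Python sketch is one possible way to do it. The names PREFIXES and parse_pyx_line are hypothetical and are not part of MicroLark.

```python
# Prefix characters from Listing 4 mapped to readable event names.
PREFIXES = {
    "(": "start-tag",
    ")": "end-tag",
    "A": "attribute",
    "-": "character data",
    "?": "processing instruction",
}


def parse_pyx_line(line):
    """Split one PYX line into (kind, payload); hypothetical helper."""
    line = line.rstrip("\r\n")  # drop the line terminator, keep trailing spaces
    if not line:
        raise ValueError("empty PYX line")
    kind = PREFIXES.get(line[0])
    if kind is None:
        raise ValueError("unknown PYX prefix: %r" % line[0])
    payload = line[1:]
    if kind == "attribute":
        # Name and value are separated by the first space; no quotes are used.
        name, _, value = payload.partition(" ")
        return kind, (name, value)
    return kind, payload


if __name__ == "__main__":
    for raw in ["(a", "Ahref http://ibm.com/developerworks/", "-IBM developerworks", ")a"]:
        print(parse_pyx_line(raw))
```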

You can match the legend in Listing 4 against the output in Listing 3. The biggest advantage of PYX is that it works well with the long-established and extremely useful UNIX® text processing commands, such as grep, awk, sort, and sed.
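The same line-oriented filtering is easy to reproduce in a script. As a rough sketch, the Python function below does roughly what grep '^Ahref ' would do on PYX output and returns only the attribute values. The name attribute_values is hypothetical, and the sample lines are written by hand for illustration rather than captured from MicroLark.

```python
def attribute_values(pyx_lines, attr_name):
    """Yield the value of every attribute named attr_name; hypothetical helper."""
    prefix = "A" + attr_name + " "
    for line in pyx_lines:
        line = line.rstrip("\r\n")
        if line.startswith(prefix):
            yield line[len(prefix):]


if __name__ == "__main__":
    pyx = [
        "(p",
        "-welcome to ",
        "(a",
        "Ahref http://ibm.com/developerworks/",
        "-IBM developerworks",
        ")a",
        "-.",
        ")p",
    ]
    print(list(attribute_values(pyx, "href")))
```

Because the function accepts any iterable of lines, passing it sys.stdin would let it sit at the end of a pipeline fed by PYX output.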
