Java Theory and Practice: Screen Scraping with XQuery


Last month, Java technology instructor Sam Pullara showed me his latest Java-enabled phone, a Nokia 6630. The phone packs in a full range of technology (an embedded JVM, GPRS, and Bluetooth), but it suffers from the problem common to all smartphones: limited screen real estate. Some Web sites support mobile browsers, and embedded browsers do try to render pages sensibly on a small screen, but viewing a typical Web page on a phone is like forcing an elephant into the back seat of a car: everyone involved ends up disappointed, including you, the car, and the elephant. Sam built a simple, elegant solution that scrapes the data from his favorite Web site, reformats it, and displays it on the small screen.

New method

There are many ways to extract data from an HTML document, but I really liked Sam's approach: use XQuery both as a screen-scraping tool (to extract the relevant data from a page) and as a stylesheet tool (to reformat that data so it fits the screen without scrolling). With a small amount of infrastructure and a few very simple XQuery expressions, you can extract the relevant data from many sources, such as traffic, weather, and financial quotes, and display it neatly on the phone.

I have been in situations before where screen-scraping HTML pages looked like a viable solution to a problem, but there were few Java toolkits for the job. There were plenty of HTML parsing tools, but they often provided too little abstraction (which made the scraping code messy), choked on the many pages that do not conform to the HTML specification, and coped poorly with dynamically generated pages whose structure may change over time.

To bridge the gap between poor-quality HTML and the rich set of XML processing tools, you first have to convert the HTML to XML. Many tools can help with this; the JTidy toolkit does a good job of making it easy. JTidy is designed to read typical (that is, bad) HTML and output a cleaned-up version, with options to control the result, and it also provides a DOM interface to the parsed HTML document that can be handed to XML tools. The code in Listing 1 reads an HTML document from an InputStream and produces a DOM representation of it:

Listing 1. Convert HTML to an XML-compliant DOM with JTidy

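import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

// inputStream is an already-open java.io.InputStream for the HTML page
Tidy tidy = new Tidy();
tidy.setQuiet(true);            // suppress JTidy's summary output
tidy.setShowWarnings(false);    // do not report markup warnings
Document tidyDOM = tidy.parseDOM(inputStream, null);  // null: do not write the cleaned markup anywhere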

With this simple transformation, almost any Web page can be treated as an XML document, and you can use whichever XML tools you like (SAX, XSL, XPath, and so on) to extract data from it. XSL might seem the obvious choice, since it is designed to extract information from an XML document and transform it for presentation, but it has a steep learning curve: if you do not already know XSL, even the simplest transformations can be annoyingly complex. XPath is a good fit for the extraction part (both XSL and XQuery use it for content selection), and it is easy enough to pull out the data you need with XPath and then format the HTML yourself, but XQuery makes the job even easier.
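
To make the XPath route concrete, here is a minimal sketch that uses the javax.xml.xpath API from the standard JDK. It assumes the page has already been turned into a DOM, as in Listing 1; so that the sketch runs on its own, a small well-formed snippet is parsed with the standard DocumentBuilder instead of JTidy. The class name, the method names, the sample markup, and the XPath expression are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls text values out of an already-cleaned page with XPath.
class XPathScrapeSketch {

    // Stand-in for the JTidy step: parses markup that is already well-formed.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates an XPath expression and returns the trimmed text of each matching node.
    static List<String> select(Document dom, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, dom, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Invented sample page; in the article's setting the DOM would come from JTidy.
        String page =
            "<html><body><table id='quotes'>"
          + "<tr><td class='symbol'>IBM</td><td class='price'>91.20</td></tr>"
          + "<tr><td class='symbol'>NOK</td><td class='price'>15.75</td></tr>"
          + "</table></body></html>";
        Document dom = parse(page);
        System.out.println(select(dom, "//table[@id='quotes']//td[@class='symbol']"));
    }
}
```

Extraction is only half the job here: the values still have to be turned back into markup by hand, which is the part XQuery takes over.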

XQuery: Introduction

XQuery is designed to extract data from potentially very large sets of XML data. The input does not have to be a single XML document, although it can be; it can also be a collection of documents that have been indexed and stored in an XML database, or even a set of tables in a relational database. Like SQL, XQuery has features for extracting data from multiple data sets and for summarizing, aggregating, and joining data.

Like presentation template languages such as JSP, ASP, or Velocity, XQuery combines elements from two domains, presentation and computation, in a single syntax. As a result, all XML documents are automatically valid XQuery expressions that evaluate to themselves. XQuery also has language constructs such as "for" and "let" that can be mixed with XML elements.

Listing 2 shows a sample XML document, bib.xml, that represents a bibliography. We will look at a few quick XQuery expressions over it to give you a feel for what XQuery can do, and then return to the screen-scraping example. A full description of XQuery's syntax and usage would take hundreds of pages; for more detailed reference material and examples, see the Resources section.

Listing 2. Sample XML Bibliography

<bib>
   <book year="1994">
     <title>TCP/IP Illustrated</title>
     <author><last>Stevens</last><first>W.</first></author>
     <publisher>Addison-Wesley</publisher>
     <price> 65.95</price>
   </book>
   . . . more books . . .
</bib>

Listing 3 shows an XQuery expression that selects all books published by Addison-Wesley after 1991, extracts their titles, and formats the titles as a bulleted (<ul>) list. The curly braces mark a switch from "presentation mode" (where data such as the <ul> and <li> tags is passed directly to the output) to code mode, and there is an implicit switch from code mode back to presentation mode immediately after the return clause.

Listing 3. An XQuery expression that selects book titles based on query parameters

<ul>
{
  for $b in doc("bib.xml")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
   <li>{ data($b/title) }</li>
}
</ul>
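
For comparison, the same selection can be sketched in plain Java with XPath, building the list by hand. This is an illustration only, not how an XQuery engine would be invoked: the class and method names are invented, the bibliography is passed in as a string rather than loaded with doc("bib.xml"), and the second book entry is placeholder data added so that the filter has something to reject.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical Java/XPath counterpart of the query in Listing 3.
class BibTitlesSketch {

    // The where clause of Listing 3 expressed as an XPath predicate.
    static final String TITLES =
        "/bib/book[publisher = 'Addison-Wesley' and @year > 1991]/title";

    // Returns the matching titles wrapped in a <ul> list.
    // Note: title text is not HTML-escaped here; a real formatter should escape it.
    static String toBulletList(String bibXml) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(bibXml)));
        NodeList titles = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(TITLES, dom, XPathConstants.NODESET);

        StringBuilder html = new StringBuilder("<ul>");
        for (int i = 0; i < titles.getLength(); i++) {
            html.append("<li>").append(titles.item(i).getTextContent()).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    public static void main(String[] args) throws Exception {
        String bib =
            "<bib>"
          + "<book year='1994'><title>TCP/IP Illustrated</title>"
          + "<publisher>Addison-Wesley</publisher></book>"
          // Placeholder entry, not from the sample bibliography.
          + "<book year='1990'><title>Placeholder Title</title>"
          + "<publisher>Placeholder Press</publisher></book>"
          + "</bib>";
        System.out.println(toBulletList(bib));
    }
}
```

The loop and the string building are what the XQuery version folds into its for and return clauses, which is the convenience the comparison is meant to show.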
