simplifying XML Read and write Significantly simplifies DOM encoding using query definitions such as XPath |
|
|
|
|
|
send this page as an e-mail message |
|
|
|
Level: Primary
Cameron Laird (Claird@phaseit.net), Vice President, Phaseit, Inc.
The rational use of Xml,xpath on July 02, 2007 can significantly simplify and accelerate applications. If you don't have XPath in your toolkit, add it now. A specific example, written in Python, makes the query idiom appear more natural.
XPath is supposed to be your friend.
If you rarely use XML for development and have never used XPath, you will have the opportunity to develop applications that are stronger and more maintainable. This article details some of the specific examples that illustrate how much impact query methods can have in simple XML processing.
Basic knowledge
To ensure that XPath goes well, consider some basic principles: The following examples are useful for using Python, which you may have heard several times before-it is designed to solve specific and practical problems, passed by the Committee unanimously, and so on. xsomething I admit I've heard of it. Although XPath differs from all other Alphabet Soup in XML, it is indeed a very useful tool. For programmers actually involved in programming, XPath is one of the most learning-value XML definitions, second only to XML itself and XHTML. You do not need XPath. You're a hands-on XML programmer, and your application already has the functionality you need. But the key is that traditional process programming does not solve common XML problems very well. Although the program works correctly, a small amount of query code can make your program more readable, easier to maintain, and, in many cases, run faster, sometimes with obvious results. The purpose of using this code is not to discard the existing normal programming methods, but to learn some assistive techniques that can be used to achieve the effect of a vertical bar.
Most readers of the developerwork XML zone are dedicated to C, C + +, or Java development. But Python python is a very worthwhile tool to show because it reads like streamlined pseudocode and is widely available. Even the beginners of Python can understand the subsequent usage.
Like many other languages, Python supports a large number of XML libraries. There used to be no knowing where to start. However, with the introduction of the Python 2.5 version, Fredrik Lundh's extraordinary ElementTree module became the standard; the example in this article will use this library. The library itself is small and can be attached to any 1.5.2 version of Python, which is explained in reference. Once the ElementTree is properly positioned, all the conditions required to execute the sample program are ready.
XML programming Model
Most XML programming is currently based on the Document Object Model (MODEL,DOM). This model treats an XML instance as an element tree, identifies elements with tags, and many elements have attributes and/or child elements. XML programming is a heavy task, because processing must navigate through the tree to individual elements. For example, consider the template XML that DeveloperWorks uses in developing the content of developerWorks articles. This looks very similar to XHTML. Its pattern is straightforward, of course, compared to an XML instance that is common in industrial applications, which may be used to gather many of the details of a machine or financial event. Even with this simple model, if you want to report all the anchor points in the text-all elements marked <a>-then you need to navigate the entire document to an arbitrary nesting depth. Simple non-recursive procedure expressions do not have access to all elements, although many libraries contain helpful features that can be used to traverse a tree.
Result: A simple program to report all the anchors will resemble the following:
Listing 1. DOM based code for reporting all anchor points
Import ElementTree. ElementTree
def detail_anchor (element):
if Element.tag = = "a":
attributes = Element.attrib
if "href" In Attributes.keys ():
print '%s ' was at URL '%s '. "% (Element.text,
attributes[' href '])
if" name "in Attribut Es.keys ():
print '%s ' anchors '%s '% (Element.text,
attributes[' name ')
def (element):
Detail_anchor (Element) for
x in Element.getchildren (): The "(
x)" (ElementTree)
. Elementtree.parse ("Draft2.xml"). Getroot ())
|
Using the 5.5.1 reference template described below produces output similar to the following:
Listing 2. The report generated by the program in Listing 1
' Related DeveloperWorks content ' is at ' http://www.ibm.com/developerworks '.
' Entire series ' is at ' http://www.ibm.com/... '
IBM Product Evaluation versions ' is at ' http://www.ibm.com/... '
|
If there is a standard way to query DOM instances and extract information about the specified elements, attributes, and tags, how much simplification can be made. You can see it yourself:
Listing 3 is an XPath based code that is equivalent to listing 1.
Import ElementTree. ElementTree
def detail_anchor (element):
if Element.tag = = "a":
attributes = Element.attrib
if "href" In Attributes.keys ():
print '%s ' was at URL '%s '. "% (Element.text,
attributes[' href '])
if" name "in Attribut Es.keys ():
print '%s ' anchors '%s '% (Element.text,
attributes[' name ') for
element in/
ElementTree. Elementtree.parse ("Draft2.xml"). FindAll ("//a"):
Detail_anchor (Element)
|
Note that "//a" in English means "search the entire document for elements labeled ' a '." In essence, listing 3 produces the same output as listing 2.
|
ElementTree installation
We want the Python installation to be attached with ElementTree. While you will be happy to read the code listings, it would be even better if you could execute the code yourself and make changes to the code and experiment with your own ideas. It's very easy to do. Compared to the many clumsy XML toolkits I've used, ElementTree can be installed and started using in just a few minutes. First, you need Python. Most current versions of UNIX, including MacOS and almost all Linux variants, are built into Python. For Windows, see ActivePython in resources. Correctly Place Python: Retrieves the ElementTree source package (as described in resources in the ElementTree home page). Unzip the package. Navigate to the correct directory (probably similar to the elementtree-1.2.6-20050316/--key is the directory where setup.py resides). Invoke Python setup.py install at the command line. All done. This is what you need to operate. To verify the installation, run the interactive Python shell and request import ElementTree. ElementTree. |
|
Compare Listing 1 and listing 3. The former is simple enough-but the latter is simpler. It eliminates the need to explicitly specify recursion. More importantly, XPath code is based on XML (logical) markup, not structure. The markup is closer to the human mind and lasts a long time, and from this point of view the structure is a relatively short-lived implementation detail. By using more complex queries, the declarative style of XPath is more obvious than the traditional structure-oriented process search implementation.
The significance of this article is that XPath can be incorporated into the development of existing XML programs, with immediate performance and maintainability gains. For me, it's more like using assembler, compilers, and higher-order languages: I can write the entire program entirely in machine language, but learning high productivity is a simple and cost-effective approach. In addition, XPath can generally improve the performance of a hand-coded search.
However, this creates a problem: Is there a way to improve XPath? Yes, of course--but there are some limitations. XQuery and XSLT are two other XML definitions that are quite acceptable to XPath (XSLT may be the most widely used among the three, while XQuery has the least useful implementation). XSLT is similar to XPath at the point of the Declaration, and Xquery adds the functionality of the procedure to the XPath query. Both can generate templates--briefly, queries that embed recognizable XML snippets--more or less in line with language habits. This is noteworthy because some XML theorists recommend templates as an ideal representation.
Overall, however, the benefits of XQuery, XSLT, or such language-specific query packages (such as Amara, XQJ, or JAXP) are incremented relative to the great benefits that XPath produces. Uche Ogbuji, provided in resources, clearly compares the sample issues that are addressed in a variety of different ways. Use these more professional interfaces only when you feel that XML processing is quite complex. But now you're starting to use XPath.
Other examples
The purpose of this article is to inspire you to use XPath instead of teaching you to use it. I hope you can see clearly that learning XPath doesn't take much effort.
At the same time, we want to know what functions can be implemented using XPath. The query in Listing 3 uses//a as a simple example. Finally, you'll learn more complex query features, such as/parent/child[@attr = ' value '], which specifies the child node of all "parent" and specifies that the value of the property "attr" is "value". Two other examples://@*, which checks all the attributes of all tags;//tr/td|th retrieves all TD or TH elements in tr.
Performance
Although performance measurements always need to be narrowed down to a specific measurement method, thinking about XPath performance requires some context. In layman's terms, XPath is faster than you can achieve. In large, complex programs, performance is critical, and XPath queries run faster (sometimes much faster) than the equivalent of hand-written process code. In principle, you can use existing knowledge about the content or layout of the data to write an ingenious query that can run quickly. In practice, however, the effect of using XPath is similar to the effect of using a high-level language (HLL) (rather than an assembler or C language): With just a little work, the developer can complete a program that runs correctly, while HLL All of the surface intrinsic speed defects can be compensated by more advanced application-specific algorithms.
In exactly the same way, XPath simplifies coding for some noticeable problems. In these problems, the speed effect is enough to enable programmers to carefully develop their own solutions with idle time. An XPath based solution is not only easier to correct and maintain, but is at least fast.
In addition, some XPath implementations are written in great detail, with memory usage higher than normal process searches. In these examples, XPath based coding can greatly improve the search speed--10 times times or more.
The absence of specific measurement methods to illustrate these problems is indeed frustrating. Although any particular comparison can be misleading, the relative performance of XPath depends on the layout of the development language, specific XPath implementations, XML query images, search, and runtime available memory (most of the time). Learn how to do simple timings and learn to test XPath on your application's dataset. If your experience is similar to mine, then you will often find that XPath is only half the speed of the native encoding, especially for some minor problems, but it is a level faster than other languages.
Conclusion
XPath is an XML tool that is extremely inexpensive to use: you might have built it into an XML library by using the development language. At the same time, XPath has the potential to significantly improve performance, and its simple presentation actually simplifies programming and maintenance. In addition, there are plenty of XPath tutorials to help you learn the knowledge you need. In the end, you'll find that investment in XPath can bring benefits quickly.
Once you've learned about XPath, you'll also be better able to judge whether it's too hard to use XML, and you'll need some advanced tools, such as XQuery and XSLT, to benefit you.
Be sure to read the materials listed in resources to help you get started with XPath programming immediately.
ResourcesLearning
You can refer to the English version of this article on the DeveloperWorks Global website.
Developerwork's "Cute Python" column: Why use Python to interpret XPath. For details on the merits of this language, see David Mertz's column. Recently, it has been widely used, and programmers who don't even know it at all tend to find it easy to read.
ActivePython: Explore ActivePython, based on ActivePython's homepage, ActivePython is a quality-assured, installable release of Python. Because of the reliability of its installation, the authors specifically recommend that Windows users interested in Python use ActivePython.
ElementTree Home page: View the ElementTree wrappers written and maintained by Fredrik Lundh How to add code to load the XML file as a tree of Element objects and save them back. In addition to the simple installation discussed in the sidebar above, ElementTree also comes with implementations to take advantage of other libraries and improve performance.
In a lot of tutorials on XPath, here's where to start: An Introduction to XPath (Bertrand portier,developerworks,2004 May): Learn what the grammar and semantics of Xpath,xpath languages are, and how to use XPath locations Paths, how to use XPath expressions, how to use XPath functions, and how XPath is associated with an XSLT. This tutorial is about the XPath version 1.0. XPath tutorial (Miloslav Nic and Jiri Jirat,zvon): Learn the selected XPath features through a number of examples.
XML Path Language (XPath) Specification: for common syntax and semantics for XSL transformations [XSLT] and xpointer sharing functions, read the Maintenance specification for the consortium.
Practical data binding: Using XPath as a data binding tool, part 2nd (Brett mclaughlin,developerworks,2006 January): Executing XPath queries using the JAXP API and encoding in Java using XPath 。
"XGrep is a kind of grep-like XML document tool. "Xgrep is a very good tool and developers should always keep it around." This article demonstrates XML programming using Python as an example. However, if Cameron's introduction focuses on the XPath itself, rather than how it is programmed, then he will teach it according to XGrep.
Read the following two developerwork introductory articles: An Introduction to XQuery (Howard katz,2006 update February): Read the XML Query Language standard proposed by the consortium. What type of language is XSLT. (Michael kay,2001 February): Put XSLT in context-its origins, its strengths, and why it is used.
Look at the different opinions of the two authors, but the two articles are written carefully, and they can complement each other well. A conversation with Jonathan Robie about XQuery: In this interesting interview, Jonathan Robie illustrates the meaning of the XML query language. Is the XQuery an Omni-tool?:ogbuji pays special attention to XQuery. Ogbuji details the relationship between XPath, XSLT, and XQuery in these meeting records.
Developerwork China website XML Zone: Understand all the information about XML.
IBM XML Authentication: Learn how to become a developer of IBM-certified XML and related technologies.
XML Technical Document library: The DeveloperWorks XML Zone provides a wide range of technical articles and hints, tutorials, standards, and IBM's Red Book.
DeveloperWorks Technical Activities and webcasts: keep an eye on the latest advances in technology.
access to products and technology
IBM Beta software: Build your next development project using IBM trial software, which can be downloaded directly from DeveloperWorks.
discussion
XML Zone Discussion forum: participate in any XML-centric forums.
About the author
|
|
|
Cameron Laird is a developerWorks long-term contributor and former columnist. He often writes articles on projects to promote the development of his company's applications, focusing on reliability, security, and system integration that was not originally designed for collaboration. |