Example: Calling REXML from a Ruby Program to Parse XML Data

Source: Internet
Author: User
Tags: xml, parser, xpath

REXML is a library written by Sean Russell. It is not Ruby's only XML library, but it is a popular one and is written in pure Ruby (NQXML is also written in Ruby, but XMLParser wraps the Jade library written in C). In his REXML overview, Russell commented:
I have this problem: I don't like confusing APIs. There are several XML parser APIs for Java. Most of them follow DOM or SAX, and are very similar in philosophy to the growing number of Java APIs. That is, they look like they were designed by theorists who have never had to use their own APIs. In general, the existing XML APIs are annoying. They take a markup language that was clearly designed to be very simple, elegant, and powerful, and then wrap it in an annoying, bloated, and large API. Even for the most basic XML tree operations, I always had to refer to the API documentation; nothing was intuitive, and almost every operation was complicated.
Although I would not go as far as calling them annoying, I agree with Russell that XML APIs demand a lot of work from most people who use them.

Example
Look at the following books.xml:

<library shelf="Recent Acquisitions">
  <section name="Ruby">
    <book isbn="0672328844">
      <title>The Ruby Way</title>
      <author>Hal Fulton</author>
      <description>Second edition.
        The book you are now reading.
        Ain't recursion grand?
      </description>
    </book>
  </section>
  <section name="Space">
    <book isbn="0684835509">
      <title>The Case for Mars</title>
      <author>Robert Zubrin</author>
      <description>Pushing toward a second home for the human race.
      </description>
    </book>
    <book isbn="074325631X">
      <title>First Man: The Life of Neil A. Armstrong</title>
      <author>James R. Hansen</author>
      <description>Definitive biography of the first man on the moon.
      </description>
    </book>
  </section>
</library>

1 Tree parsing (i.e. DOM-like)

We need to require the rexml/document library and include the REXML module:

require 'rexml/document'
include REXML

input = File.new("books.xml")
doc = Document.new(input)

root = doc.root
puts root.attributes["shelf"]   # Recent Acquisitions

doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# Output:
# Ruby
# Space

doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# Output:
# 0672328844
# 0684835509
# 074325631X

sec2 = root.elements[2]   # second <section> (indexes are 1-based)
author = sec2.elements[1].elements["author"].text   # Robert Zubrin

Note that an element's attributes and their values are represented as a hash-like object, so we can extract the values we need through attributes[]. Child elements can be reached through elements[], using either a path-like string or an integer. Integer indexes are 1-based rather than 0-based.
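The access rules above can be tried without a file on disk. The following is a minimal sketch, assuming a small inline XML string (SAMPLE_XML) and a helper method (section_summary); both names are invented here for illustration, and only attributes[], elements[] and text come from REXML itself.

```ruby
require 'rexml/document'

# Hypothetical inline sample: a cut-down version of books.xml.
SAMPLE_XML = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title><author>Hal Fulton</author></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title><author>Robert Zubrin</author></book>
    </section>
  </library>
XML

# Hypothetical helper: returns [section name, first book's ISBN, first book's author]
# for the section at the given 1-based position.
def section_summary(xml, position)
  root = REXML::Document.new(xml).root
  section = root.elements[position]   # integer index: 1-based, counts child elements only
  book = section.elements["book"]     # string index: path-like, returns the first match
  [section.attributes["name"], book.attributes["isbn"], book.elements["author"].text]
end

p section_summary(SAMPLE_XML, 1)   # ["Ruby", "0672328844", "Hal Fulton"]
p section_summary(SAMPLE_XML, 2)   # ["Space", "0684835509", "Robert Zubrin"]
```

Because integer indexes count only child elements, the whitespace text nodes between tags do not shift the positions.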

2 Stream parsing (i.e. SAX-like parsing)

The trick here is to define a listener class whose methods are called back during parsing:

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called with the tag name and a hash of its attributes.
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called with each run of character data.
  def text(data)
    return if data =~ /\A\s*\z/   # whitespace only
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "text: #{abbrev.inspect}"
  end
end

list = MyListener.new
source = File.new "books.xml"
Document.parse_stream(source, list)


A word about the StreamListener module: it provides a set of empty callback methods, so you only override the ones you need to get your own behavior. When the parser enters a tag, it calls the tag_start method. The text method is similar: it is called back whenever character data is read. The output begins like this:

tag_start: "library", {"shelf"=>"Recent Acquisitions"}
tag_start: "section", {"name"=>"Ruby"}
tag_start: "book", {"isbn"=>"0672328844"}
tag_start: "title", {}
text: "The Ruby Way"
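To see the order in which callbacks fire, it can help to record the events rather than print them. This is only a sketch under stated assumptions: RecordingListener and record_events are hypothetical names introduced here, and the input is an inline string rather than books.xml. The callbacks tag_start, tag_end and text, and Document.parse_stream, are part of REXML.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that records events instead of printing them.
class RecordingListener
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called when the parser enters a tag.
  def tag_start(name, attrs)
    @events << [:start, name, attrs]
  end

  # Called when the parser leaves a tag.
  def tag_end(name)
    @events << [:end, name]
  end

  # Called for character data; whitespace-only runs are skipped.
  def text(data)
    @events << [:text, data] unless data.strip.empty?
  end
end

# Hypothetical helper: parse an XML string and return the recorded events.
def record_events(xml)
  listener = RecordingListener.new
  REXML::Document.parse_stream(xml, listener)
  listener.events
end

record_events('<section name="Ruby"><title>The Ruby Way</title></section>').each { |e| p e }
```

For this input the recorded sequence is: start of section, start of title, the text "The Ruby Way", end of title, end of section. Callbacks that are not overridden fall back to the empty methods supplied by StreamListener.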


3 XPath

REXML provides XPath support through the XPath class. It also supports DOM-like and SAX-like parsing. Using the same XML file as before, here is what we can do with XPath:

book1 = XPath.first(doc, "//book")   # Info for the first book found
p book1

# Print out all titles
XPath.each(doc, "//title") { |e| puts e.text }

# Get an array of all of the 'author' elements in the document.
names = XPath.match(doc, "//author").map { |x| x.text }
p names

The output is similar to the following:

<book isbn='0672328844'> ... </>
The Ruby Way
The Case for Mars
First Man: The Life of Neil A. Armstrong
["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
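XPath expressions can also filter on attributes. The sketch below is an illustration, not part of the original listing: XPATH_SAMPLE, texts_for and title_for_isbn are hypothetical names, and the ISBN is interpolated into the expression on the assumption that it contains no quote characters. XPath.match and XPath.first are the same REXML calls used above.

```ruby
require 'rexml/document'

# Hypothetical inline sample: a cut-down version of books.xml.
XPATH_SAMPLE = <<~XML
  <library>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title><author>Robert Zubrin</author></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title><author>James R. Hansen</author></book>
    </section>
  </library>
XML

# Hypothetical helper: text of every node matched by an XPath expression.
def texts_for(xml, path)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, path).map { |node| node.text }
end

# Hypothetical helper: title of the book with the given ISBN, or nil if absent.
# Assumes the ISBN contains no quote characters.
def title_for_isbn(xml, isbn)
  doc = REXML::Document.new(xml)
  node = REXML::XPath.first(doc, "//book[@isbn='#{isbn}']/title")
  node && node.text
end

p texts_for(XPATH_SAMPLE, "//author")          # ["Robert Zubrin", "James R. Hansen"]
p title_for_isbn(XPATH_SAMPLE, "074325631X")   # "First Man: The Life of Neil A. Armstrong"
p title_for_isbn(XPATH_SAMPLE, "0000000000")   # nil
```

XPath.first returns nil when nothing matches, which is why the helper guards the call to text.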
