Parsing XML data in a Ruby program with REXML: usage examples

Last Update:2017-01-18 Source: Internet

Author: User

Tags xml parser xpath

REXML is a library written by Sean Russell. It is not Ruby's only XML library, but it is a popular one and is written in pure Ruby (NQXML is also written in Ruby, whereas XMLParser wraps the expat library, which is written in C). In his REXML overview, Russell commented:
I have this problem: I dislike confusing APIs. There are several XML parser APIs for Java. Most of them follow DOM or SAX, and are very similar in philosophy to the ever-growing number of Java APIs. That is, they look like they were designed by theorists who never had to use their own APIs. In general, the existing XML APIs are annoying. They take a markup language that was clearly designed to be simple, elegant, and powerful, and then wrap it in an annoying, bloated, and large API. Even for the most basic XML tree operations, I always had to refer to the API documentation; nothing was intuitive, and almost every operation was complicated.
Although I would not go so far as to call them annoying, I agree with Russell that most XML APIs demand a lot of work from the people who use them.

Example
Consider the following books.xml:

<library shelf="Recent Acquisitions">
  <section name="Ruby">
    <book isbn="0672328844">
      <title>The Ruby Way</title>
      <author>Hal Fulton</author>
      <description>Second edition. The book you are now reading.
        Ain't recursion grand?
      </description>
    </book>
  </section>
  <section name="Space">
    <book isbn="0684835509">
      <title>The Case for Mars</title>
      <author>Robert Zubrin</author>
      <description>Pushing toward a second home for the human race.
      </description>
    </book>
    <book isbn="074325631X">
      <title>First Man: The Life of Neil A. Armstrong</title>
      <author>James R. Hansen</author>
      <description>Definitive biography of the first man on the moon.
      </description>
    </book>
  </section>
</library>

1 Tree parsing (i.e. DOM-like)

We need to require the rexml/document library. Including the REXML module saves us from prefixing every class name with REXML::.

require 'rexml/document'
include REXML

input = File.new("books.xml")
doc = Document.new(input)

root = doc.root
puts root.attributes["shelf"]   # Recent Acquisitions

doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# Output:
#   Ruby
#   Space

doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# Output:
#   0672328844
#   0684835509
#   074325631X

sec2 = root.elements[2]
author = sec2.elements[1].elements["author"].text   # Robert Zubrin

Note that an element's attributes and their values are exposed as a hash-like object, so we can extract the values we need through attributes[]. A child element can be fetched with either a path-like string or an integer index. Integer indexes are 1-based rather than 0-based.
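
The lookups described above can be tried without a file on disk. The following is only a minimal sketch: the inline XML is a trimmed, hypothetical stand-in for books.xml, and the variable names are chosen for illustration.

```ruby
require 'rexml/document'

# Hypothetical inline sample, trimmed from the books.xml shown earlier.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man</title></book>
    </section>
  </library>
XML

doc  = REXML::Document.new(xml)
root = doc.root

# Attributes behave like a hash keyed by attribute name.
shelf = root.attributes["shelf"]

# Integer indexes count child elements starting from 1, not 0.
first_section  = root.elements[1]
second_section = root.elements[2]

# A path-like string selects the first matching descendant.
first_space_title = second_section.elements["book/title"].text

# Integer and string lookups can be chained.
last_isbn = root.elements[2].elements[2].attributes["isbn"]

puts shelf
puts first_section.attributes["name"]
puts first_space_title
puts last_isbn
```

Because integer indexes count only child elements, the whitespace between tags does not shift the numbering.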

2 Stream parsing (i.e. SAX-like)

The trick here is to define a listener class whose methods are called back while the document is being parsed:

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called for every opening tag with its name and attribute hash.
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for every run of character data.
  def text(data)
    return if data =~ /\A\s*\z/   # whitespace only
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "text: #{abbrev.inspect}"
  end
end

list = MyListener.new
source = File.new "books.xml"
Document.parse_stream(source, list)

The StreamListener module provides empty versions of the callback methods, so you override only the ones you need. When the parser encounters an opening tag, it calls the tag_start method. The text method works similarly: it is called back whenever character data is read. The beginning of the output looks like this:

tag_start: "library", {"shelf" => "Recent Acquisitions"}
tag_start: "section", {"name" => "Ruby"}
tag_start: "book", {"isbn" => "0672328844"}
tag_start: "title", {}
text: "The Ruby Way"
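
A listener can also keep state between callbacks instead of printing. The sketch below is not from the original article: TitleCollector is a hypothetical class, and the inline XML is a reduced sample. It gathers the text of every title element by tracking whether the parser is currently inside one.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects the text of every <title> element
# while the document streams past.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Remember that we have entered a <title> element.
  def tag_start(name, _attrs)
    @in_title = true if name == "title"
  end

  # Keep character data only while inside a <title> element.
  def text(data)
    @titles << data.strip if @in_title
  end

  # Leaving </title> stops the collection.
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

xml = <<~XML
  <library>
    <book><title>The Ruby Way</title></book>
    <book><title>The Case for Mars</title></book>
  </library>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.titles
```

The stream parser never builds a tree, so any context the callbacks need (here, the @in_title flag) has to be tracked by the listener itself.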

3 XPath

REXML provides XPath support through the XPath class, which operates on a parsed document tree like the one built in the DOM-like example. For the same XML file as before, we can use XPath as follows:

book1 = XPath.first(doc, "//book")   # Info for the first book found
p book1

# Print out all titles
XPath.each(doc, "//title") { |e| puts e.text }

# Get an array of all the 'author' elements in the document.
names = XPath.match(doc, "//author").map { |x| x.text }
p names

The output is similar to the following:

<book isbn='0672328844'> ... </>
The Ruby Way
The Case for Mars
First Man: The Life of Neil A. Armstrong
["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
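
XPath expressions can also filter on attribute values. The following sketch assumes a reduced inline copy of the sample data, with variable names chosen for illustration. It restricts a query to one section and then looks up a single book by its ISBN.

```ruby
require 'rexml/document'

# Hypothetical inline sample, reduced from the books.xml shown earlier.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844">
        <title>The Ruby Way</title>
        <author>Hal Fulton</author>
      </book>
    </section>
    <section name="Space">
      <book isbn="0684835509">
        <title>The Case for Mars</title>
        <author>Robert Zubrin</author>
      </book>
      <book isbn="074325631X">
        <title>First Man: The Life of Neil A. Armstrong</title>
        <author>James R. Hansen</author>
      </book>
    </section>
  </library>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: titles of books in the "Space" section only.
space_titles =
  REXML::XPath.match(doc, "//section[@name='Space']/book/title").map(&:text)

# Find one book by ISBN, then read a child element from it.
book   = REXML::XPath.first(doc, "//book[@isbn='0684835509']")
author = book.elements["author"].text

p space_titles
puts author
```

XPath.first returns nil when nothing matches, so code that handles untrusted documents would normally check the result before calling methods on it.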

This article is an English version of an article originally published in Chinese on aliyun.com.
