Parsing XML data in Ruby with REXML: worked examples

Last Update:2016-04-14 Source: Internet

Author: User

Tags xml parser

REXML is an XML library written by Sean Russell. It is not the only XML library for Ruby, but it is very popular and is written in pure Ruby (NQXML is also pure Ruby, whereas XMLParser is a wrapper around the expat library, which is written in C). In his REXML overview, Russell commented:
"I have this problem: I dislike obfuscated APIs. There are several XML parser APIs for Java. Most of them follow DOM or SAX, and are very similar in philosophy to an increasing number of Java APIs. Namely, they look like they were designed by theorists who never had to use their own APIs. The existing XML APIs are, in general, annoying. They take a markup language that was specifically designed to be very simple, elegant, and powerful, and wrap an obnoxious, bloated, and large API around it. I always had to refer to the API documentation to do even the most basic XML tree manipulations; nothing was intuitive, and almost every operation was complex."
Although I don't think the situation is quite that bad, I agree with Russell that XML APIs are certainly more work than they need to be for most of the people who use them.

Example
Consider the following books.xml:

    <library shelf="Recent Acquisitions">
      <section name="Ruby">
        <book isbn="0672328844">
          <title>The Ruby Way</title>
          <author>Hal Fulton</author>
          <description>
            Second edition. The book you are now reading.
            Ain't recursion grand?
          </description>
        </book>
      </section>
      <section name="Space">
        <book isbn="0684835509">
          <title>The Case for Mars</title>
          <author>Robert Zubrin</author>
          <description>Pushing toward a second home for the human
            race.
          </description>
        </book>
        <book isbn="074325631X">
          <title>First Man: The Life of Neil A. Armstrong</title>
          <author>James R. Hansen</author>
          <description>Definitive biography of the first man on
            the moon.
          </description>
        </book>
      </section>
    </library>

1 Tree Parsing (that is, DOM-like)

We first require the rexml/document library and include the REXML module, so that its classes can be referenced without the REXML:: prefix:

    require 'rexml/document'
    include REXML

    # Parse the whole file into an in-memory document tree.
    input = File.new("books.xml")
    doc = Document.new(input)

    root = doc.root
    puts root.attributes["shelf"]  # Recent Acquisitions

    doc.elements.each("library/section") { |e| puts e.attributes["name"] }
    # Output:
    #   Ruby
    #   Space

    doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
    # Output:
    #   0672328844
    #   0684835509
    #   074325631X

    # Integer indexes are 1-based: the second section, then its first book.
    sec2 = root.elements[2]
    author = sec2.elements[1].elements["author"].text  # Robert Zubrin

Note that an element's attributes are exposed through a hash-like object, so we can look up the value we need by name through attributes. Child elements can be retrieved through elements using either a path-like string or an integer. If an integer is used, the index is 1-based rather than 0-based.
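The access rules just described can be tried without a file on disk. The following is a minimal sketch, assuming an inline sample (SAMPLE_XML, a trimmed version of books.xml) and a helper named section_name_at; both names are made up for illustration and are not part of REXML. It uses the REXML:: prefix instead of include REXML so that it stands alone.

```ruby
require 'rexml/document'

# Hypothetical inline sample: a trimmed version of books.xml.
SAMPLE_XML = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844">
        <title>The Ruby Way</title>
      </book>
    </section>
    <section name="Space">
      <book isbn="0684835509">
        <title>The Case for Mars</title>
      </book>
    </section>
  </library>
XML

# Hypothetical helper: returns the name attribute of the section at the
# given 1-based position under the root element, or nil if none is found.
def section_name_at(doc, position)
  section = doc.root.elements[position]
  section && section.attributes["name"]
end

doc = REXML::Document.new(SAMPLE_XML)

puts section_name_at(doc, 1)   # position 1 is the first child element
puts section_name_at(doc, 2)

# A path-like string returns the first matching element.
puts doc.root.elements["section/book/title"].text
```

Only child elements are counted by the integer index; the whitespace text nodes between tags are skipped, which is why position 2 reaches the second section directly.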

2 Stream Parsing (that is, SAX-like Parsing)

Here we define a listener class whose methods the parser calls back as it reads through the document:

    require 'rexml/document'
    require 'rexml/streamlistener'
    include REXML

    class MyListener
      include REXML::StreamListener

      # Called for every opening tag with its name and attribute hash.
      def tag_start(*args)
        puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
      end

      # Called for every run of character data.
      def text(data)
        return if data =~ /\A\s*\z/  # whitespace only
        abbrev = data[0, 40] + (data.length > 40 ? "..." : "")
        puts " text : #{abbrev.inspect}"
      end
    end

    list = MyListener.new
    source = File.new "books.xml"
    Document.parse_stream(source, list)

Here we mix in the StreamListener module, which provides empty implementations of the callback methods, so you only override the ones you need. When the parser enters a tag, the tag_start method is called. The text method is similar, but it is called whenever character data is read. The output begins as follows:

    tag_start: "library", {"shelf"=>"Recent Acquisitions"}
    tag_start: "section", {"name"=>"Ruby"}
    tag_start: "book", {"isbn"=>"0672328844"}
    tag_start: "title", {}
     text : "The Ruby Way"
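Because a stream listener sees one event at a time, anything that spans several events has to be tracked by the listener itself. The following sketch illustrates this by collecting the text of every title element. The class TitleCollector and the helper collect_titles are hypothetical names introduced for this example; tag_start, text, and tag_end are callbacks provided by StreamListener.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: gathers the text found inside <title> elements.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Start a new, empty title buffer when a <title> tag opens.
  def tag_start(name, _attrs)
    return unless name == "title"
    @in_title = true
    @titles << +""
  end

  # Character data may arrive in several pieces, so append to the buffer.
  def text(data)
    @titles.last << data if @in_title
  end

  # Stop collecting when the <title> tag closes.
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

# Hypothetical helper: runs the stream parser over an XML string
# and returns the collected titles with surrounding whitespace removed.
def collect_titles(xml)
  listener = TitleCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.titles.map(&:strip)
end

xml = '<section name="Space">' \
      '<book><title>The Case for Mars</title></book>' \
      '<book><title>First Man</title></book>' \
      '</section>'
p collect_titles(xml)
```

The @in_title flag is the only state the listener keeps, which is the main appeal of this style: the document is never held in memory as a tree.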

3 XPath

REXML supports XPath through the XPath class. It works on a parsed document tree, as in the DOM-like approach. Using the same XML file and the doc object from the tree-parsing example, we can write:

    book1 = XPath.first(doc, "//book")  # Info for first book found
    p book1

    # Print out all titles
    XPath.each(doc, "//title") { |e| puts e.text }

    # Get an array of all of the "author" elements in the document.
    names = XPath.match(doc, "//author").map {|x| x.text }
    p names

The output is similar to the following:

    <book isbn='0672328844'> ... </>
    The Ruby Way
    The Case for Mars
    First Man: The Life of Neil A. Armstrong
    ["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
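XPath expressions can also filter on attributes with a predicate. The sketch below assumes an inline sample (CATALOG_XML) and a helper named titles_in_section, both invented for this example, and it builds the expression by string interpolation, so it assumes the section name contains no quote characters.

```ruby
require 'rexml/document'

# Hypothetical inline sample with the same shape as books.xml.
CATALOG_XML = <<~XML
  <library>
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

# Hypothetical helper: returns the titles of all books in the named section.
# Assumes section_name contains no single quotes.
def titles_in_section(doc, section_name)
  path = "//section[@name='#{section_name}']/book/title"
  REXML::XPath.match(doc, path).map(&:text)
end

doc = REXML::Document.new(CATALOG_XML)

p titles_in_section(doc, "Space")
p titles_in_section(doc, "Ruby")
puts REXML::XPath.match(doc, "//book").size   # number of book elements
```

XPath.match returns an array, so a section name that matches nothing simply yields an empty array rather than nil.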

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.