Examples of parsing XML data with REXML in Ruby

Source: Internet
Author: User
Tags xml parser

REXML is a library written by Sean Russell. It is not the only XML library for Ruby, but it is very popular and is written in pure Ruby (NQXML is also written in Ruby, whereas XMLParser wraps the Jade library, which is written in C). In his REXML overview, Russell commented:
I have this problem: I dislike confusing APIs. There are several XML parser APIs for Java. Most of them follow DOM or SAX, and are very similar in philosophy to a growing number of Java APIs. That is, they look as if they were designed by people who never had to use their own APIs. In general, the existing XML APIs are annoying. They take a markup language that was specifically designed to be very simple, elegant, and powerful, and wrap an obnoxious, bloated, and large API around it. Even for the most basic XML tree operations, I always had to refer to the API documentation; nothing was intuitive, and almost every operation was complicated.
Although I don't find it quite that annoying, I agree with Russell that XML APIs certainly demand more work than most of their users would like.

Example
Consider the following books.xml:

<library shelf="Recent Acquisitions">
  <section name="Ruby">
    <book isbn="0672328844">
      <title>The Ruby Way</title>
      <author>Hal Fulton</author>
      <description>
        Second edition. The book you are now reading.
        Ain't recursion grand?
      </description>
    </book>
  </section>
  <section name="Space">
    <book isbn="0684835509">
      <title>The Case for Mars</title>
      <author>Robert Zubrin</author>
      <description>Pushing toward a second home for the human
        race.
      </description>
    </book>
    <book isbn="074325631X">
      <title>First Man: The Life of Neil A. Armstrong</title>
      <author>James R. Hansen</author>
      <description>Definitive biography of the first man on
        the moon.
      </description>
    </book>
  </section>
</library>

1 Tree Parsing (that is, DOM-like)

We need to require the rexml/document library. We also include REXML so that its classes can be referenced without the namespace prefix:

require 'rexml/document'
include REXML

input = File.new("books.xml")
doc = Document.new(input)

root = doc.root
puts root.attributes["shelf"]  # Recent Acquisitions

doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# Output:
# Ruby
# Space

doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# Output (one ISBN per book in the books.xml shown above):
# 0672328844
# 0684835509
# 074325631X

sec2 = root.elements[2]
author = sec2.elements[1].elements["author"].text  # Robert Zubrin

Note that an element's attributes are exposed through a hash-like object, so a value can be looked up by attribute name through attributes. Child elements can be retrieved through elements, using either a path-like string or an integer. Integer indexes are 1-based, not 0-based.
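As a minimal sketch of these access rules, the snippet below parses a small inline document instead of books.xml. The inline XML and the variable names are assumptions made for illustration; only calls already used in this article (Document.new, attributes, elements) appear.

```ruby
require 'rexml/document'

# Hypothetical inline sample, trimmed from the books.xml structure above.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man</title></book>
    </section>
  </library>
XML

doc  = REXML::Document.new(xml)
root = doc.root

# Attributes are looked up by name, much like hash keys.
shelf = root.attributes["shelf"]

# Integer indexes are 1-based: elements[1] is the first child element.
first_section  = root.elements[1].attributes["name"]
second_section = root.elements[2].attributes["name"]

# A path-like string selects the first matching descendant.
by_path = root.elements["section/book/title"].text

# Looking up an attribute that does not exist yields nil.
missing = root.attributes["no_such_attribute"]

puts shelf, first_section, second_section, by_path
p missing
```

Whitespace between tags does not affect the integer indexes, because elements counts only child elements, not text nodes.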

2 Stream Parsing (that is, SAX-like Parsing)

Here we use a small trick: we define a listener class whose methods are called back during parsing:

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called when the parser enters a tag; receives the tag name and its attributes.
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  # Called when character data is read.
  def text(data)
    return if data =~ /\A\s*\z/  # whitespace only
    abbrev = data[0, 40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
source = File.new "books.xml"
Document.parse_stream(source, list)


Here we mix in the StreamListener module, which provides empty callback methods, so you only need to override the ones you care about. When the parser enters a tag, the tag_start method is called. The text method is similar, but it is called back when character data is read. The first lines of the output are as follows:

tag_start: "library", {"shelf"=>"Recent Acquisitions"}
tag_start: "section", {"name"=>"Ruby"}
tag_start: "book", {"isbn"=>"0672328844"}
tag_start: "title", {}
 text : "The Ruby Way"
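A listener usually does more than print events. The sketch below is one possible way to collect the text of every title element by combining tag_start, text, and tag_end (another StreamListener callback, invoked when a tag closes). The TitleCollector class and the inline XML are hypothetical names introduced for this example, not part of REXML or of the original program.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: remembers whether the parser is inside a <title>
# element and accumulates the character data seen there.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Start a new, empty title when a <title> tag opens.
  def tag_start(name, _attrs)
    return unless name == "title"
    @in_title = true
    @titles << String.new
  end

  # Append character data only while inside a <title> element.
  def text(data)
    @titles.last << data if @in_title
  end

  # Stop collecting when the <title> tag closes.
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

# Hypothetical inline sample; a File object for books.xml works the same way.
xml = <<~XML
  <library>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.titles
```

Because the parser never builds a tree, the listener has to track its own state (here the @in_title flag). That is the usual trade-off of SAX-like parsing compared with the tree interface.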


3 XPath

REXML supports XPath through the XPath class, in addition to the DOM-like and SAX-like interfaces. Using the same XML file as above, with doc parsed as in the tree-parsing example, we can query it with XPath:

book1 = XPath.first(doc, "//book")  # Info for first book found
p book1

# Print out all titles
XPath.each(doc, "//title") { |e| puts e.text }

# Get an array of all of the "author" elements in the document.
names = XPath.match(doc, "//author").map {|x| x.text }
p names

The output is similar to the following:

<book isbn='0672328844'> ... </>
The Ruby Way
The Case for Mars
First Man: The Life of Neil A. Armstrong
["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
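XPath expressions can also filter on attribute values. The following is a minimal sketch using attribute predicates with the same XPath.first and XPath.match calls. The inline XML and the variable names are assumptions for illustration.

```ruby
require 'rexml/document'

# Hypothetical inline sample, trimmed from the books.xml structure above.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844">
        <title>The Ruby Way</title>
        <author>Hal Fulton</author>
      </book>
    </section>
    <section name="Space">
      <book isbn="0684835509">
        <title>The Case for Mars</title>
        <author>Robert Zubrin</author>
      </book>
      <book isbn="074325631X">
        <title>First Man: The Life of Neil A. Armstrong</title>
        <author>James R. Hansen</author>
      </book>
    </section>
  </library>
XML

doc = REXML::Document.new(xml)

# Titles of books in the section whose name attribute is "Space".
space_titles = REXML::XPath.match(doc, "//section[@name='Space']/book/title").map { |e| e.text }

# The single book with a given ISBN, then its author.
book   = REXML::XPath.first(doc, "//book[@isbn='0684835509']")
author = book.elements["author"].text

# ISBNs of all books, in document order.
isbns = REXML::XPath.match(doc, "//book").map { |b| b.attributes["isbn"] }

p space_titles
puts author
p isbns
```

XPath.first returns nil when nothing matches, so real code would normally check the result before calling methods on it.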
