Example: parsing XML data with REXML in a Ruby program
REXML is a library written by Sean Russell. It is not the only XML library for Ruby, but it is very popular and is written in pure Ruby (NQXML is also written in pure Ruby, whereas XMLParser wraps the Jade library, which is written in C). In his REXML overview, Russell commented:
I have this problem: I dislike obfuscated APIs. There are several XML parser APIs for Java. Most of them follow DOM or SAX, and are very similar in philosophy to an increasing number of Java APIs. Namely, they look like they were designed by theorists who never had to use their own APIs. In general, the existing XML APIs are annoying. They take a markup language that was specifically designed to be very simple, elegant, and powerful, and wrap an obnoxious, bloated, and large API around it. I was always having to refer to the API documentation to do even the most basic XML tree manipulations; nothing was intuitive, and almost every operation was complex.
Although I would not put it quite so strongly, I agree with Russell that XML APIs impose a lot of work on most people who use them.
Example
Consider the following file, books.xml:
<library shelf="Recent Acquisitions">
  <section name="Ruby">
    <book isbn="0672328844">
      <title>The Ruby Way</title>
      <author>Hal Fulton</author>
      <description>
        Second edition. The book you are now reading.
        Ain't recursion grand?
      </description>
    </book>
  </section>
  <section name="Space">
    <book isbn="0684835509">
      <title>The Case for Mars</title>
      <author>Robert Zubrin</author>
      <description>Pushing toward a second home for the human race.
      </description>
    </book>
    <book isbn="074325631X">
      <title>First Man: The Life of Neil A. Armstrong</title>
      <author>James R. Hansen</author>
      <description>Definitive biography of the first man on the moon.
      </description>
    </book>
  </section>
</library>
1 Tree Parsing (that is, DOM-like)
We need to require the rexml/document library and include the REXML module:
require 'rexml/document'
include REXML

input = File.new("books.xml")
doc = Document.new(input)

root = doc.root
puts root.attributes["shelf"]     # Recent Acquisitions

doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# Output:
#   Ruby
#   Space

doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# Output:
#   0672328844
#   0684835509
#   074325631X

sec2 = root.elements[2]           # second <section>; integer indices are 1-based
author = sec2.elements[1].elements["author"].text   # Robert Zubrin
Note that an element's attributes are exposed through a hash-like object, so we can look up the value we need by attribute name. Child elements can be retrieved with either a path-like string or an integer. When an integer is used, the index is 1-based rather than 0-based.
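The access rules described above can be sketched in a few lines. This is a minimal illustration, not part of the original article: the inline XML string is a trimmed, hypothetical stand-in for books.xml, and the variable names are chosen only for this example.

```ruby
require 'rexml/document'

# Hypothetical inline sample standing in for books.xml (descriptions omitted).
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
    </section>
  </library>
XML

root = REXML::Document.new(xml).root

# Attributes: hash-style lookup by attribute name.
shelf = root.attributes["shelf"]

# Integer index: counts child elements only, starting at 1.
first_section  = root.elements[1]
second_section = root.elements[2]

# String index: a path-like expression; the first match is returned.
first_title = root.elements["section/book/title"]

puts shelf
puts first_section.attributes["name"]
puts second_section.attributes["name"]
puts first_title.text
```

Because integer lookups count only element children, the whitespace between tags does not shift the index.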
2 Stream Parsing (that is, SAX-like Parsing)
Here we use a small trick: we define a listener class whose methods are called back during parsing:
require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called each time the parser enters a tag.
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called each time the parser reads character data.
  def text(data)
    return if data =~ /\A\s*\z/    # whitespace only
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
source = File.new "books.xml"
Document.parse_stream(source, list)
The StreamListener module supplies empty callback methods, so you override only the ones you need. The parser calls tag_start whenever it enters a tag, and it calls text in the same way whenever it reads character data. The output begins as follows:
tag_start: "library", {"shelf"=>"Recent Acquisitions"}
tag_start: "section", {"name"=>"Ruby"}
tag_start: "book", {"isbn"=>"0672328844"}
tag_start: "title", {}
 text : "The Ruby Way"
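A listener becomes more useful once it keeps a little state between callbacks. The sketch below is an illustration rather than part of the original article: the TitleCollector class and the inline XML string are hypothetical, and it relies on the tag_end callback that StreamListener also provides. It collects the text of every title element.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that gathers the text of each <title> element.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Entering a tag: start a new buffer when it is a <title>.
  def tag_start(name, _attrs)
    return unless name == "title"
    @in_title = true
    @titles << String.new
  end

  # Character data: append only while inside a <title>.
  def text(data)
    @titles.last << data if @in_title
  end

  # Leaving a tag: stop collecting once the <title> closes.
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

# Hypothetical inline sample standing in for books.xml.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
    </section>
  </library>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
puts collector.titles
```

No document tree is built here; the listener holds only the strings it chooses to keep, which is the main reason to prefer stream parsing for large inputs.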
3 XPath
REXML supports XPath through the XPath class, which operates on a parsed document tree such as the doc object from the tree-parsing example. Using the same XML file as above, we can write:
book1 = XPath.first(doc, "//book")    # Info for first book found
p book1

# Print out all titles
XPath.each(doc, "//title") { |e| puts e.text }

# Get an array of all of the "author" elements in the document.
names = XPath.match(doc, "//author").map { |x| x.text }
p names
The output is similar to the following:
<book isbn='0672328844'> ... </>
The Ruby Way
The Case for Mars
First Man: The Life of Neil A. Armstrong
["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
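XPath expressions can also filter on attribute values by using predicates. The following sketch is an addition for illustration and is not taken from the original article: the inline XML string is a trimmed, hypothetical stand-in for books.xml, and the variable names are assumptions.

```ruby
require 'rexml/document'

# Hypothetical inline sample standing in for books.xml.
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

doc = REXML::Document.new(xml)

# Titles of the books inside the section whose name attribute is "Space".
space_titles = REXML::XPath.match(doc, "//section[@name='Space']/book/title")
                           .map { |e| e.text }

# Look up a single book by its isbn attribute.
book = REXML::XPath.first(doc, "//book[@isbn='0672328844']")

p space_titles
puts book.elements["title"].text
```

XPath.match returns an array, while XPath.first returns a single node, or nil when nothing matches.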