Creating and parsing XML files in Ruby programs
Using Builder to create XML
Install Builder:
gem install builder
require 'builder'

x = Builder::XmlMarkup.new(:target => $stdout, :indent => 1)
# :target => $stdout writes the generated XML to standard output (the console)
# :indent => 1 indents each nesting level of the XML output by one space
x.instruct! :xml, :version => '1.1', :encoding => 'gb2312'
x.comment! "Book information"
x.library("shelf" => "Recent Acquisitions") {
  x.section("name" => "ruby") {
    x.book("isbn" => "0672310001") {
      x.title "Programming Ruby"
      x.author "Yukihiro"
      x.description "Programming Ruby-The Pragmatic Programmer's Guide"
    }
  }
}
p x # not needed to print the XML (it is already written to $stdout); this call is what adds <inspect/> to the output
The output is:
<?xml version="1.1" encoding="gb2312"?>
<!-- Book information -->
<library shelf="Recent Acquisitions">
 <section name="ruby">
  <book isbn="0672310001">
   <title>Programming Ruby</title>
   <author>Yukihiro</author>
   <description>Programming Ruby-The Pragmatic Programmer's Guide</description>
  </book>
 </section>
</library>
<inspect/>
#<IO:0x2a06ae8>
Using REXML to parse XML
REXML is an XML processor written entirely in Ruby. It offers several APIs, the two main ones being DOM-like and SAX-like. The DOM-like API reads the entire file into memory and stores it as a hierarchy (that is, a tree). The SAX-like API parses as it goes, which is suitable when files are large and memory is limited.
Consider the following books.xml:
<library shelf="Recent Acquisitions">
  <section name="Ruby">
    <book isbn="0672328844">
      <title>The Ruby Way</title>
      <author>Hal Fulton</author>
      <description>
        Second edition. The book you are now reading.
        Ain't recursion grand?
      </description>
    </book>
  </section>
  <section name="Space">
    <book isbn="0684835509">
      <title>The Case for Mars</title>
      <author>Robert Zubrin</author>
      <description>Pushing toward a second home for the human race.
      </description>
    </book>
    <book isbn="074325631X">
      <title>First Man: The Life of Neil A. Armstrong</title>
      <author>James R. Hansen</author>
      <description>Definitive biography of the first man on the moon.
      </description>
    </book>
  </section>
</library>
1 Tree Parsing (that is, DOM-like)
We need to require the rexml/document library and include the REXML module:
require 'rexml/document'
include REXML

input = File.new("books.xml")
doc = Document.new(input)

root = doc.root
puts root.attributes["shelf"]   # Recent Acquisitions

doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# Output:
#   Ruby
#   Space

doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# Output (one line per book in books.xml):
#   0672328844
#   0684835509
#   074325631X

sec2 = root.elements[2]                              # second <section> (indexes are 1-based)
author = sec2.elements[1].elements["author"].text    # Robert Zubrin
Note that an element's attributes and their values are exposed as a hash-like object, so a value can be read by name through attributes. Child elements can be selected with either a path-like string or an integer. Integer indexes are 1-based, not 0-based.
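The access rules above can be tried without any file. The following is a minimal sketch, assuming a shortened inline copy of books.xml (the cut-down XML string and the variable names are illustrative, not part of the original example):

```ruby
require 'rexml/document'

# Shortened inline stand-in for books.xml (an assumption for illustration)
xml = <<~XML
  <library shelf="Recent Acquisitions">
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
    </section>
  </library>
XML

doc  = REXML::Document.new(xml)
root = doc.root

# Attributes are looked up by name, like hash keys
shelf = root.attributes["shelf"]

# Integer indexes are 1-based: elements[1] is the first child element
first_section  = root.elements[1].attributes["name"]
second_section = root.elements[2].attributes["name"]

# A path-like string returns the first matching descendant element
first_title = root.elements["section/book/title"].text

puts shelf, first_section, second_section, first_title
```

Whitespace between tags does not shift the integer indexes, because elements counts child elements only, not text nodes.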
2 Stream Parsing (that is, SAX-like Parsing)
Here we define a listener class whose methods are called back during parsing:
require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called each time the parser enters a tag
  def tag_start(*args)
    puts "tag_start: #{args.map {|x| x.inspect}.join(', ')}"
  end

  # Called each time character data is read
  def text(data)
    return if data =~ /\A\s*\z/ # skip whitespace-only text
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
source = File.new "books.xml"
Document.parse_stream(source, list)
The StreamListener module provides a set of empty callback methods, so you override only the ones you need. When the parser enters a tag, the tag_start method is called. The text method works the same way, but is called back when character data is read. The output is as follows:
tag_start: "library", {"shelf"=>"Recent Acquisitions"}
tag_start: "section", {"name"=>"Ruby"}
tag_start: "book", {"isbn"=>"0672328844"}
tag_start: "title", {}
 text : "The Ruby Way"
...
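A listener can also keep state between callbacks instead of printing. The following is a minimal sketch, assuming a hypothetical TitleCollector class and a shortened inline XML string. It overrides tag_start, text, and tag_end (another callback supplied by StreamListener) to gather the text of every title element:

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that collects the text of every <title> element
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles

  def initialize
    @titles   = []
    @in_title = false
  end

  # Remember that we are inside a <title> element
  def tag_start(name, attrs)
    @in_title = true if name == "title"
  end

  # Keep character data only while inside <title>
  def text(data)
    @titles << data.strip if @in_title
  end

  # Leaving </title>: stop collecting
  def tag_end(name)
    @in_title = false if name == "title"
  end
end

# Shortened inline stand-in for books.xml (an assumption for illustration)
xml = <<~XML
  <library>
    <book><title>The Ruby Way</title></book>
    <book><title>The Case for Mars</title></book>
  </library>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.titles
```

Because the parser never builds a tree, the listener holds only the collected titles in memory, which is the point of the stream approach for large files.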
3 XPath
REXML supports XPath through the XPath class. It works on a parsed document tree (the DOM-like API), so the examples below reuse the doc object built from books.xml above:
book1 = XPath.first(doc, "//book") # Info for first book found
p book1

# Print out all titles
XPath.each(doc, "//title") { |e| puts e.text }

# Get an array of all of the "author" elements in the document.
names = XPath.match(doc, "//author").map {|x| x.text }
p names
The output is similar to the following:
<book isbn='0672328844'> ... </>
The Ruby Way
The Case for Mars
First Man: The Life of Neil A. Armstrong
["Hal Fulton", "Robert Zubrin", "James R. Hansen"]
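XPath expressions can also filter on attribute values. The following is a minimal sketch, assuming a shortened inline copy of books.xml (the variable names are illustrative), that selects books by section name and by ISBN:

```ruby
require 'rexml/document'

# Shortened inline stand-in for books.xml (an assumption for illustration)
xml = <<~XML
  <library>
    <section name="Ruby">
      <book isbn="0672328844"><title>The Ruby Way</title></book>
    </section>
    <section name="Space">
      <book isbn="0684835509"><title>The Case for Mars</title></book>
      <book isbn="074325631X"><title>First Man: The Life of Neil A. Armstrong</title></book>
    </section>
  </library>
XML

doc = REXML::Document.new(xml)

# Titles of the books in the "Space" section only
space_titles = REXML::XPath.match(doc, "//section[@name='Space']/book/title").map { |e| e.text }

# Look up a single book by its isbn attribute
book = REXML::XPath.first(doc, "//book[@isbn='074325631X']")
found_title = book.elements["title"].text

p space_titles
puts found_title
```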