How to use REXML, the XML processing library for Ruby
Use REXML as a tree
REXML aims to do just enough, and for the most part it succeeds. REXML supports two different styles of XML processing: "tree" and "stream". The first style is a simpler version of what DOM does; the second is a simpler version of what SAX does. Let's look at the tree style first. Suppose we want to extract data from the same address book document used in the previous examples. The following session comes from a modified eval.rb. The standard eval.rb (linked from the Ruby tutorials) can print very long results for every line, because it echoes the value of complex object expressions; my eval.rb prints nothing when no error occurs:
How to Use REXML to reference nested data
ruby> require "rexml/document"
ruby> include REXML
ruby> addrbook = (Document.new File.new "address.xml").root
ruby> persons = addrbook.elements.to_a("//person")
ruby> puts persons[1].elements["address"].attributes["city"]
New York
This expression looks fairly ordinary. The .to_a() method creates an array of all the <person> elements in the document, which may be worth naming for use elsewhere. An element is a bit like a DOM node, but it is actually closer to the XML itself (and easier to work with). The argument to .to_a() is an XPath expression, which in this case identifies every <person> element anywhere in the document. If we only need the elements on the first level, we can use:
Create an array of matching elements
ruby> persons = addrbook.elements.to_a("/addressbook/person")
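The difference between the two XPath forms is easier to see on a document that nests a <person> below the first level. The following is a minimal sketch; the inline XML, including the <group> element and the name attributes, is hypothetical sample data and not the address.xml used in this article.

```ruby
require "rexml/document"

# Hypothetical sample data: one <person> is nested inside a <group>,
# so the two XPath expressions select different sets.
xml = <<~XML
  <addressbook>
    <person name="A"><address city="Sacramento" state="CA"/></person>
    <person name="B"><address city="New York" state="NY"/></person>
    <group>
      <person name="C"><address city="Los Angeles" state="CA"/></person>
    </group>
  </addressbook>
XML

addrbook = REXML::Document.new(xml).root

# "//person" matches <person> elements at any depth.
anywhere = addrbook.elements.to_a("//person")

# "/addressbook/person" matches only direct children of the root element.
top_level = addrbook.elements.to_a("/addressbook/person")

puts anywhere.size   # 3
puts top_level.size  # 2
```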
We can even use an XPath expression more directly, as an overloaded index on the .elements attribute. For example:
Another way to reference nested data using REXML
ruby> puts addrbook.elements["//person[2]/address"].attributes["city"]
New York
Note that XPath uses 1-based indexing, unlike Ruby and Python arrays, which are 0-based. In other words, we are still checking the city of the same person. REXML also makes it easy to view the XML source of an element:
Display element XML source code with REXML
ruby> puts addrbook.elements["//person[2]/address"]
<address city='New York' street='118 St.' number='344' state='NY'/>
ruby> puts addrbook.elements["//person[2]/contact-info"]
<contact-info>
  <email address='robb@iro.ibm.com'/>
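To make the indexing point concrete, the sketch below reaches the same element twice, once through a 0-based Ruby array and once through a 1-based XPath position. The inline XML is a cut-down, assumed stand-in for address.xml.

```ruby
require "rexml/document"

# Assumed stand-in for address.xml, reduced to two entries.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
  </addressbook>
XML

addrbook = REXML::Document.new(xml).root
persons  = addrbook.elements.to_a("//person")

# Ruby arrays are 0-based: index 1 is the second <person>.
via_array = persons[1].elements["address"].attributes["city"]

# XPath positions are 1-based: [2] is the second <person> child of its parent.
via_xpath = addrbook.elements["//person[2]/address"].attributes["city"]

puts via_array
puts via_xpath
```

Both lines should print the same city, because persons[1] and //person[2] refer to the same element in this flat document.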
An XPath expression does not have to match exactly one element. We saw this when defining the persons array, but another example makes the point clearer:
Match Multiple Elements With XPath
ruby> puts addrbook.elements.to_a("//person/address[@state='CA']")
<address city='Sacramento' street='Spruce Rd.' number='99' state='CA'/>
<address city='Los Angeles' street='Pine Rd.' number='1234' state='CA'/>
In contrast, indexing the .elements attribute yields only the first matching element:
When XPath only matches the first occurrence
ruby> puts addrbook.elements["//person/address[@state='CA']"]
<address city='Sacramento' street='Spruce Rd.' number='99' state='CA'/>
XPath expressions can also be used through the XPath class in REXML, which has methods such as .first(), .each(), and .match().
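As a rough illustration of the XPath class, the sketch below calls REXML::XPath.first, REXML::XPath.match, and REXML::XPath.each with the same expression. The inline XML is an assumed stand-in for address.xml that keeps only the city and state attributes.

```ruby
require "rexml/document"

# Assumed stand-in for address.xml.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <person><address city="Los Angeles" state="CA"/></person>
  </addressbook>
XML

doc  = REXML::Document.new(xml)
path = "//person/address[@state='CA']"

# first: only the first matching element.
first = REXML::XPath.first(doc, path)

# match: an array of every matching element.
all = REXML::XPath.match(doc, path)

# each: yields every matching element to the block.
cities = []
REXML::XPath.each(doc, path) { |addr| cities << addr.attributes["city"] }

puts first.attributes["city"]
puts all.size
puts cities.join(", ")
```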
One distinctly Ruby-like method of REXML elements is the .each iterator. Although Ruby has a loop construct that can operate on collections, Ruby programmers generally prefer iterator methods that pass control to a code block. The following two constructs are equivalent, but the second has a more natural Ruby feel:
Matching XPath in REXML for Iteration
ruby> for addr in addrbook.elements.to_a("//address[@state='CA']")
    |   puts addr.attributes["city"]
    | end
Sacramento
Los Angeles
ruby> addrbook.elements.each("//address[@state='CA']") { |addr|
    |   puts addr.attributes["city"]
    | }
Sacramento
Los Angeles
Use REXML as a stream
For the purpose of doing "just enough", the REXML tree approach is probably the easiest one available in Ruby. However, REXML also provides a stream approach, which is something like an even more lightweight variant of SAX. As with SAX, REXML does not give the application programmer any default data structure built from the XML document. Instead, a "listener" or "handler" class provides a set of methods that respond to various events in the document stream. The usual events are start tags, end tags, and element text.
Although the stream approach is far less convenient than the tree approach, it is usually much faster. The REXML tutorial claims that stream parsing is 1500 times faster. I have not benchmarked it myself, but I suspect that figure is a limiting case (my small examples also finish instantly in tree mode). In short, if speed is critical, the difference is likely to matter.
Let's look at a very simple example. It does the same thing as the preceding "list the California cities" example, and it is relatively easy to extend for more complex document processing:
Stream processing of XML documents in REXML
ruby> require "rexml/document"
ruby> require "rexml/streamlistener"
ruby> include REXML
ruby> class Handler
    |   include StreamListener
    |   def tag_start name, attrs
    |     if name=="address" and attrs.assoc("state")[1]=="CA"
    |       puts attrs.assoc("city")[1]
    |     end
    |   end
    | end
ruby> Document.parse_stream((File.new "address.xml"), Handler.new)
Sacramento
Los Angeles
One thing to note in the stream processing example is that the tag attributes are passed as an array of arrays, which takes a little more work to handle than a hash (but is probably faster for the library to create).
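One way to soften the attribute handling is to convert the attributes with .to_h before looking anything up, which works for an array of [name, value] pairs and also for a Hash (which more recent REXML releases appear to pass instead). The sketch below collects the cities rather than printing them. The CityCollector class name and the inline XML are illustrative assumptions, not part of the original example.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Assumed stand-in for address.xml.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <person><address city="Los Angeles" state="CA"/></person>
  </addressbook>
XML

# Hypothetical listener: remembers the city of every California address.
class CityCollector
  include REXML::StreamListener

  attr_reader :cities

  def initialize
    @cities = []
  end

  # Called once for every start tag in the document stream.
  def tag_start(name, attrs)
    attrs = attrs.to_h  # accepts a Hash or an array of [name, value] pairs
    @cities << attrs["city"] if name == "address" && attrs["state"] == "CA"
  end
end

collector = CityCollector.new
REXML::Document.parse_stream(xml, collector)
puts collector.cities
```

Keeping the results in the listener, instead of printing from inside tag_start, makes the handler easier to reuse once the processing grows beyond a single tag.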
Encoding Problems
All text nodes in REXML are UTF-8 encoded, so keep this in mind in any code that calls it. Within a program, every string passed to REXML must be UTF-8 encoded.
REXML cannot always guess how your text is encoded, so it always assumes UTF-8. For the same reason, it does not warn you if you add text in another encoding. You must make sure that the text you add is UTF-8. Standard 7-bit ASCII is fine as it is. If you use ISO-8859-1 text, you must convert it to UTF-8 before adding it; you can do this with text.unpack("C*").pack("U*"). On output, only Document.write() and Document.to_s() support encodings directly. If you need to output a node in a specific encoding, you must wrap the output object in Output:
e = Element.new "a"
e.text = "f\u00fcr"   # "für" as UTF-8; written out below as ISO-8859-1
o = ''
e.write( Output.new( o, "ISO-8859-1" ) )
Any supported encoding can be passed to Output.
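The ISO-8859-1 to UTF-8 conversion mentioned in this section can be checked without REXML. The sketch below applies the unpack/pack idiom to the raw bytes of "für" and compares the result with Ruby's built-in String#encode, which is an alternative the text above does not mention. The variable names are illustrative.

```ruby
# Raw ISO-8859-1 bytes for "für" (0xFC is "ü" in ISO-8859-1).
latin1_bytes = "f\xfcr".b

# Idiom from the text: read each byte as a code point, then re-pack as UTF-8.
via_pack = latin1_bytes.unpack("C*").pack("U*")

# Alternative: label the bytes as ISO-8859-1 and let Ruby transcode them.
via_encode = latin1_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts via_pack.encoding       # UTF-8
puts via_pack == via_encode  # true
```

The unpack/pack idiom only works because every ISO-8859-1 byte value equals its Unicode code point; for other source encodings, String#encode is the safer choice.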