A guide to using REXML, Ruby's XML processing library


Using REXML in tree mode
REXML aims to be "just enough," and for the most part it meets that goal well. REXML supports two different styles of XML processing: "tree" and "stream". The first style is a simpler version of what DOM tries to do; the second is a simpler version of what SAX tries to do. Let's look at the tree style first. Suppose we want to extract data from the same address book document used in the previous example. The following session uses a modified version of eval.rb, the interactive evaluator from the Ruby tutorial. The standard eval.rb prints very long results when an expression evaluates to a complex object; my modified eval.rb prints nothing unless there is an error:
Using REXML to refer to nested data

ruby> require "rexml/document"
ruby> include REXML
ruby> addrbook = (Document.new File.new("address.xml")).root
ruby> persons = addrbook.elements.to_a("//person")
ruby> puts persons[1].elements["address"].attributes["city"]
New York

This is a very common idiom. The .to_a() method creates an array of all <person> elements in the document, which can be handy for later lookups. An Element is somewhat like a DOM node, but it is actually closer to the XML itself (and simpler to use). The argument to .to_a() is an XPath expression; in this case it identifies all <person> elements anywhere in the document. If we only need the elements at the first level, we can use:
To create an array of matching elements

ruby> persons = addrbook.elements.to_a("/addressbook/person")
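To see the difference between the two XPath forms without the original address.xml, here is a small self-contained sketch. The address book is hypothetical (it nests one <person> inside a <group> element that the original document may not have), and it is parsed from a string rather than a file:

```ruby
require "rexml/document"
include REXML

# Hypothetical stand-in for address.xml; one <person> is nested in a <group>.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <group>
      <person><address city="Los Angeles" state="CA"/></person>
    </group>
  </addressbook>
XML

addrbook = Document.new(xml).root

# "//person" matches <person> elements at any depth in the document.
everywhere = addrbook.elements.to_a("//person")
# "/addressbook/person" matches only <person> children of the root element.
first_level = addrbook.elements.to_a("/addressbook/person")

puts everywhere.size   # => 3
puts first_level.size  # => 2
```

Both calls return plain Ruby arrays of REXML::Element objects, so the usual Array methods apply to the result.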

We can use XPath even more directly, as an overloaded index into the .elements property. For example:
Another way to use REXML to refer to nested data

ruby> puts addrbook.elements["//person[2]/address"].attributes["city"]
New York

Note that XPath uses 1-based indexing, not the 0-based indexing of Ruby and Python arrays. In other words, this is still the same person whose city we checked above. We can confirm this by viewing the element's XML source with REXML:
Displaying an element's XML source with REXML

ruby> puts addrbook.elements["//person[2]/address"]
<address city='New York' street='118 St.' number='344' state='NY'/>
ruby> puts addrbook.elements["//person[2]/contact-info"]
<contact-info>
 <email address='robb@iro.ibm.com'/>
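The indexing rule is easy to get wrong, so here is a minimal sketch that checks it directly. It assumes a hypothetical two-person address book parsed from a string, and shows that persons[1] (0-based Ruby array) and //person[2] (1-based XPath) reach the very same element object, whose to_s gives back its XML source:

```ruby
require "rexml/document"
include REXML

# Hypothetical two-person address book.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
  </addressbook>
XML

addrbook = Document.new(xml).root
persons = addrbook.elements.to_a("//person")

# Ruby arrays are 0-based: persons[1] is the second <person>.
via_array = persons[1].elements["address"]
# XPath positions are 1-based: person[2] is also the second <person>.
via_xpath = addrbook.elements["//person[2]/address"]

puts via_array.equal?(via_xpath)   # => true (same node, not a copy)
puts via_xpath.attributes["city"]  # => New York
# Element#to_s serializes the element back to XML text.
puts via_xpath.to_s
```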

Also, an XPath expression does not have to match just one element. We already saw this when defining the persons array, but another example makes the point:
Matching multiple elements with XPath

ruby> puts addrbook.elements.to_a("//person/address[@state='CA']")
<address city='Sacramento' street='Spruce Rd.' number='' state='CA'/>
<address city='Los Angeles' street='-Pine Rd.' number='1234' state='CA'/>

In contrast, indexing the .elements property returns only the first matching element:
When XPath matches only the first occurrence

ruby> puts addrbook.elements["//person/address[@state='CA']"]
<address city='Sacramento' street='Spruce Rd.' number='' state='CA'/>
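The two lookups can be compared side by side. This sketch assumes a hypothetical three-person address book; it also shows that the index form returns nil, rather than raising, when nothing matches:

```ruby
require "rexml/document"
include REXML

# Hypothetical address book with two California entries.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <person><address city="Los Angeles" state="CA"/></person>
  </addressbook>
XML

addrbook = Document.new(xml).root
path = "//person/address[@state='CA']"

all_matches = addrbook.elements.to_a(path)  # every match, as an Array
first_match = addrbook.elements[path]       # only the first match

p all_matches.map { |a| a.attributes["city"] }  # => ["Sacramento", "Los Angeles"]
p first_match.attributes["city"]                # => "Sacramento"

# With no match, indexing yields nil, so guard before chaining calls.
missing = addrbook.elements["//person/address[@state='TX']"]
p missing  # => nil
```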

You can also evaluate XPath expressions through REXML's XPath class, which has methods such as .first(), .each() and .match().
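A short sketch of those three REXML::XPath class methods, run against a hypothetical in-memory address book. Each takes the context node first and the XPath string second:

```ruby
require "rexml/document"
include REXML

# Hypothetical address book.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <person><address city="Los Angeles" state="CA"/></person>
  </addressbook>
XML

doc = Document.new(xml)
path = "//address[@state='CA']"

# XPath.first returns the first matching node, or nil if none matches.
first = XPath.first(doc, path)

# XPath.match returns an Array of all matching nodes.
matches = XPath.match(doc, path)

# XPath.each yields each matching node to the block in turn.
cities = []
XPath.each(doc, path) { |addr| cities << addr.attributes["city"] }

puts first.attributes["city"]  # => Sacramento
puts matches.size              # => 2
p cities                       # => ["Sacramento", "Los Angeles"]
```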
A distinctive idiom for REXML elements is the .each iterator. Although Ruby has a for loop for operating over collections, Ruby programmers usually prefer an iterator method that passes control to a block of code. The following two constructs are equivalent, but the second feels more natural in Ruby:
Iterating over XPath matches in REXML

ruby> for addr in addrbook.elements.to_a("//address[@state='CA']")
  |   puts addr.attributes["city"]
  | end
Sacramento
Los Angeles
ruby> addrbook.elements.each("//address[@state='CA']") {
  |   |addr| puts addr.attributes["city"]
  | }
Sacramento
Los Angeles
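As a self-contained check that the two loop styles really are equivalent, the sketch below collects the cities both ways from a hypothetical in-memory address book. It adds a third form: because to_a returns an ordinary Array, Enumerable methods such as map also apply:

```ruby
require "rexml/document"
include REXML

# Hypothetical address book.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <person><address city="Los Angeles" state="CA"/></person>
  </addressbook>
XML

addrbook = Document.new(xml).root
path = "//address[@state='CA']"

# Style 1: an explicit for loop over the array from to_a.
cities_for = []
for addr in addrbook.elements.to_a(path)
  cities_for << addr.attributes["city"]
end

# Style 2: the .each iterator, passing a block.
cities_each = []
addrbook.elements.each(path) { |addr| cities_each << addr.attributes["city"] }

# Style 3: map over the array when the goal is a new collection.
cities_map = addrbook.elements.to_a(path).map { |addr| addr.attributes["city"] }

p cities_each  # => ["Sacramento", "Los Angeles"]
```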

Using REXML in streaming mode
For the sake of "just enough," REXML's tree mode may be the easiest approach in Ruby. But REXML also provides a streaming mode, which is like a lighter variant of SAX. As with SAX, REXML does not hand the application programmer a ready-made data structure built from the XML document. Instead, a "listener" or "handler" class supplies a set of methods that respond to the various events in the document stream. The usual events are: start tag, end tag, text encountered, and so on.
Although streaming is far less convenient than working with a tree, it is usually much faster. The REXML tutorial claims that stream mode is 1500 times faster. I have not tried to benchmark this, but I suspect the figure applies only to limited cases (my small example ran instantly in tree mode). Still, if speed matters, the difference is likely to be significant.
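The speed claim is easy to check on your own data. The following is only a rough harness under stated assumptions, not a reproduction of the tutorial's measurement: it builds a hypothetical address book of 2,000 records, extracts the California cities once with the tree API and once with a stream listener (the class name CityCollector is illustrative), and prints the elapsed times. Timings vary by machine, input shape and REXML version, so no numbers are quoted here:

```ruby
require "benchmark"
require "rexml/document"
require "rexml/streamlistener"
include REXML

# Hypothetical synthetic input: 2,000 <person> records, every other one in CA.
people = (1..2000).map do |i|
  state = i.even? ? "CA" : "NY"
  %(<person><address city="City#{i}" state="#{state}"/></person>)
end
xml = "<addressbook>#{people.join}</addressbook>"

# Stream mode: react to start-tag events without building a tree.
class CityCollector
  include StreamListener
  attr_reader :cities

  def initialize
    @cities = []
  end

  def tag_start(name, attrs)
    state = attrs.assoc("state")  # [name, value] pair, or nil if absent
    @cities << attrs.assoc("city")[1] if name == "address" && state && state[1] == "CA"
  end
end

tree_cities = nil
stream_cities = nil

tree_time = Benchmark.realtime do
  doc = Document.new(xml)
  tree_cities = doc.root.elements.to_a("//address[@state='CA']").map { |a| a.attributes["city"] }
end

stream_time = Benchmark.realtime do
  collector = CityCollector.new
  Document.parse_stream(xml, collector)
  stream_cities = collector.cities
end

printf("tree: %.3fs  stream: %.3fs\n", tree_time, stream_time)
```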
Let's look at a very simple example that does the same thing as the "list California cities" example above. Extending it to more complex document processing is relatively straightforward:
Stream processing of an XML document in REXML

ruby> require "rexml/document"
ruby> require "rexml/streamlistener"
ruby> include REXML
ruby> class Handler
  |   include StreamListener
  |   def tag_start name, attrs
  |     if name == 'address' and attrs.assoc('state')[1] == 'CA'
  |       puts attrs.assoc('city')[1]
  |     end
  |   end
  | end
ruby> Document.parse_stream((File.new "address.xml"), Handler.new)
Sacramento
Los Angeles

One thing to note in the streaming example is that the tag attributes are passed as an array of [name, value] pairs, which takes a little more work to handle than a hash would (though the library can probably build it faster).
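If the pairs array is awkward, a handler can convert it to a Hash first. The sketch below is illustrative (the class name CaliforniaCities and the inline sample are not from the original). It relies on Array#to_h turning [name, value] pairs into a Hash and on Hash#to_h returning the hash itself, so it should work whether a given REXML version passes attrs as pairs or, as recent versions appear to, as a Hash:

```ruby
require "rexml/document"
require "rexml/streamlistener"
include REXML

# Illustrative handler that normalizes attrs to a Hash before using it.
class CaliforniaCities
  include StreamListener
  attr_reader :cities

  def initialize
    @cities = []
  end

  def tag_start(name, attrs)
    # to_h accepts either an Array of [name, value] pairs or a Hash.
    attr_hash = attrs.to_h
    @cities << attr_hash["city"] if name == "address" && attr_hash["state"] == "CA"
  end
end

# Hypothetical inline document instead of address.xml.
xml = <<~XML
  <addressbook>
    <person><address city="Sacramento" state="CA"/></person>
    <person><address city="New York" state="NY"/></person>
    <person><address city="Los Angeles" state="CA"/></person>
  </addressbook>
XML

handler = CaliforniaCities.new
Document.parse_stream(xml, handler)
puts handler.cities  # prints Sacramento, then Los Angeles
```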

Encoding issues
REXML stores all text nodes as UTF-8, and all calling code should keep this in mind: strings passed to REXML must be UTF-8 encoded.

REXML cannot always guess correctly how your text is encoded, so it always assumes UTF-8. REXML also does not warn you if you add text in some other encoding; it is up to the caller to make sure that any text added is UTF-8. Plain 7-bit ASCII needs no special handling. If you use ISO-8859-1 text, you must convert it to UTF-8 before adding it; you can do this with text.unpack("C*").pack("U*"). Changing the encoding on output is supported only by Document.write() and Document.to_s(). If you need to output a particular node in a specific encoding, you must wrap the output object in an Output object:

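require "rexml/document"
include REXML

e = Element.new "a"
e.text = "f\u00FCr"   # 'für', held internally as UTF-8
o = String.new        # buffer that receives the encoded output
e.write(Output.new(o, "ISO-8859-1"))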

You can pass any supported encoding to Output.
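The input-side conversion mentioned above can be checked in a few lines. In this sketch the ISO-8859-1 string is built from raw bytes, as a hypothetical stand-in for text read from a Latin-1 source. It compares the unpack/pack approach with String#encode, which current Ruby versions provide for the same job, and then stores the UTF-8 result in an element (the element name "word" is arbitrary):

```ruby
require "rexml/document"
include REXML

# Hypothetical Latin-1 input: the bytes of "für" in ISO-8859-1 (ü is 0xFC).
latin1 = [0x66, 0xFC, 0x72].pack("C*").force_encoding(Encoding::ISO_8859_1)

# unpack("C*") reads each byte as an integer; pack("U*") writes those
# integers as UTF-8 code points. This works because ISO-8859-1 bytes map
# one-to-one onto Unicode code points U+0000..U+00FF.
utf8_via_pack = latin1.unpack("C*").pack("U*")

# String#encode performs the same conversion.
utf8_via_encode = latin1.encode(Encoding::UTF_8)

# Only UTF-8 text goes into the REXML tree.
e = Element.new("word")
e.text = utf8_via_pack

p utf8_via_pack.encoding  # => #<Encoding:UTF-8>
p e.text                  # => "für"
```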
