Advanced usage of Nokogiri, Ruby's XML data parsing library

Source: Internet
Author: User


I. Basic syntax
1. Build a Nokogiri document directly from a string:

html_doc = Nokogiri::HTML("

Here html_doc and xml_doc are Nokogiri document objects.

2. You can also build the document from an open file handle:

require 'nokogiri'
f = File.open("blossom.xml")
doc = Nokogiri::XML(f)
f.close

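3. You can also fetch and parse a page directly from a website:

require 'nokogiri'
require 'open-uri'
# On Ruby 3.0 and later, open-uri no longer patches Kernel#open; call URI.open instead.
doc = Nokogiri::HTML(URI.open("http://www.xxx.com/"))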

II. XML file parsing example
Common methods for extracting fields from XML/HTML files:

Suppose there is a file named shows.xml with the following content:

<root>
  <sitcoms>
    <sitcom>
      <name>Married with Children</name>
      <characters>
        <character>Al Bundy</character>
        <character>Bud Bundy</character>
        <character>Marcy Darcy</character>
      </characters>
    </sitcom>
    <sitcom>
      <name>Perfect Strangers</name>
      <characters>
        <character>Larry Appleton</character>
        <character>Balki Bartokomous</character>
      </characters>
    </sitcom>
  </sitcoms>
  <dramas>
    <drama>
      <name>The A-Team</name>
      <characters>
        <character>John "Hannibal" Smith</character>
        <character>Templeton "Face" Peck</character>
        <character>"B.A." Baracus</character>
        <character>"Howling Mad" Murdock</character>
      </characters>
    </drama>
  </dramas>
</root>

To find all the character elements, you can do this:

@doc = Nokogiri::XML(File.open("shows.xml"))
@doc.xpath("//character")

The xpath and css methods return a NodeSet, which behaves much like an array. It contains every node in the document that matches the query.

To get only the character nodes under the dramas node:

@doc.xpath("//dramas//character")

The css method is often more readable:

characters = @doc.css("sitcoms name")
# => ["<name>Married with Children</name>", "<name>Perfect Strangers</name>"]

If you know the query has a single result, or you only want the first match, use at_xpath or at_css to return that node directly instead of a list:

@doc.css("dramas name").first # => "<name>The A-Team</name>"
@doc.at_css("dramas name")    # => "<name>The A-Team</name>"

III. Namespaces
Namespaces matter when a document contains tags that share a name but mean different things.
For example, consider this parts.xml file:

<parts>
  <!-- Alice's Auto Parts Store -->
  <inventory xmlns="http://alicesautoparts.com/">
    <tire>all weather</tire>
    <tire>studded</tire>
    <tire>extra wide</tire>
  </inventory>
  <!-- Bob's Bike Shop -->
  <inventory xmlns="http://bobsbikes.com/">
    <tire>street</tire>
    <tire>mountain</tire>
  </inventory>
</parts>

Each inventory declares its own namespace URL, so you can use those URLs to tell the two kinds of tire tags apart:

@doc = Nokogiri::XML(File.read("parts.xml"))
car_tires = @doc.xpath('//car:tire', 'car' => 'http://alicesautoparts.com/')
bike_tires = @doc.xpath('//bike:tire', 'bike' => 'http://bobsbikes.com/')

To make namespaces easier to use, Nokogiri automatically registers any namespaces it finds on the root node, binding each prefix to its declared URL. This convention reduces the amount of code you have to write.
For example, consider this atom.xml file:

<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
    <name>John Doe</name>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>

Because the default namespace is declared on the root node, Nokogiri has already bound it to the prefix xmlns, and you do not need to pass the URL yourself:

@doc = Nokogiri::XML(File.read("atom.xml"))
@doc.xpath('//xmlns:title')
# => ["<title>Example Feed</title>", "<title>Atom-Powered Robots Run Amok</title>"]

The equivalent query with css:

@doc.css('xmlns|title')

In addition, when you use css and the namespace prefix is xmlns, you can omit the prefix altogether:

@doc.css('title')

