A simple tutorial on working with XML and XSLT and XPath in Ruby

Last Update:2017-01-18 Source: Internet

Author: User

What is XML?

XML stands for Extensible Markup Language.

XML is a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to annotate electronic documents so that they have structure.

It can be used to mark up data and define data types, and it is a meta-language that lets users define their own markup languages. It is well suited to transmission over the World Wide Web, providing a uniform way to describe and exchange structured data that is independent of applications and vendors.

XML Parser Architecture and APIs

There are two main kinds of XML parser: DOM and SAX.

A SAX parser is event-based. It scans the document once from beginning to end, and each time it encounters a syntactic construct (such as the start of an element) it calls the event handler registered for that construct, so the application receives a stream of events.
A DOM (Document Object Model) parser builds the hierarchical structure of the document as a tree in memory. Each node of the DOM tree is an object, and once parsing is complete the entire tree for the document is held in memory.
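
To make the contrast concrete, here is a minimal sketch using REXML, Ruby's XML library. The two-book document and the EventLogger class are made up for illustration. The same input is first loaded as a tree and then replayed as a stream of events.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

xml_snippet = '<shelf name="demo"><book>Ruby</book><book>XML</book></shelf>'

# DOM style: the whole document becomes an in-memory tree that can be
# navigated in any order once parsing has finished.
tree = REXML::Document.new(xml_snippet)
dom_titles = []
tree.root.elements.each { |book| dom_titles << book.text }

# SAX style: the parser walks the input once and calls back into a
# listener; nothing is kept unless the listener stores it itself.
class EventLogger   # hypothetical name, for illustration only
  include REXML::StreamListener
  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, _attributes)
    @events << "start:#{name}"
  end

  def tag_end(name)
    @events << "end:#{name}"
  end
end

logger = EventLogger.new
REXML::Document.parse_stream(xml_snippet, logger)

p dom_titles
p logger.events
```

The tree version can be queried repeatedly after parsing, while the listener sees each event exactly once and in document order.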

Parsing and creating XML in Ruby

XML documents can be parsed in Ruby with the REXML library.

REXML is an XML toolkit for Ruby. It is written in pure Ruby and conforms to the XML 1.0 specification.

REXML has shipped with Ruby since version 1.8 (from Ruby 3.0 on, as a bundled gem).

The library is loaded with: require 'rexml/document'

All of its classes and methods live in the REXML module.

The Rexml parser has the following advantages over other parsers:

It is written 100 percent in Ruby.
It supports both SAX-style and DOM-style parsing.
It is lightweight: fewer than 2000 lines of code.
Its methods and classes are easy to understand.
It offers a SAX2-based API and full XPath support.
It ships with Ruby, so no separate installation is needed.
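
This section's title also mentions creating XML, so here is a small sketch of building a document with REXML rather than parsing one. The element names mirror the sample file shown next; the values and variable names are illustrative assumptions. Classes are written with the REXML:: prefix here instead of including the module.

```ruby
require 'rexml/document'

# Build the tree top-down: document -> root element -> children.
doc = REXML::Document.new
collection = doc.add_element('collection', { 'shelf' => 'New Arrivals' })
movie = collection.add_element('movie', { 'title' => 'Trigun' })
movie.add_element('format').text = 'DVD'
movie.add_element('stars').text = '10'

# Serialize into a string, indenting nested elements by two spaces.
output = +""
doc.write(output, 2)
puts output
```

Passing an IO object such as an open File instead of a string to write would send the result straight to a file.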

The following sample XML is saved as Movies.xml:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
  <type>War, Thriller</type>
  <format>DVD</format>
  <year>2003</year>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
  <type>Anime, Science Fiction</type>
  <format>DVD</format>
  <year>1989</year>
  <rating>R</rating>
  <stars>8</stars>
  <description>A schientific fiction</description>
</movie>
  <movie title="Trigun">
  <type>Anime, Action</type>
  <format>DVD</format>
  <episodes>4</episodes>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
  <type>Comedy</type>
  <format>VHS</format>
  <rating>PG</rating>
  <stars>2</stars>
  <description>Viewable boredom</description>
</movie>
</collection>

DOM Parser

Let's first parse the XML data in DOM fashion. We start by requiring the rexml/document library; it is also common to include the REXML module so that its classes are available in the top-level namespace:

#!/usr/bin/ruby -w
 
require 'rexml/document'
include REXML
 
xmlfile = File.new("Movies.xml")
xmldoc = Document.new(xmlfile)
 
# Get the root element
root = xmldoc.root
puts "Root element: " + root.attributes["shelf"]
 
# Print every movie title
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie title: " + e.attributes["title"]
}
 
# Print every movie type
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie type: " + e.text
}
 
# Print every movie description
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie description: " + e.text
}

The output of the above example is:

Root element: New Arrivals
Movie title: Enemy Behind
Movie title: Transformers
Movie title: Trigun
Movie title: Ishtar
Movie type: War, Thriller
Movie type: Anime, Science Fiction
Movie type: Anime, Action
Movie type: Comedy
Movie description: Talk about a US-Japan war
Movie description: A schientific fiction
Movie description: Vash the Stampede!
Movie description: Viewable boredom
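
Once the tree is in memory, it is often convenient to copy the pieces you need into plain Ruby structures. The sketch below does that for a shortened, inline copy of the sample data (inline so that it runs without Movies.xml); the hash keys are an illustrative choice. Note that elements['year'] returns nil when a movie has no such child, as is the case for Trigun and Ishtar.

```ruby
require 'rexml/document'

xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><year>2003</year><stars>10</stars></movie>
    <movie title="Trigun"><episodes>4</episodes><stars>10</stars></movie>
    <movie title="Ishtar"><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

movies = []
doc.elements.each('collection/movie') do |movie|
  year = movie.elements['year']   # nil when the child element is absent
  movies << {
    title: movie.attributes['title'],
    year:  year && year.text.to_i,
    stars: movie.elements['stars'].text.to_i
  }
end

movies.each do |m|
  puts "#{m[:title]}: #{m[:stars]} stars, year #{m[:year] || 'unknown'}"
end
```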

SAX Parser

We now process the same data file, Movies.xml, in a stream-oriented way. SAX-style parsing is not recommended for a file this small; the following is only a simple demonstration:

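#!/usr/bin/ruby -w
 
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
 
class MyListener
  include REXML::StreamListener

  # Called for every opening tag with the tag name and its attributes
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end
 
  def text(data)
    # NOTE: \w matches word characters, not whitespace. This test skips the
    # newline-led blanks between tags, but it also skips single-word values
    # such as "DVD"; use /\A\s*\z/ to skip whitespace-only text instead.
    return if data =~ /^\w*$/
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text  :  #{abbrev.inspect}"
  end
end
 
list = MyListener.new
xmlfile = File.new("Movies.xml")
Document.parse_stream(xmlfile, list)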

The output of the above example is:

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
 text  :  "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
 text  :  "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
 text  :  "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "Viewable boredom"
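
A stream listener only sees one event at a time, so any context has to be tracked by hand. As an illustration, the sketch below (class and method names are hypothetical) remembers the current element name between tag_start and text, and collects every non-blank text value under that name. It uses /\A\s*\z/ to skip whitespace-only text, and it parses a shortened inline copy of the sample data so that it runs without Movies.xml.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

class TextCollector   # hypothetical helper, not part of REXML
  include REXML::StreamListener
  attr_reader :values

  def initialize
    @values = Hash.new { |hash, key| hash[key] = [] }
    @current = nil
  end

  # Remember which element we are inside.
  def tag_start(name, _attributes)
    @current = name
  end

  def tag_end(_name)
    @current = nil
  end

  def text(data)
    return if data =~ /\A\s*\z/   # whitespace between tags
    @values[@current] << data if @current
  end
end

xml = <<~XML
  <collection>
    <movie title="Enemy Behind"><type>War, Thriller</type><format>DVD</format></movie>
    <movie title="Ishtar"><type>Comedy</type><format>VHS</format></movie>
  </collection>
XML

collector = TextCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.values
```

Because the listener keeps only what it stores itself, memory use stays proportional to the collected values rather than to the size of the document.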

XPath and Ruby

An alternative way to view XML is XPath, a language for finding information in an XML document.

XPath, the XML Path Language, is used to address parts of an XML document. It is based on the tree structure of the document and provides the ability to locate nodes in that tree.

Ruby supports XPath through REXML's XPath class, which works on the tree-based (Document Object Model) representation of the document.

#!/usr/bin/ruby -w
 
require 'rexml/document'
include REXML
 
xmlfile = File.new("Movies.xml")
xmldoc = Document.new(xmlfile)
 
# Information for the first movie
movie = XPath.first(xmldoc, "//movie")
p movie
 
# Print all movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }
 
# Collect all movie formats into an array
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names

The output of the above example is:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
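
XPath expressions can also carry predicates that narrow a selection. The following sketch shows an attribute predicate and a path that continues after a predicate, then falls back to ordinary Ruby filtering on the array that match returns. The inline XML is a shortened copy of the sample data, and the variable names are illustrative.

```ruby
require 'rexml/document'

xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format><stars>10</stars></movie>
    <movie title="Trigun"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: select one movie by its title attribute.
ishtar = REXML::XPath.first(doc, "//movie[@title='Ishtar']")
ishtar_format = ishtar.elements['format'].text

# A path may continue after a predicate.
trigun_stars = REXML::XPath.first(doc, "//movie[@title='Trigun']/stars").text.to_i

# match returns an array, so plain Ruby can take over where an
# XPath expression would become hard to read.
top_rated = REXML::XPath.match(doc, '//movie')
                        .select { |m| m.elements['stars'].text.to_i >= 10 }
                        .map { |m| m.attributes['title'] }

puts ishtar_format
puts trigun_stars
p top_rated
```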

XSLT and Ruby

Two XSLT processors are available for Ruby. Here is a brief description of each:
Ruby-Sablotron

This processor is written and maintained by Masayoshi Takahashi. It is written primarily for the Linux operating system and requires the following libraries:

Sablot
Iconv
Expat

You can find these libraries at Ruby-Sablotron.
XSLT4R
XSLT4R is written by Michael Neumann. It provides a simple command-line interface, and it can also be used within a third-party application to transform XML documents.

XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is itself a 100 percent Ruby module. These modules can be installed using the standard Ruby installation method (that is, ruby install.rb).

XSLT4R has the following command-line syntax:

ruby xslt.rb stylesheet.xsl document.xml [arguments]

If you want to use XSLT4R from within your application, you can require XSLT and pass in the parameters you need. For example:

require "xslt"
 
# File.read returns the whole file as a single string
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/...' }
 
sheet = XSLT::Stylesheet.new(stylesheet, arguments)
 
# Output to stdout
sheet.apply(xml_doc)
 
# Output to the string 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
