A simple tutorial on working with XML, XSLT, and XPath in Ruby

Source: Internet
Author: User
Tags require xml parser xpath xsl xslt


What is XML?



XML stands for Extensible Markup Language.



XML is a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to annotate electronic documents so that they are structured.



It can be used to mark up data and define data types, and it is a source language that allows users to define their own markup languages. It is well suited to transmission over the World Wide Web, providing a unified way to describe and exchange structured data independently of applications or vendors.



XML Parser Architecture and APIs



XML parsers mainly come in two kinds: DOM and SAX.


    1. A SAX parser is event-based. It scans the document from beginning to end, and each time it encounters a syntactic structure (such as a start tag or a run of text) it invokes the event handler for that structure, sending an event to the application.
    2. A DOM (Document Object Model) parser builds the hierarchical structure of the document as a tree in memory, with each node of the tree represented as an object. Once parsing is complete, the entire DOM tree of the document is held in memory.
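
The difference between the two models can be sketched with REXML, the Ruby library this tutorial uses. This is only a minimal illustration under stated assumptions: the inline TINY_XML string, the TitleCollector class, and the two helper methods are hypothetical names made up for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical sample data for illustration only.
TINY_XML = '<collection><movie title="A"/><movie title="B"/></collection>'

# DOM style: the whole document is parsed into an in-memory tree first,
# and the tree is queried afterwards.
def dom_titles(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements.to_a("movie").map { |m| m.attributes["title"] }
end

# SAX style: the parser calls back into a listener while it scans the
# input. No tree is built, so the listener keeps whatever state it needs.
class TitleCollector
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
  end

  # Called once for every start tag, with the tag name and its attributes.
  def tag_start(name, attrs)
    @titles << attrs["title"] if name == "movie"
  end
end

def sax_titles(xml)
  listener = TitleCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.titles
end

p dom_titles(TINY_XML)
p sax_titles(TINY_XML)
```

Both helpers return the same list of titles; the DOM version trades memory for the convenience of random access to the tree, while the streaming version only ever sees one event at a time.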


Parsing and creating XML in Ruby



XML documents can be parsed in Ruby with the REXML library.

REXML is an XML toolkit for Ruby that is written in pure Ruby and conforms to the XML 1.0 specification.

Since Ruby 1.8, REXML has shipped with the Ruby distribution.

The path used to load the REXML library is: rexml/document

All of its methods and classes are encapsulated in the REXML module.



The REXML parser has the following advantages over other parsers:


    1. It is written 100% in Ruby.
    2. It can be used for both SAX-style and DOM-style parsing.
    3. It is lightweight, at less than 2000 lines of code.
    4. Its methods and classes are easy to understand.
    5. It offers a SAX2-based API and full XPath support.
    6. It is installed along with Ruby, so no separate installation is needed.
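
Besides parsing, REXML can also build XML documents. The following is a small sketch of that, assuming a document shaped like the movie collection used in this tutorial; the build_collection method name and the data are illustrative only.

```ruby
require 'rexml/document'

# Build a small document in memory. The element names mirror the sample
# movie collection; the method itself is a hypothetical helper.
def build_collection
  doc = REXML::Document.new
  root = doc.add_element("collection", { "shelf" => "New Arrivals" })

  movie = root.add_element("movie", { "title" => "Trigun" })
  movie.add_element("type").text = "Anime, Action"
  movie.add_element("format").text = "DVD"
  doc
end

doc = build_collection
output = +""        # an unfrozen String; write appends to it with <<
doc.write(output)   # serialize the tree
puts output
```

Passing a File opened for writing instead of a String would save the document to disk in the same way.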


The following sample XML is used throughout this tutorial. Save it as Movies.xml:


<collection shelf="New Arrivals">
<movie title="Enemy Behind">
  <type>War, Thriller</type>
  <format>DVD</format>
  <year>2003</year>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
  <type>Anime, Science Fiction</type>
  <format>DVD</format>
  <year>1989</year>
  <rating>R</rating>
  <stars>8</stars>
  <description>A schientific fiction</description>
</movie>
  <movie title="Trigun">
  <type>Anime, Action</type>
  <format>DVD</format>
  <episodes>4</episodes>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
  <type>Comedy</type>
  <format>VHS</format>
  <rating>PG</rating>
  <stars>2</stars>
  <description>Viewable boredom</description>
</movie>
</collection>


DOM Parser



Let's first parse the XML data as a tree. We begin by requiring the rexml/document library; it is common to include the REXML module in the top-level namespace for convenience:


#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("Movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print every movie title
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie Title : " + e.attributes["title"]
}

# Print every movie type
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie Type : " + e.text
}

# Print every movie description
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie Description : " + e.text
}


The output of the above example is:


Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
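
Not every movie in the sample data has the same child elements (for example, Trigun has no year element). With tree parsing, a lookup for a missing child returns nil, so it is worth guarding against that. The following is a minimal sketch; the inline MOVIES_EXCERPT data and the years_by_title helper are hypothetical and exist only to keep the example self-contained.

```ruby
require 'rexml/document'

# Inline excerpt of the sample data, so the sketch runs on its own.
MOVIES_EXCERPT = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><year>2003</year></movie>
    <movie title="Trigun"><episodes>4</episodes></movie>
  </collection>
XML

# Returns a Hash of title => year, with nil where <year> is absent.
def years_by_title(xml)
  doc = REXML::Document.new(xml)
  result = {}
  doc.elements.each("collection/movie") do |movie|
    year = movie.elements["year"]   # nil when there is no <year> child
    result[movie.attributes["title"]] = year&.text
  end
  result
end

p years_by_title(MOVIES_EXCERPT)
```

Calling text directly on the result of the lookup would raise NoMethodError for Trigun, which is why the safe-navigation operator is used here.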


SAX Parser



We process the same data file, Movies.xml, in a stream-oriented way. SAX-like parsing is not recommended for a file this small, so the following is only a simple demonstration:


#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called for every start tag with the tag name and its attributes
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for every run of text between tags
  def text(data)
    # Skips the whitespace between tags. Note that \w matches word
    # characters, so single-word text such as "DVD" is skipped as well.
    return if data =~ /^\w*$/
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text  :  #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("Movies.xml")
Document.parse_stream(xmlfile, list)


The output of the above example is:


tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
 text  :  "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
 text  :  "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
 text  :  "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
 text  :  "Viewable boredom"
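
A stream listener usually needs a little state to know where it is in the document. The following sketch, offered as one possible approach rather than the only one, collects the text of every description element by tracking start and end tags. The DescriptionCollector class, the stream_descriptions helper, and the inline STREAM_SAMPLE data are hypothetical names introduced for the example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical inline data so the sketch runs on its own.
STREAM_SAMPLE = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Ishtar">
      <type>Comedy</type>
      <description>Viewable boredom</description>
    </movie>
  </collection>
XML

# Collects the text of every <description> element while streaming.
class DescriptionCollector
  include REXML::StreamListener
  attr_reader :descriptions

  def initialize
    @descriptions = []
    @inside = false
  end

  def tag_start(name, _attrs)
    @inside = true if name == "description"
  end

  def tag_end(name)
    @inside = false if name == "description"
  end

  def text(data)
    return if data.strip.empty?        # skip whitespace-only text
    @descriptions << data if @inside   # keep text only inside <description>
  end
end

def stream_descriptions(xml)
  listener = DescriptionCollector.new
  REXML::Document.parse_stream(xml, listener)
  listener.descriptions
end

p stream_descriptions(STREAM_SAMPLE)
```

Because only the collected strings are kept, memory use stays small even for a large input, which is the main reason to choose stream parsing.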


XPath and Ruby



An alternative way to query XML is XPath, a language for finding information in an XML document (see: XPath tutorial).

XPath is the XML Path Language, used to address parts of an XML document. It is based on the tree structure of XML and provides the ability to find nodes in that tree.

Ruby supports XPath through REXML's XPath class, which works on a parsed tree (Document Object Model).


#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("Movies.xml")
xmldoc = Document.new(xmlfile)

# Information for the first movie found
movie = XPath.first(xmldoc, "//movie")
p movie

# Print all movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get the formats of all movies as an array
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names


The output of the above example is:


<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
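
XPath expressions can also filter with predicates and select attributes. The following is a small sketch under stated assumptions: the inline XPATH_SAMPLE data and the type_of and all_titles helpers are hypothetical, and the title passed to type_of is assumed not to contain a single quote, since it is interpolated directly into the expression.

```ruby
require 'rexml/document'

# Hypothetical inline data so the sketch runs on its own.
XPATH_SAMPLE = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Trigun"><type>Anime, Action</type><format>DVD</format></movie>
    <movie title="Ishtar"><type>Comedy</type><format>VHS</format></movie>
  </collection>
XML

# Text of <type> for the movie with the given title, or nil if not found.
# Assumes the title contains no single quote (it is interpolated as-is).
def type_of(doc, title)
  node = REXML::XPath.first(doc, "//movie[@title='#{title}']/type")
  node && node.text
end

# Values of every title attribute, in document order.
def all_titles(doc)
  REXML::XPath.match(doc, "//movie/@title").map(&:value)
end

doc = REXML::Document.new(XPATH_SAMPLE)
p type_of(doc, "Trigun")
p type_of(doc, "No Such Movie")
p all_titles(doc)
```

XPath.first returns nil when nothing matches, so the helper checks the node before asking for its text.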


XSLT and Ruby



There are two XSLT parsers available for Ruby. A brief description of each follows.

Ruby-Sablotron



This parser is written and maintained by Masayoshi Takahashi. It is written primarily for the Linux operating system and requires the following libraries:


    1. Sablot
    2. Iconv
    3. Expat


You can find these libraries at Ruby-Sablotron.

XSLT4R

XSLT4R is written by Michael Neumann. It provides a simple command-line interface, and it can also be used within third-party applications to transform XML documents.



XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. These modules can be installed using the standard Ruby installation method (that is, ruby install.rb).



The XSLT4R command-line syntax is as follows:


ruby xslt.rb stylesheet.xsl document.xml [arguments]


If you want to use XSLT4R from within an application, you can require XSLT and pass in the parameters you need. For example:


require "xslt"

# File.read returns the whole file as a single String
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/...' }

sheet = XSLT::Stylesheet.new(stylesheet, arguments)

# output to StdOut
sheet.apply(xml_doc)

# output to 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)



