A simple tutorial on XML, XSLT, and XPath processing in Ruby

Source: Internet
Author: User
Tags xsl xslt parser

What is XML?

XML stands for eXtensible Markup Language.

XML is a subset of Standard Generalized Markup Language (SGML). It is a markup language used to tag electronic documents so that their content is structured.

It can be used to mark up data and define data types, and it serves as a meta-language that lets users define their own markup languages. It is well suited to transmission over the World Wide Web and provides a uniform way to describe and exchange structured data independently of applications or vendors.

XML Parser structure and API

XML parsers fall mainly into two kinds, DOM and SAX.

  1. A SAX parser is event-driven. It scans the XML document from start to end, and each time it encounters a syntactic construct (such as a start tag or a run of text) it calls the event handler for that construct, passing the event to the application.
  2. A DOM (Document Object Model) parser builds the hierarchical structure of the document as a tree in memory, with each node represented as an object. Once parsing finishes, the entire DOM tree of the document is held in memory.

Parse and create XML in Ruby

The REXML library can be used to parse XML documents in Ruby.

REXML is an XML toolkit for Ruby. It is written in pure Ruby and conforms to the XML 1.0 specification.

REXML has been part of the Ruby standard library since Ruby 1.8. Since Ruby 3.0 it ships as a bundled gem, so a project managed with Bundler needs to list rexml in its Gemfile.

The library's require path is: rexml/document

All classes and methods are encapsulated in the REXML module.
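
As a minimal sketch of what "encapsulated in the REXML module" means in practice, the snippet below parses a short XML string using fully qualified names such as REXML::Document, without including the module. The inline XML string and the variable names are made up for illustration.

```ruby
require 'rexml/document'

# Illustrative input; any well-formed XML string works here.
xml = '<collection shelf="New Arrivals"><movie title="Trigun"/></collection>'

# Without `include REXML`, classes are reached through the module name.
doc = REXML::Document.new(xml)

root_name   = doc.root.name                                  # name of the root element
shelf       = doc.root.attributes["shelf"]                   # attribute lookup by name
first_title = doc.root.elements["movie"].attributes["title"] # first <movie> child

puts root_name
puts shelf
puts first_title
```

Writing the module name out avoids adding REXML's constants to the top-level namespace, which can matter in larger programs.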

REXML has the following advantages over other parsers:

  1. It is written 100% in Ruby.
  2. It supports both SAX-style and DOM-style parsing.
  3. It is lightweight, with fewer than 2000 lines of code.
  4. Its methods and classes are easy to understand.
  5. It offers a SAX2-based API and full XPath support.
  6. It ships with Ruby, so no separate installation is needed.

The following sample XML is saved as movies.xml:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
  <type>War, Thriller</type>
  <format>DVD</format>
  <year>2003</year>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
  <type>Anime, Science Fiction</type>
  <format>DVD</format>
  <year>1989</year>
  <rating>R</rating>
  <stars>8</stars>
  <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
  <type>Anime, Action</type>
  <format>DVD</format>
  <episodes>4</episodes>
  <rating>PG</rating>
  <stars>10</stars>
  <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
  <type>Comedy</type>
  <format>VHS</format>
  <rating>PG</rating>
  <stars>2</stars>
  <description>Viewable boredom</description>
</movie>
</collection>

DOM parser

Let's first parse the XML data. We start by loading the rexml/document library. For convenience, we can include the REXML module in the top-level namespace:

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Get the root element
root = xmldoc.root
puts "Root element : " + root.attributes["shelf"]

# Print the title of every movie
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie Title : " + e.attributes["title"]
}

# Print the type of every movie
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie Type : " + e.text
}

# Print the description of every movie
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie Description : " + e.text
}

The output of the example above is:

Root element : New Arrivals
Movie Title : Enemy Behind
Movie Title : Transformers
Movie Title : Trigun
Movie Title : Ishtar
Movie Type : War, Thriller
Movie Type : Anime, Science Fiction
Movie Type : Anime, Action
Movie Type : Comedy
Movie Description : Talk about a US-Japan war
Movie Description : A schientific fiction
Movie Description : Vash the Stampede!
Movie Description : Viewable boredom
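
The section title also mentions creating XML. A minimal sketch of building a small tree with REXML's DOM-style API might look like the following. The element and attribute values are borrowed from movies.xml, and the variable names are illustrative only.

```ruby
require 'rexml/document'

# Start from an empty document and add elements from the root down.
doc   = REXML::Document.new
root  = doc.add_element("collection", { "shelf" => "New Arrivals" })
movie = root.add_element("movie", { "title" => "Trigun" })

# add_element returns the new element, so its text can be set directly.
movie.add_element("format").text = "DVD"
movie.add_element("stars").text  = "10"

# Serialize the tree to a string.
xml_text = doc.to_s
puts xml_text

# Parse the generated text again to confirm that it is well-formed.
reparsed = REXML::Document.new(xml_text)
puts reparsed.root.elements["movie/format"].text
```

Re-parsing the generated string is only a sanity check here; in a real program the string would typically be written to a file or sent over the network.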

SAX Parser

We now process the same data file, movies.xml, in a stream-oriented way. SAX-style parsing is not recommended for a file this small; the following is only a simple demonstration:

#!/usr/bin/ruby -w

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

# Listener whose methods are called back by the stream parser
class MyListener
  include REXML::StreamListener

  # Called for every start tag with the tag name and an attribute hash
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called for every run of character data
  def text(data)
    return if data =~ /\A\s*\z/  # skip whitespace-only text
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text  :  #{abbrev.inspect}"
  end
end

list = MyListener.new
xmlfile = File.new("movies.xml")
Document.parse_stream(xmlfile, list)

The output is as follows (Ruby 3.4 and later print the attribute hashes with spaces around =>):

tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
 text  :  "War, Thriller"
tag_start: "format", {}
 text  :  "DVD"
tag_start: "year", {}
 text  :  "2003"
tag_start: "rating", {}
 text  :  "PG"
tag_start: "stars", {}
 text  :  "10"
tag_start: "description", {}
 text  :  "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
 text  :  "Anime, Science Fiction"
tag_start: "format", {}
 text  :  "DVD"
tag_start: "year", {}
 text  :  "1989"
tag_start: "rating", {}
 text  :  "R"
tag_start: "stars", {}
 text  :  "8"
tag_start: "description", {}
 text  :  "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
 text  :  "Anime, Action"
tag_start: "format", {}
 text  :  "DVD"
tag_start: "episodes", {}
 text  :  "4"
tag_start: "rating", {}
 text  :  "PG"
tag_start: "stars", {}
 text  :  "10"
tag_start: "description", {}
 text  :  "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
 text  :  "Comedy"
tag_start: "format", {}
 text  :  "VHS"
tag_start: "rating", {}
 text  :  "PG"
tag_start: "stars", {}
 text  :  "2"
tag_start: "description", {}
 text  :  "Viewable boredom"
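
A stream listener does not have to print anything; it can also accumulate state while the parser runs. The sketch below is one possible variation, not part of the original example: TitleCollector is a hypothetical class, and the XML is passed as an inline string rather than read from movies.xml so that the snippet is self-contained.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener for illustration: remembers the title attribute
# of every <movie> start tag and counts all start tags it sees.
class TitleCollector
  include REXML::StreamListener

  attr_reader :titles, :tag_count

  def initialize
    @titles = []
    @tag_count = 0
  end

  # Called once per start tag; attrs is a hash of attribute names to values.
  def tag_start(name, attrs)
    @tag_count += 1
    @titles << attrs["title"] if name == "movie"
  end
end

xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)

p collector.titles
puts collector.tag_count
```

Because the listener only keeps the values it needs, no tree for the whole document is built in memory, which is the main reason to choose stream parsing for large inputs.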

XPath and Ruby

We can use XPath to query XML. XPath is a language for finding information in XML documents.

XPath stands for XML Path Language. It is used to address parts of an XML document. XPath is based on the tree structure of XML and provides the ability to locate nodes in that tree.

Ruby supports XPath through REXML's XPath class, which works on a tree-based parse (Document Object Model).

#!/usr/bin/ruby -w

require 'rexml/document'
include REXML

xmlfile = File.new("movies.xml")
xmldoc = Document.new(xmlfile)

# Information about the first movie
movie = XPath.first(xmldoc, "//movie")
p movie

# Print all the movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }

# Get an array of all the movie formats
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names

The output of the example above is:

<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
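
XPath expressions can also carry predicates. The following sketch, under the assumption of a cut-down inline version of the movie data, selects one movie by its title attribute and then combines a plain path with ordinary Ruby filtering. The variable names are illustrative.

```ruby
require 'rexml/document'

# Cut-down sample data so that the snippet runs without movies.xml.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Trigun"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML
doc = REXML::Document.new(xml)

# Attribute predicate: the <movie> whose title attribute is "Trigun".
trigun = REXML::XPath.first(doc, "//movie[@title='Trigun']")
trigun_stars = trigun.elements["stars"].text.to_i

# Plain path, then filter the matched elements in Ruby.
vhs_titles = REXML::XPath.match(doc, "//movie")
                         .select { |m| m.elements["format"].text == "VHS" }
                         .map { |m| m.attributes["title"] }

puts trigun_stars
p vhs_titles
```

Filtering in Ruby after a simple match keeps the XPath expression short; the same selection could be pushed into the expression itself where that reads more clearly.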

XSLT and Ruby

There are two XSLT processors available for Ruby, briefly described below.

Ruby-Sablotron

This processor is written and maintained by Masayoshi Takahashi. It is written mainly for the Linux operating system and requires the following libraries:

  1. Sablot
  2. Iconv
  3. Expat

This module is available from the Ruby-Sablotron project.

XSLT4R

XSLT4R is written by Michael Neumann. It has a simple command-line interface and can also be used within third-party applications to transform XML documents.

XSLT4R requires XMLScan, which is included in the XSLT4R archive and is itself a 100% Ruby module. These modules can be installed using the standard Ruby installation method (ruby install.rb).

The command-line syntax of XSLT4R is as follows:

ruby xslt.rb stylesheet.xsl document.xml [arguments]

If you want to use XSLT4R in an application, require xslt and pass in the required parameters. Example:

require "xslt"

# Read the stylesheet and the document into strings
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/....' }

sheet = XSLT::Stylesheet.new(stylesheet, arguments)

# output to StdOut
sheet.apply(xml_doc)

# output to 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)
