What is XML?
XML stands for Extensible Markup Language.
XML is a subset of the Standard Generalized Markup Language (SGML) and is used to mark up electronic documents so that they have structure.
It can be used to mark up data and define data types, and it is a source language that allows users to define their own markup languages. It is well suited to transmission over the World Wide Web, providing a unified way to describe and exchange structured data independently of applications or vendors.
XML Parser Architecture and APIs
There are two main kinds of XML parser: DOM and SAX.
- A SAX parser is event-based. It scans the document from beginning to end, and each time it encounters a syntactic construct during the scan, it invokes the event handler for that construct to send an event to the application.
- A DOM (Document Object Model) parser builds the hierarchical structure of the document as a tree of node objects in memory. Once parsing is complete, the document's entire DOM tree is held in memory.
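The difference between the two models can be seen in a minimal sketch using REXML, the library introduced in the next section. The inline XML string and the EventRecorder class are made up for illustration; only REXML::StreamListener, Document.parse_stream, and Document.new come from the library itself.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical inline sample, so the sketch does not depend on a file.
xml = '<collection><movie title="Trigun"><stars>10</stars></movie></collection>'

# Event-based (SAX-style): callbacks fire in document order during the scan.
class EventRecorder
  include REXML::StreamListener
  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, _attrs)
    @events << "start:#{name}"
  end

  def tag_end(name)
    @events << "end:#{name}"
  end
end

recorder = EventRecorder.new
REXML::Document.parse_stream(xml, recorder)
p recorder.events

# Tree-based (DOM): the whole document is built in memory, then queried.
doc = REXML::Document.new(xml)
stars = doc.root.elements["movie/stars"].text
puts stars
```

With the event-based approach the application only sees each construct as it goes by, whereas the tree-based approach lets it navigate anywhere in the document after parsing, at the cost of holding the whole tree in memory.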
Parsing and creating XML in Ruby
XML documents can be parsed in Ruby using the REXML library.
REXML is an XML toolkit for Ruby, written in pure Ruby and conforming to the XML 1.0 specification.
REXML has been part of the Ruby standard library since Ruby 1.8 (since Ruby 3.0 it ships as a bundled gem).
The library is loaded with the path rexml/document.
All of its classes and methods are contained in the REXML module.
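Because everything lives in the REXML module, classes can be reached through the module name without including it at the top level. The following is a minimal sketch; the inline XML string is a shortened, hypothetical sample rather than the full data file used later.

```ruby
require 'rexml/document'

# Hypothetical inline sample, so the sketch does not depend on a file.
xml = '<collection shelf="New Arrivals"><movie title="Ishtar"/></collection>'

# Without `include REXML`, classes are addressed through the module name.
doc = REXML::Document.new(xml)
shelf = doc.root.attributes["shelf"]
first_title = doc.root.elements["movie"].attributes["title"]

puts shelf
puts first_title
```

Using the fully qualified names avoids adding Document, Element, and other REXML constants to the top-level namespace, which can matter in larger programs.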
The REXML parser has the following advantages over other parsers:
- It is written 100% in Ruby.
- It can be used for both SAX and DOM parsing.
- It is lightweight, with fewer than 2000 lines of code.
- Its methods and classes are easy to understand.
- It has a SAX2-based API and full XPath support.
- It ships with the Ruby installation, so no separate installation is required.
The following sample XML document is saved as Movies.xml:
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
<type>War, Thriller</type>
<format>DVD</format>
<year>2003</year>
<rating>PG</rating>
<stars>10</stars>
<description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
<type>Anime, Science Fiction</type>
<format>DVD</format>
<year>1989</year>
<rating>R</rating>
<stars>8</stars>
<description>A schientific fiction</description>
</movie>
<movie title="Trigun">
<type>Anime, Action</type>
<format>DVD</format>
<episodes>4</episodes>
<rating>PG</rating>
<stars>10</stars>
<description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
<type>Comedy</type>
<format>VHS</format>
<rating>PG</rating>
<stars>2</stars>
<description>Viewable boredom</description>
</movie>
</collection>
DOM Parser
Let's first parse the XML data as a tree. We start by requiring the rexml/document library; for convenience, we usually also include REXML into the top-level namespace:
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML
xmlfile = File.new("Movies.xml")
xmldoc = Document.new(xmlfile)
# Get the root element
root = xmldoc.root
puts "Root element: " + root.attributes["shelf"]
# Print all movie titles
xmldoc.elements.each("collection/movie") { |e|
  puts "Movie title: " + e.attributes["title"]
}
# Print all movie types
xmldoc.elements.each("collection/movie/type") { |e|
  puts "Movie type: " + e.text
}
# Print all movie descriptions
xmldoc.elements.each("collection/movie/description") { |e|
  puts "Movie description: " + e.text
}
The above example produces the following output:
Root element: New Arrivals
Movie title: Enemy Behind
Movie title: Transformers
Movie title: Trigun
Movie title: Ishtar
Movie type: War, Thriller
Movie type: Anime, Science Fiction
Movie type: Anime, Action
Movie type: Comedy
Movie description: Talk about a US-Japan war
Movie description: A schientific fiction
Movie description: Vash the Stampede!
Movie description: Viewable boredom
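The same tree API can also be used to create XML, not only to read it. The following is a minimal sketch that builds a one-movie document in memory and serializes it to a string; the values are borrowed from the sample data, and the choice to serialize with to_s rather than write to a file is an assumption made to keep the example self-contained.

```ruby
require 'rexml/document'

# Build the tree top-down: document, root element, then children.
doc = REXML::Document.new
root = doc.add_element("collection", { "shelf" => "New Arrivals" })

movie = root.add_element("movie", { "title" => "Ishtar" })
movie.add_element("type").text = "Comedy"
movie.add_element("format").text = "VHS"

# Serialize the tree to a string of XML markup.
output = doc.to_s
puts output
```

add_element returns the newly created element, which is why its text can be assigned in the same statement.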
SAX-like Parsing
Here we process the same data file, Movies.xml, in a stream-oriented way. SAX-like parsing is not recommended for small files such as this one; the following is only a simple demonstration:
#!/usr/bin/ruby -w
require 'rexml/document'
require 'rexml/streamlistener'
include REXML
class MyListener
  include REXML::StreamListener
  # Called for every opening tag with the tag name and its attributes
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end
  # Called for every run of character data
  def text(data)
    return if data =~ /^\w*$/  # skips blank lines and single-word values
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts "text : #{abbrev.inspect}"
  end
end
list = MyListener.new
xmlfile = File.new("Movies.xml")
Document.parse_stream(xmlfile, list)
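The above example produces the following output. Note that the pattern /^\w*$/ in the text method matches an empty line or a line made up only of word characters, so the whitespace between tags and single-word values such as DVD, 2003, or Comedy are not printed: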
tag_start: "collection", {"shelf"=>"New Arrivals"}
tag_start: "movie", {"title"=>"Enemy Behind"}
tag_start: "type", {}
text : "War, Thriller"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "Talk about a US-Japan war"
tag_start: "movie", {"title"=>"Transformers"}
tag_start: "type", {}
text : "Anime, Science Fiction"
tag_start: "format", {}
tag_start: "year", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "A schientific fiction"
tag_start: "movie", {"title"=>"Trigun"}
tag_start: "type", {}
text : "Anime, Action"
tag_start: "format", {}
tag_start: "episodes", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "Vash the Stampede!"
tag_start: "movie", {"title"=>"Ishtar"}
tag_start: "type", {}
tag_start: "format", {}
tag_start: "rating", {}
tag_start: "stars", {}
tag_start: "description", {}
text : "Viewable boredom"
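A listener usually does more than print events: it keeps a little state between callbacks so that it can extract just the data it needs. The following is a minimal sketch along those lines; the FormatCollector class and the shortened inline XML are hypothetical, and only StreamListener and parse_stream come from REXML.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical shortened sample, so the sketch does not depend on a file.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Enemy Behind"><format>DVD</format></movie>
    <movie title="Ishtar"><format>VHS</format></movie>
  </collection>
XML

# Collects the format of each movie, keyed by title, without building a tree.
class FormatCollector
  include REXML::StreamListener
  attr_reader :formats

  def initialize
    @formats = {}
    @current_title = nil
    @in_format = false
  end

  def tag_start(name, attrs)
    @current_title = attrs["title"] if name == "movie"
    @in_format = (name == "format")
  end

  def text(data)
    # Only record text that appears inside a <format> element.
    @formats[@current_title] = data.strip if @in_format
  end

  def tag_end(name)
    @in_format = false if name == "format"
  end
end

collector = FormatCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.formats
```

Because the listener tracks which element it is currently inside, it does not need a regular expression to filter out the whitespace between tags.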
XPath and Ruby
We can also use XPath to query XML. XPath, the XML Path Language, is a language for locating parts of an XML document. It is based on the tree structure of XML and provides the ability to find nodes in that tree.
REXML supports XPath through its XPath class, which works on the tree-based (Document Object Model) representation of the document.
#!/usr/bin/ruby -w
require 'rexml/document'
include REXML
xmlfile = File.new("Movies.xml")
xmldoc = Document.new(xmlfile)
# Get the first movie element
movie = XPath.first(xmldoc, "//movie")
p movie
# Print all movie types
XPath.each(xmldoc, "//type") { |e| puts e.text }
# Collect all movie formats into an array
names = XPath.match(xmldoc, "//format").map { |x| x.text }
p names
The above example produces the following output:
<movie title='Enemy Behind'> ... </>
War, Thriller
Anime, Science Fiction
Anime, Action
Comedy
["DVD", "DVD", "DVD", "VHS"]
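XPath expressions can also carry predicates that filter nodes by attribute or by child content. The following is a minimal sketch on a shortened, hypothetical inline sample; the variable names are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical shortened sample, so the sketch does not depend on a file.
xml = <<~XML
  <collection shelf="New Arrivals">
    <movie title="Trigun"><format>DVD</format><stars>10</stars></movie>
    <movie title="Ishtar"><format>VHS</format><stars>2</stars></movie>
  </collection>
XML

doc = REXML::Document.new(xml)

# Attribute predicate: the <stars> element of the movie titled "Trigun".
trigun_stars = REXML::XPath.first(doc, "//movie[@title='Trigun']/stars").text

# Child-element predicate: titles of all movies whose <format> is "VHS".
vhs_titles = REXML::XPath.match(doc, "//movie[format='VHS']")
                         .map { |m| m.attributes["title"] }

puts trigun_stars
p vhs_titles
```

XPath.first returns a single node (or nil when nothing matches), while XPath.match always returns an array, so the two calls suit lookups and collections respectively.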
XSLT and Ruby
Two XSLT parsers are available for Ruby. A brief description of each follows:
Ruby-Sablotron
This parser is written and maintained by Masayoshi Takahashi. It is written primarily for the Linux operating system and requires the following libraries:
- Sablot
- Iconv
- Expat
You can find this module at Ruby-Sablotron.
XSLT4R
XSLT4R is written by Michael Neumann. It provides a simple command-line interface and can also be used within third-party applications to transform XML documents.
XSLT4R needs XMLScan to operate. XMLScan is included in the XSLT4R archive and is also a 100% Ruby module. These modules can be installed using the standard Ruby installation method (that is, ruby install.rb).
The XSLT4R command-line syntax is as follows:
ruby xslt.rb stylesheet.xsl document.xml [arguments]
If you want to use XSLT4R in your own application, you can require xslt and pass in the parameters you need. For example:
require "xslt"
# Read the stylesheet and the document into strings
stylesheet = File.read("stylesheet.xsl")
xml_doc = File.read("document.xml")
arguments = { 'image_dir' => '/...' }
sheet = XSLT::Stylesheet.new(stylesheet, arguments)
# Output to standard output
sheet.apply(xml_doc)
# Output to the string 'str'
str = ""
sheet.output = [str]
sheet.apply(xml_doc)