XML document parsing in Ruby

Source: Internet
Author: User

The REXML library can be used to parse XML documents in Ruby.

REXML is an XML toolkit for Ruby. It is written in pure Ruby and conforms to the XML 1.0 specification.
Ruby 1.8 and later versions ship REXML with the standard distribution (since Ruby 3.0 as a bundled gem).

The library is loaded from the path rexml/document.

All of its classes and methods are encapsulated in the REXML module.

Therefore, you must first require rexml/document and then include the REXML module in the current script. After that, you can refer to the classes in the module directly, without the REXML:: prefix.

require 'rexml/document'
include REXML
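
As a minimal sketch of the two steps above, the following script first uses the fully qualified REXML::Document name and then, after include REXML, the short Document name. The sample XML string and the variable names are made up for illustration.

```ruby
require 'rexml/document'

# Before including the module, classes need the REXML:: prefix.
qualified = REXML::Document.new('<book name="Rubys way"/>')

# Mix the REXML module into the current scope.
include REXML

# Now the short class name resolves to the same class.
short = Document.new('<book name="Rubys way"/>')

puts qualified.root.name
puts short.root.attributes['name']
```

Including the module is a convenience for short scripts; in larger programs the explicit REXML:: prefix avoids name clashes.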

REXML can access an XML document in two ways: tree and stream.

The tree method is similar to the DOM in JavaScript, but it is simpler and easier to use. The following describes how to parse XML documents using the tree method.

Frequently used classes in the REXML module include the following:
I. Document class

1 Document.new
Document class constructor. The argument can be a string containing XML or an IO object (for example, an open File). For an IO object, the content from the current stream position to the end of the stream must be a well-formed XML document.
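
A minimal sketch of both kinds of constructor argument follows. StringIO stands in for an open file here, and the inventory XML is an assumption made up for the example.

```ruby
require 'rexml/document'
require 'stringio'

xml = '<inventory><item id="1"/></inventory>'

# A String argument is parsed as XML text.
from_string = REXML::Document.new(xml)

# Any IO-like object works as well; StringIO plays the role of a file.
io = StringIO.new(xml)
from_io = REXML::Document.new(io)

puts from_string.root.name
puts from_io.root.elements[1].attributes['id']
```

To parse a file on disk, open it first and pass the resulting File object rather than the path string.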

2 Document#root
Returns an Element object, which is the root element of the XML document.

3 Document#version
Returns the version of the current XML document, which is parsed from the XML declaration on the first line.
For example, for <?xml version="ABCD"?>
version returns 'ABCD'. The default value is '1.0'.

4 Document#encoding
Returns the encoding of the current XML document. Like version, it is read from the XML declaration on the first line.
For example, <?xml version="1.0" encoding="gb2312"?>
Note that if the encoding is set to an arbitrary value, such as ABC, Document.new cannot parse the XML document.
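
The following sketch reads both values back from a declaration and shows the default version for a document without one. The sample documents are assumptions chosen for illustration.

```ruby
require 'rexml/document'

# A document with an explicit XML declaration.
doc = REXML::Document.new('<?xml version="1.0" encoding="UTF-8"?><root/>')
puts doc.version
puts doc.encoding

# A document without a declaration falls back to the default version.
no_decl = REXML::Document.new('<root/>')
puts no_decl.version
```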

II. Element class
1 Element.new(arg = UNDEFINED, parent = nil, context = nil)
Element object constructor.
arg: the default value is UNDEFINED. If it is a string, the string is used as the name of this element. If it is an Element, that element is copied. This is only a shallow copy: only the name, attributes, and namespaces of the element are copied.

parent: nil by default; the parent element of this element.

context: nil by default.
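
A short sketch of the constructor's arg and parent parameters, assuming a made-up book element. It shows that copying an element carries over the name and attributes but not the children.

```ruby
require 'rexml/document'

original = REXML::Element.new('book')
original.add_attribute('name', 'Rubys way')
original.add_element('chapter')

# Passing an Element makes a shallow copy: name and attributes only.
copy = REXML::Element.new(original)

# The second argument attaches the new element to a parent.
appendix = REXML::Element.new('appendix', original)

puts copy.name
puts copy.attributes['name']
puts copy.has_elements?
puts appendix.parent.equal?(original)
```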

2 Element#add_attribute(key, value = nil)
Adds an attribute to the element. If an attribute with the same name already exists, it is overwritten.
The return value is value.

If the first parameter, key, is not a string but an Attribute object, the second parameter can be omitted and the Attribute object is added directly to the element's attribute list.

ele = Element.new 'book'                # <book/>
ele.add_attribute 'name', 'Rubys way'   # <book name='Rubys way'/>
att = Attribute.new 'price', '$5'
ele.add_attribute att                   # <book name='Rubys way' price='$5'/>

Note that the code first creates the att object and then calls ele.add_attribute to add att to ele's attributes.

3 Element#add_attributes(hash)
Adds multiple attributes to this element.
hash: can be a hash or an array of two-element arrays.

ele.add_attributes({ "name" => "Rubys way", "type" => "book" })   # hash
ele.add_attributes([["name", "Rubys way"], ["type", "book"]])     # array of arrays

4 Element#add_element(element, attrs = nil)
Adds a child element to this element. You can set the attributes of the child element at the same time.

element: if it is an Element, it is added to this element as a child. Otherwise, a new element is constructed from the argument. For example, if the argument is a string, a new element with that name is constructed and added as a child.

attrs: if this parameter is provided, it must be a hash. Each key of the hash becomes an attribute name, and each value becomes the value of that attribute.
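
Both forms of the element argument can be sketched as follows. The library, book, and magazine names are assumptions made up for the example.

```ruby
require 'rexml/document'

root = REXML::Element.new('library')

# A string creates a new child; the hash sets its attributes.
book = root.add_element('book', { 'name' => 'Rubys way', 'type' => 'book' })

# An existing Element is added as-is.
magazine = REXML::Element.new('magazine')
root.add_element(magazine)

puts root.elements.size
puts book.attributes['type']
puts magazine.parent.name
```

add_element returns the child that was added, which makes it easy to keep building on the new element.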

5 Element#add_namespace(prefix, uri = nil)

prefix: the namespace prefix, written as xmlns:prefix.

uri: the namespace URI.

If only one argument is given, it is treated as the URI and no prefix is used, so the result is xmlns="uri" directly.
For example:

irb(main):242:0> ele = Element.new "filed"
=> <filed/>
irb(main):243:0> ele
=> <filed/>
irb(main):244:0> ele.add_namespace "pre", "uri"
=> <filed xmlns:pre='uri'/>
irb(main):245:0> ele.add_namespace "uri"
=> <filed xmlns:pre='uri' xmlns='uri'/>
irb(main):246:0> ele.add_namespace "uriuri"
=> <filed xmlns:pre='uri' xmlns='uriuri'/>

6 Element#add_text(string)
Adds text to an element. If the last child is already a text node, the string is appended to it. For example:

irb(main):251:0> ele = Element.new "ele"
=> <ele/>
irb(main):252:0> ele
=> <ele/>
irb(main):253:0> ele.add_text "hallo"
=> <ele> ... </>
irb(main):254:0> ele.add_text "world"
=> nil
irb(main):255:0> ele
=> <ele> ... </>
irb(main):256:0> ele.text
=> "halloworld"

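7 Element#attribute(key)
Looks up an attribute by name and returns the matching Attribute object, or nil if there is none. Call value on the result to obtain the attribute's value.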
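
A minimal sketch of the lookup, assuming a made-up book element. It also shows the attributes[] shortcut, which returns the value string directly.

```ruby
require 'rexml/document'

ele = REXML::Element.new('book')
ele.add_attribute('name', 'Rubys way')

# attribute returns an Attribute object (or nil when absent).
found = ele.attribute('name')
missing = ele.attribute('price')

puts found.class
puts found.value
puts missing.inspect

# attributes[] returns the value directly.
puts ele.attributes['name']
```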

8 Element#cdatas()
Returns an array of this element's CDATA children.

9 Element#clone()
Returns a shallow clone of the current element: only the element itself is cloned, not its child elements.

10 Element#comments
Returns an array of all comments that are children of this element.
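
The cdatas and comments accessors can be sketched together. The document below is an assumption made up to contain one CDATA section and two comments.

```ruby
require 'rexml/document'

xml = '<a><!-- first note --><![CDATA[1 < 2]]><b/><!-- second note --></a>'
root = REXML::Document.new(xml).root

comments = root.comments   # Comment children only
cdatas = root.cdatas       # CData children only

puts comments.size
puts comments.first.string.strip
puts cdatas.size
puts cdatas.first.to_s
```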

11 Element#delete_attribute(key)
Deletes the attribute with the specified name.

12 Element#delete_element(element)
Deletes a child element.
element: the argument must be an Element object, a string, or an integer. If it is an Element object, that element is deleted. If it is a string, it is treated as an XPath expression and the first matching element is deleted. If it is an integer, the child element at that index (counting from 1) is deleted.
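
The three argument forms can be sketched in sequence on one small document. The element names and id values are assumptions made up for the example.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b/><c id="1"/><c id="2"/><d/></a>')
root = doc.root

# 1. Delete by Element object.
root.delete_element(root.elements[1])

# 2. Delete the first match of an XPath expression.
root.delete_element('c[@id="2"]')

# 3. Delete by 1-based index among the remaining child elements.
root.delete_element(1)

remaining = root.elements.to_a.map { |e| e.name }
puts remaining.inspect
```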

13 Element#delete_namespace(namespace = "xmlns")
Deletes the namespace with the specified name. The default value is xmlns.

14 Element#document()
Returns the document object, or nil if this element doesn't belong to a document.

15 Element#each_element(xpath = nil) {|element| ...}
Iterates over all child elements that match the given XPath expression. If xpath is nil, it iterates over all child elements.

16 Element#each_element_with_attribute(key, value = nil, max = 0, name = nil) {|element| ...}
Iterates over child elements based on an attribute.
key: the attribute name. All child elements that have this attribute match.
value: if given, only child elements whose attribute has this value match.
max: the maximum number of matches; 0 means no limit.
name: if given, only child elements with this name are considered.
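
The effect of each parameter can be sketched by collecting the names of the yielded elements. The document and the collect helper are assumptions made up for the example.

```ruby
require 'rexml/document'

doc = REXML::Document.new("<a><b id='1'/><c id='2'/><d id='1'/><e/></a>")
root = doc.root

# Hypothetical helper: runs the iterator and returns the yielded names.
collect = lambda do |*args|
  names = []
  root.each_element_with_attribute(*args) { |e| names << e.name }
  names
end

any_id    = collect.call('id')               # any child with an id
id_one    = collect.call('id', '1')          # id must equal '1'
first_one = collect.call('id', '1', 1)       # stop after one match
only_d    = collect.call('id', '1', 0, 'd')  # restrict to <d> children

puts any_id.inspect, id_one.inspect, first_one.inspect, only_d.inspect
```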

17 Element#each_element_with_text(text = nil, max = 0, name = nil) {|element| ...}
text: the text that a child element must have. If nil, all child elements that have text match.
max: the maximum number of matches; 0 means no limit.
name: the name of the child elements to consider.
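
A sketch of the same idea for text matching, again collecting the names of the yielded elements. The sample document is an assumption made up for the example.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b>b</b><c>b</c><d>d</d><e/></a>')
root = doc.root

with_text = []
root.each_element_with_text { |e| with_text << e.name }

text_b = []
root.each_element_with_text('b') { |e| text_b << e.name }

first_b = []
root.each_element_with_text('b', 1) { |e| first_b << e.name }

only_d = []
root.each_element_with_text(nil, 0, 'd') { |e| only_d << e.name }

puts with_text.inspect, text_b.inspect, first_b.inspect, only_d.inspect
```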

18 Element#each_with_something(test, max = 0, name = nil) {|element| ...}
Does this method look odd?
This is part of the charm of the Ruby language: it is very convenient to pass a Proc to an iteration method as the condition.
test: a Proc instance that takes one argument, the child element being examined.
max: the maximum number of matches; 0 means no limit.
name: the name of the child elements to consider.

test.call is invoked once for each candidate child element, and the block receives only the elements for which it returns true.

19 Element#get_elements(xpath)
Returns an array of the elements that match the XPath expression.
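
A minimal sketch, assuming a made-up document with two b children. The result is an ordinary Array, so the usual Array methods apply.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b id="1"/><c/><b id="2"/></a>')

# Relative XPath: matches the <b> children of the root element.
bs = doc.root.get_elements('b')
ids = bs.map { |e| e.attributes['id'] }

puts bs.class
puts ids.inspect
```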

20 Element#get_text(path = nil)
Returns the first text node.
For example:

xml = Document.new "<b>some text <c/>more text</b>"   # the root element now has two text nodes
xml.root.get_text.value   #-> "some text "

21 Element#has_attributes?
Returns whether the element has any attributes.

22 Element#has_elements?
Returns whether the element has any child elements.

23 Element#has_text?
Returns whether the element has a text node.
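
The three predicates can be compared on an element that has everything and on an empty one. The sample document is an assumption made up for the example.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a id="1">hello<b/></a>')
a = doc.root          # has an attribute, text, and a child
b = a.elements[1]     # empty element

puts a.has_attributes?, a.has_elements?, a.has_text?
puts b.has_attributes?, b.has_elements?, b.has_text?
```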

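24 Element#namespace(prefix = nil)
Returns the URI of the namespace bound to the given prefix. If no prefix is given, the element's own prefix is used; an element without a prefix uses the default namespace (xmlns).

25 Element#namespaces()
Returns a hash of all namespaces in effect for this element, mapping each prefix to its URI.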
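
A sketch of both lookups on a document with a default and a prefixed namespace. The urn:default and urn:pre URIs are assumptions made up for the example.

```ruby
require 'rexml/document'

xml = "<a xmlns='urn:default' xmlns:pre='urn:pre'><pre:b/></a>"
root = REXML::Document.new(xml).root
child = root.elements[1]   # the <pre:b/> element

default_uri = root.namespace         # no prefix: default namespace
pre_uri     = root.namespace('pre')  # explicit prefix
child_uri   = child.namespace        # uses the child's own prefix
all         = root.namespaces        # prefix => URI

puts default_uri, pre_uri, child_uri
puts all.inspect
```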

26 Element#next_element()
Returns the next sibling element, or nil if there is none.

27 Element#node_type()
Returns the node type. For an Element this is :element.

28 Element#prefixes
Returns an array containing all the namespace prefixes.

29 Element#previous_element
Returns the previous sibling element, or nil if there is none.
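
A short sketch of sibling navigation with next_element and previous_element. The text node between the two elements in this made-up document is skipped, because only elements are considered.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b/>text<c/></a>')
b = doc.root.elements[1]

c = b.next_element            # skips the text node
back = c.previous_element     # returns to <b/>
after_last = c.next_element   # nothing follows <c/>

puts c.name
puts back.name
puts after_last.inspect
```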

30 Element#root
Returns the root element.

31 Element#root_node
You can use this to determine whether two elements belong to the same root, for example, ele.root_node == ele[0].root_node.

32 Element#text
Returns the text value.

33 Element#text=
Sets the text value.

34 Element#texts
Returns all text nodes.
One thing to mention here is that an element may have multiple text nodes. Consider the following example:

ele = Document.new "<a>some string<b/>more string</a>"

ele.root.text    #-> "some string"
ele.root.texts   #-> ["some string", "more string"]

35 Element#xpath()
Returns the XPath of a node. This is useful when you do not know how to write an expression that matches a node: you can call this method and read the path directly.
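
As a sketch, the path returned by xpath can be fed back into REXML::XPath.first to find the same node again. The nested document is an assumption made up for the example.

```ruby
require 'rexml/document'

doc = REXML::Document.new('<a><b><c/><c/></b></a>')
second_c = doc.root.elements[1].elements[2]

# Ask the node for its own path, then look it up again from the document.
path = second_c.xpath
found = REXML::XPath.first(doc, path)

puts path
puts found.equal?(second_c)
```

When several siblings share a name, the generated path includes a positional index so that it still identifies a single node.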

III. Elements class
A collection of the child elements of an Element. You can access it through element.elements.

1 Elements.new(element)
Constructor.
element: the element that the newly created Elements collection belongs to.

2 Elements#<<
Alias for Elements#add.

3 Elements#[]
