XML format and related libxml library learning

Source: Internet
Author: User
Tags cdata xml parser

This article draws on reference material about XML file format syntax and DTDs, picks out the points I consider necessary, and records them here.

First, here is an example XML file:

<?xml version="1.0" encoding="UTF-8"?>
<gadget>
  <name>Calendar</name>
  <namespace><!--_loccomment_text="{Locked}"-->microsoft.windows</namespace>
  <version><!--_loccomment_text="{Locked}"-->1.1.0.0</version>
  <author name="Microsoft Corporation">
    <info url="http://go.microsoft.com/fwlink/?LinkId=124093" text="Www.gallery.microsoft.com"/>
    <logo src="Logo.png"/>
  </author>
  <copyright><!--_loccomment_text="{Locked}"-->© 2009</copyright>
  <description>Browse for dates in your calendar.</description>
  <icons>
    <icon height="$" width="$" src="Icon.png"/>
  </icons>
  <hosts>
    <host name="sidebar">
      <autoscaledpi><!--_loccomment_text="{Locked}"-->True</autoscaledpi>
      <base type="HTML" apiversion="1.0.0" src="calendar.html"/>
      <permissions><!--_loccomment_text="{Locked}"-->Full</permissions>
      <platform minplatformversion="1.0"/>
      <defaultimage src="Drag.png"/>
    </host>
  </hosts>
</gadget>

An XML file is a text file whose content divides into two parts: the prolog, on the first line of the file, and the document body.

The prolog carries the XML declaration, which tells the XML parser how to process the file: version gives the version of the XML specification the file follows, and encoding gives the character encoding used in the file. (Strictly speaking, XML 1.0 makes the declaration optional, but it is good practice to include it.)

The document body is everything in the XML file other than the prolog. In the example above, it is delimited by the start tag <gadget> and the end tag </gadget>, which form the root element of the XML file. name is a child element of the root element. In the child element <author>, name is an attribute of the element, and the quoted text that follows it is the attribute value.
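The relationship between the root element, its child elements, and an attribute can be sketched in a few lines. This is only an illustration: it uses Python's standard xml.etree.ElementTree module (an assumption made for brevity, not libxml2) on a shortened copy of the gadget file with lowercase tag names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the gadget example (assumption: lowercase tag names).
GADGET_XML = """<?xml version="1.0" encoding="UTF-8"?>
<gadget>
  <name>Calendar</name>
  <author name="Microsoft Corporation">
    <logo src="Logo.png"/>
  </author>
</gadget>"""

# fromstring() parses the text and returns the root element of the body.
root = ET.fromstring(GADGET_XML.encode("utf-8"))

root_tag = root.tag                          # name of the root element
child_tags = [child.tag for child in root]   # direct children of the root
calendar_name = root.findtext("name")        # text of the <name> child element
author_name = root.find("author").get("name")  # "name" attribute of <author>

print(root_tag, child_tags, calendar_name, author_name)
```

Note that name appears twice with different roles: once as a child element of the root and once as an attribute of <author>.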

<!-- comment content --> is the comment syntax in XML.

XML Parser

The XML parser first checks the XML file being opened against the XML specification for structural errors, then strips the markup from the file, reads out the actual content, and hands it to the subsequent program for processing.

The designers of XML strictly define two levels of XML syntax and structure: well-formed XML files and valid XML files. An XML file must be well-formed; only then can a parser parse it correctly and a browser display it. The guidelines for writing well-formed XML are:

1. The XML file should begin with the XML declaration, which states that the file is XML and which version of the XML specification it uses. (XML 1.0 makes the declaration optional; when present, it must come first.)

2. An XML file has exactly one root element.

3. Every tag must be closed correctly: an opening <a> must have a matching closing </a>. Empty elements may use the shorthand <element-name [attribute="value"]/>.

4. Tags must not overlap (they must nest properly), attribute values must be enclosed in quotation marks, and tag names, processing instructions, and attribute names are case-sensitive.

5. To pass content through exactly as written, XML uses a CDATA section, with <![CDATA[ as the start marker and ]]> as the end marker.

6. The parser faithfully hands all whitespace outside the markup to the subsequent application for processing.
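Several of these rules can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree module as the parser (not libxml2); the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One accepted sample and one violation each for rules 2, 3 and 4.
samples = {
    "ok": "<a><b x='1'/></a>",
    "two_roots": "<a/><b/>",            # rule 2: only one root element
    "case_mismatch": "<a></A>",         # rules 3/4: names are case-sensitive
    "crossed_tags": "<a><b></a></b>",   # rule 4: tags must not overlap
    "unquoted_attr": "<a x=1/>",        # rule 4: attribute values need quotes
}
results = {name: is_well_formed(xml) for name, xml in samples.items()}

# Rule 5: a CDATA section passes markup-like characters through untouched.
cdata_text = ET.fromstring("<code><![CDATA[if (a < b && b > c) {}]]></code>").text

# Rule 6: whitespace in element content reaches the application unchanged.
spaced_text = ET.fromstring("<p>  two  spaces  </p>").text

print(results, repr(cdata_text), repr(spaced_text))
```

A check like this only covers well-formedness; it says nothing about whether the document follows a DTD.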

The above describes what makes an XML file well-formed, which is the most basic requirement for writing XML files. XML files are used to transfer data. Besides the data itself, a file contains the element names for the data, and because these names are user-defined, they cause problems for later communication. Imagine that company A and company B each pick their own tag to represent a price: when they exchange XML files, both sides can parse out the numbers, but they will not agree on what the numbers mean. To solve this, the creators of XML agreed on a specification that states which tags may be used in an XML file, which child elements a parent element may contain, the order in which elements appear, how the attributes of an element are defined, and so on. This specification shared by both parties is called a DTD (Document Type Definition). You can think of a DTD as a template for writing XML: when files are written according to the template, both sides can communicate correctly.

If an XML file is well-formed and also correctly follows a DTD, it is called a valid XML file.

A DTD can be used in two ways: as an internal DTD embedded directly in the XML file, or as an external DTD file referenced from the XML file.

The internal DTD is defined in the prolog of the XML file, with the syntax:

<!DOCTYPE element-name [...]>

<!DOCTYPE: marks the start of the DTD.

element-name: specifies the root element name for this DTD. If the XML file uses the DTD, the root element of the file must be the one specified here.

[...]>: the elements used in the XML file are declared between the square brackets, and > ends the DTD definition.
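As a rough illustration of the internal DTD syntax, the sketch below parses a small document with an internal DTD and checks that the root element name matches the name given after <!DOCTYPE. It assumes Python's standard xml.dom.minidom module, which reads the DOCTYPE but does not validate the document against the DTD; the book title is a made-up placeholder.

```python
from xml.dom import minidom

# A document with an internal DTD (assumption: simplified element set).
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Reference [
  <!ELEMENT Reference (Book*)>
  <!ELEMENT Book (name)>
  <!ELEMENT name (#PCDATA)>
]>
<Reference>
  <Book><name>XML Basics</name></Book>
</Reference>"""

doc = minidom.parseString(DOC.encode("utf-8"))

doctype_name = doc.doctype.name            # element-name from <!DOCTYPE ...>
root_name = doc.documentElement.tagName    # actual root element of the body
root_matches_doctype = (doctype_name == root_name)

print(doctype_name, root_name, root_matches_doctype)
```

Full validation against the declared elements needs a validating parser, which the Python standard library does not provide.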

An external DTD is a file independent of the XML file, with .dtd as its file name extension, and it can be shared by multiple XML files. An example of an external DTD is given below.

<?xml version="1.0" encoding="GB2312"?>
<!ELEMENT Reference (Book*)>
<!ELEMENT Book (name, author, Price)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ATTLIST Price currencyUnit CDATA #REQUIRED>

In the XML file, use <!DOCTYPE element-name SYSTEM "dtd-url"> to reference the external DTD file that was created.
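For comparison, here is a minimal sketch of how a parser exposes the SYSTEM reference. It assumes Python's standard xml.dom.minidom module and a hypothetical DTD file name, reference.dtd; this parser records the reference but does not load or apply the external file.

```python
from xml.dom import minidom

# Assumption: "reference.dtd" is a hypothetical external DTD file name.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Reference SYSTEM "reference.dtd">
<Reference/>"""

doc = minidom.parseString(DOC)

doctype_name = doc.doctype.name     # root element name named in the DOCTYPE
system_id = doc.doctype.systemId    # location of the external DTD

print(doctype_name, system_id)
```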

XML Parsing Library

To really use XML in a project you need an XML parser. Rather than writing one yourself, it is simpler to start by learning libxml2 and get XML parsing tasks done quickly. The problem to solve in this chapter is parsing an XML configuration file and exporting its content into a struct.

This part draws on learning notes that other people have shared online.

On Linux, the GNOME project provides a C parser for XML called libxml2. It offers simple and convenient operations on XML files and supports XPath queries and, in part, XSLT transformations. It can be installed either by downloading and building the source code or with the apt tools; the latter is recommended.

Installation method: apt-get install libxml2

apt-get install libxml2-dev

After installation, three XML-related executables are available in /usr/bin:

xml2-config reports configuration information about the installed libxml2, which is needed later for compiling.

xmlcatalog: I do not know yet what this one is for.

xmllint can be used to parse an XML file and print the parsed result.

libxml2 provides a tool to help with compiling: run xml2-config with the --cflags and --libs options to print the compiler flags and the linker flags.

Both outputs need to be added to the command line at compile time.

At compile time, the following error is generated:

The cause is that the library was not linked, or the library search path was not up to date. Various -L and -l options were added to fix it.

Normal compilation options:

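gcc test.c -o test -I/usr/include/libxml2/ -lxml2 -L/usr/lib/i386-linux-gnu/

A major pitfall: with gcc, the normal command above compiles successfully, but if the order is changed so that test.c -o test comes after the library options, the build fails. This is most likely because the linker processes its inputs in command-line order, so a library listed before the code that needs it may not be used to resolve those symbols.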

The final program is run as ./test ABC, where the contents of ABC are as follows:

This is the XML file that is finally generated.

The section on parsing the XML file will be added tonight.

To learn the libxml2 library, start by understanding its most common functions and data types. The following describes those used in the example program:

1. Internal character type xmlChar

All characters and strings in the libxml2 library are based on this character type.

It is defined as typedef unsigned char xmlChar;

It is well suited to UTF-8, which is the internal encoding of libxml2; data in other encodings must first be converted to UTF-8 before libxml2 can use it.
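The "convert to UTF-8 first" step can be pictured with a short sketch. It is written in Python with its built-in codecs, purely as an illustration of the idea; it does not use libxml2's own conversion facilities, and the sample text is an assumption.

```python
# "price" written in Chinese, held as bytes in the GB2312 encoding.
gb_bytes = "价格".encode("gb2312")

# Transcode: decode from the source encoding, then encode as UTF-8.
utf8_bytes = gb_bytes.decode("gb2312").encode("utf-8")

# Each UTF-8 byte fits in an unsigned 8-bit value, like unsigned char in C.
byte_values = list(utf8_bytes)

print(len(gb_bytes), len(utf8_bytes), byte_values)
```

The same text occupies a different number of bytes in each encoding, which is why a parser has to know the encoding before it can interpret the bytes.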

2. Conversion between xmlChar* and other types

To make casts between xmlChar* and char* convenient, the BAD_CAST macro is defined.

Its definition is: #define BAD_CAST (xmlChar *)

3. Document type xmlDoc and its pointer type xmlDocPtr

xmlDoc holds the basic information of an XML document, including the file name, document type, child nodes, and so on.

xmlDocPtr is equivalent to xmlDoc*.

The xmlNewDoc function creates a new document and returns its pointer.

The xmlParseFile function reads a file, assuming UTF-8 by default, and returns the document pointer.

The xmlFreeDoc function frees the document pointer. Note that calling it also frees the memory of every node contained in the document, so in general you do not need to call xmlFreeNode or xmlFreeNodeList manually to release dynamically allocated nodes, unless you have removed a node from the document.

In general, all nodes in a document should be dynamically allocated and then added to the document; a final call to xmlFreeDoc then releases the memory of all nodes at once.

The xmlSaveFile function saves the document to a file using the default settings.
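The create, save, and parse cycle described for xmlNewDoc, xmlSaveFile, and xmlParseFile can be sketched as follows. This is an analogy only: it assumes Python's standard xml.etree.ElementTree module instead of libxml2, and the file name conf.xml is made up.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a document in memory: a root node plus dynamically added children.
root = ET.Element("conf")
service = ET.SubElement(root, "Service")
ET.SubElement(service, "name").text = "Service_1"
tree = ET.ElementTree(root)

# Save the document to a file (assumption: a throwaway temporary directory).
path = os.path.join(tempfile.mkdtemp(), "conf.xml")
tree.write(path, encoding="utf-8", xml_declaration=True)

# Parse the file back and read a value from the resulting tree.
parsed_root = ET.parse(path).getroot()
service_name = parsed_root.findtext("Service/name")

print(parsed_root.tag, service_name)
```

There is no counterpart to xmlFreeDoc here because Python reclaims the tree automatically; in C the explicit free step remains necessary.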

4. Node type xmlNode and its pointer type xmlNodePtr

An xmlNode represents one node in an XML document and is implemented as a struct.

Working with an XML document comes down to moving between nodes, querying node information, and adding, deleting, and modifying nodes.
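Moving between nodes, querying them, and then adding or deleting nodes can be sketched like this. The example assumes Python's standard xml.etree.ElementTree module rather than libxml2, and the helper name walk is made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Service><name>Service_1</name>"
    "<media_servers><IP>10.0.0.1</IP></media_servers></Service>"
)

def walk(node, depth=0):
    """Yield (depth, tag, text) for a node and all its descendants, depth-first."""
    text = (node.text or "").strip()
    yield depth, node.tag, text
    for child in node:
        yield from walk(child, depth + 1)

# Query: visit every node and record where it sits in the tree.
visited = list(walk(root))

# Add: append a new IP node under media_servers.
servers = root.find("media_servers")
ET.SubElement(servers, "IP").text = "10.0.0.2"

# Delete: remove the name node from its parent.
root.remove(root.find("name"))

remaining_tags = [tag for _, tag, _ in walk(root)]
print(visited, remaining_tags)
```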

5. Node set type xmlNodeSet and its pointer type xmlNodeSetPtr

A node set is a variable made up of nodes; node sets appear only as the result of an XPath query.
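To give a feel for a query that returns a set of nodes, here is a small sketch. It assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath and is not libxml2's XPath engine; the sample data is a shortened configuration file.

```python
import xml.etree.ElementTree as ET

CONF = """<conf>
  <Service>
    <name>Service_1</name>
    <media_servers><IP>10.0.0.1</IP><IP>10.0.0.2</IP><IP>10.0.0.3</IP></media_servers>
  </Service>
  <Service>
    <name>Service_2</name>
    <media_servers><IP>10.1.0.1</IP><IP>10.1.0.2</IP></media_servers>
  </Service>
</conf>"""

root = ET.fromstring(CONF)

# Each query returns a list of matching nodes (the "node set").
all_ip_nodes = root.findall("./Service/media_servers/IP")
second_service_ips = root.findall("./Service[name='Service_2']/media_servers/IP")

all_ips = [node.text for node in all_ip_nodes]
second_ips = [node.text for node in second_service_ips]
print(all_ips, second_ips)
```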

The contents of the example XML file are as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!--edited with XMLSpy v2011 rel. 2 (http://www.altova.com) by Dancelj (EM) -->
<conf xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Conf.xsd">
    <Service>
        <name>Service_1</name>
        <Monitor_interface>Eth0</Monitor_interface>
        <exprobe_ip>192.168.8.201</exprobe_ip>
        <Update_period>5</Update_period>
        <Sample_number>10</Sample_number>
        <media_servers>
                <IP>10.0.0.1</IP>
                <IP>10.0.0.2</IP>
                <IP>10.0.0.3</IP>
        </media_servers>
    </Service>
    <Service>
        <name>Service_2</name>
        <Monitor_interface>Eth1</Monitor_interface>
        <exprobe_ip>192.168.8.202</exprobe_ip>
        <Update_period>5</Update_period>
        <Sample_number>10</Sample_number>
        <media_servers>
                <IP>10.1.0.1</IP>
                <IP>10.1.0.2</IP>
        </media_servers>
    </Service>
</conf>

First, analyze the XML file above. It describes two services. Each service has a service name (name), a monitoring interface (MIF), a probe IP (EXPIP), an update period (update_period), a number of samples (Sample_number), and several media server addresses. Because more media server addresses may be added later, they are stored as a list so the set can grow. The services themselves are likewise connected by a linked list.
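The mapping from this configuration file to an in-memory structure can be sketched as below. This is not the author's C program: it assumes Python's standard xml.etree.ElementTree module and a dataclass in place of the C struct, with Python lists standing in for the linked lists. The names Service and load_services are made up for illustration, and the sample is a shortened copy of the file.

```python
from dataclasses import dataclass, field
from typing import List
import xml.etree.ElementTree as ET

CONF = """<conf>
  <Service>
    <name>Service_1</name>
    <Monitor_interface>Eth0</Monitor_interface>
    <exprobe_ip>192.168.8.201</exprobe_ip>
    <Update_period>5</Update_period>
    <Sample_number>10</Sample_number>
    <media_servers><IP>10.0.0.1</IP><IP>10.0.0.2</IP><IP>10.0.0.3</IP></media_servers>
  </Service>
  <Service>
    <name>Service_2</name>
    <Monitor_interface>Eth1</Monitor_interface>
    <exprobe_ip>192.168.8.202</exprobe_ip>
    <Update_period>5</Update_period>
    <Sample_number>10</Sample_number>
    <media_servers><IP>10.1.0.1</IP><IP>10.1.0.2</IP></media_servers>
  </Service>
</conf>"""

@dataclass
class Service:
    """One <Service> entry; plays the role of the C struct."""
    name: str
    monitor_interface: str
    exprobe_ip: str
    update_period: int
    sample_number: int
    media_servers: List[str] = field(default_factory=list)

def load_services(xml_text: str) -> List[Service]:
    """Parse the configuration text into a list of Service records."""
    root = ET.fromstring(xml_text)
    services = []
    for node in root.findall("Service"):
        services.append(Service(
            name=node.findtext("name"),
            monitor_interface=node.findtext("Monitor_interface"),
            exprobe_ip=node.findtext("exprobe_ip"),
            update_period=int(node.findtext("Update_period")),
            sample_number=int(node.findtext("Sample_number")),
            # The address list grows with however many <IP> nodes exist.
            media_servers=[ip.text for ip in node.findall("media_servers/IP")],
        ))
    return services

services = load_services(CONF)
for service in services:
    print(service)
```

Keeping the addresses in a growable list means a third or fourth <IP> entry needs no change to the parsing code.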

EditPlus shortcut keys:

Select the current word: Ctrl+W

Select the current line: Ctrl+R

New browser window: Ctrl+Shift+B

New normal text file: Ctrl+N

Turn on code folding: Ctrl+Shift+F

Find the next matching text: F3

Find the previous matching text: Shift+F3

Go to a specified line in the document: Ctrl+G

FTP service: both modes, active and passive, are named from the server's point of view.

Active mode: the client first tells the server, over the connection to the server's port 21, which client port the server can connect to. The server then actively connects from its port 20 to that client port.

Passive mode: the client contacts the server's port 21 and asks for a port to connect to; the server replies over port 21, in effect, "my port XX is available, connect to it". The client then initiates the data connection and the server passively accepts it. In this mode the server's port XX is not port 20 as in active mode, but a port greater than 1024.
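How the client learns "port XX" in passive mode can be sketched with a small parser for the server's reply. The reply format 227 Entering Passive Mode (h1,h2,h3,h4,p1,p2) comes from the FTP specification (RFC 959), where the port is p1 * 256 + p2. The function name parse_pasv_reply and the sample address are assumptions for illustration.

```python
import re

def parse_pasv_reply(reply: str):
    """Extract (host, port) from a '227 Entering Passive Mode (...)' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("not a passive-mode reply: %r" % reply)
    numbers = [int(group) for group in match.groups()]
    host = ".".join(str(n) for n in numbers[:4])
    # The data port is sent as two bytes: high byte first, then low byte.
    port = numbers[4] * 256 + numbers[5]
    return host, port

# Sample reply (assumption: made-up server address and port bytes).
host, port = parse_pasv_reply("227 Entering Passive Mode (192,168,8,201,19,137)")
print(host, port)
```

The client then opens the data connection to that host and port itself, which is what makes the server side passive.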
