Creating, parsing, searching, and modifying XML files with libxml2

Source: Internet
Author: User
Tags: xml, parser

I. Introduction to libxml2

libxml2 is an XML parser written in C. It was originally developed for the GNOME project and is free, open-source software released under the MIT license. Besides the C API, bindings are available for C++, PHP, Pascal, Ruby, Tcl, and other languages, and the library runs on Windows, Linux, Solaris, Mac OS X, and other platforms. It is a capable library that should meet the needs of most users.

II. Installing libxml2

If you selected all the development libraries and development tools when installing the system (for example, on the Fedora Core series), libxml2 is usually already present and does not need to be installed. To install it manually:

1) Download the libxml2 source archive (libxml2-xxxx.tar.gz) from the xmlsoft site or its FTP server (ftp.xmlsoft.org).

2) Unpack the archive:

tar xvzf libxml2-xxxx.tar.gz

3) Enter the unpacked directory and run:

./configure

make

make install

After the installation is complete, a few lines of code are enough to parse XML files, both local and remote. Encoding needs some care, however. libxml2 works with UTF-8 internally, whatever the encoding of the input or output, so the strings you get back from a parsed document are UTF-8. If you need output in GB2312 or another encoding, iconv is required for transcoding (alternatively, simply generate UTF-8 files). If iconv is not installed on the system, install libiconv as follows:

1) Download the libiconv source archive (for example, libiconv-1.11.tar.gz).

2) Unpack the archive:

tar xvzf libiconv-1.11.tar.gz

3) Enter the unpacked directory and run:

./configure

make

make install

III. XML basics

Before studying the libxml2 library, let's review the basics of XML. XML is a text-based format for structured data that can be read from many languages and platforms. It consists of HTML-like tags arranged in a tree structure.

To illustrate the general idea, Listing 1 shows a simplified XML file.

Listing 1. A simple XML file

<?xml version="1.0" encoding="UTF-8"?>
<files>
  <owner>root</owner>
  <action>delete</action>
  <age units="days">10</age>
</files>

The first line in Listing 1 is the XML declaration. It tells the application that processes the XML (the parser) which version of XML to expect. Most files use version 1.0, although a small number use version 1.1. The declaration also names the encoding. Most files use UTF-8, but XML is designed to carry data in many languages, including those that do not use the Latin alphabet.

Next come the elements. An element starts with a start tag (such as <files>) and ends with an end tag (such as </files>); the slash (/) distinguishes the end tag from the start tag. An element is one type of node. The XML Document Object Model (DOM) defines several node types, including:

Elements (such as files or age)

Attributes (such as units)

Text (such as root or 10)

An element can have child nodes. For example, the age element has one child node, the text node 10.

An XML parser can use this parent-child structure to traverse a document and even modify its structure or content. libxml2 is one such parser, and the sample code in this article relies on that structure. Many parsers and libraries exist for different environments; libxml2 is among the most widely used on UNIX systems, and bindings exist for several scripting languages, such as Perl and Python.

IV. Using libxml2

Suppose a project needs a background program that manages XML files: it has to create, parse, modify, and search them. The following sections describe how to do each of these with the libxml2 library.

1. Create an XML document:

Use xmlNewDoc() to create an XML document, then use functions such as xmlNewNode(), xmlNewChild(), xmlNewProp(), and xmlNewText() to add nodes and child nodes and to set element content and attributes. When the tree is complete, use xmlSaveFormatFileEnc() to save the XML file to disk (this function lets you choose the encoding of the saved file).

Example 1:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(int argc, char **argv)
{
    xmlDocPtr doc = NULL;       /* document pointer */
    xmlNodePtr root_node = NULL, node = NULL, node1 = NULL; /* node pointers */

    /* Create a new document and a node, and make the node the root element. */
    doc = xmlNewDoc(BAD_CAST "1.0");
    root_node = xmlNewNode(NULL, BAD_CAST "root");
    xmlDocSetRootElement(doc, root_node);

    /* xmlNewChild() creates a new node attached as a child of root_node. */
    xmlNewChild(root_node, NULL, BAD_CAST "node1", BAD_CAST "content of node1");

    /* xmlNewProp() creates an attribute attached to a node. */
    node = xmlNewChild(root_node, NULL, BAD_CAST "node3", BAD_CAST "node has attributes");
    xmlNewProp(node, BAD_CAST "attribute", BAD_CAST "yes");

    /* Another way to create nodes: build them first, then attach them. */
    node = xmlNewNode(NULL, BAD_CAST "node4");
    node1 = xmlNewText(BAD_CAST "other way to create content");
    xmlAddChild(node, node1);
    xmlAddChild(root_node, node);

    /* Write the document to the file named in argv[1], or to stdout ("-"). */
    xmlSaveFormatFileEnc(argc > 1 ? argv[1] : "-", doc, "UTF-8", 1);

    /* Free the document. */
    xmlFreeDoc(doc);
    xmlCleanupParser();
    xmlMemoryDump();            /* debug memory for regression tests */
    return 0;
}

2. Parse an XML document

Parsing a document takes only a file name and a single function call, plus error checking. Commonly used functions include xmlParseFile() and xmlParseDoc(); the example below uses the newer xmlReadFile(), which also takes an encoding and parser options. After obtaining the document pointer, call xmlDocGetRootElement() to get the node pointer of the root element. With this pointer you can walk the DOM tree. When you are finished, call xmlFreeDoc() to release the document.

Example 2:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* The fragment is wrapped in a function so that it compiles; the name
 * parse_doc is illustrative. url is the file to parse, and encoding may be
 * NULL to let libxml2 detect the encoding. */
void parse_doc(const char *url, const char *encoding)
{
    xmlDocPtr doc;      /* pointer to the parsed document */
    xmlNodePtr cur;     /* node pointer, used to move between nodes */
    xmlChar *key;

    /* Parse the file. XML_PARSE_NOBLANKS (value 256) drops ignorable
     * whitespace-only text nodes. */
    doc = xmlReadFile(url, encoding, XML_PARSE_NOBLANKS);

    /* Check that parsing succeeded. On failure libxml2 reports the problem
     * through its registered error handler and returns NULL. A common cause
     * is a wrong encoding: XML documents may be stored in encodings other
     * than UTF-8 or UTF-16, and libxml2 converts them to UTF-8 automatically
     * when it knows the encoding. The XML standard has more information. */
    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return;
    }

    cur = xmlDocGetRootElement(doc);    /* get the document's root element */

    /* Check that the document actually has content. */
    if (cur == NULL) {
        fprintf(stderr, "Empty document\n");
        xmlFreeDoc(doc);
        return;
    }

    /* Check that the document is of the expected type. "root" is the name
     * of the root element used in this example. */
    if (xmlStrcmp(cur->name, (const xmlChar *) "root")) {
        fprintf(stderr, "Document of the wrong type, root node != root\n");
        xmlFreeDoc(doc);
        return;
    }

    /* Walk the children of the root and print every <keyword> element. */
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if (!xmlStrcmp(cur->name, (const xmlChar *) "keyword")) {
            key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
            if (key != NULL) {
                printf("Keyword: %s\n", key);
                xmlFree(key);
            }
        }
        cur = cur->next;
    }

    xmlFreeDoc(doc);
}

3. Modify XML elements, attributes, and other information.

To modify element and attribute information in an XML document, first parse the document and obtain a node pointer (xmlNodePtr node) by walking the DOM tree. With that pointer you can read, modify, and add information in the document.

Example 3:

Get the content of a node:

xmlChar *value = xmlNodeGetContent(node);

The returned value must be released with xmlFree(value).

Get the value of an attribute of a node:

xmlChar *value = xmlGetProp(node, (const xmlChar *) "prop1");

The returned value must be released with xmlFree(value).

Set the content of a node:

xmlNodeSetContent(node, (const xmlChar *) "test");

Set the value of an attribute of a node:

xmlSetProp(node, (const xmlChar *) "prop1", (const xmlChar *) "v1");

Add a child element to a node:

xmlNewTextChild(node, NULL, (const xmlChar *) "keyword", (const xmlChar *) "test element");

Add an attribute to a node:

xmlNewProp(node, (const xmlChar *) "prop1", (const xmlChar *) "test prop");

4. Search for XML nodes

Sometimes we only care about the values or attributes of one or a few specific elements in an XML document. Walking the whole DOM tree to find them is tedious; with XPath you can get the elements you want directly. The following is a helper function for that:

Example 4:

#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>

/* Evaluate xpath against doc. Returns the result object, which the caller
 * frees with xmlXPathFreeObject(), or NULL on error or if nothing matches. */
xmlXPathObjectPtr get_nodeset(xmlDocPtr doc, const xmlChar *xpath)
{
    xmlXPathContextPtr context;
    xmlXPathObjectPtr result;

    context = xmlXPathNewContext(doc);
    if (context == NULL) {
        printf("context is NULL\n");
        return NULL;
    }

    result = xmlXPathEvalExpression(xpath, context);
    xmlXPathFreeContext(context);
    if (result == NULL) {
        printf("xmlXPathEvalExpression returned NULL\n");
        return NULL;
    }

    if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
        xmlXPathFreeObject(result);
        printf("node set is empty\n");
        return NULL;
    }

    return result;
}

This function searches the XML document that doc points to for the nodes matching the XPath expression and returns the matching node set. For the syntax of XPath expressions, see the XPath documentation. Once you have the result, you can reach the nodes through the returned xmlXPathObjectPtr structure:

Example 5:

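/* The fragment is wrapped in a function so that it compiles; the name
 * print_matches is illustrative. get_nodeset() comes from Example 4 and
 * d_convertcharset() from Example 8. */
xmlXPathObjectPtr get_nodeset(xmlDocPtr doc, const xmlChar *xpath);
char *d_convertcharset(char *cpencodefrom, char *cpencodeto, const char *cpinput);

void print_matches(xmlDocPtr doc)
{
    const xmlChar *xpath = (const xmlChar *) "/root/node[@key='keyword']";
    xmlXPathObjectPtr app_result = get_nodeset(doc, xpath);
    xmlNodeSetPtr nodeset;
    xmlNodePtr cur;
    xmlChar *value;
    char *text;
    int i;

    if (app_result == NULL) {
        printf("app_result is NULL\n");
        return;
    }

    nodeset = app_result->nodesetval;
    for (i = 0; i < nodeset->nodeNr; i++) {
        /* Visit every child of the matched node. */
        cur = nodeset->nodeTab[i]->xmlChildrenNode;
        while (cur != NULL) {
            /* xmlGetProp() returns NULL for nodes without a "key" attribute. */
            value = xmlGetProp(cur, (const xmlChar *) "key");
            if (value != NULL) {
                text = d_convertcharset("UTF-8", "GBK", (char *) value);
                if (text != NULL)
                    printf("value: %s\n", text);
                xmlFree(value);
            }

            value = xmlNodeGetContent(cur);
            if (value != NULL) {
                text = d_convertcharset("UTF-8", "GBK", (char *) value);
                if (text != NULL)
                    printf("value: %s\n", text);
                xmlFree(value);
            }

            cur = cur->next;    /* advance to the next sibling */
        }
    }

    xmlXPathFreeObject(app_result);
}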

Through the result set returned by get_nodeset() we can read the elements and attributes of each node, or modify their values. In this example the d_convertcharset() function converts each value to GBK before printing, so that any Chinese characters it contains display correctly.

5. Encoding Problems

Because libxml2 stores and manipulates data as UTF-8, a program that uses another encoding, such as GB2312 or GBK for Chinese text, must convert its data to UTF-8 before passing it to libxml2, for example with libxml2's encoding functions. Likewise, if the program should produce output in an encoding other than UTF-8, the data must be converted on the way out.

The following sample functions convert data between encodings. Some of them use libiconv, so to make sure they work, first check that libiconv is installed on your system.

Example 6:

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/encoding.h>

/* Convert in from the given encoding to UTF-8.
 * Returns a buffer to release with xmlFree(), or NULL on failure. */
xmlChar *convertInput(const char *in, const char *encoding)
{
    unsigned char *out;
    int ret;
    int size;
    int out_size;
    int temp;
    xmlCharEncodingHandlerPtr handler;

    if (in == NULL)
        return NULL;

    handler = xmlFindCharEncodingHandler(encoding);
    if (handler == NULL) {
        printf("convertInput: no encoding handler found for '%s'\n", encoding ? encoding : "");
        return NULL;
    }

    /* A handler that relies on iconv may have no built-in input function. */
    if (handler->input == NULL) {
        printf("convertInput: handler for '%s' has no built-in converter\n", encoding);
        return NULL;
    }

    size = (int) strlen(in) + 1;
    out_size = size * 2 - 1;
    out = (unsigned char *) xmlMalloc((size_t) out_size);

    if (out != NULL) {
        temp = size - 1;
        ret = handler->input(out, &out_size, (const unsigned char *) in, &temp);
        if ((ret < 0) || (temp - size + 1)) {
            if (ret < 0) {
                printf("convertInput: conversion wasn't successful.\n");
            } else {
                printf("convertInput: conversion wasn't successful. converted: %i octets.\n", temp);
            }
            xmlFree(out);
            out = NULL;
        } else {
            out = (unsigned char *) xmlRealloc(out, out_size + 1);
            out[out_size] = 0;  /* null-terminate the output */
        }
    } else {
        printf("convertInput: no memory\n");
    }

    return out;
}

Example 7:

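#include <stdio.h>
#include <string.h>
#include <iconv.h>

/* Convert in from encfrom to encto with iconv. The result lives in a static
 * 1024-byte buffer: it is overwritten by the next call, and input that does
 * not fit makes the function return NULL. */
char *convert(char *encfrom, char *encto, const char *in)
{
    static char bufout[1024];
    char *sin, *sout;
    size_t lenin, lenout, ret;
    iconv_t c_pt;

    if ((c_pt = iconv_open(encto, encfrom)) == (iconv_t) -1) {
        printf("iconv_open failed: %s ==> %s\n", encfrom, encto);
        return NULL;
    }

    iconv(c_pt, NULL, NULL, NULL, NULL);    /* reset the conversion state */

    lenin = strlen(in) + 1;                 /* include the terminating NUL */
    lenout = sizeof(bufout);
    sin = (char *) in;
    sout = bufout;

    ret = iconv(c_pt, &sin, &lenin, &sout, &lenout);
    iconv_close(c_pt);                      /* close on success and on failure */

    if (ret == (size_t) -1) {
        return NULL;
    }

    return bufout;
}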

Example 8:

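#include <stdio.h>
#include <string.h>
#include <iconv.h>

/* Same approach as Example 7; this is the function called in Example 5.
 * The result lives in a static 1024-byte buffer. */
char *d_convertcharset(char *cpencodefrom, char *cpencodeto, const char *cpinput)
{
    static char s_strbufout[1024];
    char *sin, *cpout;
    size_t iinputlen, ioutlen, ireturn;
    iconv_t c_pt;

    if ((c_pt = iconv_open(cpencodeto, cpencodefrom)) == (iconv_t) -1) {
        printf("iconv_open failed!\n");
        return NULL;
    }

    iconv(c_pt, NULL, NULL, NULL, NULL);    /* reset the conversion state */

    iinputlen = strlen(cpinput) + 1;        /* include the terminating NUL */
    ioutlen = sizeof(s_strbufout);
    sin = (char *) cpinput;
    cpout = s_strbufout;

    ireturn = iconv(c_pt, &sin, &iinputlen, &cpout, &ioutlen);
    iconv_close(c_pt);                      /* close on success and on failure */

    if (ireturn == (size_t) -1) {
        return NULL;
    }

    return s_strbufout;
}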

With the functions above you can conveniently store and manipulate Chinese characters in XML files.
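Examples 7 and 8 write into a fixed 1024-byte static buffer, which limits the input size and is not safe to call from several threads. One possible variation, shown here as a sketch rather than a drop-in replacement, allocates the output and grows it when iconv reports that the buffer is full. The name convert_alloc is hypothetical, and the single terminating NUL suits byte-oriented target encodings such as UTF-8 or GBK.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated `in` from encoding `from` to
 * encoding `to`. Returns a malloc'ed NUL-terminated string, or NULL on
 * failure. Release the result with free(). */
char *convert_alloc(const char *from, const char *to, const char *in)
{
    assert(from != NULL && to != NULL && in != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    char *src = (char *)in;
    size_t src_left = strlen(in);
    size_t cap = src_left * 2 + 16;     /* initial guess, grown on demand */
    size_t used = 0;
    char *out = malloc(cap);

    while (out != NULL) {
        char *dst = out + used;
        size_t dst_left = cap - used - 1;   /* keep room for the NUL */
        size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
        used = (size_t)(dst - out);

        if (rc != (size_t)-1)
            break;                      /* all input was converted */

        if (errno != E2BIG) {           /* invalid or truncated input */
            free(out);
            out = NULL;
            break;
        }

        cap *= 2;                       /* output buffer was too small */
        char *bigger = realloc(out, cap);
        if (bigger == NULL)
            free(out);
        out = bigger;
    }

    if (out != NULL)
        out[used] = '\0';

    iconv_close(cd);
    return out;
}
```

A call such as convert_alloc("GBK", "UTF-8", text) would then return a heap string of whatever length the conversion needs, which the caller releases with free().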
