Parsing XML files with the libxml library functions


Libxml (i)

Summary
Libxml is a freely licensed C library for processing XML that is portable across many platforms. This guide provides examples of its basic functions.
Introduction
Libxml is a C library that implements functions for reading, creating, and manipulating XML data. This guide provides example code and explains its basic functionality. Libxml and more information about it are available on the project's home page, which includes the complete API documentation. This guide is not a substitute for that documentation; it only illustrates the functions needed to use the library for basic operations.
This guide is based on a simple XML format that I use for articles I write; it contains metadata and the body of the article.

The example code in this guide demonstrates how to:
• Parse a document
• Get the text of a specified element
• Add an element and its content
• Add an attribute
• Get the value of an attribute
The complete code for the examples is included in the appendices.

Data types
Libxml declares a number of data types that we will encounter repeatedly. They hide implementation details so that you do not have to deal with them unless you have a specific need.
xmlChar is a basic replacement for char: a byte in a UTF-8 encoded string. If your data uses another encoding, it must be converted to UTF-8 before it is passed to libxml functions. More information on encodings is available on the libxml Encoding Support web page.
xmlDoc is a structure containing the tree created by a parsed document; xmlDocPtr is a pointer to that structure.
xmlNode is a structure containing a single node; xmlNodePtr is a pointer to that structure and is used to traverse the document tree.

Parsing a document
Parsing a document requires only the file name and a single function call, plus error checking. Complete code: Appendix C, keyword routine code.

①xmlDocPtr doc;
②xmlNodePtr cur;
③doc = xmlParseFile(docname);
④if (doc == NULL) {
    fprintf(stderr, "Document not parsed successfully.\n");
    return;
}
⑤cur = xmlDocGetRootElement(doc);
⑥if (cur == NULL) {
    fprintf(stderr, "empty document\n");
    xmlFreeDoc(doc);
    return;
}
⑦if (xmlStrcmp(cur->name, (const xmlChar *) "story")) {
    fprintf(stderr, "Document of the wrong type, root node != story\n");
    xmlFreeDoc(doc);
    return;
}
① Declares the pointer that will point to the parsed document.
② Declares the node pointer (you need it to move between nodes).
④ Checks whether the document was parsed successfully. If it was not, libxml will already have reported an error, and we stop.

Comments
A common error at this point is improper handling of encoding. The XML standard requires documents stored in an encoding other than UTF-8 or UTF-16 to declare the encoding they use. If the declaration is present, libxml automatically converts the data to UTF-8 for you. More information about XML encoding is included in the XML standard.
⑤ Retrieves the document's root element.
⑥ Checks that the document actually contains something.
⑦ In this example, we need to confirm that the document is of the right type. "story" is the root element of the documents used in this guide.

Getting element content

To get the content of an element, you first find the element in the document tree. In this example we look for the "keyword" elements contained in the "storyinfo" element. Finding the elements we are interested in means walking the tree step by step. We assume you already have an xmlDocPtr named doc and an xmlNodePtr named cur.
①cur = cur->xmlChildrenNode;
②while (cur != NULL) {
    if ((!xmlStrcmp(cur->name, (const xmlChar *) "storyinfo"))) {
        parseStory(doc, cur);
    }
    cur = cur->next;
}

① Gets the first child node of cur. At this point, cur points to the document root, the "story" element.
② This loop iterates through the child elements of "story", looking for "storyinfo". That is the element containing the "keyword" elements we are looking for. It uses the libxml string comparison function xmlStrcmp. If there is a match, it calls the function parseStory.

void
parseStory(xmlDocPtr doc, xmlNodePtr cur) {
    xmlChar *key;
①  cur = cur->xmlChildrenNode;
②  while (cur != NULL) {
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "keyword"))) {
③          key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
            printf("keyword: %s\n", key);
            xmlFree(key);
        }
        cur = cur->next;
    }
    return;
}
① Again we get the first child node.
② As in the loop above, we iterate through the nodes, looking for the elements named "keyword" that we are interested in.
③ When we find a "keyword" element, we want to print its content. The text is stored in a child node of the element, so we pass cur->xmlChildrenNode to xmlNodeListGetString, which also takes the document pointer as an argument. In this case, we just print the result.
Comments
Because xmlNodeListGetString allocates memory for the string it returns, you must use xmlFree to release it.

Using XPath to get element content
In addition to walking the document tree step by step to find an element, libxml2 supports using an XPath expression to retrieve a set of nodes that match specified criteria. The complete XPath API is covered in the library's API documentation. XPath lets you search a document for nodes that match a given path expression. In the following example, we search for all the "keyword" elements in the document.

Comments
A full discussion of XPath is beyond the scope of this guide. For more information, consult the XPath specification.
The complete code for this example is in Appendix D, XPath routine code.
Using XPath requires setting up an xmlXPathContext and then supplying the XPath expression and the context to the xmlXPathEvalExpression function. The function returns an xmlXPathObjectPtr, which includes the set of nodes satisfying the XPath expression.

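xmlXPathObjectPtr
getnodeset(xmlDocPtr doc, xmlChar *xpath) {
①  xmlXPathContextPtr context;
    xmlXPathObjectPtr result;
②  context = xmlXPathNewContext(doc);
    if (context == NULL) {
        printf("Error in xmlXPathNewContext\n");
        return NULL;
    }
③  result = xmlXPathEvalExpression(xpath, context);
    xmlXPathFreeContext(context); /* not needed once the expression is evaluated */
    if (result == NULL) {
        printf("Error in xmlXPathEvalExpression\n");
        return NULL;
    }
④  if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
        xmlXPathFreeObject(result); /* do not leak the empty result */
        printf("No result\n");
        return NULL;
    }
    return result;
}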
① First we declare our variables.
② Initialize the context variable.
③ Apply the XPath expression.
④ Check the result.
The xmlXPathObjectPtr returned by the function contains a set of nodes and other information needed to iterate over the set and act on the results. In this example our function returns the xmlXPathObjectPtr, and we use it to print the contents of the keyword nodes in our document. The node set object includes the number of elements in the set (nodeNr) and an array of nodes (nodeTab).

①for (i = 0; i < nodeset->nodeNr; i++) {
②  keyword = xmlNodeListGetString(doc, nodeset->nodeTab[i]->xmlChildrenNode, 1);
    printf("keyword: %s\n", keyword);
    xmlFree(keyword);
}
① The value of nodeset->nodeNr holds the number of elements in the node set. Here we use it to iterate through the array.
② Here we print the contents of each of the nodes returned.
Comments
Note that we print the child node of each node that is returned, because the content of the keyword element is a child text node.

Writing element content
Writing element content uses many of the same steps as above: parsing the document and walking the tree. We first parse the document and then traverse the tree to find where we want to insert the element. In this example, we look again for the "storyinfo" element and insert a keyword. Then we write the file to disk. Complete code: Appendix E, add keyword routine code.
The main difference in this example is in parseStory:
void
parseStory(xmlDocPtr doc, xmlNodePtr cur, char *keyword) {
①  xmlNewTextChild(cur, NULL, (const xmlChar *) "keyword", (const xmlChar *) keyword);
    return;
}
① The xmlNewTextChild function adds a new child element at the current node pointer's location in the tree.
Once the node has been added, we write the document to a file. If you want the element to have a namespace, you can specify it here as well; in our case, the namespace is NULL.

xmlSaveFormatFile(docname, doc, 1);

The first parameter is the name of the file to be written. You will notice it is the same as the file we just read; in this example, we simply overwrite the original file. The second parameter is a pointer to the xmlDoc structure. Setting the third parameter to 1 ensures that the output is indented.
Libxml (ii)

Writing attributes
Writing an attribute is similar to writing text to a new element. In this example, we add a "reference" element with a uri attribute to our document. Complete code: Appendix F, add attribute routine code. A reference is a child of the story element, so finding the place to put the new element and its attribute is simple. As soon as the error checking in parseDoc is done, we are in the right spot to add our element. But before we do that, we need to declare a data type that we have not seen before:

xmlAttrPtr newattr;

We also need an extra xmlNodePtr:

xmlNodePtr newnode;

The rest of parseDoc is the same as before, checking that the root element is story. If it is, we know we are at the right spot to add our element.
①newnode = xmlNewTextChild(cur, NULL, (const xmlChar *) "reference", NULL);
②newattr = xmlNewProp(newnode, (const xmlChar *) "uri", (const xmlChar *) uri);

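① First we add a new node at the location of the current node pointer, cur, using the xmlNewTextChild function.
② Then we attach the uri attribute to the new node using xmlNewProp.
Once the node has been added, the file is written to disk just as in the previous example, in which we added an element with text content.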

Getting attributes
Getting the value of an attribute is similar to the earlier example in which we got a node's text content. In this example, we retrieve the value of the uri attribute that we added in the previous section. Complete code: Appendix G, get attribute value routine code.

The initial steps of this example are similar to the previous ones: parse the document, find the element you are interested in, and then enter a function that carries out the specific task. In this example, we call getReference.

void
getReference(xmlDocPtr doc, xmlNodePtr cur) {
    xmlChar *uri;
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "reference"))) {
①          uri = xmlGetProp(cur, (const xmlChar *) "uri");
            printf("uri: %s\n", uri);
            xmlFree(uri);
        }
        cur = cur->next;
    }
    return;
}

① The key function is xmlGetProp, which returns an xmlChar string containing the attribute's value. In this case, we just print it.

Comments
If you are using a DTD that declares a fixed or default value for the attribute, this function will retrieve it as well.

Encoding conversion
Data encoding compatibility problems are among the most common difficulties encountered by programmers new to XML in general and to libxml in particular. Thinking through the design of your application in light of this issue will help you avoid difficulties later. Internally, libxml stores and manipulates data in UTF-8.
Data that your program holds in other encodings, such as the commonly used ISO-8859-1, must be converted to UTF-8 before it is passed to libxml functions. If you want your program's output in an encoding other than UTF-8, you must also convert it.
Libxml uses iconv, if it is available, to convert data. Without iconv, only UTF-8, UTF-16, and ISO-8859-1 can be used as external formats. With iconv, any format can be used provided iconv can convert it to and from UTF-8. iconv currently supports about 150 different character formats. The number of formats actually supported varies between implementations, but nearly every iconv implementation supports the formats in common use.

Warning
A common mistake is to use different encodings for the internal data in different parts of a program. The most common case is an application that assumes ISO-8859-1 as its internal data format, combined with libxml, which assumes UTF-8. The result is an application that treats its internal data differently depending on which section of code is executing. One section or another may then misinterpret the data and corrupt it.
This example constructs a simple document, then adds content provided on the command line to the document's root element and writes the result to standard output in the proper encoding. In this example, we use ISO-8859-1. The content given on the command line is converted from ISO-8859-1 to UTF-8. Complete code: Appendix H, encoding conversion routine code.

The conversion function in the example uses libxml's xmlFindCharEncodingHandler function.

①xmlCharEncodingHandlerPtr handler;
②size = (int) strlen(in) + 1;
out_size = size * 2 - 1;
out = malloc((size_t) out_size);
...
③handler = xmlFindCharEncodingHandler(encoding);
...
④handler->input(out, &out_size, in, &temp);
...
⑤xmlSaveFormatFileEnc("-", doc, encoding, 1);

① Declares an xmlCharEncodingHandler function pointer.
② The xmlCharEncodingHandler function needs to be given the sizes of the input and output strings, which are calculated here.
③ xmlFindCharEncodingHandler takes the data's original encoding as its argument, searches libxml's set of conversion handlers, and returns a pointer to the matching handler, or NULL if none is found.
④ The conversion function identified by handler requires as its arguments pointers to the input and output strings, along with the length of each. The lengths must be determined separately by the application.
⑤ To output in a specified encoding rather than UTF-8, we use xmlSaveFormatFileEnc, specifying the encoding.
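
The buffer arithmetic in step ② (an output of size*2-1 bytes for size input bytes, terminator included) follows from the fact that each ISO-8859-1 byte becomes at most two UTF-8 bytes. The following is a hand-written sketch of that conversion in plain C. It is a stand-in to illustrate the sizing, not libxml's converter, and the name latin1_to_utf8 is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Converts a NUL-terminated ISO-8859-1 string to a newly allocated UTF-8
   string. Returns NULL if memory cannot be allocated; free with free(). */
char *latin1_to_utf8(const char *in) {
    size_t len = strlen(in);
    /* Worst case: every input byte becomes two output bytes, plus the NUL. */
    unsigned char *out = malloc(len * 2 + 1);
    size_t i, o = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) in[i];
        if (c < 0x80) {
            out[o++] = c;                 /* ASCII is unchanged */
        } else {
            out[o++] = 0xC0 | (c >> 6);   /* leading byte: 110000xx */
            out[o++] = 0x80 | (c & 0x3F); /* continuation byte: 10xxxxxx */
        }
    }
    out[o] = '\0';
    return (char *) out;
}
```

With strlen(in) equal to size-1, the allocation of len*2+1 bytes is the same quantity as the size*2-1 used in the listing above.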

A. Compiling
Libxml includes a script, xml2-config, that can be used to generate the flags needed to compile and link programs against the library.
To get the preprocessor and compiler flags, use xml2-config --cflags; to get the linker flags, use xml2-config --libs. Run xml2-config --help to list the other available options.

B. Sample document
<?xml version="1.0"?>
<story>
  <storyinfo>
    <author>John Fleck</author>
    <datewritten>June 2, 2002</datewritten>
    <keyword>example keyword</keyword>
  </storyinfo>
  <body>
    <para>This is the body text.</para>
  </body>
</story>
C. Keyword routine code
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

/* Print the text of every <keyword> child of the given node. */
void
parseStory(xmlDocPtr doc, xmlNodePtr cur) {
    xmlChar *key;
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "keyword"))) {
            key = xmlNodeListGetString(doc, cur->xmlChildrenNode, 1);
            printf("keyword: %s\n", key);
            xmlFree(key);
        }
        cur = cur->next;
    }
    return;
}

/* Parse the file, check the root element, and process each <storyinfo>. */
static void
parseDoc(char *docname) {
    xmlDocPtr doc;
    xmlNodePtr cur;
    doc = xmlParseFile(docname);
    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return;
    }
    cur = xmlDocGetRootElement(doc);
    if (cur == NULL) {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return;
    }
    if (xmlStrcmp(cur->name, (const xmlChar *) "story")) {
        fprintf(stderr, "Document of the wrong type, root node != story\n");
        xmlFreeDoc(doc);
        return;
    }
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "storyinfo"))) {
            parseStory(doc, cur);
        }
        cur = cur->next;
    }
    xmlFreeDoc(doc);
    return;
}

int
main(int argc, char **argv) {
    char *docname;
    if (argc <= 1) {
        printf("Usage: %s docname\n", argv[0]);
        return (0);
    }
    docname = argv[1];
    parseDoc(docname);
    return (1);
}
Libxml (iii)
D. XPath routine code
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/xpath.h>

xmlDocPtr
getdoc(char *docname) {
    xmlDocPtr doc;
    doc = xmlParseFile(docname);
    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return NULL;
    }
    return doc;
}

xmlXPathObjectPtr
getnodeset(xmlDocPtr doc, xmlChar *xpath) {
    xmlXPathContextPtr context;
    xmlXPathObjectPtr result;
    context = xmlXPathNewContext(doc);
    if (context == NULL) {
        printf("Error in xmlXPathNewContext\n");
        return NULL;
    }
    result = xmlXPathEvalExpression(xpath, context);
    xmlXPathFreeContext(context); /* not needed once the expression is evaluated */
    if (result == NULL) {
        printf("Error in xmlXPathEvalExpression\n");
        return NULL;
    }
    if (xmlXPathNodeSetIsEmpty(result->nodesetval)) {
        xmlXPathFreeObject(result); /* do not leak the empty result */
        printf("No result\n");
        return NULL;
    }
    return result;
}

int
main(int argc, char **argv) {
    char *docname;
    xmlDocPtr doc;
    xmlChar *xpath = (xmlChar *) "//keyword";
    xmlNodeSetPtr nodeset;
    xmlXPathObjectPtr result;
    int i;
    xmlChar *keyword;
    if (argc <= 1) {
        printf("Usage: %s docname\n", argv[0]);
        return (0);
    }
    docname = argv[1];
    doc = getdoc(docname);
    if (doc == NULL) { /* nothing to query if parsing failed */
        return (1);
    }
    result = getnodeset(doc, xpath);
    if (result) {
        nodeset = result->nodesetval;
        for (i = 0; i < nodeset->nodeNr; i++) {
            keyword = xmlNodeListGetString(doc, nodeset->nodeTab[i]->xmlChildrenNode, 1);
            printf("keyword: %s\n", keyword);
            xmlFree(keyword);
        }
        xmlXPathFreeObject(result);
    }
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return (1);
}
E. Add keyword routine code
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

/* Append a <keyword> element with the given text to the current node. */
void
parseStory(xmlDocPtr doc, xmlNodePtr cur, char *keyword) {
    xmlNewTextChild(cur, NULL, (const xmlChar *) "keyword", (const xmlChar *) keyword);
    return;
}

xmlDocPtr
parseDoc(char *docname, char *keyword) {
    xmlDocPtr doc;
    xmlNodePtr cur;
    doc = xmlParseFile(docname);
    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return (NULL);
    }
    cur = xmlDocGetRootElement(doc);
    if (cur == NULL) {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return (NULL);
    }
    if (xmlStrcmp(cur->name, (const xmlChar *) "story")) {
        fprintf(stderr, "Document of the wrong type, root node != story\n");
        xmlFreeDoc(doc);
        return (NULL);
    }
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "storyinfo"))) {
            parseStory(doc, cur, keyword);
        }
        cur = cur->next;
    }
    return (doc);
}

int
main(int argc, char **argv) {
    char *docname;
    char *keyword;
    xmlDocPtr doc;
    if (argc <= 2) {
        printf("Usage: %s docname, keyword\n", argv[0]);
        return (0);
    }
    docname = argv[1];
    keyword = argv[2];
    doc = parseDoc(docname, keyword);
    if (doc != NULL) {
        xmlSaveFormatFile(docname, doc, 0);
        xmlFreeDoc(doc);
    }
    return (1);
}
F. Add attribute routine code
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

xmlDocPtr
parseDoc(char *docname, char *uri) {
    xmlDocPtr doc;
    xmlNodePtr cur;
    xmlNodePtr newnode;
    xmlAttrPtr newattr;
    doc = xmlParseFile(docname);
    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return (NULL);
    }
    cur = xmlDocGetRootElement(doc);
    if (cur == NULL) {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return (NULL);
    }
    if (xmlStrcmp(cur->name, (const xmlChar *) "story")) {
        fprintf(stderr, "Document of the wrong type, root node != story\n");
        xmlFreeDoc(doc);
        return (NULL);
    }
    /* Add an empty <reference> element, then attach the uri attribute to it. */
    newnode = xmlNewTextChild(cur, NULL, (const xmlChar *) "reference", NULL);
    newattr = xmlNewProp(newnode, (const xmlChar *) "uri", (const xmlChar *) uri);
    return (doc);
}

int
main(int argc, char **argv) {
    char *docname;
    char *uri;
    xmlDocPtr doc;
    if (argc <= 2) {
        printf("Usage: %s docname, uri\n", argv[0]);
        return (0);
    }
    docname = argv[1];
    uri = argv[2];
    doc = parseDoc(docname, uri);
    if (doc != NULL) {
        xmlSaveFormatFile(docname, doc, 1);
        xmlFreeDoc(doc);
    }
    return (1);
}
G. Get attribute value routine code
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <libxml/xmlmemory.h>
#include <libxml/parser.h>

/* Print the uri attribute of every <reference> child of the given node. */
void
getReference(xmlDocPtr doc, xmlNodePtr cur) {
    xmlChar *uri;
    cur = cur->xmlChildrenNode;
    while (cur != NULL) {
        if ((!xmlStrcmp(cur->name, (const xmlChar *) "reference"))) {
            uri = xmlGetProp(cur, (const xmlChar *) "uri");
            printf("uri: %s\n", uri);
            xmlFree(uri);
        }
        cur = cur->next;
    }
    return;
}

void
parseDoc(char *docname) {
    xmlDocPtr doc;
    xmlNodePtr cur;
    doc = xmlParseFile(docname);
    if (doc == NULL) {
        fprintf(stderr, "Document not parsed successfully.\n");
        return;
    }
    cur = xmlDocGetRootElement(doc);
    if (cur == NULL) {
        fprintf(stderr, "empty document\n");
        xmlFreeDoc(doc);
        return;
    }
    if (xmlStrcmp(cur->name, (const xmlChar *) "story")) {
        fprintf(stderr, "Document of the wrong type, root node != story\n");
        xmlFreeDoc(doc);
        return;
    }
    getReference(doc, cur);
    xmlFreeDoc(doc);
    return;
}

int
main(int argc, char **argv) {
    char *docname;
    if (argc <= 1) {
        printf("Usage: %s docname\n", argv[0]);
        return (0);
    }
    docname = argv[1];
    parseDoc(docname);
    return (1);
}
H. Encoding conversion routine code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>

unsigned char *
convert(unsigned char *in, char *encoding)
{
    unsigned char *out;
    int ret, size, out_size, temp;
    xmlCharEncodingHandlerPtr handler;
    size = (int) strlen((const char *) in) + 1;
    out_size = size * 2 - 1;
    out = malloc((size_t) out_size);
    if (out) {
        handler = xmlFindCharEncodingHandler(encoding);
        if (!handler) {
            free(out);
            out = NULL;
        }
    }
    if (out) {
        temp = size - 1;
        ret = handler->input(out, &out_size, in, &temp);
        /* A negative return value signals an error; temp must equal the input length. */
        if ((ret < 0) || (temp - size + 1)) {
            if (ret < 0) {
                printf("Conversion wasn't successful.\n");
            } else {
                printf("Conversion wasn't successful. Converted: %i octets.\n", temp);
            }
            free(out);
            out = NULL;
        } else {
            out = realloc(out, out_size + 1);
            out[out_size] = 0; /* null terminating out */
        }
    } else {
        printf("No mem\n");
    }
    return (out);
}

int
main(int argc, char **argv) {
    unsigned char *content, *out;
    xmlDocPtr doc;
    xmlNodePtr rootnode;
    char *encoding = "ISO-8859-1";
    if (argc <= 1) {
        printf("Usage: %s content\n", argv[0]);
        return (0);
    }
    content = (unsigned char *) argv[1];
    out = convert(content, encoding);
    doc = xmlNewDoc((const xmlChar *) "1.0");
    rootnode = xmlNewDocNode(doc, NULL, (const xmlChar *) "root", out);
    xmlDocSetRootElement(doc, rootnode);
    xmlSaveFormatFileEnc("-", doc, encoding, 1);
    return (1);
}
An alternative version of convert that uses libxml buffers:
#include <libxml/tree.h>
#include <libxml/encoding.h>

char *convert(char *instr, char *encoding)
{
    xmlCharEncodingHandlerPtr handler;
    xmlBufferPtr in, out;
    char *result = NULL;
    handler = xmlFindCharEncodingHandler(encoding);
    if (handler == NULL)
        return NULL;
    in = xmlBufferCreate();
    out = xmlBufferCreate();
    if (in != NULL && out != NULL) {
        xmlBufferWriteChar(in, instr);
        if (xmlCharEncInFunc(handler, out, in) >= 0) {
            /* Copy the converted text so both buffers can be freed; release with xmlFree(). */
            result = (char *) xmlStrdup(xmlBufferContent(out));
        }
    }
    if (in != NULL)
        xmlBufferFree(in);
    if (out != NULL)
        xmlBufferFree(out);
    xmlCharEncCloseFunc(handler);
    return result;
}

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
