Four Ways to Parse XML in PHP

Source: Internet
Author: User

XML processing comes up often in development, and PHP offers rich support for it. This article briefly describes four parsing techniques: XML Parser, SimpleXML, XMLReader, and DOMDocument.

1. XML Expat Parser:

The XML Parser extension is built on the expat XML parser. Expat is an event-based parser that treats an XML document as a series of events; when an event occurs, it calls a function you specify to handle it. Expat is a non-validating parser and ignores any DTD linked to the document. However, if the document is not well-formed, parsing stops with an error message. Because it is event-based and does no validation, expat is fast and well suited to Web applications.

The advantage of XML Parser is good performance: it does not load the entire XML document into memory before processing, but processes the data while parsing it. For the same reason, it is a poor fit when you need to adjust the XML structure dynamically or perform complex operations that depend on the surrounding XML context. If you only need to parse a well-structured XML document, it does the job well. Note that XML Parser supports only three encodings: US-ASCII, ISO-8859-1, and UTF-8. If your XML data uses another encoding, convert it to one of these three first.
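Because only those three encodings are accepted, data in any other encoding has to be re-encoded before parsing. The following is a minimal sketch, assuming PHP 7 or later with the mbstring extension enabled; the helper name xmlToUtf8 and the GBK sample document are made up for illustration.

```php
<?php
// Hypothetical helper (not part of PHP): re-encode an XML string to UTF-8
// and rewrite the encoding named in the XML declaration to match.
function xmlToUtf8(string $xml, string $fromEncoding): string
{
    $utf8 = mb_convert_encoding($xml, 'UTF-8', $fromEncoding);
    return preg_replace(
        '/(<\?xml[^>]*encoding=)(["\'])[^"\']*\2/i',
        '${1}${2}UTF-8${2}',
        $utf8,
        1
    );
}

// Build a GBK-encoded sample (this source file is assumed to be saved as UTF-8).
$gbk = mb_convert_encoding(
    '<?xml version="1.0" encoding="GBK"?><note>你好</note>',
    'GBK',
    'UTF-8'
);

// Convert first, then hand the UTF-8 string to the parser.
$converted = xmlToUtf8($gbk, 'GBK');
$p = xml_parser_create();
xml_parse_into_struct($p, $converted, $vals, $index);
echo $vals[0]['value'], "\n";
```

Rewriting the declaration matters as much as converting the bytes: otherwise the document would still claim to be GBK while actually containing UTF-8.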
XML Parser is generally used in one of two ways (in practice, two functions): xml_parse_into_struct and xml_set_element_handler.


xml_parse_into_struct parses the XML data into two arrays:
Index array: maps each tag name to the positions of its entries in the value array
Value array: contains the data from the parsed XML

These two arrays are awkward to describe in words, so here is an example (from the official PHP documentation):

$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);

Index array
Array
(
    [PARA] => Array
        (
            [0] => 0
            [1] => 2
        )

    [NOTE] => Array
        (
            [0] => 1
        )

)

Vals array
Array
(
    [0] => Array
        (
            [tag] => PARA
            [type] => open
            [level] => 1
        )

    [1] => Array
        (
            [tag] => NOTE
            [type] => complete
            [level] => 2
            [value] => simple note
        )

    [2] => Array
        (
            [tag] => PARA
            [type] => close
            [level] => 1
        )

)

In the index array, each key is a tag name, and the corresponding value is an array of all the positions where that tag appears in the value array. These positions are then used to look up the tag's values.
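The lookup described here can be sketched in a few lines. This assumes PHP 7 or later; the helper name valuesForTag is invented for this illustration and is not part of the extension.

```php
<?php
$simple = '<para><note>simple note</note></para>';
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);

// Hypothetical helper: collect every value recorded for one tag name.
// Tag names are upper-cased because case folding is on by default.
function valuesForTag(array $vals, array $index, string $tag): array
{
    $found = array();
    $positions = isset($index[$tag]) ? $index[$tag] : array();
    foreach ($positions as $pos) {
        // "open" and "close" entries have no value key, so skip them.
        if (isset($vals[$pos]['value'])) {
            $found[] = $vals[$pos]['value'];
        }
    }
    return $found;
}

$notes = valuesForTag($vals, $index, 'NOTE');
$paras = valuesForTag($vals, $index, 'PARA');
print_r($notes);
```

Here $notes holds the single string "simple note", while $paras is empty because the PARA entries are only open and close markers.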

If the records in the XML do not all share the same structure, be careful when writing the code, or you may get wrong results. For example:

$xml = '<infos>'
     . '<para><note>note1</note><extra>extra1</extra></para>'
     . '<para><note>note2</note></para>'
     . '<para><note>note3</note><extra>extra3</extra></para>'
     . '</infos>';
$p = xml_parser_create();
xml_parse_into_struct($p, $xml, $values, $tags);
xml_parser_free($p);
$result = array();
// The traversal below hides a bug: it pairs the i-th NOTE with the i-th EXTRA,
// which is only correct if every <para> contains both elements.
for ($i = 0; $i < 3; $i++) {
    $result[$i] = array();
    $result[$i]["note"] = $values[$tags["NOTE"][$i]]["value"];
    $result[$i]["extra"] = $values[$tags["EXTRA"][$i]]["value"];
}
print_r($result);

Traversing this way looks simple, but it silently produces a wrong result: extra3 ends up in the second para, and the third iteration reads an EXTRA position that does not exist. A more rigorous traversal is needed:

$result = array();
$paraTagIndexes = $tags['PARA'];
$paraCount = count($paraTagIndexes);
// Each <para> contributes two entries: the position of its open tag and of its close tag.
for ($i = 0; $i < $paraCount; $i += 2) {
    $para = array();
    // Traverse all values between the para tag pair
    for ($j = $paraTagIndexes[$i]; $j < $paraTagIndexes[$i + 1]; $j++) {
        // "open" entries carry no value key, so guard the lookup
        $value = isset($values[$j]['value']) ? $values[$j]['value'] : '';
        if (empty($value)) continue;
        $tagname = strtolower($values[$j]['tag']);
        if (in_array($tagname, array('note', 'extra'))) {
            $para[$tagname] = $value;
        }
    }
    $result[] = $para;
}

In fact, I rarely use xml_parse_into_struct, so this so-called "rigorous" code may well still have bugs in other cases. - -|

xml_set_element_handler sets the callback functions that the parser calls at the start and at the end of each element. A companion function, xml_set_character_data_handler, sets the callback for character data. Code written this way is clearer and easier to maintain.


$xml = <<<XML
<infos>
<para><note>note1</note><extra>extra1</extra></para>
<para><note>note2</note></para>
<para><note>note3</note><extra>extra3</extra></para>
</infos>
XML;

$result = array();
$index = -1; // incremented to 0 at the first <para>
$currData = '';

// Remember the most recent character data
function charactor($parser, $data) {
    global $currData;
    $currData = $data;
}

// Start a new result entry whenever a <para> opens
function startElement($parser, $name, $attribs) {
    global $result, $index;
    $name = strtolower($name);
    if ($name == 'para') {
        $index++;
        $result[$index] = array();
    }
}

// Store the collected text when a <note> or <extra> closes
function endElement($parser, $name) {
    global $result, $index, $currData;
    $name = strtolower($name);
    if ($name == 'note' || $name == 'extra') {
        $result[$index][$name] = $currData;
    }
}

$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "charactor");
xml_set_element_handler($xml_parser, "startElement", "endElement");
// The third argument marks this as the final chunk of data
if (!xml_parse($xml_parser, $xml, true)) {
    echo "Error when parse xml: ";
    echo xml_error_string(xml_get_error_code($xml_parser));
}
xml_parser_free($xml_parser);
print_r($result);
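The same handler approach can also be written without global variables. The sketch below assumes PHP 7 or later and passes closures as the handlers; the function name parseParas is invented for illustration. It also appends character data to a buffer rather than overwriting it, since the character data handler can be called more than once for a single run of text.

```php
<?php
// Hypothetical helper: collect <note>/<extra> text for each <para>.
function parseParas(string $xml): array
{
    $result = array();
    $current = null; // the <para> being filled, or null outside one
    $buffer = '';    // character data collected for the current element

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // Element start: reset the text buffer, open a new record on <para>.
        function ($parser, $name, $attribs) use (&$current, &$buffer) {
            $buffer = '';
            if ($name === 'PARA') {
                $current = array();
            }
        },
        // Element end: store collected text, or close the record on </para>.
        function ($parser, $name) use (&$result, &$current, &$buffer) {
            if ($current !== null && ($name === 'NOTE' || $name === 'EXTRA')) {
                $current[strtolower($name)] = $buffer;
            } elseif ($name === 'PARA') {
                $result[] = $current;
                $current = null;
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            $buffer .= $data; // append: text may arrive in several pieces
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        $message = xml_error_string(xml_get_error_code($parser));
        throw new RuntimeException("XML parse error: $message");
    }
    return $result;
}

$sample = '<infos><para><note>note1</note><extra>extra1</extra></para>'
        . '<para><note>note2</note></para>'
        . '<para><note>note3</note><extra>extra3</extra></para></infos>';
$paras = parseParas($sample);
print_r($paras);
```

Because each record is built between the start and end of its own para element, a missing extra element simply leaves that key out instead of shifting values between records.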

As can be seen, the handler approach takes more lines of code but is clearer and more readable. On the other hand, it performs slightly slower than the first approach and is less flexible. XML Parser has been available since PHP 4, so it suits systems running older versions. In a PHP 5 environment, give priority to the methods below.

2. SimpleXML

SimpleXML is an easy-to-use XML toolset available since PHP 5. It converts XML into an object that is easy to work with, and it can also build XML data for output. XML that uses namespaces takes extra work, because namespaced elements must be reached through children() or attributes() with a namespace argument, and the input must be well-formed. It provides three functions: simplexml_import_dom, simplexml_load_file, and simplexml_load_string; the names make their purposes clear. All three return a SimpleXMLElement object, and data is read or added through that object.

$string = <<<XML
<?xml version='1.0'?>
<document>
  <cmd>login</cmd>
  <login>imdonkey</login>
</document>
XML;

$xml = simplexml_load_string($string);
print_r($xml);

$login = $xml->login; // what is returned here is still a SimpleXMLElement object
print_r($login);

$login = (string) $xml->login; // cast to string first when comparing data
print_r($login);
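SimpleXML can also go the other way and build XML for output. A minimal sketch, reusing the element names from the example above; the attr attribute is added purely for illustration.

```php
<?php
// Start from an empty root element and add children to it.
$doc = new SimpleXMLElement('<document/>');
$doc->addChild('cmd', 'login');

// addChild() returns the new element, so attributes can be set on it.
$login = $doc->addChild('login', 'imdonkey');
$login->addAttribute('attr', 'default');

// asXML() with no argument returns the document as a string.
$output = $doc->asXML();
echo $output;
```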

The advantage of SimpleXML is ease of development. The downside is that it loads the entire XML document into memory before processing, so it may struggle with very large documents. If you are reading small files and the XML makes little or no use of namespaces, SimpleXML is a good choice.
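Namespaced elements are not visible through plain property access, but SimpleXMLElement::children() accepts a namespace URI and returns them. The sketch below uses a made-up feed document; the Dublin Core namespace URI is only an example.

```php
<?php
$source = <<<XML
<?xml version="1.0"?>
<feed xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item>
    <title>First</title>
    <dc:creator>imdonkey</dc:creator>
  </item>
</feed>
XML;

$feed = simplexml_load_string($source);
$item = $feed->item;

// Elements without a namespace are reachable as ordinary properties.
$title = (string) $item->title;

// Plain property access does not see the namespaced element...
$plain = (string) $item->creator;

// ...but children() with the namespace URI does.
$dc = $item->children('http://purl.org/dc/elements/1.1/');
$creator = (string) $dc->creator;

echo $title, ' by ', $creator, "\n";
```

The extra children() call for every namespace is the inconvenience that makes SimpleXML less attractive for heavily namespaced documents.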

3. XMLReader

XMLReader is also an extension introduced in PHP 5 (bundled by default since 5.1). It moves through the document stream like a cursor, stopping at each node, which makes it flexible to work with. It provides fast, non-cached, forward-only access to its input: it can read a stream or a document, lets you extract the data you need, and lets you skip records that are irrelevant to the application.
The following example uses the Google Weather API to demonstrate XMLReader. It covers only a few functions; see the official documentation for the rest. (Google has since shut this API down, so the URL serves only as an illustration.)

$xml_uri = 'http://www.google.com/ig/api?weather=beijing&hl=zh-cn';
$current = array();
$forecast = array();
$reader = new XMLReader();
$reader->open($xml_uri, 'GBK');
while ($reader->read()) {
    // get current data
    if ($reader->name == "current_conditions" && $reader->nodeType == XMLReader::ELEMENT) {
        // read child nodes until the closing current_conditions tag
        while ($reader->read() && $reader->name != "current_conditions") {
            $name = $reader->name;
            $value = $reader->getAttribute('data');
            $current[$name] = $value;
        }
    }
    // get forecast data
    if ($reader->name == "forecast_conditions" && $reader->nodeType == XMLReader::ELEMENT) {
        $sub_forecast = array();
        // read child nodes until the closing forecast_conditions tag
        while ($reader->read() && $reader->name != "forecast_conditions") {
            $name = $reader->name;
            $value = $reader->getAttribute('data');
            $sub_forecast[$name] = $value;
        }
        $forecast[] = $sub_forecast;
    }
}
$reader->close();

XMLReader and XML Parser are similar in that both process the document as they read it. The major difference is that the SAX model used by XML Parser is a "push" model: the parser pushes events to the application, notifying it each time a new node is read. An application using XMLReader instead pulls nodes from the reader whenever it wants, which gives it more control.
Because XMLReader is based on libxml, check the documentation to confirm that a given function is available with your libxml version.
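The pull model can be tried without any network access by reading from a string. The sketch below uses an invented document shaped loosely like the weather data above; it skips everything before the first forecast_conditions element, then hands each record to SimpleXML and jumps to the next one with next(). Combining the two extensions this way is one common pattern, not the only option.

```php
<?php
$source = <<<XML
<?xml version="1.0"?>
<weather>
  <current_conditions>
    <condition data="Clear"/>
    <temp_c data="21"/>
  </current_conditions>
  <forecast_conditions>
    <day_of_week data="Mon"/>
    <high data="25"/>
  </forecast_conditions>
  <forecast_conditions>
    <day_of_week data="Tue"/>
    <high data="27"/>
  </forecast_conditions>
</weather>
XML;

$reader = new XMLReader();
$reader->XML($source);

// Pull nodes until the cursor sits on the first record of interest.
while ($reader->read() && $reader->name !== 'forecast_conditions') {
    // nothing to do: these nodes are irrelevant to us
}

$forecast = array();
while ($reader->name === 'forecast_conditions') {
    // Only this one record is expanded in memory, not the whole document.
    $record = new SimpleXMLElement($reader->readOuterXml());
    $day = array();
    foreach ($record->children() as $child) {
        $day[$child->getName()] = (string) $child['data'];
    }
    $forecast[] = $day;

    // Skip this record's subtree and move to the next element of the same name.
    $reader->next('forecast_conditions');
}
$reader->close();

print_r($forecast);
```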

4. DOMDocument

DOMDocument is part of the DOM extension, also introduced in PHP 5. It can be used to build or parse HTML/XML. It works with UTF-8 internally, so strings passed to its methods should be UTF-8 encoded.

$xmlstring = <<<XML
<?xml version='1.0'?>
<document>
  <cmd attr='default'>login</cmd>
  <login>imdonkey</login>
</document>
XML;

$dom = new DOMDocument();
$dom->loadXML($xmlstring);
print_r(getArray($dom->documentElement));

// Recursively convert a DOM node into a nested array
function getArray($node) {
    $array = array();
    // attributes become key/value pairs
    if ($node->hasAttributes()) {
        foreach ($node->attributes as $attr) {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }
    if ($node->hasChildNodes()) {
        if ($node->childNodes->length == 1) {
            $array[$node->firstChild->nodeName] = getArray($node->firstChild);
        } else {
            foreach ($node->childNodes as $childNode) {
                // skip whitespace text nodes between elements
                if ($childNode->nodeType != XML_TEXT_NODE) {
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    } else {
        // leaf node: return its text value
        return $node->nodeValue;
    }
    return $array;
}
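DOMDocument can build documents as well as parse them. The sketch below constructs the same small document programmatically and then queries it with DOMXPath; the element and attribute names are taken from the example above.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true; // indent the serialized output

// Create the root element and attach it to the document.
$root = $dom->appendChild($dom->createElement('document'));

// <cmd attr="default">login</cmd>
$cmd = $dom->createElement('cmd', 'login');
$cmd->setAttribute('attr', 'default');
$root->appendChild($cmd);

// <login>imdonkey</login>
$root->appendChild($dom->createElement('login', 'imdonkey'));

$serialized = $dom->saveXML();
echo $serialized;

// Query the in-memory tree with XPath.
$xpath = new DOMXPath($dom);
$attr = $xpath->evaluate('string(/document/cmd/@attr)');
$loginCount = $xpath->query('/document/login')->length;
```

Because the whole tree lives in memory, it can be modified and queried freely, which is the flexibility the streaming parsers give up.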

The function names look much like JavaScript's, since both follow the W3C DOM API. DOMDocument also loads the whole XML document into memory at once, so memory usage needs attention here too. PHP offers many ways to process XML, so it is worth taking the time to understand them and choose the one that fits the project's requirements and system environment and is easy to maintain.



