Four ways to parse XML in PHP
XML processing comes up often in development, and PHP offers rich support for it. This article briefly describes four of the available parsing techniques: XML Parser, SimpleXML, XMLReader, and DOMDocument.
1. XML Expat Parser:
XML Parser is built on the Expat XML parser. Expat is an event-based parser that treats an XML document as a series of events; when an event occurs, it calls a specified function to handle it. Expat is a non-validating parser, so it ignores any DTD linked to the document. However, if the document is not well-formed, parsing ends with an error message. Because it is event-based and does no validation, Expat is fast and well suited to web applications.
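As a minimal sketch of the well-formedness behavior described above, the snippet below feeds one good and one broken document to the parser. The helper name xmlWellFormedError is hypothetical and only for illustration; the exact wording of the error message depends on the underlying library.

```php
<?php
// Hypothetical helper: returns null when $xml is well-formed,
// otherwise a short description of the first error.
function xmlWellFormedError(string $xml): ?string
{
    $parser = xml_parser_create();
    // Third argument true: this is the only (and final) piece of data.
    if (xml_parse($parser, $xml, true)) {
        return null;
    }
    return sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

$good = xmlWellFormedError('<para><note>ok</note></para>');
$bad  = xmlWellFormedError("<para>\n<note>broken</para>"); // <note> is never closed

var_dump($good);
var_dump($bad);
```

No DTD is consulted here; the parser only checks that tags are properly nested and closed.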
The advantage of XML Parser is performance: it does not load the entire XML document into memory before processing it, but parses as it reads. For the same reason, it is not suitable when you need to adjust the XML structure dynamically or perform complex operations that depend on the surrounding XML context. If you only need to parse and process a well-formed XML document, it does the job well. Note that XML Parser supports only three encodings: US-ASCII, ISO-8859-1, and UTF-8. If your XML data uses another encoding, convert it to one of these three first.
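To illustrate the "parse while reading" point, here is a rough sketch that feeds the parser small chunks instead of the whole document at once, using the third argument of xml_parse to mark the final chunk. The helper name countItemsInChunks, the chunk size, and the sample document are assumptions made for this example; in practice the chunks would come from fread on a large file.

```php
<?php
// Hypothetical helper: counts <item> start tags while feeding the
// parser one small chunk at a time.
function countItemsInChunks(string $xml, int $chunkSize = 16): int
{
    $count = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$count) {
            // Tag names arrive upper-cased: case folding is on by default.
            if ($name === 'ITEM') {
                $count++;
            }
        },
        function ($parser, $name) {
            // Nothing to do at end tags in this example.
        }
    );

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(
                'XML error: ' . xml_error_string(xml_get_error_code($parser))
            );
        }
    }
    return $count;
}

$itemCount = countItemsInChunks('<list><item>a</item><item>b</item><item>c</item></list>');
echo $itemCount, "\n";
```

Only one chunk is held by the calling code at a time, which is what keeps memory use low for large inputs.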
XML Parser is commonly used in two ways (in fact, through two functions): xml_parse_into_struct and xml_set_element_handler.
xml_parse_into_struct
This function parses the XML data into two arrays:
Index array: contains pointers to the positions of values in the value array
Value array: contains the data from the parsed XML
These two arrays are awkward to describe in words, so let's look at an example (from the official PHP documentation):
$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);
Output:
Index array
Array
(
    [PARA] => Array
        (
            [0] => 0
            [1] => 2
        )
    [NOTE] => Array
        (
            [0] => 1
        )
)
Vals array
Array
(
    [0] => Array
        (
            [tag] => PARA
            [type] => open
            [level] => 1
        )
    [1] => Array
        (
            [tag] => NOTE
            [type] => complete
            [level] => 2
            [value] => simple note
        )
    [2] => Array
        (
            [tag] => PARA
            [type] => close
            [level] => 1
        )
)
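In the index array, each key is a tag name, and the corresponding value is an array holding every position at which that tag appears in the value array. Use those positions to look up the tag's data in the value array. Tag names appear in upper case because case folding is enabled by default.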
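The lookup described above can be sketched as follows. The helper valuesForTag is a hypothetical name introduced for this illustration; it assumes the default case folding, so it upper-cases the tag name before consulting the index array.

```php
<?php
$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);

// Hypothetical helper: collect the text of every occurrence of $tag
// by following the positions stored in the index array.
function valuesForTag(array $vals, array $index, string $tag): array
{
    $found = [];
    foreach ($index[strtoupper($tag)] ?? [] as $position) {
        // "open" and "close" entries carry no 'value' key, so skip them.
        if (isset($vals[$position]['value'])) {
            $found[] = $vals[$position]['value'];
        }
    }
    return $found;
}

$notes = valuesForTag($vals, $index, 'note');
print_r($notes);
```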
If the records in the XML do not all share exactly the same format, be careful when writing the code, or you may get wrong results. Consider the following example:
$xml = '
<infos>
<para><note>note1</note><extra>extra1</extra></para>
<para><note>note2</note></para>
<para><note>note3</note><extra>extra3</extra></para>
</infos>
';
$p = xml_parser_create();
xml_parse_into_struct($p, $xml, $values, $tags);
xml_parser_free($p);
$result = array();
// The following traversal has a bug
for ($i = 0; $i < 3; $i++) {
    $result[$i] = array();
    $result[$i]["note"] = $values[$tags["NOTE"][$i]]["value"];
    $result[$i]["extra"] = $values[$tags["EXTRA"][$i]]["value"];
}
print_r($result);
Traversing the data this way keeps the code simple, but it is risky: at worst you silently get wrong results (extra3 ends up in the second para). A more rigorous traversal looks like this:
$result = array();
$paraTagIndexes = $tags['PARA'];
$paraCount = count($paraTagIndexes);
for ($i = 0; $i < $paraCount; $i += 2) {
    $para = array();
    // Traverse all values between a pair of para tags
    for ($j = $paraTagIndexes[$i]; $j < $paraTagIndexes[$i + 1]; $j++) {
        // "open" entries have no 'value' key, so test before reading it
        if (empty($values[$j]['value'])) continue;
        $value = $values[$j]['value'];
        $tagname = strtolower($values[$j]['tag']);
        if (in_array($tagname, array('note', 'extra'))) {
            $para[$tagname] = $value;
        }
    }
    $result[] = $para;
}
In fact, I rarely use the xml_parse_into_struct function, so I cannot guarantee that the so-called "rigorous" code above is free of bugs in other cases.
xml_set_element_handler
This function sets the callbacks that the parser invokes at the start and at the end of each element. A companion function, xml_set_character_data_handler, sets the callback that receives character data. Code written this way is clearer and easier to maintain.
Example:
$xml = <<<XML
<infos>
<para><note>note1</note><extra>extra1</extra></para>
<para><note>note2</note></para>
<para><note>note3</note><extra>extra3</extra></para>
</infos>
XML;
$result = array();
$index = -1;
$currData = '';
// Called for character data: remember the most recent text
function charactor($parser, $data) {
    global $currData;
    $currData = $data;
}
// Called at each start tag: open a new record for every para
function startElement($parser, $name, $attribs) {
    global $result, $index;
    $name = strtolower($name);
    if ($name == 'para') {
        $index++;
        $result[$index] = array();
    }
}
// Called at each end tag: store the text of note and extra
function endElement($parser, $name) {
    global $result, $index, $currData;
    $name = strtolower($name);
    if ($name == 'note' || $name == 'extra') {
        $result[$index][$name] = $currData;
    }
}
$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "charactor");
xml_set_element_handler($xml_parser, "startElement", "endElement");
if (!xml_parse($xml_parser, $xml)) {
    echo "Error when parse XML: ";
    echo xml_error_string(xml_get_error_code($xml_parser));
}
xml_parser_free($xml_parser);
print_r($result);
As you can see, the handler approach takes more lines of code, but it is clear and readable. On the other hand, it performs slightly slower than the first approach and is less flexible. XML Parser is available from PHP 4 onward, so it suits systems running older versions. In a PHP 5 environment, the following methods are preferable.
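One detail worth keeping in mind with the handler approach: the character data callback may be invoked more than once for a single text node (for example around entities), so appending to a buffer is safer than overwriting it. The sketch below shows one way to do that with closures instead of globals. The function name parseParas and the sample input are assumptions made for this illustration, and the sketch assumes every note or extra sits inside a para.

```php
<?php
// Hypothetical helper: collects note/extra text per para, accumulating
// character data in a buffer instead of overwriting it.
function parseParas(string $xml): array
{
    $result = [];
    $buffer = '';
    $parser = xml_parser_create();
    // Turn off case folding so tag names keep their original case.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attribs) use (&$result, &$buffer) {
            $buffer = '';              // start collecting text for this element
            if ($name === 'para') {
                $result[] = [];        // open a new record
            }
        },
        function ($p, $name) use (&$result, &$buffer) {
            if ($name === 'note' || $name === 'extra') {
                $result[count($result) - 1][$name] = $buffer;
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$buffer) {
        $buffer .= $data;              // append: the text may arrive in pieces
    });
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $result;
}

$paras = parseParas(
    '<infos><para><note>a &amp; b</note><extra>x</extra></para>'
    . '<para><note>n2</note></para></infos>'
);
print_r($paras);
```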
2. SimpleXML
SimpleXML is an easy-to-use XML toolset provided since PHP 5. It converts XML into an object that is easy to work with, and it can also be used to generate XML data. However, it is awkward for XML that uses namespaces (namespaced elements cannot be reached through plain property access), and the XML must be well-formed. It provides three functions: simplexml_import_dom, simplexml_load_file, and simplexml_load_string; the function names describe what they do. All three return a SimpleXMLElement object, and data is read or added through SimpleXMLElement operations.
$string = <<<XML
<?xml version='1.0'?>
<document>
<cmd>login</cmd>
<login>imdonkey</login>
</document>
XML;
$xml = simplexml_load_string($string);
print_r($xml);
$login = $xml->login; // what is returned here is still a SimpleXMLElement object
print_r($login);
$login = (string) $xml->login; // when comparing data, remember to cast to a string first
print_r($login);
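For comparison with the XML Parser section, here is a short sketch of reading the same kind of para records with SimpleXML, including a record whose extra element is missing. The sample data and variable names are assumptions made for this illustration.

```php
<?php
$xml = simplexml_load_string(
    '<infos>'
    . '<para><note>note1</note><extra>extra1</extra></para>'
    . '<para><note>note2</note></para>'
    . '</infos>'
);

$rows = [];
foreach ($xml->para as $para) {
    // Cast to string: property access returns SimpleXMLElement objects.
    $row = ['note' => (string) $para->note];
    // isset() tells whether the optional child element exists.
    if (isset($para->extra)) {
        $row['extra'] = (string) $para->extra;
    }
    $rows[] = $row;
}
print_r($rows);
```

Because each para is visited as its own object, a missing extra cannot shift values into the wrong record.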
The advantage of SimpleXML is that it is simple to develop with. The disadvantage is that it loads the entire XML into memory before processing it, so it may struggle with very large XML documents. If you are reading small files and the XML does not use namespaces, SimpleXML is a good choice.
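On the namespace point: plain property access does not see elements that live in a namespace, which is what makes such documents inconvenient. As a hedged sketch, SimpleXMLElement::children() can be given the namespace URI to reach them. The sample feed below is made up for this illustration.

```php
<?php
$feed = simplexml_load_string(
    '<feed xmlns:dc="http://purl.org/dc/elements/1.1/">'
    . '<entry><dc:creator>imdonkey</dc:creator></entry>'
    . '</feed>'
);

// Plain property access looks only at elements without a namespace.
$plain = (string) $feed->entry->creator;

// children() with the namespace URI exposes the namespaced elements.
$creator = (string) $feed->entry
    ->children('http://purl.org/dc/elements/1.1/')
    ->creator;

var_dump($plain, $creator);
```

This works, but every namespaced access needs the extra call, so the other parsers may be more comfortable for namespace-heavy documents.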
3. XMLReader
XMLReader is also an extension introduced after PHP 5 (enabled by default as of 5.1). It moves through the document stream like a cursor, stopping at each node, and is flexible to work with. It provides fast, non-cached, forward-only access to the input, can read from a stream or a document, and lets the user extract data from it while skipping records that are irrelevant to the application.
The following example uses the Google weather API to show how XMLReader is used. It covers only a small part of the functionality; see the official documentation for more.
$xml_uri = 'http://www.google.com/ig/api?weather=beijing&hl=zh-cn';
$current = array();
$forecast = array();
$reader = new XMLReader();
$reader->open($xml_uri, 'GBK');
while ($reader->read()) {
    // get current data
    if ($reader->name == "current_conditions" && $reader->nodeType == XMLReader::ELEMENT) {
        while ($reader->read() && $reader->name != "current_conditions") {
            $name = $reader->name;
            $value = $reader->getAttribute('data');
            $current[$name] = $value;
        }
    }
    // get forecast data
    if ($reader->name == "forecast_conditions" && $reader->nodeType == XMLReader::ELEMENT) {
        $sub_forecast = array();
        while ($reader->read() && $reader->name != "forecast_conditions") {
            $name = $reader->name;
            $value = $reader->getAttribute('data');
            $sub_forecast[$name] = $value;
        }
        $forecast[] = $sub_forecast;
    }
}
$reader->close();
XMLReader is similar to XML Parser in that both process the document while reading it. The big difference is that SAX is a "push" model: the parser pushes events to the application and notifies it each time a new node is read. XMLReader is a "pull" model: the application pulls nodes from the reader whenever it wants, which gives it better control.
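A minimal sketch of the pull model on an in-memory string, so it can be run without any network access. The sample document is made up and only loosely shaped like the weather data; the element and attribute names in it are assumptions for this illustration.

```php
<?php
$xml = '<weather>'
     . '<forecast_conditions><day_of_week data="Mon"/><low data="3"/></forecast_conditions>'
     . '<forecast_conditions><day_of_week data="Tue"/><low data="5"/></forecast_conditions>'
     . '</weather>';

$reader = new XMLReader();
$reader->XML($xml);   // read from a string instead of a URI

$forecast = [];
// The application drives the loop and decides which nodes to look at.
while ($reader->read()) {
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;     // skip end tags, text and anything else
    }
    if ($reader->name === 'forecast_conditions') {
        $forecast[] = [];   // start a new record
    } elseif ($forecast && $reader->getAttribute('data') !== null) {
        // Attach the value to the record that is currently open.
        $forecast[count($forecast) - 1][$reader->name] = $reader->getAttribute('data');
    }
}
$reader->close();
print_r($forecast);
```

There are no callbacks here: the code simply asks for the next node and ignores whatever it does not need.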
Because XMLReader is based on libxml, check the documentation for individual functions to see whether they apply to your libxml version.
4. DOMDocument
DOMDocument is part of the DOM extension introduced in PHP 5. It can be used to build or parse HTML/XML, and it currently works only with UTF-8 encoding.
$xmlstring = <<<XML
<?xml version='1.0'?>
<document>
<cmd attr='default'>login</cmd>
<login>imdonkey</login>
</document>
XML;
$dom = new DOMDocument();
$dom->loadXML($xmlstring);
print_r(getArray($dom->documentElement));
// Recursively convert a DOM node into a nested array
function getArray($node) {
    $array = array();
    if ($node->hasAttributes()) {
        foreach ($node->attributes as $attr) {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }
    if ($node->hasChildNodes()) {
        if ($node->childNodes->length == 1) {
            $array[$node->firstChild->nodeName] = getArray($node->firstChild);
        } else {
            foreach ($node->childNodes as $childNode) {
                if ($childNode->nodeType != XML_TEXT_NODE) {
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    } else {
        return $node->nodeValue;
    }
    return $array;
}
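DOMDocument can build documents as well as parse them. The following sketch creates the same small document programmatically, serializes it, and reads one element back. The variable names are chosen for this illustration only.

```php
<?php
// Build <document><cmd attr="default">login</cmd><login>imdonkey</login></document>
$dom = new DOMDocument('1.0', 'UTF-8');
$document = $dom->appendChild($dom->createElement('document'));
$cmd = $document->appendChild($dom->createElement('cmd', 'login'));
$cmd->setAttribute('attr', 'default');
$document->appendChild($dom->createElement('login', 'imdonkey'));

// Serialize the tree to an XML string.
$xmlOut = $dom->saveXML();
echo $xmlOut;

// Parse the string again and read one element back.
$parsed = new DOMDocument();
$parsed->loadXML($xmlOut);
$cmdNode = $parsed->getElementsByTagName('cmd')->item(0);
$attr = $cmdNode->getAttribute('attr');
$text = $cmdNode->textContent;
```

Since the whole tree is held in memory, nodes can be added, changed or removed at any point before saveXML is called, which is the kind of structural editing the streaming parsers cannot do.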
Judging from the function names, the API feels similar to JavaScript's DOM, which was presumably its reference. DOMDocument also loads the XML into memory in one go, so memory usage needs attention here as well. With so many ways to process XML in PHP, developers need to take the time to learn them and choose the one that fits the project's needs and system environment and is easy to maintain.