Four ways to parse XML in PHP

Source: Internet
Author: User
Tags xml parser
XML processing comes up often during development, and PHP has rich support for it. This article briefly describes four of the available parsing technologies: XML Parser, SimpleXML, XMLReader, and DOMDocument.

1. XML Expat Parser:

XML Parser uses the Expat XML parser. Expat is an event-based parser that treats an XML document as a series of events. When an event occurs (an opening tag, a closing tag, character data), it calls the function registered for that event. Expat is a non-validating parser: it ignores any DTD linked to the document. However, if the document is not well-formed, parsing ends with an error message. Because it is event-based and does not validate, Expat is fast and well suited to web applications.
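
To make the event model concrete, here is a minimal sketch that records each event the parser reports for a tiny document. The sample XML and the `$events` array are made up for illustration; the `xml_*` calls are PHP's standard XML Parser functions.

```php
<?php
// Collect every event the parser reports, in the order it arrives.
$events = array();

$parser = xml_parser_create();

// Called at each opening and closing tag. Names arrive upper-cased
// because case folding is enabled by default.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attribs) use (&$events) {
        $events[] = "start $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";
    }
);

// Called for text between tags (possibly more than once per element).
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "text $data";
    }
);

// The third argument marks this as the final (here: only) chunk of input.
xml_parse($parser, '<para><note>simple note</note></para>', true);

print_r($events);
```

The application never sees a tree: it only receives callbacks, and anything it wants to remember has to be stored by the handlers themselves.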

The advantage of XML Parser is its good performance: it does not load the entire XML file into memory before processing, but processes the data as it parses. For the same reason, it is not suitable when you need to modify the XML structure dynamically or perform complex operations that depend on the surrounding context. If you only want to parse and process a well-structured XML document, it does the job well. Note that XML Parser supports only three encodings: US-ASCII, ISO-8859-1, and UTF-8. If your XML data uses another encoding, convert it to one of these three first.
XML Parser is commonly used in one of two ways, each built around a function: xml_parse_into_struct and xml_set_element_handler.


xml_parse_into_struct

This function parses XML data into two arrays:
Index array: for each tag name, holds the positions of its entries in the value array
Value array: holds the data from the parsed XML

The two arrays are hard to describe in words, so let's look at an example (from the official PHP documentation):

$simple = "<para><note>simple note</note></para>";
$p = xml_parser_create();
xml_parse_into_struct($p, $simple, $vals, $index);
xml_parser_free($p);
echo "Index array\n";
print_r($index);
echo "\nVals array\n";
print_r($vals);

Output:
Index array
Array
(
    [PARA] => Array
        (
            [0] => 0
            [1] => 2
        )

    [NOTE] => Array
        (
            [0] => 1
        )

)

Vals array
Array
(
    [0] => Array
        (
            [tag] => PARA
            [type] => open
            [level] => 1
        )

    [1] => Array
        (
            [tag] => NOTE
            [type] => complete
            [level] => 2
            [value] => simple note
        )

    [2] => Array
        (
            [tag] => PARA
            [type] => close
            [level] => 1
        )

)

The index array is keyed by tag name. Each value is an array holding every position at which that tag appears in the value array. You use those positions to look up the tag's data in the value array.
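
As a small illustration of how the two arrays work together, the sketch below collects every value recorded for a given tag. The helper name `valuesForTag` and the sample document are hypothetical, not part of PHP or of the original example.

```php
<?php
/**
 * Return the text of every element with the given (upper-cased) tag name,
 * using the index array to locate entries in the value array.
 * Hypothetical helper, for illustration only.
 */
function valuesForTag(array $values, array $index, $tag)
{
    $found = array();
    if (!isset($index[$tag])) {
        return $found;
    }
    foreach ($index[$tag] as $position) {
        // Entries of type "open" or "close" may have no "value" key.
        if (isset($values[$position]['value'])) {
            $found[] = $values[$position]['value'];
        }
    }
    return $found;
}

$p = xml_parser_create();
xml_parse_into_struct(
    $p,
    '<para><note>first</note><note>second</note></para>',
    $values,
    $index
);

$notes = valuesForTag($values, $index, 'NOTE');
print_r($notes);
```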

If the groups of data in the XML do not all have the same shape, take care when writing the traversal code, or you may get wrong results. For example:

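// The tags of this listing were lost in the source; <root> is an assumed
// wrapper name, while para, note and extra come from the code below.
$xml = '<root>
  <para>
    <note>note1</note>
    <extra>extra1</extra>
  </para>
  <para>
    <note>note2</note>
  </para>
  <para>
    <note>note3</note>
    <extra>extra3</extra>
  </para>
</root>';

$p = xml_parser_create();
xml_parse_into_struct($p, $xml, $values, $tags);
xml_parser_free($p);

$result = array();
// The following traversal has a bug: it assumes every para has both children.
for ($i = 0; $i < 3; $i++) {
    $result[$i] = array();
    $result[$i]["note"] = $values[$tags["NOTE"][$i]]["value"];
    $result[$i]["extra"] = $values[$tags["EXTRA"][$i]]["value"];
}
print_r($result);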

Traversing this way looks simple, but it hides a serious problem: it produces wrong results (extra3 ends up in the second para, and the third para gets no extra at all). A more rigorous traversal is needed:

$result = array();
// Keep only the open/close positions of para. Whitespace between child
// elements is also indexed under PARA, as entries of type "cdata".
$paraTagIndexes = array();
foreach ($tags['PARA'] as $pos) {
    if (in_array($values[$pos]['type'], array('open', 'close'))) {
        $paraTagIndexes[] = $pos;
    }
}
$paraCount = count($paraTagIndexes);
for ($i = 0; $i < $paraCount; $i += 2) {
    $para = array();
    // Traverse all values between a pair of para tags
    for ($j = $paraTagIndexes[$i]; $j < $paraTagIndexes[$i + 1]; $j++) {
        if (empty($values[$j]['value'])) {
            continue;
        }
        $tagname = strtolower($values[$j]['tag']);
        if (in_array($tagname, array('note', 'extra'))) {
            $para[$tagname] = $values[$j]['value'];
        }
    }
    $result[] = $para;
}

In fact, I rarely use the xml_parse_into_struct function, so the "rigorous" code above may well have bugs in other situations.

xml_set_element_handler

This function registers the callbacks that the parser invokes at the start and end of each element. Its companion, xml_set_character_data_handler, registers the callback for character data. Code written this way is clear and easy to maintain.

Example:

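// The tags of this listing were lost in the source; <root> is an assumed
// wrapper name, while para, note and extra come from the handlers below.
$xml = <<<XML
<root>
  <para>
    <note>note1</note>
    <extra>extra1</extra>
  </para>
  <para>
    <note>note2</note>
  </para>
  <para>
    <note>note3</note>
    <extra>extra3</extra>
  </para>
</root>
XML;

$result = array();
$index = -1;
$currData = '';

// Character data may arrive in several pieces, so append instead of overwriting.
function characterData($parser, $data) {
    global $currData;
    $currData .= $data;
}

function startElement($parser, $name, $attribs) {
    global $result, $index, $currData;
    $currData = ''; // start collecting text for the new element
    $name = strtolower($name);
    if ($name == 'para') {
        $index++;
        $result[$index] = array();
    }
}

function endElement($parser, $name) {
    global $result, $index, $currData;
    $name = strtolower($name);
    if ($name == 'note' || $name == 'extra') {
        $result[$index][$name] = $currData;
    }
}

$xml_parser = xml_parser_create();
xml_set_character_data_handler($xml_parser, "characterData");
xml_set_element_handler($xml_parser, "startElement", "endElement");
// The third argument tells the parser that this is the last chunk of input.
if (!xml_parse($xml_parser, $xml, true)) {
    echo "Error when parsing xml: ";
    echo xml_error_string(xml_get_error_code($xml_parser));
}
xml_parser_free($xml_parser);
print_r($result);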
 

As you can see, the handler-based approach takes more lines of code, but its logic is clear and it is more readable. On the other hand, it is slightly slower than the first approach and not very flexible. XML Parser is available in PHP 4, which makes it suitable for systems running older versions. In a PHP 5 environment, consider the following methods first.
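
The memory advantage mentioned earlier comes from the fact that xml_parse can be fed one piece of the document at a time. The sketch below shows the idea under simplifying assumptions: the document is generated in memory and split with str_split, whereas a real script would read chunks from a file. The element names and the 4 KB chunk size are arbitrary choices for the illustration.

```php
<?php
// A sample document, generated here; a real script would read it from a file.
$xml = '<root>' . str_repeat('<para><note>n</note></para>', 1000) . '</root>';

$paraCount = 0;
$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attribs) use (&$paraCount) {
        // Names are upper-cased by the default case folding.
        if ($name === 'PARA') {
            $paraCount++;
        }
    },
    function ($parser, $name) {
        // Nothing to do at closing tags in this sketch.
    }
);

// Feed the parser 4 KB at a time; only the last call sets the final flag.
$chunks = str_split($xml, 4096);
$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        echo 'XML error: ', xml_error_string(xml_get_error_code($parser)), "\n";
        break;
    }
}

echo $paraCount, "\n";
```

Only the current chunk and whatever the handlers choose to keep need to stay in memory, which is why this style copes with large files.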

2. SimpleXML

SimpleXML is a simple, easy-to-use XML toolkit introduced in PHP 5. It converts XML into objects that are easy to work with, and it can also build and output XML data. It is less convenient for XML that uses namespaces, because namespaced elements have to be reached through children() or xpath() rather than plain property access, and the XML must be well-formed. It provides three functions: simplexml_import_dom, simplexml_load_file, and simplexml_load_string. The names make their purpose clear. All three return a SimpleXMLElement object, and all reading and adding of data goes through SimpleXMLElement.

// The tags of this listing were lost in the source. Only <login> is confirmed
// by the code below; <document> and <cmd> are assumed names.
$string = <<<XML
<document>
  <cmd>login</cmd>
  <login>imdonkey</login>
</document>
XML;

$xml = simplexml_load_string($string);
print_r($xml);
$login = $xml->login;          // the result is still a SimpleXMLElement object
print_r($login);
$login = (string) $xml->login; // cast to string before comparing data
print_r($login);
 

SimpleXML is easy to develop with. Its disadvantage is that it loads the entire XML document into memory before processing, so it may be unable to handle very large documents. If you are reading small files that do not rely on namespaces, SimpleXML is a good choice.
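
SimpleXML can also generate XML, not just read it. The following is a minimal sketch of that direction; the element and attribute names are invented for the illustration.

```php
<?php
// Build a small document from scratch; all names here are illustrative.
$xml = new SimpleXMLElement('<document/>');
$xml->addChild('cmd', 'login');
$user = $xml->addChild('login', 'imdonkey');
$user->addAttribute('role', 'admin');

// Read the data back. Elements are properties, attributes use array syntax,
// and both must be cast to string before comparison.
$name = (string) $xml->login;
$role = (string) $xml->login['role'];

// Serialize the whole document to a string.
$output = $xml->asXML();
echo $output;
```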


3. XMLReader

XMLReader is another extension added in PHP 5 (enabled by default since 5.1). It moves through the document stream like a cursor, stopping at each node, which makes it flexible to work with. It provides fast, non-cached, forward-only access to the input and can read from streams or documents, so the application can extract the data it needs and skip records that are meaningless to it.
The following example reads weather information from the Google Weather API (that unofficial endpoint is no longer available, so treat the listing as illustrative). It uses only a few XMLReader functions; see the official documentation for the rest.

$xml_uri = 'http://www.google.com/ig/api?weather=Beijing&hl=zh-cn';
$current = array();
$forecast = array();

$reader = new XMLReader();
$reader->open($xml_uri, 'gbk');
while ($reader->read()) {
    // get current data
    if ($reader->name == "current_conditions" && $reader->nodeType == XMLReader::ELEMENT) {
        while ($reader->read() && $reader->name != "current_conditions") {
            // skip text and whitespace nodes, which carry no attributes
            if ($reader->nodeType != XMLReader::ELEMENT) {
                continue;
            }
            $name = $reader->name;
            $value = $reader->getAttribute('data');
            $current[$name] = $value;
        }
    }
    // get forecast data
    if ($reader->name == "forecast_conditions" && $reader->nodeType == XMLReader::ELEMENT) {
        $sub_forecast = array();
        while ($reader->read() && $reader->name != "forecast_conditions") {
            if ($reader->nodeType != XMLReader::ELEMENT) {
                continue;
            }
            $name = $reader->name;
            $value = $reader->getAttribute('data');
            $sub_forecast[$name] = $value;
        }
        $forecast[] = $sub_forecast;
    }
}
$reader->close();

XMLReader resembles XML Parser in that both process the document while reading it. The big difference is the direction of control. XML Parser follows the SAX model, a "push" model in which the parser pushes events to the application and notifies it every time a new node is read. XMLReader is a "pull" model: the application pulls nodes from the reader whenever it wants, which gives it more control.
Because XMLReader is built on libxml, check the documentation to confirm that a given feature is available in your libxml version.
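
The pull model is easy to see on a small in-memory document. The sketch below is only an illustration: the sample XML and the `<root>` wrapper are assumptions, and it loads the string with the XML() method instead of opening a URL.

```php
<?php
// Sample document; element names are illustrative.
$xml = '<root>'
     . '<para><note>note1</note><extra>extra1</extra></para>'
     . '<para><note>note2</note></para>'
     . '</root>';

$reader = new XMLReader();
$reader->XML($xml); // load from a string rather than a URI

$notes = array();
// The application drives the loop: it asks for the next node when it is ready.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'note') {
        // readString() returns the text content of the current element.
        $notes[] = $reader->readString();
    }
}
$reader->close();

print_r($notes);
```

Everything that is not a note element is simply passed over, with no handler needed to ignore it.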

4. DOMDocument

DOMDocument is part of the DOM extension introduced in PHP 5 and can be used to build or parse HTML and XML. The extension works with UTF-8 strings, so data in other encodings has to be converted first.

// The tags of this listing were lost in the source; the element names
// <document>, <cmd> and <login> are assumed.
$xmlstring = <<<XML
<document>
  <cmd>login</cmd>
  <login>imdonkey</login>
</document>
XML;

$dom = new DOMDocument();
$dom->loadXML($xmlstring);
print_r(getArray($dom->documentElement));

// Recursively convert a DOM node into a nested array.
function getArray($node) {
    $array = array();
    if ($node->hasAttributes()) {
        foreach ($node->attributes as $attr) {
            $array[$attr->nodeName] = $attr->nodeValue;
        }
    }
    if ($node->hasChildNodes()) {
        if ($node->childNodes->length == 1) {
            $array[$node->firstChild->nodeName] = getArray($node->firstChild);
        } else {
            foreach ($node->childNodes as $childNode) {
                // skip whitespace and other text nodes between elements
                if ($childNode->nodeType != XML_TEXT_NODE) {
                    $array[$childNode->nodeName][] = getArray($childNode);
                }
            }
        }
    } else {
        return $node->nodeValue;
    }
    return $array;
}
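
DOMDocument can build documents as well as parse them. The sketch below creates a small document, queries it, and serializes it; the element and attribute names are made up for the illustration.

```php
<?php
// Create an empty document and give it a root element.
$dom = new DOMDocument('1.0', 'UTF-8');
$root = $dom->createElement('document');
$dom->appendChild($root);

// Add a child element with text content and an attribute.
$login = $dom->createElement('login', 'imdonkey');
$login->setAttribute('role', 'admin');
$root->appendChild($login);

// Query the tree by tag name and read values back.
$found = array();
foreach ($dom->getElementsByTagName('login') as $node) {
    $found[] = $node->nodeValue . ' (' . $node->getAttribute('role') . ')';
}
print_r($found);

// Serialize the whole tree to an XML string.
$output = $dom->saveXML();
echo $output;
```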
 

Judging by the method names, the API looks a lot like the DOM in JavaScript; both follow the W3C DOM specification. DOMDocument also loads the whole XML document into memory at once, so you need to watch memory usage here too. PHP offers so many ways to process XML that developers need to spend some time understanding them, and then choose the one that fits the project requirements and system environment and is easy to maintain.
