Read and write XML DOM in PHP

There are many techniques for reading and writing XML in PHP. This article presents three ways to read XML: using the DOM library, using the SAX parser, and using regular expressions. It also describes writing XML using the DOM and PHP text templates.

Reading and writing Extensible Markup Language (XML) in PHP may seem a bit scary. In fact, XML and all of its related technologies can be intimidating, but reading and writing XML in PHP doesn't have to be. First, you need to learn a little about XML: what it is and what it is used for. Then you need to learn how to read and write XML in PHP, and there are several ways to do that.

This article provides a short introduction to XML and then explains how to read and write XML in PHP.

   What is XML?

XML is a data storage format. It does not define what data is saved, nor does it define the format of the data. XML simply defines tags and the attributes of those tags. A well-formed XML tag looks like this:

<name>Jack Herrington</name>

This <name> tag contains some text: Jack Herrington.

XML tags that do not contain text look like this:

<powerUp />

There is more than one way to write an item in XML. For example, this tag is equivalent to the previous one:

<powerUp></powerUp>

You can also add attributes to XML tags. For example, this <name> tag contains the first and last attributes:

<name first="Jack" last="Herrington" />

You can also encode special characters in XML. For example, the & symbol is encoded like this:

&amp;

If an XML file contains tags and attributes formatted as in these examples, it is well-formed, which means that the tags are balanced and the characters are encoded correctly. Listing 1 is an example of well-formed XML.

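Listing 1. An XML book list example

<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>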

The XML in Listing 1 contains a list of books. The parent tag <books> contains a set of <book> tags, and each <book> tag also contains <author>, <title>, and <publisher> tags.

An XML document is valid when its markup structure and content have been checked against an external schema file. Schema files can be specified in different formats. For this article, all you need is well-formed XML.

If you think that XML looks a lot like Hypertext Markup Language (HTML), you're right. Both XML and HTML are markup-based languages, and they have many similarities. It is important to note, however, that although an XML document can be well-formed HTML, not all HTML documents are well-formed XML. The line break tag (br) is a good example of the difference between XML and HTML. This line break tag is well-formed HTML, but not well-formed XML:

<p>This is a paragraph<br>
With a line break</p>

This line break tag is well-formed XML and HTML:

<p>This is a paragraph<br />
With a line break</p>
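
The difference between the two snippets can be checked mechanically. The following is a minimal sketch, assuming the PHP DOM extension is installed; the helper name isWellFormed is hypothetical and not part of the original article. It asks the DOM parser to load a string and reports whether parsing succeeded.

```php
<?php
// Hypothetical helper (not from the article): true when $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    if ($xml === '') {
        return false; // loadXML() does not accept an empty string
    }
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// The unclosed <br> fails; the self-closed <br /> passes.
var_dump(isWellFormed("<p>This is a paragraph<br>\nWith a line break</p>"));
var_dump(isWellFormed("<p>This is a paragraph<br />\nWith a line break</p>"));
```

This checks only well-formedness. It says nothing about whether the document is valid against a schema.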

If you want to write HTML that is also well-formed XML, follow the Extensible Hypertext Markup Language (XHTML) standard from the World Wide Web Consortium (W3C). All modern browsers can render XHTML. It is also much easier to read XHTML with XML tools and find the data in the document than it is to parse HTML.

   Reading XML using the DOM library

The easiest way to read well-formed XML files is to use the Document Object Model (DOM) library that is compiled into some PHP installations. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as shown in Figure 1.

Figure 1. XML DOM tree for the book XML


The books node at the top of the tree has two book child nodes. Within each book, there are nodes for author, publisher, and title. The author, publisher, and title nodes each have a child text node that contains the text.

The code that reads the book XML file and displays its content using the DOM is shown in Listing 2.

Listing 2. Reading the book XML with the DOM

<?php
$doc = new DOMDocument();
$doc->load('books.xml');

$books = $doc->getElementsByTagName("book");
foreach ($books as $book) {
  $authors = $book->getElementsByTagName("author");
  $author = $authors->item(0)->nodeValue;

  $publishers = $book->getElementsByTagName("publisher");
  $publisher = $publishers->item(0)->nodeValue;

  $titles = $book->getElementsByTagName("title");
  $title = $titles->item(0)->nodeValue;

  echo "$title - $author - $publisher\n";
}
?>


The script first creates a new DOMDocument object and loads the book XML into the object using the load method. The script then uses the getElementsByTagName method to get a list of all the elements with the specified name.

Inside the loop over the book nodes, the script uses the getElementsByTagName method to find the author, publisher, and title tags and reads the nodeValue of each. The nodeValue is the text in the node. The script then displays these values.
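
As a variation on the same technique, the sketch below wraps the lookup in a function that returns an array instead of printing. The function name readBooks and the inline sample string are illustrative assumptions, not part of the original listing. It uses loadXML to parse a string rather than a file, and it guards against a missing tag, because item(0) returns null when nothing matches.

```php
<?php
// Hypothetical helper: collects title, author, and publisher for every <book> element.
function readBooks(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml); // load() would read from a file instead

    $rows = [];
    foreach ($doc->getElementsByTagName('book') as $book) {
        $row = [];
        foreach (['title', 'author', 'publisher'] as $field) {
            $nodes = $book->getElementsByTagName($field);
            // item(0) is null when the tag is missing, so check the length first.
            $row[$field] = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
        }
        $rows[] = $row;
    }
    return $rows;
}

$sample = <<<XML
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
XML;

foreach (readBooks($sample) as $row) {
    echo "{$row['title']} - {$row['author']} - {$row['publisher']}\n";
}
```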

You can run PHP scripts like this on the command line:

% php e1.php
PHP Hacks - Jack Herrington - O'Reilly
Podcasting Hacks - Jack Herrington - O'Reilly
%

As you can see, the script outputs one line for each book block. This is a good start. But what if you don't have access to the XML DOM library?

   Reading XML with the SAX parser

Another way to read XML is to use the Simple API for XML (SAX) parser. Most installations of PHP contain the SAX parser. The SAX parser runs on a callback model. Each time a tag is opened or closed, or whenever the parser sees text, it calls a user-defined function with information about the node or text.

The advantage of the SAX parser is that it is really lightweight. The parser does not keep content in memory for long, so it can be used for very large files. The disadvantage is that writing SAX parser callbacks is cumbersome. Listing 3 shows the code that uses SAX to read the book XML file and display the content.

Listing 3. Reading the book XML with the SAX parser

 
      

The script first sets up the g_books array, which holds all the books and book information in memory, and the g_elem variable, which holds the name of the tag that the script is currently processing. The script then defines the callback functions. In this example, the callback functions are startElement, endElement, and textData. The startElement and endElement functions are called when a tag is opened and closed, respectively. The textData function is called for the text between the start and end tags.

In this example, the startElement function looks for the book tag and begins a new element in the books array. The textData function then looks at the current element to see if it is a publisher, title, or author tag. If so, the function puts the current text into the current book.

To start parsing, the script creates the parser with the xml_parser_create function. It then sets the callback handlers. Next, the script reads the file and sends it to the parser in chunks. After the file is read, the xml_parser_free function deletes the parser. The end of the script outputs the contents of the g_books array.
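
The approach just described can be sketched as follows. This is an illustration under stated assumptions, not the original listing: the function name parseBooksSax and its chunkSize parameter are hypothetical, the input is a string split into chunks rather than a file, and local variables captured by closures stand in for the g_books and g_elem globals. The call to xml_parser_free is left out, because on PHP 8 and later the parser is an object that is released automatically.

```php
<?php
// Hypothetical function: parses book XML with the SAX-style parser and returns the books found.
function parseBooksSax(string $xml, int $chunkSize = 64): array
{
    $books = [];   // plays the role of g_books
    $elem = null;  // plays the role of g_elem: the tag currently open

    $parser = xml_parser_create();
    // Case folding is enabled by default, so tag names arrive in upper case.
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$books, &$elem) {
            if ($name === 'BOOK') {
                $books[] = []; // begin a new book
            }
            $elem = $name;
        },
        function ($parser, $name) use (&$elem) {
            $elem = null;
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $text) use (&$books, &$elem) {
            if ($books && in_array($elem, ['AUTHOR', 'PUBLISHER', 'TITLE'], true)) {
                $last = count($books) - 1;
                // Text may arrive in several pieces, so append rather than assign.
                $books[$last][$elem] = ($books[$last][$elem] ?? '') . $text;
            }
        }
    );

    // Feed the document in chunks, as a script reading a large file would.
    $chunks = str_split($xml, $chunkSize);
    foreach ($chunks as $i => $chunk) {
        $isFinal = ($i === count($chunks) - 1);
        if (xml_parse($parser, $chunk, $isFinal) !== 1) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $books;
}

$sample = '<books><book><author>Jack Herrington</author>'
        . '<title>PHP Hacks</title><publisher>O\'Reilly</publisher></book></books>';

foreach (parseBooksSax($sample) as $book) {
    echo $book['TITLE'] . ' - ' . $book['AUTHOR'] . ' - ' . $book['PUBLISHER'] . "\n";
}
```

Appending text in the character data handler matters because the parser is free to deliver one text node in several calls, for example when a chunk boundary falls in the middle of a name.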

As you can see, this is much more difficult than writing the same functionality with the DOM. What if there is no DOM library and no SAX library? Is there an alternative?

   Parsing XML with regular expressions

I'm sure some engineers will criticize me for even mentioning this method, but it is true that XML can be parsed with regular expressions. Listing 4 shows an example of reading the book file using the preg_ functions.

Listing 4. Reading XML with regular expressions

<?php
// Read the whole file into one string.
$xml = file_get_contents('books.xml');

// The s modifier lets . match newlines, so a book block can span several lines.
preg_match_all('/<book>(.*?)<\/book>/s', $xml, $bookblocks);

foreach ($bookblocks[1] as $block) {
  preg_match_all('/<author>(.*?)<\/author>/', $block, $author);
  preg_match_all('/<title>(.*?)<\/title>/', $block, $title);
  preg_match_all('/<publisher>(.*?)<\/publisher>/', $block, $publisher);
  echo($title[1][0] . " - " . $author[1][0] . " - " . $publisher[1][0] . "\n");
}
?>

Notice how short this code is. It starts by reading the file into one large string. It then uses a regular expression function to read each book item. Finally, it uses a foreach loop to go through each book block and extract the author, title, and publisher.

So, where is the flaw? The problem with reading XML using regular expressions is that the code never checks that the XML is well-formed, so there is no way to know whether the input is sound before you read it. Also, some well-formed XML may not match your regular expressions, so you will have to modify them later.

I never recommend reading XML with regular expressions, but sometimes it's the most compatible way because the regular expression functions are always available. Do not use regular expressions to read XML that comes directly from users, because you have no control over the format or structure of that XML. XML from users should always be read with a DOM library or a SAX parser.
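
To illustrate how well-formed XML can slip past a regular expression, here is a small sketch. The sample string is an assumption made for this example: it adds an id attribute to the book tag, which is still well-formed XML. The pattern looks for the literal text <book>, so it finds nothing, while the DOM finds the element regardless of its attributes.

```php
<?php
// Well-formed XML that differs slightly from the book list: the book tag has an attribute.
$xml = '<books><book id="1"><title>PHP Hacks</title></book></books>';

// The pattern expects exactly "<book>", so the attribute defeats it.
$regexCount = preg_match_all('/<book>(.*?)<\/book>/s', $xml, $matches);

// The DOM matches on the element name, whatever attributes are present.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domCount = $doc->getElementsByTagName('book')->length;

echo "regex: $regexCount, DOM: $domCount\n";
```

The regular expression reports no books and the DOM reports one, even though nothing is wrong with the input.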

   Writing XML with the DOM

Reading XML is only part of the equation. How do you write XML? The best way to write XML is to use the DOM. Listing 5 shows how the DOM constructs the book XML file.

Listing 5. Writing book XML with DOM

<?php
$books = array();
$books[] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books[] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);

$doc = new DOMDocument();
$doc->formatOutput = true;

$r = $doc->createElement("books");
$doc->appendChild($r);

foreach ($books as $book) {
  $b = $doc->createElement("book");

  $author = $doc->createElement("author");
  $author->appendChild($doc->createTextNode($book['author']));
  $b->appendChild($author);

  $title = $doc->createElement("title");
  $title->appendChild($doc->createTextNode($book['title']));
  $b->appendChild($title);

  $publisher = $doc->createElement("publisher");
  $publisher->appendChild($doc->createTextNode($book['publisher']));
  $b->appendChild($publisher);

  $r->appendChild($b);
}

echo $doc->saveXML();
?>

At the top of the script, the books array is loaded with some sample books. This data could come from the user or from a database.

After the sample books are loaded, the script creates a new DOMDocument and adds the root node, books, to it. The script then creates a node for each book's author, title, and publisher, and adds a text node to each of them. The last step for each book node is to append it to the root node, books.

The end of the script uses the saveXML method to output the XML to the console. (You can also create an XML file with the save method.) The output of the script is shown in Listing 6.

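Listing 6. The output of the DOM build script

% php e4.php
<?xml version="1.0"?>
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>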

The real value of using the DOM is that the XML it creates is always well-formed. But what if you can't create XML with the DOM?
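
One way to see why the DOM output stays well-formed is to give it text that contains special characters. The sketch below is an illustration with a made-up title, not part of the original article. The text goes in raw through createTextNode, and the serializer encodes the & and < characters when it writes the XML.

```php
<?php
$doc = new DOMDocument();
$title = $doc->createElement('title');
// createTextNode() takes raw text; escaping happens when the document is serialized.
$title->appendChild($doc->createTextNode('Tom & Jerry <live>'));
$doc->appendChild($title);

// Passing a node to saveXML() prints just that node, without the XML declaration.
$out = $doc->saveXML($title);
echo $out, "\n";
```

Reading the output back with the DOM returns the original text unchanged, so no manual encoding step is needed on either side.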

   Writing XML in PHP

If the DOM is not available, you can write XML using PHP text templates. Listing 7 shows how PHP constructs the book XML file.

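Listing 7. Writing the book XML in PHP

<?php
$books = array();
$books[] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books[] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
?>
<books>
<?php
foreach ($books as $book) {
?>
<book>
<title><?php echo($book['title']); ?></title>
<author><?php echo($book['author']); ?></author>
<publisher><?php echo($book['publisher']); ?></publisher>
</book>
<?php
}
?>
</books>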

The top of the script is similar to the DOM script. The bottom of the script opens the books tag and then iterates through each book, creating the book tag and all of its internal title, author, and publisher tags.

The problem with this approach is entity encoding. To make sure that entities are encoded correctly, you must call the htmlentities function on each item, as shown in Listing 8.

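Listing 8. Using the htmlentities function to encode entities

<books>
<?php
foreach ($books as $book) {
  $title = htmlentities($book['title']);
  $author = htmlentities($book['author']);
  $publisher = htmlentities($book['publisher']);
?>
<book>
<title><?php echo($title); ?></title>
<author><?php echo($author); ?></author>
<publisher><?php echo($publisher); ?></publisher>
</book>
<?php
}
?>
</books>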

That's the annoying thing about writing XML with basic PHP. You think you created the perfect XML, but when you try to use the data, you immediately find that some elements are not encoded correctly.
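
One caveat, offered here as an addition to the original text: htmlentities also converts accented characters to HTML named entities such as &eacute;, which XML does not define. A possible alternative is htmlspecialchars with the ENT_XML1 flag, which touches only the characters that are special in markup. The helper name xmlText and the sample data below are hypothetical.

```php
<?php
// Hypothetical helper: escapes text for use inside an XML element.
function xmlText(string $text): string
{
    // ENT_XML1 selects XML rules; ENT_NOQUOTES leaves quotes alone,
    // which is fine for element content (attribute values would need quotes escaped).
    return htmlspecialchars($text, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
}

$book = ['title' => 'Tom & Jerry <live>', 'publisher' => "O'Reilly"];

echo "<book>\n";
echo "<title>" . xmlText($book['title']) . "</title>\n";
echo "<publisher>" . xmlText($book['publisher']) . "</publisher>\n";
echo "</book>\n";
```

Routing every value through one helper makes it harder to forget the encoding step for a single field.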

   Concluding remarks

There is always a lot of hype and confusion around XML. But it's not as difficult as you might think, especially in a good language like PHP. Once you understand XML and implement it correctly, you will find that there are many powerful tools you can use. XPath and XSLT are two such tools worth studying.
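
As a small taste of XPath, the sketch below uses the DOMXPath class from the PHP DOM extension to select every book title with a single expression. The inline sample string is an assumption made to keep the example self-contained.

```php
<?php
$xml = '<books>'
     . '<book><author>Jack Herrington</author><title>PHP Hacks</title></book>'
     . '<book><author>Jack Herrington</author><title>Podcasting Hacks</title></book>'
     . '</books>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select every title element that is a child of a book element.
$titles = [];
foreach ($xpath->query('//book/title') as $node) {
    $titles[] = $node->nodeValue;
}

echo implode(', ', $titles), "\n";
```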
