Reading and writing XML with the DOM in PHP (reprint)

Source: Internet
Author: User


Introduction: There are many techniques for reading and writing XML in PHP. This article presents three ways to read XML: the DOM library, the SAX parser, and regular expressions. It also describes how to write XML with the DOM and with PHP text templates.

Reading and writing Extensible Markup Language (XML) in PHP may seem a little intimidating. In fact, XML and all its related technologies can be intimidating, but reading and writing XML with PHP does not have to be. First, you need to learn a little about XML: what it is and what it is used for. Then, you need to learn how to read and write XML in PHP, and there are many ways to do this.

This article provides a brief introduction to XML and explains how to use PHP to read and write XML.

What is XML?

XML is a data storage format. It does not define what data is stored or how that data is structured. XML defines only tags and the attributes of those tags. A well-formed XML tag looks like this:

<name>Jack Herrington</name>

This <name> tag contains some text: Jack Herrington.

XML tags that do not contain text look like this:

<powerUp />

There is more than one way to write something in XML. For example, this tag produces the same result as the previous one:

<powerUp></powerUp>
You can also add attributes to an XML tag. For example, this <name> tag includes first and last attributes:

<name first="Jack" last="Herrington" />

You can also use XML to encode special characters. For example, the & symbol can be encoded like this:

&amp;
An XML file whose tags and attributes are formatted as in these examples is well-formed, which means that the tags are balanced and the characters are encoded correctly. Listing 1 is an example of well-formed XML.

<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>

The XML in Listing 1 contains a list of books. The parent <books> tag contains a set of <book> tags, each of which contains <author>, <title>, and <publisher> tags.

An XML document is valid when its tag structure and content have been verified against an external schema file. Schema files can be specified in different formats. For this article, all we need is well-formed XML.

If XML looks a lot like HTML, that is no accident. XML and HTML are both tag-based languages with many similarities. However, it is important to note that although a well-formed XML document can also be a well-formed HTML document, not all HTML documents are well-formed XML documents. The line break tag (br) is a good example of the difference between XML and HTML. This line break tag is well-formed HTML, but not well-formed XML:

<p>This is a paragraph<br>
With a line break</p>

This line break tag is both well-formed XML and well-formed HTML:

<p>This is a paragraph<br />
With a line break</p>
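
The difference between the two snippets above can be checked mechanically. The following is a minimal sketch, assuming the DOM extension is installed; the helper name isWellFormed is hypothetical and not part of any library. It asks the DOM parser whether each string is well-formed XML.

```php
<?php
// Hypothetical helper (not from the article): returns true when
// $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$htmlStyle = "<p>This is a paragraph<br>\nWith a line break</p>";
$xmlStyle  = "<p>This is a paragraph<br />\nWith a line break</p>";

var_dump(isWellFormed($htmlStyle)); // bool(false): <br> is never closed
var_dump(isWellFormed($xmlStyle));  // bool(true)
```

A check like this only tests well-formedness; it says nothing about whether the document matches a schema.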

To write HTML that is also well-formed XML, follow the W3C's Extensible Hypertext Markup Language (XHTML) standard. All modern browsers can render XHTML. In addition, you can use XML tools to read XHTML and find the data in the document, which is much easier than parsing HTML.

Use the DOM library to read XML

The easiest way to read well-formed XML files is to use the Document Object Model (DOM) library that is compiled into some PHP installations. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as shown in Figure 1.

Figure 1. XML DOM tree for the books XML

At the top of the tree is the books node, which has two book child nodes. Each book contains author, publisher, and title nodes. Each author, publisher, and title node has a child text node that holds its text.

The code that reads the books XML file and displays its contents using the DOM is shown in Listing 2.

<?php
$doc = new DOMDocument();
$doc->load( 'books.xml' );

$books = $doc->getElementsByTagName( "book" );
foreach( $books as $book )
{
  $authors = $book->getElementsByTagName( "author" );
  $author = $authors->item(0)->nodeValue;

  $publishers = $book->getElementsByTagName( "publisher" );
  $publisher = $publishers->item(0)->nodeValue;

  $titles = $book->getElementsByTagName( "title" );
  $title = $titles->item(0)->nodeValue;

  echo "$title - $author - $publisher\n";
}
?>

The script first creates a new DOMDocument object and uses the load method to load the books XML into it. The script then uses the getElementsByTagName method to get a list of all elements with the specified name.

Inside the loop over the book nodes, the script uses the getElementsByTagName method to get the nodeValue of the author, publisher, and title tags. The nodeValue is the text in the node. The script then displays these values.
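
The listing above assumes that every book has all three child tags; if one is missing, item(0) returns null and reading nodeValue from it fails. The sketch below shows one possible way to guard against that. It loads inline sample data with loadXML so that it runs without a books.xml file, and the firstText helper and the "(unknown)" placeholder are assumptions made for illustration.

```php
<?php
// Inline sample data; the second book deliberately has no publisher.
$xml = <<<XML
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
  </book>
</books>
XML;

// Hypothetical helper: text of the first $tag descendant, or $default
// when the parent has no such descendant.
function firstText(DOMElement $parent, string $tag, string $default = ''): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : trim($node->nodeValue);
}

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    exit("Could not parse the XML\n");
}

$lines = [];
foreach ($doc->getElementsByTagName('book') as $book) {
    $lines[] = firstText($book, 'title') . ' - '
        . firstText($book, 'author') . ' - '
        . firstText($book, 'publisher', '(unknown)');
}

echo implode("\n", $lines), "\n";
```

With the sample data, the second line ends in "(unknown)" instead of causing an error.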

Read XML with a SAX Parser

Another way to read XML is to use the Simple API for XML (SAX) parser. Most PHP installations include the SAX parser. The SAX parser runs on a callback model. Each time a tag is opened or closed, or each time the parser sees text, it calls back a user-defined function with the node or text information.

The advantage of the SAX parser is that it is truly lightweight. The parser does not keep content in memory for long, so it can be used for very large files. The disadvantage is that writing SAX parser callbacks is tedious. Listing 3 shows the code that reads the books XML file using SAX and displays its contents.

<?php
$g_books = array();
$g_elem = null;

function startElement( $parser, $name, $attrs )
{
  global $g_books, $g_elem;
  if ( $name == 'BOOK' ) $g_books []= array();
  $g_elem = $name;
}

function endElement( $parser, $name )
{
  global $g_elem;
  $g_elem = null;
}

function textData( $parser, $text )
{
  global $g_books, $g_elem;
  if ( $g_elem == 'AUTHOR' ||
       $g_elem == 'PUBLISHER' ||
       $g_elem == 'TITLE' )
  {
    $g_books[ count( $g_books ) - 1 ][ $g_elem ] = $text;
  }
}

$parser = xml_parser_create();

xml_set_element_handler( $parser, "startElement", "endElement" );
xml_set_character_data_handler( $parser, "textData" );

$f = fopen( 'books.xml', 'r' );

while( $data = fread( $f, 4096 ) )
{
  xml_parse( $parser, $data );
}

xml_parser_free( $parser );

foreach( $g_books as $book )
{
  echo $book['TITLE']." - ".$book['AUTHOR']." - ";
  echo $book['PUBLISHER']."\n";
}
?>

The script first sets up the g_books array, which holds all the books and book information in memory, and the g_elem variable, which stores the name of the tag the script is currently processing. The script then defines the callback functions. In this example, the callback functions are startElement, endElement, and textData. The startElement and endElement functions are called when a tag is opened and closed, respectively. The textData function is called on the text between the opening and closing tags.

In this example, the startElement function looks for the book tag in order to start a new element in the books array. The textData function then checks whether the current element is a publisher, title, or author tag. If it is, the function puts the current text into the current book. The names are compared in uppercase because the parser folds tag names to uppercase by default.

The script creates the parser with the xml_parser_create function. It then sets the callback handlers. Next, the script reads the file and sends it to the parser in chunks. After the file has been read, the xml_parser_free function deletes the parser. The end of the script prints the contents of the g_books array.
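
One detail worth knowing about the callback model: the parser is allowed to deliver the text of a single element in several calls, for example around an entity such as &amp;amp;. The sketch below is a variation on the listing above, not the author's code. It uses closures instead of global functions and appends each piece of text rather than assigning it. The sample data and variable names are assumptions made for illustration.

```php
<?php
// Sample data; the &amp; may cause the publisher text to arrive in pieces.
$xml = '<books><book><title>PHP Hacks</title>'
     . '<publisher>Smith &amp; Sons</publisher></book></books>';

$books = [];
$current = null; // name of the element being read, or null

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Called when a tag opens. Names arrive in uppercase by default.
    function ($parser, string $name, array $attrs) use (&$books, &$current) {
        if ($name === 'BOOK') {
            $books[] = ['TITLE' => '', 'PUBLISHER' => ''];
        }
        $current = $name;
    },
    // Called when a tag closes.
    function ($parser, string $name) use (&$current) {
        $current = null;
    }
);

xml_set_character_data_handler(
    $parser,
    function ($parser, string $text) use (&$books, &$current) {
        if ($current === 'TITLE' || $current === 'PUBLISHER') {
            // Append rather than assign: one text node may arrive in pieces.
            $books[count($books) - 1][$current] .= $text;
        }
    }
);

// Passing true as the last argument marks this as the final chunk.
if (xml_parse($parser, $xml, true) !== 1) {
    exit(xml_error_string(xml_get_error_code($parser)) . "\n");
}

echo $books[0]['TITLE'], ' - ', $books[0]['PUBLISHER'], "\n";
```

Checking the return value of xml_parse also surfaces malformed input, which the original loop silently ignores.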

As you can see, this is much harder than writing the same functionality with the DOM. But what if neither the DOM library nor the SAX library is available? Is there an alternative?

Parse XML using regular expressions

Some engineers may criticize this approach, but you can parse XML with regular expressions. Listing 4 shows an example that uses the preg_ functions to read the books file.

<?php
$xml = "";
$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) ) { $xml .= $data; }
fclose( $f );

preg_match_all( "/\<book\>(.*?)\<\/book\>/s",
  $xml, $bookblocks );

foreach( $bookblocks[1] as $block )
{
  preg_match_all( "/\<author\>(.*?)\<\/author\>/",
    $block, $author );
  preg_match_all( "/\<title\>(.*?)\<\/title\>/",
    $block, $title );
  preg_match_all( "/\<publisher\>(.*?)\<\/publisher\>/",
    $block, $publisher );
  echo( $title[1][0]." - ".$author[1][0]." - ".
    $publisher[1][0]."\n" );
}
?>

Note how short the code is. It begins by reading the file into one large string. It then uses a regular expression function to read each book item. Finally, it uses a foreach loop to iterate over each book block and extract the author, title, and publisher.

So where is the flaw? The problem with using regular expression code to read XML is that it does not first check that the XML is well-formed. That means you cannot know whether the XML is well-formed before you read it. In addition, some well-formed XML may not match the regular expressions, so you would have to modify them later.

I never recommend using regular expressions to read XML, but sometimes it is the most compatible approach, because the regular expression functions are always available. Do not use regular expressions to read XML that comes directly from users, because you cannot control the format or structure of such XML. Always use the DOM library or the SAX parser to read XML from users.
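
To make the "well-formed XML may not match" point concrete, here is a small sketch. The sample input adds an attribute to the author tag, which is a hypothetical variation and not part of the article's data. The pattern from the listing above finds nothing, a loosened pattern finds the author, and the DOM needs no changes at all.

```php
<?php
// Well-formed XML that differs slightly from the article's sample:
// the author tag carries an attribute (hypothetical, for illustration).
$xml = '<books><book><author role="editor">Jack Herrington</author>'
     . '<title>PHP Hacks</title></book></books>';

// The author pattern from the listing matches only a bare <author> tag.
$strictHits = preg_match_all('/<author>(.*?)<\/author>/', $xml, $strict);

// A looser pattern that tolerates attributes; still not a real parser.
$looseHits = preg_match_all('/<author\b[^>]*>(.*?)<\/author>/s', $xml, $loose);

// The DOM handles the variation without any pattern changes.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domAuthor = $doc->getElementsByTagName('author')->item(0)->nodeValue;

echo "strict regex matches: $strictHits\n";
echo "loose regex matches:  $looseHits ({$loose[1][0]})\n";
echo "DOM result:           $domAuthor\n";
```

Each new variation in the input (comments, CDATA sections, namespaces) would require another pattern change, which is the maintenance cost described above.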

Write XML using the DOM

Reading XML is only part of the equation. How do you write XML? The best way to write XML is to use the DOM. Listing 5 shows how to build the books XML file with the DOM.

<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);

$doc = new DOMDocument();
$doc->formatOutput = true;

$r = $doc->createElement( "books" );
$doc->appendChild( $r );

foreach( $books as $book )
{
  $b = $doc->createElement( "book" );

  $author = $doc->createElement( "author" );
  $author->appendChild(
    $doc->createTextNode( $book['author'] )
  );
  $b->appendChild( $author );

  $title = $doc->createElement( "title" );
  $title->appendChild(
    $doc->createTextNode( $book['title'] )
  );
  $b->appendChild( $title );

  $publisher = $doc->createElement( "publisher" );
  $publisher->appendChild(
    $doc->createTextNode( $book['publisher'] )
  );
  $b->appendChild( $publisher );

  $r->appendChild( $b );
}

echo $doc->saveXML();
?>

At the top of the script, the books array is loaded with some example books. This data could come from the user or from a database.

After the example books are loaded, the script creates a new DOMDocument and adds the root books node to it. The script then creates a node for the author, title, and publisher of each book, and adds a text node to each of those nodes. The final step for each book node is to add it to the root books node.

The end of the script uses the saveXML method to output the XML to the console. (You can also use the save method to create an XML file.)
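
One practical consequence of building the document from text nodes is that the DOM takes care of character encoding when it serializes. The short sketch below illustrates this with a publisher name that contains an ampersand; the sample value is an assumption chosen for illustration.

```php
<?php
$doc = new DOMDocument();

$publisher = $doc->createElement('publisher');
// The raw string contains a character that is special in XML.
$publisher->appendChild($doc->createTextNode('Barnes & Noble'));
$doc->appendChild($publisher);

// Passing a node to saveXML serializes just that node.
$fragment = $doc->saveXML($publisher);
echo $fragment, "\n";
// <publisher>Barnes &amp; Noble</publisher>

// Reading the value back returns the original, unescaped string.
echo $publisher->nodeValue, "\n";
// Barnes & Noble
```

The script never escapes anything itself; the ampersand is encoded on output and decoded again when the value is read.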

The real value of using the DOM is that the XML it creates is always well-formed. But what can you do if you cannot use the DOM to create XML?

Write XML in PHP

If the DOM is not available, you can use PHP text templates to write XML. Listing 7 shows how PHP builds the books XML file.

<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
?>
<books>
<?php

foreach( $books as $book )
{
?>
<book>
<title><?php echo( $book['title'] ); ?></title>
<author><?php echo( $book['author'] ); ?>
</author>
<publisher><?php echo( $book['publisher'] ); ?>
</publisher>
</book>
<?php
}
?>
</books>

The top of the script is similar to the DOM script. The bottom of the script opens the books tag, then iterates over each book to create the book tag and all of its inner title, author, and publisher tags.

The problem with this approach is encoding entities. To make sure the entities are encoded correctly, you must call the htmlentities function on each item, as in the following listing.

<books>
<?php

foreach( $books as $book )
{
  $title = htmlentities( $book['title'], ENT_QUOTES );
  $author = htmlentities( $book['author'], ENT_QUOTES );
  $publisher = htmlentities( $book['publisher'], ENT_QUOTES );
?>
<book>
<title><?php echo( $title ); ?></title>
<author><?php echo( $author ); ?> </author>
<publisher><?php echo( $publisher ); ?>
</publisher>
</book>
<?php
}
?>
</books>

This is the annoying part of writing XML in plain PHP. You think you have created perfect XML, and then as soon as you try to use the data, you find that some elements are encoded incorrectly.
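
One caveat with htmlentities is that it also converts characters such as é into HTML named entities like &amp;eacute;, which XML itself does not define. A possible alternative, sketched below, is htmlspecialchars with the ENT_XML1 flag, which escapes only the characters that are special in XML. This is a suggestion rather than the article's method: the xmlEscape helper and the sample data are hypothetical, and the sketch builds the output with string concatenation instead of a template so that the result can be checked with a parser.

```php
<?php
// Hypothetical helper: escape a value for use in XML text or attributes.
function xmlEscape(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Sample data with characters that need escaping and one that does not.
$books = [
    ['title' => 'Q & A <for> PHP', 'publisher' => 'Café Press'],
];

$xml = "<books>\n";
foreach ($books as $book) {
    $xml .= "  <book>\n"
          . '    <title>' . xmlEscape($book['title']) . "</title>\n"
          . '    <publisher>' . xmlEscape($book['publisher']) . "</publisher>\n"
          . "  </book>\n";
}
$xml .= "</books>\n";

echo $xml;

// Round-trip check: a real parser accepts the output and recovers the text.
$doc = new DOMDocument();
$parsed = $doc->loadXML($xml);
$roundTrip = $doc->getElementsByTagName('title')->item(0)->nodeValue;
echo $roundTrip, "\n";
```

Parsing the generated text once, as in the last few lines, is a cheap way to catch encoding mistakes before the data reaches another system.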


There is always a lot of hype and confusion surrounding XML. However, it is not as hard as you might think, especially in a language as good as PHP. Once you understand XML and implement it correctly, you will find many powerful tools at your disposal. XPath and XSLT are two such tools worth studying.
