Reprinted from: http://www.ibm.com/developerworks/cn/opensource/os-xmldomphp/
Introduction: There are many techniques for reading and writing XML in PHP. This article presents three ways to read XML: with the DOM library, with a SAX parser, and with regular expressions. It also describes how to write XML with DOM and with PHP text templates.
Reading and writing Extensible Markup Language (XML) in PHP may seem a little intimidating. XML and all of its related technologies can indeed be intimidating, but reading and writing XML with PHP does not have to be a daunting task. First, you need to learn a little about XML: what it is and what it is used for. Then you need to learn how to read and write XML in PHP, which you can do in several ways.
This article provides a brief introduction to XML and explains how to use PHP to read and write XML.
What is XML?
XML is a data storage format. It does not define what data is stored or how that data is structured. XML only defines tags and the attributes of those tags. A well-formed XML tag looks like this:
<name>Jack Herrington</name>
This <name> tag contains some text: Jack Herrington.
An XML tag that contains no text looks like this:
<powerUp />
There is more than one way to write the same thing in XML. For example, this tag is equivalent to the previous one:
<powerUp></powerUp>
You can also add attributes to an XML tag. For example, this <name> tag includes first and last attributes:
<name first="Jack" last="Herrington" />
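To make attributes concrete, here is a minimal sketch (assuming PHP 7 or later with the standard DOM extension) that parses the <name> example from a string with DOMDocument::loadXML and reads each attribute with DOMElement::getAttribute. Loading from a string rather than a file is only a convenience for the example.

```php
<?php
// Sketch: read the first and last attributes of the <name> example with DOM.
$doc = new DOMDocument();
$doc->loadXML('<name first="Jack" last="Herrington" />');

$name  = $doc->documentElement;         // the <name> element itself
$first = $name->getAttribute('first');  // "Jack"
$last  = $name->getAttribute('last');   // "Herrington"

echo "$first $last\n";
```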
You can also use XML to encode special characters. For example, the & symbol can be encoded like this:
&amp;
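The round trip can be seen in a short sketch, assuming PHP 5.4 or later (for the ENT_XML1 flag): htmlspecialchars encodes the & symbol for XML, and a parser decodes the entity back to the literal character. The string "Tom & Jerry" and the <show> element are hypothetical sample data.

```php
<?php
// Encode a raw string for use as XML text content.
$raw     = 'Tom & Jerry';
$escaped = htmlspecialchars($raw, ENT_XML1 | ENT_QUOTES, 'UTF-8'); // "Tom &amp; Jerry"

// A parser turns the &amp; entity back into a literal "&".
$doc = new DOMDocument();
$doc->loadXML("<show>$escaped</show>");
$decoded = $doc->documentElement->textContent; // "Tom & Jerry"

echo "$escaped\n$decoded\n";
```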
If the tags and attributes in an XML file are formatted as in these examples, the file is well-formed: its tags are balanced and its characters are correctly encoded. Listing 1 is an example of well-formed XML.
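Well-formedness is something a parser can check mechanically. The sketch below assumes the DOM extension is available; isWellFormed is a hypothetical helper name. It tries to load a string with DOMDocument::loadXML, collecting libxml errors internally so a failure returns false instead of printing warnings.

```php
<?php
// Hypothetical helper: return true if $xml parses as a well-formed document.
function isWellFormed(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok !== false;
}

var_dump(isWellFormed('<books><book><title>PHP Hacks</title></book></books>')); // balanced tags
var_dump(isWellFormed('<books><book><title>PHP Hacks</book></books>'));          // unclosed <title>
```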
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
The XML in Listing 1 contains a list of books. The parent <books> tag contains a set of <book> tags, and each <book> tag contains <author>, <title>, and <publisher> tags.
An XML document is valid when its tag structure and content have been validated against an external schema file. Schema files can be written in different formats. For this article, all we need is well-formed XML.
If XML looks a lot like HTML, that is because it is. XML and HTML are both tag-based languages with many similarities. However, note that although an XML document can be well-formed HTML, not every HTML document is well-formed XML. The line break tag (br) is a good example of the difference between XML and HTML. This line break tag is valid HTML but not well-formed XML:
<p>This is a paragraph<br>
with a line break</p>
This line break tag is both well-formed XML and valid HTML:
<p>This is a paragraph<br />
with a line break</p>
To write HTML that is also well-formed XML, follow the W3C's Extensible HyperText Markup Language (XHTML) standard (see References). All modern browsers can render XHTML. In addition, you can use XML tools to read XHTML and find data in the document, which is much easier than parsing HTML.
Use the DOM library to read XML
The easiest way to read a well-formed XML file is to use the Document Object Model (DOM) library that comes with many PHP installations. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as shown in Figure 1.
Figure 1. XML DOM tree for the books XML
The top of the tree is the books node, which has two book child tags. Each book contains author, publisher, and title nodes. Each author, publisher, and title node has a single text child node.
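Because Figure 1 is only a picture, the following sketch prints the same tree structure as text. It assumes the DOM extension; dumpTree is a hypothetical helper that walks childNodes recursively and labels text nodes. Setting preserveWhiteSpace to false before loading removes the whitespace-only text nodes produced by indentation, so the output matches the element-and-text structure described above.

```php
<?php
// Hypothetical helper: print a DOM tree, one node per line, indented by depth.
function dumpTree(DOMNode $node, int $depth = 0): string
{
    $pad = str_repeat('  ', $depth);
    if ($node->nodeType === XML_TEXT_NODE) {
        return $pad . 'text: ' . $node->nodeValue . "\n";
    }
    $out = $pad . $node->nodeName . "\n";
    foreach ($node->childNodes as $child) {
        $out .= dumpTree($child, $depth + 1);
    }
    return $out;
}

$doc = new DOMDocument();
// Must be set before loading: drops whitespace-only text nodes between tags.
$doc->preserveWhiteSpace = false;
$doc->loadXML("<books>\n  <book>\n    <author>Jack Herrington</author>\n"
    . "    <title>PHP Hacks</title>\n    <publisher>O'Reilly</publisher>\n"
    . "  </book>\n</books>");

$tree = dumpTree($doc->documentElement);
echo $tree;
```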
Listing 2 shows code that reads the books XML file with DOM and displays its contents.
<?php
$doc = new DOMDocument();
$doc->load( 'books.xml' );

$books = $doc->getElementsByTagName( "book" );
foreach( $books as $book )
{
  $authors = $book->getElementsByTagName( "author" );
  $author = $authors->item(0)->nodeValue;

  $publishers = $book->getElementsByTagName( "publisher" );
  $publisher = $publishers->item(0)->nodeValue;

  $titles = $book->getElementsByTagName( "title" );
  $title = $titles->item(0)->nodeValue;

  echo "$title - $author - $publisher\n";
}
?>
The script first creates a new DOMDocument object and uses the load method to load the books XML into it. It then uses the getElementsByTagName method to get a list of all elements with the specified name.
Inside the loop over the book nodes, the script uses getElementsByTagName again to get the nodeValue of the author, publisher, and title tags. nodeValue is the text inside the node. The script then displays these values.
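The same approach can be packaged as a reusable function. This is a sketch under a few assumptions: the XML arrives as a string (loaded with loadXML instead of load), readBooks is a hypothetical name, and a book may lack one of the three elements. Listing 2 calls item(0) unconditionally, which returns null for a missing element, so this version checks the list length first.

```php
<?php
// Hypothetical helper: read every <book> into an author/title/publisher record.
function readBooks(string $xml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $books = [];
    foreach ($doc->getElementsByTagName('book') as $book) {
        $record = [];
        foreach (['author', 'title', 'publisher'] as $field) {
            $nodes = $book->getElementsByTagName($field);
            // item(0) would be null when a book lacks this element.
            $record[$field] = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
        }
        $books[] = $record;
    }
    return $books;
}

// The second book deliberately has no <publisher> (hypothetical input).
$xml = '<books><book><author>Jack Herrington</author><title>PHP Hacks</title>'
     . '<publisher>O\'Reilly</publisher></book>'
     . '<book><author>Jack Herrington</author><title>Podcasting Hacks</title></book></books>';

foreach (readBooks($xml) as $b) {
    echo $b['title'] . ' - ' . $b['author'] . ' - ' . ($b['publisher'] ?? '(none)') . "\n";
}
```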
Read XML with a SAX Parser
Another way to read XML is to use a Simple API for XML (SAX) parser. Most PHP installations include a SAX parser. A SAX parser works on a callback model: every time it opens or closes a tag, or sees text, it calls your user-defined functions with information about the node or text.
The advantage of a SAX parser is that it is truly lightweight. The parser does not keep content in memory for long, so it can be used on very large files. The disadvantage is that writing SAX parser callbacks is tedious. Listing 3 shows code that uses SAX to read the books XML file and display its contents.
<?php
$g_books = array();
$g_elem = null;

// Called for each opening tag. Names arrive uppercased by default.
function startElement( $parser, $name, $attrs )
{
  global $g_books, $g_elem;
  if ( $name == 'BOOK' ) $g_books []= array();
  $g_elem = $name;
}

// Called for each closing tag.
function endElement( $parser, $name )
{
  global $g_elem;
  $g_elem = null;
}

// Called for text. One text node may arrive in several pieces
// (for example around &amp; or a 4096-byte chunk boundary), so append.
function textData( $parser, $text )
{
  global $g_books, $g_elem;
  if ( $g_elem == 'AUTHOR' ||
       $g_elem == 'PUBLISHER' ||
       $g_elem == 'TITLE' )
  {
    $i = count( $g_books ) - 1;
    if ( !isset( $g_books[ $i ][ $g_elem ] ) ) $g_books[ $i ][ $g_elem ] = '';
    $g_books[ $i ][ $g_elem ] .= $text;
  }
}

$parser = xml_parser_create();

xml_set_element_handler( $parser, "startElement", "endElement" );
xml_set_character_data_handler( $parser, "textData" );

$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) )
{
  xml_parse( $parser, $data );
}
xml_parse( $parser, '', true ); // tell the parser the input is complete
fclose( $f );

xml_parser_free( $parser );

foreach( $g_books as $book )
{
  echo $book['TITLE']." - ".$book['AUTHOR']." - ";
  echo $book['PUBLISHER']."\n";
}
?>
The script first sets up the g_books array, which holds all the books and their information in memory, and the g_elem variable, which stores the name of the tag the script is currently processing. The script then defines the callback functions, which in this example are startElement, endElement, and textData. The startElement and endElement functions are called when a tag opens and closes, and textData is called for the text between them.
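One detail of the callback model is easy to miss: the parser is not obliged to deliver a tag's text in a single textData call. The sketch below (assuming PHP's bundled xml extension, with a hypothetical <title> value) records every chunk passed to the character data handler. Joining the chunks always gives the full text, which is why a robust handler appends to the stored value rather than overwriting it. The exact number of chunks depends on the parser, so the example prints it instead of assuming it.

```php
<?php
// Record every chunk the character data handler receives.
$chunks = [];
$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($p, string $text) use (&$chunks) {
    $chunks[] = $text;
});
xml_parse($parser, '<title>Tom &amp; Jerry</title>', true);
xml_parser_free($parser);

// The text may arrive in several pieces, so handlers should append, not assign.
$text = implode('', $chunks);
echo count($chunks) . " chunk(s): $text\n";
```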
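In this example, startElement looks for the book tag and starts a new element in the books array. The textData function then checks whether the current element is a publisher, title, or author tag. If it is, the function puts the current text into the current book. The comparisons use uppercase names such as BOOK because a parser created by xml_parser_create folds element names to uppercase by default.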
The script creates the parser with the xml_parser_create function and then sets the callback handlers. Next, it reads the file and feeds it to the parser in chunks. After the file has been read, the xml_parser_free function releases the parser. Finally, the script prints the contents of the g_books array.
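The global variables in Listing 3 are not required by the xml extension; they are just the simplest way to share state between callbacks. As a design alternative, here is a sketch (assuming PHP 7 or later; saxReadBooks is a hypothetical name) that passes closures as handlers and captures the state by reference. It also parses the whole string as the final chunk and reports a parse error instead of silently returning partial data.

```php
<?php
// Hypothetical helper: the Listing 3 logic with closures instead of globals.
function saxReadBooks(string $xml): array
{
    $books = [];
    $elem  = null;
    // Case folding is on by default, so element names arrive in uppercase.
    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$books, &$elem) {
            if ($name === 'BOOK') {
                $books[] = [];  // start a new book record
            }
            $elem = $name;
        },
        function ($p, string $name) use (&$elem) {
            $elem = null;
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($p, string $text) use (&$books, &$elem) {
            if ($books !== [] && in_array($elem, ['AUTHOR', 'TITLE', 'PUBLISHER'], true)) {
                $i = count($books) - 1;
                // Append, because one text node can arrive in several chunks.
                $books[$i][$elem] = ($books[$i][$elem] ?? '') . $text;
            }
        }
    );

    $ok = xml_parse($parser, $xml, true);  // true marks this as the final chunk
    $error = $ok ? '' : xml_error_string(xml_get_error_code($parser));
    xml_parser_free($parser);

    if (!$ok) {
        throw new RuntimeException("XML parse error: $error");
    }
    return $books;
}

$xml = "<books><book><author>Jack Herrington</author><title>PHP Hacks</title>"
     . "<publisher>O'Reilly</publisher></book></books>";
foreach (saxReadBooks($xml) as $book) {
    echo $book['TITLE'] . ' - ' . $book['AUTHOR'] . ' - ' . $book['PUBLISHER'] . "\n";
}
```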
As you can see, this is much harder than writing the same functionality with DOM. But what if neither the DOM library nor the SAX library is available? Is there an alternative?
Parse XML using regular expressions
Some engineers may frown on this approach, but you can parse XML with regular expressions. Listing 4 shows an example that reads the books file using the preg_ functions.
<?php
$xml = "";
$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) ) { $xml .= $data; }
fclose( $f );

preg_match_all( "/\<book\>(.*?)\<\/book\>/s",
  $xml, $bookblocks );

foreach( $bookblocks[1] as $block )
{
  preg_match_all( "/\<author\>(.*?)\<\/author\>/",
    $block, $author );
  preg_match_all( "/\<title\>(.*?)\<\/title\>/",
    $block, $title );
  preg_match_all( "/\<publisher\>(.*?)\<\/publisher\>/",
    $block, $publisher );
  echo( $title[1][0]." - ".$author[1][0]." - ".
    $publisher[1][0]."\n" );
}
?>
Notice how short the code is. It starts by reading the file into one large string. It then uses the preg_match_all regular expression function to extract each book block. Finally, a foreach loop iterates over the book blocks and extracts the author, title, and publisher from each one.
So where is the catch? The problem with reading XML with regular expressions is that the code never checks that the XML is well-formed, so you cannot know whether it is before you read it. In addition, some well-formed XML may not match the regular expressions, which means you will have to modify them later.
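A small, hypothetical input shows both failure modes. The sketch below (assuming PHP with the DOM extension) feeds the book pattern from Listing 4 a well-formed document whose <book> tag carries an id attribute and whose publisher uses the predefined &apos; entity. The regular expression finds no books at all, while DOM reads the book and decodes the entity.

```php
<?php
// Well-formed XML that differs only in ways the Listing 4 patterns do not expect.
$xml = '<books><book id="1"><author>Jack Herrington</author>'
     . '<title>PHP Hacks</title><publisher>O&apos;Reilly</publisher></book></books>';

// The Listing 4 pattern requires a bare <book> tag, so the attribute defeats it.
$regexCount = preg_match_all('/<book>(.*?)<\/book>/s', $xml, $blocks);

// A real parser handles both the attribute and the &apos; entity.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domCount  = $doc->getElementsByTagName('book')->length;
$publisher = $doc->getElementsByTagName('publisher')->item(0)->nodeValue;

echo "regex: $regexCount, DOM: $domCount, publisher: $publisher\n";
```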
I never recommend using regular expressions to read XML, but sometimes it is the most portable option, because the regular expression functions are always available. Do not use regular expressions to read XML that comes directly from users, because you cannot control the format or structure of such XML. Always use the DOM library or a SAX parser to read XML from users.
Write XML using DOM
Reading XML is only part of the equation. What about writing XML? The best way to write XML is with DOM. Listing 5 shows how DOM builds the books XML file.
<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);

$doc = new DOMDocument();
$doc->formatOutput = true;

$r = $doc->createElement( "books" );
$doc->appendChild( $r );

foreach( $books as $book )
{
  $b = $doc->createElement( "book" );

  $author = $doc->createElement( "author" );
  $author->appendChild(
    $doc->createTextNode( $book['author'] )
  );
  $b->appendChild( $author );

  $title = $doc->createElement( "title" );
  $title->appendChild(
    $doc->createTextNode( $book['title'] )
  );
  $b->appendChild( $title );

  $publisher = $doc->createElement( "publisher" );
  $publisher->appendChild(
    $doc->createTextNode( $book['publisher'] )
  );
  $b->appendChild( $publisher );

  $r->appendChild( $b );
}

echo $doc->saveXML();
?>
At the top of the script, the books array is loaded with some example books. In practice, this data could come from the user or from a database.
After the example books are loaded, the script creates a new DOMDocument and adds the root books node to it. The script then creates an element for the author, title, and publisher of each book and adds a text node to each element. The final step for each book node is to append it to the root books node.
The saveXML method outputs the XML to the console. (You can also use the save method to write the XML to a file.)
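For the file variant, a minimal sketch (assuming the DOM extension and a writable system temp directory; the file name books-example.xml is hypothetical) builds a one-book document, writes it with DOMDocument::save, which returns the number of bytes written or false on failure, and then loads the file back to confirm it is readable XML.

```php
<?php
// Build a tiny document and write it to a file with save() instead of saveXML().
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('books'));
$book = $root->appendChild($doc->createElement('book'));
$book->appendChild($doc->createElement('title'))
     ->appendChild($doc->createTextNode('PHP Hacks'));

// save() returns the number of bytes written, or false on failure.
$path  = sys_get_temp_dir() . '/books-example.xml';
$bytes = $doc->save($path);

// Load the file back to confirm it parses, then clean up.
$check = new DOMDocument();
$check->load($path);
$title = $check->getElementsByTagName('title')->item(0)->nodeValue;
echo $title, "\n";
unlink($path);
```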
The real value of using DOM is that the XML it creates is always well-formed. But what if you cannot use DOM to create XML?
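Part of that guarantee is automatic escaping. In the sketch below (assuming the DOM extension; the publisher name is a hypothetical value chosen to contain & and <), the raw text goes into createTextNode unchanged, and saveXML writes it out with the special characters encoded, so the result parses back to the original string.

```php
<?php
// Text passed to createTextNode is escaped automatically on output.
$doc  = new DOMDocument();
$root = $doc->appendChild($doc->createElement('publisher'));
$root->appendChild($doc->createTextNode('Smith & Sons <Books>'));

$xml = $doc->saveXML($root);  // serialize just the <publisher> element
echo $xml, "\n";
```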
Write XML in PHP
If DOM is not available, you can write XML with PHP text templates. Listing 7 shows how to build the books XML file with PHP.
<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
?>
<books>
<?php
foreach( $books as $book )
{
?>
  <book>
  <title><?php echo( $book['title'] ); ?></title>
  <author><?php echo( $book['author'] ); ?></author>
  <publisher><?php echo( $book['publisher'] ); ?></publisher>
  </book>
<?php
}
?>
</books>
The top of the script is similar to the DOM script. The bottom of the script opens the books tag and then iterates over each book, creating the book tag and its inner title, author, and publisher tags.
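The problem with this approach is entity encoding. To make sure special characters are encoded correctly, you must pass each value through the htmlspecialchars function before echoing it. (htmlentities is a poor fit here: it can emit HTML named entities such as &eacute; that XML does not define.)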
<books>
<?php
foreach( $books as $book )
{
  $title = htmlspecialchars( $book['title'], ENT_QUOTES | ENT_XML1, 'UTF-8' );
  $author = htmlspecialchars( $book['author'], ENT_QUOTES | ENT_XML1, 'UTF-8' );
  $publisher = htmlspecialchars( $book['publisher'], ENT_QUOTES | ENT_XML1, 'UTF-8' );
?>
  <book>
  <title><?php echo( $title ); ?></title>
  <author><?php echo( $author ); ?></author>
  <publisher><?php echo( $publisher ); ?></publisher>
  </book>
<?php
}
?>
</books>
This is the annoyance of writing XML with PHP templates. You think you have created perfect XML, but as soon as you try to use the data, you find that some elements are encoded incorrectly.
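One way to catch those encoding mistakes early is to render the template into a string and check it with a parser before anyone else uses it. The sketch below assumes PHP 5.4 or later with the DOM extension; renderBooks and xmlText are hypothetical helper names, and the title "Perl & PHP Hacks" is hypothetical data chosen to contain an ampersand. Output buffering (ob_start and ob_get_clean) captures what the template would otherwise send to the client.

```php
<?php
// Hypothetical helper: escape a value for use as XML text content.
function xmlText(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Hypothetical helper: render the books template into a string.
function renderBooks(array $books): string
{
    ob_start(); ?>
<books>
<?php foreach ($books as $book) { ?>
  <book>
    <title><?php echo xmlText($book['title']); ?></title>
    <author><?php echo xmlText($book['author']); ?></author>
    <publisher><?php echo xmlText($book['publisher']); ?></publisher>
  </book>
<?php } ?>
</books>
<?php
    return ob_get_clean();
}

// Check the rendered XML with DOM before handing it to anyone else.
$xml = renderBooks([
    ['title' => 'Perl & PHP Hacks', 'author' => 'Jack Herrington', 'publisher' => "O'Reilly"],
]);
$check = new DOMDocument();
$wellFormed = $check->loadXML($xml) !== false;
echo $wellFormed ? "well-formed\n" : "broken\n";
```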
Conclusion
There is a lot of hype and confusion surrounding XML. However, it is not as hard as you might think, especially in a language as capable as PHP. Once you understand XML and implement it correctly, you will find many powerful tools at your disposal. XPath and XSLT are two that are well worth studying.