Using PHP to read and write the XML DOM
Many techniques can be used to read and write XML with PHP. This article presents three methods for reading XML: using the DOM library, using the SAX parser, and using regular expressions. It also describes how to write XML using DOM and PHP text templates.
Reading and writing Extensible Markup Language (XML) in PHP may seem a little intimidating. In fact, XML and all its related technologies can be intimidating, but reading and writing XML with PHP does not have to be a daunting task. First, you need to learn a little about XML: what it is and what it is used for. Then, you need to learn how to read and write XML in PHP, which can be done in several ways.
This article provides a brief introduction to XML and explains how to use PHP to read and write XML.
What is XML?
XML is a data storage format. It does not define what data is stored or how that data is structured. XML simply defines tags and the attributes of those tags. Well-formed XML markup looks like this:
<name>Jack Herrington</name>
This <name> tag contains some text: Jack Herrington.
XML tags that do not contain text look like this:
<powerUp />
There is more than one way to write the same thing in XML. For example, this tag is equivalent to the previous one:
<powerUp></powerUp>
You can also add attributes to an XML tag. For example, this <name> tag has first and last attributes:
<name first="Jack" last="Herrington" />
You can also use XML to encode special characters. For example, the & symbol is encoded like this:
&amp;
An XML file whose tags and attributes are formatted as in these examples is well-formed, which means that the tags are balanced and the characters are encoded correctly. Listing 1 is an example of well-formed XML.
Listing 1. XML book list example
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
The XML in Listing 1 contains a list of books. The parent <books> tag contains a set of <book> tags. Each <book> tag contains <author>, <title>, and <publisher> tags.
An XML document is valid when its tag structure and content have been verified against an external schema file. Schema files can be specified in several formats. For this article, well-formed XML is all we need.
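A quick way to see what well-formed means in practice is to let a parser decide. The following is a minimal sketch, assuming the DOM extension is installed; the helper name isWellFormed is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: returns true when $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

// Balanced tags: well-formed.
var_dump(isWellFormed('<books><book><title>PHP Hacks</title></book></books>'));

// The <title> tag is never closed: not well-formed.
var_dump(isWellFormed('<books><book><title>PHP Hacks</book></books>'));
```

This only checks well-formedness. It does not validate the document against a schema.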
If XML looks a lot like HTML, that is no accident. XML and HTML are both tag-based languages and have many similarities. However, it is important to note that although a well-formed XML document can also be a well-formed HTML document, not all HTML documents are well-formed XML. The line break tag (br) is a good example of the difference between XML and HTML. This line break markup is well-formed HTML, but not well-formed XML:
<p>This is a paragraph<br>
With a line break</p>
This line break markup is well-formed XML and HTML:
<p>This is a paragraph<br />
With a line break</p>
To write HTML that is also well-formed XML, follow the W3C's Extensible HyperText Markup Language (XHTML) standard (see references). All modern browsers can render XHTML. In addition, you can use XML tools to read XHTML and find data in the document, which is much easier than parsing HTML.
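As a rough illustration of reading XHTML with XML tools, the sketch below loads a small XHTML fragment with the DOM library and collects the link targets. The fragment and the example.com URL are made up for the example.

```php
<?php
// A small, made-up XHTML document. Because it is well-formed XML,
// an XML parser can read it directly.
$xhtml = <<<'XHTML'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>This is a paragraph<br/>
    With a <a href="http://example.com/">link</a></p>
  </body>
</html>
XHTML;

$doc = new DOMDocument();
$doc->loadXML($xhtml);

// Collect the href attribute of every <a> element.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```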
Use the DOM library to read XML
The easiest way to read a well-formed XML file is to use the Document Object Model (DOM) library that is compiled into some PHP installations. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as shown in Figure 1.
Figure 1. XML DOM tree for the book XML
The books node at the top of the tree has two book child nodes. Each book has several child nodes: author, publisher, and title. The author, publisher, and title nodes each contain a text child node that holds the text.
The code that reads the book XML file and displays its content using DOM is shown in Listing 2.
Listing 2. Reading the book XML with DOM
<?php
// Load the whole document into a DOM tree.
$doc = new DOMDocument();
$doc->load( 'books.xml' );

// Visit every <book> element.
$books = $doc->getElementsByTagName( "book" );
foreach( $books as $book )
{
  // Take the text of the first matching child of each kind.
  $authors = $book->getElementsByTagName( "author" );
  $author = $authors->item(0)->nodeValue;

  $publishers = $book->getElementsByTagName( "publisher" );
  $publisher = $publishers->item(0)->nodeValue;

  $titles = $book->getElementsByTagName( "title" );
  $title = $titles->item(0)->nodeValue;

  echo "$title - $author - $publisher\n";
}
?>
The script first creates a new DOMDocument object and loads the book XML into it using the load method. The script then uses the getElementsByTagName method to get a list of all the elements with the given name.
Inside the loop over the book nodes, the script uses the getElementsByTagName method to get the nodeValue of the author, publisher, and title tags. The nodeValue is the text within the node. The script then displays those values.
You can run the PHP script like this on the command line:
% php e1.php
PHP Hacks - Jack Herrington - O'Reilly
Podcasting Hacks - Jack Herrington - O'Reilly
%
As you can see, the script prints one line for each book block. That is a good start. But what if you do not have access to the XML DOM library?
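Before choosing a technique, it can help to check what a given PHP build provides. The following sketch reports whether the facilities used in this article are present; the $available array is just an illustrative name.

```php
<?php
// Report which XML-related facilities this PHP build provides.
$available = [
    'dom'   => class_exists('DOMDocument'),          // DOM library
    'sax'   => function_exists('xml_parser_create'), // SAX-style parser
    'regex' => function_exists('preg_match_all'),    // preg_ functions
];

foreach ($available as $name => $present) {
    echo $name . ': ' . ($present ? 'available' : 'missing') . "\n";
}
```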
Read XML with a SAX parser
Another way to read XML is to use the Simple API for XML (SAX) parser. Most PHP installations include the SAX parser. The SAX parser runs on a callback model. Each time a tag is opened or closed, or each time the parser sees some text, it calls back user-defined functions with the node or text information.
The advantage of the SAX parser is that it is truly lightweight. The parser does not hold anything in memory for long, so it can be used for very large files. The disadvantage is that writing SAX parser callbacks is cumbersome. Listing 3 shows the code that reads the book XML file and displays its content using SAX.
Listing 3. Reading the book XML with the SAX parser
<?php
$g_books = array();  // all books read so far
$g_elem = null;      // name of the tag currently being processed

// Called when a tag opens. Tag names arrive in upper case by default.
function startElement( $parser, $name, $attrs )
{
  global $g_books, $g_elem;
  if ( $name == 'BOOK' ) $g_books []= array();
  $g_elem = $name;
}

// Called when a tag closes.
function endElement( $parser, $name )
{
  global $g_elem;
  $g_elem = null;
}

// Called for the text between tags.
function textData( $parser, $text )
{
  global $g_books, $g_elem;
  if ( $g_elem == 'AUTHOR' ||
       $g_elem == 'PUBLISHER' ||
       $g_elem == 'TITLE' )
  {
    $g_books[ count( $g_books ) - 1 ][ $g_elem ] = $text;
  }
}

$parser = xml_parser_create();
xml_set_element_handler( $parser, "startElement", "endElement" );
xml_set_character_data_handler( $parser, "textData" );

// Feed the file to the parser in 4 KB chunks.
$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) )
{
  // The third argument tells the parser when the last chunk arrives.
  xml_parse( $parser, $data, feof( $f ) );
}
fclose( $f );

// Releases the parser (has no effect as of PHP 8.0).
xml_parser_free( $parser );

foreach( $g_books as $book )
{
  echo $book['TITLE']." - ".$book['AUTHOR']." - ";
  echo $book['PUBLISHER']."\n";
}
?>
The script first sets up the g_books array, which holds all the books and book information in memory, and the g_elem variable, which stores the name of the tag the script is currently processing. The script then defines the callback functions. In this example, the callback functions are startElement, endElement, and textData. The startElement and endElement functions are called when a tag is opened and closed, respectively. The textData function is called for the text between the start and end tags.
In this example, the startElement function looks for the book tag and starts a new element in the books array. The textData function then checks the current element to see whether it is a publisher, title, or author tag. If it is, the function puts the current text into the current book.
To run the parse, the script creates a parser with the xml_parser_create function. It then sets the callback handlers. Next, the script reads the file and sends it to the parser in chunks. After the file has been read, the xml_parser_free function deletes the parser. The end of the script prints the contents of the g_books array.
As you can see, this is much harder than writing the same functionality with DOM. What if you have neither the DOM library nor the SAX library? Is there an alternative?
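One way to make the callbacks a little less awkward is to keep the parsing state in an object rather than in global variables. This is only a sketch of that alternative, not the approach used in Listing 3. The BookHandler class and its method names are hypothetical, and the XML is supplied inline so that the example is self-contained.

```php
<?php
// Hypothetical class: gathers book records without global variables.
class BookHandler
{
    public array $books = [];
    private ?string $elem = null;

    public function start($parser, string $name, array $attrs): void
    {
        if ($name === 'BOOK') {
            $this->books[] = [];
        }
        $this->elem = $name;
    }

    public function end($parser, string $name): void
    {
        $this->elem = null;
    }

    public function text($parser, string $text): void
    {
        if (in_array($this->elem, ['AUTHOR', 'PUBLISHER', 'TITLE'], true)) {
            $last = count($this->books) - 1;
            // Append, because the parser may deliver one run of text
            // in several pieces.
            $this->books[$last][$this->elem] =
                ($this->books[$last][$this->elem] ?? '') . $text;
        }
    }
}

$xml = '<books><book><author>Jack Herrington</author>'
     . '<title>PHP Hacks</title>'
     . '<publisher>O\'Reilly</publisher></book></books>';

$handler = new BookHandler();
$parser = xml_parser_create();

// Handlers can be given as [object, method name] callables.
xml_set_element_handler($parser, [$handler, 'start'], [$handler, 'end']);
xml_set_character_data_handler($parser, [$handler, 'text']);

// The whole document is passed at once, so this is the final chunk.
xml_parse($parser, $xml, true);

print_r($handler->books);
```

The callback logic is the same as before. The only change is where the state lives, which makes it easier to run more than one parse in the same script.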
Parse XML using regular expressions
Some engineers may criticize this method, but you can use regular expressions to parse XML. Listing 4 shows an example that uses the preg_ functions to read the book file.
Listing 4. Reading XML with regular expressions
<?php
// Read the whole file into one string.
$xml = "";
$f = fopen( 'books.xml', 'r' );
while( $data = fread( $f, 4096 ) ) { $xml .= $data; }
fclose( $f );

// Capture the content of each <book> block. The s modifier lets . match newlines.
preg_match_all( "/\<book\>(.*?)\<\/book\>/s", $xml, $bookblocks );

foreach( $bookblocks[1] as $block )
{
  // Pull the three fields out of the block.
  preg_match_all( "/\<author\>(.*?)\<\/author\>/", $block, $author );
  preg_match_all( "/\<title\>(.*?)\<\/title\>/", $block, $title );
  preg_match_all( "/\<publisher\>(.*?)\<\/publisher\>/", $block, $publisher );
  echo( $title[1][0]." - ".$author[1][0]." - ".$publisher[1][0]."\n" );
}
?>
Notice how short this code is. It starts by reading the file into one large string. It then uses one regular expression function to read each book item. Finally, it uses a foreach loop to walk through each book block and extract the author, title, and publisher.
So what is the drawback? The problem with using regular expression code to read XML is that it does not first check that the XML is well-formed. This means you cannot know whether the XML is well-formed before you read it. In addition, some well-formed XML may not match your regular expressions, so you will have to modify them later.
I never recommend using regular expressions to read XML, but sometimes it is the most compatible approach, because the regular expression functions are always available. Do not use regular expressions to read XML that comes directly from users, because you cannot control the format or structure of that XML. You should always use the DOM library or the SAX parser to read XML from users.
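To make the drawback concrete, the sketch below runs the book pattern from Listing 4 against well-formed XML in which one book tag carries an attribute. The id attribute is invented for this example. The regular expression silently skips that book, while the DOM library finds both.

```php
<?php
// Well-formed XML, but the first <book> tag carries an attribute.
$xml = '<books>'
     . '<book id="1"><title>PHP Hacks</title></book>'
     . '<book><title>Podcasting Hacks</title></book>'
     . '</books>';

// The same book pattern used in Listing 4.
preg_match_all("/\<book\>(.*?)\<\/book\>/s", $xml, $bookblocks);
$regexCount = count($bookblocks[1]);

// A real parser sees both books.
$doc = new DOMDocument();
$doc->loadXML($xml);
$domCount = $doc->getElementsByTagName('book')->length;

echo "regex: $regexCount, DOM: $domCount\n";
// prints "regex: 1, DOM: 2"
```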
Write XML using DOM
Reading XML is only part of the equation. What about writing XML? The best way to write XML is with DOM. Listing 5 shows how DOM builds the book XML file.
Listing 5. Writing XML with DOM
<?php
// Example data. This could come from the user or from a database.
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);

$doc = new DOMDocument();
$doc->formatOutput = true;  // indent the output

// Create the root <books> node.
$r = $doc->createElement( "books" );
$doc->appendChild( $r );

foreach( $books as $book )
{
  $b = $doc->createElement( "book" );

  $author = $doc->createElement( "author" );
  $author->appendChild( $doc->createTextNode( $book['author'] ) );
  $b->appendChild( $author );

  $title = $doc->createElement( "title" );
  $title->appendChild( $doc->createTextNode( $book['title'] ) );
  $b->appendChild( $title );

  $publisher = $doc->createElement( "publisher" );
  $publisher->appendChild( $doc->createTextNode( $book['publisher'] ) );
  $b->appendChild( $publisher );

  // Attach the finished <book> node to the root.
  $r->appendChild( $b );
}

echo $doc->saveXML();
?>
At the top of the script, some example books are loaded into the books array. This data could come from the user or from a database.
After the example books are loaded, the script creates a new DOMDocument and adds the root books node to it. Then, for each book, the script creates a book node with author, title, and publisher nodes, and adds a text node to each of them. The last step for each book node is to append it to the root books node.
At the end of the script, the saveXML method outputs the XML to the console. (You can also use the save method to create an XML file.) The output of the script is shown in Listing 6.
Listing 6. DOM build script output
% php e4.php
<?xml version="1.0"?>
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
%
The real value of using DOM is that the XML it creates is always well-formed. But what can you do if you cannot use DOM to create XML?
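One reason DOM output stays well-formed is that text nodes are escaped when the document is serialized. The sketch below, which uses a made-up title containing special characters, shows this for a single element.

```php
<?php
$doc = new DOMDocument();

// A made-up title containing characters that are special in XML.
$title = $doc->createElement('title');
$title->appendChild($doc->createTextNode('Tom & Jerry <uncut>'));
$doc->appendChild($title);

// Serialize just this element, without the XML declaration.
$xml = $doc->saveXML($title);
echo $xml, "\n";
```

In the output, the ampersand and the angle brackets appear as entities, so the text cannot be mistaken for markup.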
Write XML in PHP
If DOM is not available, you can use PHP text templates to write XML. Listing 7 shows how PHP builds the book XML file.
Listing 7. Writing the book XML with PHP
<?php
$books = array();
$books [] = array(
  'title' => 'PHP Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
$books [] = array(
  'title' => 'Podcasting Hacks',
  'author' => 'Jack Herrington',
  'publisher' => "O'Reilly"
);
?>
<books>
<?php
foreach( $books as $book )
{
?>
  <book>
    <title><?php echo( $book['title'] ); ?></title>
    <author><?php echo( $book['author'] ); ?></author>
    <publisher><?php echo( $book['publisher'] ); ?></publisher>
  </book>
<?php
}
?>
</books>
The top of the script is similar to the DOM script. The bottom of the script opens the books tag, then iterates over each book, creating the book tag and all of the inner title, author, and publisher tags.
The problem with this approach is encoding entities. To make sure that entities are encoded properly, you must call the htmlentities function on each item, as shown in Listing 8.
Listing 8. Using the htmlentities function to encode entities
<books>
<?php
foreach( $books as $book )
{
  $title = htmlentities( $book['title'], ENT_QUOTES );
  $author = htmlentities( $book['author'], ENT_QUOTES );
  $publisher = htmlentities( $book['publisher'], ENT_QUOTES );
?>
  <book>
    <title><?php echo( $title ); ?></title>
    <author><?php echo( $author ); ?></author>
    <publisher><?php echo( $publisher ); ?></publisher>
  </book>
<?php
}
?>
</books>
This is where writing XML in plain PHP becomes annoying. You think you have created perfect XML, but as soon as you try to use the data, you find that some elements are not encoded correctly.
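One caveat with htmlentities is that it can emit HTML named entities, such as &eacute;, and XML predefines only five entities (amp, lt, gt, apos, and quot). A possible alternative, sketched below, is htmlspecialchars with the ENT_XML1 flag, which escapes only the characters that are special in XML. The xmlText helper name is hypothetical, and the sketch assumes UTF-8 input.

```php
<?php
// Hypothetical helper: escape a string for use as XML element text.
// Only &, <, >, and quote characters are converted.
function xmlText(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// A made-up title with an accented letter and special characters.
$title = 'Café & Bar <Guide>';
echo '<title>' . xmlText($title) . "</title>\n";
```

The accented letter passes through unchanged, which is fine as long as the document is served in the same encoding.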
Conclusion
There is always a lot of hype and confusion around XML. However, it is not as difficult as you might think, especially in a language as good as PHP. Once you understand XML and implement it correctly, you will find many powerful tools at your disposal. XPath and XSLT are two such tools that are worth studying.
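As a small taste of XPath, the sketch below uses PHP's DOMXPath class to select every title in a shortened, inline version of the book XML. It is only an illustration of the idea and assumes the DOM extension is available.

```php
<?php
// A shortened copy of the book XML, inlined to keep the example self-contained.
$xml = <<<'XML'
<books>
  <book><author>Jack Herrington</author><title>PHP Hacks</title></book>
  <book><author>Jack Herrington</author><title>Podcasting Hacks</title></book>
</books>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select every <title> under /books/book with a single path expression.
$titles = [];
foreach ($xpath->query('/books/book/title') as $node) {
    $titles[] = $node->nodeValue;
}

print_r($titles);
```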