Reading and writing Extensible Markup Language (XML) in PHP may seem a bit scary. In fact, XML and all of its related technologies can be scary, but reading and writing XML in PHP does not have to be. First, you need to learn a little about XML: what it is and what it is used for. Then you need to learn how to read and write XML in PHP, and there are several ways to do that.
This article provides a brief introduction to XML and then explains how to read and write XML in PHP.
What is XML?
XML is a data storage format. It does not define what data is stored, nor does it define the structure of that data. XML simply defines tags and the attributes of those tags. Well-formed XML tags look like this:
<name>Jack Herrington</name>
This <name> tag contains some text: Jack Herrington.
XML tags that do not contain text look like this:
<powerUp />
There is more than one way to write the same thing in XML. For example, this tag produces the same result as the previous tag:
<powerUp></powerUp>
You can also add attributes to an XML tag. For example, this <name> tag contains first and last attributes:
<name first="Jack" last="Herrington" />
You can also encode special characters in XML. For example, the & symbol is encoded like this:
&amp;
XML files that contain tags and attributes formatted like these examples are well-formed, meaning that the tags are balanced and the characters are encoded correctly. Listing 1 is an example of well-formed XML.
Listing 1. An XML book list example
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
The XML in Listing 1 contains a list of books. The parent tag <books> contains a set of <book> tags, each of which includes <author>, <title>, and <publisher> tags.
An XML document is valid when its markup structure and content have been verified against an external schema file. Schema files can be written in several different formats. For this article, all you need is well-formed XML.
If you think that XML looks like Hypertext Markup Language (HTML), you are right. Both XML and HTML are tag-based languages, and they have many similarities. However, it is important to point out that although an XML document may be well-formed HTML, not all HTML documents are well-formed XML. The line break tag (br) is a good example of the difference between XML and HTML. This line break tag is well-formed HTML, but not well-formed XML:
<p>This is a paragraph<br>
With a line break</p>
This line break tag is well-formed XML and HTML:
<p>This is a paragraph<br/>
With a line break</p>
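The difference between the two paragraphs can be checked mechanically. The following is a minimal sketch, assuming PHP's DOM extension (covered later in this article) is installed; isWellFormed is a hypothetical helper name chosen for illustration, not a built-in function.

```php
<?php
// Returns true when $xml parses as well-formed XML, false otherwise.
// isWellFormed is a hypothetical helper name, not part of PHP.
function isWellFormed(string $xml): bool
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok !== false;
}

$htmlStyle = "<p>This is a paragraph<br>\nWith a line break</p>";
$xmlStyle  = "<p>This is a paragraph<br/>\nWith a line break</p>";

var_dump(isWellFormed($htmlStyle)); // the unclosed <br> is rejected
var_dump(isWellFormed($xmlStyle));  // the self-closed <br/> is accepted
```

The parser rejects the first paragraph because the <br> tag is never closed before </p> appears.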
If you want to write HTML that is also well-formed XML, follow the Extensible Hypertext Markup Language (XHTML) standard from the World Wide Web Consortium (W3C). All modern browsers can render XHTML. In addition, you can read XHTML with XML tools and find the data in the document, which is much easier than parsing HTML.
Reading XML using the DOM library
The easiest way to read a well-formed XML file is to use the Document Object Model (DOM) library that is compiled into some PHP installations. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as shown in Figure 1.
Figure 1. XML DOM tree for book XML
The books node at the top of the tree has two book child nodes. Each book has several child nodes: author, publisher, and title. The author, publisher, and title nodes each have a text child node that contains the text.
The code that reads the book XML file and displays its content using the DOM is shown in Listing 2.
Listing 2. Reading book XML with DOM
<?php
// Load the whole document into memory as a DOM tree.
$doc = new DOMDocument();
$doc->load('books.xml');

// Walk every <book> element and print its title, author, and publisher.
$books = $doc->getElementsByTagName("book");
foreach ($books as $book)
{
    $authors = $book->getElementsByTagName("author");
    $author = $authors->item(0)->nodeValue;

    $publishers = $book->getElementsByTagName("publisher");
    $publisher = $publishers->item(0)->nodeValue;

    $titles = $book->getElementsByTagName("title");
    $title = $titles->item(0)->nodeValue;

    echo "$title - $author - $publisher\n";
}
?>
The script first creates a new DOMDocument object and loads the book XML into it using the load method. The script then uses the getElementsByTagName method to get a list of all the elements with the specified name.
Inside the loop over book nodes, the script uses the getElementsByTagName method to get the nodeValue of the author, publisher, and title tags. The nodeValue is the text in the node. The script then displays these values.
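The same DOM calls also work on XML held in a string, and attributes can be read alongside text. The sketch below is illustrative only: it reuses the <name> tag with first and last attributes from the introduction, and the <people> wrapper element is an assumption added so that the document has a single root.

```php
<?php
// Sample data based on the attribute example earlier in the article;
// the <people> wrapper element is an assumption made for this sketch.
$xml = '<people><name first="Jack" last="Herrington" /></people>';

$doc = new DOMDocument();
$doc->loadXML($xml); // loadXML parses a string; load reads a file

$fullNames = array();
foreach ($doc->getElementsByTagName("name") as $name) {
    // getAttribute returns an empty string when the attribute is missing.
    $fullNames[] = $name->getAttribute("first") . " " . $name->getAttribute("last");
}

echo implode("\n", $fullNames) . "\n";
```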
You can run the script in Listing 2 on the command line like this:
% php e1.php
PHP Hacks - Jack Herrington - O'Reilly
Podcasting Hacks - Jack Herrington - O'Reilly
%
As you can see, one line is output for each book block. This is a good start. But what if you don't have access to the XML DOM library?
Reading XML with the SAX parser
Another way to read XML is to use the Simple API for XML (SAX) parser. Most PHP installations include a SAX parser. The SAX parser runs on a callback model. Each time a tag is opened or closed, or whenever the parser sees text, it calls a user-defined function with the node or text information.
The advantage of the SAX parser is that it is really lightweight. The parser does not hold content in memory for long periods, so it can be used for very large files. The downside is that writing a SAX parser callback is a hassle. Listing 3 shows the code that reads the book XML file using SAX and displays the content.
Listing 3. Reading book XML with SAX parser
<?php
$g_books = array();
$g_elem = null;

// Called for each opening tag. Tag names arrive in uppercase because
// the parser folds case by default.
function startElement($parser, $name, $attrs)
{
    global $g_books, $g_elem;
    if ($name == 'BOOK') $g_books[] = array();
    $g_elem = $name;
}

// Called for each closing tag.
function endElement($parser, $name)
{
    global $g_elem;
    $g_elem = null;
}

// Called for the text between tags.
function textData($parser, $text)
{
    global $g_books, $g_elem;
    if ($g_elem == 'AUTHOR' ||
        $g_elem == 'PUBLISHER' ||
        $g_elem == 'TITLE')
    {
        $g_books[count($g_books) - 1][$g_elem] = $text;
    }
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "textData");

// Feed the file to the parser in 4 KB chunks.
$f = fopen('books.xml', 'r');
while ($data = fread($f, 4096))
{
    xml_parse($parser, $data);
}
fclose($f);
xml_parser_free($parser);

foreach ($g_books as $book)
{
    echo $book['TITLE'] . " - " . $book['AUTHOR'] . " - ";
    echo $book['PUBLISHER'] . "\n";
}
?>
The script first sets up the g_books array, which holds all the books and book information in memory, and the g_elem variable, which holds the name of the tag that the script is currently processing. The script then defines the callback functions. In this example, the callback functions are startElement, endElement, and textData. The startElement and endElement functions are called when a tag is opened and closed, respectively. The textData function is called on the text between the start and end tags.
In this example, the startElement function looks for the book tag to begin a new element in the book array. The textData function then looks at the current element to see whether it is a publisher, title, or author tag. If so, the function puts the current text into the current book. The tag names are compared in uppercase because a parser created with xml_parser_create folds names to uppercase by default.
To start parsing, the script creates the parser with the xml_parser_create function. It then sets the callback handlers. The script then reads the file and sends it to the parser in chunks. After the file is read, the xml_parser_free function deletes the parser. The end of the script prints the contents of the g_books array.
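The same callback model can be written without global variables. The following is a minimal sketch rather than a replacement for Listing 3: it assumes a PHP version that supports closures, parses an inline sample string instead of books.xml, turns off case folding so that tag names keep their original case, and appends text instead of assigning it, because the parser may deliver one run of text in more than one chunk. The variable names $books and $elem are illustrative.

```php
<?php
// Inline sample data (an assumption for this sketch) in the Listing 1 format.
$xml = "<books><book><author>Jack Herrington</author>"
     . "<title>PHP Hacks</title><publisher>O'Reilly</publisher></book></books>";

$books = array();   // collected book records
$elem  = null;      // name of the element currently open

$parser = xml_parser_create();
// Keep tag names as written instead of folding them to uppercase.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start handler: begin a new record on <book>, remember the open tag.
    function ($parser, $name, $attrs) use (&$books, &$elem) {
        if ($name == 'book') $books[] = array();
        $elem = $name;
    },
    // End handler: no element is open for text any more.
    function ($parser, $name) use (&$elem) {
        $elem = null;
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $text) use (&$books, &$elem) {
        if (in_array($elem, array('author', 'title', 'publisher'), true)) {
            $last = count($books) - 1;
            // Append rather than assign: text may arrive in several chunks.
            $books[$last][$elem] = ($books[$last][$elem] ?? '') . $text;
        }
    }
);

xml_parse($parser, $xml, true); // true marks this as the final chunk

foreach ($books as $book) {
    echo $book['title'] . " - " . $book['author'] . " - " . $book['publisher'] . "\n";
}
```

Because the closures capture $books and $elem by reference, the parsing state stays local to the script section that uses it.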
As you can see, this is much harder than writing the same functionality with the DOM. What if there is no DOM library and no SAX library? Is there an alternative?
Parsing XML with regular expressions
I am sure some engineers will criticize me for even mentioning this approach, but you can parse XML with regular expressions. Listing 4 shows an example of reading the book file using the preg_ functions.
Listing 4. Reading XML with regular expressions
<?php
// Read the whole file into one string.
$xml = "";
$f = fopen('books.xml', 'r');
while ($data = fread($f, 4096)) { $xml .= $data; }
fclose($f);

// Grab the contents of each <book> block; /s lets . match newlines.
preg_match_all('/<book>(.*?)<\/book>/s',
    $xml, $bookblocks);

foreach ($bookblocks[1] as $block)
{
    preg_match_all('/<author>(.*?)<\/author>/',
        $block, $author);
    preg_match_all('/<title>(.*?)<\/title>/',
        $block, $title);
    preg_match_all('/<publisher>(.*?)<\/publisher>/',
        $block, $publisher);
    echo($title[1][0] . " - " . $author[1][0] . " - " .
        $publisher[1][0] . "\n");
}
?>
Notice how short this code is. It starts by reading the file into one large string. It then uses a regular expression function to read each book item. Finally, it uses a foreach loop to go through each book block and extract the author, title, and publisher.
So where is the flaw? The problem with using regular expressions to read XML is that the code never checks that the XML is well-formed, so you cannot know whether the XML is well-formed before reading it. Also, some well-formed XML may not match your regular expressions, so you will have to modify them later.
I never recommend reading XML with regular expressions, but sometimes it is the most compatible approach, because the regular expression functions are always available. Do not use regular expressions to read XML that comes directly from users, because you cannot control the format or structure of that XML. Always read XML from users with the DOM library or the SAX parser.
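To make the second flaw concrete, here is a small sketch that applies the <book> pattern from Listing 4 (written without the redundant backslashes) to two well-formed inputs. The id attribute in the second input is a hypothetical addition, used only to show how a harmless change to the XML defeats the pattern.

```php
<?php
// Same <book> pattern as the regular-expression listing.
$pattern = '/<book>(.*?)<\/book>/s';

// Both inputs are well-formed XML describing one book.
$plain    = '<books><book><title>PHP Hacks</title></book></books>';
$withAttr = '<books><book id="1"><title>PHP Hacks</title></book></books>';

$plainCount = preg_match_all($pattern, $plain, $m1);
$attrCount  = preg_match_all($pattern, $withAttr, $m2);

echo "plain: $plainCount match(es)\n";
echo "with attribute: $attrCount match(es)\n";
```

The first input yields one match and the second yields none, even though a DOM or SAX parser would read both without any change to the code.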
Writing XML with DOM
Reading XML is only part of the equation. How do you write XML? The best way to write XML is to use DOM. Listing 5 shows how the DOM constructs the book XML file.
Listing 5. Writing book XML with DOM
<?php
// Sample data; in practice this could come from a user or a database.
$books = array();
$books[] = array(
    'title' => 'PHP Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
);
$books[] = array(
    'title' => 'Podcasting Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
);

$doc = new DOMDocument();
$doc->formatOutput = true;

// Create the root <books> node.
$r = $doc->createElement("books");
$doc->appendChild($r);

foreach ($books as $book)
{
    $b = $doc->createElement("book");

    $author = $doc->createElement("author");
    $author->appendChild(
        $doc->createTextNode($book['author'])
    );
    $b->appendChild($author);

    $title = $doc->createElement("title");
    $title->appendChild(
        $doc->createTextNode($book['title'])
    );
    $b->appendChild($title);

    $publisher = $doc->createElement("publisher");
    $publisher->appendChild(
        $doc->createTextNode($book['publisher'])
    );
    $b->appendChild($publisher);

    // Attach the finished <book> to the root.
    $r->appendChild($b);
}

echo $doc->saveXML();
?>
At the top of the script, the books array is loaded with some sample books. This data could come from the user or from a database.
After the sample books are loaded, the script creates a new DOMDocument and adds the root books node to it. The script then creates a node for each book's author, title, and publisher, and adds a text node to each of them. The final step for each book node is to append it to the root books node.
The end of the script uses the saveXML method to output the XML to the console. (You can also use the save method to create an XML file.) The output of the script is shown in Listing 6.
Listing 6. The output of the DOM build script
% php e4.php
<?xml version="1.0"?>
<books>
  <book>
    <author>Jack Herrington</author>
    <title>PHP Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
  <book>
    <author>Jack Herrington</author>
    <title>Podcasting Hacks</title>
    <publisher>O'Reilly</publisher>
  </book>
</books>
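The sample data above contains no characters that need escaping. As a minimal sketch of what the DOM does when the text does contain one, the following snippet adds a text node with an ampersand; the publisher name is a hypothetical sample value.

```php
<?php
$doc = new DOMDocument();
$root = $doc->createElement("publisher");
// The raw text contains an ampersand, which is special in XML.
$root->appendChild($doc->createTextNode("Simon & Schuster"));
$doc->appendChild($root);

// Passing a node to saveXML serializes just that node,
// without the XML declaration.
$out = $doc->saveXML($root);
echo $out . "\n";
```

The text node is escaped during serialization, so the script never has to encode the ampersand itself.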
The real value of using the DOM is that the XML it creates is always well-formed. But what if you can't create XML with the DOM?
Writing XML in PHP
If the DOM is not available, you can use PHP's text templating to write XML. Listing 7 shows how PHP constructs the book XML file.
Listing 7. Writing book XML in PHP
<?php
$books = array();
$books[] = array(
    'title' => 'PHP Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
);
$books[] = array(
    'title' => 'Podcasting Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
);
?>
<books>
<?php
foreach ($books as $book)
{
?>
<book>
<title><?php echo($book['title']); ?></title>
<author><?php echo($book['author']); ?>
</author>
<publisher><?php echo($book['publisher']); ?>
</publisher>
</book>
<?php
}
?>
</books>
The top of the script is similar to the DOM script. The bottom of the script opens the books tag and then iterates through each book, creating the book tag and all of the internal title, author, and publisher tags.
The problem with this approach is encoding entities. To make sure that entities are encoded correctly, you must call the htmlentities function on each item, as shown in Listing 8.
Listing 8. Using the htmlentities function to encode entities
<books>
<?php
foreach ($books as $book)
{
    // Encode special characters before they reach the output.
    $title = htmlentities($book['title'], ENT_QUOTES);
    $author = htmlentities($book['author'], ENT_QUOTES);
    $publisher = htmlentities($book['publisher'], ENT_QUOTES);
?>
<book>
<title><?php echo($title); ?></title>
<author><?php echo($author); ?> </author>
<publisher><?php echo($publisher); ?>
</publisher>
</book>
<?php
}
?>
</books>
This is the annoying part of writing XML with basic PHP. You think you have created perfect XML, but as soon as you try it with real data, you find that some elements are encoded incorrectly.
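One case worth knowing about: htmlentities converts characters such as é into named HTML entities like &eacute;, which an XML parser does not recognize unless they are declared. A possible alternative, sketched here under the assumption of PHP 5.4 or later, is htmlspecialchars with the ENT_XML1 flag, which escapes only the characters that XML reserves. The publisher value is a hypothetical sample.

```php
<?php
$publisher = "O'Reilly & Associates";   // hypothetical sample value

// ENT_XML1 restricts the output to the entities that XML predefines;
// ENT_QUOTES makes the function escape single quotes as well.
$safe = htmlspecialchars($publisher, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo "<publisher>$safe</publisher>\n";
```

The result can be dropped into the template from Listing 8 in place of the htmlentities calls, provided the document is served in the same character encoding that is passed to htmlspecialchars.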
Conclusion
There is always a lot of hype and confusion around XML. But it is not as hard as you might think, especially in a language as good as PHP. Once you understand XML and implement it correctly, you will find many powerful tools at your disposal. XPath and XSLT are two such tools worth investigating.
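As a starting point for XPath, here is a minimal sketch that uses the DOMXPath class from PHP's DOM extension on an inline copy of the book data. The inline string and the variable names are assumptions made for this illustration.

```php
<?php
// Inline sample data (an assumption for this sketch) in the Listing 1 format.
$xml = "<books>"
     . "<book><author>Jack Herrington</author><title>PHP Hacks</title></book>"
     . "<book><author>Jack Herrington</author><title>Podcasting Hacks</title></book>"
     . "</books>";

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$titles = array();
// Select every <title> inside a <book> under the root <books> element.
foreach ($xpath->query("/books/book/title") as $node) {
    $titles[] = $node->nodeValue;
}

echo implode("\n", $titles) . "\n";
```

A single path expression replaces the nested getElementsByTagName calls used in Listing 2.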