PHP Basics: The New XML Features in PHP5


Intended audience

This article is aimed at PHP developers of all levels who are interested in the new XML features in PHP5. We assume that the reader is familiar with the basics of XML. If you already use XML in your PHP code, you will still get something out of this article.


In today's Internet, XML is no longer a buzzword; it is widely accepted and used as a standard. XML support has therefore been taken more seriously in PHP5 than it was in PHP4. Almost everything you faced in PHP4 was non-standard, with API breaks, memory leaks, and incomplete functionality. Although some of these deficiencies were addressed in PHP4.3, the developers decided to discard the original code and rewrite all of it for PHP5.

This article will describe all the exciting new features of XML in PHP5.


Early versions of PHP already supported XML, but only through a SAX-based interface, which could parse any XML document easily enough. XML support improved when the domxml extension was added in PHP4, and XSLT was added later as a supplement. Over the lifetime of PHP4, further features such as HTML, XSLT, and DTD validation were added to the domxml extension. Unfortunately, because the XSLT and domxml extensions always remained experimental and their APIs changed more than once, they could not be enabled by default. In addition, the domxml extension did not follow the DOM standard established by the W3C and used its own naming scheme instead. Although PHP4.3 improved this area and fixed many memory leaks and other problems, the extension never reached a stable state, and some deeper issues were almost impossible to fix. Only the SAX extension was enabled by default, and the other extensions were never widely deployed.

For all of these reasons, the PHP XML developers decided to rewrite all of the code for PHP5 and to follow the established standards.

Almost all of the XML support in PHP5 has been rewritten. All XML extensions are now based on the GNOME project's libxml2 library. This allows the different extensions to interoperate, and the core developers only need to work against a single underlying library. For example, the complex memory management has to be implemented only once and benefits all XML-related extensions.

In addition to the well-known SAX parser inherited from PHP4, PHP5 supports a DOM that adheres to the W3C standard and XSLT based on the libxslt engine. It also adds the PHP-specific SimpleXML extension and a standards-compliant SOAP extension. As XML becomes more and more important, the PHP developers decided to enable more XML support by default. This means that you can now use SAX, DOM, and SimpleXML, and that these extensions will be installed on more servers. XSLT and SOAP support still has to be explicitly configured when PHP is compiled.

Support for streams

All XML extensions now support PHP streams, even when the stream is not accessed directly from PHP. For example, in PHP5 a document can pull in a PHP stream through an include directive. Basically, you can use a PHP stream anywhere you can use a normal file.

Streams were briefly introduced in PHP4.3 and have been further improved in PHP5. They unify file access, network access, and other operations behind a single set of functions. You can even implement your own streams in PHP code, which makes data access very simple. Please refer to the PHP documentation for more details.


SAX stands for Simple API for XML. It is a callback-based interface for parsing XML documents. SAX has been supported since PHP3 and has not changed much since. In PHP5 the API is unchanged, so your existing code will still run. The only difference is that it is no longer based on the expat library but on libxml2.

This change introduced a number of problems with namespace support, which have been fixed in libxml2 2.6. They were not fixed in earlier versions of libxml2, so if you use xml_parser_create_ns(), it is highly recommended that you install libxml2 2.6 or later on your system.
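
The callback style described above can be sketched as follows. This is a minimal illustration rather than code from the original article: the inline XML is hypothetical sample data, and the closures assume PHP 5.3 or later.

```php
<?php
// Hypothetical sample data; only the element names follow the article.
$xml = "<articles><item><title>First</title></item><item><title>Second</title></item></articles>";

$titles = array();
$inTitle = false;

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

// The parser calls these handlers for start tags and end tags.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$inTitle, &$titles) {
        if ($name == "title") {
            $inTitle = true;
            $titles[] = "";
        }
    },
    function ($parser, $name) use (&$inTitle) {
        if ($name == "title") {
            $inTitle = false;
        }
    }
);
// Text may arrive in several chunks, so append to the current title.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$inTitle, &$titles) {
        if ($inTitle) {
            $titles[count($titles) - 1] .= $data;
        }
    }
);

$ok = xml_parse($parser, $xml, true);
print implode(", ", $titles) . "\n";
```

The handlers keep their own state ($inTitle) because SAX never builds a tree; that is the main trade-off compared with the DOM approach discussed next.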


The DOM (Document Object Model) is a standard defined by the W3C for accessing the XML document tree. In PHP4 you could use domxml for this. The main problem with domxml was that it did not follow the standard naming scheme, and for a long time it also leaked memory (PHP4.3 fixed that problem).

The new DOM extension is based on the W3C standard, including the method and property names. If you are familiar with the DOM in other languages, such as JavaScript, it will be easy to write similar code in PHP. You do not have to consult the documentation every time, because the methods and parameters are the same.

Because of this switch to the W3C standard, domxml-based code will no longer work. The PHP API is quite different. But if your code already used the method names that follow the W3C standard, porting is not very difficult. You only need to change the load and save functions and remove the underscores from the function names (the DOM standard uses camelCase names). Other adjustments will of course be necessary, but the main logic can stay unchanged.

Reading the DOM

I am not going to explain every feature of the DOM extension in this article, nor is that necessary. You may want to bookmark the documentation at http://www.w3.org/DOM; it largely matches the DOM support in PHP5.

Most of the examples in this article use the same XML file, a very simple RSS-like version of the article list on zend.com. Paste the following text into a text file and save it as articles.xml.
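
A stand-in for that file can be sketched as follows. The structure (an articles root with item children, each holding a title) is inferred from the XPath queries used later in this article; the titles, links, and id values are hypothetical placeholders, not the original zend.com data.

```php
<?php
// Hypothetical stand-in for articles.xml; only the element names come from the article.
$sampleXml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<articles>
  <item>
    <title id="23">First example article</title>
    <link>http://www.example.com/first</link>
  </item>
  <item>
    <title id="24">Second example article</title>
    <link>http://www.example.com/second</link>
  </item>
</articles>
XML;

// Parse the string to confirm it is well-formed before saving it.
$dom = new DOMDocument();
$wellFormed = $dom->loadXML($sampleXml);
$itemCount = $dom->getElementsByTagName("item")->length;
print "well-formed: " . ($wellFormed ? "yes" : "no") . ", items: $itemCount\n";

// Uncomment to create the file used by the other examples:
// file_put_contents("articles.xml", $sampleXml);
```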



To load this example into a DOM object, you first create a DOMDocument object and then load the XML file.

$dom = new DOMDocument();
$dom->load("articles.xml");

As mentioned above, you can also load an XML document through a PHP stream, in which case you write:

$dom->load("file:///articles.xml");

(or any other type of stream)
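
As a small, self-contained illustration of loading through a stream wrapper, the sketch below writes a hypothetical document to a temporary file and loads it with a file:// URL. It assumes a Unix-style absolute path; the file content is made up for the example.

```php
<?php
// Write a small, hypothetical document to a temporary file.
$path = tempnam(sys_get_temp_dir(), "art");
file_put_contents($path, "<articles><item><title>XML in PHP5</title></item></articles>");

// Load it through the file:// stream wrapper instead of a plain path.
// Assumes a Unix-style absolute path such as /tmp/artAbC123.
$dom = new DOMDocument();
$loaded = $dom->load("file://" . $path);
$rootName = $dom->documentElement->nodeName;
print $rootName . "\n";

// Clean up the temporary file.
unlink($path);
```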

If you want to output the XML document to the browser or to standard output, use:

print $dom->saveXML();

If you want to save it to a file, use:

print $dom->save("newfile.xml");

(Note that save() returns the number of bytes written, so this prints the file size to stdout.)

Of course, this example does not do much yet, so let's do something more useful: get all the title elements. There are several ways to do this, the simplest of which is getElementsByTagName($tagname):

$titles = $dom->getElementsByTagName("title");
foreach ($titles as $node) {
    print $node->textContent . "\n";
}

The textContent property is a convenience that was added to the W3C standard only in DOM Level 3. It makes it easy to read all the text nodes of an element at once. The older W3C way is to read the data of the element's first child node.

(This only works if the firstChild node really is the text node you need; otherwise you have to traverse all the child nodes to find it.)
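
The difference between the two ways of reading text can be sketched as follows. The markup is a hypothetical example with a nested element, chosen so that the two approaches give different results.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML("<item><title>XML in <b>PHP5</b></title></item>");
$node = $dom->getElementsByTagName("title")->item(0);

// textContent concatenates every text node below the element.
$all = $node->textContent;

// firstChild->data reads only the first child, here the text node "XML in ".
$first = $node->firstChild->data;

// Traversing the children by hand collects just the direct text nodes.
$direct = "";
foreach ($node->childNodes as $child) {
    if ($child->nodeType == XML_TEXT_NODE) {
        $direct .= $child->data;
    }
}
print "$all|$first|$direct\n";
```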

Also note that getElementsByTagName() returns a DOMNodeList object rather than an array, as get_elements_by_tagname() did in PHP4. As the example shows, you can still traverse it easily with foreach. You can also access nodes directly with $titles->item(0), which returns the first title element.
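
A short sketch of working with the returned node list, using hypothetical inline data instead of the article's file:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML("<articles><item><title>First</title></item><item><title>Second</title></item></articles>");

$titles = $dom->getElementsByTagName("title");
$count = $titles->length;                      // number of matched nodes
$firstTitle = $titles->item(0)->textContent;   // direct access by index
$missing = $titles->item(5);                   // an out-of-range index yields null

print "$count titles, first: $firstTitle\n";
```

Checking length (or testing item() for null) before dereferencing avoids errors when an element is absent.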

Another way to get all the title elements is to walk the tree from the root node. As you can see, this approach is more complex, but it is also more flexible if you need more than just the title elements.

foreach ($dom->documentElement->childNodes as $articles) {
    // If the node is an element (nodeType == 1) and its name is "item", loop over its children
    if ($articles->nodeType == 1 && $articles->nodeName == "item") {
        foreach ($articles->childNodes as $item) {
            // If the node is an element and its name is "title", print it
            if ($item->nodeType == 1 && $item->nodeName == "title") {
                print $item->textContent . "\n";
            }
        }
    }
}

XPath is like SQL for XML: with XPath you can query an XML document for the nodes that match a given pattern. To get all the title nodes with XPath, just do this:

$xp = new DOMXPath($dom);
$titles = $xp->query("/articles/item/title");
foreach ($titles as $node) {
    print $node->textContent . "\n";
}

This is similar to using getElementsByTagName(), but XPath is much more powerful. For example, if there were a title element that is a child of articles (rather than of item), getElementsByTagName() would return it as well. With the /articles/item/title syntax we only get the title elements at the specified depth and position. This is just a simple example; here are a few more possibilities:

/articles/item[position() = 1]/title returns the title elements of the first item element

/articles/item/title[@id = '23'] returns all title elements that have an id attribute with the value 23

/articles//title returns all title elements below the articles element (// matches at any depth)

You can also query for nodes that have particular sibling elements, elements that contain particular text content, nodes in a namespace, and so on. If you have to query a lot of XML documents, learning XPath properly will save you a lot of time: it is simple, fast, and needs less code than the standard DOM.
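
The queries listed above can be tried out with a sketch like the following. The inline document and the texts() helper are hypothetical; the document deliberately includes a title directly under articles to show how the results differ.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<articles><title>Feed title</title>'
    . '<item><title id="23">First</title></item>'
    . '<item><title id="24">Second</title></item></articles>'
);
$xp = new DOMXPath($dom);

// Hypothetical helper: returns the text of every node matched by $expr.
function texts(DOMXPath $xp, $expr) {
    $out = array();
    foreach ($xp->query($expr) as $node) {
        $out[] = $node->textContent;
    }
    return $out;
}

$itemTitles = texts($xp, "/articles/item/title");                 // fixed depth only
$firstItem  = texts($xp, "/articles/item[position() = 1]/title"); // first item only
$byId       = texts($xp, "/articles/item/title[@id = '23']");     // attribute filter
$anyDepth   = texts($xp, "/articles//title");                     // any depth

// getElementsByTagName() also ignores depth, like the // query.
$byTagName = $dom->getElementsByTagName("title")->length;

print implode(", ", $anyDepth) . "\n";
```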

Writing data to the DOM
The Document Object Model is not only for reading and querying; you can also manipulate and write documents. (The DOM standard is a bit verbose because its authors tried to support every conceivable environment, but it works very well.) Take a look at the following example, which adds a new element to our articles.xml file.

$item = $dom->createElement("item");
$title = $dom->createElement("title");
$titletext = $dom->createTextNode("XML in PHP5");
$title->appendChild($titletext);
$item->appendChild($title);
$dom->documentElement->appendChild($item);
print $dom->saveXML();

First we create all the nodes we need: an item element, a title element, and a text node that contains the item's title. Then we link the nodes together: we append the text node to the title element, the title element to the item element, and finally the item element to the articles root element. Now we have a new entry in the article list of our XML document.

Extending classes
The example above could also be done with the domxml extension under PHP4 (apart from the somewhat different API). Being able to extend the DOM classes yourself is a new feature of PHP5, and it makes it possible to write more readable code. The following is the whole example rewritten by extending the DOMDocument class:

class Articles extends DOMDocument {
    function __construct() {
        // Must be called!
        parent::__construct();
    }

    function addArticle($title) {
        $item = $this->createElement("item");
        $titlespace = $this->createElement("title");
        $titletext = $this->createTextNode($title);
        $titlespace->appendChild($titletext);
        $item->appendChild($titlespace);
        $this->documentElement->appendChild($item);
    }
}

$dom = new Articles();
$dom->load("articles.xml");
$dom->addArticle("XML in PHP5");
print $dom->save("newfile.xml");

One feature of PHP5 that often goes unnoticed is libxml2's support for HTML. With the DOM extension you can load not only well-formed XML documents but also HTML documents that are not well-formed. You can then treat the result as a standard DOMDocument object and use all the available methods and features, such as XPath and SimpleXML.

The HTML support is useful when you need to access content from a site that you do not control. With the help of XPath, XSLT, or SimpleXML you save a lot of code compared with string matching with regular expressions or with a SAX parser. This approach is especially useful when the HTML document is not well structured (a frequent problem!).

The following code fetches and parses the front page of php.net and returns the content of the title element.

$dom = new DOMDocument();
$dom->loadHTMLFile("http://www.php.net/");
$title = $dom->getElementsByTagName("title");
print $title->item(0)->textContent;

Note that your output may contain errors if the specified element is not found. If your site still outputs HTML 4 from PHP, there is good news: the DOM extension can not only load HTML documents but also save them in HTML 4 format. After you have built the DOM document, use $dom->saveHTML() to save it. Note that if your only goal is to make the HTML output conform to the W3C standards, you are better off using the Tidy extension. The HTML support in libxml2 does not cover every possible case and does not cope well with input in uncommon formats.
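
The same idea can be tried without network access. The sketch below parses a hypothetical HTML string that is not well-formed, guards against a missing title element, and serializes the result with saveHTML(). The calls to libxml_use_internal_errors() and libxml_clear_errors() assume a PHP version newer than 5.0; they only keep parser warnings out of the output.

```php
<?php
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical markup that is not well-formed: unclosed <p> and <br> tags.
$html = "<html><head><title>Test page</title></head>"
      . "<body><p>First paragraph<br>still first<p>Second paragraph</body></html>";

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$titles = $dom->getElementsByTagName("title");
// Guard against a missing element before reading it.
$title = $titles->length > 0 ? $titles->item(0)->textContent : "(no title)";
print $title . "\n";

// Serialize the parsed tree back to HTML.
$output = $dom->saveHTML();
```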

Validating XML documents is becoming more and more important. For example, if you receive an XML document from an external source, you need to verify that it follows a certain format before you process it. Luckily you do not need to write your own validator in PHP, because you can use one of the three most widely used standards (DTD, XML Schema, or RelaxNG).

A DTD is a standard that dates from the SGML era. It lacks some newer XML features (such as namespaces) and is difficult to parse and transform because it is not written in XML.
XML Schema is a standard developed by the W3C. It is widely used and contains almost everything needed to validate XML documents.
RelaxNG is the counterpart to the complex XML Schema standard and was created by an independent group. Because it is easier to implement than XML Schema, more and more programs are starting to support RelaxNG.
If you do not have legacy schema documents or very complex XML documents, use RelaxNG. It is easy to write and read, and more and more tools support it. There is even a tool called Trang that can automatically create a RelaxNG schema from sample XML documents. In addition, only RelaxNG (and the aging DTDs) are fully supported by libxml2, although full XML Schema support in libxml2 is on its way.

Validating an XML document is fairly straightforward:

$dom->validate();   // validates against the DTD declared in the document's DOCTYPE
$dom->relaxNGValidate('articles.rng');
$dom->schemaValidate('articles.xsd');
For now, each of these simply returns true or false, and the errors are emitted as PHP warnings. That is obviously not a good way to give users friendly feedback, and it is expected to improve in a release after PHP5.0. How exactly is still under discussion, but error reporting will certainly be handled better.
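
Later PHP releases added libxml_use_internal_errors() and libxml_get_errors(), which collect validation messages instead of emitting warnings. The following is a minimal sketch under that assumption; the inline DTD and the sample data are hypothetical, and the second item is deliberately invalid.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical document with an inline DTD; the second item breaks the rules.
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE articles [
  <!ELEMENT articles (item*)>
  <!ELEMENT item (title)>
  <!ELEMENT title (#PCDATA)>
]>
<articles>
  <item><title>XML in PHP5</title></item>
  <item><link>missing title</link></item>
</articles>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// validate() checks the document against the DTD in its DOCTYPE.
$valid = $dom->validate();

// Turn the collected errors into readable messages.
$messages = array();
foreach (libxml_get_errors() as $error) {
    $messages[] = "line " . $error->line . ": " . trim($error->message);
}
libxml_clear_errors();

print $valid ? "valid\n" : "invalid\n";
print implode("\n", $messages) . "\n";
```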

SimpleXML is the latest member of PHP's XML family. The SimpleXML extension provides a simpler way to access XML documents, using standard object properties and iterators. The extension does not have many methods, but it is still quite powerful. Getting all the title nodes from our document takes less code than before.

$sxe = simplexml_load_file("articles.xml");
foreach ($sxe->item as $item) {
    print $item->title . "\n";
}

What is happening here? First, articles.xml is loaded into a SimpleXML object. Then all the item elements in $sxe are retrieved, and finally $item->title returns the content of the title element. That's it. You can also read attributes with associative-array syntax, for example $item->title['id'].

As you can see, there is some real magic behind this, and there are several different ways to get the result you want. For example, $item->title[0] returns the same result as in the example. On the other hand, foreach ($sxe->item->title as $item) only returns the title of the first item, not all the title elements in the document (which is what I would expect from XPath).
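
These access patterns can be compared side by side in a short sketch. The inline document is hypothetical sample data; the explicit (string) casts are used so that the values can be stored and compared as plain strings.

```php
<?php
$sxe = simplexml_load_string(
    '<articles><item><title id="23">First</title></item>'
    . '<item><title id="24">Second</title></item></articles>'
);

// Element content and attributes, cast to plain strings.
$firstTitle = (string) $sxe->item[0]->title;
$sameTitle  = (string) $sxe->item[0]->title[0];
$id         = (string) $sxe->item[1]->title['id'];

// Iterating $sxe->item visits every item element.
$all = array();
foreach ($sxe->item as $item) {
    $all[] = (string) $item->title;
}

// $sxe->item->title only looks inside the first item.
$onlyFirst = array();
foreach ($sxe->item->title as $title) {
    $onlyFirst[] = (string) $title;
}

print implode(", ", $all) . "\n";
```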

SimpleXML is in fact the first extension to use the new Zend Engine 2 features. It therefore also served as a test bed for these new features, so bugs and unexpected behavior were not rare during the development phase.

In addition to traversing all nodes as in the previous example, SimpleXML also has an XPath interface, which provides an easier way to access individual nodes.

foreach ($sxe->xpath('/articles/item/title') as $item) {
    print $item . "\n";
}

Admittedly, this code is no shorter than the previous example, but with more complex or more deeply nested XML documents you will find that using XPath with SimpleXML saves you a lot of typing.

Writing data to a SimpleXML document
You can not only parse and read documents with SimpleXML but also change them, at least to some extent:

$sxe->item->title = "XML in PHP5";  // New content of the title element.
$sxe->item->title['id'] = 34;       // New id attribute of the title element.
$xmlString = $sxe->asXML();         // Returns the SimpleXML object as a serialized XML string.
print $xmlString;

Interoperability
Because SimpleXML is also based on libxml2, you can convert SimpleXML objects to DOM objects and back with almost no speed penalty (the document does not have to be copied internally). Thanks to this mechanism you get the best of both worlds and can use whichever tool suits the task at hand. It works like this:

$sxe = simplexml_import_dom($dom);
$dom = dom_import_simplexml($sxe);   // returns the corresponding DOM node (a DOMElement)
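
A round trip between the two APIs can be sketched as follows, using a hypothetical inline document. The point of the sketch is that a change made through SimpleXML is visible through the DOM object, since both work on the same underlying document.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML("<articles><item><title>First</title></item></articles>");

// DOM -> SimpleXML: read the data with property access.
$sxe = simplexml_import_dom($dom);
$title = (string) $sxe->item->title;

// Change the document through SimpleXML...
$sxe->item->title = "XML in PHP5";

// ...and go back to DOM. dom_import_simplexml() returns a DOMElement,
// whose ownerDocument gives access to the whole document again.
$element = dom_import_simplexml($sxe);
$rootName = $element->ownerDocument->documentElement->nodeName;
$updated = $dom->getElementsByTagName("title")->item(0)->textContent;

print "$title -> $updated\n";
```
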
XSLT is a language for transforming XML documents into other XML documents. XSLT itself is written in XML and belongs to the family of functional languages, which approach a program differently from procedural and object-oriented languages such as PHP. PHP4 had two XSLT processors: Sablotron (in the widely used xslt extension) and libxslt (in the domxml extension). The two APIs were not compatible with each other and were used differently. PHP5 only supports the libxslt processor, because it is based on libxml2 and therefore fits the PHP5 XML concept better.

In theory it would be possible to bind Sablotron to PHP5 as well, but unfortunately nobody has done so. If you are using Sablotron, you therefore have to switch to the libxslt processor in PHP5. With the exception of JavaScript support, libxslt matches Sablotron's features, and Sablotron's special scheme handlers can be recreated with PHP's powerful streams. In addition, libxslt is one of the fastest XSLT processors, so you get a speed boost for free (it runs about twice as fast as Sablotron).

As with the other extensions discussed in this article, you can exchange XML documents between the XSL extension and the DOM extension in both directions. In fact you have to, because ext/xsl has no interface of its own for loading and saving XML documents and relies on the DOM extension for that. You do not have to learn much to get started with XSLT transformations. There is no W3C standard for this API, so it is borrowed from Mozilla.

First you need an XSLT stylesheet. Paste the following text into a new file and save it as articles.xsl.

Then call it from a PHP script:

/* Load the XML and XSL documents into DOMDocument objects */
$xsl = new DOMDocument();
$xsl->load("articles.xsl");
$inputdom = new DOMDocument();
$inputdom->load("articles.xml");

/* Create an XSLT processor and import the stylesheet */
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$proc->setParameter("", "titles", "Titles");

/* Transform and output the XML document */
$newdom = $proc->transformToDoc($inputdom);
print $newdom->saveXML();


The example first uses the DOM load() method to load the XSLT stylesheet articles.xsl. It then creates a new XSLTProcessor object and imports the stylesheet into it. Parameters can be set with setParameter(namespaceURI, name, value). Finally the XSLTProcessor object starts the transformation with transformToDoc($inputdom) and returns a new DOMDocument object.

The advantage of this API is that you can transform many XML documents with the same stylesheet: load it once and reuse it, because transformToDoc() can be applied to different XML documents.

Besides transformToDoc() there are two other transformation methods: transformToXml($dom) returns a string, and transformToUri($dom, $uri) saves the transformed document to a file or a PHP stream. Note that if you want to use XSLT output settings such as indent="yes", you cannot use transformToDoc(), because the DOMDocument object cannot store that information; it only takes effect when you write the transformation result directly to a string or a file.
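
The whole flow can be sketched in one self-contained script. The stylesheet below is a hypothetical stand-in, not the original articles.xsl; it only assumes the titles parameter and the articles/item/title structure used elsewhere in this article, and it requires the xsl extension to be enabled.

```php
<?php
// Hypothetical stylesheet: prints the "titles" parameter followed by all item titles.
$xslSource = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="titles"/>
  <xsl:template match="/articles">
    <xsl:value-of select="$titles"/>
    <xsl:text>: </xsl:text>
    <xsl:for-each select="item/title">
      <xsl:value-of select="."/>
      <xsl:if test="position() != last()"><xsl:text>, </xsl:text></xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$input = new DOMDocument();
$input->loadXML("<articles><item><title>XML in PHP5</title></item><item><title>Streams</title></item></articles>");

// Import the stylesheet once; the processor can then be reused for other documents.
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$proc->setParameter("", "titles", "Titles");

// transformToXml() honours the xsl:output settings and returns a string.
$result = $proc->transformToXml($input);
print $result . "\n";
```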

Calling PHP functions
The latest addition to the XSLT extension is the ability to call any PHP function from within an XSLT stylesheet. XML purists will not like this feature (such stylesheets easily mix logic and design), but it is useful in some places. XSLT is very limited when it comes to functions; even outputting a date in different languages is hard to implement. With this feature, such things are as easy as they are in PHP. Here is the PHP code that makes a function available to XSLT:

function dateLang() {
    return strftime("%A");
}

$xsl = new DOMDocument();
$xsl->load("datetime.xsl");
$inputdom = new DOMDocument();
$inputdom->load("today.xml");

$proc = new XSLTProcessor();
$proc->registerPHPFunctions();

/* Import the stylesheet into the processor */
$proc->importStylesheet($xsl);

/* Transform and output the XML document */
$newdom = $proc->transformToDoc($inputdom);

print $newdom->saveXML();


The following is the XSLT stylesheet datetime.xsl, which calls this function.

The following is the XML document to be transformed with the stylesheet, today.xml (articles.xml would give the same result).

Together, the stylesheet, the PHP script, and either XML file output the name of the weekday in the language of the current locale. You can pass more arguments to php:function(), and they will be handed to the PHP function. There is also php:functionString(), which automatically converts all input arguments to strings so that you do not need to convert them in PHP.

Note that you need to call $proc->registerPHPFunctions() before the transformation; otherwise the PHP function calls are not executed, for security reasons (do you always trust your XSLT stylesheets?). A real access-control system has not been implemented yet but may come in a future version of PHP5.
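
A self-contained sketch of calling a PHP function from a stylesheet follows. The stylesheet is a hypothetical example, not the original datetime.xsl: it calls the built-in strtoupper() through php:functionString() so that the output does not depend on the date or locale. Passing a list of allowed function names to registerPHPFunctions() is an assumption about later PHP versions than the one this article describes; the xsl extension must be enabled.

```php
<?php
// Hypothetical stylesheet: upper-cases every title through a PHP function.
$xslSource = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl">
  <xsl:output method="text"/>
  <xsl:template match="/articles">
    <xsl:for-each select="item/title">
      <xsl:value-of select="php:functionString('strtoupper', .)"/>
      <xsl:text>;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$input = new DOMDocument();
$input->loadXML("<articles><item><title>XML in PHP5</title></item><item><title>Streams</title></item></articles>");

$proc = new XSLTProcessor();
// Allow only strtoupper() to be called from the stylesheet.
$proc->registerPHPFunctions(array("strtoupper"));
$proc->importStylesheet($xsl);

$result = $proc->transformToXml($input);
print $result . "\n";
```

Restricting the callable functions to an explicit list limits what an untrusted stylesheet can do, which addresses the trust question raised above.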

PHP's XML support has taken a big step forward: it is standards-compliant, powerful, interoperable, enabled by default, and ready for use. The new SimpleXML extension provides a quick and easy way to access XML documents and can save you a lot of code, especially if you have well-structured documents or can use the power of XPath.

Thanks to libxml2, the underlying library used by the PHP5 XML extensions, validating XML documents with DTD, RelaxNG, or XML Schema is now supported.

XSL support has also been renewed. It now uses the libxslt library, which performs significantly better than the old Sablotron library, and calling PHP functions from inside XSLT stylesheets lets you write more powerful XSLT code.

If you have already used XML in PHP4 or in other languages, you will like the XML features of PHP5. XML has changed a lot in PHP5: it now conforms to the standards and works the same way as in other tools and languages. (Source: Viphot)
