http://trash.chregu.tv/phpconf2003/examples/
New XML features of PHP5
By Christian Stocker (translated by ice_berg16)
Target readers
This article is aimed at all PHP developers who are interested in the new XML functions of PHP5. We assume that the reader has basic knowledge of XML. Even if you have already used XML in your PHP code, this article should still be useful to you.
Introduction
In today's Internet world, XML is no longer just a buzzword; it has been widely accepted and standardized. PHP5 therefore pays much more attention to XML support than PHP4 did. In PHP4 you were faced with non-standard functions, API breakage, memory leaks, and incomplete features. Although some of these shortcomings were addressed in PHP 4.3, the developers decided to discard the original code and rewrite all of it for PHP5.
This article will introduce all the exciting new XML features in PHP5 one by one.
PHP4 XML
Early versions of PHP already supported XML, but only through a SAX-based interface, which can parse any XML document. PHP4 added the DOMXML extension, which improved XML support, and XSLT was added later as a supplement. Over the course of PHP4, further features such as HTML support, XSLT, and DTD validation were added to the DOMXML extension. Unfortunately, because the XSLT and DOMXML extensions always remained experimental, their APIs changed more than once, and they were not installed by default. In addition, the DOMXML extension did not comply with the W3C DOM standard and used its own naming scheme. Although PHP 4.3 fixed many memory leaks and other problems, the extension never reached a stable state, and some deep-rooted problems were almost impossible to fix. Only the SAX extension was installed by default, and the other extensions were never widely used.
For all of these reasons, the PHP XML developers decided to rewrite all the code for PHP5 and to follow the established standards.
PHP5 XML
In PHP5, almost everything related to XML support was rewritten. All XML extensions are now based on the libxml2 library of the GNOME project. This allows the different extensions to interoperate, and the core developers only need to work on one underlying library. For example, an improvement to the complex memory management has to be implemented only once to benefit all XML-related extensions.
In addition to inheriting the well-known SAX parser from PHP4, PHP5 also supports a DOM extension that complies with the W3C standard and an XSLT extension based on the libxslt engine. The PHP-specific SimpleXML extension and a standards-compliant SOAP extension were also added. As XML becomes more and more important, the PHP developers decided to enable more XML support by default. This means that you can now use SAX, DOM, and SimpleXML out of the box, and that these extensions will be installed on more servers. Support for XSLT and SOAP still has to be explicitly enabled when PHP is compiled.
Stream support
All XML extensions now support PHP streams, even where you do not access them directly from PHP. For example, in PHP5, you can access streams from a file or from a command. Basically, you can use a PHP stream wherever you can use a regular file.
Streams were introduced in PHP 4.3 and have been further improved in PHP5. They unify file access, network access, and other operations behind one common set of functions. You can even implement your own streams in PHP code, which makes data access very simple. For more information, see the PHP documentation.
SAX
SAX stands for Simple API for XML. It is a callback-based interface for parsing XML documents. PHP has supported SAX since PHP3, and the interface has not changed much since. The API is unchanged in PHP5, so your existing code will still run. The only difference is that it is no longer based on the expat library but on libxml2.
This change caused some problems with namespace support, which were resolved in libxml2 2.6 but not in earlier versions. Therefore, if you use xml_parser_create_ns(), we strongly recommend that you install libxml2 2.6 or later on your system.
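The callback model described above can be sketched as follows. This is a minimal illustration, not code from the original article: the inline XML is a made-up stand-in for a real document, and the closures assume PHP 5.3 or later (PHP 5.4 for the short array syntax).

```php
<?php
// Made-up sample document, used only for this sketch.
$xml = <<<'XML'
<articles>
  <item><title>First article</title></item>
  <item><title>Second article</title></item>
</articles>
XML;

$titles  = [];     // collected title texts
$inTitle = false;  // true while the parser is inside a <title> element

$parser = xml_parser_create();
// Report element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Called for every opening tag.
    function ($parser, $name, $attrs) use (&$titles, &$inTitle) {
        if ($name === 'title') {
            $inTitle  = true;
            $titles[] = '';
        }
    },
    // Called for every closing tag.
    function ($parser, $name) use (&$inTitle) {
        if ($name === 'title') {
            $inTitle = false;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    // Character data may arrive in several chunks, so append it.
    function ($parser, $data) use (&$titles, &$inTitle) {
        if ($inTitle) {
            $titles[count($titles) - 1] .= $data;
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

foreach ($titles as $title) {
    print $title . "\n";
}
```

Because SAX never builds a tree, the script has to track its own state (here the $inTitle flag), which is the main trade-off against the DOM and SimpleXML approaches.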
DOM
DOM (Document Object Model) is a set of standards developed by the W3C for accessing the XML document tree. In PHP4 you could use DOMXML for this. The main problem with DOMXML was that it did not follow the standard naming scheme. It also suffered from memory leaks for a long time (fixed in PHP 4.3).
The new DOM extension is based on the W3C standard, including method and property names. If you are familiar with the DOM in other languages, such as JavaScript, writing similar code in PHP becomes very easy. You do not have to consult the documentation every time, because the methods and parameters are the same.
Because the new extension follows the W3C standard, DOMXML-based code will no longer run; the API in PHP5 is quite different. However, if your code already used method names similar to the W3C standard, porting is not very difficult. You mainly need to change the load and save functions and remove the underscores from function names (the DOM standard uses camelCase). Other adjustments are required here and there, but the main logic remains unchanged.
Read DOM
I will not explain all the features of the DOM extension in this article; that would be unnecessary. You may want to bookmark the DOM documentation at http://www.w3.org/DOM.
In most examples of this article, we will use the same XML file, a very simplified RSS-like list of articles from zend.com. Paste the following text into a text file and save it as articles.xml.
http://www.zend.com/zend/week/week172.php
http://www.zend.com/zend/tut/tut-hatwar3.php
To load this example into a DOM object, you must first create a DOMDocument object and then load the XML file.
$dom = new DOMDocument();
$dom->load("articles.xml");
As mentioned above, you can use a PHP stream to load an XML document. You would write:
$dom->load("file:///articles.xml");
(or any other stream)
If you want to output the XML document to the browser or as standard markup, use:
print $dom->saveXML();
If you want to save it to a file, use:
print $dom->save("newfile.xml");
(Note that save() returns the number of bytes written, so this prints the file size to stdout.)
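As a rough sketch of the load and save calls above, the following script loads a document through an explicit file:// stream and reports what save() returns. The document content and the temporary file names are assumptions made for this illustration, and the file:// prefix as written assumes a Unix-like system with absolute paths starting with a slash.

```php
<?php
// Hypothetical document content, used only for this sketch.
$source = '<articles><item><title>Example</title></item></articles>';

// Write the document to a temporary file so there is something to load.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $source);

$dom = new DOMDocument();
// An absolute path prefixed with file:// selects the file stream explicitly.
$loaded = $dom->load('file://' . $path);

// save() returns the number of bytes written, or false on failure.
$copy  = $path . '.copy';
$bytes = $dom->save($copy);

print "Loaded: " . ($loaded ? 'yes' : 'no') . "\n";
print "Bytes written: $bytes\n";

// Clean up the temporary files.
unlink($path);
unlink($copy);
```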
Of course, this example does not do much yet, so let's do something more useful: get all the title elements. There are several ways to do this. The simplest is to use getElementsByTagName($tagname):
$titles = $dom->getElementsByTagName("title");
foreach ($titles as $node) {
    print $node->textContent . "\n";
}
The textContent property (part of DOM Level 3 Core, not of the earlier DOM levels) lets us quickly read all the text content of an element. The more traditional W3C way of reading it is:
$node->firstChild->data;
(In that case, make sure that firstChild really is the text node you need. Otherwise, you have to traverse all the child nodes to find it.)
Another thing to note is that getElementsByTagName() returns a DOMNodeList object rather than an array as get_elements_by_tagname() did in PHP4. As you can see in this example, you can still traverse it easily with foreach. You can also access a node directly with $titles->item(0), which returns the first title element.
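The behaviour of the node list can be sketched with a small self-contained script. The inline XML is a made-up stand-in for articles.xml, since only the element names matter here.

```php
<?php
$dom = new DOMDocument();
// Made-up stand-in for articles.xml.
$dom->loadXML(
    '<articles>'
    . '<item><title>First</title></item>'
    . '<item><title>Second</title></item>'
    . '</articles>'
);

$titles = $dom->getElementsByTagName('title');

// length holds the number of matching nodes.
print $titles->length . "\n";
// textContent concatenates all text below the element.
print $titles->item(0)->textContent . "\n";
// firstChild->data reads only the first child, assumed to be a text node.
print $titles->item(1)->firstChild->data . "\n";
// An index past the end yields null rather than an error.
var_dump($titles->item(2));
```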
Another way to obtain all the title elements is to traverse the tree from the root node. As you can see, this approach is more complex, but it is also more flexible if you need more than just the title elements.
foreach ($dom->documentElement->childNodes as $articles) {
    // If the node is an element (nodeType == 1) named "item", look at its children.
    if ($articles->nodeType == 1 && $articles->nodeName == "item") {
        foreach ($articles->childNodes as $item) {
            // If the node is an element named "title", print its text.
            if ($item->nodeType == 1 && $item->nodeName == "title") {
                print $item->textContent . "\n";
            }
        }
    }
}
XPath
XPath is like SQL for XML. With XPath, you can query an XML document for specific nodes that match given criteria. To get all the title nodes with XPath, you only need to do this:
$xp = new DOMXPath($dom);
$titles = $xp->query("/articles/item/title");
foreach ($titles as $node) {
    print $node->textContent . "\n";
}
This is similar to using getElementsByTagName(), but XPath is much more powerful. For example, if there were a title element that is a child of articles (rather than a child of item), getElementsByTagName() would return it as well. With the /articles/item/title syntax, we get only the title elements at the specified depth and position. This is just a simple example; a few further possibilities are:
/articles/item[position() = 1]/title returns all title elements of the first item element.
/articles/item/title[@id = '23'] returns all title elements that have an id attribute with the value 23.
/articles//title returns all title elements below the articles element, at any depth.
You can also query for nodes with particular sibling elements, elements with particular text content, or nodes in a namespace. If you have to query many XML documents, learning XPath properly will save you a lot of time: it is simple to use, fast to execute, and needs less code than the standard DOM.
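The expressions above can be tried out with a short script. This is a sketch under stated assumptions: the inline document, its id values, and the texts() helper are invented for the illustration and are not part of the original article.

```php
<?php
$dom = new DOMDocument();
// Made-up document with one title directly under the root element.
$dom->loadXML(
    '<articles>'
    . '<title>All articles</title>'
    . '<item><title id="23">First</title></item>'
    . '<item><title id="42">Second</title></item>'
    . '</articles>'
);
$xp = new DOMXPath($dom);

// Hypothetical helper: collect the text content of every node in a list.
function texts(DOMNodeList $nodes)
{
    $result = [];
    foreach ($nodes as $node) {
        $result[] = $node->textContent;
    }
    return $result;
}

$direct = texts($xp->query('/articles/item/title'));                 // titles of items only
$first  = texts($xp->query('/articles/item[position() = 1]/title')); // first item only
$byId   = texts($xp->query("/articles/item/title[@id = '23']"));     // matched by attribute
$all    = texts($xp->query('/articles//title'));                     // any depth

print implode(', ', $direct) . "\n";
print implode(', ', $first) . "\n";
print implode(', ', $byId) . "\n";
print implode(', ', $all) . "\n";
```

Note how only the last query picks up the title that sits directly below the root element.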
Write Data to the DOM
The Document Object Model supports not only reading and querying, but also manipulating and writing. (The DOM standard is a bit verbose, because its authors tried to support every conceivable environment, but it works very well.) Let's take a look at the example below. It adds a new element to our articles.xml file.
$item = $dom->createElement("item");
$title = $dom->createElement("title");
$titletext = $dom->createTextNode("XML in PHP5");
$title->appendChild($titletext);
$item->appendChild($title);
$dom->documentElement->appendChild($item);
print $dom->saveXML();
First, we created all the required nodes: an item element, a title element, and a text node containing the title of the item. Then we linked the nodes together: we added the text node to the title element, added the title element to the item element, and finally inserted the item element into the articles root element. Now there is a new article entry in our XML document.
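Writing is not limited to appending nodes. The following sketch also sets an attribute and removes an existing node. The inline document and the attribute value are assumptions made for the illustration.

```php
<?php
$dom = new DOMDocument();
// Made-up starting document.
$dom->loadXML('<articles><item><title>Old</title></item></articles>');

// createElement() accepts the text content as an optional second argument.
$title = $dom->createElement('title', 'XML in PHP5');
$title->setAttribute('id', '34');

$item = $dom->createElement('item');
$item->appendChild($title);
$dom->documentElement->appendChild($item);

// Remove the item that was in the document originally.
$old = $dom->getElementsByTagName('item')->item(0);
$old->parentNode->removeChild($old);

// Passing a node to saveXML() serializes just that subtree,
// without the XML declaration.
$result = $dom->saveXML($dom->documentElement);
print $result . "\n";
```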
Extending Classes
All of the above examples could also be done with the DOMXML extension under PHP4 (although the API was somewhat different). Being able to extend the DOM classes yourself is a new feature of PHP5, and it makes it possible to write more readable code. The following is the previous example rewritten as a class that extends DOMDocument:
class Articles extends DOMDocument {
    function __construct() {
        // Required: the parent constructor must be called.
        parent::__construct();
    }

    function addArticle($title) {
        $item = $this->createElement("item");
        $titlespace = $this->createElement("title");
        $titletext = $this->createTextNode($title);
        $titlespace->appendChild($titletext);
        $item->appendChild($titlespace);
        $this->documentElement->appendChild($item);
    }
}

$dom = new Articles();
$dom->load("articles.xml");
$dom->addArticle("XML in PHP5");
print $dom->save("newfile.xml");
HTML
A feature of PHP5 that is often overlooked is the HTML support of the libxml2 library. With the DOM extension you can load not only well-formed XML documents but also not-well-formed HTML documents, treat them as ordinary DOMDocument objects, and use all the available methods and features, such as XPath and SimpleXML.
This HTML capability is very useful when you need to access the content of a website that you do not control. With the help of XPath, XSLT, or SimpleXML, you save a lot of code compared with string matching using regular expressions or with a SAX parser. This is especially useful when the HTML document is not well structured (which is frequently the case!).
The following code retrieves and parses the php.net homepage and prints the content of the first title element.
$dom = new DOMDocument();
$dom->loadHTMLFile("http://www.php.net/");
$title = $dom->getElementsByTagName("title");
print $title->item(0)->textContent;
Note that when the specified element is not found (item(0) then returns null), your output may contain errors. If your website still uses PHP to output HTML 4 code, there is good news: the DOM extension can not only load HTML documents but also save them as HTML 4. After you have modified the DOM document, use $dom->saveHTML() to serialize it. Be aware that if you need the output HTML to comply with W3C standards, you are better off using the Tidy extension. The HTML support in libxml2 does not account for every possible case and does not cope well with input in uncommon formats.
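The following sketch parses sloppy HTML from a string instead of fetching a page over the network, so it runs offline. The sample markup is made up, and libxml_use_internal_errors() is assumed to be available (PHP 5.1 or later); it keeps parser warnings out of the output.

```php
<?php
// Deliberately sloppy, made-up HTML: the <p> and <b> tags are never closed.
$html = '<html><head><title>Example page</title></head>'
      . '<body><p>First paragraph<p>Second <b>paragraph</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$titles = $dom->getElementsByTagName('title');
// item(0) is null when nothing matches, so check before dereferencing.
$title = $titles->length > 0 ? $titles->item(0)->textContent : null;

// The parser repairs the markup, so both paragraphs become elements.
$paragraphs = $dom->getElementsByTagName('p')->length;

print "Title: $title\n";
print "Paragraphs: $paragraphs\n";

// saveHTML() serializes the repaired tree as HTML.
$output = $dom->saveHTML();
```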
Validation
Validation of XML documents is becoming more and more important. For example, if you receive an XML document from an external source, you need to check whether it conforms to a certain format before you process it. Fortunately, you do not need to write your own validation code in PHP, because you can use one of the three most widely used standards (DTD, XML Schema, or RelaxNG) to do it.
DTD is a standard from the SGML era and lacks support for some newer XML features (such as namespaces). It is also hard to parse and transform, because it is not written in XML.
XML Schema is a W3C standard that is widely used and covers almost everything you need to validate XML documents.
RelaxNG is a response to the complex XML Schema standard and was created by an independent group. Because it is easier to implement than XML Schema, more and more programs are beginning to support RelaxNG.
If you have no legacy schema documents and no overly complex XML documents, use RelaxNG. It is easy to write and read, and more and more tools support it. There is even a tool, Trang, that can automatically create a RelaxNG document from sample XML documents. In addition, only RelaxNG (and the aging DTDs) is fully supported by libxml2, although full XML Schema support in libxml2 is on its way.
The syntax for validating XML documents is quite simple:
$dom->validate(); // validates against the DTD declared in the document's DOCTYPE
$dom->relaxNGValidate('articles.rng');
$dom->schemaValidate('articles.xsd');
Currently, all of these calls only return true or false, and the errors are emitted as PHP warnings. Obviously, this is not ideal if you want to return user-friendly messages. It will be improved in versions after PHP 5.0. How exactly is still under discussion, but error reporting will certainly be handled better.
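A minimal sketch of RelaxNG validation follows. The schema, the sample documents, and the isValid() helper are assumptions made for this illustration. It also relies on libxml_use_internal_errors() and libxml_get_errors(), which are assumed to be available (PHP 5.1 or later) and collect the messages instead of emitting warnings.

```php
<?php
// Made-up RelaxNG schema: an articles root with one or more
// item elements, each containing exactly one title.
$schema = <<<'RNG'
<element name="articles" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="item">
      <element name="title"><text/></element>
    </element>
  </oneOrMore>
</element>
RNG;

// Hypothetical helper: returns true when $xml parses and matches $schema.
function isValid($xml, $schema)
{
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $ok  = $dom->loadXML($xml) && $dom->relaxNGValidateSource($schema);

    // Print whatever libxml collected, then reset its error buffer.
    foreach (libxml_get_errors() as $error) {
        print trim($error->message) . "\n";
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$good = isValid('<articles><item><title>A</title></item></articles>', $schema);
$bad  = isValid('<articles><item><name>A</name></item></articles>', $schema);

var_dump($good, $bad);
```

relaxNGValidateSource() takes the schema as a string; relaxNGValidate() does the same with a file name, as in the article's example.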
SimpleXML
SimpleXML is the latest member of PHP's XML family. The purpose of the SimpleXML extension is to provide a simpler way to access XML documents using standard object properties and iterators. The extension does not have many methods, yet it is quite powerful. Getting all the title nodes from our document takes less code than before.
$sxe = simplexml_load_file("articles.xml");
foreach ($sxe->item as $item) {
    print $item->title . "\n";
}
What happens here? First, articles.xml is loaded into a SimpleXML object. Then all the item elements in $sxe are iterated, and $item->title returns the content of the title element. You can also read attributes using associative array syntax: $item->title['id'].
As you can see, there is some magic involved. There are several ways to get what we want; for example, $item->title[0] returns the same result as in the example. On the other hand, foreach ($sxe->item->title as $item) loops only over the title of the first item, not over all title elements in the document (as I would have expected, coming from XPath).
SimpleXML is actually the first extension that uses the new features of Zend Engine 2, so it has become a test bed for these features. Be aware that bugs and unexpected errors are not uncommon during the development phase.
In addition to traversing all nodes as in the preceding example, SimpleXML also has an XPath interface, which provides a simpler way to access individual nodes.
foreach ($sxe->xpath('/articles/item/title') as $item) {
    print $item . "\n";
}
Admittedly, this code is no shorter than the previous example, but with more complex or more deeply nested XML documents, you will find that using XPath with SimpleXML saves you a lot of typing.
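The access patterns described above can be compared side by side in a small script. The inline document and its id values are made up for the illustration.

```php
<?php
// Made-up stand-in for articles.xml.
$sxe = simplexml_load_string(
    '<articles>'
    . '<item><title id="23">First</title></item>'
    . '<item><title id="42">Second</title></item>'
    . '</articles>'
);

// Property access: iterate over every item element.
$titles = [];
foreach ($sxe->item as $item) {
    // Attributes use array syntax; concatenation casts both to strings.
    $titles[] = $item->title['id'] . ': ' . $item->title;
}

// $sxe->item->title only reaches the titles of the FIRST item.
$firstOnly = [];
foreach ($sxe->item->title as $title) {
    $firstOnly[] = (string) $title;
}

// XPath returns an array of SimpleXMLElement objects.
$viaXPath = array_map('strval', $sxe->xpath('/articles/item/title'));

print implode("\n", $titles) . "\n";
print implode(', ', $firstOnly) . "\n";
print implode(', ', $viaXPath) . "\n";
```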
Write Data to SimpleXML documents
You can not only parse and read SimpleXML documents, but also change them, at least to some extent:
$sxe->item->title = "XML in PHP5"; // new content of the title element
$sxe->item->title['id'] = 34; // new attribute of the title element
$xmlString = $sxe->asXML(); // returns the SimpleXML object as a serialized XML string
print $xmlString;
Interoperability
Because SimpleXML is based on the libxml2 library, you can easily convert SimpleXML objects to DOM objects and back with almost no performance cost (the document does not need to be copied internally). Thanks to this mechanism you get the best of both worlds: use whichever tool suits the task at hand. It works as follows:
$sxe = simplexml_import_dom($dom);
$dom = dom_import_simplexml($sxe); // returns a DOMElement; ->ownerDocument gives the DOMDocument
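That the two views share one document can be shown with a short sketch. The inline document and the count attribute are invented for the illustration.

```php
<?php
$dom = new DOMDocument();
// Made-up sample document.
$dom->loadXML('<articles><item><title>First</title></item></articles>');

// View the same tree through SimpleXML and change it there.
$sxe = simplexml_import_dom($dom);
$sxe->item->title = 'Changed via SimpleXML';

// The change is visible through DOM, because nothing was copied.
$viaDom = $dom->getElementsByTagName('title')->item(0)->textContent;

// Going back yields a DOMElement for the node the SimpleXML object wraps.
$element = dom_import_simplexml($sxe);
$element->setAttribute('count', '1');

// And the DOM change is visible through SimpleXML again.
$viaSimpleXml = (string) $sxe['count'];

print $viaDom . "\n";
print $viaSimpleXml . "\n";
```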
XSLT
XSLT is a language for transforming XML documents into other XML documents. XSLT itself is written in XML and belongs to the family of functional languages, which take a different approach to program flow than procedural and object-oriented languages such as PHP. PHP4 had two XSLT processors: Sablotron (in the widely used XSLT extension) and libxslt (in the domxml extension). They were incompatible with each other and were used differently. PHP5 only supports the libxslt processor. It was chosen because it is based on libxml2 and therefore fits better into the XML concept of PHP5.
In theory it would also be possible to bind Sablotron to PHP5, but unfortunately nobody has done it. So if you are using Sablotron, you have to switch to the libxslt processor in PHP5. In terms of features, libxslt is on par with Sablotron, with the exception of JavaScript support. Even the scheme handlers that are unique to Sablotron can be re-implemented using PHP's powerful streams. In addition, libxslt is one of the fastest XSLT processors, so you get a free speed boost (it runs about twice as fast as Sablotron).
Like the other extensions discussed in this article, the XSL extension can exchange XML documents with the DOM extension, in both directions. In fact, you have to do this, because the ext/xsl extension has no interface of its own for loading and saving XML documents; you can only use the DOM extension for that. There is not much you need to learn to get started with XSLT transformations. There is no W3C standard for this API, so it was "borrowed" from Mozilla.
First, you need an XSLT stylesheet. Paste the following text into a new file and save it as articles.xsl.
Then call it from a PHP script:
/* Load the XML and XSL documents into DOMDocument objects */
$xsl = new DOMDocument();
$xsl->load("articles.xsl");
$inputdom = new DOMDocument();
$inputdom->load("articles.xml");
/* Create the XSLT processor and import the stylesheet */
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$proc->setParameter("", "titles", "Titles");
/* Transform and output the XML document */
$newdom = $proc->transformToDoc($inputdom);
print $newdom->saveXML();
The preceding example first uses the DOM method load() to load the XSLT stylesheet articles.xsl. Then a new XSLTProcessor object is created, and the stylesheet object is imported into it for later use. Parameters can be set with setParameter(namespaceURI, name, value). Finally, the XSLTProcessor object starts the transformation with transformToDoc($inputdom) and returns a new DOMDocument object.
The advantage of this API is that you can transform many XML documents with the same stylesheet. You only need to load it once and can then reuse it, because transformToDoc() can be applied to different XML documents.
In addition to transformToDoc(), there are two other transformation methods: transformToXML($dom) returns a string, and transformToURI($dom, $uri) saves the transformed document to a file or a PHP stream. Note that if you want to use XSLT output options (the xsl:output element, for example indent="yes"), you cannot use transformToDoc(), because a DOMDocument object cannot store this information. These options only take effect when you write the transformation result directly to a string or a file.
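Reusing one stylesheet for several documents can be sketched as follows. The inline stylesheet is a stand-in written for this illustration (it is not the author's articles.xsl), the sample documents are made up, and the script assumes that the xsl extension is enabled.

```php
<?php
// Stand-in stylesheet: prints the "titles" parameter followed by all titles.
// A nowdoc is used so that $titles is not interpolated by PHP.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:param name="titles"/>
  <xsl:template match="/articles">
    <xsl:value-of select="$titles"/>
    <xsl:text>:</xsl:text>
    <xsl:for-each select="item/title">
      <xsl:text> [</xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>]</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

// Import the stylesheet once...
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$proc->setParameter('', 'titles', 'Titles');

// ...and apply it to several (made-up) input documents.
$inputs = [
    '<articles><item><title>First</title></item><item><title>Second</title></item></articles>',
    '<articles><item><title>Third</title></item></articles>',
];

$results = [];
foreach ($inputs as $xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    // transformToXML() returns a string and honours xsl:output.
    $results[] = $proc->transformToXML($doc);
}

print implode("\n", $results) . "\n";
```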
Call PHP Functions
The last new feature of the XSLT extension is the ability to call any PHP function from within an XSLT stylesheet. XML purists will not like this feature (such stylesheets become more complicated, and it is easy to mix logic and design), but in some places it is very useful. XSLT is very limited when it comes to functions; even outputting a date in different languages is quite cumbersome. With this feature, it becomes as easy as it is in plain PHP. The following code makes a function available to XSLT:
function dateLang() {
    // %A is the full weekday name in the current locale
    return strftime("%A");
}
$xsl = new DOMDocument();
$xsl->load("datetime.xsl");
$inputdom = new DOMDocument();
$inputdom->load("today.xml");
$proc = new XSLTProcessor();
$proc->registerPHPFunctions();
// Import the stylesheet held in $xsl
$proc->importStylesheet($xsl);
/* Transform and output the XML document */
$newdom = $proc->transformToDoc($inputdom);
print $newdom->saveXML();
The following is the XSLT stylesheet datetime.xsl, which calls this function.
The following is the XML document to be transformed with the stylesheet, today.xml (articles.xml would give the same result).
With the stylesheet above, the PHP script, and any XML file, the output is the name of the weekday in the language of the current system locale. You can pass more arguments to php:function(); they are handed on to the PHP function. There is also a function php:functionString(), which automatically converts all arguments to strings, so you do not need to convert them in PHP.
Note that you need to call $proc->registerPHPFunctions() before the transformation. Otherwise, PHP function calls are not executed, for security reasons (do you always trust your XSLT stylesheets?). An access control system has not been implemented yet; it may be added in a future PHP5 version.
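A self-contained sketch of calling PHP from a stylesheet follows. The shout() function, the inline stylesheet, and the sample document are invented for this illustration (they are not the article's dateLang() example), and the script assumes that the xsl extension is enabled. The php prefix must be bound to the namespace http://php.net/xsl.

```php
<?php
// Hypothetical function that the stylesheet will call.
function shout($text)
{
    return strtoupper($text);
}

// Stand-in stylesheet: passes every title through shout().
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:php="http://php.net/xsl"
    exclude-result-prefixes="php">
  <xsl:output method="text"/>
  <xsl:template match="/articles">
    <xsl:for-each select="item/title">
      <xsl:value-of select="php:functionString('shout', .)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$inputdom = new DOMDocument();
// Made-up input document.
$inputdom->loadXML(
    '<articles><item><title>First</title></item><item><title>Second</title></item></articles>'
);

$proc = new XSLTProcessor();
// Passing a name restricts the stylesheet to that one function.
$proc->registerPHPFunctions('shout');
$proc->importStylesheet($xsl);

$output = $proc->transformToXML($inputdom);
print $output;
```

Restricting registerPHPFunctions() to the functions a stylesheet really needs limits what an untrusted stylesheet can do.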
Summary
PHP's support for XML has taken a big step forward. It complies with standards, is powerful and interoperable, is installed by default, and is ready for use. The new SimpleXML extension provides a simple and quick way to access XML documents and saves you a lot of code, especially when you have well-structured documents or can use the power of XPath.
Thanks to libxml2, the underlying library used by the PHP5 XML extensions, validation of XML documents with DTD, RelaxNG, or XML Schema is now supported.
XSL support has also been updated. It now uses the libxslt library, which performs much better than the original Sablotron library, and calling PHP functions from an XSLT stylesheet lets you write more powerful XSLT code.
If you have already used XML in PHP4 or in other languages, you will like the XML features of PHP5. XML support has changed greatly in PHP5: it complies with standards and is on par with other tools and languages.
Links
PHP 4 specific
Domxml Extension: http://www.php.net/domxml/
Sablotron Extension: http://www.php.net/xslt/
Libxslt: http://www.php.net/manual/en/functi...-stylesheet.php
PHP 5 specific
SimpleXML: http://www.php.net/simplexml/
Streams: http://www.php.net/manual/en/ref.stream.php
Standards
DOM: http://www.w3.org/DOM
XSLT: http://www.w3.org/TR/xslt
XPath: http://www.w3.org/TR/xpath
XML Schema: http://www.w3.org/XML/Schema
RelaxNG: http://relaxng.org/
Xinclude: http://www.w3.org/TR/xinclude/
Tools
Libxml2, the underlying library: http://xmlsoft.org/
Trang, a Schema/RelaxNG/etc converter: http://www.thaiopensource.com/relaxng/trang.html
About the author
Christian Stocker is the founder and CEO of Bitflux GmbH. He is a co-author of the XSL, DOM, and imagick extensions and of the German book PHP de Luxe, and he contributes to other open-source projects such as Bitflux Editor and Popoon. You can contact him at chregu@php.net.