PHP XML Parsing Functions (Page 1 of 2)

Source: Internet
Author: User
First of all, I have to admit that I like computing standards. If everyone followed the industry's standards, the Internet would be a better medium. Standardized data-interchange formats make open, platform-independent computing models feasible. That's why I'm a fan of XML.
Luckily, my favorite scripting language not only supports XML, but its support keeps growing. PHP lets me quickly publish XML documents on the Internet, collect statistics about XML documents, and convert XML documents into other formats. For example, I often use PHP's XML processing capabilities to manage the articles and books I write in XML.
In this article, I'll discuss how to use the expat parser built into PHP to process XML documents. Through examples, I'll demonstrate the expat approach. The examples also show how to:
Build your own handler functions
Convert XML documents into your own PHP data structures
Introducing expat
XML parsers, also known as XML processors, give programs access to the structure and content of XML documents. Expat is the XML parser used by the PHP scripting language. It is also used in other projects, such as Mozilla, Apache, and Perl.
What is an event-based parser?
There are two basic types of XML parsers:
Tree-based parser: converts an XML document into a tree structure. This type of parser analyzes the entire document and provides an API for accessing each element of the resulting tree. The common standard for this approach is the DOM (Document Object Model).
Event-based parser: treats an XML document as a series of events. When a particular event occurs, the parser calls a function provided by the developer to handle it.
An event-based parser takes a data-centric view of an XML document, which means it focuses on the document's data rather than its structure. These parsers process the document from beginning to end and report events, such as the start of an element, the end of an element, and the start of character data, to the application through callback functions.
The following is an example of a "Hello World" XML document:
<greeting>
Hello World
</greeting>
An event-based parser reports it as three events:
Start element: greeting
Character data, with the value: Hello World
End element: greeting
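To see these events from PHP, here is a minimal sketch, assuming PHP 8 with the XML extension enabled; list_events() is a hypothetical name. It registers closures as handlers, turns off case folding so names keep their original case, and merges consecutive character-data callbacks, because the parser may deliver one run of text in several pieces. (Current PHP builds often implement these xml_* functions on top of libxml2 rather than expat, but the callback interface is the same.)

```php
<?php
// Record, in order, the events the parser reports for a document.
function list_events(string $xml): array
{
    $events = [];
    $parser = xml_parser_create();
    // Keep element names in their original case (the default folds to uppercase).
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$events): void {
            $events[] = "start element: $name";
        },
        function ($parser, string $name) use (&$events): void {
            $events[] = "end element: $name";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$events): void {
            // One run of text may arrive in several chunks; merge consecutive ones.
            $last = count($events) - 1;
            if ($last >= 0 && str_starts_with($events[$last], 'character data: ')) {
                $events[$last] .= $data;
            } else {
                $events[] = "character data: $data";
            }
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $events;
}
```

For the one-line document '<greeting>Hello World</greeting>', this should return the three events listed above.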
Unlike a tree-based parser, an event-based parser does not produce a structure that describes the document. When it reports the character data, an event-based parser gives you no information about the parent element, greeting.
However, it provides lower-level access, which makes better use of resources and allows faster access. There is no need to hold the entire document in memory; in fact, the document can even be larger than the available memory.
Expat is such an event-based parser. Of course, when you use expat, you can still build a complete native tree structure in PHP if necessary.
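As a sketch of that idea, assuming PHP 8, the following builds a small tree of objects from the start element, end element, and character data events. The XmlNode class and build_tree() function are hypothetical names, and the tree keeps only names, attributes, text, and children.

```php
<?php
// A minimal tree node: element name, attributes, text content, and children.
class XmlNode
{
    public array $children = [];
    public string $text = '';

    public function __construct(public string $name, public array $attrs = [])
    {
    }
}

// Build a tree from start/end/character-data events using a stack of open nodes.
function build_tree(string $xml): XmlNode
{
    $root = new XmlNode('#document');
    $stack = [$root];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$stack): void {
            $node = new XmlNode($name, $attrs);
            $parent = end($stack);       // innermost open element
            $parent->children[] = $node;
            $stack[] = $node;            // the new element is now open
        },
        function ($parser, string $name) use (&$stack): void {
            array_pop($stack);           // the element is closed
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$stack): void {
            $current = end($stack);
            $current->text .= $data;
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $root->children[0]; // the document element
}
```

The stack here plays the same role as the one the article builds later for its statistics script.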
The hello-world example above is a well-formed XML document. However, it is not valid, because it has neither an associated DTD (document type definition) nor an inline DTD.
For expat this makes no difference: expat is a non-validating parser and therefore does not check the document against any associated DTD. Note, however, that the document must still be well-formed; otherwise expat (like any other XML-compliant parser) stops with an error message.
As a non-validating parser, expat is fast and lightweight, which makes it well suited to Internet applications.
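The difference between valid and well-formed can be checked with a short sketch, assuming PHP 8; check_well_formed() is a hypothetical helper. The first document breaks its own inline DTD, which declares greeting as EMPTY, yet a non-validating parser should accept it; the second has a mismatched end tag and should be rejected.

```php
<?php
// Return true if $xml is well-formed, otherwise a description of the first error.
function check_well_formed(string $xml): bool|string
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return true;
    }
    return sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Well-formed but not valid: the inline DTD declares greeting as EMPTY.
var_dump(check_well_formed(
    '<!DOCTYPE greeting [<!ELEMENT greeting EMPTY>]><greeting>Hello World</greeting>'
));
// Not well-formed: the end tag does not match the start tag.
var_dump(check_well_formed('<greeting>Hello World</greting>'));
```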
Compiling expat
Expat can be compiled into PHP 3.0.6 or later. Starting with Apache 1.3.9, expat has been part of Apache. On Unix systems, you can compile it into PHP by configuring PHP with the --with-xml option.
If you compile PHP as an Apache module, expat is included by default as part of Apache. On Windows, you have to load the XML dynamic link library.
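On current PHP versions the XML extension is usually enabled by default, but a script can check for it at runtime before calling the parser functions. A minimal sketch; xml_support_available() is a hypothetical name.

```php
<?php
// Check at runtime whether PHP's XML parser extension is available.
function xml_support_available(): bool
{
    return extension_loaded('xml') && function_exists('xml_parser_create');
}

echo xml_support_available()
    ? "XML parser functions are available\n"
    : "The XML extension is not loaded\n";
```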
XML Example: Xmlstats
One way to learn expat's functions is through an example. The example we'll discuss uses expat to collect statistics about XML documents.
For each element in the document, the script outputs the following information:
The number of times the element is used in the document
The number of bytes of character data in the element
The element's parent elements
The element's child elements
Note: For demonstration purposes, we use PHP to build a structure that holds each element's parent and child elements.
Getting ready
The function that creates an XML parser instance is xml_parser_create(). This instance is passed to all subsequent parser functions, much like the connection handle used by PHP's MySQL functions. An event-based parser typically requires you to register callback functions for particular events before parsing a document. Expat is no exception; it defines the following seven possible events:
Event | Registration function | Handler is called when
Element start and end | xml_set_element_handler() | an element starts or ends
Character data | xml_set_character_data_handler() | character data appears
External entity | xml_set_external_entity_ref_handler() | an external entity reference appears
Unparsed external entity | xml_set_unparsed_entity_decl_handler() | an unparsed external entity declaration appears
Processing instruction | xml_set_processing_instruction_handler() | a processing instruction appears
Notation declaration | xml_set_notation_decl_handler() | a notation declaration appears
Default | xml_set_default_handler() | any other event that has no handler of its own
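The table lists one registration function per event. As an illustration beyond the element and character data handlers used in this article, the sketch below (assuming PHP 8; list_processing_instructions() is hypothetical) registers a processing instruction handler, which receives the parser, the target, and the data, and a default handler that collects whatever else the parser hands it.

```php
<?php
// Collect processing instructions; route everything else to a default handler.
function list_processing_instructions(string $xml): array
{
    $pis = [];
    $other = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    // Processing instruction handler: ($parser, $target, $data).
    xml_set_processing_instruction_handler(
        $parser,
        function ($parser, string $target, string $data) use (&$pis): void {
            $pis[] = [$target, $data];
        }
    );
    // Default handler: receives events that have no handler of their own.
    xml_set_default_handler(
        $parser,
        function ($parser, string $data) use (&$other): void {
            $other[] = $data;
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return ['pis' => $pis, 'other' => $other];
}

print_r(list_processing_instructions('<doc><?render mode="fast"?><!-- note --></doc>'));
```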
Every callback function receives the parser instance as its first argument (in addition to its other parameters).
Note that the example script at the end of this article uses both an element handler and a character data handler. The element callbacks are registered with xml_set_element_handler().
This function requires three parameters:
Instance of the parser
Name of the callback function that handles the start element
Name of the callback function that handles the end element
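A minimal sketch of this registration, assuming PHP 8, with the handlers passed by function name as strings. start_element(), end_element(), count_elements(), and the global $open_count are hypothetical names; like the article's script, the sketch keeps its state in a global variable.

```php
<?php
// Global state, as in the article's script.
$open_count = 0;

// Start-element handler: called with the parser, element name, and attributes.
function start_element($parser, string $name, array $attrs): void
{
    global $open_count;
    $open_count++;
}

// End-element handler: called with the parser and element name.
function end_element($parser, string $name): void
{
    // A real script would update its own structures here.
}

function count_elements(string $xml): int
{
    global $open_count;
    $open_count = 0;
    $parser = xml_parser_create();
    // Parser instance, start-element callback, end-element callback.
    xml_set_element_handler($parser, 'start_element', 'end_element');
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $open_count;
}

echo count_elements('<book><title/><author/></book>'), "\n";
```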
The callback functions must exist by the time you begin parsing the XML document, and they must be defined to match the prototypes described in the PHP manual.
For example, expat passes three arguments to the start element handler. In the example script, it is defined as follows:
function start_element($parser, $name, $attrs)
The first parameter is the parser handle, the second is the name of the element being opened, and the third is an array containing all of the element's attributes and their values.
Once parsing begins, expat calls your start_element() function each time it encounters a start tag, passing it these arguments.
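To show what arrives in $attrs, this sketch (assuming PHP 8; collect_attributes() is hypothetical) uses a closure with the same parameter list as start_element() and records each element name together with its attribute array. Case folding is turned off so names keep their original case.

```php
<?php
// Record every element name with the attribute array passed to the start handler.
function collect_attributes(string $xml): array
{
    $found = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        // Same shape as start_element($parser, $name, $attrs).
        function ($parser, string $name, array $attrs) use (&$found): void {
            $found[] = [$name, $attrs]; // $attrs maps attribute name => value
        },
        function ($parser, string $name): void {
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $found;
}

print_r(collect_attributes('<book lang="en" id="b1"><title/></book>'));
```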
The case-folding option for XML
Turn off the case-folding option with the xml_parser_set_option() function. This option is on by default, so the element names passed to the handlers are automatically converted to uppercase. However, XML is case-sensitive (so case matters when collecting statistics about XML documents). For our example, case folding must be turned off.
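The effect of the option is easy to observe. In this sketch (assuming PHP 8; element_names() is hypothetical) the same document is parsed with case folding on and off.

```php
<?php
// Return the element names reported to the start handler.
function element_names(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    // XML_OPTION_CASE_FOLDING is on by default; names are then uppercased.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$names): void {
            $names[] = $name;
        },
        function ($parser, string $name): void {
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $names;
}

print_r(element_names('<Book><title/></Book>', true));
print_r(element_names('<Book><title/></Book>', false));
```

With folding on, '<Book><title/></Book>' should report BOOK and TITLE; with it off, Book and title.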
Parsing documents
With all the preparations done, the script can finally parse the XML document:
xml_parse_from_file() is a custom function that opens the file named in its argument and parses it in 4 KB chunks.
Like xml_parse(), xml_parse_from_file() returns false when an error occurs, that is, when the XML document is not well-formed.
You can use the xml_get_error_code() function to get the numeric code of the last error. Pass this code to the xml_error_string() function to get the error's text message.
The script also outputs the current line number in the XML document, which makes debugging easier.
During parsing, the registered callback functions are invoked.
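The article does not reproduce xml_parse_from_file() at this point, so the following is a hypothetical reconstruction, assuming PHP 8: it reads the file in 4096-byte chunks and passes each to xml_parse(), setting the third argument on the last chunk. The helper describe_xml_error() (also hypothetical) combines xml_get_error_code(), xml_error_string(), and xml_get_current_line_number() into one message.

```php
<?php
// Hypothetical reconstruction of the article's custom helper:
// parse the named file in 4 KB chunks with an existing parser.
function xml_parse_from_file($parser, string $file): bool
{
    $fp = fopen($file, 'rb');
    if ($fp === false) {
        return false;
    }
    $ok = true;
    while ($ok && !feof($fp)) {
        $chunk = fread($fp, 4096);
        if ($chunk === false) {
            $ok = false;
            break;
        }
        // The third argument tells the parser whether this is the final chunk.
        $ok = (bool) xml_parse($parser, $chunk, feof($fp));
    }
    fclose($fp);
    return $ok;
}

// Build a readable message from the parser's last error and current line.
function describe_xml_error($parser): string
{
    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Usage: parse a small temporary file and report any error.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file, "<book>\n  <title>PHP</title>\n</book>\n");
$parser = xml_parser_create();
if (!xml_parse_from_file($parser, $file)) {
    echo describe_xml_error($parser), "\n";
}
unlink($file);
```

Parsing in fixed-size chunks keeps memory use flat regardless of file size, which is the advantage of an event-based parser described earlier.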
Describing the document structure
When parsing a document with expat, one question deserves emphasis: how do you maintain a basic description of the document's structure?
As mentioned earlier, the event-based parser itself does not produce any structural information.
However, the tag structure is an important feature of XML. For example, the element sequence <book><title> means something different from <figure><title>. In other words, any author will tell you that a book title has nothing to do with a figure title, even though both use the term "title". Therefore, to process XML effectively with an event-based parser, you must use your own stacks or lists to maintain the document's structural information.
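A sketch of that bookkeeping, assuming PHP 8: paths_of() (a hypothetical helper) keeps the names of the open elements in an array and, for each element with a given name, records its full path. The two title elements then come out with different contexts.

```php
<?php
// Report the full path (e.g. "doc/book/title") of every element named $target.
function paths_of(string $xml, string $target): array
{
    $stack = [];  // names of the currently open elements
    $paths = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$stack, &$paths, $target): void {
            $stack[] = $name;
            if ($name === $target) {
                $paths[] = implode('/', $stack);
            }
        },
        function ($parser, string $name) use (&$stack): void {
            array_pop($stack);
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $paths;
}

print_r(paths_of('<doc><book><title>A</title></book><figure><title>B</title></figure></doc>', 'title'));
```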
To mirror the document structure, the script needs to know at least the parent of the current element. The expat API does not provide this: it only reports events for the current element, without any information about its relationships. Therefore, you need to build your own stack structure.
The example script uses a first-in, last-out (FILO) stack, implemented as an array that holds the currently open start elements. In the start element handler, the current element is pushed onto the top of the stack with array_push(). Correspondingly, the end element handler removes the topmost element with array_pop().
For the sequence <book><title></title></book>, the stack is populated as follows:
Start element book: assigns "book" to the first element of the stack ($stack[0]).
Start element title: assigns "title" to the top of the stack ($stack[1]).
End element title: removes the topmost element ($stack[1]) from the stack.
End element book: removes the topmost element ($stack[0]) from the stack.
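The same sequence can be traced with array_push() and array_pop(). This sketch (assuming PHP 8; trace_stack() is hypothetical) records the stack contents after each event.

```php
<?php
// Trace the stack of open elements after every start and end event.
function trace_stack(string $xml): array
{
    $stack = [];
    $trace = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$stack, &$trace): void {
            array_push($stack, $name); // push the element that just opened
            $trace[] = "start $name: [" . implode(', ', $stack) . ']';
        },
        function ($parser, string $name) use (&$stack, &$trace): void {
            array_pop($stack);         // remove the element that just closed
            $trace[] = "end $name: [" . implode(', ', $stack) . ']';
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $trace;
}

print_r(trace_stack('<book><title></title></book>'));
```

For '<book><title></title></book>' the trace should read: [book], [book, title], [book], and finally an empty stack.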
The PHP 3.0 version of the example uses a $depth variable to track element nesting manually, which makes the script more complex. In PHP 4.0, the array_push() and array_pop() functions make the script more concise.
Collecting data
To gather information about each element, the script needs to remember the events for each element. It stores all the distinct elements of the document in a global array variable, $elements. Each array item is an instance of an element class with 4 properties (class variables):
$count - number of times the element was found in the document
$chars - number of bytes of character data in the element
$parents - the element's parent elements
$childs - the element's child elements
As you can see, it's easy to keep class instances in an array.
Note: One feature of PHP is that you can traverse an entire class instance with a while (list() = each()) loop, just as you would traverse the corresponding array. All class variables (and, in PHP 3.0, the method names) are output as strings.
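Note that each() was deprecated in PHP 7.2 and removed in PHP 8.0, so the while (list() = each()) idiom no longer runs on current PHP; a foreach loop walks an object's accessible properties in the same way. A sketch, assuming PHP 8: the Element class mirrors the four properties described above, and describe_object() is a hypothetical helper.

```php
<?php
// The element class described above: four public properties.
class Element
{
    public $count = 0;
    public $chars = 0;
    public $parents = [];
    public $childs = [];
}

// foreach visits an object's accessible properties in declaration order,
// much as it visits the entries of an array.
function describe_object(object $obj): array
{
    $lines = [];
    foreach ($obj as $property => $value) {
        $shown = is_array($value) ? implode(',', $value) : (string) $value;
        $lines[] = "$property = $shown";
    }
    return $lines;
}

$e = new Element();
$e->count = 2;
$e->childs = ['title'];
print_r(describe_object($e));
```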
When an element is found, we increment its counter to track how many times it appears in the document; that is, we add one to the count of the corresponding $elements item.
We also want the parent element to know that the current element is one of its children. Therefore, the current element's name is added to the parent element's $childs array. Finally, the current element should remember its parent, so the parent element is added to the current element's $parents array.
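Putting the stack and the element class together, here is a sketch of the collection step, assuming PHP 8. ElementStats and collect_stats() are hypothetical names, the handlers are closures rather than the article's global functions, and $parents and $childs are kept as lists of distinct names (one possible reading of "added to the array"). Character data is credited to the element on top of the stack, so whitespace between child elements counts toward the parent.

```php
<?php
// Per-element statistics, mirroring the four properties described above.
class ElementStats
{
    public int $count = 0;      // occurrences in the document
    public int $chars = 0;      // bytes of character data directly inside the element
    public array $parents = []; // distinct parent element names
    public array $childs = [];  // distinct child element names
}

function collect_stats(string $xml): array
{
    $elements = []; // element name => ElementStats
    $stack = [];    // names of the currently open elements

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$elements, &$stack): void {
            $elements[$name] ??= new ElementStats();
            $elements[$name]->count++;
            if ($stack !== []) {
                $parent = end($stack);
                // The parent learns about this child, and this element about its parent.
                if (!in_array($name, $elements[$parent]->childs, true)) {
                    $elements[$parent]->childs[] = $name;
                }
                if (!in_array($parent, $elements[$name]->parents, true)) {
                    $elements[$name]->parents[] = $parent;
                }
            }
            array_push($stack, $name);
        },
        function ($parser, string $name) use (&$stack): void {
            array_pop($stack);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$elements, &$stack): void {
            if ($stack !== []) {
                $elements[end($stack)]->chars += strlen($data);
            }
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $elements;
}
```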
Showing the statistics
The rest of the code loops through the $elements array and its sub-arrays to display the statistics. This is the simplest kind of nested loop: it produces the correct output, but the code is neither elegant nor clever; it's just the kind of loop you write every day to get a job done.
The example script is designed to be run from the command line with PHP's CGI binary, so the statistics are output as plain text. If you want to use the script on the Web, you need to modify the output function to produce HTML.
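A sketch of the output step, assuming PHP 8. format_stats() is hypothetical; it takes an $elements array of objects with count, chars, parents, and childs properties and renders either plain text for the command line or, when $html is true, a simple HTML fragment with the values escaped.

```php
<?php
// Render per-element statistics as plain text, or as HTML when $html is true.
function format_stats(array $elements, bool $html = false): string
{
    $out = '';
    foreach ($elements as $name => $info) {
        $lines = [
            'count: ' . $info->count,
            'chars: ' . $info->chars,
            'parents: ' . ($info->parents ? implode(', ', $info->parents) : '-'),
            'children: ' . ($info->childs ? implode(', ', $info->childs) : '-'),
        ];
        if ($html) {
            $out .= '<h3>' . htmlspecialchars((string) $name) . "</h3>\n<ul>\n";
            foreach ($lines as $line) {
                $out .= '  <li>' . htmlspecialchars($line) . "</li>\n";
            }
            $out .= "</ul>\n";
        } else {
            $out .= "Element: $name\n";
            foreach ($lines as $line) {
                $out .= "  $line\n";
            }
        }
    }
    return $out;
}

$stats = ['book' => (object) ['count' => 1, 'chars' => 0, 'parents' => [], 'childs' => ['title']]];
echo format_stats($stats);
```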
Summary
Expat is PHP's XML parser. As an event-based parser, it does not produce a structural description of the document, but by providing low-level access it makes better use of resources and allows faster access.
As a non-validating parser, expat does not check an XML document against its DTD, but it stops with an error message if the document is not well-formed.
Provide event handler functions to process the document.
Build your own structures, such as stacks and trees, to take advantage of the structural information in XML markup.
New XML applications appear every day, and PHP's XML support keeps growing (for example, support for the DOM-based XML parser libxml has been added).
With PHP and expat, you can prepare for an emerging set of effective, open, and platform-independent standards.

