PHP XML parsing function code

Source: Internet
Author: User
First, I must admit that I like computer standards. If everyone complied with industry standards, the Internet would be a better medium. Standardized data exchange formats make open, platform-independent computing feasible. This is why I am a fan of XML.
Fortunately, my favorite scripting language not only supports XML, but keeps extending that support. PHP allows me to quickly publish XML documents to the Internet, collect statistics on XML documents, and convert XML documents to other formats. For example, I often use PHP's XML processing capability to manage articles and books written in XML.
In this article, I will discuss how to use PHP's built-in Expat parser to process XML documents. An example demonstrates how to work with Expat and shows you how to:
Create your own handler functions
Convert an XML file into your own PHP data structure
Introduction to Expat
An XML parser, also known as an XML processor, gives a program access to the structure and content of an XML document. Expat is the XML parser used by the PHP scripting language. It is also used in other projects, such as Mozilla, Apache, and Perl.
What is an event-based parser?
There are two basic types of XML parser:
Tree-based parser: converts an XML document into a tree structure. This type of parser analyzes the entire document and provides an API to access each element of the resulting tree. The common standard for it is the DOM (Document Object Model).
Event-based parser: treats an XML document as a series of events. When a particular event occurs, the parser calls the function the developer has provided to handle it.
An event-based parser has a data-centric view of the XML document: it concentrates on the data in the document rather than on its structure. Such a parser processes the document from beginning to end and reports events, such as the start of an element, the end of an element, and the start of character data, to the application through callback functions.
Below is a "Hello World" XML document:

<greeting>Hello World</greeting>

The event-based parser reports three events:
Start element: greeting
Start of character data, value: Hello World
End element: greeting
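
The three events can be observed with a short script. This is a minimal sketch and not the article's sample script: the $events array and the anonymous handler functions are assumptions made for illustration, and case folding is switched off so that the element name is reported as written.

```php
<?php
// Sketch: record every event the parser reports for a one-element document.
$events = [];

$parser = xml_parser_create();
// Report element names as written instead of converting them to uppercase.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "Start element: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "End element: $name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "Character data: $data";
    }
);

// The third argument tells the parser that this is the last piece of data.
xml_parse($parser, '<greeting>Hello World</greeting>', true);

echo implode("\n", $events), "\n";
```

Note that a parser may deliver longer runs of character data in several pieces, so a handler should not assume one call per text node.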
Unlike a tree-based parser, an event-based parser does not build a structure that describes the document. While handling the character data, the event-based parser does not tell you that its parent element is greeting.
However, it provides lower-level access, which allows better resource utilization and faster access. There is no need to hold the entire document in memory; in fact, the document can be larger than the available memory.
Expat is such an event-based parser. Of course, when you use Expat you can still build a complete native tree structure in PHP if you need one.
The Hello World example above is a well-formed XML document. However, it is not valid, because no DTD (Document Type Definition) is associated with it or embedded in it.
For Expat this makes no difference: Expat is a non-validating parser, so it ignores any DTD associated with the document. Note, however, that the document must still be well-formed; otherwise Expat (like any other XML-compliant parser) stops with an error message.
As a non-validating parser, Expat is fast and lightweight, which makes it well suited to Internet applications.
Compile Expat
Expat can be compiled into PHP 3.0.6 or later. Since Apache 1.3.9, Expat has been part of Apache. On Unix systems, you can compile it into PHP by configuring PHP with the --with-xml option.
If you compile PHP as an Apache module, the Expat bundled with Apache is used by default. On Windows, you must load the XML dynamic link library.
XML Example: XMLstats
One way to understand the Expat functions is through an example. The example discussed here uses Expat to collect statistics on an XML document.
For each element in the document, the following information is output:
The number of times the element is used in the document
The number of characters in the element
The element's parent elements
The element's child elements
Note: for demonstration purposes, we build a structure in PHP to hold the parent and child elements of each element.
Preparation
The function that creates an XML parser instance is xml_parser_create(). This instance is passed to all subsequent functions. The idea is very similar to the connection identifier of the MySQL functions in PHP. Before parsing a document, an event-based parser usually requires you to register callback functions that are called when a particular event occurs. Expat is no exception. It defines the following seven possible events; each entry gives the event, the function that registers its handler, and a description:
Element: xml_set_element_handler(), start and end of an element
Character data: xml_set_character_data_handler(), start of character data
External entity: xml_set_external_entity_ref_handler(), an external entity reference appears
Unparsed external entity: xml_set_unparsed_entity_decl_handler(), an unparsed external entity appears
Processing instruction: xml_set_processing_instruction_handler(), a processing instruction appears
Notation declaration: xml_set_notation_decl_handler(), a notation declaration appears
Default: xml_set_default_handler(), used for all events that have no handler of their own
All callback functions must take the parser instance as their first parameter (in addition to their other parameters).
The sample script for this article uses both element handlers and a character data handler. The element callback handlers are registered with xml_set_element_handler().
This function requires three parameters:
Parser instance
Name of the callback function for processing the start element
Name of the callback function for processing the end element
The callback functions must exist by the time the XML document is parsed, and they must be defined according to the prototypes described in the PHP manual.
For example, Expat passes three parameters to the start element handler. In the sample script it is defined as follows:
function start_element($parser, $name, $attrs)
The first parameter is the parser identifier, the second is the name of the start element, and the third is an array containing all attributes of the element and their values.
Once you start parsing the XML document, Expat calls your start_element() function with these parameters whenever it encounters a start element.
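
As an illustration of the prototype, the following sketch registers a start_element() and an end_element() handler by name and records what the start handler receives. The sample document, the end_element() body, and the $seen array are assumptions made for this example. Case folding is left at its default here, so element and attribute names arrive in uppercase.

```php
<?php
// Sketch: named handlers that follow the documented prototypes.
// $seen is a hypothetical array that collects [name, attributes] pairs.
$seen = [];

function start_element($parser, $name, $attrs)
{
    global $seen;
    // $attrs maps attribute names to their values.
    $seen[] = [$name, $attrs];
}

function end_element($parser, $name)
{
    // Nothing to do at the end of an element in this sketch.
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_parse($parser, '<figure id="f1"><title>Demo</title></figure>', true);

print_r($seen);
```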
The XML case folding option
Use the xml_parser_set_option() function to disable the case folding option. This option is enabled by default, so element names passed to the handler functions are automatically converted to uppercase. However, XML is case sensitive, so case matters when collecting statistics on XML documents. For our example, the case folding option must be disabled.
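
The effect of the option can be compared directly. This is a small sketch; the helper element_names() and the sample document are hypothetical and exist only for this comparison.

```php
<?php
// Sketch: return the element names reported for $xml,
// with case folding switched on or off.
function element_names($xml, $caseFolding)
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End tags are not needed for this comparison.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<Book><title>PHP</title></Book>';
print_r(element_names($xml, true));   // names converted to uppercase
print_r(element_names($xml, false));  // names exactly as written
```

With folding enabled, Book and BOOK would be counted as the same element, which is why the statistics example turns the option off.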
Parsing document
After completing all the preparations, the script can finally parse the XML document:
xml_parse_from_file() is a custom function that opens the file named in its parameter and parses it in 4 KB chunks.
Like xml_parse(), xml_parse_from_file() returns false when an error occurs, that is, when the XML document is not well-formed.
You can use the xml_get_error_code() function to get the numeric code of the last error. Pass this code to the xml_error_string() function to get the text of the error message.
Output the current line number of the XML document to make debugging easier.
The callback functions are called during parsing.
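
The steps above might be sketched as follows. Only the name xml_parse_from_file() and the 4 KB chunk size come from the text; the function body, the helper describe_xml_error(), and the temporary demo files are assumptions. The line number is read with xml_get_current_line_number().

```php
<?php
// Sketch of a chunked file parser; the body is an assumption,
// not the original script.
function xml_parse_from_file($parser, $file)
{
    $fp = @fopen($file, 'r');
    if ($fp === false) {
        return false;
    }
    while (!feof($fp)) {
        $data = fread($fp, 4096);  // read 4 KB at a time
        // The third argument is true once the end of the file is reached.
        if ($data === false || !xml_parse($parser, $data, feof($fp))) {
            fclose($fp);
            return false;
        }
    }
    fclose($fp);
    return true;
}

// Hypothetical helper: build a readable message for the last error.
function describe_xml_error($parser)
{
    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Demo with one well-formed and one broken temporary file.
$good = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($good, "<book>\n<title>PHP</title>\n</book>");
$bad = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($bad, "<book>\n<title>PHP</book>");

$goodParser = xml_parser_create();
$okGood = xml_parse_from_file($goodParser, $good);

$badParser = xml_parser_create();
$okBad = xml_parse_from_file($badParser, $bad);
if (!$okBad) {
    echo describe_xml_error($badParser), "\n";
}

unlink($good);
unlink($bad);
```

Reading in chunks keeps memory use constant regardless of file size, which is the main advantage of an event-based parser noted earlier.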
Description Document structure
When parsing a document, the question that must be addressed with Expat is: how do you maintain a basic description of the document structure?
As mentioned above, an event-based parser does not produce any structure information itself.
The tag structure is an important feature of XML. For example, the element sequence <book><title> means something different from <figure><title>. Any author will tell you that a book title and a figure title have nothing to do with each other, even though both use the term "title". Therefore, to use an event-based parser effectively for XML, you must maintain the document's structure information yourself, using your own stacks or lists.
To build a picture of the document structure, the script must at least know the parent element of the current element. The Expat API cannot provide this: it only reports events for the current element, without any information about the surrounding context. Therefore, you need to build your own stack structure.
The sample script uses a first-in, last-out (FILO) stack. The stack is an array that holds all open start elements. In the start element handler, the current element is pushed onto the top of the stack with array_push(). Correspondingly, the end element handler removes the top element with array_pop().
For the sequence <book><title></title></book>, the stack is filled as follows:
Start element book: assign "book" to the first element of the stack ($stack[0]).
Start element title: assign "title" to the top of the stack ($stack[1]).
End element title: remove the top element from the stack ($stack[1]).
End element book: remove the top element from the stack ($stack[0]).
The PHP 3.0 version of the example tracks the nesting of elements manually through a $depth variable, which makes the script look complicated. The PHP 4.0 version uses the array_pop() and array_push() functions, which makes the script more concise.
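
A minimal sketch of such a stack is shown below. The $paths array and the sample document are assumptions added to make the stack contents visible; they show that the two title elements are distinguishable once their parents are known.

```php
<?php
// Sketch: maintain a stack of open elements with array_push()/array_pop().
$stack = [];  // currently open elements, innermost last
$paths = [];  // hypothetical log: the full path of every start element

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$stack, &$paths) {
        array_push($stack, $name);        // element opens: push it
        $paths[] = implode('/', $stack);  // parent chain is now known
    },
    function ($parser, $name) use (&$stack) {
        array_pop($stack);                // element closes: pop it
    }
);

$xml = '<doc><book><title>A</title></book><figure><title>B</title></figure></doc>';
xml_parse($parser, $xml, true);

print_r($paths);
```

After parsing a well-formed document the stack is empty again, because every start element is matched by an end element.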
Collect data
To collect information about each element, the script needs to remember the events for each element. A global array variable $elements holds all the distinct elements in the document. Each array entry is an instance of the element class, which has four properties (class variables):
$count: the number of times the element occurs in the document
$chars: the number of bytes of character data in the element
$parents: the parent elements
$childs: the child elements
As you can see, it is easy to store class instances in arrays.
Note: one feature of PHP is that you can traverse an entire class structure with a while (list() = each()) loop, just as you traverse an array. All class variables (and, in PHP 3.0, method names as well) are output as strings. (each() was removed in PHP 8; foreach serves the same purpose there.)
When an element is found, we need to increment its counter to track how many times it appears in the document: add one to the count property of the corresponding $elements entry.
We also need to let the parent element know that the current element is its child. Therefore, the name of the current element is added to the $childs array of the parent element. Finally, the current element should remember who its parent is. Therefore, the parent element is added to the $parents array of the current element.
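
The bookkeeping described above could look roughly like this. It is a sketch under assumptions: the class name ElementStats, the handler names, and the choice to store parents and children as name => occurrence maps are mine, not taken from the original script.

```php
<?php
// Sketch: collect per-element statistics during parsing.
class ElementStats
{
    public $count = 0;     // occurrences of the element
    public $chars = 0;     // bytes of character data directly inside it
    public $parents = [];  // parent name => times seen
    public $childs = [];   // child name => times seen
}

$elements = [];  // element name => ElementStats
$stack = [];     // currently open elements

function stats_start($parser, $name, $attrs)
{
    global $elements, $stack;
    if (!isset($elements[$name])) {
        $elements[$name] = new ElementStats();
    }
    $elements[$name]->count++;
    if ($stack) {
        $parent = end($stack);
        // Tell the parent about its child, and the child about its parent.
        $elements[$parent]->childs[$name] = ($elements[$parent]->childs[$name] ?? 0) + 1;
        $elements[$name]->parents[$parent] = ($elements[$name]->parents[$parent] ?? 0) + 1;
    }
    array_push($stack, $name);
}

function stats_end($parser, $name)
{
    global $stack;
    array_pop($stack);
}

function stats_chars($parser, $data)
{
    global $elements, $stack;
    if ($stack) {
        $elements[end($stack)]->chars += strlen($data);
    }
}

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'stats_start', 'stats_end');
xml_set_character_data_handler($parser, 'stats_chars');
xml_parse($parser, '<book><title>PHP</title><chapter><title>XML</title></chapter></book>', true);

print_r($elements);
```

Because the character handler adds to a running total, it stays correct even when the parser reports one text node in several pieces.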
Show statistics
The remaining code loops over the $elements array and its subarrays to display the statistics. This is the simplest kind of nested loop. Although it outputs the correct results, the code is neither concise nor particularly clever; it is just the sort of loop you might use every day to get your work done.
The sample script is designed to be run from the command line with the CGI version of PHP. Therefore, the statistics are output as plain text. If you want to use the script on the Internet, you need to modify the output functions to generate HTML.
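
A nested loop of this kind might be sketched as follows. The function format_stats(), the plain-array layout of its input, and the sample values are assumptions for illustration; the original script works on class instances instead.

```php
<?php
// Sketch: format per-element statistics as plain text with nested loops.
// $elements is assumed to map an element name to an array with the keys
// count, chars, parents, and childs.
function format_stats(array $elements)
{
    $out = '';
    foreach ($elements as $name => $info) {
        $out .= "Element: $name\n";
        $out .= "  count: {$info['count']}\n";
        $out .= "  chars: {$info['chars']}\n";
        foreach (['parents', 'childs'] as $relation) {
            foreach ($info[$relation] as $other => $times) {
                $out .= "  $relation: $other ($times)\n";
            }
        }
    }
    return $out;
}

// Hand-written sample input, not measured data.
$sample = [
    'title' => [
        'count'   => 2,
        'chars'   => 6,
        'parents' => ['book' => 1, 'chapter' => 1],
        'childs'  => [],
    ],
];

echo format_stats($sample);
```

For HTML output, the same loops would emit markup instead of indented lines, and element names should be passed through htmlspecialchars() before they are written to the page.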
Summary
Expat is the XML parser of PHP. As an event-based parser, it does not generate a structural description of the document. However, by providing lower-level access, it allows better use of resources and faster access.
As a non-validating parser, Expat ignores any DTD attached to the XML document. However, if the document is not well-formed, it stops with an error message.
Provide event handler functions to process the document.
Build your own structures, such as stacks and trees, to take advantage of the structural information in XML markup.
New XML programs appear every day, and PHP's support for XML keeps growing (for example, the DOM-based XML parser LibXML has been added).
With PHP and Expat, you can prepare for the upcoming valid, open, and platform-independent standards.
