PHP XML parsing functions (this article introduces the XML parsing functions in PHP)

Source: Internet
Author: User
Tags mysql functions xml example
First, I must admit that I like computer standards. If everyone complied with industry standards, the Internet would be a better medium. Standardized data exchange formats make open, platform-independent computing feasible. This is why I am a fan of XML.

Fortunately, my favorite scripting language not only supports XML, but also keeps increasing support for it. PHP allows me to quickly publish XML documents to the Internet, collect XML document statistics, and convert XML documents to other formats. For example, I often use PHP's XML processing capability to manage articles and books written in XML.

In this article, I will discuss how to use PHP's built-in Expat parser to process XML documents. The example shows how to work with Expat. It also shows you how to:

Create your own handler functions
Convert an XML file into your own PHP data structure

Introduction to Expat

An XML parser, also known as an XML processor, gives a program access to the structure and content of an XML document. Expat is the XML parser used by the PHP scripting language. It is also used in other projects, such as Mozilla, Apache, and Perl.

What is an event-based parser?

There are two basic types of XML parser:

Tree-based parser: converts an XML document into a tree structure. This type of parser analyzes the entire document and provides an API to access each element of the generated tree. Its common standard is DOM (Document Object Model).
Event-based parser: treats an XML document as a series of events. When a particular event occurs, the parser calls a function provided by the developer to handle it.

An event-based parser has a data-centric view of the XML document: it concentrates on the data part of the document rather than on its structure. These parsers process a document from start to end and report events, such as the start of an element, the end of an element, and the start of character data, to the application through callback functions. The following is a "Hello-World" XML document:

<greeting>Hello World</greeting>

The event-based parser reports three events:

Start element: greeting
Start of character data (CDATA); value: Hello World
End element: greeting

Unlike a tree-based parser, an event-based parser does not build a structure that describes the document. In the character data event, the parser gives you no information about the parent element greeting.

However, it provides lower-level access, which allows better resource utilization and faster processing. The entire document never has to be held in memory; in fact, the document can be larger than the available memory.
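
As a rough illustration of the event stream described above, the following sketch feeds the Hello-World document to PHP's Expat functions and records each reported event. It assumes a current PHP version with the xml extension enabled; the $events array and the closures are illustrative names, not part of the original script.

```php
<?php
// Record every event the parser reports, in order.
$events = [];

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "Start element: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "End element: $name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "Character data: $data";
    }
);

// The third argument marks this as the final (and only) chunk.
xml_parse($parser, '<greeting>Hello World</greeting>', true);

echo implode("\n", $events), "\n";
```

Note that a parser is free to deliver longer runs of text in several character data events, so a handler should not assume it receives the whole text at once.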


Expat is such an event-based parser. Of course, when you use Expat you can still build a complete native tree structure in PHP if you need one.
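
One possible way to build such a tree is sketched below. It assumes a current PHP version; xml_to_tree() and the node layout (name, attrs, text, children) are hypothetical choices made for this illustration, not something Expat prescribes.

```php
<?php
// Hypothetical helper: build a tree of stdClass nodes from an XML string.
// Returns the root node, or null if the document is not well-formed.
function xml_to_tree(string $xml): ?stdClass
{
    // Artificial holder so that the real root element also has a parent.
    $holder = new stdClass();
    $holder->children = [];
    $stack = [$holder];

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            $node = new stdClass();
            $node->name = $name;
            $node->attrs = $attrs;
            $node->text = '';
            $node->children = [];
            // Attach the node to the element that is currently open.
            $parent = end($stack);
            $parent->children[] = $node;
            $stack[] = $node;
        },
        function ($p, $name) use (&$stack) {
            array_pop($stack);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack) {
            $top = end($stack);
            if (isset($top->text)) {
                $top->text .= $data;
            }
        }
    );

    $ok = xml_parse($parser, $xml, true);
    return $ok ? $holder->children[0] : null;
}

$tree = xml_to_tree('<book><title>PHP</title><author>A</author></book>');
echo $tree->name, ' has ', count($tree->children), " child elements\n";
```

The price of the tree is memory: unlike the pure event approach, the whole document is now held in PHP data structures.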


The preceding Hello-World example is well-formed XML. However, it is not valid, because no DTD (Document Type Definition) is associated with it or embedded in it.

For Expat, this makes no difference: Expat is a non-validating parser, so it ignores any DTD associated with the document. Note, however, that the document must still be well-formed; otherwise Expat (like any other XML-compliant parser) stops with an error message.

As a non-validating parser, Expat is fast and lightweight, which makes it well suited to Internet applications.
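
The difference between "well-formed" and "valid" can be sketched as follows. This is a minimal example assuming a current PHP version; is_well_formed() is a hypothetical helper, and the three sample documents are made up for the illustration.

```php
<?php
// Hypothetical helper: true if the parser accepts the document.
function is_well_formed(string $xml): bool
{
    $parser = xml_parser_create();
    // xml_parse() returns 1 on success and 0 on failure.
    return xml_parse($parser, $xml, true) === 1;
}

// Well-formed, no DTD at all.
$plain = '<greeting>Hello World</greeting>';

// Well-formed, but it violates its own DTD (element b is not declared).
// A non-validating parser does not check this.
$invalid = '<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>'
         . '<greeting><b>Hello</b></greeting>';

// Not well-formed: the end tag is missing.
$broken = '<greeting>Hello World';

var_dump(is_well_formed($plain), is_well_formed($invalid), is_well_formed($broken));
```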


Compile Expat

Expat can be compiled into PHP 3.0.6 (or later). As of Apache 1.3.9, Expat is part of Apache. On Unix systems, you can compile it into PHP by configuring PHP with the "--with-xml" option.

If you compile PHP as an Apache module, the Expat that ships with Apache is used by default. On Windows, you must load the XML dynamic link library.
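
Whether a given PHP build actually includes the XML functions can be checked at run time. The following is a small sketch; the variable name $has_xml and the messages are illustrative.

```php
<?php
// True when the xml extension and its parser functions are available.
$has_xml = extension_loaded('xml') && function_exists('xml_parser_create');

echo $has_xml ? "XML parser functions available\n"
              : "XML parser functions missing\n";
```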

XML Example: XMLstats

One way to understand the Expat functions is through an example. The example we will discuss uses Expat to collect statistics about an XML document.

For each element in the document, the following information is output:

Number of times the element is used in the document
Number of characters in the element
The element's parent elements
The element's child elements

Note: for demonstration purposes, we use PHP to build a structure that stores the parent and child elements of each element.

Preparation

The function that creates an XML parser instance is xml_parser_create(). This instance is passed to all subsequent function calls. The idea is very similar to the connection handle of the MySQL functions in PHP. Before parsing a document, an event-based parser usually requires you to register callback functions that are called when specific events occur. Expat is no exception; it defines the following seven possible events:

Target                     XML parsing function                       Description
Element                    xml_set_element_handler()                  Start and end of an element
Character data             xml_set_character_data_handler()           Start of character data
External entity            xml_set_external_entity_ref_handler()      An external entity occurs
Unparsed external entity   xml_set_unparsed_entity_decl_handler()     An unparsed external entity occurs
Processing instruction     xml_set_processing_instruction_handler()   A processing instruction occurs
Notation declaration       xml_set_notation_decl_handler()            A notation declaration occurs
Default                    xml_set_default_handler()                  All events that have no handler assigned

All callback functions must take the parser instance as their first parameter (in addition to their other parameters).
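
To make the handler signatures more concrete, here is a minimal sketch that registers two handlers with different parameter lists. It assumes a current PHP version; the $calls array, the sample document, and the processing instruction target "render" are invented for this example.

```php
<?php
$calls = [];

$parser = xml_parser_create();

// Processing instruction handler: parser, target, data.
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$calls) {
        $calls[] = ['pi', $target, $data];
    }
);

// Character data handler: parser, data.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$calls) {
        $calls[] = ['text', $data];
    }
);

xml_parse($parser, '<doc><?render mode="fast"?>text</doc>', true);

print_r($calls);
```

In both cases the parser instance comes first, followed by the event-specific parameters.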


See the sample script at the end of this article. Note that it uses both element handlers and a character data handler. The element callback handlers are registered with xml_set_element_handler().

This function requires three parameters:

Parser instance
Name of the callback function that handles start elements
Name of the callback function that handles end elements

The callback functions must exist when the XML document is parsed. They must be defined consistently with the prototypes described in the PHP Manual.


For example, Expat passes three parameters to the start element handler. In the script example, it is defined as follows:

function start_element($parser, $name, $attrs)

The first parameter is the parser identifier, the second is the name of the start element, and the third is an array containing all attributes of the element and their values.

Once you start parsing the XML document, Expat calls your start_element() function with these parameters whenever it encounters a start element.
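
What the $attrs array looks like can be seen in the following sketch. It assumes a current PHP version and uses closures instead of function names for brevity; the $captured array and the sample document are illustrative.

```php
<?php
$captured = [];

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    // Start element handler: same three parameters as start_element().
    function ($parser, $name, $attrs) use (&$captured) {
        $captured[] = [$name, $attrs];
    },
    // End element handler: nothing to do in this example.
    function ($parser, $name) {
    }
);

xml_parse($parser, '<figure id="f1" width="200"><title>Logo</title></figure>', true);

print_r($captured);
```

Attribute names become the array keys and attribute values the array values; an element without attributes yields an empty array.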


The XML case folding option

Use the xml_parser_set_option() function to disable the case folding option. This option is enabled by default, which means that element names passed to the handler functions are automatically converted to uppercase. However, XML is case sensitive (so case matters when collecting statistics about XML documents). For our example, the case folding option must be disabled.
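
The effect of the option can be sketched like this. The example assumes a current PHP version; element_names() is a hypothetical helper written for the comparison.

```php
<?php
// Hypothetical helper: return the element names reported for $xml,
// with case folding switched on or off.
function element_names(string $xml, bool $case_folding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $case_folding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($p, $name) {
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<Book><title>PHP</title></Book>';
echo implode(', ', element_names($xml, true)), "\n";   // folded to uppercase
echo implode(', ', element_names($xml, false)), "\n";  // as written in the document
```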


Parsing the document

After completing all the preparations, the script can finally parse the XML document:

xml_parse_from_file() is a custom function that opens the file given as a parameter and parses it in 4 KB chunks.
xml_parse(), like xml_parse_from_file(), returns false if an error occurs, that is, if the XML document is not well-formed.
You can use the xml_get_error_code() function to get the numeric code of the last error. Pass this code to the xml_error_string() function to get the error message as text.
The current line number in the XML document is output to make debugging easier.
The callback functions are called during parsing.
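
The steps above can be sketched on an in-memory string instead of a file. This is an illustration under assumptions: parse_in_chunks() is a hypothetical helper, and the chunk size of 8 bytes is chosen only to force several xml_parse() calls.

```php
<?php
// Hypothetical helper: parse $xml piece by piece.
// Returns null on success or an error message on failure.
function parse_in_chunks(string $xml, int $chunk_size = 4096): ?string
{
    $parser = xml_parser_create();
    $chunks = str_split($xml, $chunk_size);
    $last = count($chunks) - 1;

    foreach ($chunks as $i => $chunk) {
        // The third argument tells the parser whether this is the final chunk.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            return sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            );
        }
    }
    return null;
}

var_dump(parse_in_chunks("<a>\n<b>text</b>\n</a>", 8));
var_dump(parse_in_chunks("<a>\n<b>text</c>\n</a>", 8));
```

Because the parser keeps its state between calls, a chunk boundary may fall anywhere, even in the middle of a tag.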

Describing the document structure

When parsing a document with Expat, one question needs special attention: how do you maintain a basic description of the document structure?

As mentioned above, an event-based parser does not produce any structure information itself.

However, the tag structure is an important feature of XML. For example, the element sequence <book><title> means something different from <figure><title>. Any author will tell you that a book title and a figure title have nothing to do with each other, although both use the term "title". Therefore, to process XML effectively with an event-based parser, you must use your own stacks or lists to maintain the structure information of the document.

To build a picture of the document structure, the script must at least know the parent element of the current element. The Expat API cannot provide this: it reports only the events of the current element, without any context information. You therefore need to build your own stack structure.

The script example uses a first-in, last-out (FILO) stack. The stack, implemented as an array, holds all start elements that are currently open. In the start element handler, the current element is pushed onto the top of the stack with the array_push() function. Correspondingly, the end element handler removes the top element with array_pop().

For the sequence <book><title></title></book>, the stack is filled as follows:

Start element book: assign "book" to the first element of the stack ($stack[0]).
Start element title: assign "title" to the top of the stack ($stack[1]).
End element title: remove the top element from the stack ($stack[1]).
End element book: remove the top element from the stack ($stack[0]).

The PHP 3.0 version of the example controls the nesting of elements manually through a $depth variable, which makes the script look more complicated. The PHP 4.0 version uses the array_pop() and array_push() functions, which makes the script more concise.
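
The stack technique can be tried out in isolation with the following sketch. It assumes a current PHP version; the $paths array and the sample document (with an invented doc root element) exist only for this illustration.

```php
<?php
$stack = [];   // currently open elements, innermost last
$paths = [];   // full path of every element, in document order

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$stack, &$paths) {
        // Start element: push it, then record where we are.
        array_push($stack, $name);
        $paths[] = implode('/', $stack);
    },
    function ($p, $name) use (&$stack) {
        // End element: the innermost open element is finished.
        array_pop($stack);
    }
);

xml_parse($parser, '<doc><book><title/></book><figure><title/></figure></doc>', true);

echo implode("\n", $paths), "\n";
```

With the stack in place, the two title elements can be told apart by their paths, which the raw events alone do not allow.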


Collect data

To collect information about each element, the script needs to remember the events of each element. A global array variable, $elements, stores all the different elements in the document. Each array entry is an instance of the element class, which has four properties (class variables):

$count - number of times the element was found in the document
$chars - number of bytes of character data in the element
$parents - parent elements
$childs - child elements

As you can see, it is easy to store class instances in an array.


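Note: One feature of PHP is that you can traverse the properties of an object with a while (list() = each()) loop, just as you traverse an array. All class variables (and, in PHP 3.0, the method names as well) are returned as strings. In current PHP versions, where each() is no longer available, a foreach loop traverses an object's properties in the same way.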
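
A minimal sketch of this traversal with foreach, assuming a current PHP version; the class name PropertyDemo is made up and only mirrors the four properties listed above.

```php
<?php
// Illustrative class with the same four properties as the element class.
class PropertyDemo
{
    public $count = 0;
    public $chars = 0;
    public $parents = [];
    public $childs = [];
}

$object = new PropertyDemo();
$object->count = 3;

// Iterating an object visits its accessible properties in declaration order.
$names = [];
foreach ($object as $property => $value) {
    $names[] = $property;
}

echo implode(', ', $names), "\n";
```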


When an element is found, we need to increment its counter to track how many times it appears in the document: add one to the count property of the corresponding $elements entry.

We also need to let the parent element know that the current element is its child. Therefore, the name of the current element is added to the parent element's $childs array. Finally, the current element should remember its parent, so the parent element is added to the current element's $parents array.
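
The bookkeeping described above can be sketched without a parser. The example assumes a current PHP version; ElementStats and record_element() are hypothetical names, and the calls at the end simulate the start element events of a book with two title children.

```php
<?php
// Illustrative counterpart of the article's element class.
class ElementStats
{
    public $count = 0;
    public $chars = 0;
    public $parents = [];
    public $childs = [];
}

// Hypothetical helper: update the statistics for one start element.
// $stack holds the names of the elements that are currently open.
function record_element(array &$elements, array $stack, string $name): void
{
    if (!isset($elements[$name])) {
        $elements[$name] = new ElementStats();
    }
    $elements[$name]->count++;

    if ($stack) {
        $parent = end($stack);
        // The current element remembers its parent ...
        $elements[$name]->parents[$parent] = ($elements[$name]->parents[$parent] ?? 0) + 1;
        // ... and the parent remembers its child.
        $elements[$parent]->childs[$name] = ($elements[$parent]->childs[$name] ?? 0) + 1;
    }
}

$elements = [];
record_element($elements, [], 'book');
record_element($elements, ['book'], 'title');
record_element($elements, ['book'], 'title');

print_r($elements);
```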


Show statistics

The remaining code loops over the $elements array and its subarrays and displays the statistics. It is the simplest kind of nested loop: it outputs the correct results, but the code is neither particularly concise nor clever. It is just the kind of loop you might use every day to get your work done.

The script example is designed to be called from the command line with the CGI version of PHP. Therefore, the statistics are output as plain text. If you want to use the script on the Internet, you need to modify the output functions to generate HTML.
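
One possible HTML variant of the line output function is sketched below. It assumes a current PHP version; html_line() and the table markup are illustrative choices, not part of the original script.

```php
<?php
// Hypothetical HTML counterpart of print_line(): returns one table row.
function html_line(string $title, $value): string
{
    // Escape both cells so that names such as "<book>" display correctly.
    return sprintf(
        "<tr><th>%s</th><td>%s</td></tr>\n",
        htmlspecialchars($title),
        htmlspecialchars((string) $value)
    );
}

echo "<table>\n";
echo html_line('Element count', 3);
echo html_line('Character count', 42);
echo "</table>\n";
```

Returning the row as a string instead of printing it keeps the function easy to reuse inside a larger page template.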

Summary

Expat is the XML parser of PHP. As an event-based parser, it does not build a description of the document structure. However, because it provides lower-level access, it allows better use of resources and faster processing.

As a non-validating parser, Expat ignores any DTD attached to the XML document. However, if the document is not well-formed, it stops with an error message.

To work with Expat, you need to:

Provide event handler functions to process the document.
Create your own structures, such as stacks and trees, to benefit from the structure information in the XML markup.

New XML applications appear every day, and PHP's support for XML keeps growing (for example, the DOM-based XML parser LibXML has been added).

With PHP and Expat, you can prepare for the upcoming standards that are valid, open, and platform-independent.

Example

<?php
/*****************************************************************************
 * Name: XML parsing example: XML document statistics
 * Description:
 *   This example uses PHP's Expat parser to collect statistics about an XML
 *   document (for example, the number of times each element appears, and
 *   its parent and child elements).
 *   The XML file is passed as a parameter: ./xmlstats_PHP4.php3 test.xml
 * Requires: Expat; PHP 4.0 compiled in CGI mode
 *****************************************************************************/

// The first parameter is the XML file.
$file = $argv[1];

// Variable initialization
$elements = $stack = array();
$total_elements = $total_chars = 0;

// Basic element class
class element
{
    var $count = 0;
    var $chars = 0;
    var $parents = array();
    var $childs = array();
}

// Function used to parse an XML file
function xml_parse_from_file($parser, $file)
{
    if (!file_exists($file))
    {
        die("Can't find file \"$file\".");
    }

    if (!($fp = @fopen($file, "r")))
    {
        die("Can't open file \"$file\".");
    }

    // Read and parse the file in 4 KB chunks
    while ($data = fread($fp, 4096))
    {
        if (!xml_parse($parser, $data, feof($fp)))
        {
            return (false);
        }
    }

    fclose($fp);

    return (true);
}

// Output function (box format)
function print_box($title, $value)
{
    printf("\n+%'-60s+\n", "");
    printf("|%20s", "$title:");
    printf("%14s", $value);
    printf("%26s|\n", "");
    printf("+%'-60s+\n", "");
}

// Output function (line format)
function print_line($title, $value)
{
    printf("%20s", "$title:");
    printf("%15s\n", $value);
}

// Sort function: descending by element count
function my_sort($a, $b)
{
    return (is_object($a) && is_object($b) ? $b->count - $a->count : 0);
}

function start_element($parser, $name, $attrs)
{
    global $elements, $stack;

    // Is the element already in the global $elements array?
    if (!isset($elements[$name]))
    {
        // No: add a new instance of the element class
        $element = new element;
        $elements[$name] = $element;
    }

    // Increment the counter of this element
    $elements[$name]->count++;

    // Is there a parent element?
    if (isset($stack[count($stack)-1]))
    {
        // Yes: assign the parent element to $last_element
        $last_element = $stack[count($stack)-1];

        // If this parent is not yet in the current element's parents array, initialize its counter to 0
        if (!isset($elements[$name]->parents[$last_element]))
        {
            $elements[$name]->parents[$last_element] = 0;
        }

        // Increment the parent counter of this element
        $elements[$name]->parents[$last_element]++;

        // If this element is not yet in the parent's childs array, initialize its counter to 0
        if (!isset($elements[$last_element]->childs[$name]))
        {
            $elements[$last_element]->childs[$name] = 0;
        }

        // Increment the child counter of the parent element
        $elements[$last_element]->childs[$name]++;
    }

    // Push the current element onto the stack
    array_push($stack, $name);
}

function stop_element($parser, $name)
{
    global $stack;

    // Remove the top element from the stack
    array_pop($stack);
}

function char_data($parser, $data)
{
    global $elements, $stack;

    // Increase the character count of the current element (the top of the stack)
    $elements[$stack[count($stack)-1]]->chars += strlen(trim($data));
}

// Create the parser instance
$parser = xml_parser_create();

// Set the handler functions
xml_set_element_handler($parser, "start_element", "stop_element");
xml_set_character_data_handler($parser, "char_data");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Parse the file
$ret = xml_parse_from_file($parser, $file);
if (!$ret)
{
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

// Free the parser
xml_parser_free($parser);

// Remove the helper elements
unset($elements["current_element"]);
unset($elements["last_element"]);

// Sort by element count
uasort($elements, "my_sort");

// Loop over $elements and output the collected information
foreach ($elements as $name => $element)
{
    print_box("Element name", $name);

    print_line("Element count", $element->count);
    print_line("Character count", $element->chars);

    printf("\n%20s\n", "* Parent elements");

    // Loop over the parents of the element and output the result
    foreach ($element->parents as $key => $value)
    {
        print_line($key, $value);
    }
    if (count($element->parents) == 0)
    {
        printf("%35s\n", "[root element]");
    }

    // Loop over the children of the element and output the result
    printf("\n%20s\n", "* Child elements");
    foreach ($element->childs as $key => $value)
    {
        print_line($key, $value);
    }
    if (count($element->childs) == 0)
    {
        printf("%35s\n", "[no childs]");
    }

    $total_elements += $element->count;
    $total_chars += $element->chars;
}

// Final result
print_box("Total elements", $total_elements);
print_box("Total characters", $total_chars);
?>
