Summary
XML stands for Extensible Markup Language (eXtensible Markup Language). XML is a set of rules for defining semantic tags that divide a document into parts and identify those parts. It is also a meta-markup language: it defines the syntax used to create other semantic, structured markup languages for specific fields. XML is a widely used technology, and PHP is able to parse XML documents. This article discusses how to work with XML in PHP.
XML Overview
Before discussing XML (eXtensible Markup Language), let's take a look at some HTML code:
<html>
<title>XML</title>
<body>
<p><center><font color="red">TEXT</font></center></p>
<a href="www.domain.com"></a>
</body>
</html>
The code above conforms to the XML rules in structure. An XML document can be understood as a tree structure that contains data, and it must follow these rules:
1. Use the same case whenever you refer to the same element. For example, <center> </Center> does not comply with the rules.
2. Every attribute value (such as href="...") must be enclosed in quotation marks. For example, <a href=www.yahoo.com> is incorrect.
3. Every element must consist of an opening tag and a closing tag, such as <body> </body>, or be written as an empty element that ends in "/>". Leaving out the "/" before the final ">" of an empty element is an error.
4. All elements must be properly nested inside each other, like the loops of a program, and all elements must be nested inside a single root element. For example, all the content of the code above is nested in its root element.
5. Element names (such as body, a, and p above, or img) must start with a letter or an underscore.
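The rules above can be checked mechanically by handing a string to PHP's XML parser. The following is a minimal sketch, not part of the original article; the helper name is_well_formed() is hypothetical, and it relies only on the standard xml_parser_create() and xml_parse() functions.

```php
<?php
// Hypothetical helper: returns true when $xml is well-formed
// according to PHP's XML parser, false otherwise.
function is_well_formed(string $xml): bool
{
    $parser = xml_parser_create();
    // Parse the whole string in one call; the third argument marks
    // it as the final chunk. xml_parse() returns 1 on success, 0 on failure.
    return xml_parse($parser, $xml, true) === 1;
}

var_dump(is_well_formed('<p><center>TEXT</center></p>')); // bool(true)
var_dump(is_well_formed('<p><center>TEXT</Center></p>')); // bool(false): rule 1, case mismatch
var_dump(is_well_formed('<a href=www.yahoo.com></a>'));   // bool(false): rule 2, unquoted value
var_dump(is_well_formed('<body><p></body></p>'));         // bool(false): rule 4, bad nesting
```

The parser only reports that something is wrong; it does not say which rule was broken. Error codes and messages are covered later in the article.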
How to use Expat, the PHP XML parser?
Expat is the XML parser (also called an XML processor) available to the PHP scripting language. It gives programs access to the structure and content of XML documents. It is an event-based parser. XML parsers come in two basic types:
Tree-based parser: converts an XML document into a tree structure. This type of parser analyzes the entire document and provides an API for accessing each element of the resulting tree. The common standard for it is DOM (Document Object Model).
Event-based parser: treats an XML document as a series of events. When a particular event occurs, the parser calls a function supplied by the developer to handle it. An event-based parser has a data-centric view of the XML document, that is, it concentrates on the data in the document rather than on its structure. These parsers process the document from start to end and report events, such as the start of an element, the end of an element, and the start of character data, to the application through callback functions.
The following is a "Hello World" example of an XML document:
<greeting>
Hello World
</greeting>
The event-based parser reports three events:
Start Element: greeting
Start of CDATA item; value: Hello World
End Element: greeting
An event-based parser does not build a structure that describes the document, although with Expat you can build a complete native tree structure in PHP yourself if you need one. When it reports the CDATA item, the parser does not tell you that the parent element is greeting. In exchange, it provides lower-level access, which allows better use of resources and faster processing. The entire document never has to be held in memory; in fact, the document can be larger than the available memory.
Although the Hello World example above is well-formed XML, it is not valid, because no DTD (Document Type Definition) is associated with it or embedded in it. However, Expat is a non-validating parser, so it ignores any DTD associated with the document. Note that the document must still be well-formed; otherwise Expat (like any other XML-compliant parser) stops with an error message.
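To make the three events concrete, here is a small sketch that registers handlers and records each event as it is reported. It is an illustration rather than the article's own script; the $events array is an assumption, and case folding is switched off so that the element name is reported as written.

```php
<?php
$events = [];
$parser = xml_parser_create();
// Report element names as written; by default PHP upper-cases them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "Start element: $name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "End element: $name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        // Character data may arrive in several pieces, so skip
        // pieces that contain only whitespace.
        if (trim($data) !== '') {
            $events[] = 'Character data: ' . trim($data);
        }
    }
);

xml_parse($parser, "<greeting>\nHello World\n</greeting>", true);
print_r($events);
```

Note that the character data handler receives only the text. Nothing in its arguments says that the text belongs to greeting.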
Compiling Expat
Expat can be compiled into PHP 3.0.6 or later. Since Apache 1.3.22, Expat has been part of Apache. On Unix systems, configure PHP with the --with-xml option to compile it into PHP.
If PHP is compiled as an Apache module, the Expat that ships with Apache is used by default. On Windows, you must load the XML dynamic link library.
XML example: XMLstats
The example we will discuss uses Expat to collect statistics about an XML document.
For each element in the document, the following information is output:
* Number of times this element is used in the document
* Amount of character data in the element
* Parent elements of the element
* Child elements of the element
Note: For demonstration purposes, we use PHP to build a structure that stores the parent and child elements of each element.
Which function creates an XML parser instance?
The function that creates an XML parser instance is xml_parser_create(). The instance it returns is passed to all subsequent XML functions. The idea is very similar to the connection handle used by the MySQL functions in PHP. Before parsing a document, an event-based parser usually requires you to register callback functions, which are called when specific events occur. Expat is no exception. It defines the following seven possible events:
Event                     Registration function                       Description
Element                   xml_set_element_handler()                   Start and end of an element
Character data            xml_set_character_data_handler()            Start of character data
External entity           xml_set_external_entity_ref_handler()       An external entity reference appears
Unparsed external entity  xml_set_unparsed_entity_decl_handler()      An unparsed external entity appears
Processing instruction    xml_set_processing_instruction_handler()    A processing instruction appears
Notation declaration      xml_set_notation_decl_handler()             A notation declaration appears
Default                   xml_set_default_handler()                   Any event that has no handler assigned
Every callback function must take the parser instance as its first parameter, followed by its other parameters.
Note that the sample script at the end of this article uses both element handlers and a character data handler. The element handlers are registered through xml_set_element_handler().
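As an illustration of the "parser first, then other parameters" convention, the sketch below registers a processing instruction handler, which receives the parser, the instruction's target, and its data. The sample document, the target name render, and the $seen array are made up for this example.

```php
<?php
$seen = [];
$parser = xml_parser_create();

// A processing instruction handler takes the parser instance first,
// then the target of the instruction and its data.
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$seen) {
        $seen[] = [$target, $data];
    }
);

// Hypothetical document containing one processing instruction.
xml_parse($parser, '<doc><?render mode="fast"?></doc>', true);
print_r($seen);
```

The other handlers in the list follow the same pattern; only the parameters after the parser differ.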
This function requires three parameters:
Parser instance
Name of the callback function for processing the Start Element
Name of the callback function for processing the End Element
The callback functions must exist by the time the XML document is parsed, and they must be defined to match the prototypes described in the PHP manual.
For example, Expat passes three parameters to the start element handler. In the sample script, it is defined as follows:
function start_element($parser, $name, $attrs)
$parser is the parser identifier, $name is the name of the start element, and $attrs is an array containing all attributes of the element and their values.
While the XML document is being parsed, Expat calls the start_element() function each time it encounters a start element and passes these parameters to it.
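A minimal sketch of registering the element handlers by name might look like this. The end_element() function and the element_log global are assumptions made for the illustration; only start_element() and its signature come from the text above.

```php
<?php
$GLOBALS['element_log'] = [];

// Start element handler: parser, element name, attribute array.
function start_element($parser, $name, $attrs)
{
    $GLOBALS['element_log'][] = ['start', $name, $attrs];
}

// End element handler: parser and element name only.
function end_element($parser, $name)
{
    $GLOBALS['element_log'][] = ['end', $name];
}

$parser = xml_parser_create();
// Three parameters: parser instance, start handler name, end handler name.
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_parse($parser, '<a href="www.domain.com"></a>', true);

// Names are upper-cased here because case folding is on by default.
print_r($GLOBALS['element_log']);
```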
The case folding option of the XML parser
Use the xml_parser_set_option() function to disable the case folding option. This option is enabled by default, which means that element names passed to the handler functions are automatically converted to uppercase. However, XML is case sensitive, so case matters when collecting statistics about XML documents. For our example, the case folding option must be disabled.
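The effect of the option can be seen by parsing the same document twice. This is a sketch with a hypothetical helper, element_names(), that returns the names exactly as the start element handler receives them.

```php
<?php
// Hypothetical helper: parses $xml and returns the element names
// in the form the start element handler receives them.
function element_names(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // Nothing to do at the end of an element.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

print_r(element_names('<book><Title/></book>', true));  // BOOK, TITLE
print_r(element_names('<book><Title/></book>', false)); // book, Title
```

With folding enabled, Title and title would be counted as the same element, which is why the statistics script turns it off.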
How to parse the document?
After completing all the preparations, the script can finally parse the XML document:
xml_parse_from_file() is a custom function that opens the file given as its parameter and parses it in 4 KB chunks.
Like xml_parse(), xml_parse_from_file() returns false when an error occurs, that is, when the XML document is not well-formed.
We can use the xml_get_error_code() function to get the numeric code of the last error, and pass that code to the xml_error_string() function to get the corresponding error text. Printing the current line number of the XML document, available from xml_get_current_line_number(), makes debugging easier.
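The article's own xml_parse_from_file() is not reproduced in this section, so the following is only a sketch of how such a function could be written under the description above. The helper xml_error_message() and the temporary test file are assumptions added for the illustration.

```php
<?php
// Sketch of a custom xml_parse_from_file(): feeds $file to $parser
// in 4 KB chunks and returns false on any error.
function xml_parse_from_file($parser, string $file): bool
{
    $fp = @fopen($file, 'r');
    if ($fp === false) {
        return false;
    }
    while (!feof($fp)) {
        $data = fread($fp, 4096);
        // The third argument tells the parser whether this is the last chunk.
        if ($data === false || !xml_parse($parser, $data, feof($fp))) {
            fclose($fp);
            return false;
        }
    }
    fclose($fp);
    return true;
}

// Hypothetical helper: describes the parser's last error.
function xml_error_message($parser): string
{
    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Usage with a deliberately broken temporary document.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file, "<book>\n<title></book>");
$parser = xml_parser_create();
if (!xml_parse_from_file($parser, $file)) {
    echo xml_error_message($parser), "\n";
}
unlink($file);
```

Reading in fixed-size chunks keeps memory use constant regardless of document size, which is the main advantage of an event-based parser.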
One question deserves emphasis when parsing a document with Expat: how do you keep track of the basic structure of the document?
As mentioned above, an event-based parser does not produce any structural information by itself. However, the tag structure is an important feature of XML. For example, the element sequence <book> <title> means something different from <figure> <title>. A book title and a figure title have nothing to do with each other, even though both use the term "title". Therefore, to process XML effectively with an event-based parser, you must use your own stacks or lists to maintain the structural information of the document.
To build a picture of the document structure, the script must at least know the parent element of the current element. The Expat API cannot provide this. It only reports events for the current element, without any information about its context. Therefore, you need to build your own stack.
The sample script uses a first-in, last-out (FILO) stack. The stack is an array that holds all elements that have been started but not yet ended. In the start element handler, the current element is pushed onto the top of the stack with the array_push() function. Correspondingly, the end element handler removes the top element with array_pop().
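The stack technique can be sketched in a few lines. This is an illustration under the description above, not the article's sample script; the $parents array is an assumption added to show what the stack makes possible.

```php
<?php
$stack = [];    // elements that are currently open, innermost last
$parents = [];  // element name => name of its parent ('' for the root)

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$stack, &$parents) {
        // Before pushing, the top of the stack is the parent element.
        $parents[$name] = $stack ? $stack[count($stack) - 1] : '';
        array_push($stack, $name);
    },
    function ($parser, $name) use (&$stack) {
        // The element that just ended is always on top of the stack.
        array_pop($stack);
    }
);

xml_parse($parser, '<book><title></title></book>', true);
print_r($parents); // book has no parent, title has parent book
```

After a well-formed document has been parsed completely, the stack is empty again, because every start element has been matched by an end element.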
For the sequence <book> <title> </title> </book>, the stack is filled as follows:
Start element book: assign "book" to the first element of the stack ($stack[0]).
Start element title: assign "title" to the top of the stack ($stack[1]).
End element title: remove the top element from the stack ($stack[1]).
End Element title: Move the most