PHP XML parsing functions

I must admit up front that I like computer standards. If everyone complied with industry standards, the Internet would be a better medium. Standardized data exchange formats make open, platform-independent computing feasible. This is why I am a fan of XML.

Fortunately, my favorite scripting language not only supports XML but keeps extending that support. PHP lets me quickly publish XML documents on the Internet, collect statistics about XML documents, and convert XML documents to other formats. For example, I often use PHP's XML processing capability to manage articles and books written in XML.

In this article, I discuss how to use PHP's built-in Expat parser to process XML documents. The example shows how to work with Expat. Along the way, it shows how to build your own handler functions and convert an XML document into your own PHP data structure.

Introduction to Expat

An XML parser, also known as an XML processor, gives a program access to the structure and content of an XML document. Expat is the XML parser for the PHP scripting language. It is also used in other projects, such as Mozilla, Apache, and Perl.

What is an event-based parser?

There are two basic types of XML parser:

Tree-based parser: converts an XML document into a tree structure. This type of parser analyzes the entire document and provides an API for accessing each element of the resulting tree. The common standard for it is the DOM (Document Object Model).

Event-based parser: treats an XML document as a series of events. When a particular event occurs, the parser calls a function supplied by the developer to handle it.

An event-based parser has a data-centric view of the XML document: it concentrates on the data in the document rather than on its structure.
These parsers process a document from start to end and report events, such as the start of an element, the end of an element, and the start of character data, to the application through callback functions. The following is a "Hello World" XML document:

<greeting>Hello World</greeting>

The event-based parser reports three events:

Start element: greeting
Start of a character data item, with the value: Hello World
End element: greeting

Unlike a tree-based parser, an event-based parser does not build a structure that describes the document. For the character data item, the event-based parser gives you no information about the parent element, greeting. In return, it provides lower-level access, which allows better use of resources and faster processing. The entire document does not have to be held in memory; in fact, the document can be larger than the available memory.

Expat is such an event-based parser. Of course, if you need one, you can still build a complete native tree structure in PHP while using Expat.

The Hello World example above is well-formed XML. However, it is not valid, because no DTD (Document Type Definition) is associated with it or embedded in it. For Expat this makes no difference: Expat is a non-validating parser, so it ignores any DTD associated with the document. Note, however, that the document still has to be well-formed; otherwise Expat (like any other XML-compliant parser) stops with an error message.

As a non-validating parser, Expat is fast and lightweight, which makes it well suited to Internet applications.

Compiling Expat

Expat can be compiled into PHP 3.0.6 or later. Since Apache 1.3.9, Expat has been part of Apache. On Unix systems, you compile it into PHP by configuring PHP with the --with-xml option. If you compile PHP as an Apache module, PHP uses the Expat that ships with Apache by default.
On Windows, you must load the XML dynamic link library.

XML example: XMLstats

One way to understand the Expat functions is to work through an example. The example discussed here uses Expat to collect statistics about an XML document. For each element in the document, it outputs information such as the number of times the element is used in the document.

Note: for demonstration purposes, we use PHP to build a structure that stores the parent and child elements of each element.

The function that creates an XML parser instance is xml_parser_create(). This instance is passed to all subsequent functions. The idea is very similar to the connection identifier used by the MySQL functions in PHP.

Before parsing a document, an event-based parser usually requires you to register callback functions, which are called when specific events occur. Expat is no exception. It defines the following seven possible events (object: XML parsing function, description):

element: xml_set_element_handler(), the start and end of an element
character data: xml_set_character_data_handler(), the start of character data
external entity: xml_set_external_entity_ref_handler(), an external entity occurs
unparsed external entity: xml_set_unparsed_entity_decl_handler(), an unparsed external entity occurs
processing instruction: xml_set_processing_instruction_handler(), a processing instruction occurs
notation declaration: xml_set_notation_decl_handler(), a notation declaration occurs
default: xml_set_default_handler(), all events that have no handler assigned

All callback functions must accept the parser instance as their first parameter (in addition to their other parameters).

See the sample script at the end of this article. Note that it uses both an element handler and a character data handler. The element callback handlers are registered through xml_set_element_handler().
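As a minimal sketch of the registration step (an illustration, not the article's own script), the following creates a parser instance, registers handlers for two of the seven events, and records what Expat reports for the one-element greeting document. The handler names on_start, on_end and on_chars and the $events array are made up for this example.

```php
<?php
// Hypothetical log of the events reported by the parser.
$events = [];

// Called for every start tag; $attrs holds the attributes of the element.
function on_start($parser, $name, $attrs)
{
    global $events;
    $events[] = "start element: $name";
}

// Called for every end tag.
function on_end($parser, $name)
{
    global $events;
    $events[] = "end element: $name";
}

// Called for character data between tags.
function on_chars($parser, $data)
{
    global $events;
    $events[] = "character data: $data";
}

$parser = xml_parser_create();
xml_set_element_handler($parser, 'on_start', 'on_end');
xml_set_character_data_handler($parser, 'on_chars');

// The third argument marks this as the final (here: only) piece of data.
// Case folding is still at its default, so names arrive in uppercase.
xml_parse($parser, '<greeting>Hello World</greeting>', true);

foreach ($events as $event) {
    echo $event, "\n";
}
```

The character data event carries no hint that it belongs to the greeting element; that context has to be tracked by the script itself. Both element callbacks are passed by name in a single call to xml_set_element_handler().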
The xml_set_element_handler() function takes three parameters:

the parser instance
the name of the callback function that handles start elements
the name of the callback function that handles end elements

The callback functions must exist when parsing of the XML document begins. They must be defined in accordance with the prototypes described in the PHP Manual. For example, Expat passes three parameters to the start element handler. In the example script, it is defined as follows:

function start_element($parser, $name, $attrs)

The first parameter is the parser identifier, the second is the name of the start element, and the third is an array containing all attributes of the element and their values. Once you start parsing the XML document, Expat calls your start_element() function with these parameters each time it encounters a start element.

XML case folding

The case folding option is disabled with the xml_parser_set_option() function. This option is enabled by default, which means that element names passed to the handler functions are automatically converted to uppercase. XML, however, is case sensitive, so case matters when collecting statistics about XML documents. For our example, the case folding option must be disabled.

Parsing the document

With all the preparations complete, the script can finally parse the XML document:

xml_parse_from_file(), a custom function, opens the file named in its parameter and parses it in 4 KB chunks.
xml_parse(), like xml_parse_from_file(), returns a false value (0 in the case of xml_parse()) if an error occurs, that is, if the XML document is not well-formed.
You can use the xml_get_error_code() function to get the numeric code of the last error. Pass this code to the xml_error_string() function to get the error text.
Outputting the current line number of the XML document makes debugging easier.
The callback functions are called during parsing.
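A helper along the lines of the custom xml_parse_from_file() described above might look like the following sketch. It is reconstructed from the description, not taken from the article's listing; describe_xml_error(), the $seen array, and the temporary demo file are assumptions added for illustration.

```php
<?php
// Sketch of a custom helper: feed a file to the parser in 4 KB chunks.
// Returns true on success and false if the file cannot be opened or
// the document is not well-formed.
function xml_parse_from_file($parser, $file)
{
    $fp = fopen($file, 'r');
    if ($fp === false) {
        return false;
    }
    $ok = true;
    while ($ok && !feof($fp)) {
        $chunk = (string) fread($fp, 4096);
        // The last argument tells the parser whether this is the final chunk.
        $ok = (bool) xml_parse($parser, $chunk, feof($fp));
    }
    fclose($fp);
    return $ok;
}

// Hypothetical helper: turn the last parser error into a readable message.
function describe_xml_error($parser)
{
    return sprintf(
        'XML error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// Start element handler with the three parameters Expat passes in.
$seen = [];
function start_element($parser, $name, $attrs)
{
    global $seen;
    $seen[] = $name;
}

function end_element($parser, $name)
{
}

// Demo input: the end tag </title> does not match <Title>.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($file, "<Book>\n  <Title>PHP</title>\n</Book>");

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep case as written
xml_set_element_handler($parser, 'start_element', 'end_element');

$ok = xml_parse_from_file($parser, $file);
if (!$ok) {
    echo describe_xml_error($parser), "\n";
}
unlink($file);
```

Because case folding is disabled, Title and title are treated as different names, so the demo document is reported as not well-formed.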
Describing the document structure

One point deserves particular attention when parsing a document with Expat: how do you maintain a basic description of the document structure?

As mentioned above, an event-based parser does not itself produce any structural information. Yet the tag structure is an important feature of XML. For example, the element sequence <book><title> means something different from <figure><title>. Any author will tell you that the title of a book and the title of a figure have nothing to do with each other, even though both use the term "title". Therefore, to process XML effectively with an event-based parser, you must use your own stacks or lists to maintain the structural information of the document.

To build an image of the document structure, the script must at least know the parent element of the current element. The Expat API cannot provide this: it only reports events for the current element, without any information about its context. You therefore need to build your own stack structure.

The example script uses a first-in, last-out (FILO) stack. The stack is an array that holds all open start elements. In the start element handler, the current element is pushed onto the top of the stack with the array_push() function. Correspondingly, the end element handler removes the top element with array_pop().

For the sequence <book><title></title></book>, the stack is filled as follows:

Start element book: assign "book" to the first element of the stack ($stack[0]).
Start element title: assign "title" to the top of the stack ($stack[1]).
End element title: remove the top element from the stack ($stack[1]).
End element book: remove the top element from the stack ($stack[0]).

The PHP 3.0 version of the example controls the nesting of elements by hand with a $depth variable, which makes the script look more complicated. The PHP 4.0 version uses the array_pop() and array_push() functions, which makes the script more concise.
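The stack technique can be sketched as follows. This is a minimal illustration under assumed names, not the article's script: $parents is a hypothetical log that records, for each start element, which element was on top of the stack at that moment.

```php
<?php
$stack = [];    // names of the elements that are currently open
$parents = [];  // hypothetical log entries of the form "element <- parent"

function start_element($parser, $name, $attrs)
{
    global $stack, $parents;
    // The top of the stack is the parent of the element that just started.
    $parent = count($stack) > 0 ? $stack[count($stack) - 1] : '(none)';
    $parents[] = "$name <- $parent";
    array_push($stack, $name);
}

function end_element($parser, $name)
{
    global $stack;
    // The element that just ended is always the one on top of the stack.
    array_pop($stack);
}

$xml = '<book><title>PHP</title>'
     . '<figure><title>Fig. 1</title></figure></book>';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_parse($parser, $xml, true);

print_r($parents);
```

With the stack in place, the two title elements can be told apart: one is recorded with book as its parent, the other with figure.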
To collect information about each element, the script must remember the events for each element. A global array variable, $elements, stores all the distinct elements in the document. Each array entry is an instance of the element class, which has four properties (class variables):

$count: the number of times the element has been found in the document
$chars: the number of bytes of character data in the element
$parents: the parent elements
$childs: the child elements

As you can see, it is easy to store class instances in arrays.

Note: one feature of PHP is that you can traverse an entire class structure with a while (list() = each()) loop, just as you would traverse an array. All class variables (and, in PHP 3.0, method names) are output as strings. (each() was removed in PHP 8; in current versions, foreach serves the same purpose.)

When an element is found, we need to increment its counter to track how many times it appears in the document: the count property of the corresponding $elements entry is increased by one.

We also need to let the parent element know that the current element is its child. The name of the current element is therefore added to the $childs array of the parent element. Finally, the current element should remember its parent, so the parent element is added to the $parents array of the current element.

The remaining code loops over the $elements array and its subarrays to display the statistics. This is the simplest kind of nested loop. It outputs the correct results, but the code is neither elegant nor clever; it is just the kind of loop you might write any day to get your work done.

The example script is designed to be run from the command line with the CGI version of PHP, so the statistics are output as plain text. If you want to use the script on the Internet, you need to modify the output functions to generate HTML.

To sum up, Expat is the XML parser of PHP.
As an event-based parser, it does not build a description of the document structure. However, by providing low-level access, it makes better use of resources and is faster. As a non-validating parser, Expat ignores any DTD attached to the XML document. However, if the document is not well-formed, it stops with an error message. You provide the event handler functions that process the document, and you build your own structures, such as stacks and trees, to take advantage of the structural information in XML markup.

New XML applications appear every day, and PHP's support for XML keeps growing (for example, support for the DOM-based XML parser LibXML has been added). With PHP and Expat, you can prepare for the upcoming standards that are valid, open, and platform independent.

Example
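As a minimal sketch of the approach described in this article (an illustration under stated assumptions, not the author's XMLstats script), the following collects the four statistics for a small inline document. The handler names, the inline $xml string, and the use of array keys to store parent and child names are assumptions made for this example; case folding is disabled as discussed above.

```php
<?php
class Element
{
    public $count = 0;     // number of times the element occurs
    public $chars = 0;     // bytes of character data inside the element
    public $parents = [];  // parent element names (stored as array keys)
    public $childs = [];   // child element names (stored as array keys)
}

$elements = [];  // element name => Element instance
$stack = [];     // currently open elements

function start_element($parser, $name, $attrs)
{
    global $elements, $stack;
    if (!isset($elements[$name])) {
        $elements[$name] = new Element();
    }
    $elements[$name]->count++;
    if (count($stack) > 0) {
        $parent = $stack[count($stack) - 1];
        $elements[$parent]->childs[$name] = true;
        $elements[$name]->parents[$parent] = true;
    }
    array_push($stack, $name);
}

function end_element($parser, $name)
{
    global $stack;
    array_pop($stack);
}

function char_data($parser, $data)
{
    global $elements, $stack;
    // Credit the character data to the innermost open element.
    if (count($stack) > 0) {
        $elements[$stack[count($stack) - 1]]->chars += strlen($data);
    }
}

$xml = '<book><title>PHP</title>'
     . '<chapter><title>Expat</title></chapter></book>';

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, 'start_element', 'end_element');
xml_set_character_data_handler($parser, 'char_data');

if (!xml_parse($parser, $xml, true)) {
    printf(
        "XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
    exit(1);
}

// Plain-text report, one line per distinct element.
foreach ($elements as $name => $element) {
    printf(
        "%s: count=%d, chars=%d, parents=[%s], childs=[%s]\n",
        $name,
        $element->count,
        $element->chars,
        implode(', ', array_keys($element->parents)),
        implode(', ', array_keys($element->childs))
    );
}
```

Storing names as array keys keeps each parent and child listed once, however often the combination occurs. To use the report on a web page, only the final loop would need to change.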