XML operations for PHP extension (2) -- XML parser installation and overview

Source: Internet
Author: User



I. Overview and installation

XML (eXtensible Markup Language) is a data format for structured document interchange on the Web. It is a standard defined by the World Wide Web Consortium (W3C). Information about XML and related technologies is available from the W3C.

This PHP extension implements support for expat, the XML parser written by James Clark. This toolkit lets you parse, but not validate, XML documents. It supports three source character encodings that PHP also provides: US-ASCII, ISO-8859-1, and UTF-8. UTF-16 is not supported.

This extension lets you create XML parsers and then define handlers for different XML events. Each XML parser also has a few parameters you can adjust.
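As a minimal sketch of that workflow, the snippet below creates a parser, registers start-tag and end-tag handlers, and parses a small inline document. The sample XML and the `$events` array are illustrative assumptions, not part of the original text.

```php
<?php
// Events recorded by the handlers, in the order the parser reports them.
$events = [];

// Create a parser with default settings.
$parser = xml_parser_create();

// Register one handler for start tags and one for end tags.
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, string $name) use (&$events) {
        $events[] = "end:$name";
    }
);

// Parse the whole document in one call (third argument marks the final chunk).
$ok = xml_parse($parser, '<note><to>Ann</to></note>', true);

print_r($events);
```

The element names arrive in uppercase here because case folding is on by default.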

This extension requires the libxml PHP extension. This means passing --enable-libxml to configure, although this happens implicitly because libxml is enabled by default.

By default, this extension uses an expat compatibility layer. It can also be built against the expat library itself. The Makefile that ships with expat does not build a library by default; you can use the following make rule to build one:


libexpat.a: $(OBJS)
	ar -rc $@ $(OBJS)
	ranlib $@

A source RPM package of expat is also available.

This extension is enabled by default. It can be disabled at compile time with the following configure option: --disable-xml

These functions are enabled by default and use the bundled expat library. You can disable XML support with --disable-xml. If you compile PHP as a module for Apache 1.3.9 or later, PHP automatically uses the expat library bundled with Apache. If you do not want to use the bundled expat library, run the PHP configure script with --with-expat-dir=DIR, where DIR points to the base installation directory of expat.

The Windows version of PHP has built-in support for this extension. You do not need to load any additional extensions to use these functions.
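Because the extension can be switched off at build time, a script may want to confirm it is present before calling any of its functions. The check below is a small sketch using the general-purpose `extension_loaded()` and `function_exists()` functions; the variable name `$hasXml` and the messages are illustrative.

```php
<?php
// True when the XML parser extension is compiled in and its functions exist.
$hasXml = extension_loaded('xml') && function_exists('xml_parser_create');

echo $hasXml
    ? "XML parser extension is available\n"
    : "XML parser extension is missing (PHP may have been built with --disable-xml)\n";
```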

II. Event handlers

The following PHP functions set the XML event handlers. Each is listed with the event it covers:

Supported XML handlers

  • xml_set_element_handler(): Element events are issued whenever the XML parser encounters a start or end tag. There are separate handlers for start tags and end tags.

  • xml_set_character_data_handler(): Character data is roughly all the non-markup content of an XML document, including whitespace between tags. Note that the XML parser does not add or remove any whitespace; it is up to the application (you) to decide whether whitespace is significant.

  • xml_set_processing_instruction_handler(): PHP programmers should already be familiar with processing instructions (PIs). <?php ?> is a processing instruction, where php is called the "PI target". The handling of PIs is application-specific, except that all PI targets starting with "XML" are reserved.

  • xml_set_default_handler(): Whatever is not passed to another handler goes to the default handler. You will receive things like the XML and document type declarations in the default handler.

  • xml_set_unparsed_entity_decl_handler(): This handler is called for declarations of unparsed (NDATA) entities.

  • xml_set_notation_decl_handler(): This handler is called for notation declarations.

  • xml_set_external_entity_ref_handler(): This handler is called when the XML parser finds a reference to an external parsed general entity, for example a reference to a file or URL. For a demonstration, see the external entity example in the PHP manual.
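To make a few of these handlers concrete, here is a hedged sketch that registers character data, processing instruction, and default handlers on one parser. The sample document, the `app` PI target, and the collecting variables are assumptions made for illustration. What reaches the default handler can vary with the underlying parser library, so the sketch only collects it.

```php
<?php
$text = '';       // concatenated character data
$pis = [];        // [target, data] pairs from processing instructions
$unhandled = [];  // whatever no other handler claimed

$parser = xml_parser_create();

// Character data may arrive in several chunks, so append rather than assign.
xml_set_character_data_handler($parser, function ($parser, string $data) use (&$text) {
    $text .= $data;
});

// Called once per processing instruction with its target and its data.
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, string $target, string $data) use (&$pis) {
        $pis[] = [$target, $data];
    }
);

// Receives content that has no dedicated handler (a comment, for instance).
xml_set_default_handler($parser, function ($parser, string $data) use (&$unhandled) {
    $unhandled[] = $data;
});

$xml = '<doc><?app refresh?><!-- a comment --><p>Hello</p></doc>';
$ok = xml_parse($parser, $xml, true);

var_dump($text, $pis, $unhandled);
```

Appending in the character data handler matters in practice: the parser is free to deliver one run of text through several calls.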
III. Case folding

The element handler functions may receive their element names case-folded. The XML standard defines case folding as "a process applied to a sequence of characters, in which those identified as non-uppercase are replaced by their uppercase equivalents". In other words, when it comes to XML, case folding simply means uppercasing.

By default, all element names passed to the handler functions are case-folded. This behaviour can be queried and controlled per XML parser with the xml_parser_get_option() and xml_parser_set_option() functions, respectively.
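The effect of the option can be seen by parsing the same document twice, once with case folding on and once with it off. This is a sketch; the helper name `collectNames()` and the sample document are hypothetical.

```php
<?php
// Hypothetical helper: returns the element names seen by the start-tag handler.
function collectNames(bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();

    // Switch case folding on (1) or off (0) for this parser only.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);

    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, string $name) {
            // End tags are not needed for this illustration.
        }
    );

    xml_parse($parser, '<Book><title>PHP</title></Book>', true);
    return $names;
}

$folded = collectNames(true);
$asWritten = collectNames(false);

// Query the default setting on a fresh parser.
$defaultFolding = (bool) xml_parser_get_option(xml_parser_create(), XML_OPTION_CASE_FOLDING);

var_dump($folded, $asWritten, $defaultFolding);
```

Turning case folding off is usually the safer choice when element names must be compared with the names as written in the document, since XML itself is case-sensitive.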

IV. Error codes

The following constants are defined for XML error codes (as returned by xml_get_error_code() after xml_parse() reports a failure):

  • XML_ERROR_NONE

  • XML_ERROR_NO_MEMORY

  • XML_ERROR_SYNTAX

  • XML_ERROR_NO_ELEMENTS

  • XML_ERROR_INVALID_TOKEN

  • XML_ERROR_UNCLOSED_TOKEN

  • XML_ERROR_PARTIAL_CHAR

  • XML_ERROR_TAG_MISMATCH

  • XML_ERROR_DUPLICATE_ATTRIBUTE

  • XML_ERROR_JUNK_AFTER_DOC_ELEMENT

  • XML_ERROR_PARAM_ENTITY_REF

  • XML_ERROR_UNDEFINED_ENTITY

  • XML_ERROR_RECURSIVE_ENTITY_REF

  • XML_ERROR_ASYNC_ENTITY

  • XML_ERROR_BAD_CHAR_REF

  • XML_ERROR_BINARY_ENTITY_REF

  • XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF

  • XML_ERROR_MISPLACED_XML_PI

  • XML_ERROR_UNKNOWN_ENCODING

  • XML_ERROR_INCORRECT_ENCODING

  • XML_ERROR_UNCLOSED_CDATA_SECTION

  • XML_ERROR_EXTERNAL_ENTITY_HANDLING
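A typical way to use these codes is sketched below: when xml_parse() reports failure, the code is read with xml_get_error_code() and turned into a message with xml_error_string(). The malformed sample document is an assumption chosen to trigger an error.

```php
<?php
$parser = xml_parser_create();

// The closing tag does not match the innermost open element, so parsing fails.
$ok = xml_parse($parser, '<a><b></a>', true);

$code = xml_get_error_code($parser);

if (!$ok) {
    printf(
        "XML error %d (%s) at line %d\n",
        $code,
        xml_error_string($code),
        xml_get_current_line_number($parser)
    );
}
```

Comparing `$code` against the constants above lets an application react to specific failures, although the exact code reported for a given mistake can depend on the underlying parser library.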

V. Character encoding

PHP's XML extension supports the Unicode character set through several character encodings. There are two types of character encoding: source encoding and target encoding. PHP's internal representation of the document is always encoded in UTF-8.

Source encoding is applied when an XML document is parsed. When creating an XML parser, you can specify a source encoding (this encoding cannot be changed later in the parser's lifetime). The supported source encodings are ISO-8859-1, US-ASCII, and UTF-8. The first two are single-byte encodings, meaning that each character is represented by a single byte. UTF-8 can encode characters composed of a variable number of bits (up to 21) in one to four bytes. The default source encoding used by PHP is ISO-8859-1.

Target encoding is applied when PHP passes data to the XML handler functions. When an XML parser is created, the target encoding is set to the same value as the source encoding, but it can be changed at any point. The target encoding affects character data as well as tag names and processing instruction targets.

If the XML parser encounters characters outside the range that its source encoding can represent, it returns an error.

If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, the problem characters are "demoted". Currently, this means that such characters are replaced by a question mark (?).
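The demotion rule can be illustrated by parsing one UTF-8 document with two different target encodings. This is a sketch under stated assumptions: the helper name `parseText()` and the sample text are hypothetical, and the document declares its own encoding so the input is unambiguous.

```php
<?php
// Hypothetical helper: returns the character data as delivered to the handler.
function parseText(string $xml, string $targetEncoding): string
{
    $text = '';
    $parser = xml_parser_create('UTF-8');

    // Choose the encoding in which handlers receive their data.
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    xml_set_character_data_handler($parser, function ($parser, string $data) use (&$text) {
        $text .= $data;
    });

    xml_parse($parser, $xml, true);
    return $text;
}

// "café €5": é (U+00E9) exists in ISO-8859-1, the euro sign (U+20AC) does not.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>caf\u{E9} \u{20AC}5</p>";

$utf8 = parseText($xml, 'UTF-8');
$latin1 = parseText($xml, 'ISO-8859-1');

// Hex dumps make the byte-level difference visible.
echo bin2hex($utf8), "\n";
echo bin2hex($latin1), "\n";
```

With the ISO-8859-1 target, é is delivered as a single byte, while the euro sign cannot be represented and is expected to come back as a question mark.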
