Study notes on the "XML Getting Started classic", part 2: XML documents

Source: Internet
Author: User

The XML parser helps the application parse the XML document and provides the application with the information it needs. The parser reads every character in the document, determines which characters are markup and which are data, and performs any other necessary processing on the XML before the application processes the data.

All the tags in an XML document together make up its markup.

Tags are written in XML the same way as in HTML.

A start tag, the matching end tag, and everything between them are collectively called an element.

The text between the tags is called the element content; the specific term for it is parsed character data (PCDATA). XML inherited these terms from SGML.

Besides the traditional <XXX></XXX> form, a tag can also be written as <XXX/>. This is called a self-closing tag.
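As a small illustration of how a parser separates markup from data, here is a sketch using Python's built-in xml.etree.ElementTree module. The sample document and its element names are made up for this example; they do not come from the book.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = "<order><item>pencil</item><note/><note></note></order>"

root = ET.fromstring(doc)      # the parser splits the text into markup and data
item = root.find("item")
notes = root.findall("note")

element_name = item.tag        # taken from the markup
element_content = item.text    # the PCDATA between the start and end tags

# Serialize both <note> elements to compare what the application receives.
notes_equal = ET.tostring(notes[0]) == ET.tostring(notes[1])

print(element_name, element_content, notes_equal)
```

With this parser, the self-closing <note/> and the empty <note></note> reach the application as the same empty element.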

Differences between XML and HTML in the rules for elements:

  • Unlike XML, HTML does not require every start tag to have an end tag.
  • Unlike XML, HTML tag names are not case-sensitive.

These two differences make an HTML parser very difficult to write. To allow for them, developers have to add more code, so the parser grows larger and larger and becomes correspondingly harder to debug.

Different browsers also parse HTML in different ways, and this non-standardized parsing causes incompatibilities between them.
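The stricter XML rules can be observed directly. The sketch below assumes Python's built-in xml.etree.ElementTree parser; the helper name is_well_formed is hypothetical, and the sample strings are invented.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Acceptable to an HTML browser, but <br> has no end tag, so not XML.
missing_end_tag = is_well_formed("<p>one<br>two</p>")
# <P> and </p> are different names in XML because names are case-sensitive.
case_mismatch = is_well_formed("<P>text</p>")
# The same content written to the XML rules.
strict_version = is_well_formed("<p>one<br/>two</p>")

print(missing_end_tag, case_mismatch, strict_version)
```

Because the XML parser can simply reject the first two inputs, it does not need the recovery logic that an HTML parser has to carry.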

Naming rules for elements:

      • The first character must be a letter or an underscore; it cannot be a digit or a hyphen.
      • The name must not contain spaces.
      • The name should not contain a colon, which is reserved for namespaces.
      • The name should not start with the letters xml in any combination of upper and lower case; such names are reserved.
      • Whitespace is allowed between the element name and the closing >, but not between < and the name.
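Several of these naming rules can be checked by handing candidate tags to a parser. This is a sketch with Python's xml.etree.ElementTree; the helper parses and the element names are hypothetical. The reserved xml prefix is not included because a parser is not obliged to reject it.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if Python's built-in XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical element names, chosen only to exercise the rules.
candidates = [
    "<_item/>",    # starts with an underscore
    "<item-1/>",   # hyphen and digit after the first character
    "<1item/>",    # starts with a digit
    "<-item/>",    # starts with a hyphen
    "<my item/>",  # space inside the intended name
    "<item />",    # whitespace before the closing >
    "< item/>",    # whitespace after the opening <
]
results = {tag: parses(tag) for tag in candidates}

for tag, ok in results.items():
    print(tag, "accepted" if ok else "rejected")
```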

About whitespace characters in PCDATA:

HTML collapses runs of consecutive whitespace characters in element content, leaving only one when the page is displayed. To display the text as it appears in the original document, you need special HTML tags such as <pre>. XML, by default, preserves these whitespace characters.

However, if you display an XML document in IE, the whitespace is still collapsed as if it were HTML. This is because IE does not display the XML document directly; it uses XSL to transform the XML into HTML and then displays the result.

Extraneous whitespace:

Besides the whitespace inside element content, an XML document also contains whitespace such as the line break after a start tag and the indentation before the next start tag. This whitespace only makes the document more readable; it is not part of the data.

In many cases the application does not have to think about this whitespace at all: it simply asks the parser for the data contained in a specific element and never queries the extra PCDATA that lies between the tags.
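The two kinds of whitespace can be seen in a short sketch, again assuming Python's xml.etree.ElementTree and an invented sample document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three spaces inside the name, plus layout whitespace.
doc = """<person>
  <name>John   Smith</name>
</person>"""

root = ET.fromstring(doc)

# Whitespace inside element content is preserved exactly.
name_text = root.findtext("name")

# The line break and indentation after <person> are also reported, as
# character data belonging to <person>, but the application can ignore them.
layout_text = root.text

print(repr(name_text), repr(layout_text))
```

An application that asks only for the name element never has to look at layout_text.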

Attributes:

Element names cannot contain spaces, which is what makes it possible to add attributes after the name. Attributes can only appear in the start tag, never in the end tag, and every attribute must have a value enclosed in single (') or double (") quotes. Because an attribute value may itself contain quotation marks, choose the delimiters so that the parser is not confused: "xx ' xx" and 'xx " xx' are both fine, but "xx " xx" is not.

Attribute names follow the same naming rules as element names. In addition, the same attribute name cannot appear twice in one element.
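The quoting and uniqueness rules for attributes can be tried out as follows. This is a sketch with Python's xml.etree.ElementTree; the helper parses and the book element are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if Python's built-in XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample elements.
single_inside_double = """<book title="Tom's notes"/>"""
double_inside_single = """<book title='the "best" notes'/>"""
clashing_quotes = """<book title="the "best" notes"/>"""

title_1 = ET.fromstring(single_inside_double).get("title")
title_2 = ET.fromstring(double_inside_single).get("title")

checks = {
    "clashing quotes": parses(clashing_quotes),
    "attribute without a value": parses("<book checked/>"),
    "attribute in end tag": parses('<book></book id="1">'),
    "duplicate attribute": parses('<book id="1" id="2"/>'),
}

print(title_1, "|", title_2, "|", checks)
```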

The order of the attributes within an element is not significant, but the parser processes elements in the order in which they appear in the document.

The XML parser preprocesses attribute values before passing them to the application. The most important step is that each line terminator in a value is replaced with a space.
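The line-terminator replacement is easy to observe by putting the same text in an attribute and in element content. The sketch assumes Python's xml.etree.ElementTree; the note element is invented.

```python
import xml.etree.ElementTree as ET

# The same two-line text appears as an attribute value and as element content.
doc = '<note text="line one\nline two">line one\nline two</note>'
elem = ET.fromstring(doc)

attribute_value = elem.get("text")   # the line break becomes a space
element_content = elem.text          # the line break is kept

print(repr(attribute_value), repr(element_content))
```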

In practice there is no real metadata: all information is data to the application. In HTML, information falls into two categories: data for people to read, and data that the browser uses to format the first kind. We can call the latter metadata, but to a browser or a developer, metadata is still data.

Attributes take up less space than elements, but today's compression technology reduces file size far more effectively than heavy use of attributes does, so using attributes just to save space is unnecessary. In addition, extensive use of attributes gives up many of the benefits of XML, such as readability and descriptive tag names.

The PCDATA of an element is flexible, whereas an attribute value is not, especially since the parser converts its line terminators to spaces.

Comments:

  • A comment starts with <!-- and ends with -->.
  • A comment cannot be placed inside a tag.
  • The comment content cannot contain --.
  • Comments are not guaranteed to be passed to the application; information the application needs must go in an element or an attribute.
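The comment rules can be sketched with Python's xml.etree.ElementTree, whose default parser happens to discard comments. The helper parses and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if Python's built-in XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# With the default parser settings, the comment never reaches the application.
root = ET.fromstring("<a><!-- reviewed by QA --><b>1</b></a>")
child_tags = [child.tag for child in root]

double_hyphen_inside = parses("<a><!-- step 1 -- step 2 --></a>")
comment_inside_tag = parses('<a <!-- note --> id="1"/>')

print(child_tags, double_hyphen_inside, comment_inside_tag)
```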

Empty elements:

When an element has no PCDATA, you can use a self-closing tag instead of the traditional start and end tags.

In a self-closing tag, no space is allowed between / and >; the two characters must be adjacent.

Self-closing tags can also carry attributes.
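A short sketch of the self-closing rules, assuming Python's xml.etree.ElementTree; the img element and the helper parses are only illustrative.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if Python's built-in XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A self-closing tag with an attribute.
src = ET.fromstring('<img src="a.png"/>').get("src")

space_before_slash = parses('<img src="a.png" />')   # whitespace before /> is fine
space_inside_close = parses('<img src="a.png"/ >')   # / and > must be adjacent

print(src, space_before_slash, space_inside_close)
```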

XML declaration:

    • Its purpose is to identify the file as an XML document, whatever platform it is processed on.
    • It starts with <?xml and ends with ?>.
    • It must have a version attribute; encoding and standalone are optional.
    • The attributes must appear in the order version, encoding, standalone.
    • The value of version must be 1.0 or 1.1.
    • It must be at the very beginning of the document; nothing, not even whitespace, may come before the <.
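The declaration rules above can be checked against a parser. This sketch uses Python's xml.etree.ElementTree with byte strings, so that the encoding attribute is actually interpreted; the helper parses is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if Python's built-in XML parser accepts the bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

full_declaration = parses(
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>')
version_only = parses(b'<?xml version="1.0"?><a/>')
missing_version = parses(b'<?xml encoding="UTF-8"?><a/>')
wrong_order = parses(b'<?xml encoding="UTF-8" version="1.0"?><a/>')
leading_space = parses(b' <?xml version="1.0"?><a/>')

print(full_declaration, version_only, missing_version, wrong_order, leading_space)
```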

The version attribute states which version of XML the document follows; the only versions of the XML specification so far are 1.0 and 1.1. The differences are small: the two versions treat some Unicode characters differently in element names, and they have different rules for the line terminators used on some systems.

About the encoding attribute:

A character code is a one-to-one correspondence between a set of characters and the numbers that represent them. A character encoding is the way the numbers of a character code are represented in binary.

ASCII comes in two forms: 7-bit and 8-bit. 7-bit ASCII is the more widely accepted standard for text. There are several different 8-bit encodings, but their first 128 characters are identical to 7-bit ASCII.

Unicode was invented to overcome the limited number of characters that ASCII can represent; it was designed from scratch to cover the characters of all human languages. Its two main encodings are UTF-8 and UTF-16. UTF-16 uses two bytes for most characters and four bytes for the less common characters outside the Basic Multilingual Plane.

UTF-8 and UTF-16:

UTF-8 is quite ingenious. It uses a single byte for each 7-bit ASCII character and two or more bytes for every other character, so 7-bit ASCII is a subset of UTF-8, and UTF-8 produces smaller files when a document contains only English text. For many other languages UTF-16 is the better choice, because it needs only two bytes for most characters where UTF-8 may need three or more.
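The size trade-off can be measured directly. The sketch below uses only Python's built-in str.encode; the four sample characters are arbitrary picks. Note that a character outside the Basic Multilingual Plane takes four bytes in both encodings.

```python
# Sample characters written as escapes so the file encoding does not matter.
samples = {
    "Latin capital A": "\u0041",
    "e with acute accent": "\u00e9",
    "CJK ideograph": "\u4e2d",
    "emoji outside the BMP": "\U0001F600",
}

# Map each description to (bytes in UTF-8, bytes in UTF-16).
# "utf-16-le" is used so that no byte order mark is added to the count.
sizes = {
    name: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for name, ch in samples.items()
}

for name, (utf8_len, utf16_len) in sizes.items():
    print(f"{name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```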

The encoding attribute tells the parser which encoding the document uses; the parser then reads the document with that encoding and converts its contents to Unicode characters. If there is no encoding attribute, the parser assumes the default, UTF-8 or UTF-16.
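As a sketch of this behaviour, the same hypothetical document is stored below in three encodings and read with Python's xml.etree.ElementTree. In each case the application receives the same Unicode string.

```python
import xml.etree.ElementTree as ET

word = "caf\u00e9"   # "cafe" with an accented final letter

# Declared ISO-8859-1: the accented letter is the single byte 0xE9.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><word>' + word + "</word>"
).encode("iso-8859-1")

# Declared UTF-16: Python's "utf-16" codec also writes a byte order mark.
utf16_doc = (
    '<?xml version="1.0" encoding="UTF-16"?><word>' + word + "</word>"
).encode("utf-16")

# No declaration at all: the parser falls back to UTF-8.
default_doc = ("<word>" + word + "</word>").encode("utf-8")

decoded = [ET.fromstring(d).text for d in (latin1_doc, utf16_doc, default_doc)]
print(decoded)
```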

The standalone attribute:

    • yes indicates that the document is completely self-contained and does not depend on any other file.
    • no indicates that the document may depend on an external DTD file.

The full name of this attribute is the standalone document declaration (SDD). The XML Recommendation does not require the parser to do anything with the SDD; for the parser it serves only as a hint.

Processing instructions:

Processing instructions embed application-specific directives in a document to control how the document is processed. They are not part of the document's data, but they are passed to the application.
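A minimal sketch of an application receiving a processing instruction, using Python's xml.dom.minidom. The target name report-tool and its content are invented for the example.

```python
from xml.dom import minidom

# Hypothetical document with one processing instruction before the root element.
doc = minidom.parseString(
    '<?xml version="1.0"?><?report-tool page-size="A4"?><report/>'
)

pi = doc.firstChild   # the XML declaration itself is not a node in the tree
is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target    # the name of the application the instruction is meant for
data = pi.data        # everything after the target, passed through as plain text
root_name = doc.documentElement.tagName

print(is_pi, target, data, root_name)
```

The parser does not interpret the instruction's content; deciding what page-size means is left to the application.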

In most cases the application does not need the information in the XML declaration at all; it is useful only to the parser. Even the encoding attribute is of little use to the application, because by the time the parser passes the document's content to the application it has been converted to Unicode, whatever the original encoding was.

Illegal PCDATA characters:

PCDATA cannot contain < or &. To use these characters, you have to write them as escaped references.

&lt; and &amp; represent < and & respectively, and > can be written as &gt;.

Escaping also extends to strings of the form &#nnn;, where nnn is the Unicode code point of the character (&#nnn; takes a decimal number and &#xnnn; a hexadecimal number).
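The escapes can be tried out with Python's xml.etree.ElementTree, and xml.sax.saxutils.escape can produce them when writing text into a document. The element name f and the helper parses are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def parses(text):
    """Return True if Python's built-in XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Named references plus a decimal and a hexadecimal character reference.
decoded = ET.fromstring("<f>a &lt; b &amp;&amp; b &gt; c &#169; &#xA9;</f>").text

raw_less_than = parses("<f>a < b</f>")
raw_ampersand = parses("<f>a & b</f>")

# Going the other way: escape plain text before placing it in a document.
escaped = escape("a < b & c")

print(decoded, raw_less_than, raw_ampersand, escaped)
```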

When a document needs a large number of escapes, it becomes hard to read. CDATA sections, written <![CDATA[ ... ]]>, solve this problem. They tell the parser not to interpret their content as markup, so the content stays exactly as written: the parser does not parse the text and passes it straight to the application.

CDATA sections can contain whitespace characters.
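A sketch of a CDATA section carrying text that would otherwise need several escapes, read with Python's xml.etree.ElementTree. The script element and its content are invented.

```python
import xml.etree.ElementTree as ET

# The < and & characters inside the CDATA section are not treated as markup.
doc = "<script><![CDATA[if (a < b && b < c) {  run();  }]]></script>"

content = ET.fromstring(doc).text
print(content)
```

The application receives the text exactly as written, including the doubled spaces.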

Errors in XML:

Besides specifying how the parser reads information from an XML document, the XML Recommendation specifies how the parser handles errors in the document.

XML defines two types of errors: errors and fatal errors.

  • An error is a violation of the rules in the specification whose results are undefined. The XML processor is allowed to recover from this type of error and continue processing.
  • A fatal error is more severe. When the parser encounters one, it cannot continue normal processing (although it may keep reading the document in order to identify further errors). A fatal error means that the XML document is not well-formed.
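In Python's xml.etree.ElementTree, a fatal error surfaces as a ParseError exception and no document tree is returned. The sketch below uses an invented document with a missing end tag and reads the reported position.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that is not well-formed: <b> is never closed.
broken = "<a>\n  <b>text</a>"

try:
    tree = ET.fromstring(broken)
    error_position = None
except ET.ParseError as exc:
    tree = None                      # parsing stopped; no data is handed over
    error_position = exc.position    # (line, column) where the parser gave up
    message = str(exc)

print(tree, error_position)
```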

