Introduction to XML structure and syntax

Source: Internet
Author: User
Now let's use Notepad to create an XML file. First, look at an example:

Example 1:

<?xml version="1.0" encoding="gb2312"?>
<references>
  <book>
    <name>XML Getting Started</name>
    <author>Zhang San</author>
    <price currency-unit="RMB">20.00</price>
  </book>
  <book>
    <name>XML Syntax</name>
    <!-- This book is coming soon -->
    <author>Li Si</author>
    <price currency-unit="RMB">18.00</price>
  </book>
</references>

This is a typical XML file. After editing, save it with the .xml suffix. The file can be divided into two major parts: the prolog and the body. The first line of this file is the prolog. It is the XML declaration, which, when present, must come at the very start of the file, and which it is good practice always to include. It mainly tells the XML parser how to work. The version attribute gives the version of the XML standard that the file uses and is required in the declaration. The encoding attribute names the character encoding used in the file. It can be omitted, in which case the parser assumes a Unicode encoding (UTF-8 or UTF-16), but we recommend that you do not omit it. This example uses the GB2312 encoding, so the encoding attribute cannot be omitted here. The prolog can contain other declarations as well, which we will introduce later.

The rest of the file is the body, where the content of the XML file is stored. We can see that the body is enclosed by the <references> start tag and the </references> end tag; this element is called the "root element" of the XML file. Each book element is a child element directly under the root element, and each book has the child elements name, author, and price. The currency unit is an "attribute" of the price element, and "RMB" is its "attribute value".

<!-- This book is coming soon --> is a comment, just as in HTML. In an XML file, a comment is the part placed between the "<!--" mark and the "-->" mark.

As you can see, an XML file is quite simple. Like HTML, an XML file is composed of a series of tags. However, the tags in an XML file are custom tags with clear meanings: we choose them ourselves to describe the content they contain.

Now that we have a first impression of an XML file, let's look at its syntax in detail. Before discussing the syntax, we must understand an important concept: the XML parser.

1. XML parser

The parser's main job is to check whether the XML file has structural errors, strip the markup from the file, read out the actual content, and hand it to the next application for processing. XML is a markup language for structured document information. The XML specification lays down detailed rules for how to mark up the structure of a document, and a parser is software written according to these rules (many parsers are written in Java). HTML works the same way: a browser must have an HTML parser in order to "read" web pages made of HTML tags and display them to us. When a parser meets markup that it cannot make sense of, it reports an error back to us.
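The parser's role described above can be sketched with the ElementTree parser from Python's standard library. This is only an illustration, not the parser discussed in this article: the sample document is modelled on Example 1, and the names references, book, currency-unit, and read_books are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Illustrative document modelled on Example 1. The element and attribute
# names (references, book, currency-unit) are assumptions for this sketch.
DOC = """<references>
  <book>
    <name>XML Getting Started</name>
    <author>Zhang San</author>
    <price currency-unit="RMB">20.00</price>
  </book>
  <book>
    <name>XML Syntax</name>
    <!-- This book is coming soon -->
    <author>Li Si</author>
    <price currency-unit="RMB">18.00</price>
  </book>
</references>"""


def read_books(xml_text):
    """Parse the text and return plain Python data with the markup stripped."""
    # fromstring() raises ET.ParseError if the text has a structural error.
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        price = book.find("price")
        books.append({
            "name": book.findtext("name"),
            "author": book.findtext("author"),
            "price": float(price.text),
            "currency": price.get("currency-unit"),
        })
    return books


if __name__ == "__main__":
    for entry in read_books(DOC):
        print(entry)
```

The application that calls read_books never sees angle brackets or comments; it receives only the content, which is the division of labour between parser and application described above.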

Because HTML markup in practice is quite messy and full of nonstandard tags (some web pages display normally in IE but not in Netscape Navigator), the designers of XML defined its syntax and structure strictly from the very beginning. The XML files we write must follow these rules; otherwise the XML parser will show you error messages without mercy.

There are two types of XML files: Well-Formed XML files and Validating (valid) XML files.

If an XML file follows the syntax rules of the XML specification and does not use a DTD (Document Type Definition, described later), it is called Well-Formed. If an XML file is Well-Formed, uses a DTD correctly, and conforms to that DTD, whose own syntax is also correct, the file is Validating. Correspondingly, there are two types of XML parsers: Well-Formed parsers and Validating parsers. IE 5 contains a Validating parser, which can also be used to parse Well-Formed XML files.

To check whether a file meets the Well-Formed conditions, you can open the first XML file you just edited in IE 5 or a later browser.
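If no such browser is at hand, the same check can be sketched in a few lines of Python. The standard-library ElementTree parser used here is non-validating, so it tests only whether the text is Well-Formed. The function name check_well_formed and the sample snippets are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical snippets; every one except the first breaks a syntax rule.
SAMPLES = {
    "well-formed": "<book><name>XML Syntax</name></book>",
    "crossed tags": "<b><i>text</b></i>",
    "two root elements": "<book/><book/>",
    "unquoted attribute": "<price currency=RMB>18.00</price>",
    "case mismatch": "<name>XML Syntax</Name>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        ok, message = check_well_formed(text)
        status = "OK" if ok else "error: " + message
        print(f"{label}: {status}")
```

The parser stops at the first error it finds and reports where it is, so a broken file is usually repaired one message at a time.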

You may ask: why does the browser show exactly what is in my source file? That is right, because an XML file only carries the content; how it is displayed is left to CSS or XSL. We have not defined a CSS or XSL file for this XML file, so it is displayed in its original form. In fact, for electronic data exchange an XML file alone is enough. If you want to display it in some particular form, you must also write a CSS or XSL file (this will be discussed later).

2. Well-Formed XML files

We know that an XML file must be Well-Formed to be correctly parsed by the parser and displayed in the browser. So what is a Well-Formed XML file? It follows the principles below, which we must satisfy when creating XML files.

1. The XML file should begin with the XML declaration, which states that the file is an XML file and which version of the XML standard it uses. No other element or comment may come before the declaration.

2. There is exactly one root element in an XML file. In our first example, <references>...</references> is the root element of the XML file.

3. Tags in an XML file must be properly closed; that is, every start tag must have a corresponding end tag. For example, the <name> tag must have a corresponding </name> end tag. This is unlike HTML, where the end tag of some elements can be left out. A self-contained tag that has no content and no end tag is called an "empty element" in XML and must be written as <empty-element-name/>. If the element has an attribute, it is written as <empty-element-name attribute-name="attribute value"/>.

4. Tags must not cross. In an HTML file, you could write the following:

<B><H>XXXXXXX</B></H>

Here the <B> and <H> tags overlap. In XML, such interleaved tags are strictly prohibited; tags must be nested in regular order.

5. Attribute values must be enclosed in quotation marks, for example "1.0", "gb2312", and "RMB" in the first example. They are all enclosed in quotation marks, which cannot be left out.

6. Tags, instructions, and attribute names are case sensitive. This is unlike HTML, where tags such as <B> and <b> have the same meaning. In XML, <name>, <NAME>, and <Name> are three different tags.

7. In an HTML file, if we want the browser to display content exactly as entered, we can place it between <pre> and </pre> or between <xmp> and </xmp>. This is essential for HTML tutorial pages, which must show HTML source code on the page. To get the same effect in XML, use a CDATA section. The parser passes the text in a CDATA section to the application without interpreting any markup inside it. A CDATA section begins with the start mark "<![CDATA[" and ends with the end mark "]]>". In Example 2, everything except "<![CDATA[" and "]]>" is handed to the downstream application untouched, including any spaces and line breaks at the beginning and end of the CDATA section (note that CDATA must be written in uppercase).

Example 2:

<![CDATA[flying xml >>>, :-) oooo <<<<<]]>

8. XML handles whitespace differently from HTML. Under the HTML standard, any run of whitespace is treated as a single space. In XML, the parser must faithfully pass all whitespace outside the markup to the downstream application. As a result, we sometimes have to give up the indentation habits we have from writing HTML files, because the parser treats that whitespace as content too. For example:

<author>Zhang San</author>

and

<author>
    Zhang San
</author>

These two are different to the parser: besides the characters of the name, the second also contains two line breaks and the indentation before "Zhang San". After stripping the markup, the parser therefore passes different content to the application, which may process the two differently.

If we want to tell the XML application explicitly that the whitespace inside an element is meaningful and must not be removed (in some poems, for example, spaces carry a specific meaning), we can add the built-in XML attribute xml:space to the tag. For example (note the case of the attribute name and value):

<poetry xml:space="preserve">Motherland! Motherland! My Motherland!</poetry>

In addition, if you want to use the special characters in Table 1 in an XML file, you must write the corresponding replacement symbol instead.

Table 1:

Special character    Replacement symbol
&                    &amp;
<                    &lt;
>                    &gt;
"                    &quot;
'                    &apos;
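Replacing special characters by hand is error-prone, so in practice a library usually does it. The following is a minimal sketch using the escape and unescape helpers from Python's standard xml.sax.saxutils module. The wrapper names to_xml_text and from_xml_text and the <code> element are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() replaces &, < and > by default. The two quote characters are
# passed in explicitly so that all five special characters are covered.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(raw):
    """Replace the five special characters with their replacement symbols."""
    return escape(raw, QUOTES)


def from_xml_text(escaped):
    """Turn the replacement symbols back into the original characters."""
    return unescape(escaped, UNQUOTES)


if __name__ == "__main__":
    raw = 'if a < b && name == "Zhang San"'
    escaped = to_xml_text(raw)
    print(escaped)
    # A parser undoes the replacement when it hands the text to the application.
    element = ET.fromstring("<code>" + escaped + "</code>")
    print(element.text == raw)
```

Note that the ampersand is replaced first when escaping and restored last when unescaping; otherwise the "&" inside "&lt;" would itself be escaped a second time.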

Summary:

An XML file that meets the requirements above is a Well-Formed XML file. This is the most basic requirement for writing XML files. As you can see, XML syntax is much stricter than HTML syntax. Because the rules are so strict, it is much easier for software engineers to write an XML parser than an HTML parser, which has to go to great lengths to cope with all the different ways web pages are written in order to make the browser more tolerant. This is also a good thing for beginners: you no longer need to puzzle over the many variant ways of writing that HTML allows.
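Rules 7 and 8 above can be observed directly with a parser. The sketch below again uses Python's standard ElementTree module. The <sample> element and the variable names are assumptions for illustration, and the author element follows the whitespace example in rule 8.

```python
import xml.etree.ElementTree as ET

# Rule 7: markup-like characters inside a CDATA section arrive as plain text.
cdata_doc = "<sample><![CDATA[<b>not a tag</b> & not an entity]]></sample>"
cdata_text = ET.fromstring(cdata_doc).text

# Rule 8: the same name, written compactly and written with indentation.
compact = ET.fromstring("<author>Zhang San</author>").text
indented = ET.fromstring("<author>\n    Zhang San\n</author>").text

if __name__ == "__main__":
    print(repr(cdata_text))
    print(repr(compact))
    print(repr(indented))
    # The application must strip the whitespace itself if it does not want it.
    print(compact == indented.strip())
```

The parser hands over the line breaks and indentation unchanged, so deciding whether that whitespace matters is left to the application.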

We can see that XML files mostly use custom tags. However, suppose two companies A and B in the same industry want to use XML files to exchange data with each other. Company A uses the <price> tag for the price information of its products, while Company B may use a differently named tag for the same information. If an XML application reads both companies' files but only knows to look for price information in the <price> tag, it cannot obtain Company B's prices, and errors will occur. Obviously, parties that want to exchange information through XML files need a convention between them: which tags may be used in the XML files, which child elements a parent element may contain, the order in which the elements appear, and how the attributes of each element are defined. Only then can they exchange data in XML smoothly.

This convention is called a DTD (Document Type Definition). You can regard a DTD as a template for writing XML files. For XML data exchange within one industry, a fixed DTD makes things much easier. For example, if the XML web pages of the major online stores all followed the same DTD, we could easily write an application based on this DTD that goes online and automatically collects what we are interested in. In fact, several DTDs have already been defined, such as MathML and SMIL mentioned earlier.

If an XML file is Well-Formed and is correctly written according to a DTD, it is called a Validating XML file. The corresponding parser is called a Validating parser.
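The idea of checking a file against an agreed convention can be sketched as follows. This is not DTD validation: the parsers in Python's standard library are non-validating, so the sketch stands in for a Validating parser with a small hand-written check. The convention table REQUIRED_CHILDREN, the function follows_convention, and the <cost> tag are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention agreed between two companies: every <book> must
# contain exactly these child elements, in this order.
REQUIRED_CHILDREN = {"book": ["name", "author", "price"]}


def follows_convention(xml_text):
    """Return a list of problems; an empty list means the file conforms."""
    problems = []
    # fromstring() still raises ET.ParseError if the text is not well-formed.
    root = ET.fromstring(xml_text)
    for element in root.iter():
        expected = REQUIRED_CHILDREN.get(element.tag)
        if expected is None:
            continue  # no rule recorded for this element
        actual = [child.tag for child in element]
        if actual != expected:
            problems.append(
                f"<{element.tag}>: expected {expected}, found {actual}"
            )
    return problems


COMPANY_A = (
    "<references><book><name>XML Syntax</name>"
    "<author>Li Si</author><price>18.00</price></book></references>"
)
# Company B uses a differently named tag for the price.
COMPANY_B = (
    "<references><book><name>XML Syntax</name>"
    "<author>Li Si</author><cost>18.00</cost></book></references>"
)

if __name__ == "__main__":
    print(follows_convention(COMPANY_A))
    print(follows_convention(COMPANY_B))
```

Both sample files are Well-Formed, but only the first follows the convention. A real DTD expresses such rules declaratively, and a Validating parser enforces them without any hand-written checking code.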
