An Introduction to XML: Structure and Syntax

Now let's use Notepad to create our XML file. First, look at an example XML file:

Example 1

<?xml version="1.0" encoding="gb2312"?>
<resources>
<book>
<name>XML Introductory Refinement</name>
<author>John</author>
<price currency-unit="RMB">20.00</price>
</book>
<book>
<name>XML Grammar</name>
<!--The book is about to be published-->
<author>Dick</author>
<price currency-unit="RMB">18.00</price>
</book>
</resources>

This is a typical XML file, saved after editing with the .xml suffix. The file can be divided into two main parts: the prolog and the body. The first line of this file is the prolog, the XML declaration. If present, it must be the very first thing in the file, and it tells the XML parser how to process what follows. (The XML 1.0 specification makes the declaration optional, but including it is the usual practice and this tutorial treats it as required.) Within the declaration, version gives the version of the XML standard the file follows and must be specified. encoding names the character encoding used in the file and may be omitted; if it is omitted, the file must be encoded in Unicode (UTF-8 or UTF-16), and omitting it is not recommended. Because this example uses the GB2312 encoding, the encoding declaration cannot be omitted here. The prolog may contain other declarations as well, which are covered in later sections.
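
To see why the encoding declaration matters, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. It uses ISO-8859-1 as a stand-in for GB2312 (an assumption made for illustration, since not every parser supports GB2312), and the parses helper and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the parser accepts the raw bytes as an XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Sample content with one non-ASCII character (invented for illustration).
text = "<name>café</name>"

# Declared encoding matches the bytes, so the parser can decode them.
declared = ('<?xml version="1.0" encoding="iso-8859-1"?>' + text).encode("iso-8859-1")

# No declaration: the parser assumes UTF-8, and these bytes are not valid UTF-8.
undeclared = text.encode("iso-8859-1")

print(parses(declared))
print(parses(undeclared))
print(ET.fromstring(declared).text)
```

The same bytes are accepted or rejected depending only on whether the declaration names the encoding that was actually used to save the file.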

The rest of the file is the body, which holds the content of the XML file. The body is enclosed by the start tag <resources> and the end tag </resources>; this element is called the "root element" of the XML file. <book> is a "child element" of the root element, and each <book> in turn has the child elements <name>, <author> and <price>. currency-unit is an attribute of the <price> element, and "RMB" is its attribute value.
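
The root element, child elements and attribute described above can be walked with a few lines of Python. This is a sketch, not the only way to read the file: it assumes the element names are spelled resources, book, name, author and price, that the attribute is spelled currency-unit, and it leaves out the encoding declaration for simplicity.

```python
import xml.etree.ElementTree as ET

# The body of Example 1, with assumed English element and attribute names.
DOC = """<resources>
<book>
<name>XML Introductory Refinement</name>
<author>John</author>
<price currency-unit="RMB">20.00</price>
</book>
<book>
<name>XML Grammar</name>
<author>Dick</author>
<price currency-unit="RMB">18.00</price>
</book>
</resources>"""

root = ET.fromstring(DOC)      # the root element
books = []
for book in root:              # iterate over the child elements of the root
    price = book.find("price")
    books.append((
        book.findtext("name"),
        book.findtext("author"),
        price.text,                   # element content
        price.get("currency-unit"),   # attribute value
    ))

print(root.tag)
for row in books:
    print(row)
```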

As in HTML, <!--The book is about to be published--> is a comment. In an XML file, a comment is whatever is placed between "<!--" and "-->".
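
A parser treats a comment as a node of its own, separate from elements and text. The sketch below uses Python's xml.dom.minidom to show this on a shortened, single-line version of the second book entry (the shortening is an assumption made to keep the node list free of whitespace text nodes).

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><name>XML Grammar</name>"
    "<!--The book is about to be published-->"
    "<author>Dick</author></book>"
)

children = doc.documentElement.childNodes

# Comment nodes carry their text in .data; element nodes carry a tag name.
comments = [node.data for node in children
            if node.nodeType == node.COMMENT_NODE]
elements = [node.tagName for node in children
            if node.nodeType == node.ELEMENT_NODE]

print(comments)
print(elements)
```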

As you can see, an XML file is fairly simple. Like an HTML file, it is made up of a series of tags, but the tags in an XML file are ones we define ourselves, with clear meanings that describe the content they enclose.

Now that we have a first impression of an XML file, let's look at its syntax in detail. Before we do, we need to understand one important concept: the XML parser.

1. XML Parser

The main job of the parser is to check the XML file for structural errors, strip the tags from the file, and pass the correct content on to the next application. XML is a markup language for structuring information, and the XML specification lays down detailed rules for how a file is to be marked up; a parser is software written according to those rules (for example in Java). HTML works the same way: a browser must contain an HTML parser in order to "read" the many kinds of HTML pages and display them for us. If the browser's HTML parser cannot read a tag, it may report an error or display the page incorrectly.

Because HTML markup is in practice quite messy, with many nonstandard tags (some pages display correctly in IE but not in Netscape Navigator), the designers of XML defined its syntax and structure strictly from the outset. The XML files we write must follow these rules, or the XML parser will mercilessly report an error.

There are two kinds of XML files: well-formed XML files and valid XML files.

If an XML file satisfies the well-formedness rules in the XML specification but does not use a DTD (Document Type Definition, described in detail later), it is called a well-formed XML file. If an XML file is well-formed, uses a DTD correctly, and the syntax of the DTD itself is correct, then the file is valid. Corresponding to the two kinds of XML file there are two kinds of XML parser: the well-formed (non-validating) parser and the validating parser. IE 5 includes a validating parser, which can also be used to parse well-formed XML files.

To check whether a file satisfies the well-formedness conditions, we can open the first XML file we just edited in the IE 5 browser.

You may ask: why does the browser display the file just as it appears in my source? The reason is that an XML file carries only content; how it is displayed is determined by a CSS or XSL file. We have not defined a CSS or XSL file for this XML file, so it is shown in its original form. In fact, for electronic data interchange an XML file alone is enough; only if we want to present it in some particular form do we need to write a CSS or XSL file (this topic will be discussed later).

2. Well-Formed XML Files

We know that an XML file must be well-formed in order to be parsed correctly by the parser and displayed in the browser. So what is a well-formed XML file? There are several rules we must follow when creating one.

First, the first line of the XML file must declare that the file is an XML file and state the version of the XML specification it uses. No other elements or comments may appear before this declaration.

Second, an XML file has exactly one root element. In our first example, <resources>...</resources> is the root element of the XML file.

Third, the tags in an XML file must be properly closed; that is, every start tag must have a corresponding end tag. For example, a <name> tag must have a matching </name> end tag, unlike HTML, where the end tag of some elements is optional. A tag that stands alone with no end tag, similar to <img src=.....> in HTML, is called an "empty element" in XML and must be written as <empty-element-name/>. If the element has attributes, it is written as <empty-element-name attribute-name="attribute value"/>.

Fourth, tags must not cross. In HTML files you could previously write this:

<b><h>xxxxxxx</b></h>

Here the <b> and <h> tags overlap. In XML such interleaved markup is strictly forbidden; tags must be opened and closed in a properly nested order.

Fifth, attribute values must be enclosed in quotation marks. In the first example, "1.0", "gb2312" and "RMB" are all enclosed in double quotes, and the quotes cannot be omitted. (XML also accepts single quotes around attribute values.)

Sixth, tag names, processing instructions and attribute names are case-sensitive. In HTML, tags such as <B> and <b> are the same; in XML, tags such as <name>, <NAME> and <Name> are all different.
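
The rules above can be checked mechanically. The following Python sketch feeds one correct document and several deliberately broken ones to the standard library's non-validating parser. The is_well_formed helper and every sample string are invented for illustration; each broken sample violates one of the rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: one good document, then one violation per rule.
samples = {
    "good": '<?xml version="1.0"?><book><name>XML</name><cover src="c.png"/></book>',
    "declaration not first": '<!-- note --><?xml version="1.0"?><book/>',
    "two root elements": "<book/><book/>",
    "unclosed tag": "<book><name>XML</book>",
    "crossed tags": "<b><h>xxxxxxx</b></h>",
    "unquoted attribute": "<price unit=RMB>20.00</price>",
    "case mismatch": "<name>XML</Name>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; the parser rejects each of the others outright rather than guessing at what was meant.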

We know that in an HTML file, if we want the browser to display exactly what we have typed, we can put it between <pre></pre> or <xmp></xmp> tags. This is essential when writing web pages that teach HTML, because the HTML source code has to appear in the page. To achieve the same thing in XML you must use a CDATA section. The parser passes the content of a CDATA section to the application untouched and does not interpret any markup inside it. A CDATA section begins with "<![CDATA[" and ends with "]]>". In the source code of Example 2, for instance, everything except the "<![CDATA[" and "]]>" delimiters is handed to the downstream application exactly as written, including leading and trailing whitespace and line breaks inside the CDATA section (note that CDATA must be uppercase).
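
As a rough illustration of this behavior, the Python sketch below wraps a CDATA section in a hypothetical <note> element (the wrapper is an assumption, since a CDATA section has to sit inside some element) and compares it with the same characters written without CDATA.

```python
import xml.etree.ElementTree as ET

# Characters that would normally be read as markup, protected by CDATA.
with_cdata = "<note><![CDATA[flying xml>>>>>,:-)\nOooo<<<<<<<\n]]></note>"

# The same kind of content without CDATA: the bare "<" characters break parsing.
without_cdata = "<note>Oooo<<<<<<<</note>"

note = ET.fromstring(with_cdata)
print(repr(note.text))  # content arrives intact, line breaks included

try:
    ET.fromstring(without_cdata)
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```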

Example 2

<![CDATA[flying xml>>>>>,:-)
Oooo<<<<<<<
]]>

XML does not treat whitespace the way HTML does. The HTML standard says that any run of whitespace is treated as a single space, whereas an XML parser faithfully passes all whitespace outside of tags on to the downstream application. This means we sometimes have to give up habits from writing HTML, because the spaces used for indentation are also passed on by the parser. For example:

<author>John</author>

and

<author>
    John
</author>

The two are different to the parser: besides the characters "John", the latter <author></author> element also contains two line breaks and the whitespace that indents "John". So once the tags are removed, the parser passes different results to the application in the two cases.
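
The difference is easy to observe. In this Python sketch (the four-space indent is an assumed amount of indentation), the parser returns the text of each form exactly as written, and it is up to the application to strip the extra whitespace if it does not want it.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<author>John</author>")
spread = ET.fromstring("<author>\n    John\n</author>")

print(repr(compact.text))
print(repr(spread.text))   # line breaks and indentation are part of the text

# The application, not the parser, decides whether to discard the whitespace.
same_after_strip = spread.text.strip() == compact.text
print(same_after_strip)
```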

If we want to tell an XML application explicitly that the whitespace inside an element is meaningful and must not be discarded (in some poems, for instance, spaces carry specific meaning), we can add the built-in XML attribute xml:space to the tag, as follows (note the case of the attribute name and value):

<poetry xml:space="preserve">
Motherland Ah! Motherland!
My motherland!
</poetry>
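
The parser hands the whitespace over either way; xml:space is a signal that the receiving application can honor. The sketch below shows one way a Python application might do so. The text_for_display function is hypothetical, and the extra spaces in the poem are added for illustration. ElementTree reports the attribute under the reserved XML namespace, hence the long key.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the namespace bound to the "xml" prefix.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def text_for_display(element):
    """Hypothetical application rule: keep whitespace only when asked to."""
    if element.get(XML_SPACE) == "preserve":
        return element.text
    return " ".join(element.text.split())  # collapse runs of whitespace


poem = ET.fromstring(
    '<poetry xml:space="preserve">Motherland Ah!   Motherland!\n'
    "   My motherland!</poetry>"
)
plain = ET.fromstring("<poetry>Motherland Ah!   Motherland!\n   My motherland!</poetry>")

print(repr(text_for_display(poem)))
print(repr(text_for_display(plain)))
```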

In addition, if you want to use the special characters listed in Table 1 in an XML file, you must replace them with the corresponding substitution symbols.

Table 1

Special character    Substitution symbol
&                    &amp;
<                    &lt;
>                    &gt;
"                    &quot;
'                    &apos;
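
In practice these substitutions are usually made by a library rather than by hand. The following Python sketch uses escape from xml.sax.saxutils, which replaces &, < and > by default; the two quote characters are added through its optional second argument. The sample string and the <code> wrapper element are invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'if a < b && c > "d" then \'e\''

# escape() handles &, < and > itself; the quotes are supplied as extra entities.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# The parser turns the substitution symbols back into the original characters.
element = ET.fromstring("<code>" + escaped + "</code>")
print(element.text == raw)
```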

To summarize: an XML file that meets the requirements above is a well-formed XML file. This is the most basic requirement for writing XML files. As you can see, the syntax rules of XML are much stricter than those of HTML. Because the rules are so strict, it is much easier for software engineers to write an XML parser than an HTML parser, which has to cope with all manner of differently written web pages in order to make the browser more tolerant. This is actually good for us beginners too: there is one way to do each thing, and we no longer have to puzzle over the many variant ways of writing HTML.

As we have seen, most tags in an XML file are custom tags. But consider two companies in the same industry, A and B, that want to exchange data as XML files. Company A uses a "price" tag for the price of its products, while company B might use a different tag, say "cost", for the same information. If an XML application reading both companies' files only knows to look for price information in the "price" tag, it will fail to read company B's prices, and errors will result. Clearly, parties that want to exchange information in XML files need a shared convention: which tags may be used to write an XML file, which child elements a parent element may contain, the order in which elements appear, how the attributes of each element are defined, and so on. Only then can they exchange XML data without obstacles. This convention is called a DTD (Document Type Definition). You can think of a DTD as a template for writing XML files. For XML data interchange within an industry, a fixed DTD makes things much more convenient. For example, if the XML pages of the major online stores all followed the same DTD, we could easily write an application based on that DTD to collect automatically whatever interests us online. In fact, a number of well-defined DTDs already exist, such as the MathML and SMIL mentioned earlier.
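
The two-company problem can be reproduced in a few lines. In this Python sketch the read_price function, the <product> wrapper and the tag name cost are all hypothetical; the point is only that an application written against one company's tag names silently finds nothing in the other company's file.

```python
import xml.etree.ElementTree as ET


def read_price(document: str):
    """Return the text of the <price> child, or None when it is absent."""
    return ET.fromstring(document).findtext("price")


# Both files are well-formed, but they follow different tag conventions.
company_a = "<product><name>Widget</name><price>20.00</price></product>"
company_b = "<product><name>Widget</name><cost>20.00</cost></product>"

print(read_price(company_a))
print(read_price(company_b))   # nothing is found under the expected tag
```

Well-formedness alone cannot catch this mismatch, which is why a shared definition of the allowed tags is needed.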

If an XML file is well-formed and also conforms to a DTD, it is called a valid XML file. The corresponding parser is called a validating parser.


