What is XML

Source: Internet
Author: User
Tags: processing instruction, xml parser

XML is the abbreviation of Extensible Markup Language. XML is a simple data storage language that uses a series of simple tags to describe data. These tags are easy to create. Although XML takes up more space than binary data, it is extremely easy to learn and use.

Unlike databases such as Access, Oracle, and SQL Server, which provide powerful data storage and analysis capabilities such as indexing, sorting, searching, and consistency checks, XML only presents data. In fact, the biggest difference between XML and other data formats is that it is extremely simple. This may look like a minor advantage, but it is what sets XML apart.

The simplicity of XML makes it easy to read and write data in any application, which quickly made XML a common language for data exchange. Although applications also support other data exchange formats, most of them support XML as well. This means a program can more easily work with information generated on Windows, Mac OS, Linux, and other platforms: it can load XML data, analyze it, and output the results in XML format.
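The load, analyze, and output cycle described above can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree` module. This is only an illustration: the `<sales>` document, its element and attribute names, and the `summarize` helper are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: element and attribute names are invented for illustration.
SALES_XML = """<?xml version="1.0"?>
<sales>
    <sale region="north" amount="120"/>
    <sale region="south" amount="80"/>
    <sale region="north" amount="40"/>
</sales>"""


def summarize(xml_text):
    """Load XML, total the amounts per region, and return the result as XML."""
    root = ET.fromstring(xml_text)

    # Analyze: add up the amount attribute for each region.
    totals = {}
    for sale in root.findall("sale"):
        region = sale.get("region")
        totals[region] = totals.get(region, 0) + int(sale.get("amount"))

    # Output: build a new XML document holding the totals.
    summary = ET.Element("summary")
    for region in sorted(totals):
        ET.SubElement(summary, "total", region=region, amount=str(totals[region]))
    return ET.tostring(summary, encoding="unicode")


if __name__ == "__main__":
    print(summarize(SALES_XML))
```

Because both the input and the output are plain text, the same data could be produced or consumed by a program on any platform that has an XML parser.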

The predecessor of XML is SGML (Standard Generalized Markup Language), which grew out of GML (Generalized Markup Language), developed at IBM since the 1960s.

Like HTML, XML (Extensible Markup Language) descends from the Standard Generalized Markup Language (SGML). XML is a subset of SGML and a standard for describing the content and structure of data on the network. Even so, XML is not like HTML. HTML only provides a general method for displaying information on pages (without context-related or dynamic functions), whereas XML gives data context-related meaning. It inherits most of SGML's functionality but uses less complex techniques.

To make SGML more user-friendly, XML redefined some of SGML's internal values and parameters and removed a large number of rarely used features. Those complex features made SGML cumbersome for designing websites. XML retains the structuring capabilities of SGML, so website designers can define their own document types. XML also introduces a new kind of document, so that developers do not have to define a document type at all.

XML was developed by the W3C, and its standardization is carried out by the W3C XML working groups, made up of experts from various regions and industries who exchange their opinions on the XML standard by email (www.w3.org/TR/WD-xml). Because XML is a public format (it does not belong to any company), you don't have to worry that XML will become a tool for a few companies to profit from. XML is also not tied to any specific browser.

XML (Extensible Markup Language) is derived from an older language called SGML (Standard Generalized Markup Language). The main purpose of SGML is to define the syntax of markup languages that use tags to represent data.

A tag consists of text enclosed by a less-than sign (<) and a greater-than sign (>), for example, <tag>. The start tag marks the start of a specific region, for example, <start>; the end tag marks the end of a region and looks basically the same as the start tag, except for a slash (/) following the less-than sign, for example, </end>. SGML also defines attributes of tags, which are values defined between the less-than and greater-than signs, such as the src attribute of an <img> tag. If this looks familiar, it is because the most famous SGML-based language is the original HTML.

SGML is often used to define Document Type Definitions (DTDs) for HTML. It is also often used to write DTDs for XML. The problem with SGML is that it allows some odd syntax, which makes creating an HTML parser a big problem:

Some start tags must not have end tags, such as the <img> tag in HTML. Including an end tag is an error.

Some start tags may optionally have end tags, such as the <p> tag in HTML. When another <p> tag or certain other tags appear, an end tag is assumed to precede them.

Some start tags must have end tags, such as the <script> tag in HTML.

Tags can be nested in any order. Even if the end tags do not appear in the reverse order of the start tags, the markup is accepted; for example, <B> This is a <I> sample </B> string </I> is correct.

Some attributes must have values, such as the src attribute of the <img> tag.

Some attributes do not require values, such as the nowrap attribute in <td nowrap>.

Quotation marks around attribute values are optional, so both the quoted and the unquoted form of an attribute are allowed.

These problems make creating an SGML parser a daunting task. The difficulty of determining when to apply the rules above held back the definition of SGML-based languages. It was against this background that XML came into view.
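The loose rules listed above can be observed with Python's standard `html.parser` module. Note the assumption: this is a lenient HTML tokenizer, not a full SGML parser, and it is used here only as a stand-in to show what a parser has to tolerate. The `EventRecorder` class, the `record` helper, and the file name `picture.jpg` are invented for the example.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Records start/end tag events without checking nesting or completeness."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def record(markup):
    """Feed markup to the lenient parser and return the recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # End tags in the "wrong" order are accepted without complaint.
    print(record("<B>This is a <I>sample</B> string</I>"))
    # An attribute without a value is reported with the value None.
    print(record("<td nowrap>"))
    # Quoted and unquoted attribute values produce the same event.
    print(record('<img src="picture.jpg">') == record("<img src=picture.jpg>"))
```

None of these inputs raises an error, so any program consuming the events has to decide for itself what the author meant.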

XML removes the loose syntax of SGML, which used to give many developers headaches. In XML, the following syntax rules apply:

Every start tag must have an end tag.

A shorthand syntax can represent both the start and end tags in a single tag. It places a slash (/) before the greater-than sign, for example, <tag/>. The XML parser treats it as <tag></tag>.

Tags must be nested properly, so end tags must match start tags in mirror order, for example, <B> This is a <I> sample </I> string </B>. This is like treating start and end tags as opening and closing parentheses in mathematics: the outer parenthesis cannot be closed before all inner parentheses are closed.

All attributes must have values.

All attribute values must be enclosed in quotation marks (single or double).
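The rules above can be checked mechanically. The following sketch uses Python's standard `xml.etree.ElementTree` parser to test a handful of small samples; the helper name `is_well_formed` and the sample strings are chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Each sample maps to the result the rules above predict.
SAMPLES = {
    "<B>This is a <I>sample</I> string</B>": True,   # properly nested
    "<tag/>": True,                                   # shorthand for <tag></tag>
    "<p>no end tag": False,                           # missing end tag
    "<B>This is a <I>sample</B> string</I>": False,  # end tags out of order
    "<td nowrap></td>": False,                        # attribute without a value
    "<td align=left></td>": False,                    # attribute value not quoted
    "<td align='left'></td>": True,                   # single quotes are accepted too
}

if __name__ == "__main__":
    for text, expected in SAMPLES.items():
        print(f"{text!r}: {is_well_formed(text)} (expected {expected})")
```

The parser does not need to guess: a document either follows every rule or is rejected with an error.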

These rules make developing an XML parser much easier and remove the effort of working out when and where to apply SGML's odd syntax rules. In the first six years after XML emerged, a variety of languages were built on it, including MathML, SVG, RDF, RSS, SOAP, XSLT, and XSL-FO, and HTML was reworked into XHTML.

If you want to compare SGML and XML, see the W3C note: http://www.w3.org/TR/NOTE-sgml-xml.html

XML is currently one of the fastest growing technologies in the world. Its main purpose is to represent data in a structured manner using text. In some ways, XML files are similar to databases in that they provide a structured view of data. Here is an example of an XML file:
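A file of the shape discussed in this section might look like the following sketch, embedded in a Python string so that it can be parsed. Its structure (the prolog, the <books> element, a comment, <desc> elements with CDATA sections, and a processing instruction before the second book) follows the surrounding text; the titles, authors, and descriptions are placeholders, not an original listing.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: titles, authors and descriptions are placeholders.
BOOKS_XML = """<?xml version="1.0"?>
<books>
    <!-- begin the list of books -->
    <book>
        <title>First Sample Book</title>
        <author>Author One</author>
        <desc><![CDATA[Covers comparisons such as a < b and b > c.]]></desc>
    </book>
    <?page render multiple authors ?>
    <book>
        <title>Second Sample Book</title>
        <author>Author Two</author>
        <author>Author Three</author>
        <desc><![CDATA[A second placeholder description.]]></desc>
    </book>
</books>"""

# Parsing returns the document element; the comment and the processing
# instruction are skipped by ElementTree's default parser.
root = ET.fromstring(BOOKS_XML)
titles = [book.findtext("title") for book in root.findall("book")]

print(root.tag)
print(titles)
```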

An XML document typically starts with the XML prolog. The first line of the previous code is the XML prolog, <?xml version="1.0"?>. This line tells parsers and browsers that the file should be parsed according to the XML rules discussed earlier. The second line, <books>, is the document element, the outermost tag in the file (an element is regarded as the content between a start tag and an end tag). All other tags must be contained within this element for the file to be well-formed XML. The document element does not have to be on the second line of the file: if there are comments or other content, it can appear later.

The third line of the sample file is a comment, and you will notice that it has the same style as comments in HTML. This is one of the syntax elements that XML inherited from SGML.

The <desc> tag contains some special syntax. The <![CDATA[ ]]> markup indicates text that should not be parsed. It allows special characters such as greater-than and less-than signs to appear in the text without breaking the XML syntax. The text must appear between <![CDATA[ and ]]> to be handled properly. Such text is called a character data section, or CDATA section.
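The effect of a CDATA section can be seen in a small sketch using Python's standard `xml.etree.ElementTree` module. The sample strings and the `parses` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The same text written three ways (sample content is hypothetical).
with_cdata = "<desc><![CDATA[if a < b && b > c then <go>]]></desc>"
with_escapes = "<desc>if a &lt; b &amp;&amp; b &gt; c then &lt;go&gt;</desc>"
without_cdata = "<desc>if a < b && b > c then <go></desc>"


def parses(xml_text):
    """Return True if the text is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Inside CDATA the special characters are taken literally.
text_from_cdata = ET.fromstring(with_cdata).text
# Escaping every special character by hand yields the same text.
text_from_escapes = ET.fromstring(with_escapes).text

if __name__ == "__main__":
    print(text_from_cdata)
    print(text_from_cdata == text_from_escapes)
    print(parses(without_cdata))  # the raw characters break the syntax
```

A CDATA section is therefore a convenience: it carries the same text as the escaped form without requiring each special character to be rewritten.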

The following line appears before the definition of the second book:

<?page render multiple authors ?>

Although it looks like the XML prolog, it is actually a different kind of syntax called a processing instruction (PI). The purpose of a processing instruction is to provide additional information to programs that process the page (such as XML parsers). A PI usually has no fixed format; the only requirement is that the first question mark must be immediately followed by at least one letter. After that, a PI can contain any sequence of characters except the closing ?>.
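How a program might pick up a processing instruction can be sketched with Python's standard `xml.dom.minidom` module, which keeps PIs in the document tree. The sample document and the helper name `processing_instructions` are assumptions made for the example.

```python
from xml.dom import minidom, Node

# Hypothetical document containing the processing instruction shown above.
DOC = """<?xml version="1.0"?>
<books>
    <book><title>First Sample Book</title></book>
    <?page render multiple authors ?>
    <book><title>Second Sample Book</title></book>
</books>"""


def processing_instructions(xml_text):
    """Return (target, data) pairs for every processing instruction found."""
    found = []

    def walk(node):
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            # target is the name after "<?"; data is the rest of the text.
            found.append((node.target, node.data.strip()))
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(xml_text))
    return found


if __name__ == "__main__":
    print(processing_instructions(DOC))
```

The parser only splits the PI into a target and a data string; interpreting "render multiple authors" is left entirely to the application. The XML prolog on the first line is not reported as a processing instruction.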

The most common PI is used to specify the style sheet of an XML file:

This PI is usually placed directly after the XML prolog and is typically used by web browsers to display XML data with a particular style.
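A style sheet PI uses the target `xml-stylesheet`. The sketch below adds one to a document with Python's standard `xml.dom.minidom` module; the helper name `with_stylesheet` and the file name `books.css` are hypothetical.

```python
from xml.dom import minidom


def with_stylesheet(xml_text, href):
    """Return the document with an xml-stylesheet PI placed before the root element."""
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/css" href="{href}"'
    )
    # Insert the PI ahead of the document element, right after the prolog.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()


if __name__ == "__main__":
    # "books.css" is a placeholder file name.
    print(with_stylesheet("<books/>", "books.css"))
```

A browser that understands this PI would load the referenced style sheet and apply it when rendering the XML.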

If you are interested in XML and want to learn more about it and its applications, please refer to the "Basic Tutorial on XML and DOM", to be published by the People's Posts and Telecommunications Press.

References: http://www.kutime.com/web/wyzz/XML/CHAR
