Getting Started with XML technology



Editor's note: XML (Extensible Markup Language) is a standard published in February 1998 by the World Wide Web Consortium (W3C). Like HTML, it descends from SGML (Standard Generalized Markup Language), of which XML is a simplified subset. Because it combines the rich features of SGML with the ease of use that HTML brought to Web applications, it has quickly won the favor of software vendors and developers since its introduction and has shown strong vitality.

Because XML solves a problem that HTML cannot, namely expressing the content of data, it has been widely adopted in government, finance, securities, post and telecommunications, insurance, taxation, the judiciary, publishing, and electronic commerce.

This newspaper has published numerous articles on XML programming, and many readers already have some understanding of it. Because those articles were independent of one another, however, it has been difficult for readers to grasp XML technology as a whole. To help readers better understand and master XML, this newspaper and Microsoft (China) have specially invited Zhang of Beijing Posts and Telecommunications to jointly run a series of lectures on "XML Technology". Starting with this issue, the "Technical Lectures" column will cover the development and application of XML technology in detail over about eight installments.

XML vs. HTML: a comparison

The Internet provides global network interconnection and communication. Web technology has developed rapidly, and its rich information resources have brought great convenience to people's study and daily life. In particular, HTML (Hypertext Markup Language), which is simple, easy to learn, flexible, and general-purpose, has made publishing, retrieving, and exchanging information very easy, turning the Web into the world's largest repository of information. However, the rise of Web-based fields such as electronic commerce, electronic publishing, and distance education has made Web resources more complex and diverse, and the growing volume of data places higher demands on network transmission capacity. At the same time, people expect more of Web services: users need intelligent Web search and personalized services that display data in different ways according to different needs; companies need to create and distribute large numbers of valuable documents to customers at lower cost; and data in different formats on different platforms needs to be integrated and converted. These requirements are becoming increasingly widespread and urgent.

Traditional HTML cannot solve these problems effectively because of its own limitations. As a simple presentation language, it can only describe how content is displayed and cannot express what the data means, which is exactly what e-commerce and intelligent search engines require. In addition, HTML cannot describe special objects such as vector graphics, mathematical formulas, and chemical symbols, and its ability to describe how data is displayed is limited. Most importantly, HTML is only one application of SGML (Standard Generalized Markup Language): its tag set is fixed, so it extends poorly, and users cannot define meaningful tags of their own for others to use. All of this has become an obstacle to the further development of Web technology.

SGML is a general-purpose language for describing document structure. It provides an exceptionally powerful tool for defining markup and is extremely extensible, which makes it very useful for classifying and indexing data. However, SGML is too complex for everyday use on the network, it is costly to develop for, and mainstream browsers do not support it, all of which has hindered its spread on the Web. Under these circumstances, a language was needed that combines the power and extensibility of SGML with the simplicity of HTML. Thus XML was born.

XML (Extensible Markup Language) is a standard published by the W3C in February 1998. It is a simplified subset of SGML that combines the richness of SGML with the ease of use of HTML in Web applications. It defines data structures in an open, self-describing way: while describing content, it also makes the structure explicit, and so captures the relationships among the data. Data organized in this way is friendly and easy to work with for both applications and users.

Advantages and disadvantages of XML

One advantage of XML is that it allows organizations and individuals to define a set of tags that fits their own needs and to put it to use quickly. This makes XML suitable for information exchange in e-commerce, government documents, the judiciary, publishing, CAD/CAM, insurance, manufacturing, and intermediary organizations, and lets it provide independent solutions across different systems and vendors.

The greatest advantage of XML is that the format in which data is stored is not tied to the format in which it is displayed. In general, a document consists of three elements: data, structure, and presentation. In HTML, presentation is embedded in the data, so the author must always consider the output format when writing. If the same content has to be displayed in different styles, a new document must be created from scratch each time, which means a great deal of duplicated work. Furthermore, HTML lacks a description of the data's structure, which makes it hard for applications to understand the content of a document and extract semantic information from it.

XML separates the three elements of a document and handles each on its own. First, the display format is separated from the data content and stored in a style sheet, so changing how a document is displayed only requires modifying the style sheet file. Second, the self-describing nature of XML can represent many complex data relationships well, enabling XML-based applications to find the relevant data in an XML file accurately and efficiently while ignoring unrelated parts. XML has many other advantages: it facilitates the exchange of information between different systems, it can serve as a common language across the Internet, and it is expected to become a standard mechanism for data and document exchange.
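The separation of data from presentation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data and the two rendering functions (as_text and as_html) are hypothetical, and plain Python functions stand in for real style sheets such as XSL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the XML holds content and structure only.
DATA = (
    "<employees>"
    "<employee><name>Lars Peterson</name><salary>25000</salary></employee>"
    "<employee><name>Charlotte M. Cooper</name><salary>34500</salary></employee>"
    "</employees>"
)

def as_text(root):
    """Render the data as plain text, one employee per line."""
    return "\n".join(
        f"{e.findtext('name')}: {e.findtext('salary')}"
        for e in root.iter("employee")
    )

def as_html(root):
    """Render the same data as an HTML table."""
    rows = "".join(
        f"<tr><td>{e.findtext('name')}</td><td>{e.findtext('salary')}</td></tr>"
        for e in root.iter("employee")
    )
    return f"<table>{rows}</table>"

root = ET.fromstring(DATA)
text_view = as_text(root)
html_view = as_html(root)
```

Changing how the data looks means changing only a rendering function; the XML data itself is left untouched, which is the role a style sheet plays for an XML document.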

Of course, as a new set of standards, XML still has shortcomings. It emphasizes the structure of data, but its ability to express semantics is somewhat weak. For example, given a tag such as <address>, unless the document states what it contains, we cannot tell whether it holds a home address or an e-mail address. In addition, some XML technologies have not yet been standardized, very few applications fully support XML, and even browser support for XML is limited.

Therefore, XML cannot completely replace HTML; after all, HTML remains the most convenient and efficient way to publish information online. Moreover, HTML is a language that describes how data is displayed, whereas XML is a language that describes data and its structure, so the two serve different purposes.

The future of XML usage

In any case, Web applications will become more exciting as XML develops:

1. Business Automation Processing

XML's rich markup can fully describe different types of documents, such as letters of credit, insurance policies, claims, and invoices of all kinds. Data sent over the Web as structured XML documents can be encrypted and can easily carry a digital signature. XML is therefore expected to promote the large-scale application of EDI (Electronic Data Interchange) in electronic commerce. Interested readers can visit http://www.xmledi.org.

2. Information release

Publishing information plays an important role in the competitiveness of an enterprise. The server only needs to publish one XML file, and clients can choose or build different applications to process the data according to their own needs. With the help of XSL (Extensible Stylesheet Language), broad, general-purpose distributed computing becomes possible.

3. Intelligent Web applications and data integration

XML can express the real content of information more accurately. Its strict syntax lightens the load on applications and makes intelligent tools easier to develop. Data from different applications can also be converted into a common XML framework for exchange, transformation, and further processing.

The advantages of XML are plain to see, and its development is still gathering momentum: the Web of the future will be the Web of XML!

Development tools for XML

There are many tools available for developing XML:

Notepad: the simplest and most direct text editor, found under Accessories in Windows.

Microsoft XML Notepad: an editor from Microsoft designed specifically for XML documents; it can also check the validity of XML documents. Details and download: http://msdn.microsoft.com/xml/NOTEPAD/intro.asp.

Visual InterDev: software for developing Web applications; it handles not only XML but also ASP, HTML, XSL style sheets, and so on.

Microsoft XML Tree Viewer: displays the contents of an XML document as a tree structure: http://msdn.microsoft.com/xml/demos/default.asp.

Microsoft XML Validator: checks whether an XML document is well-formed and whether it is valid, and reports any errors. Download: http://msdn.microsoft.com/xml/demos/default.asp.

Microsoft XSL Debugger: style sheet files are complex, and developers easily make mistakes when writing them. This tool helps users debug style sheets by presenting the complicated, tedious debugging process through a visual interface. Download: http://msdn.microsoft.com/xml/_archive/xsl-debugger/xsl-debugger.htm.

WordPerfect: a word processor with advanced support for developing XML and SGML. It offers a WYSIWYG environment, wizards, automated control insertion, and automatic document generation. It is a paid commercial product that runs on Windows 95/98/2000 and Linux. Please visit http://www.corel.com/ for more information.

Sixpack: provides a concise interface for parsing and developing XML, runs on the Macintosh, and is open source. Please visit http://www.trafficstudio.com/sixpack/info.htm for details.

Xray: an XML editor with real-time error checking. It lets users create well-formed XML documents or validate documents against a DTD or XML Schema, and it supports editing multiple documents. It is free software for Windows 95/98/NT/2000. Please visit http://architag.com/xray/ for more information.

Document format for XML

The basic unit of content in an XML document is the element. Its syntax is as follows:

<tag>text content</tag>

Elements are made up of start tags, element content, and end tags. The user places the data object to be described between the start tag and the end tag. For example:

<name>Wang Ping</name>

No matter how long or complex the text content is, an XML element can also contain other nested elements, so that related information forms a hierarchy. In the following example, the <employees> element contains all the staff information; each employee is described by an <employee> element, within which the <name> and <salary> elements are nested.

Example 1:

<employees>
  <employee>
    <name>Lars Peterson</name>
    <salary>25000</salary>
  </employee>
  <employee>
    <name>Charlotte M. Cooper</name>
    <salary>34500</salary>
  </employee>
</employees>
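To see how a program can walk this hierarchy, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name read_employees is an illustrative assumption, not something from the original lecture; the XML is Example 1 embedded as a string.

```python
import xml.etree.ElementTree as ET

# Example 1 from the text, embedded as a string.
EMPLOYEES_XML = """
<employees>
  <employee>
    <name>Lars Peterson</name>
    <salary>25000</salary>
  </employee>
  <employee>
    <name>Charlotte M. Cooper</name>
    <salary>34500</salary>
  </employee>
</employees>
""".strip()

def read_employees(xml_text):
    """Return a list of (name, salary) tuples from an <employees> document."""
    root = ET.fromstring(xml_text)  # root is the <employees> element
    result = []
    for employee in root.findall("employee"):  # direct children only
        name = employee.findtext("name")
        salary = int(employee.findtext("salary"))
        result.append((name, salary))
    return result

employees = read_employees(EMPLOYEES_XML)
```

Because the nesting mirrors the structure of the data, the code only has to follow parent-to-child relationships to reach each value.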

In addition to elements, the objects that can legally appear in an XML document include processing instructions, comments, root elements, child elements, and attributes.

Processing instructions

A processing instruction gives the XML parser information so that it can interpret the contents of the document correctly. It begins with "<?" and ends with "?>". The familiar XML declaration is written in the same form:

<?xml version="1.0"?>

Constructs of this form can serve other purposes too, such as declaring whether a document is encoded in GB or Unicode, or associating a style sheet with an XML document for display.
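As a rough illustration, the sketch below uses Python's standard xml.dom.minidom module to list the processing instructions in a small document. The style sheet file name employees.xsl is a made-up placeholder; xml-stylesheet is the conventional target for attaching a style sheet.

```python
from xml.dom import minidom, Node

# A small document: XML declaration, one processing instruction, empty root.
# "employees.xsl" is a hypothetical file name used only for illustration.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="employees.xsl"?>'
    '<employees/>'
)

doc = minidom.parseString(DOC)

# Collect (target, data) for every processing instruction at document level.
# The XML declaration is consumed by the parser and is not listed as a node.
instructions = [
    (node.target, node.data)
    for node in doc.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]

root_name = doc.documentElement.tagName
```

Here the target names the application the instruction is meant for, and the data is passed through to that application uninterpreted.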

Comments

Comments are character data that serves as explanation in an XML file; the XML processor does nothing with them. A comment is delimited by "<!--" and "-->" and can appear anywhere between XML elements, but comments cannot be nested:

<!-- This is a comment -->

Root element and child elements

If an element begins right after the prolog and extends to the end of the file, containing all the data in the file, we call it the root element.

XML elements can be nested, and the nested elements are called child elements. In the previous example, <employee> is a child element of <employees>.

Attributes

An attribute gives further descriptive information about an element and must appear in the start tag. Attributes are written as name/value pairs: an attribute name cannot be repeated within one element, the name is separated from the value by an equals sign "=", and the value is enclosed in quotation marks. For example:

<salary currency="US $">25000</salary>

The attribute in the example above indicates that the salary is expressed in US dollars.
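A short sketch of how an application might read this attribute, again with Python's xml.etree.ElementTree. The attribute name "period" is a hypothetical one that is deliberately absent, to show what happens when an attribute is missing.

```python
import xml.etree.ElementTree as ET

# The attribute example from the text.
salary = ET.fromstring('<salary currency="US $">25000</salary>')

currency = salary.get("currency")   # value of the currency attribute
amount = int(salary.text)           # element content, converted to a number

# "period" is a hypothetical attribute that is not present;
# get() then returns the supplied default instead of raising an error.
period = salary.get("period", "unspecified")
```

The attribute describes the data (the currency) while the element content carries the data itself (the amount).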

Syntax for XML

The basic structure of an XML document consists of a prolog and a root element. The prolog includes the XML declaration and a DTD (or XML Schema). A DTD (Document Type Definition) or an XML Schema describes the structure of an XML document, that is, how its elements and attributes fit together.

For example, placing the following prolog in front of the document in Example 1 produces a complete XML document:

<?xml version="1.0"?>

<!DOCTYPE employees SYSTEM "EMPLOYEES.DTD">

An XML document has exactly one root element, and all other elements are its descendants; <employees> is the root element in Example 1.

An XML document must first of all be well-formed. The formal definition can be found at:

http://www.w3.org/TR/REC-xml

In addition to having exactly one root element, a well-formed XML document must satisfy the following rules:

Start and end tags must match: the end tag is mandatory;

Case must be consistent: XML is case-sensitive, so <employee> and <Employee> are two completely different tags, and an end tag must use the same case as its start tag;

Elements must be nested correctly: a child element must be completely contained in its parent element. The following example is nested incorrectly:

<A>
  <B>
</A>
</B>

The correct nesting is as follows:

<A>
  <B>
  </B>
</A>

Attribute values must be enclosed in quotation marks;

An attribute name must not appear more than once in the same element.
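These rules can be tried out with any XML parser, since a conforming parser rejects documents that are not well-formed. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One sample per rule listed above; only the first is well-formed.
checks = {
    "correct nesting": is_well_formed("<A><B></B></A>"),
    "wrong nesting": is_well_formed("<A><B></A></B>"),
    "missing end tag": is_well_formed("<A><B></A>"),
    "case mismatch": is_well_formed("<employee></Employee>"),
    "unquoted attribute": is_well_formed("<salary currency=US>25000</salary>"),
    "duplicate attribute": is_well_formed('<salary currency="US" currency="EU">1</salary>'),
    "two root elements": is_well_formed("<A></A><B></B>"),
}
```

Note that this checks well-formedness only; it says nothing about whether a document is valid against a DTD or schema.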

The validity of an XML document means that it conforms to a DTD or schema. A valid XML document must also be well-formed. We will elaborate on both in later installments.

Namespaces for XML

XML documents may define many elements or attributes that share a name but differ in meaning, and conflicts arise especially easily when different XML documents are merged. Namespaces were introduced to solve this problem. A namespace is the collection of names used for elements and attributes in an XML document, identified by a URI (Uniform Resource Identifier) that distinguishes it from other namespaces. For example:

<pr:payment xmlns:pr="http://www.microsoft.com/payroll">
  <pr:employee>Lars Peterson</pr:employee>
  <pr:description>reimburse expenses</pr:description>
  <pr:total>199.76</pr:total>
</pr:payment>

With namespaces, users can guarantee that the names used in a document are unique. Declaring an xmlns attribute on an element specifies a namespace for that element; the general form is xmlns:local_prefix="namespace_name", where namespace_name must be a valid URI.

If local_prefix (the local prefix) is omitted, a default namespace is declared:

<payment xmlns="http://www.microsoft.com/acct">
  <customer>1234</customer>
  <amount>500.00</amount>
  <date_received>12-03-2000</date_received>
</payment>

If a default namespace is declared on an element, that element and its unprefixed child elements automatically belong to the namespace, so there is no need to prefix each element one by one. Note that a default namespace does not apply to attributes: an attribute without a prefix belongs to no namespace.
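The effect of namespaces on a program that reads the document can be sketched with Python's xml.etree.ElementTree, which expands every element name to the form {namespace-URI}local-name. The two documents are the examples above; the lookup prefix "p" is an arbitrary choice made for this sketch and need not match the prefix used in the document.

```python
import xml.etree.ElementTree as ET

PAYROLL_NS = "http://www.microsoft.com/payroll"
ACCT_NS = "http://www.microsoft.com/acct"

# Prefixed namespace example from the text.
PAYMENT_XML = f"""<pr:payment xmlns:pr="{PAYROLL_NS}">
  <pr:employee>Lars Peterson</pr:employee>
  <pr:description>reimburse expenses</pr:description>
  <pr:total>199.76</pr:total>
</pr:payment>"""

# Default namespace example from the text.
ACCT_XML = f"""<payment xmlns="{ACCT_NS}">
  <customer>1234</customer>
  <amount>500.00</amount>
</payment>"""

payroll = ET.fromstring(PAYMENT_XML)
payroll_tag = payroll.tag  # the prefix is replaced by the full URI

# Look up a child through a prefix-to-URI map chosen by the caller.
total = float(payroll.findtext("p:total", namespaces={"p": PAYROLL_NS}))

acct = ET.fromstring(ACCT_XML)
acct_tag = acct.tag  # same local name "payment", different namespace

# An unqualified lookup finds nothing: <customer> lives in the default namespace.
unqualified = acct.find("customer")
qualified = acct.find(f"{{{ACCT_NS}}}customer")
```

Both root elements are named payment, yet the program can tell them apart because the namespace URI is part of each expanded name.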

At the beginning of this article, we gave an overview of XML as a new technology, analyzed its advantages and disadvantages, and looked ahead to its promising applications. The latter part briefly described the syntax and format of XML documents and introduced several useful development tools. In future installments we will explore XML technology in full depth and discover this wonderful world together!


