With the rapid growth and spread of the Internet, people anywhere in the world can go online from a computer and send and receive large amounts of up-to-date information. One prominent problem in this exchange is the variety of data formats, which gets in the way of using information effectively. In the information age, obtaining the information you need in the most convenient, reliable, and effective way is therefore a real difficulty. People have long hoped for a data format that can describe any logical relationship and so unify the storage of electronic data, ending the confusion caused by incompatible formats. At present, the format best able to take on this role is XML (Extensible Markup Language).
It can be said that the advent of XML has brought about a revolution in data exchange, that XML is the most capable markup language so far, and that XML is a cornerstone of the next generation of network development.
The background to the birth of XML
Introduction to SGML
To understand XML, we must first understand SGML (Standard Generalized Markup Language). SGML grew out of GML, a markup language for typesetting originally developed by IBM. After several years of development, the International Organization for Standardization (ISO) began discussing the proposal in 1984 and formally adopted SGML as an international standard (ISO 8879) in 1986.
Figure: XML as a unified format for electronic data interchange
SGML is a general-purpose markup language for describing document structure, used mainly to define the logical and physical structure of document models. An SGML document consists of three parts: a syntax definition, a document type definition (DTD), and a document instance. The syntax definition specifies the syntax used by the document type definition and the document instance; the document type definition specifies the structure of the document instance and the element types that make up that structure; the document instance is the main body of the SGML document.
In practice, each particular DTD defines one type of document. It is therefore customary to treat SGML combined with a particular DTD as a markup language in its own right, which makes SGML the metalanguage of those derived languages.
The emergence of HTML
In 1989 Tim Berners-Lee, an information specialist at CERN, the European particle physics laboratory, invented a hypertext linking language that made it easy to connect text or graphics in one file to other files; this was the predecessor of HTML. Berners-Lee wrote the first specification of the HTML language at CERN in 1991, and HTML later became the markup language specification maintained by the World Wide Web Consortium (W3C) for publishing information on the Internet. HTML (Hypertext Markup Language) is an application of SGML whose DTD is fixed as a standard. HTML therefore cannot be used as a metalanguage for defining other markup languages.
As part of the World Wide Web, HTML developed very quickly. In just a few years it went through several versions (HTML 1.0, HTML 2.0, HTML 3.0, HTML 4.0), and variants such as DHTML (Dynamic HTML), VHTML (Virtual HTML), and SHTML developed rapidly as well. HTML's simple, concise syntax makes it easy to learn and master, which brought Web pages within reach of ordinary people and drove the Internet to its present popularity.
However, HTML as it stands is not consistent: different browsers can render the same page differently. In addition, HTML's support for hyperlinks is limited, it cannot describe three-dimensional space, its handling of graphics, images, audio, video, and other multimedia is weak, its mixed text-and-graphics layout is rudimentary, and it cannot express the synchronization of multiple media. These shortcomings hold back large-scale applications of HTML and the processing of complex multimedia data.
The birth of XML
XML is a standard published by the World Wide Web Consortium (W3C) in February 1998. It is a simplified subset of SGML that combines the richness of SGML with the ease of use of HTML in Web applications and defines data structures in an open, self-describing way. While describing the content of the data, XML also makes the structure explicit, which shows the relationships between data items. Data organized this way is convenient to work with for both applications and users. The W3C later used XML to design a language functionally equivalent to HTML 4.01, called XHTML 1.0 (Extensible HyperText Markup Language), which is compatible with HTML.
XML is a subset of SGML, so strictly speaking an XML document is also an SGML document. Unlike HTML, whose DTD is fixed, XML allows users to supply their own DTD, so, like SGML, it can serve as a metalanguage for defining other document types or markup languages. If markup languages are divided into metalanguages and instance languages, SGML and XML are both metalanguages, while HTML and the XML-derived XHTML are instance languages.
It can therefore be said that the advent of XML allows many of the problems of HTML to be resolved.
The characteristics of XML
XML is used as a core technology in many systems, such as B2B, Web services, and .NET. It has become an indispensable term in IT conversation, and knowledge of XML is essential for computer engineers.
XML can be used as a unified format for electronic data interchange
For more than ten years, the formats used to store Web content have evolved from HTML to DHTML to XML, each step responding to the needs of Web application development. XML derives from SGML, a document description language designed for publishing, but it was also designed for exchanging data over the Internet. XML is therefore not only a document format in the SGML tradition; it also makes data exchange possible in many fields, such as e-commerce.
XML can be applied across fields because it describes data in a way other methods do not: control information is not expressed in a format specific to one application, but as markup that anyone can read. This makes XML well suited to serve as a data exchange standard, which is why it attracts so much attention.
Defining data relationships in XML yields a single shared standard, so industries of all kinds are establishing their own XML-based standards and applying them online to handle e-commerce, with back-end systems exposed through Web sites. XML can also be used for data storage: an XML file is a small database in which relationships and data attributes are defined, supporting data exchange, contextual retrieval, and multimedia transmission.
XML has characteristics of data description that other methods do not have
XML is a text-based file format. Because it is described in text, XML is suitable for data exchange across platform environments and can cross the barriers between platforms. Text can become unreadable when systems use different character encodings, but XML has a solution: a document can declare the encoding it uses, so the reader knows how to decode it.
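As a rough illustration of the encoding point, the sketch below parses the same element from two byte streams in different encodings, using Python's standard xml.etree.ElementTree module. The sample city name and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The same element serialized in two different byte encodings.
# Each document names its own encoding in the XML declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><city>Z\u00fcrich</city>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><city>Z\u00fcrich</city>'
).encode("utf-8")

# The parser reads the declaration and decodes the bytes accordingly,
# so both documents yield the same Unicode text.
city_from_latin1 = ET.fromstring(latin1_doc).text
city_from_utf8 = ET.fromstring(utf8_doc).text

print("same bytes:", latin1_doc == utf8_doc)
print("same text: ", city_from_latin1 == city_from_utf8)
```

The raw bytes differ, but the application sees identical text, because the declaration tells the parser how to decode each document.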
XML uses meaningful markup (tags). An XML file is made up of parts called elements, and tags are used to describe the elements. Because data is described with tags, its original structure can be preserved when it is exchanged over the Internet, which keeps data exchange between different systems flexible.
Where do these advantages come from? First, data is described with tags: a start tag and an end tag are specified, and what lies between them is the element content to be represented. This is how elements represent data.
Elements marked up with tags can also be used as child elements of other elements, so data can be nested.
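A minimal sketch of start tags, end tags, element content, and nesting, again using Python's standard xml.etree.ElementTree module. The tag names (book, title, author, and so on) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the tag names are invented for illustration.
document = """
<book>
  <title>XML Basics</title>
  <author>
    <name>A. Writer</name>
    <country>CH</country>
  </author>
</book>
"""

book = ET.fromstring(document)

# <title> ... </title>: start tag, element content, end tag.
title = book.find("title")
print(title.tag, "->", title.text)

# <author> holds child elements instead of plain text: nesting.
author = book.find("author")
child_tags = [child.tag for child in author]
print("children of author:", child_tags)
```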
The tag names and relationships in XML can be freely defined. Many articles describe XML as a "markup description language": the names and hierarchy of XML tags are defined by the user. In other words, following the XML syntax, a user can define a new markup language by defining a set of tags for a particular purpose. This is the source of the word "extensible" in XML's name.
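To make "freely defined" concrete, here is a small sketch that builds a document from a vocabulary invented on the spot. The recipe tags and attributes are purely hypothetical; nothing in XML predefines them.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary for a recipe; none of these tag names are
# predefined by XML, they exist only because we chose them.
recipe = ET.Element("recipe", attrib={"serves": "2"})
ET.SubElement(recipe, "name").text = "Tomato soup"
ingredients = ET.SubElement(recipe, "ingredients")
ET.SubElement(ingredients, "ingredient", attrib={"unit": "g"}).text = "500"
ET.SubElement(ingredients, "ingredient", attrib={"unit": "ml"}).text = "250"

xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)

# Any XML parser can read the result back without knowing the vocabulary.
parsed = ET.fromstring(xml_text)
print(parsed.tag, parsed.get("serves"), len(parsed.find("ingredients")))
```

The parser needs no knowledge of the vocabulary to read the document; only the application that interprets the data has to agree on what the tags mean.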
XML can thus be called "a language for defining languages", that is, a metalanguage. Because XML functions as a metalanguage, it can serve as the basis for data languages in many applications, such as e-commerce data, multimedia presentation data, and mathematical formulas. Examples of data description languages built on XML (also known as XML application languages) include: publishing and media: Open eBook (electronic books), NewsML (news media); science: MathML (mathematical expressions), CML (chemistry); e-commerce: cXML (e-commerce), FpML (finance); multimedia: SMIL (multimedia presentations), BML (satellite data transmission).
Moreover, as with natural languages, the more people speak a language, the more useful it is as a first means of communication between strangers. Likewise, it is valuable to standardize a set of tags within an enterprise or an industry, that is, to agree on a specific XML application language as the means of communication. In practice, most XML users adopt application languages that an industry or group has standardized or is standardizing; few create a new XML application language of their own.
The deficiencies of XML
XML is excellent as a data description language, but converting electronic data to XML is not always the most efficient choice. For example, XML's text representation and its markup make XML data larger than the same data in binary form, and this becomes a serious problem when the data volume is very large. XML should therefore be adopted according to specific needs, after weighing its advantages and disadvantages, in the fields where its strengths can be fully used.
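The size overhead can be seen with a small, admittedly artificial comparison. The sketch below stores the same 1000 integers once as packed 4-byte binary values and once as XML elements; the data and the tag names are assumptions chosen only for the demonstration.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical sensor readings: 1000 integers.
readings = list(range(1000))

# Binary form: each value packed as a 4-byte little-endian integer.
binary_data = struct.pack("<1000i", *readings)

# XML form: each value wrapped in its own element.
root = ET.Element("readings")
for value in readings:
    ET.SubElement(root, "reading").text = str(value)
xml_data = ET.tostring(root, encoding="utf-8")

print(len(binary_data), "bytes as packed binary")
print(len(xml_data), "bytes as XML")
```

The exact ratio depends on tag names and values, but the markup around every value makes the XML form several times larger here.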
Although XML is an excellent general-purpose data description language, it is not a programming language; ultimately it is a technique for describing data. Related technologies for displaying XML files, changing their structure, and operating on them from applications are therefore also very important.
Major categories of XML-related technologies
XML is a data format that describes content, and using XML data requires many related techniques, such as displaying and printing the data and changing its structure. If these related technologies are also standardized, applications on different platforms can be developed with the same approach, which makes development easier.
When working with XML data, always use an XML processor (that is, an XML parser) and leave the checking of the XML data structure to it.
There are many XML processor products, a good number of them free. The developer simply hands the checking of XML data to the XML processor and concentrates on the application itself, which reduces the workload.
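As a sketch of handing the checking to the parser, the helper below lets Python's bundled parser decide whether a document is well formed. The function name is_well_formed and the sample documents are made up for illustration; note that this checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as error:
        print("rejected by parser:", error)
        return False

good = "<order><item>pen</item></order>"
bad = "<order><item>pen</order>"   # <item> is never closed

good_ok = is_well_formed(good)
bad_ok = is_well_formed(bad)
print(good_ok, bad_ok)
```

The application code never inspects angle brackets itself; it only reacts to the parser's verdict.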
The application processes the XML tree delivered by the XML processor and provides services to the user. Whether the field is e-commerce or knowledge management, an application is a combination of techniques for defining XML data structures, for displaying and printing XML data, for transforming XML data structures, for integrating with XML databases, and for handling the XML tree through an API.
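One possible shape of "handling the XML tree through an API" is sketched below with xml.dom.minidom, the small DOM implementation in Python's standard library. The order document and its attribute names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical order document; names are invented for illustration.
document = (
    '<order id="42">'
    '<item qty="2">pen</item>'
    '<item qty="1">ink</item>'
    '</order>'
)

dom = minidom.parseString(document)   # the parser builds the tree
order = dom.documentElement

# The application works on the tree, not on the raw text.
order_id = order.getAttribute("id")
items = [
    (node.firstChild.data, int(node.getAttribute("qty")))
    for node in order.getElementsByTagName("item")
]
total_quantity = sum(qty for _, qty in items)
print(order_id, items, total_quantity)
```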
Techniques for defining XML data structures
In XML, the user is free to define tag names and the hierarchy of elements associated with them, which is the main feature of XML. However, if you define tags that only you can understand, you cannot exchange data with others. To exchange XML data among enterprise groups, the structure of the XML data, the names of the elements, their data types, and their parent-child relationships need to be carefully considered, and a language that both people and systems can understand must be designed. An XML data structure designed in this way is called a schema in the XML field, and a language for describing schemas is called a schema language.
The most common schema language for XML is the DTD (document type definition). The DTD is a schema language that was already in use with SGML long before XML, and when the XML syntax was developed in 1998, the DTD was carried over for describing schemas.
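For readers who have not seen one, the sketch below embeds a small DTD in a document as an internal subset. The order vocabulary is hypothetical. One caveat: Python's xml.etree.ElementTree reads past the DTD but does not validate against it, so this shows the DTD syntax rather than validation; enforcing the rules would require a validating parser, which is outside the standard library.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE section is the DTD: it says an <order> contains one
# <customer> followed by one or more <item> elements, and that every
# <item> must carry a qty attribute.
document = """<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order (customer, item+)>
  <!ELEMENT customer (#PCDATA)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item qty CDATA #REQUIRED>
]>
<order>
  <customer>ACME</customer>
  <item qty="2">pen</item>
</order>
"""

# ElementTree checks well-formedness only; it does not enforce the DTD.
order = ET.fromstring(document)
item_names = [item.text for item in order.findall("item")]
print(order.tag, item_names)
```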
In XML, a schema document has usually meant a DTD, and schemas described with DTDs are very widely used. However, as XML applications have developed, the DTD inherited from SGML has shown clear drawbacks. To solve these problems, the W3C developed a new schema language, XML Schema.
XML Schema has the following features that DTDs lack: multiple schemas can be combined using XML namespaces; schemas are written in XML syntax; and the data types of element content and attribute values can be defined in detail.
Because the data structure in an XML schema is itself expressed as XML data, a schema is much larger than the equivalent DTD. However, XML Schema is far more expressive than the DTD: it supports not only strings but also data types such as decimal numbers, floating-point numbers, and dates, and it can specify that an element occurs at least m times and at most n times (m and n being integers). For example, a format constraint on a postal code element, such as "postcode xxxxxx" (where each x is a character), is something a DTD cannot express. XML Schema became a W3C Recommendation in May 2001, which has had an important influence on the use of XML.
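The postcode constraint might look like the schema fragment in the sketch below, which assumes a six-digit postcode. Python's standard library has no XML Schema validator, so the sketch is not a validator: it reads the pattern out of the schema with ElementTree and applies it with the re module, which happens to work for this simple pattern (XML Schema regular expressions differ from Python's in general).

```python
import re
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Assumed schema fragment: a postcode is exactly six digits.
schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="postcode">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{6}"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

# Because the schema is itself XML, an ordinary XML parser can read it.
schema = ET.fromstring(schema_text)
pattern = schema.find(f".//{XSD_NS}pattern").get("value")

def postcode_is_valid(text):
    # XSD patterns must match the whole value, hence fullmatch.
    return re.fullmatch(pattern, text) is not None

print(postcode_is_valid("100080"), postcode_is_valid("10008A"))
```

That the schema can be read with the same tools as the data is a direct consequence of schemas being written in XML syntax.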
Techniques for displaying and printing XML data
There are three main ways to define layout information for printing and displaying XML data: defining it with CSS; using XSLT to transform the data into HTML for display and printing; and using XSLT to convert the data into XSL formatting objects (XSL-FO) for display and printing.
A file that specifies layout information is called a style sheet, and the language for describing HTML style sheets is CSS (Cascading Style Sheets). CSS can be used to browse and print XML data: just as it specifies layout information for each HTML element, it can define the display layout of XML elements. CSS cannot change the structure of XML, so it is suitable only when layout information is defined in a simple way.
There is also a method that uses XSLT (Extensible Stylesheet Language Transformations), a language for changing the structure of XML data. With XSLT you can change XML element names, attribute names, the element hierarchy, and so on. Following the XSLT specification, XML elements can be converted into HTML elements, which can then be viewed in a browser. At present this is the most common way of displaying XML data in a browser.
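Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It is a hand-written Python equivalent of the kind of XML-to-HTML conversion a simple style sheet would express, with a made-up catalog vocabulary, meant only to show what "converting XML elements to HTML elements" amounts to.

```python
import xml.etree.ElementTree as ET

# Hypothetical source data.
source = ET.fromstring(
    "<catalog>"
    "<book><title>XML Basics</title><price>20</price></book>"
    "<book><title>Learning CSS</title><price>15</price></book>"
    "</catalog>"
)

# Build an HTML <table>: one <tr> per <book>, one <td> per field.
table = ET.Element("table")
for book in source.findall("book"):
    row = ET.SubElement(table, "tr")
    for field in ("title", "price"):
        ET.SubElement(row, "td").text = book.findtext(field)

html = ET.tostring(table, encoding="unicode", method="html")
print(html)
```

In real XSLT the same mapping would be written declaratively as templates in a style sheet and executed by an XSLT processor rather than coded by hand.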
XSL (Extensible Stylesheet Language) style sheets are themselves written in XML. XSL is a specification that can describe layout in detail, at the level of commercial typography. Specifying layout information in XSL takes two steps:
First, the original XML tree to be printed or displayed is transformed, and a new tree (the XSL-FO tree) is generated with the layout information added. Second, the new tree is passed to a print or display engine that supports XSL-FO, which produces the output.
XML data structure transformation techniques
In XML applications, some XML data needs to be converted into XML data with a different structure. XSLT was developed for this purpose as a standardized language for describing transformation rules for XML data structures.
XSLT was originally part of XSL, the specification for displaying and printing XML, and was later separated from it into an independent specification. Because it began as part of a language for describing layout information, a program written in XSLT is called a style sheet. However, XSLT can also be used for many purposes other than layout.
For example, when XML data is exchanged between enterprise groups that use different schemas (sets of tags), the exchange becomes possible only after each company's own data format is converted to and from an industry-standard format that both systems use. The transformation rules are written in XSLT and carried out by an execution engine for XSLT style sheets (an XSLT processor). Crossing industry boundaries often requires exchanging data in yet other formats; if XSLT is used for the structural transformation, only the style sheet has to be changed, not the program.
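The idea of changing only the rules and not the program can be sketched without an XSLT processor. The code below is a plain Python stand-in, not XSLT: the rename rules live in a dictionary that plays the role of the style sheet, and a small generic function plays the role of the processor. All tag names and the transform function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Transformation rules kept as data, playing the role a style sheet
# plays for an XSLT processor. All names here are hypothetical.
RENAME_RULES = {
    "po": "order",
    "buyer": "customer",
    "part": "item",
}

def transform(element, rules):
    """Return a copy of element with tag names renamed per rules."""
    renamed = ET.Element(rules.get(element.tag, element.tag), element.attrib)
    renamed.text = element.text
    renamed.tail = element.tail
    for child in element:
        renamed.append(transform(child, rules))
    return renamed

# A document in the company's own format ...
company_doc = ET.fromstring(
    "<po><buyer>ACME</buyer><part qty='2'>pen</part></po>"
)
# ... converted to the assumed industry-standard vocabulary.
industry_doc = transform(company_doc, RENAME_RULES)
result = ET.tostring(industry_doc, encoding="unicode")
print(result)
```

Supporting another partner's format means supplying a different rules dictionary while transform stays unchanged, which mirrors swapping the style sheet while keeping the XSLT processor.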
As XML becomes more widespread, XSLT will be used more widely. On Windows, you can create and run XSLT with IE 5.x and any text editor after setting up a simple environment. Knowing even a little about writing XSLT style sheets makes XML processing much easier.
As computer and network technology continues to develop, the applications of XML will keep expanding. XML is needed not only for traditional data exchange between banks; the need is even more pressing in areas such as securities companies' use of listed-company data, book search and retrieval, and enterprise document management. It will also develop further in e-commerce, search engines, automatic translation, and document-to-speech software. In particular, with the recent growth of mobile network services, format conversion will allow all kinds of information to reach PDAs and even mobile phones, so that in the future network-based format conversion services will let small handsets read the rich information of the world.