XML (Extensible Markup Language) is a markup language that is a subset of SGML.
It is a meta-markup language.
Compared with HTML, it lets you describe not only the appearance of a document but also its content and structure.
It is a text format in itself.
It has two main applications: one is to represent underlying data, such as a configuration file, and the other is to add metadata to a document.
How does the computer access data?
Data files in a computer fall into two main categories: binary and text files.
Binary files can only be read and created by certain programs, because a binary file is just a bit stream, and only the application that created it understands what the bits mean.
For example, a document created by Word with a .doc extension is a binary file. All the information in the document, including its metadata, is represented in binary code.
A document produced by one word processor cannot necessarily be read by another, because the developers of each program define their own data file format (although most word processors now have automatic conversion capabilities).
Binary files have several advantages: they are easy for computers to interpret, fast to process, and efficient to store. The drawbacks: an application often cannot open a binary file generated by a different application (this applies to more than word processors), and even the same application may have compatibility problems on different platforms.
Text files are also bit streams, but they are organized in a standard format (of course, there are many such standards): the bits are always grouped into numbers, and each number is then mapped to a character. The advantage of this kind of file is that it is easier to share, which helped the Internet become popular. The drawbacks are that a text file cannot store metadata and that it takes up more space.
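As a small illustration of bits being grouped into numbers and numbers being mapped to characters, here is a sketch in Python. The sample string and the choice of UTF-8 are assumptions made only for this example.

```python
# A short piece of text, encoded to bytes using UTF-8 (an assumed encoding).
text = "XML"
raw = text.encode("utf-8")

# Each byte is a number; in this sample every character fits in one byte.
numbers = list(raw)

# The same numbers written out as groups of eight bits.
bits = [format(n, "08b") for n in raw]

# Mapping the numbers back to characters recovers the original text.
decoded = "".join(chr(n) for n in numbers)
```

Here `numbers` holds one value per character, and `decoded` is the original string again.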
The way characters are represented in the underlying data stream is called the encoding of the file. The encoding can often be learned from the first few bytes of a file, so an application examines these bytes when it opens the file and uses them to decide how to display and manipulate the data. If those first bytes give no indication, the application interprets the file using its default character set.
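One common form of these leading bytes is the byte order mark (BOM). The sketch below shows roughly how an application might check for one and fall back to a default. The names `BOMS` and `guess_encoding` and the default of UTF-8 are hypothetical choices for this example, not part of any standard API.

```python
import codecs

# Byte order marks that some files carry in their first bytes.
# The UTF-32 marks come first because the UTF-32 LE mark begins
# with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, else the default."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No recognizable mark: fall back to the default character set.
    return default
```

For example, `data.decode(guess_encoding(data))` decodes a byte string with whichever codec the first bytes suggest. Real applications usually combine this with other hints, such as the encoding named in an XML declaration.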
For example, when a Word document is converted to text format, it is displayed as plain text only, without any of its additional formatting features.
SGML (Standard Generalized Markup Language) was created to combine the universality of text files with the storage efficiency and rich formats of binary files. It is a text-based language that marks up data in a self-describing way.
HTML is based on SGML and can be used to display information and to link different pieces of information together. Its limitation is that it is meant only for displaying documents in a browser. From an HTML document we cannot infer what the displayed content actually represents: HTML presents information but does not describe it. The tags simply tell the browser how to display the content between them; they say nothing about what that content is.
XML is also based on SGML and is fully compatible with it. It is aimed at structuring data, so that developers can access the data based on its structure.
Programmers have long structured their data in different ways, and each new way of structuring data needs a new way of reading it; these new reading methods then have to go through a lot of experimentation and testing before they can be trusted. To simplify development, XML provides a standard way to read data without worrying about how the data was built.
XML parser:
Why can we get at the data so easily with XML? Because there are programs called parsers that understand XML syntax and read the information for us. Thanks to the parser, the application does not have to process the XML text directly; it simply hands the document to the XML parser.
When parsing XML, the parser does not need to know where the data sits in the file; it only needs the tags to find and handle the data they enclose.
Just as any HTML document can be displayed on any Web browser, any XML document can also be read by any XML parser.
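As a minimal sketch of handing a document to a parser, the example below uses Python's standard `xml.etree.ElementTree` module. The configuration document and its element names (`config`, `server`, `host`, `port`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up configuration document; element and attribute names are illustrative.
CONFIG_XML = """<config>
    <server name="main">
        <host>example.org</host>
        <port>8080</port>
    </server>
</config>"""

# The parser turns the text into a tree of elements.
root = ET.fromstring(CONFIG_XML)

# Data is reached by tag name, not by its position in the file.
server = root.find("server")
host = server.findtext("host")
port = int(server.findtext("port"))
name = server.get("name")
```

The application never scans the raw text itself: it asks the parsed tree for the `server` element and reads values from it by tag and attribute name.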
About "extensible":
Because we have complete control over how XML documents are created, we can organize the data in any way we like, so that the XML has a specific meaning for a particular application. "Extensible" means that anyone can tag data in any way with XML. HTML is different: we cannot add to its vocabulary and must use the tags defined in the HTML specification.
Because XML is so flexible, we can build our own vocabulary during development; but if we use a common format, our software is more likely to be directly compatible with other software.
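To make the idea of a self-defined vocabulary concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The tag names (`catalogue`, `book`, `title`, `price`) and all values are invented for this example and do not come from any standard.

```python
import xml.etree.ElementTree as ET

# Build a document using a made-up vocabulary for a book catalogue.
catalogue = ET.Element("catalogue")
book = ET.SubElement(catalogue, "book", {"isbn": "000-0-00-000000-0"})
ET.SubElement(book, "title").text = "An Example Title"
ET.SubElement(book, "price", {"currency": "USD"}).text = "19.99"

# Serialize the tree to XML text.
xml_text = ET.tostring(catalogue, encoding="unicode")

# A parser can read the text back even though the vocabulary is our own.
parsed = ET.fromstring(xml_text)
titles = [t.text for t in parsed.iter("title")]
```

The parser can read the structure back, but only an application that knows this vocabulary can tell what a `price` element means, which is why shared formats improve compatibility.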
Overview of HTML and XML:
HTML is used for presenting information, while XML is used for exchanging it. Each pays a price for what it does. HTML can be displayed in almost all Web browsers, but this universality comes at the expense of layout precision and finer control over how the document is displayed. Similarly, XML developers give up proprietary formats that could make files smaller, in exchange for a more general and flexible data format.
HTML is designed for one specific purpose, presenting information to people through a Web browser, whereas XML is general-purpose.
If a Web page displays correctly in one browser but not in another, in most cases the cause is nonstandard HTML tags on the page.
Any XML parser can read the information in any XML document, but reading the information does not mean that the application can understand the real meaning of the information.
The composition of XML:
No single specification can cover everything about how information is organized in XML; for this reason, several related specifications and recommendations together make up XML technology as a whole.
They are:
- XML 1.0 is the base recommendation on which the rest of the XML family of standards is built. It specifies the syntax that XML documents must obey, the rules that parsers must follow, and so on. It also defines the document type definition (DTD).
- Since we can create our own document structures and element names, DTDs and schemas give us a way to define a document type.
- Namespaces distinguish one XML vocabulary from another. With namespaces, we can combine multiple vocabularies in a single document type.
- XPath is a query language for addressing a subset of the content of an XML document. It lets an application pick out just the part of a document it needs rather than working through the entire document.
- To display the content of an XML document, we can use CSS and XSL.
- An XML parser cannot necessarily read an HTML document, because XML has a stricter syntax. For this reason XHTML was created: it is an XML version of HTML.
- The XQuery recommendation lets us query data directly from XML documents on the Web.
- The role of the DOM is to give applications a standard way to access XML documents.
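Several items in this list can be tried out with Python's standard library. The sketch below is only an illustration: the document, the namespace URIs under `example.org`, and the prefixes `bk` and `ppl` are all made up. Note that `xml.etree.ElementTree` supports only a limited subset of XPath, and `xml.dom.minidom` is a minimal DOM implementation.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Two made-up vocabularies both use the element name "title";
# namespaces keep them apart within one document.
DOC = """<library xmlns:bk="http://example.org/books"
         xmlns:ppl="http://example.org/people">
    <bk:book>
        <bk:title>An Example Book</bk:title>
        <ppl:author>
            <ppl:title>Dr.</ppl:title>
            <ppl:name>A. Writer</ppl:name>
        </ppl:author>
    </bk:book>
</library>"""

# Prefix-to-URI mapping used when writing path expressions.
NS = {"bk": "http://example.org/books", "ppl": "http://example.org/people"}

# Path expressions address just the parts of the tree we want.
root = ET.fromstring(DOC)
book_title = root.findtext("bk:book/bk:title", namespaces=NS)
honorific = root.findtext(".//ppl:title", namespaces=NS)

# The DOM exposes the same document as a tree of node objects.
dom = minidom.parseString(DOC)
names = dom.getElementsByTagNameNS("http://example.org/people", "name")
author_name = names[0].firstChild.data
```

The two `title` elements are distinguished by namespace rather than by name, and the path expressions return only the requested parts of the document.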
These are study notes on the opening chapter of "XML Getting Started Classic".