Historical evolution of HTML, XHTML, and HTML5 (translation)

Last Update:2018-08-25 Source: Internet

Author: User

Translated from: http://www.thymeleaf.org/doc/articles/fromhtmltohtmlviahtml.html

From HTML to HTML (via HTML)

When you use software such as Thymeleaf, it is important to understand the internals of the HTML family of web standards. At least if you want to understand what you are doing.

The problem is that many people know the technologies they use to create websites, but they don't really know where those technologies came from. The web has come a long way since the first web interfaces were created, and every new technology since then has changed the way we develop for the web by deprecating a good part of our work and, especially, our knowledge.

Now, with the arrival of HTML5, things have become even more complicated. What is it? Why is it HTML instead of XHTML? Wasn't HTML tag soup considered harmful?

Let's take a step back and see how we got to where we are now, and why.

1 Back in the 90s, there was HTML...

In the 90s, HTML was a standard (or, more correctly, a recommendation) maintained by the World Wide Web Consortium (W3C). Derived from a language called SGML, HTML defined a tag-based markup language for writing hypertext documents, closely coupled with the protocol used to serve those documents and their related resources over the network: the Hypertext Transfer Protocol (HTTP).

HTTP uses text headers to define what is being served to clients and how, and one of them is extremely important: the Content-Type header. This header tells the browser what type of content it is being served, using a vocabulary known as MIME (Multipurpose Internet Mail Extensions). The MIME type for HTML documents is text/html:

Content-Type: text/html
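
On the server side, setting this header might look like the following minimal sketch. It assumes a Python WSGI application; the function name `app` and the page body are made up for illustration and are not part of the original article.

```python
def app(environ, start_response):
    """Hypothetical WSGI application that serves one HTML page."""
    body = b"<html><body><p>Hello</p></body></html>"
    headers = [
        # Tells the browser to treat the response body as HTML.
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

An application like this can be run locally with the standard library's wsgiref.simple_server module.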

HTML also defined a way to check whether a document was valid. Being valid basically meant that the document was written according to the HTML rules, which specify which attributes a tag can have, where a tag can appear in the document, and so on.

These validity rules were specified in a language for defining the structure of SGML documents, called Document Type Definition or DTD. A standard DTD was created for each version of HTML, and an HTML document had to declare the DTD (and therefore the HTML version) it conformed to, by means of a clause that should appear on its first line: the Document Type Declaration, or DOCTYPE clause:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
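
To see what such a clause carries, here is a small sketch that reads the DOCTYPE out of a document with Python's standard html.parser module. The class and function names are invented for this example, and the sketch only extracts the declaration; it does not validate the document against the DTD.

```python
import re
from html.parser import HTMLParser


class DoctypeReader(HTMLParser):
    """Remembers the first DOCTYPE declaration seen in a document."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # Called with the text between "<!" and ">".
        if self.doctype is None and decl.upper().startswith("DOCTYPE"):
            self.doctype = decl


def read_doctype(document):
    """Return (root name, quoted identifiers), or None if no DOCTYPE."""
    reader = DoctypeReader()
    reader.feed(document)
    if reader.doctype is None:
        return None
    parts = reader.doctype.split()
    root_name = parts[1] if len(parts) > 1 else ""
    # The public identifier and the DTD URL are the quoted strings.
    identifiers = re.findall(r'"([^"]*)"', reader.doctype)
    return root_name, identifiers
```

For the HTML 4.01 Transitional clause above, the two identifiers are the public name of the DTD and the URL where the DTD file can be found.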

2 The Document Object Model and tag soup

HTML was meant to display documents in browsers, and in the late 90s browsers were developed by highly competitive vendors who wanted to offer their users as many cool features as possible. Since HTML defined only the rules for document formatting, many other features were left to the imagination of browser developers.

One of the most interesting ideas to appear in browsers was client-side interactivity. This interactivity was achieved by executing scripts inside the browser, written in languages such as JavaScript, and giving those scripts the ability to handle, modify, and even fire events on parts of the document being displayed. To do this, browsers had to model HTML documents as trees of in-memory objects, each with its own state and events, and so the Document Object Model (DOM) was born.
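
The idea of a document as a tree of in-memory objects can be sketched outside a browser too. The example below uses Python's xml.dom.minidom, which implements a subset of the DOM API. The sample document and the `outline` helper are made up for illustration, and a browser DOM offers much more (events in particular).

```python
from xml.dom.minidom import parseString

# A small, well-formed document (made up for this example).
document = parseString("<html><body><p id='greeting'>Hello</p></body></html>")


def outline(node, depth=0):
    """List element names, indented by their depth in the tree."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        depth += 1
    for child in node.childNodes:
        lines.extend(outline(child, depth))
    return lines


# A script can reach into the tree and change it in place.
paragraph = document.getElementsByTagName("p")[0]
paragraph.firstChild.data = "Goodbye"
paragraph.setAttribute("class", "changed")
```

Calling `outline(document)` lists html, body, and p, each nested one level deeper than its parent, which is the strict hierarchy the DOM requires.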

The problem was that the HTML rules for well-formedness were quite loose, whereas DOM trees are strictly hierarchical structures. This meant that different interpretations of the position and sequence of HTML tags could lead to different DOM trees in different browsers. Add to this the fact that different browsers modeled the API of DOM nodes in different ways (different names, events, etc.), and you start to understand how difficult it was to create cross-browser interactivity at that time.

What's more, while all this was happening, browsers had become very lenient with HTML authors, allowing them to write malformed HTML documents (tag soup) by automatically correcting their errors. This led HTML authors to create even more badly formed documents, and browsers to tolerate even more format errors, in a vicious circle. And as you can guess, each browser corrected all of these errors in a different way. Too bad.
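
A lenient parser makes the ambiguity easy to see. The sketch below feeds a piece of tag soup to Python's html.parser, which reports tags in the order they appear and raises no error. The class name, helper, and sample markup are assumptions made for this example. How the overlapping tags should be nested in a tree is left to whoever consumes these events, which is exactly the decision each browser made differently.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records tags in the order they appear, without building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


def tag_events(markup):
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


# <b> is closed while <i> is still open, and <p> is never closed.
soup = "<p><b>bold <i>both</b> italic?</i>"
```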

The W3C eventually standardized the DOM API, and the scripting language for web browsers, JavaScript, was standardized as well (though, for some complicated reasons, under the name ECMAScript). However, the damage caused by the tag soup world, added to the fact that browser vendors were quite slow to implement these standards completely (in many cases for fear of breaking backwards compatibility), had an impact that still affects how we create web applications today.

3 Enter XML

Some time after HTML became a widespread language, the W3C developed a new specification called XML (Extensible Markup Language), designed to represent generic data (not just web pages) in the form of hierarchical markup text.

XML is extensible because it allows you to define purpose-specific languages (tags and their attributes) to meet the needs of particular scenarios. But HTML documents were not well-formed from the XML point of view, so XML and HTML remained incompatible languages. HTML could not be expressed as an XML application.

Because XML is strictly hierarchical and removes the structural ambiguities of HTML, XML documents can be translated more directly into standardized DOM trees (a process known as XML parsing). Also, since XML is a text-based language, and text is a technology-agnostic format (as opposed to binary), XML is particularly well suited to exchanging data across the Internet between heterogeneous platforms. In fact, it led to the Web Services technologies that are ubiquitous today.

4 HTML + XML = XHTML

At some point, driven by the evident usefulness of XML and the fact that it could make web documents more extensible and interoperable (for example, by producing more predictable DOMs across browsers), the W3C decided to reformulate HTML as a dialect (or application) of XML rather than SGML, and so XHTML was born.

The introduction of XHTML and the conversion of web documents to well-formed XML were generally seen as a step forward, because they would allow greater standardization across browsers, less room for authoring errors (which had to be corrected in browser-specific ways), and easier parsing and automated processing of web pages.

As part of this, XHTML introduced a controversial concept that comes directly from XML, called draconian error handling. This means that any XML interpreter (now including a browser) must fail immediately if any kind of format error is found in the XML document being processed. In practice, this meant that XHTML authors had to create perfectly well-formed documents or accept the fact that browsers would never be able (in fact, allowed) to display them.
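
Any conforming XML parser shows this behavior. As a minimal sketch, assuming Python's xml.etree.ElementTree as the XML parser (the helper name `parse_strictly` is made up for this example):

```python
import xml.etree.ElementTree as ET


def parse_strictly(markup):
    """Return the root element, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        # An XML parser must stop here; it may not guess what was meant.
        return None
```

Properly nested markup such as `<p><b>bold</b></p>` yields a tree, while the tag soup `<p><b>bold</p></b>` yields nothing at all: there is no partial result and no automatic correction.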

For validation, the XHTML 1.0 specification defined a set of DTDs that could be used in the DOCTYPE clause: XHTML 1.0 Strict, XHTML 1.0 Transitional, and XHTML 1.0 Frameset. The first was meant for pure XHTML documents that do not use any deprecated tags from HTML, the second for transitional documents that still use deprecated tags and attributes, and the third for frameset pages.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

But one of the most important aspects of XHTML is that it also introduced a new MIME type, the one every web server should use when serving XHTML so that browsers know they must use their XHTML parser and engine instead of their HTML ones. This is application/xhtml+xml:

Content-Type: application/xhtml+xml
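
Seen from the client, the content type works as a dispatch key. The following sketch is a deliberate simplification: the table and the function name `select_engine` are hypothetical and do not describe how any real browser is implemented.

```python
# Hypothetical lookup table: which engine handles which media type.
ENGINE_BY_MEDIA_TYPE = {
    "text/html": "html",             # lenient, error-correcting
    "application/xhtml+xml": "xml",  # strict, fails on the first error
}


def select_engine(content_type):
    """Pick an engine name from a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return ENGINE_BY_MEDIA_TYPE.get(media_type)
```

The same XHTML markup therefore ends up in a different engine depending only on the header it is served with.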

5 The harsh reality (of XHTML)

After it was launched, everything about XHTML looked great. We developers just had to wait for browsers to implement it completely, and the world of web development would suddenly become a much happier one...

The trouble is, that never happened.

What happened is that one particular browser simply refused to implement support for the application/xhtml+xml content type. Guess which one. Yes, exactly, that one: Internet Explorer.

When you tried to access a document served with XHTML's own content type, versions of Internet Explorer before IE9 simply displayed a Download dialog, which meant that you could not use that content type if you wanted your site to be viewable by IE users. By the time the problem was corrected, it was too late.

Fortunately (or perhaps not), the XHTML 1.0 specification included an appendix stating that XHTML 1.0 content could also be served with the old text/html content type from the HTML era, to ease the transition. And that is exactly what most of us did for years: create XHTML 1.0 content and then serve it as text/html. Given that the XHTML 1.0 specification was released in 2000, the transition was a very long one.

But the truth is that when you serve content as HTML instead of XHTML, browsers use their HTML engine instead of the XHTML-specific one. And although their HTML engines were XHTML-aware, they still had to provide backwards compatibility with old HTML 4 code, which made them very tricky pieces of software. More importantly, they lacked some of the most XML-like features of XHTML, first and foremost draconian error handling.

Without draconian error handling, you have a lenient engine that lets you serve malformed documents and automatically corrects your errors. And if you know the browser will correct your errors (in a browser-specific way), you may never correct your documents... and so the HTML horror story went on.

Knowing this, consider that you may never have created a real XHTML site. What you have been doing is serving and displaying (possibly malformed) XHTML documents as plain old HTML. How about that?

But it got worse, because in 2002 XHTML 1.1 removed the possibility of using the HTML content type, so only application/xhtml+xml was allowed. The problem is that, instead of forcing Internet Explorer to support application/xhtml+xml, this restriction simply turned XHTML 1.1 into a mythical creature like the Loch Ness Monster. Almost no one ever used it.

In 2009, text/html was allowed again for XHTML 1.1, but it was too late.

6 Toward HTML5: a tale of divorce

At some point (specifically, in 2004), some browser vendors realized that the existing XHTML specifications were evolving too slowly to cope with the growing demands of the web (video, audio, richer application interfaces...), and that the W3C was increasingly pushing them toward a stricter interpretation of documents, which could eventually render a large amount of existing (malformed) code unusable.

They wanted to enrich their web applications with features such as video, audio, local storage, or advanced form processing. In fact, they could have done so simply by adding these features in a browser-specific way, but they did not want to go down the path of incompatibility again. They needed the standards to evolve and include these new features.

However, there was a problem with the standards of that time (namely XHTML): many sites and applications still depended on legacy HTML, and if those cool new features were standardized through strict XHTML, none of these applications would ever be able to use them unless they were completely rewritten. Everyone wanted a more interoperable and standard web, but not at the cost of throwing away years of work by millions of web authors.

So these vendors (and some other individuals) came up with the idea of evolving HTML in a way that would make all (or most) existing HTML and XHTML code valid as new HTML, while providing powerful new features for web applications and, importantly, clearly defining how error handling should be done.

This last point meant that browsers would not fail at the first error. Instead, they would know from the specification how to correct the errors made by web authors, and would therefore react to them in exactly the same way, effectively making HTML code (XML-well-formed or not) fully cross-browser. You would still be advised to write XML-well-formed code for new sites, but if you didn't feel like it, or you still had a lot of old legacy HTML (the typical scenario), you could still join the party. Have an old HTML website? Let's add some video to it! It all sounded quite sensible.

But the truth is that none of this sounded so good to the W3C back in 2004. They rejected the proposal and decided to stick to the XHTML path. For them, HTML was dead, there was no reason to resurrect it, and XHTML 2.0 was the future.

This led to a divorce. The proponents of the new HTML concept, a group that included individuals from Opera Software, the Mozilla Foundation, and Apple, left the W3C and founded the Web Hypertext Application Technology Working Group (WHATWG), with the aim of defining what we know today as HTML5.

Finally, in 2007, the W3C created a working group for the next generation of HTML, which later agreed to collaborate with the WHATWG, effectively adopting HTML5 as its working specification and future deliverable. The W3C and the WHATWG have worked together on HTML5 since then, and in 2009 the W3C simply let XHTML 2.0 die by closing the team that was writing its spec.

HTML5 was now the only future for web standards.

7 So what is HTML5?

HTML5 is a set of standards (still in development as of 2011) that evolved from the HTML 4 and XHTML specifications, and is designed to:

Add advanced new features to HTML, effectively moving web development from a document-oriented concept to a more application-oriented one. These are called HTML5 features, and in some cases they are defined by standards separate from the HTML5 core. HTML5 features include video, audio, drawing canvas, geolocation, local storage, offline support, and advanced form-related features.
Provide a painless migration path from both HTML and XHTML, so that adopting HTML5 requires little or no rewriting of code.
Provide a standard way to handle code errors, so that malformed HTML5 code behaves in the same predictable way across all browsers.

From a practical point of view, this means that most (possibly all) of your current HTML and XHTML code can be treated as valid HTML5 just by changing your DOCTYPE to the HTML5 one:

<!DOCTYPE html>

And by serving it with the text/html content type:

Content-Type: text/html

At this point you might wonder: why doesn't that DOCTYPE specify a DTD at all? Because there isn't one. HTML5 has no DTD, because the rules that define whether a document is valid HTML5 are written as human-readable text in the specification itself and cannot be expressed in the DTD language.

However, this does not mean that an HTML5 parser or engine cannot validate. It can. It just has to be software written specifically for parsing HTML5, including code programmed to apply the rules involved in HTML5 validation (rather than reading those rules from a DTD file). Even though the specification is now quite lenient, it is still a specification, and you have to conform to it.

But if there is no DTD, why is there a DOCTYPE clause at all? Because a DOCTYPE clause is needed to make browsers display the document in standards mode (as opposed to quirks mode). <!DOCTYPE html> is simply the shortest valid DOCTYPE declaration, which is exactly what was needed. It's just a switch.
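
The switch can be illustrated with a rough sketch. The rules below are a strong simplification made up for this example: real browsers apply a longer list of rules, and some legacy DOCTYPEs also trigger quirks mode, which is why the last branch does not commit to an answer. All names are hypothetical.

```python
from html.parser import HTMLParser


class ModeSniffer(HTMLParser):
    """Notes whether a DOCTYPE appears before the first tag."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.seen_tag = False

    def handle_decl(self, decl):
        if not self.seen_tag and self.doctype is None:
            # Collapse whitespace and ignore case before comparing.
            self.doctype = " ".join(decl.split()).lower()

    def handle_starttag(self, tag, attrs):
        self.seen_tag = True


def rendering_mode(document):
    """Simplified guess at the mode a browser would choose."""
    sniffer = ModeSniffer()
    sniffer.feed(document)
    if sniffer.doctype is None:
        return "quirks"
    if sniffer.doctype == "doctype html":
        return "standards"
    return "depends on the doctype"
```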

8 Can I use HTML5 already?

Mostly, yes. Although (as of 2016) no browser fully implements the entire HTML5 feature set, the most common browsers do implement most of it. So you should be fine in most cases, as long as your users are not on very old (and now almost nonexistent) versions of Internet Explorer.

Also, be aware that browser support evolves over time, not only because browsers release new versions at a fast pace, but also because the specifications themselves are still in progress.

For a list of HTML5 features and the corresponding browser support, check the Can I use... website. In particular, it has a category that lists all HTML5 features: http://caniuse.com/#cats=HTML5

9 What about XHTML5? Does it exist?

In theory, yes. XHTML5 is just HTML5 served with:

Content-Type: application/xhtml+xml

Note that this is not supported in Internet Explorer versions before IE9 (later versions and Microsoft Edge do support it). Again, take your users' browser capabilities into account.

Note also that the only difference between HTML5 and XHTML5 is the content type, because an XML-well-formed HTML5 document is in fact a perfectly valid HTML5 document. This is quite different from the relationship between HTML 4 and XHTML 1.0/1.1, which were incompatible languages.
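
As a rough illustration, the sketch below runs one XML-well-formed HTML5 page through both a lenient HTML parser and a strict XML parser from Python's standard library. The page and the helper names are made up for this example, and neither parser is a browser engine; the point is only that both accept the same text and see the same elements.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An HTML5 document written so that it is also well-formed XML
# (made up for this example).
PAGE = (
    "<!DOCTYPE html>\n"
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>One line<br/>another line</p></body>"
    "</html>"
)


class TagCollector(HTMLParser):
    """Collects element names as the lenient HTML parser meets them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_by_html_parser(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def tags_seen_by_xml_parser(markup):
    root = ET.fromstring(markup)
    # Strip the "{namespace}" prefix that ElementTree adds to names.
    return [element.tag.split("}", 1)[-1] for element in root.iter()]
```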

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
