Accurate use of language information in XHTML and HTML

Source: Internet
Author: User
Translator's notes: when I started using XHTML 1.1, I never knew what to write in xml:lang. I wanted to declare Chinese, so should the value be en, zh-cn/zh-CN, gb2312/gbk/gb18030, or UTF8? When I have a problem I usually search Google in Chinese first, but this time I could not find the answer. I almost believed it when I saw some authoritative websites using gb2312, but my experience of setting the language on Linux told me intuitively that this was wrong. So I narrowed my Google search to the World Wide Web Consortium (W3C) and found "Tutorial: Using language information in XHTML, HTML and CSS (DRAFT)". After reading it I finally cleared up my misunderstanding, and I would like to share what I learned.

This is again a translation, but the original article is long and contains a lot of information we do not need, so this time I have translated only part of it, in the hope that it explains the problem clearly.

Declaring the language of documents and text
Why declare a language
Information about the language of a document is important for accessibility tools such as screen readers, and it is worth providing from the outset. These programs need to know whether they can generate output from the text as it stands, or whether they need to switch to a different language mode.

Marking up language information is also useful for applying appropriate styling. For example, you may need to change the font to suit different scripts, or generate quotation marks appropriate to the language.

Some browsers use language information to choose suitable fonts for Simplified Chinese, Traditional Chinese, Japanese and Korean. In a page encoded in Unicode, these languages may share the same code point for an ideographic character, yet speakers of these languages may expect small differences in how those characters are drawn.

Marking up language information also allows a script to extract the elements in a specified language. For example, you can use the XSLT lang() function to extract the text in a specified language from a file, or to apply language-specific styling during conversion to XSL-FO.
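As a rough illustration of this kind of extraction, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not the XSLT lang() function itself; it only approximates its matching rule (a tag matches if it equals the requested language or extends it with a hyphenated subtag, ignoring case). The function names matches and texts_in_language are hypothetical, and for brevity the sketch looks only at each element's leading text, not at text that follows a child element.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def matches(tag, wanted):
    """True if tag equals wanted or is a sub-language of it (case-insensitive)."""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")

def texts_in_language(element, wanted, inherited=""):
    """Yield the leading text of each element whose effective language matches.

    An element without its own xml:lang inherits the language of its parent.
    """
    lang = element.get(XML_LANG, inherited)
    if element.text and element.text.strip() and matches(lang, wanted):
        yield element.text.strip()
    for child in element:
        yield from texts_in_language(child, wanted, lang)

doc = ET.fromstring(
    '<p xml:lang="en">The French for <em>cat</em> is '
    '<em xml:lang="fr-CA">chat</em>.</p>'
)
french = list(texts_in_language(doc, "fr"))
english = list(texts_in_language(doc, "en"))
```

Asking for "fr" picks up the element tagged fr-CA, which is why declaring the more specific tag does not hide the text from a search for the general language.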

In many cases you may not be aware of the importance of these applications when you first develop your content. Language information is generally very easy to add while content is being created, but adding it afterwards, for example when you need a style makeover, can be troublesome.

In addition, some applications of language tagging are still at an early stage of development or do not exist yet. Even so, you should start adding language information to your content now, so that you can reap the benefits as the technology matures.

Always declare a language for a document
HTML documents should generally declare the language of the document, which can be achieved by adding the lang attribute to the html tag. For example, a document in Canadian French is declared as follows:

<html lang="fr-CA">

Later, we'll talk more specifically about specifying values for language attributes.

When you serve XHTML as text/html, you should use both the lang attribute and the xml:lang attribute on the html element. The xml:lang attribute is the standard way to identify language information in XML. Here is how you would mark up the previous example as XHTML 1.0 served as text/html:

<html lang="fr-CA" xml:lang="fr-CA" xmlns="http://www.w3.org/1999/xhtml">
The xml:lang attribute is not useful when the file is processed as HTML, but it takes over from the lang attribute whenever the document is treated as XML, for example by a script or validator.

If you serve XHTML as XML (for example, with a MIME type such as application/xhtml+xml), or use XHTML 1.1, you no longer need the lang attribute. The xml:lang attribute alone is sufficient.


Always declare language changes to text
Where text is in a different language from the main language of the content, you should indicate the language of that text. The method is the same as the one described in the section on declaring the language in the html tag, for example:
<p>The French for <em>cat</em> is <em lang="fr">chat</em>.</p>
The lang attribute can be used on any HTML element other than applet, base, basefont, br, frame, frameset, iframe, param, and script.

Also, when serving XHTML 1.0 as text/html, you should use the two attributes together, for example:

<p>The title in Chinese is <span lang="zh-CN"
xml:lang="zh-CN">中国科学院文献情报中心</span>.</p>
Note that in the last example there is no tag around the Chinese text to which we could attach language information, so we introduce a span element to achieve our goal. (Translator's note: please check the source code of this section.)

If you are serving XHTML as XML, as described in the previous section, you should use only the xml:lang attribute.
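When both attributes are written on the same element it is easy for them to drift apart, for instance lang="en" next to xml:lang="zh-CN". The following is a minimal Python sketch of a consistency check, assuming the markup is well-formed XML that the standard library's xml.etree.ElementTree can parse. The function name mismatched_lang and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def mismatched_lang(root):
    """Return the tag names of elements whose lang and xml:lang disagree."""
    bad = []
    for element in root.iter():
        lang = element.get("lang")
        xml_lang = element.get(XML_LANG)
        # Language tags are compared case-insensitively.
        if lang is not None and xml_lang is not None:
            if lang.lower() != xml_lang.lower():
                bad.append(element.tag)
    return bad

broken = ET.fromstring(
    '<p>The title in Chinese is '
    '<span lang="en" xml:lang="zh-CN">中文</span>.</p>'
)
fixed = ET.fromstring(
    '<p>The title in Chinese is '
    '<span lang="zh-cn" xml:lang="zh-CN">中文</span>.</p>'
)
broken_report = mismatched_lang(broken)
fixed_report = mismatched_lang(fixed)
```

The check only reports elements that carry both attributes. Whether an element should carry both depends on how the document is served, as described above.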

Specify the value of the language attribute
Using RFC 3066 rules
RFC 3066 is a standard that defines how language tags are used to identify languages.

A language tag consists of a primary subtag followed by zero or more additional subtags, separated by hyphens.

The primary subtag identifies a language (with two exceptions, i and x, discussed below), and any subsequent subtags qualify the dialect or usage of that language. The subsequent subtags generally indicate a country, dialect or writing system.

The following example shows that the document is not just in English but specifically in British English, that is, English as written in Britain as opposed to American English.

<html lang="en-GB">

Subtags are not case sensitive. They consist of the letters A to Z and a to z and the digits 0 to 9, and are no more than 8 characters long.
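The syntax rules above can be sketched as a regular expression. This is only a well-formedness check under the RFC 3066 grammar as I understand it (a primary subtag made of letters, later subtags made of letters or digits, each 1 to 8 characters); it does not check that the codes are actually assigned. The names TAG_RE and is_well_formed are hypothetical.

```python
import re

# Primary subtag: 1-8 letters. Each later subtag: 1-8 letters or digits.
# Subtags are separated by single hyphens.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_well_formed(tag):
    """True if tag follows the hyphen-separated subtag syntax."""
    return TAG_RE.fullmatch(tag) is not None

checks = {
    "en-GB": is_well_formed("en-GB"),
    "zh-CN": is_well_formed("zh-CN"),
    "gb2312": is_well_formed("gb2312"),  # digits in the primary subtag
    "en_GB": is_well_formed("en_GB"),    # underscore is not a separator
}
```

Note that a character encoding name such as gb2312 fails even this syntax check, which is one quick way to see that it is not a language tag.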

Note that the HTML specification still recommends RFC 1766 for identifying languages. RFC 3066 is an update of RFC 1766 and supersedes it, and an erratum for the HTML specification is planned, so you should use RFC 3066 regardless of what the HTML specification says at this stage.

Primary subtag
Every primary subtag must be 1, 2 or 3 letters long. All 2-letter and 3-letter subtags are language codes defined in ISO 639 (2-letter codes in part 1, 3-letter codes in part 2). The 1-letter subtags are the i and x prefixes, which are described later.

Although the codes are case insensitive, they are usually written in lowercase, but this is only a convention.

Also note that where ISO offers a choice between a 2-letter and a 3-letter code, you should choose the 2-letter code. This ensures that a single code is used for each language as far as possible, and that tags already written with 2-letter codes (under RFC 1766, which did not allow 3-letter codes) do not have to change. It also means the question of which 3-letter code to use does not arise, since the few languages that have two different 3-letter codes also have 2-letter codes.

Additional subtags
Additional subtags can indicate a geographical region, dialect, writing system, or some other refinement of the primary (language) subtag. The primary subtag can be followed by any number of subtags, although more than one is uncommon.

RFC 3066 specifies that any 2-letter subtag in the second position is an ISO 3166 country code. There are no rules for subtags in the third or any subsequent position.

The 2-letter ISO codes for countries are usually written in uppercase, but this is only a convention.
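The two case conventions (lowercase language code, uppercase 2-letter country code in the second position) can be sketched in Python as follows. This is only an illustration of the conventions, not a requirement of RFC 3066, since tags are case insensitive. The function name normalize is hypothetical, and the input is assumed to be a syntactically valid tag.

```python
def normalize(tag):
    """Rewrite a language tag using the conventional letter case."""
    subtags = tag.split("-")
    # The primary subtag is conventionally lowercase.
    result = [subtags[0].lower()]
    for position, subtag in enumerate(subtags[1:], start=2):
        if position == 2 and len(subtag) == 2 and subtag.isalpha():
            # A 2-letter subtag in second position is a country code.
            result.append(subtag.upper())
        else:
            result.append(subtag.lower())
    return "-".join(result)

examples = [normalize(t) for t in ("ZH-cn", "EN-gb", "FR", "I-Klingon")]
```

Lowercasing the remaining subtags is a choice made for this sketch; the text above states no convention for them.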

Special primary subtags
RFC 3066 defines some cases in which a tag does not start with an ISO language code.

Language tags starting with i are reserved for IANA-registered language tags. Some examples:

i-mingo
i-klingon
i-tao
Language tags starting with x provide a mechanism for user-defined language tags. The subtag in the second position must be more than one letter long and cannot be one of the following reserved subtags: AA, QM-QZ, XA-XZ, and ZZ.

Of course, these ways of identifying a language should not be used when a 2-letter or 3-letter ISO code is available, because using them limits or prevents interoperability.
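The three kinds of primary subtag described in this article can be told apart with a few lines of Python. This is a minimal sketch based only on the rules stated above (i for IANA-registered tags, x for user-defined tags, 2 or 3 letters for ISO 639 codes). The function name primary_kind and its return labels are hypothetical, and the sketch does not look the code up in ISO 639.

```python
def primary_kind(tag):
    """Classify a language tag by the form of its primary subtag."""
    primary = tag.split("-")[0].lower()
    if primary == "i":
        return "iana-registered"
    if primary == "x":
        return "user-defined"
    if len(primary) in (2, 3) and primary.isascii() and primary.isalpha():
        # Right shape for an ISO 639 code; not checked against the standard.
        return "iso-639"
    return "not-allowed"

kinds = {tag: primary_kind(tag)
         for tag in ("en-GB", "i-klingon", "x-mytag", "gb2312")}
```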

IANA-registered language tags
IANA language tags can be registered through the email submission procedure described in RFC 3066. These tags can have a code of 3 to 8 letters in the second position.

Registering a tag with IANA is better than using a user-defined tag because it minimizes the possibility of confusion: an IANA-registered tag is one that others can recognize. On the other hand, IANA-registered tags are disfavored once an ISO standard code is available.
