Objective
I recently joined a new project team, where I am responsible for front-end technology research and selection. One requirement is both familiar and unfamiliar: internationalization and localization. It is familiar because earlier projects had it, and unfamiliar because those implementations never went beyond the "it exists" stage. I am taking this opportunity to study and organize the topic in preparation for the upcoming technology selection.
This article covers the concepts of internationalization and localization, along with one of their key building blocks: the language tag (also known as Language Code or Culture).
What is internationalization?
As I understand it, internationalization means making an application support multiple languages and cultural conventions (number, currency, and date formats, character comparison algorithms, and so on), while localization means the application identifies the user's own language and cultural conventions and automatically adapts to the corresponding version.
I used to think internationalization was just string replacement, such as replacing "你好!" with "What's up, man!". In fact, it covers the following 5 aspects:
- String substitution
For example, "你好!" is replaced by "What's up, man!".
- Number representation
For example, 1200.01 is written 1,200.01 in English, 1 200,01 in French, and 1.200,01 in German.
- Currency representation
For example, a renminbi amount is written ¥1,200.01 and a US dollar amount $1,200.01, while a euro amount is written €1,200.01 in English but 1.200,01 € in German.
Note: no exchange-rate conversion is involved here.
- Date representation
For example, 2016年9月15日 is written 9/15/2016 in English, 15/9/2016 in French, and 15.9.2016 in German.
- Character comparison algorithm
For example, in English and German ä sorts before z, while in Swedish z sorts before ä.
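The number, currency, date, and collation differences listed above can be observed with the Intl API that ships with modern JavaScript runtimes. The following is only a small sketch: the variable names and the sortFor helper are illustrative, and the exact separators and spacing printed depend on the locale data bundled with the runtime.

```typescript
// Format the same values under different language tags using the built-in Intl API.
const amount = 1200.01;
const day = new Date(2016, 8, 15); // months are zero-based, so 8 means September

const locales: string[] = ["en-US", "fr-FR", "de-DE"];

for (const locale of locales) {
  const asNumber = new Intl.NumberFormat(locale).format(amount);
  const asEuro = new Intl.NumberFormat(locale, {
    style: "currency",
    currency: "EUR",
  }).format(amount);
  const asDate = new Intl.DateTimeFormat(locale).format(day);
  console.log(locale, "|", asNumber, "|", asEuro, "|", asDate);
}

// Collation: the same two strings can sort differently depending on the language.
// sortFor is a hypothetical helper name used only in this sketch.
function sortFor(locale: string, words: string[]): string[] {
  return words.slice().sort(new Intl.Collator(locale).compare);
}

console.log("de:", sortFor("de", ["z", "ä"]));
console.log("sv:", sortFor("sv", ["z", "ä"]));
```

Note that nothing here replaces strings; the language tag alone drives every difference in the output.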
The key to localization: language tags
To adapt automatically to the user's language and culture, we need some basis for identifying them. Identifiers such as zh-CN and en probably look familiar, and they are exactly the basis we need. When we use an existing i18n library for internationalization/localization, we write a resource file like the following:
{
  "en": {"name": "Enter Name"},
  "zh-CN": {"name": "输入姓名"}
}
But are there any other keys besides en and zh-CN? What rules govern their composition? Let's take a closer look at these language tags.
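To make the role of these keys concrete, here is a minimal sketch of how a resource table keyed by language tag might be consulted. It is not taken from any particular i18n library: resources, findResource, and resolveMessages are hypothetical names, the Chinese string is an assumed translation, and the truncation strategy is a simplified take on the "lookup" idea described in RFC 4647.

```typescript
type Messages = Record<string, string>;

// Hypothetical resource table, modeled on the resource file shown above.
const resources: Record<string, Messages> = {
  "en": { name: "Enter Name" },
  "zh-CN": { name: "输入姓名" }, // assumed translation, for illustration only
};

// Language tags are case-insensitive, so compare keys in lowercase.
function findResource(tag: string): Messages | undefined {
  const wanted = tag.toLowerCase();
  for (const key of Object.keys(resources)) {
    if (key.toLowerCase() === wanted) return resources[key];
  }
  return undefined;
}

// Try the full tag first, then drop trailing subtags one at a time,
// and finally fall back to a default language.
function resolveMessages(tag: string, fallback = "en"): Messages {
  const subtags = tag.split("-");
  while (subtags.length > 0) {
    const found = findResource(subtags.join("-"));
    if (found !== undefined) return found;
    subtags.pop();
  }
  return resources[fallback];
}

console.log(resolveMessages("zh-cn").name); // 输入姓名
console.log(resolveMessages("en-US").name); // Enter Name (falls back from en-US to en)
console.log(resolveMessages("fr").name);    // Enter Name (default language)
```

Falling back by truncation only works well when the keys follow the composition rules described in the rest of this article.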
Grammar rules
Consider the following ABNF description (for ABNF syntax, see "Syntax Specification: BNF and ABNF"):
Language-Tag = langtag
             / privateuse
             / grandfathered
langtag = language
          ["-" script]
          ["-" region]
          *("-" variant)
          *("-" extension)
          ["-" privateuse]
As shown, Language-Tag has three forms: langtag, privateuse, and grandfathered. Let's first look at the two that are rarely used.
Privateuse
The meaning of a private-use tag is not defined by the subtag registry; it is defined, maintained, and used by the team that adopts it.
Format:
privateuse = "x" 1*("-" (1*8alphanum))
Example: x-zh-CN is a privateuse tag, and its meaning does not necessarily coincide with that of the language tag zh-CN.
Note: use private-use tags only within a small group; they must not be used for wide interchange.
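The privateuse production translates almost directly into a regular expression. The sketch below is an illustration only; PRIVATE_USE and isPrivateUseTag are names invented here, and the check is purely syntactic.

```typescript
// privateuse = "x" 1*("-" (1*8alphanum))
// Tags are case-insensitive, hence the "i" flag.
const PRIVATE_USE = /^x(-[a-z0-9]{1,8})+$/i;

function isPrivateUseTag(tag: string): boolean {
  return PRIVATE_USE.test(tag);
}

console.log(isPrivateUseTag("x-zh-CN"));           // true
console.log(isPrivateUseTag("x"));                 // false: at least one subtag is required
console.log(isPrivateUseTag("zh-CN"));             // false: does not start with "x"
console.log(isPrivateUseTag("x-averylongsubtag")); // false: subtag longer than 8 characters
```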
Grandfathered
Grandfathered tags exist for backward compatibility. Tags registered before RFC 4646 do not all match the syntax and semantics of the current registry, so the grandfathered production keeps them valid.
Grammar:
grandfathered = irregular
              / regular
irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"
regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"
Note: almost every grandfathered tag can be replaced by a current registry subtag or a combination of subtags (for example, i-tao can be replaced by tao), so use the current form unless there is a specific reason not to.
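As a rough illustration of that replacement, the sketch below maps a few grandfathered tags to the modern values recorded for them in the IANA registry. The table is deliberately partial, and GRANDFATHERED and modernize are names made up for this example.

```typescript
// Partial mapping from grandfathered tags (lowercased) to their modern
// replacements, as recorded in the registry's Preferred-Value fields.
const GRANDFATHERED: Record<string, string> = {
  "art-lojban": "jbo",
  "i-hak": "hak",
  "i-klingon": "tlh",
  "i-lux": "lb",
  "i-navajo": "nv",
  "i-tao": "tao",
  "no-bok": "nb",
  "no-nyn": "nn",
  "zh-guoyu": "cmn",
  "zh-hakka": "hak",
  "zh-min-nan": "nan",
  "zh-xiang": "hsn",
};

// Return the modern replacement if the tag is in the table, else the tag unchanged.
function modernize(tag: string): string {
  const key = tag.toLowerCase();
  return Object.prototype.hasOwnProperty.call(GRANDFATHERED, key)
    ? GRANDFATHERED[key]
    : tag;
}

console.log(modernize("i-tao"));     // tao
console.log(modernize("I-Klingon")); // tlh
console.log(modernize("en"));        // en (not grandfathered, returned as-is)
```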
Now for the main act, langtag. Let's start with its first subtag: language.
Primary language Subtag
A subtag such as en is the primary language subtag; it identifies the language of the resource.
Grammar:
language = 2*3ALPHA
           ["-" extlang]
         / 4ALPHA
         / 5*8ALPHA
extlang = 3ALPHA
          *2("-" 3ALPHA)
The language production has three forms. The one that interests me most is the first, 2*3ALPHA ["-" extlang]. The leading 2*3ALPHA part is called the macrolanguage and denotes a group of related languages taken together, while the extlang specifies a particular language or dialect within it. The language identified by the extlang part is also known as an encompassed language.
For example, in zh-cmn and zh-yue, zh is the macrolanguage, while cmn and yue are extlangs that denote encompassed languages.
Interestingly, we regard Mandarin and Cantonese as dialects of Chinese, whereas Western classification treats them as separate languages. The specification therefore marks zh-cmn and zh-yue as redundant and recommends using cmn and yue directly. For historical reasons, however, we still use zh-CN to stand for cmn-CN.
In addition, only 7 subtags can be used as a macrolanguage: ar, kok, ms, sw, uz, zh, and sgn.
Subtags comparable to cmn include the following:
cmn Mandarin (Putonghua, Guoyu)
wuu Wu (Zhejiang dialect, Shanghainese)
czh Hui (Huizhou dialect, Yanzhou dialect)
hak Hakka
yue Yue (Cantonese)
nan Min Nan (Fujian dialect, Taiwanese)
cpx Pu-Xian (Putian dialect, Xinghua dialect)
cdo Min Dong (Eastern Min)
mnp Min Bei (Northern Min)
czo Min Zhong (Central Min)
gan Gan (Jiangxi dialect)
hsn Xiang (Hunan dialect)
cjy Jin (Shanxi dialect, northern Shaanxi dialect)
Note: conventionally written in all lowercase.
Script subtag
Specifies the script or writing system used for the language or dialect of the resource.
Grammar:
script = 4ALPHA
Note: conventionally written with the first letter uppercase and the remaining letters lowercase.
Region subtag
Specifies the country or region associated with the language/dialect and its cultural conventions.
Grammar:
region = 2ALPHA
       / 3DIGIT
Note: conventionally written in all uppercase.
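The three casing notes (language in lowercase, script in title case, region in uppercase) can be applied without consulting the registry. BCP 47 describes a positional rule: everything is lowercase, except that two-letter subtags become uppercase and four-letter subtags become title case when they are neither the first subtag nor located after a singleton. The sketch below follows that rule; formatCase is a name chosen for this example, and the function does not validate the tag.

```typescript
// Apply the conventional casing to a language tag without validating it.
function formatCase(tag: string): string {
  const subtags = tag.split("-");
  const out: string[] = [];
  let afterSingleton = false; // becomes true once a one-character subtag is seen

  for (let i = 0; i < subtags.length; i++) {
    const s = subtags[i].toLowerCase();
    if (i > 0 && !afterSingleton && s.length === 2) {
      out.push(s.toUpperCase()); // region, e.g. "cn" becomes "CN"
    } else if (i > 0 && !afterSingleton && s.length === 4) {
      out.push(s.charAt(0).toUpperCase() + s.slice(1)); // script, e.g. "Hans"
    } else {
      out.push(s); // language, variants, extensions, private use
    }
    if (s.length === 1) afterSingleton = true;
  }
  return out.join("-");
}

console.log(formatCase("ZH-hans-cn"));     // zh-Hans-CN
console.log(formatCase("EN-ca-X-Ca"));     // en-CA-x-ca
console.log(formatCase("AZ-LATN-x-LATN")); // az-Latn-x-latn
```

Casing carries no meaning in a language tag, so this is purely a readability convention; comparisons should still be case-insensitive.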
Variant subtag
Specifies additional information that the other subtags cannot provide.
Grammar:
variant = 5*8alphanum
        / (DIGIT 3alphanum)
Example: in de-CH-1996, 1996 is the variant subtag; the tag as a whole means German as used in Switzerland following the 1996 orthography reform.
Extension subtag
Provides a mechanism for extending langtag.
Grammar:
extension = singleton 1*("-" (2*8alphanum))
singleton = DIGIT
          / %x41-57
          / %x59-5A
          / %x61-77
          / %x79-7A
Of the possible singleton values, u is the one in common use.
Example: de-DE-u-co-phonebk means the content is sorted using the German phonebook collation order.
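Putting the productions together, the following sketch splits a tag into the subtag kinds described above by walking the langtag grammar from left to right. It is an assumption-laden illustration rather than a complete BCP 47 implementation: LangTag and parseLangtag are names invented here, subtags are returned in lowercase, grandfathered and pure private-use tags are rejected, and nothing is checked against the registry.

```typescript
interface LangTag {
  language: string;
  extlang: string[];
  script?: string;
  region?: string;
  variants: string[];
  extensions: string[];
  privateuse?: string;
}

// Parse a tag according to the langtag production; return null if it does not match.
function parseLangtag(tag: string): LangTag | null {
  const t = tag.toLowerCase().split("-");
  if (!/^[a-z]{2,8}$/.test(t[0])) return null;

  let i = 0;
  const result: LangTag = { language: t[i++], extlang: [], variants: [], extensions: [] };

  // extlang: up to three 3-letter subtags, only after a 2-3 letter language
  if (result.language.length <= 3) {
    while (i < t.length && result.extlang.length < 3 && /^[a-z]{3}$/.test(t[i])) {
      result.extlang.push(t[i++]);
    }
  }
  // script = 4ALPHA
  if (i < t.length && /^[a-z]{4}$/.test(t[i])) result.script = t[i++];
  // region = 2ALPHA / 3DIGIT
  if (i < t.length && /^([a-z]{2}|[0-9]{3})$/.test(t[i])) result.region = t[i++];
  // variant = 5*8alphanum / (DIGIT 3alphanum)
  while (i < t.length && /^([a-z0-9]{5,8}|[0-9][a-z0-9]{3})$/.test(t[i])) {
    result.variants.push(t[i++]);
  }
  // extension = singleton 1*("-" (2*8alphanum)); the singleton is any alphanum except "x"
  while (i < t.length && /^[0-9a-wy-z]$/.test(t[i])) {
    const parts = [t[i++]];
    while (i < t.length && /^[a-z0-9]{2,8}$/.test(t[i])) parts.push(t[i++]);
    if (parts.length < 2) return null;
    result.extensions.push(parts.join("-"));
  }
  // privateuse = "x" 1*("-" (1*8alphanum))
  if (i < t.length && t[i] === "x") {
    const parts = [t[i++]];
    while (i < t.length && /^[a-z0-9]{1,8}$/.test(t[i])) parts.push(t[i++]);
    if (parts.length < 2) return null;
    result.privateuse = parts.join("-");
  }
  // Any leftover subtag means the tag does not fit the grammar.
  return i === t.length ? result : null;
}

console.log(parseLangtag("de-DE-u-co-phonebk"));
console.log(parseLangtag("zh-cmn-Hans-CN"));
console.log(parseLangtag("de-CH-1996"));
console.log(parseLangtag("en-")); // null
```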
For more information on language tags, see BCP 47.
How to choose a language tag
Even after grinding through so much of the specification, it may still be unclear how to assemble a suitable language tag :( In fact, there is only one principle for selecting and composing one:
Keep the language tag as short as possible while still distinguishing it from the other language tags in the current context.
Example 1: Mandarin and Cantonese side by side
<p lang="cmn">
Xiao Chen: "Sir, how do I get to Oriental Plaza?"
The old man replied: "<span lang="yue">What did you say? I can't hear you very well.</span>"
</p>
Example 2: a Chinese person speaking English, a Hong Konger speaking Mandarin, and an American speaking English
<p lang="cmn">
Xiao Chen: "<span lang="en-CN">hi, where are you come from?</span>"
Mr. Li said: "<span lang="cmn-HK">Your English is as ordinary as my Mandarin, haha!</span>"
Simon said: "<span lang="en">hey, what's up!</span>"
</p>
That raises another question: how do we know which values are defined for each subtag?
They are defined in the IANA Language Subtag Registry.
If you find the registry inconvenient, use the Language Subtag Lookup tool.
In addition, if you do not know which languages or dialects are used in a given country, consult Ethnologue: click a region on the map to get the corresponding subtag information.
Summary
We now have a fuller picture of internationalization and localization and a deeper understanding of language tags. Eager to roll up your sleeves and write some code? Stay tuned for the next article, "JS Magic Hall: An Incomplete Internationalization & Localization Manual, Hands-on Practice".
Thanks
Should the declaration at the top of a web page be lang="en" or lang="zh-CN"?
Language Subtag Registry
BCP 47
Language on the Web
Choosing a Language Tag
Language tags in HTML and XML