JS Internationalization & Localization, an Incomplete Manual: Theory
Preface
I recently joined a new project team, where I am responsible for front-end technology research and selection. One of the requirements is both familiar and unfamiliar: internationalization & localization. It is familiar because earlier projects of mine had it, and unfamiliar because those implementations never got past the "it exists" stage. I am taking this opportunity to study and organize the topic in preparation for the upcoming technology selection.
This article describes the concepts of internationalization and localization, as well as one very important concept: the language tag (also called language code or culture).
What are internationalization and localization?
As I understand it, internationalization means making an application support multiple languages and cultural conventions (numbers, currency, dates, character comparison algorithms, and so on), while localization means the application automatically identifies the user's language and culture and adapts to the matching version.
I used to think that internationalization was just string replacement, for example replacing "Hello!" with "What's up, man!". In fact it covers the following five aspects:
- String replacement
For example, "Hello!" is replaced with "What's up, man!".
- Number representation
For example, 1200.01 is written 1,200.01 in English, 1 200,01 in French, and 1.200,01 in German.
- Currency representation
RMB is written ¥1,200.01 and USD is written $1,200.01. The euro is written €1,200.01 in English and 1.200,01 € in German.
Note: this is about formatting only; no exchange rate conversion is involved.
- Date representation
For example, September 15, 2016 is written 9/15/2016 in English, 15/9/2016 in French, and 15.9.2016 in German.
- Character comparison algorithm
For example, when ä and z are compared, both English and German sort ä before z, while Swedish sorts z before ä.
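Most of these aspects can be tried out with the built-in ECMAScript Intl API. The following is only a minimal sketch, assuming a runtime with full locale data (for example a recent Node.js or browser); the locale tags and variable names are illustrative choices, not part of any particular i18n library.

```javascript
// Minimal sketch using the built-in Intl API.
// The locale tags ("en-US", "de-DE", "de", "sv") are example choices.
const amount = 1200.01;
const date = new Date(2016, 8, 15); // months are zero-based: 8 = September

// Number representation
const numberEn = new Intl.NumberFormat("en-US").format(amount); // "1,200.01"
const numberDe = new Intl.NumberFormat("de-DE").format(amount); // "1.200,01"

// Currency representation (formatting only, no exchange rate conversion)
const euroOptions = { style: "currency", currency: "EUR" };
const euroEn = new Intl.NumberFormat("en-US", euroOptions).format(amount);
const euroDe = new Intl.NumberFormat("de-DE", euroOptions).format(amount);

// Date representation
const dateEn = new Intl.DateTimeFormat("en-US").format(date); // "9/15/2016"
const dateDe = new Intl.DateTimeFormat("de-DE").format(date); // "15.9.2016"

// Character comparison: negative means the first argument sorts first
const deOrder = new Intl.Collator("de").compare("ä", "z"); // negative
const svOrder = new Intl.Collator("sv").compare("ä", "z"); // positive

console.log({ numberEn, numberDe, euroEn, euroDe, dateEn, dateDe, deOrder, svOrder });
```

String replacement is the one aspect that Intl does not cover; that part is left to the resource files of an i18n library.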
The key to localization: the language tag
Since localization must adapt automatically to the user's language and culture, there has to be some basis for identifying them. You are probably no stranger to zh-CN and en, and they are exactly what we need. When we use an existing i18n library for internationalization/localization, we write resource files like the following:
{"en": {"name": "Enter Name"}, "zh-CN": {"name": "Enter name"}}
Are there other keys besides en and zh-CN? What are the rules for composing them? Let's take a closer look at these language tags.
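To make the role of these keys concrete, here is a small sketch of how a lookup against such a resource file might work. The helper names resolveLocale and translate are hypothetical, and the fallback strategy (drop trailing subtags, then use a default) is a simplification of my own, not the behavior of any specific i18n library.

```javascript
// Hypothetical resource table in the same shape as the example above.
const resources = {
  "en": { name: "Enter Name" },
  "zh-CN": { name: "Enter name" },
};

// Hypothetical helper: pick the best available tag for a requested tag by
// dropping trailing subtags one at a time, then falling back to a default.
function resolveLocale(requested, available, fallback = "en") {
  const wanted = requested.toLowerCase().split("-");
  while (wanted.length > 0) {
    const candidate = wanted.join("-");
    const match = available.find((tag) => tag.toLowerCase() === candidate);
    if (match) return match;
    wanted.pop(); // e.g. "en-us" -> "en"
  }
  return fallback;
}

// Hypothetical helper: look up a message for the resolved tag.
function translate(locale, key) {
  const tag = resolveLocale(locale, Object.keys(resources));
  return resources[tag][key];
}

console.log(resolveLocale("en-US", Object.keys(resources))); // "en"
console.log(translate("zh-cn", "name"));
```

Note that this sketch only truncates the requested tag: a request for zh-TW does not match zh-CN and ends up at the default.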
Syntax Rules
Note: the grammar below is written in ABNF (for the ABNF syntax, see "Syntax specifications: BNF and ABNF").
Language-Tag = langtag / privateuse / grandfathered
langtag = language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse]
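One rough way to experiment with the langtag rule is to transcribe it into a regular expression. The sketch below does that using the subtag rules given in the rest of this article; the names LANGTAG and parseLangtag are hypothetical. It checks structure only and does not consult the subtag registry, so a tag that parses is well-formed but not necessarily valid.

```javascript
// Regular-expression rendering of the 'langtag' rule (case-insensitive).
const LANGTAG = new RegExp(
  "^" +
    // language = 2*3ALPHA ["-" extlang] / 4ALPHA / 5*8ALPHA
    "(?<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})" +
    // script = 4ALPHA
    "(?:-(?<script>[a-z]{4}))?" +
    // region = 2ALPHA / 3DIGIT
    "(?:-(?<region>[a-z]{2}|[0-9]{3}))?" +
    // variant = 5*8alphanum / (DIGIT 3alphanum)
    "(?<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)" +
    // extension = singleton 1*("-" (2*8alphanum)), singleton is anything but "x"
    "(?<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)" +
    // privateuse = "x" 1*("-" (1*8alphanum))
    "(?:-(?<privateuse>x(?:-[a-z0-9]{1,8})+))?" +
    "$",
  "i"
);

// Returns the parts of a well-formed langtag, or null if it does not match.
function parseLangtag(tag) {
  const m = LANGTAG.exec(tag);
  if (!m) return null;
  const { language, script, region, variants, extensions, privateuse } = m.groups;
  const [primary, ...extlang] = language.split("-");
  return {
    language: primary,
    extlang,
    script: script || null,
    region: region || null,
    variants: variants ? variants.slice(1).split("-") : [],
    extensions: extensions ? extensions.slice(1) : null, // left unsplit
    privateuse: privateuse || null,
  };
}

console.log(parseLangtag("zh-Hans-CN"));
console.log(parseLangtag("x-zh-CN")); // null: privateuse, not a langtag
```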
The grammar divides Language-Tag into three classes: langtag, privateuse, and grandfathered. Let's first look at the two classes that are rarely used.
Privateuse
Private-use tags are not defined by the subtag registry; they are defined, maintained, and used by the parties that agree on them.
Syntax:
privateuse = "x" 1*("-" (1*8alphanum))
Example: x-zh-CN is a private-use tag, and it does not necessarily mean the language zh-CN.
Note: private-use tags are only meaningful within the small group that agreed on them and should not be used for wide interchange.
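The privateuse rule is short enough to check with a single regular expression. This is a minimal sketch; the names PRIVATEUSE and isPrivateUse are hypothetical.

```javascript
// privateuse = "x" 1*("-" (1*8alphanum))
// "x" followed by one or more subtags of 1 to 8 letters or digits.
const PRIVATEUSE = /^x(?:-[a-z0-9]{1,8})+$/i;

function isPrivateUse(tag) {
  return PRIVATEUSE.test(tag);
}

console.log(isPrivateUse("x-zh-CN")); // true
console.log(isPrivateUse("zh-CN")); // false
```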
Grandfathered
Grandfathered tags exist for backward compatibility. Tags registered before RFC 4646 do not all match the syntax and meaning of tags in the current registry, so the grandfathered rule keeps them usable.
Syntax:
grandfathered = irregular / regular
irregular = "en-GB-oed" ; irregular tags do not match
          / "i-ami" ; the 'langtag' production and
          / "i-bnn" ; would not otherwise be
          / "i-default" ; considered 'well-formed'
          / "i-enochian" ; These tags are all valid,
          / "i-hak" ; but most are deprecated
          / "i-klingon" ; in favor of more modern
          / "i-lux" ; subtags or subtag
          / "i-mingo" ; combination
          / "i-navajo"
          / "i-pwn"
          / "i-tao"
          / "i-tay"
          / "i-tsu"
          / "sgn-BE-FR"
          / "sgn-BE-NL"
          / "sgn-CH-DE"
regular = "art-lojban" ; these tags match the 'langtag'
        / "cel-gaulish" ; production, but their subtags
        / "no-bok" ; are not extended language
        / "no-nyn" ; or variant subtags: their meaning
        / "zh-guoyu" ; is defined by their registration
        / "zh-hakka" ; and all of these are deprecated
        / "zh-min" ; in favor of a more modern
        / "zh-min-nan" ; subtag or sequence of subtags
        / "zh-xiang"
Note: almost all grandfathered tags can be replaced by current registry subtags and their combinations (for example, i-tao can be replaced by tao).
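Replacing a grandfathered tag is a plain table lookup. The sketch below lists only a handful of the replacements recorded as Preferred-Value in the IANA Language Subtag Registry; the names GRANDFATHERED_PREFERRED and modernize are hypothetical, and a real implementation would load the complete registry rather than hard-code a subset.

```javascript
// Partial table: grandfathered tag -> Preferred-Value from the registry.
// Keys are lowercase so that lookups can be case-insensitive.
const GRANDFATHERED_PREFERRED = new Map([
  ["i-tao", "tao"],
  ["i-klingon", "tlh"],
  ["i-lux", "lb"],
  ["i-navajo", "nv"],
  ["zh-guoyu", "cmn"],
  ["zh-hakka", "hak"],
  ["zh-xiang", "hsn"],
  ["no-bok", "nb"],
  ["no-nyn", "nn"],
  ["art-lojban", "jbo"],
]);

// Returns the modern replacement if one is listed, else the tag unchanged.
function modernize(tag) {
  return GRANDFATHERED_PREFERRED.get(tag.toLowerCase()) || tag;
}

console.log(modernize("i-tao")); // "tao"
console.log(modernize("zh-CN")); // "zh-CN" (not grandfathered)
```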
Next we look at the first subtag of langtag: language.
Primary language subtag
For example, en is a primary language subtag; it identifies the language of the resource.
Syntax:
language = 2*3ALPHA ["-" extlang] / 4ALPHA / 5*8ALPHA
extlang = 3ALPHA *2("-" 3ALPHA)
language has three forms. The one I find most interesting is the first, 2*3ALPHA ["-" extlang]. In this form the 2*3ALPHA part is called the macrolanguage: it names a group of related languages, and the extlang part then picks out a specific language or dialect within it. A language that carries an extlang part is also called an encompassed language.
For example, zh-cmn and zh-yue are encompassed languages, where zh is the macrolanguage and cmn and yue are the extlang subtags.
The interesting thing here is that we regard Mandarin and Cantonese as dialects of Chinese, whereas in the West they are considered separate languages. zh-cmn and zh-yue are marked as redundant in the registry, and the recommendation is to use cmn and yue directly. In practice, however, we still write zh-CN rather than cmn-CN.
Only seven primary language subtags (ar, kok, ms, sw, uz, zh, and sgn) are used as prefixes for extlang subtags.
Other extlang subtags similar to cmn are as follows:
cmn Mandarin (Guanhua)
wuu Wu (Jiangsu-Zhejiang dialects, Shanghai dialect)
czh Huizhou (Huizhou dialect, Yanzhou dialect, the Hui-Yan subgroup of Wu)
hak Hakka
yue Yue (Cantonese)
nan Min Nan (Fujian dialect, Taiwan dialect)
cpx Pu-Xian (Putian dialect, Xinghua dialect)
cdo Min Dong
mnp Min Bei
czo Min Zhong
gan Gan (Jiangxi dialect)
hsn Xiang (Hunan dialect)
cjy Jinyu (Shanxi dialect and northern Shaanxi dialect)
Note: by convention, this subtag is written in all lowercase.
Script subtag
Identifies the script or writing system used to write the language or dialect of the resource.
Syntax:
script = 4ALPHA
Note: by convention, the first letter is uppercase and the remaining letters are lowercase.
Region subtag
Identifies the country or region whose conventions the language or dialect follows.
Syntax:
region = 2ALPHA / 3DIGIT
Note: by convention, this subtag is written in all uppercase.
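Tags are compared case-insensitively, so these casing rules are conventions rather than requirements. As a small sketch, the built-in Intl.getCanonicalLocales and Intl.Locale (assumed to be available, as in current Node.js and browsers) can normalize the casing and expose the language, script, and region subtags; the input tags here are arbitrary examples.

```javascript
// Normalize casing: language lowercase, script title case, region uppercase.
const [canonical] = Intl.getCanonicalLocales("ZH-hans-cn"); // "zh-Hans-CN"

// Intl.Locale exposes the individual subtags of a tag.
const locale = new Intl.Locale("zh-hans-cn");
const parts = {
  language: locale.language, // "zh"
  script: locale.script, // "Hans"
  region: locale.region, // "CN"
};

console.log(canonical, parts);
```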
Variant subtag
Carries additional information that the other subtags cannot express.
Syntax:
variant = 5*8alphanum / (DIGIT 3alphanum)
Example: de-CH-1996. Here 1996 is the variant subtag; it indicates German as used in Switzerland, written according to the orthography reform of 1996.
Extension subtag
Provides a mechanism for extending the langtag.
Syntax:
extension = singleton 1*("-" (2*8alphanum))
singleton = DIGIT / %x41-57 / %x59-5A / %x61-77 / %x79-7A
In practice the singleton you will meet is u (the Unicode locale extension); t (transformed content) is also registered.
Example: de-DE-u-co-phonebk requests that content be sorted using the German phone book collation.
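The effect of the u extension can be observed with the built-in Intl.Collator. This is a minimal sketch, assuming a runtime whose locale data includes the German phone book collation; the sample strings "ä" and "ad" are my own choice. In phone book order ä is treated like ae, so it sorts after ad, while the standard German order treats it as an accented a.

```javascript
const tag = "de-DE-u-co-phonebk";

// Intl.Locale reads the collation keyword out of the extension.
const requestedCollation = new Intl.Locale(tag).collation; // "phonebk"

const phonebook = new Intl.Collator(tag);
const standard = new Intl.Collator("de-DE");

// Negative: first argument sorts first. Positive: it sorts last.
const phonebookOrder = phonebook.compare("ä", "ad"); // positive (ä ~ ae)
const standardOrder = standard.compare("ä", "ad"); // negative (ä ~ a)

console.log({ requestedCollation, phonebookOrder, standardOrder });
```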