Unicode and JavaScript

Last Update:2017-09-28 Source: Internet

Author: User

Reference article:

http://www.ruanyifeng.com/blog/2014/12/unicode.html

Unicode comes from a very simple idea: include all the characters of the world in a single set. As long as a computer supports this character set, it can display every character, and garbled text disappears.

Unicode starts at 0 and assigns a number to each symbol; this number is called a code point. For example, the symbol at code point 0 is null, written U+0000.

The prefix U+ indicates that the hexadecimal number immediately following it is a Unicode code point.
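The U+ notation is easy to reproduce in code. The following is a minimal sketch; the helper name toUPlus is hypothetical and is introduced here only for illustration.

```javascript
// Hypothetical helper (not part of any standard API): format a code point
// number in the conventional U+ notation.
function toUPlus(codePoint) {
  // Code points are written in uppercase hexadecimal, padded to at least 4 digits.
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

console.log(toUPlus(0));       // U+0000
console.log(toUPlus(0x597D));  // U+597D
console.log(toUPlus(0x1D306)); // U+1D306
```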

The JavaScript language uses the Unicode character set, but it supports only one encoding method.

JavaScript uses UCS-2.

Because JavaScript handles only UCS-2, every character in the language is 2 bytes long. A 4-byte character is treated as two 2-byte characters, so JavaScript's character functions cannot return correct results for it.
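A short sketch of this behavior, using U+1D306 as an example of a 4-byte character. The variable name is illustrative only, and the character is written as its two 2-byte halves so the snippet stays plain ASCII.

```javascript
// U+1D306 written as the two 16-bit units that JavaScript actually stores.
const tetragram = '\uD834\uDF06';

// The single visible character is counted as two "characters".
console.log(tetragram.length); // 2

// charCodeAt only ever sees one 2-byte half at a time.
console.log(tetragram.charCodeAt(0).toString(16)); // d834
console.log(tetragram.charCodeAt(1).toString(16)); // df06

// charAt(0) returns only the first half, not the whole character.
console.log(tetragram.charAt(0) === tetragram); // false
```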

ECMAScript 6 (abbreviated ES6), published in 2015, greatly enhanced Unicode support and largely solves this problem.

(1) Correct character recognition

ES6 can automatically recognize 4-byte code points, which makes traversing a string much simpler.

for (let s of string) {
  // ...
}

However, to maintain compatibility, the length property keeps its original behavior. To get the correct length of a string, you can use the following method.

Array.from(string).length
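As a rough illustration of the difference, the sketch below traverses a string that contains one 4-byte character. The sample string and variable names are made up for the example.

```javascript
// 'a', then U+1D306 (stored as two 16-bit units), then 'b'.
const text = 'a\uD834\uDF06b';

// for...of walks the string by code point, so the 4-byte character
// arrives as one symbol instead of two halves.
const symbols = [];
for (const symbol of text) {
  symbols.push(symbol);
}

console.log(text.length);             // 4 (16-bit units)
console.log(symbols.length);          // 3 (code points)
console.log(Array.from(text).length); // 3 (same count, shorter form)
```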

(2) Code point notation

JavaScript allows a Unicode character to be written directly as its code point, in the form "backslash + u + code point".

'好' === '\u597D' // true

However, this notation does not work for 4-byte code points. ES6 fixes this problem: as long as the code point is placed in curly braces, it is recognized correctly.
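The sketch below contrasts the two notations. The variable names are illustrative, and U+1D306 is used as a sample 4-byte code point.

```javascript
// The classic escape takes exactly four hexadecimal digits.
const bmpChar = '\u597D';
console.log(bmpChar === '好'); // true

// With five digits, only the first four are consumed: this is U+1D30
// followed by the literal character '6', not U+1D306.
const misread = '\u1D306';
console.log(misread === '\u1D30' + '6'); // true

// The ES6 curly-brace form accepts the full code point.
const astral = '\u{1D306}';
console.log(astral === '\uD834\uDF06'); // true
console.log(astral.length);             // 2 (length still counts 16-bit units)
```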

(3) String processing functions

ES6 adds several functions dedicated to handling 4-byte code points.

String.fromCodePoint(): Returns the corresponding character from a Unicode code point
String.prototype.codePointAt(): Returns the corresponding code point from the character
String.prototype.at(): Returns the character at the given position of the string (not part of ES6; the at() method later standardized in ES2022 indexes 2-byte units and does not handle 4-byte code points)
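A brief sketch of the first two functions next to their older 2-byte counterparts, again using U+1D306 as the sample code point. The variable name is illustrative.

```javascript
// fromCodePoint builds the whole character from one code point.
const fromPoint = String.fromCodePoint(0x1D306);
console.log(fromPoint === '\uD834\uDF06'); // true

// The older fromCharCode keeps only the low 16 bits (0xD306),
// so it produces a different, unrelated character.
console.log(String.fromCharCode(0x1D306) === '\uD306'); // true

// codePointAt reads the full code point at an index...
console.log(fromPoint.codePointAt(0).toString(16)); // 1d306
// ...whereas charCodeAt returns only the first 2-byte half.
console.log(fromPoint.charCodeAt(0).toString(16));  // d834
```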

(4) Regular expressions

ES6 provides the u flag, which adds support for 4-byte code points to regular expressions.
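The effect of the flag can be seen in a small sketch like the following, which matches a sample 4-byte character (U+1D306) with and without it. The variable name is illustrative.

```javascript
const astralChar = '\u{1D306}';

// Without the u flag, "." matches a single 2-byte unit, so the
// 4-byte character does not fit the pattern "exactly one character".
console.log(/^.$/.test(astralChar));  // false

// With the u flag, "." matches a whole code point.
console.log(/^.$/u.test(astralChar)); // true

// The curly-brace escape is only understood inside a pattern with the u flag;
// without it, \u{61} means the letter "u" repeated 61 times.
console.log(/\u{61}/.test('a'));  // false
console.log(/\u{61}/u.test('a')); // true
```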

(5) Unicode Normalization

Some characters carry an additional symbol on top of the letter. For example, in the Ǒ of Hanyu Pinyin, the tone mark above the letter is an additional symbol. In many European languages, such diacritical marks are very important.

Unicode provides two ways to represent such characters. One is a single precomposed character, where one code point represents the whole character; for example, the code point of Ǒ is U+01D1. The other treats the additional symbol as a separate code point that is combined with the base character for display, so two code points represent one character; for example, Ǒ can be written as O (U+004F) + ˇ (U+030C).

// Method one
'\u01D1'
// 'Ǒ'

// Method two
'\u004F\u030C'
// 'Ǒ'

These two representations are visually and semantically identical and should be treated as equivalent. However, JavaScript cannot tell that they are.

'\u01D1' === '\u004F\u030C' // false

ES6 provides the normalize method, which performs "Unicode normalization" and converts both representations into the same sequence.

'\u01D1'.normalize() === '\u004F\u030C'.normalize() // true
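To make the comparison reusable, one possible approach is sketched below. The helper name sameText is hypothetical; the 'NFC' and 'NFD' arguments are the standard composed and decomposed normalization forms that normalize accepts.

```javascript
const precomposed = '\u01D1';     // one code point
const combining = '\u004F\u030C'; // base letter plus combining mark

// A plain comparison sees two different sequences.
console.log(precomposed === combining); // false

// NFC composes to the single code point; NFD decomposes to two.
console.log(combining.normalize('NFC').length);   // 1
console.log(precomposed.normalize('NFD').length); // 2

// Hypothetical helper: compare two strings after normalizing both
// (normalize() defaults to NFC when called without an argument).
function sameText(a, b) {
  return a.normalize() === b.normalize();
}

console.log(sameText(precomposed, combining)); // true
```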

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
