Unicode and JavaScript

Source: Internet
Author: User

Reference article:

http://www.ruanyifeng.com/blog/2014/12/unicode.html

Unicode comes from a very simple idea: include all the characters of the world in a single set. As long as a computer supports this character set, it can display every character, and garbled text is no longer a problem.

It starts at 0 and assigns a number to each symbol, which is called a code point.

For example, the symbol at code point 0 is null.

U+0000 = null

In this notation, U+ indicates that the hexadecimal number immediately following it is a Unicode code point.
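As a small illustration of the U+ notation, the sketch below formats a symbol's code point the way it is usually written. The helper name toUPlus is hypothetical, chosen only for this example, and the code assumes an ES6-capable engine.

```javascript
// Hypothetical helper: formats the first code point of a string as "U+XXXX".
function toUPlus(char) {
  const codePoint = char.codePointAt(0);
  // Code points are conventionally written with at least four hex digits.
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

const nullNotation = toUPlus('\u0000');      // 'U+0000'
const letterNotation = toUPlus('A');         // 'U+0041'
const astralNotation = toUPlus('\u{1D306}'); // 'U+1D306'

console.log(nullNotation, letterNotation, astralNotation);
```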

The JavaScript language uses the Unicode character set, but it supports only one encoding method.

JavaScript uses UCS-2.

Since JavaScript can only handle UCS-2 encoding, every character in the language is treated as 2 bytes, and a 4-byte character is treated as two double-byte characters. JavaScript's character functions are affected by this and cannot return correct results for such characters.
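The effect can be seen with a symbol outside the two-byte range. The sketch below uses U+1D306 (𝌆) as the sample character; that choice and the variable names are assumptions made for illustration.

```javascript
// U+1D306 (𝌆) needs 4 bytes, so it is stored as two 16-bit units.
const tetragram = '\uD834\uDF06'; // the same string as '𝌆'

const reportedLength = tetragram.length;    // 2, although there is one visible symbol
const firstUnit = tetragram.charCodeAt(0);  // 0xD834, the first half only
const secondUnit = tetragram.charCodeAt(1); // 0xDF06, the second half only
const firstChar = tetragram.charAt(0);      // a lone half, not a displayable character

console.log(reportedLength, firstUnit.toString(16), secondUnit.toString(16));
```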

ECMAScript 6 (abbreviated ES6), the next version of JavaScript, greatly enhances Unicode support and largely solves this problem.

(1) Correct character recognition

ES6 can automatically identify 4-byte code points. As a result, traversing strings is much simpler.

for (let s of string) {
  // ...
}

However, to maintain compatibility, the length property keeps its original behavior. To get the correct length of a string, you can use the following method.

Array.from(string).length
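To make the difference concrete, here is a short sketch that iterates over a string containing one 4-byte symbol and compares the two length measurements. The sample string is an assumption for illustration.

```javascript
const text = 'a\u{1D306}b'; // 'a', U+1D306 (𝌆), 'b'

// for...of walks the string symbol by symbol, keeping the 4-byte symbol whole.
const symbols = [];
for (const s of text) {
  symbols.push(s);
}

const unitLength = text.length;               // 4, counts 2-byte units
const symbolLength = Array.from(text).length; // 3, counts symbols

console.log(symbols.length, unitLength, symbolLength);
```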

(2) Code point notation

JavaScript allows a Unicode character to be written directly as its code point, in the form "backslash + u + code point".

'A' === '\u0041' // true

However, this notation is not valid for 4-byte code points. ES6 fixes this problem: as long as the code point is placed in curly braces, it is identified correctly.
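The following sketch contrasts the two notations for a 4-byte code point. U+1D306 is used as an assumed sample; the variable names are illustrative.

```javascript
const viaBraces = '\u{1D306}';        // ES6 curly-brace notation
const viaTwoEscapes = '\uD834\uDF06'; // the same symbol written as two 2-byte escapes
const misparsed = '\u1D306';          // read as '\u1D30' followed by the digit '6'

const sameString = viaBraces === viaTwoEscapes; // true
const misparsedLength = misparsed.length;       // 2
const misparsedTail = misparsed[1];             // '6'

console.log(sameString, misparsedLength, misparsedTail);
```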

(3) String processing functions

ES6 has added several functions that specialize in handling 4-byte code points.

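    • String.fromCodePoint(): Returns the corresponding character from a Unicode code point
    • String.prototype.codePointAt(): Returns the corresponding code point from the character
    • String.prototype.at(): Returns the character at the given position of the string (a proposal at the time of the reference article that was not included in ES6; the at() method later standardized in ES2022 indexes 2-byte units and does not handle 4-byte code points)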
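A brief sketch of the first two functions, next to their older counterparts for comparison. The sample code point U+1D306 and the variable names are assumptions for illustration.

```javascript
// Build the symbol from its code point and read the code point back.
const symbol = String.fromCodePoint(0x1D306);
const codePoint = symbol.codePointAt(0); // 0x1D306

// The older functions only see 2-byte units.
const codeUnit = symbol.charCodeAt(0);          // 0xD834, first half only
const truncated = String.fromCharCode(0x1D306); // value is cut to 16 bits: '\uD306'

console.log(codePoint.toString(16), codeUnit.toString(16), truncated === '\uD306');
```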

(4) Regular expressions

ES6 provides a u modifier that adds support for 4-byte code points to regular expressions.
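As a minimal sketch of what the u modifier changes, the dot below is asked to match exactly one character. The sample symbol U+1D306 is an assumption for illustration.

```javascript
const astral = '\u{1D306}'; // one 4-byte symbol

// Without u, the dot matches a single 2-byte unit, so the whole symbol does not fit.
const matchesWithoutU = /^.$/.test(astral); // false

// With u, the dot matches the whole code point.
const matchesWithU = /^.$/u.test(astral);   // true

console.log(matchesWithoutU, matchesWithU);
```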

(5) Unicode Normalization

Some characters carry an additional mark on top of the letter. For example, in the ǒ of Hanyu Pinyin, the tone mark above the letter is such an additional mark. In many European languages, these marks are very important.

Unicode provides two ways to represent such characters. One is a single character that already carries the additional mark, that is, one code point represents one character; for example, the code point of Ǒ is U+01D1. The other treats the additional mark as a separate code point that is displayed combined with the main character, that is, two code points represent one character; for example, Ǒ can be written as O (U+004F) + ˇ (U+030C).

// Method one
'\u01D1'
// 'Ǒ'

// Method two
'\u004F\u030C'
// 'Ǒ'

These two representations are equivalent both visually and semantically, and should be treated as the same. However, JavaScript cannot recognize this.

'\u01D1' === '\u004F\u030C' // false

ES6 provides a normalize method that performs "Unicode normalization", which turns both representations into the same sequence.

'\u01D1'.normalize() === '\u004F\u030C'.normalize() // true
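The normalize method also accepts a form name. The sketch below uses 'NFC' (composed, which is also the default) and 'NFD' (decomposed) to show the sequence each form produces; the variable names are illustrative.

```javascript
const precomposed = '\u01D1';      // Ǒ as a single code point
const combined = '\u004F\u030C';   // O followed by the combining mark

// Both inputs normalize to the same sequence in either form.
const equalInNFC = precomposed.normalize('NFC') === combined.normalize('NFC'); // true
const equalInNFD = precomposed.normalize('NFD') === combined.normalize('NFD'); // true

const composedLength = combined.normalize('NFC').length;      // 1
const decomposedLength = precomposed.normalize('NFD').length; // 2

console.log(equalInNFC, equalInNFD, composedLength, decomposedLength);
```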
