Effective JavaScript Item 7: String Encoding


This series is my reading notes on Effective JavaScript.

When it comes to Unicode, many programmers find it cumbersome, but in essence Unicode is not complicated. Every character in every writing system in the world is represented by an integer value in the range 0 to 1,114,111. In Unicode terminology, this value is called a code point. In the way it maps characters to integer values, Unicode is no different from other encodings such as ASCII.
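A small illustration of the character-to-integer mapping, assuming an ES2015+ engine (for example a recent Node.js) where `codePointAt` and `String.fromCodePoint` are built in. It shows that each character has one integer code point and that the largest valid code point is 0x10FFFF, i.e. 1,114,111.

```javascript
// Every character corresponds to one integer code point.
const letterA = "A".codePointAt(0);     // 65, the same value ASCII uses
const euroCp = "\u20AC".codePointAt(0); // 8364 (EURO SIGN, U+20AC)

// The largest code point Unicode defines is U+10FFFF.
const MAX_CODE_POINT = 0x10ffff;        // 1114111 in decimal
const last = String.fromCodePoint(MAX_CODE_POINT);

// One past the top of the range is not a code point, so this throws.
let rejected = false;
try {
  String.fromCodePoint(MAX_CODE_POINT + 1);
} catch (e) {
  rejected = e instanceof RangeError;
}

console.log(letterA, euroCp, MAX_CODE_POINT, last.codePointAt(0), rejected);
// 65 8364 1114111 1114111 true
```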

However, Unicode has several encodings, whereas ASCII has only one:

Character set    Encodings
ASCII            ASCII
Unicode          UTF-8, UTF-16, UTF-32, etc.
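To make the table concrete, here is a sketch that encodes the same code point in UTF-8, UTF-16, and UTF-32. It assumes a runtime with a global `TextEncoder` (modern browsers, Node.js 11+), which always produces UTF-8; the helper names `encodings` and `hex` are made up for this note.

```javascript
// Encode the first character of ch three ways. The code point is the same;
// only the unit size and unit count differ between the encodings.
function encodings(ch) {
  const cp = ch.codePointAt(0);
  const single = String.fromCodePoint(cp); // just that one character
  // UTF-8: 1 to 4 bytes per code point (TextEncoder always emits UTF-8).
  const utf8 = Array.from(new TextEncoder().encode(single));
  // UTF-16: 1 or 2 16-bit code units, exactly what a JavaScript string stores.
  const utf16 = [];
  for (let i = 0; i < single.length; i++) utf16.push(single.charCodeAt(i));
  // UTF-32: always one 32-bit unit whose value is the code point itself.
  const utf32 = [cp];
  return { utf8, utf16, utf32 };
}

const hex = (units) => units.map((u) => u.toString(16).toUpperCase()).join(" ");

const euroEnc = encodings("\u20AC");    // EURO SIGN, U+20AC
const grinEnc = encodings("\u{1F600}"); // U+1F600, outside the BMP

for (const e of [euroEnc, grinEnc]) {
  console.log("UTF-8:", hex(e.utf8), "| UTF-16:", hex(e.utf16), "| UTF-32:", hex(e.utf32));
}
// UTF-8: E2 82 AC | UTF-16: 20AC | UTF-32: 20AC
// UTF-8: F0 9F 98 80 | UTF-16: D83D DE00 | UTF-32: 1F600
```

UTF-8 is compact for ASCII-heavy text, UTF-32 makes every code point the same size, and UTF-16 sits in between; that difference in time and space cost is why several encodings coexist.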

So why does Unicode have so many encodings? Because different situations place different demands on time and space.

When Unicode was first designed, its authors estimated that all code points would fit in 2^16, i.e. 65,536, values. This led to UCS-2, the original 16-bit encoding of Unicode. In UCS-2, every code point is represented by a single 16-bit value, called a code unit. The advantage of this representation is that indexing into a Unicode string takes constant time, because every character occupies exactly 16 bits, i.e. 2 bytes.
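The constant-time indexing argument can be sketched with a typed array standing in for a UCS-2 buffer. `toUcs2` and `ucs2CharAt` are hypothetical helpers for illustration, not a real API; the point is that the byte offset of character i is simply 2 * i, with no scan over earlier characters.

```javascript
// Hypothetical UCS-2 buffer: one 16-bit code unit per character,
// so a string of n characters occupies exactly 2n bytes.
function toUcs2(str) {
  const units = new Uint16Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A surrogate code unit means the string needs more than UCS-2 can express.
    if (unit >= 0xd800 && unit <= 0xdfff) {
      throw new RangeError("not representable in UCS-2");
    }
    units[i] = unit;
  }
  return units;
}

// Constant-time lookup: character i always starts at byte offset 2 * i,
// whatever characters precede it, so no scanning is needed.
function ucs2CharAt(units, i) {
  const byteOffset = units.byteOffset + i * Uint16Array.BYTES_PER_ELEMENT;
  const [unit] = new Uint16Array(units.buffer, byteOffset, 1);
  return String.fromCharCode(unit);
}

const buf = toUcs2("h\u00e9llo");
console.log(buf.byteLength);     // 10
console.log(ucs2CharAt(buf, 1)); // é
```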

Because this encoding is so convenient, some platforms, such as Java and JavaScript, adopted it. As a result, each element of a JavaScript string is a 16-bit (2-byte) value.

As the Unicode character set grew, 65,536 values were no longer enough; Unicode now contains far more than 2^16 characters. Its code space was therefore expanded into 17 subranges of 2^16 code points each (17 × 2^16 = 1,114,112, which is why the current code point range is 0 to 1,114,111).

The first subrange holds the characters of the original UCS-2 set and is known as the Basic Multilingual Plane (BMP). The remaining subranges are called the supplementary planes.
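The plane structure described above reduces to simple arithmetic: a code point's plane is the code point divided by 2^16. A minimal sketch, with `planeOf` and `isBMP` as hypothetical helper names:

```javascript
// Each plane holds 2^16 = 65,536 code points; there are 17 planes.
const PLANE_SIZE = 0x10000;
const PLANE_COUNT = 17;

// Plane number 0..16 is the code point divided by 2^16 (same as cp >>> 16).
function planeOf(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + cp);
  }
  return Math.floor(cp / PLANE_SIZE);
}

// Plane 0 is the Basic Multilingual Plane; planes 1..16 are supplementary.
const isBMP = (cp) => planeOf(cp) === 0;

console.log(PLANE_COUNT * PLANE_SIZE);         // 1114112
console.log(planeOf(0x41), isBMP(0x41));       // 0 true   ("A")
console.log(planeOf(0x1f600), isBMP(0x1f600)); // 1 false  (U+1F600)
console.log(planeOf(0x10ffff));                // 16, the last plane
```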

To represent the additional characters, UCS-2's successor, UTF-16, is designed as follows:

Characters whose code point is greater than or equal to 65,536 are represented by a pair of 16-bit code units, called a surrogate pair. Characters whose code point is less than 65,536 still need only a single 16-bit code unit. UTF-16 is therefore a variable-length encoding, so indexing by code point is no longer a constant-time operation: it usually requires scanning forward from the beginning of the string.
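The UTF-16 rule above can be written out as a short sketch: subtract 0x10000, then split the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits); together they form the surrogate pair. `toUtf16Units` and `fromSurrogatePair` are hypothetical names; the last line cross-checks the result against the engine's built-in `String.fromCodePoint` (ES2015+).

```javascript
// UTF-16 rule: BMP code points use one code unit; the rest use a surrogate pair.
function toUtf16Units(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("invalid code point: " + cp);
  }
  if (cp < 0x10000) return [cp];         // one code unit
  const offset = cp - 0x10000;           // 20 bits: 0x00000..0xFFFFF
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // low 10 bits -> low surrogate
  return [high, low];
}

// Reverse direction: rebuild the code point from a surrogate pair.
function fromSurrogatePair(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

const pair = toUtf16Units(0x1f600);
console.log(pair.map((u) => u.toString(16)).join(" "));     // d83d de00
console.log(fromSurrogatePair(...pair).toString(16));        // 1f600
// Cross-check against the engine's own UTF-16 encoding.
console.log(String.fromCharCode(...pair) === String.fromCodePoint(0x1f600)); // true
```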

In JavaScript, the string length property and the charAt and charCodeAt methods all work in terms of code units rather than code points. So whenever a JavaScript string needs to represent a code point from a supplementary plane, it uses two code units. In short:

A JavaScript string is a sequence of 16-bit code units.
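A quick check of that statement: one character outside the BMP counts as two elements for length, charAt, and charCodeAt. The `\u{...}` escape is ES2015 syntax used only to write the literal; the behavior of the three members shown is the same in older engines.

```javascript
const mixed = "a\u{1F600}b"; // "a", U+1F600 (outside the BMP), "b"

// length counts 16-bit code units, not characters: 1 + 2 + 1.
console.log(mixed.length);                     // 4

// charCodeAt returns one half of the surrogate pair, never the code point.
console.log(mixed.charCodeAt(1).toString(16)); // d83d
console.log(mixed.charCodeAt(2).toString(16)); // de00

// charAt(1) is a lone high surrogate, not the emoji,
// and "b" has been pushed from index 2 to index 3.
console.log(mixed.charAt(1) === "\u{1F600}");  // false
console.log(mixed.charAt(3));                  // b
```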

So strings containing code points outside the BMP can cause problems, because you cannot rely on the length property or on the charAt and charCodeAt methods. In that case, consider using a mature third-party library.
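Effective JavaScript predates ES2015, which added some code-point-aware built-ins: `codePointAt`, `String.fromCodePoint`, and string iteration (used by `for...of` and `Array.from`). The sketch below assumes an ES2015+ engine and uses those built-ins instead of a third-party library; `nthCodePoint` is a hypothetical helper whose loop shows why code-point indexing in UTF-16 takes linear time.

```javascript
// Hypothetical helper: return the n-th code point (not code unit) of str.
// UTF-16 is variable-length, so we must walk from the start of the string.
function nthCodePoint(str, n) {
  let count = 0;
  for (let i = 0; i < str.length; ) {
    const cp = str.codePointAt(i);      // combines a surrogate pair if present
    if (count === n) return String.fromCodePoint(cp);
    i += cp > 0xffff ? 2 : 1;           // skip one or two code units
    count++;
  }
  return undefined;                     // n is past the end
}

const text = "a\u{1F600}b";
// The string iterator yields code points, so Array.from splits correctly.
const codePoints = Array.from(text);    // ["a", "\u{1F600}", "b"]
console.log(text.length, codePoints.length);       // 4 3
console.log(nthCodePoint(text, 1) === "\u{1F600}"); // true
console.log(nthCodePoint(text, 2));                 // b
```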

Summary:

  1. JavaScript strings consist of 16-bit code units, not Unicode code points.
  2. JavaScript represents code points of 65,536 and above with two code units, known as a surrogate pair.
  3. Surrogate pairs throw off the results of length, charAt, and charCodeAt.
  4. When processing strings that contain code points above 65,535, consider using a third-party library, and consult its documentation for how it handles the full range of code points.

