How to determine whether a character occupies two bytes in JavaScript

Source: Internet
Author: User
This problem comes up often: the length of a string differs from the number of bytes it actually occupies, because the string contains Chinese characters or full-width characters.

(1) The escape function can be used to obtain the hexadecimal Unicode code of a character, in the form:
%uXXXX (when the code is two hex digits long, the u is omitted and the form is %XX; when the u is present, the code is padded with leading zeros to four digits)
The unescape function returns the character corresponding to a hexadecimal code.
Output on a web page: you can also write &#decimal-code; to output the character corresponding to a specific code.
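A minimal sketch of point (1), for illustration only. escape and unescape are legacy globals that are now deprecated, and the helper name isWideByEscape is hypothetical, not something from the original post. It assumes its argument is a single character.

```javascript
// escape()/unescape() are deprecated legacy globals; used here only to
// inspect the code of a character, as described in point (1).
console.log(escape("é"));       // "%E9"    (code fits in two hex digits)
console.log(escape("中"));      // "%u4E2D" (four hex digits, with the u)
console.log(unescape("%u4E2D")); // "中"

// Hypothetical helper: true when escape() needs the %uXXXX form,
// i.e. the character's code is above 0xFF.
function isWideByEscape(ch) {
  return escape(ch).startsWith("%u");
}

console.log(isWideByEscape("A"));  // false
console.log(isWideByEscape("中")); // true
```

Note that escape leaves letters and digits unchanged, so escape("A") is simply "A".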

(2) In JavaScript, you can also call string.charCodeAt(i) on a string to obtain the decimal Unicode code of the character at index i.

You can use String.fromCharCode(decimal code) to obtain the character corresponding to that code.
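As a short illustration of point (2), the sketch below reads the decimal code of each character and converts a code back to a character. The sample string and the helper name charCodes are assumptions chosen for the example.

```javascript
// Hypothetical helper: collect the decimal code of every UTF-16 code unit.
function charCodes(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

console.log(charCodes("A中"));           // [ 65, 20013 ]
console.log(String.fromCharCode(20013)); // "中"
console.log(String.fromCharCode(65));    // "A"
```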

However, I still have not found a reliable way to determine how many bytes a character occupies.

Some people say that a character whose decimal Unicode code is greater than 255 occupies two bytes. I don't know whether there is any evidence for this.
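One point that may help here: the number of bytes depends on the encoding used to store or transmit the text, so there is no single answer per character. As a hedged sketch, the function below counts bytes under UTF-8 only, using the standard TextEncoder global (available in modern browsers and Node.js). The name utf8ByteLength is hypothetical. Under a double-byte encoding such as GBK, the counts would differ.

```javascript
// Count the bytes a string occupies when encoded as UTF-8.
// TextEncoder always produces UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength("A"));  // 1
console.log(utf8ByteLength("é"));  // 2
console.log(utf8ByteLength("中")); // 3
```

So in UTF-8 a Chinese character takes three bytes, and a character in the 128-255 range already takes two, which is why the "greater than 255 means two bytes" rule only fits particular legacy encodings.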

(3) In VBScript, I check whether the ASC code is negative to decide whether a character occupies two bytes. Is this correct? (Asc() returns the ASC code of a character; Chr() returns the character corresponding to an ASC code.)

(4) How can I obtain the ASC code in JavaScript? Any help would be appreciated.
Is the Unicode code the same as the ASC code? I don't think so.

(5) I found that ASC codes generally range from 32 to 128, and beyond that they are negative. What about 0 to 31? And what comes after 129? I don't know.
Characters with a negative ASC code are the ones we all assume occupy two bytes: Chinese characters, full-width characters, and special characters.
In the lower range, the Unicode code is the same as the ASC code. From Unicode 129 upward, however, some characters have a negative ASC code and some have no ASC code at all. So is it reasonable to assume that characters with a Unicode code above 129 occupy two bytes?

That is, the two criteria for a double-byte character conflict. If a negative ASC code is the test, then characters with a Unicode code above 129 would count as double-byte characters. But what is circulating on the Internet is
String.prototype.len = function ()
{
    // Replace every character outside 0x00-0xFF with two placeholders, then count.
    return this.replace(/[^\x00-\xFF]/g, "**").length;
}
That is, characters with a code greater than 255 are considered double-byte characters, while those from 129 to 255 are considered single-byte characters.

Let's discuss it.

 
