Computing the number of bytes in a string with JavaScript

Source: Internet
Author: User

A recent project needed JavaScript to calculate how much space a string occupies when it is written to localStorage. As is well known, JavaScript strings are Unicode. Unicode has several encoding forms, the most common of which are UTF-8 and UTF-16, so this article only discusses those two.

The following definition is excerpted from Wikipedia (http://zh.wikipedia.org/zh-cn/UTF-8) and has been partially abridged.

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode that can represent any character in the Unicode Standard, and its one-byte encodings remain compatible with ASCII. Each character is encoded using one to four bytes.

The encoding rules are as follows:

    1. Code points in the range 000000–00007F are encoded with one byte;
    2. Code points in the range 000080–0007FF are encoded with two bytes;
    3. Code points in the ranges 000800–00D7FF and 00E000–00FFFF are encoded with three bytes (note: Unicode has no characters in the range D800–DFFF);
    4. Code points in the range 010000–10FFFF are encoded with four bytes.
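The four ranges above can be sketched as a small lookup on a single code point. This is only an illustration: the helper name `utf8BytesForCodePoint` is hypothetical and does not appear in the original article, and the sample characters are chosen here to hit one range each.

```javascript
// Hypothetical helper (not from the original article): number of UTF-8 bytes
// needed for one Unicode code point, following the four ranges listed above.
function utf8BytesForCodePoint(codePoint) {
    if (codePoint <= 0x7f) return 1;    // 000000-00007F
    if (codePoint <= 0x7ff) return 2;   // 000080-0007FF
    if (codePoint <= 0xffff) return 3;  // 000800-00FFFF
    return 4;                           // 010000-10FFFF
}

console.log(utf8BytesForCodePoint('A'.codePointAt(0)));         // 1 (U+0041)
console.log(utf8BytesForCodePoint('\u00e9'.codePointAt(0)));    // 2 (U+00E9)
console.log(utf8BytesForCodePoint('\u4e2d'.codePointAt(0)));    // 3 (U+4E2D)
console.log(utf8BytesForCodePoint('\u{1F600}'.codePointAt(0))); // 4 (U+1F600)
```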

UTF-16 is also a variable-length encoding, although most characters are encoded with two bytes; characters whose code point exceeds 65535 are encoded with four bytes, as follows:

    1. 000000–00FFFF: two bytes;
    2. 010000–10FFFF: four bytes.
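In JavaScript these two cases show up directly in `length`, which counts 16-bit code units rather than characters. The sketch below assumes nothing beyond standard string methods; `utf16ByteLength` is a hypothetical name used for illustration.

```javascript
// Hypothetical helper: every UTF-16 code unit takes two bytes, and
// String.prototype.length counts code units, so the byte count is length * 2.
function utf16ByteLength(str) {
    return str.length * 2;
}

var bmp = '\u4e2d';       // U+4E2D, inside 000000-00FFFF
var astral = '\u{1F600}'; // U+1F600, inside 010000-10FFFF

console.log(bmp.length, utf16ByteLength(bmp));       // 1 2
console.log(astral.length, utf16ByteLength(astral)); // 2 4

// charCodeAt sees only the first half of the surrogate pair,
// while codePointAt (ES2015) returns the full code point.
console.log(astral.charCodeAt(0).toString(16));  // d83d
console.log(astral.codePointAt(0).toString(16)); // 1f600
```

Note that `charCodeAt` never returns a value above 0xFFFF, so a byte counter built on it has to treat surrogate pairs explicitly.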

At first I assumed that, since the page is UTF-8 encoded, strings in localStorage would also be stored as UTF-8. But testing showed that a string whose computed size was under 5MB still threw an exception when written to localStorage. On reflection, a page's encoding can change; if localStorage stored strings in the page's encoding, wouldn't that be a mess? Browsers should therefore all store them as UTF-16. When the size was computed as UTF-16, a 5MB string was written successfully, and anything larger failed.
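The observation above can be illustrated with a rough size check done before writing. This is a minimal sketch under stated assumptions: the 5MB limit is the figure from the article, not a guaranteed value for every browser; counting the key along with the value is an assumption made here; and `fitsInStorageQuota` is a hypothetical helper, not a browser API.

```javascript
// Assumed limit, taken from the 5MB figure in the article; real quotas vary.
var QUOTA_BYTES = 5 * 1024 * 1024;

// Hypothetical helper: estimates the UTF-16 size of a key/value pair
// (two bytes per code unit) and compares it with the given quota.
function fitsInStorageQuota(key, value, quotaBytes) {
    var estimatedBytes = (key.length + value.length) * 2;
    return estimatedBytes <= quotaBytes;
}

// 3 * 1024 * 1024 ASCII characters: 3MB as UTF-8, but 6MB as UTF-16.
var value = 'a'.repeat(3 * 1024 * 1024);
console.log(fitsInStorageQuota('k', value, QUOTA_BYTES)); // false
```

A string like this looks safely under the limit when measured as UTF-8, which matches the failure described above.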

Finally, here is the implementation. The calculation rules are those given above; for speed, the two for loops are written separately.

    /**
     * Calculates the number of bytes a string occupies in memory.
     * UTF-8 is used by default; UTF-16 can be selected instead.
     *
     * UTF-8 is a variable-length Unicode encoding that uses one to four bytes per character:
     *
     * 000000 - 00007F (128 codes)                      0zzzzzzz (00-7F)                             one byte
     * 000080 - 0007FF (1,920 codes)                    110yyyyy (C0-DF) 10zzzzzz (80-BF)            two bytes
     * 000800 - 00D7FF, 00E000 - 00FFFF (61,440 codes)  1110xxxx (E0-EF) 10yyyyyy 10zzzzzz           three bytes
     * 010000 - 10FFFF (1,048,576 codes)                11110www (F0-F7) 10xxxxxx 10yyyyyy 10zzzzzz  four bytes
     *
     * Note: Unicode has no characters in the range D800-DFFF.
     * {@link http://zh.wikipedia.org/wiki/UTF-8}
     *
     * UTF-16 uses two bytes for most characters and four bytes for code points above 65535:
     *
     * 000000 - 00FFFF  two bytes
     * 010000 - 10FFFF  four bytes
     *
     * {@link http://zh.wikipedia.org/wiki/UTF-16}
     *
     * @param  {String} str
     * @param  {String} charset utf-8, utf-16
     * @return {Number}
     */
    var sizeof = function (str, charset) {
        var total = 0, charCode, i, len;

        charset = charset ? charset.toLowerCase() : '';

        // codePointAt (ES2015) is used instead of charCodeAt: charCodeAt returns
        // 16-bit code units and never exceeds 0xFFFF, so the four-byte branches
        // would never run and a surrogate pair would be counted twice.
        if (charset === 'utf-16' || charset === 'utf16') {
            for (i = 0, len = str.length; i < len; i++) {
                charCode = str.codePointAt(i);
                if (charCode <= 0xffff) {
                    total += 2;
                } else {
                    total += 4;
                    i++; // skip the second half of the surrogate pair
                }
            }
        } else {
            for (i = 0, len = str.length; i < len; i++) {
                charCode = str.codePointAt(i);
                if (charCode <= 0x007f) {
                    total += 1;
                } else if (charCode <= 0x07ff) {
                    total += 2;
                } else if (charCode <= 0xffff) {
                    total += 3;
                } else {
                    total += 4;
                    i++; // skip the second half of the surrogate pair
                }
            }
        }

        return total;
    };
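As an independent check on a hand-written counter such as `sizeof`, the platform's own encoder can be used for the UTF-8 case. The sketch below assumes an environment where `TextEncoder` is available as a global (modern browsers and recent Node.js versions); `utf8ByteLength` is a hypothetical name, and this is not the article's method.

```javascript
// Hypothetical helper: lets the built-in TextEncoder do the UTF-8 encoding
// and simply counts the resulting bytes.
function utf8ByteLength(str) {
    return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength('abc'));          // 3 (three one-byte characters)
console.log(utf8ByteLength('\u4e2d\u6587')); // 6 (two three-byte characters)
console.log(utf8ByteLength('\u{1F600}'));    // 4 (one four-byte character)
```

Comparing both results on strings that contain characters above U+FFFF is a quick way to confirm that surrogate pairs are counted as four bytes rather than six.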
Reprinted from AlloyTeam: http://www.alloyteam.com/2013/12/js-calculate-the-number-of-bytes-occupied-by-a-string/
