Computing the number of bytes in a string with JS

Source: Internet
Author: User

A recent project needed to use JS to calculate the size, in bytes, of strings written to localStorage. As is well known, JS strings are Unicode. Unicode has several encoding forms, the most widely used being UTF-8 and UTF-16, so this article discusses only those two.

The following definitions are excerpted from Wikipedia (http://zh.wikipedia.org/zh-cn/UTF-8) and have been partially abridged.

Originally from: http://www.alloyteam.com/2013/12/js-calculate-the-number-of-bytes-occupied-by-a-string/

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode. It can represent any character in the Unicode Standard, and its single-byte encodings remain compatible with ASCII. Each character is encoded using one to four bytes.

The encoding rules are as follows:

Code points in the range 000000–00007F are encoded with one byte;

Code points in the range 000080–0007FF are encoded with two bytes;

Code points in the ranges 000800–00D7FF and 00E000–00FFFF are encoded with three bytes (note: Unicode assigns no characters in the range D800–DFFF);

Code points in the range 010000–10FFFF are encoded with four bytes.
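The ranges above can be turned directly into a small helper. This is a minimal sketch, not the article's implementation; the names `utf8BytesForCodePoint` and `utf8ByteLength` are made up for illustration, and the code assumes an ES2015+ environment so that `for...of` iterates a string by code point.

```javascript
// Number of UTF-8 bytes needed for one Unicode code point,
// following the ranges listed above.
function utf8BytesForCodePoint(codePoint) {
    if (codePoint <= 0x7f) return 1;
    if (codePoint <= 0x7ff) return 2;
    if (codePoint <= 0xffff) return 3;
    return 4;
}

// Sum the byte counts over a string. for...of walks the string by
// code point, so a surrogate pair is visited once, not twice.
function utf8ByteLength(str) {
    let total = 0;
    for (const ch of str) {
        total += utf8BytesForCodePoint(ch.codePointAt(0));
    }
    return total;
}

console.log(utf8ByteLength('A'));         // 1
console.log(utf8ByteLength('\u00e9'));    // 2 (U+00E9)
console.log(utf8ByteLength('\u4e2d'));    // 3 (U+4E2D)
console.log(utf8ByteLength('\u{1F600}')); // 4 (U+1F600)
```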

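UTF-16 is also a variable-length character encoding: most characters are encoded with two bytes, and characters whose code point exceeds 65535 (0xFFFF) are encoded with four bytes, as follows:

Code points in the range 000000–00FFFF are encoded with two bytes;

Code points in the range 010000–10FFFF are encoded with four bytes.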
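The UTF-16 rule can be sketched the same way. The function name `utf16ByteLength` is hypothetical. Because a JS string's `length` counts UTF-16 code units, the result should also equal `str.length * 2`, which the last line prints as a check.

```javascript
// UTF-16 byte count: 2 bytes for code points up to 0xFFFF,
// 4 bytes (a surrogate pair) for anything above.
function utf16ByteLength(str) {
    let total = 0;
    for (const ch of str) {
        total += ch.codePointAt(0) <= 0xffff ? 2 : 4;
    }
    return total;
}

const sample = 'a\u4e2d\u{1F600}'; // 'a', a CJK character, an emoji
console.log(utf16ByteLength(sample));                      // 8
console.log(utf16ByteLength(sample) === sample.length * 2); // true
```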

At first I assumed that, since the page is UTF-8 encoded, strings in localStorage would also be stored as UTF-8. But testing showed that writing a string whose computed size was still under 5MB threw an exception. Thinking it over, a page's encoding can change; if localStorage stored strings according to the page's encoding, wouldn't that be a mess? Browsers presumably all store them as UTF-16. Computing sizes as UTF-16 bore this out: a 5MB string was written without trouble, and anything larger failed.

Here is the code. The calculation rules are given above; for speed, the two for loops are written separately.
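Following the UTF-16 observation, the space already used in a store can be roughly estimated. This is only a sketch under stated assumptions: it assumes two bytes per UTF-16 code unit and that keys count as well as values, while actual quota accounting is browser-specific. `estimateStorageBytes` and `createFakeStorage` are hypothetical names; the fake storage exists only so the example runs outside a browser, where `window.localStorage` would be passed instead.

```javascript
// Estimate the UTF-16 byte size of everything in a Storage-like object.
// `storage` is assumed to expose length, key(i) and getItem(key),
// as window.localStorage does.
function estimateStorageBytes(storage) {
    let total = 0;
    for (let i = 0; i < storage.length; i++) {
        const key = storage.key(i);
        const value = storage.getItem(key) || '';
        total += (key.length + value.length) * 2;
    }
    return total;
}

// Hypothetical in-memory stand-in for localStorage (illustration only).
function createFakeStorage(entries) {
    const keys = Object.keys(entries);
    return {
        get length() { return keys.length; },
        key: function (i) { return keys[i]; },
        getItem: function (k) { return entries[k]; }
    };
}

const fake = createFakeStorage({ name: 'abc', city: '\u5317\u4eac' });
console.log(estimateStorageBytes(fake)); // (4 + 3 + 4 + 2) * 2 = 26
```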

/**
 * Calculate the number of bytes a string occupies in memory. UTF-8 is used
 * by default; UTF-16 can be specified instead.
 * UTF-8 is a variable-length Unicode encoding that uses one to four bytes
 * per character.
 *
 * 000000 - 00007F (128 code points)      0zzzzzzz(00-7F)                             one byte
 * 000080 - 0007FF (1920 code points)     110yyyyy(C0-DF) 10zzzzzz(80-BF)             two bytes
 * 000800 - 00D7FF
 * 00E000 - 00FFFF (61440 code points)    1110xxxx(E0-EF) 10yyyyyy 10zzzzzz           three bytes
 * 010000 - 10FFFF (1048576 code points)  11110www(F0-F7) 10xxxxxx 10yyyyyy 10zzzzzz  four bytes
 *
 * Note: Unicode has no characters in the range D800-DFFF.
 * {@link http://zh.wikipedia.org/wiki/UTF-8}
 *
 * UTF-16 mostly uses two bytes; code points above 65535 use four bytes.
 * 000000 - 00FFFF  two bytes
 * 010000 - 10FFFF  four bytes
 *
 * {@link http://zh.wikipedia.org/wiki/UTF-16}
 * @param  {String} str
 * @param  {String} charset utf-8, utf-16
 * @return {Number}
 */
var sizeof = function (str, charset) {
    var total = 0,
        codePoint,
        i,
        len;

    charset = charset ? charset.toLowerCase() : '';

    // charCodeAt() returns UTF-16 code units and never exceeds 0xffff, so
    // the four-byte branches would be unreachable with it. codePointAt()
    // (ES2015) returns the full code point for a surrogate pair.
    if (charset === 'utf-16' || charset === 'utf16') {
        for (i = 0, len = str.length; i < len; i++) {
            codePoint = str.codePointAt(i);
            if (codePoint <= 0xffff) {
                total += 2;
            } else {
                total += 4;
                i++; // skip the low surrogate of the pair
            }
        }
    } else {
        for (i = 0, len = str.length; i < len; i++) {
            codePoint = str.codePointAt(i);
            if (codePoint <= 0x007f) {
                total += 1;
            } else if (codePoint <= 0x07ff) {
                total += 2;
            } else if (codePoint <= 0xffff) {
                total += 3;
            } else {
                total += 4;
                i++; // skip the low surrogate of the pair
            }
        }
    }

    return total;
};
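In environments that provide the standard `TextEncoder` API (modern browsers, recent Node.js), hand-written counts can be cross-checked against the platform's own encoder. This is an alternative technique, not the approach used above, and the helper names `utf8ByteLengthViaEncoder` and `utf16ByteLengthViaLength` are made up for illustration.

```javascript
// UTF-8 size: let the built-in encoder produce the bytes and count them.
function utf8ByteLengthViaEncoder(str) {
    return new TextEncoder().encode(str).length;
}

// UTF-16 size: a JS string's length is its number of 16-bit code units.
function utf16ByteLengthViaLength(str) {
    return str.length * 2;
}

console.log(utf8ByteLengthViaEncoder('hello'));        // 5
console.log(utf8ByteLengthViaEncoder('\u4e2d\u6587')); // 6
console.log(utf8ByteLengthViaEncoder('\u{1F600}'));    // 4
console.log(utf16ByteLengthViaLength('\u{1F600}'));    // 4
```

Encoding the whole string allocates a byte array, so for very large strings a counting loop may be cheaper in memory; the encoder is mainly useful as a reference for testing.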
