JS computes the number of bytes in a string

Last Update:2015-06-09 Source: Internet

Author: User


A recent project needed to use JS to calculate how many bytes a string occupies once it is written to localStorage. As is well known, JS strings are Unicode. Unicode has several encoding forms, the most common being UTF-8 and UTF-16, so this article only discusses those two.

The following definitions are excerpted from Wikipedia (http://zh.wikipedia.org/zh-cn/UTF-8) and have been partially abridged.

Originally from: http://www.alloyteam.com/2013/12/js-calculate-the-number-of-bytes-occupied-by-a-string/

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode that can represent any character in the Unicode Standard. Its single-byte encodings are compatible with ASCII. Each character is encoded using one to four bytes.

The encoding rules are as follows:

Code points 000000–00007F are encoded with one byte;

Code points 000080–0007FF are encoded with two bytes;

Code points 000800–00D7FF and 00E000–00FFFF are encoded with three bytes (note: Unicode has no characters in the range D800–DFFF);

Code points 010000–10FFFF are encoded with four bytes.
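
The ranges above can be sketched as a small lookup. This is only an illustration: the function name utf8BytesForCodePoint is hypothetical, and it assumes its argument is a valid Unicode code point rather than a UTF-16 code unit.

```javascript
// Hypothetical helper: bytes needed to encode one Unicode code point in UTF-8.
// Assumes codePoint is a valid code point in the range 0x0 to 0x10ffff.
function utf8BytesForCodePoint(codePoint) {
  if (codePoint <= 0x7f) return 1;    // 000000-00007F
  if (codePoint <= 0x7ff) return 2;   // 000080-0007FF
  if (codePoint <= 0xffff) return 3;  // 000800-00FFFF
  return 4;                           // 010000-10FFFF
}

console.log(utf8BytesForCodePoint(0x41));    // 1 (Latin capital A)
console.log(utf8BytesForCodePoint(0xe9));    // 2 (e with acute accent)
console.log(utf8BytesForCodePoint(0x4e2d));  // 3 (a CJK ideograph)
console.log(utf8BytesForCodePoint(0x1f600)); // 4 (an emoji)
```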

UTF-16 is also a variable-length encoding: most characters use two bytes, and characters whose code point exceeds 65535 use four bytes, as follows:

Code points 000000–00FFFF are encoded with two bytes;

Code points 010000–10FFFF are encoded with four bytes.
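
A minimal sketch of the UTF-16 rule follows. The helper names are hypothetical. Because a JS string is indexed by 16-bit code units, the same total can also be obtained as str.length * 2, which the last line illustrates.

```javascript
// Hypothetical helper: bytes needed to encode one code point in UTF-16.
function utf16BytesForCodePoint(codePoint) {
  return codePoint <= 0xffff ? 2 : 4;
}

// Sum the per-code-point cost over a whole string.
function utf16ByteLength(str) {
  let total = 0;
  for (const ch of str) { // for...of iterates by code point, not by code unit
    total += utf16BytesForCodePoint(ch.codePointAt(0));
  }
  return total;
}

// One ASCII letter, one CJK ideograph, one emoji (code point above 0xffff).
const sample = 'a\u4e2d\u{1F600}';
console.log(utf16ByteLength(sample)); // 8
console.log(sample.length * 2);       // 8
```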

At first I assumed that, since the page is encoded in UTF-8, strings stored in localStorage would also be encoded in UTF-8. But testing showed that a string whose calculated size was less than 5MB still threw an exception when written to localStorage. On reflection, a page's encoding can change; if localStorage stored strings according to the page's encoding, it would be a mess. Browsers must store them as UTF-16. When the 5MB limit was calculated using UTF-16, a string within the limit was written successfully and one over it failed.
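
The observation above can be turned into a rough pre-check. This is a sketch under assumptions, not a description of any browser's actual accounting: it assumes two bytes per UTF-16 code unit and that the key counts toward the quota along with the value. The names estimateUtf16Bytes and tryStore are hypothetical, and the real limit and what counts toward it vary by browser.

```javascript
// Hypothetical estimate: two bytes per UTF-16 code unit for key and value.
// Whether the key counts toward the quota is an assumption.
function estimateUtf16Bytes(key, value) {
  return (key.length + value.length) * 2;
}

// Attempts the write and reports whether it succeeded. Browsers throw an
// exception from setItem when the quota is exceeded.
function tryStore(storage, key, value) {
  try {
    storage.setItem(key, value);
    return true;
  } catch (e) {
    return false;
  }
}

// In a browser, usage might look like:
//   var value = new Array(1025).join('x');           // 1024 characters
//   estimateUtf16Bytes('demo', value);               // (4 + 1024) * 2
//   tryStore(window.localStorage, 'demo', value);
```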

Here is the code. The calculation rules are described above; for speed, the two for loops are written separately.

/**
 * Calculates the number of bytes a string occupies in memory. UTF-8 is used
 * by default; UTF-16 can also be specified.
 * UTF-8 is a variable-length Unicode encoding that uses one to four bytes per character.
 *
 * 000000 - 00007F (128 codes)      0zzzzzzz (00-7F)                             one byte
 * 000080 - 0007FF (1920 codes)     110yyyyy (C0-DF) 10zzzzzz (80-BF)            two bytes
 * 000800 - 00D7FF
 * 00E000 - 00FFFF (61440 codes)    1110xxxx (E0-EF) 10yyyyyy 10zzzzzz           three bytes
 * 010000 - 10FFFF (1048576 codes)  11110www (F0-F7) 10xxxxxx 10yyyyyy 10zzzzzz  four bytes
 *
 * Note: Unicode has no characters in the range D800-DFFF.
 * {@link http://zh.wikipedia.org/wiki/UTF-8}
 *
 * UTF-16 uses two bytes for most characters and four bytes for code points above 65535.
 * 000000 - 00FFFF  two bytes
 * 010000 - 10FFFF  four bytes
 *
 * {@link http://zh.wikipedia.org/wiki/UTF-16}
 * @param {String} str
 * @param {String} charset utf-8, utf-16
 * @return {Number}
 */
var sizeof = function(str, charset){
    var total = 0,
        charCode,
        i,
        len;
    charset = charset ? charset.toLowerCase() : '';
    if(charset === 'utf-16' || charset === 'utf16'){
        for(i = 0, len = str.length; i < len; i++){
            // charCodeAt returns a single UTF-16 code unit, which is never
            // above 0xffff. A character above 0xffff occupies two indexes
            // (a surrogate pair), so it is counted as 2 + 2 = 4 bytes here.
            charCode = str.charCodeAt(i);
            if(charCode <= 0xffff){
                total += 2;
            }else{
                total += 4;
            }
        }
    }else{
        for(i = 0, len = str.length; i < len; i++){
            charCode = str.charCodeAt(i);
            if(charCode <= 0x007f) {
                total += 1;
            }else if(charCode <= 0x07ff){
                total += 2;
            }else if(charCode >= 0xd800 && charCode <= 0xdbff && i + 1 < len &&
                     str.charCodeAt(i + 1) >= 0xdc00 && str.charCodeAt(i + 1) <= 0xdfff){
                // A surrogate pair is one character above 0xffff: four bytes
                // in UTF-8. Skip the low surrogate so it is not counted again.
                total += 4;
                i++;
            }else{
                total += 3;
            }
        }
    }
    return total;
}
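
As a cross-check for the UTF-8 count, environments that provide the standard TextEncoder API (current browsers and Node.js) can report the encoded length directly. The sketch below is not part of the original code, and the wrapper name utf8ByteLength is hypothetical. Note that TextEncoder replaces an unpaired surrogate with U+FFFD, which encodes to three bytes.

```javascript
// Hypothetical helper: UTF-8 byte length as reported by the platform encoder.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength('abc'));       // 3
console.log(utf8ByteLength('\u4e2d'));    // 3
console.log(utf8ByteLength('\u{1F600}')); // 4
```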
