Background
JavaScript strings are encoded with Unicode. Unicode has several encoding forms; the most common are UTF-8 and UTF-16.
UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding that can represent any character in the Unicode Standard, and its encoding of the first 128 code points is byte-for-byte compatible with ASCII. It encodes each character using one to four bytes.
The encoding rules are as follows:
Characters with code points in 000000-00007f are encoded with one byte;
characters in 000080-0007ff with two bytes;
characters in 000800-00d7ff and 00e000-00ffff with three bytes (note: Unicode assigns no characters in the range d800-dfff);
characters in 010000-10ffff with four bytes.
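The ranges above are easy to verify with the standard TextEncoder API (available in modern browsers and Node.js), which encodes a string to UTF-8 bytes; the helper name utf8Bytes below is just for illustration:

```javascript
// UTF-8 byte lengths for sample characters from each range,
// checked with the standard TextEncoder API.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

console.log(utf8Bytes('A'));   // 1 byte  (U+0041, in 000000-00007f)
console.log(utf8Bytes('é'));   // 2 bytes (U+00E9, in 000080-0007ff)
console.log(utf8Bytes('中'));  // 3 bytes (U+4E2D, in 000800-00ffff)
console.log(utf8Bytes('😀')); // 4 bytes (U+1F600, in 010000-10ffff)
```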
UTF-16, by contrast, is also a variable-length encoding, but far simpler: most characters use a two-byte encoding, and characters whose code point exceeds 65535 (0xffff) use four bytes, as follows:
000000-00ffff: two bytes;
010000-10ffff: four bytes.
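Since JavaScript strings are themselves sequences of UTF-16 code units, the UTF-16 byte count falls out directly from the string's length; the helper name utf16Bytes below is just for illustration:

```javascript
// Every UTF-16 code unit is 2 bytes, and String.prototype.length counts
// code units, so the UTF-16 byte length is simply length * 2.
// A code point above 0xffff is stored as a surrogate pair: two code
// units, hence 4 bytes.
const utf16Bytes = (s) => s.length * 2;

console.log(utf16Bytes('A'));   // 2 bytes
console.log(utf16Bytes('中'));  // 2 bytes
console.log(utf16Bytes('😀')); // 4 bytes (surrogate pair, length 2)
```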
/**
 * Computes the number of bytes a string occupies in memory.
 * Uses UTF-8 encoding by default; pass 'utf-16' for UTF-16.
 *
 * UTF-8 is a variable-length Unicode encoding that uses one to four
 * bytes per character:
 *
 *   000000-00007f (128 codes)                    0zzzzzzz (00-7f)                            one byte
 *   000080-0007ff (1920 codes)                   110yyyyy (c0-df) 10zzzzzz (80-bf)           two bytes
 *   000800-00d7ff, 00e000-00ffff (61440 codes)   1110xxxx (e0-ef) 10yyyyyy 10zzzzzz          three bytes
 *   010000-10ffff (1048576 codes)                11110www (f0-f7) 10xxxxxx 10yyyyyy 10zzzzzz four bytes
 *
 * Note: Unicode assigns no characters in the range d800-dfff.
 * {@link http://zh.wikipedia.org/wiki/UTF-8}
 *
 * UTF-16 encodes most characters in two bytes; code points above
 * 65535 take four bytes:
 *
 *   000000-00ffff  two bytes
 *   010000-10ffff  four bytes
 *
 * {@link http://zh.wikipedia.org/wiki/UTF-16}
 * @param {String} str
 * @param {String} charset 'utf-8' (default) or 'utf-16'
 * @return {Number}
 */
var sizeof = function (str, charset) {
    var total = 0,
        charCode,
        i,
        len;
    charset = charset ? charset.toLowerCase() : '';
    if (charset === 'utf-16' || charset === 'utf16') {
        // Every UTF-16 code unit is 2 bytes, and String.prototype.length
        // counts code units; a code point above 0xffff is stored as a
        // surrogate pair (two code units), i.e. 4 bytes.
        total = str.length * 2;
    } else {
        for (i = 0, len = str.length; i < len; i++) {
            charCode = str.charCodeAt(i);
            if (charCode <= 0x007f) {
                total += 1;
            } else if (charCode <= 0x07ff) {
                total += 2;
            } else if (charCode >= 0xd800 && charCode <= 0xdbff) {
                // High surrogate: together with the following low
                // surrogate it encodes one code point above 0xffff,
                // which takes 4 bytes in UTF-8.
                total += 4;
                i++; // skip the low surrogate
            } else {
                total += 3;
            }
        }
    }
    return total;
};
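In a Node.js environment the results can be cross-checked against the built-in Buffer.byteLength, which computes encoded byte lengths natively (this sketch assumes Node.js; in a browser TextEncoder covers the UTF-8 case):

```javascript
// Cross-check against Node.js's native byte-length calculation.
console.log(Buffer.byteLength('hello', 'utf8'));    // 5 (ASCII: 1 byte each)
console.log(Buffer.byteLength('中文', 'utf8'));     // 6 (3 bytes each)
console.log(Buffer.byteLength('中文', 'utf16le'));  // 4 (2 bytes each)
```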