In practice there is a common problem: a JavaScript program may hold strings containing Chinese characters, and before we can send them as a binary stream we need to convert them to UTF-8.
In other words, the input is a string: '呆滞的慢板今天挣了100块钱'.
The output is a sequence of bytes:
[229, 145, 134, 230, 187, 158, 231, 154, 132, 230, 133, 162, 230, 157, 191, 228, 187, 138, 229, 164, 169, 230, 140, 163, 228, 186, 134, 49, 48, 48, 229, 157, 151, 233, 146, 177]
Or, equivalently, a single-byte string in which each character holds one byte:
"\xe5\x91\x86\xe6\xbb\x9e\xe7\x9a\x84\xe6\x85\xa2\xe6\x9d\xbf\xe4\xbb\x8a\xe5\xa4\xa9\xe6\x8c\xa3\xe4\xba\x86100\xe5\x9d\x97\xe9\x92\xb1"
After a lot of trial and error I finally worked it out. There are two solutions:
1. When window.TextEncoder is supported
function str2utf8(str) {
  var encoder = new TextEncoder(); // TextEncoder always encodes as UTF-8
  return encoder.encode(str);
}
This returns a Uint8Array holding the UTF-8 byte values.
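As a quick check against the byte list given earlier, the sketch below encodes the sample string and inspects the result. It assumes a runtime with a global TextEncoder (modern browsers, or Node.js 11 and later); the names encodeUtf8 and sampleBytes are made up for this illustration.

```javascript
// Assumption: TextEncoder is available as a global in this runtime.
function encodeUtf8(str) {
  // encode() returns a Uint8Array of UTF-8 bytes.
  return new TextEncoder().encode(str);
}

// Copy into a plain array so it prints and compares like the list in the text.
var sampleBytes = Array.from(encodeUtf8('呆滞的慢板今天挣了100块钱'));

console.log(sampleBytes.length);       // 36 (11 Chinese characters x 3 bytes, plus "100")
console.log(sampleBytes.slice(0, 3));  // [ 229, 145, 134 ], the bytes of '呆'
console.log(sampleBytes.slice(27, 30)); // [ 49, 48, 48 ], the ASCII digits "100"
```

ASCII characters pass through as a single byte each, which is why "100" shows up as 49, 48, 48 in the middle of the list.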
2. Replace the escapes produced by encodeURI
The principle: encodeURI(str) percent-encodes Chinese characters and the like as their UTF-8 bytes, producing sequences such as %E5%91. If we replace every % with \x and evaluate the result as a string literal, we get a single-byte string.
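To make the intermediate steps concrete, here is a small sketch of what encodeURI produces for a short input and what the text looks like after the replacement, before anything is evaluated. The variable names are illustrative only.

```javascript
// Step 1: encodeURI percent-encodes the UTF-8 bytes of non-ASCII characters.
var percentEncoded = encodeURI('呆100');
console.log(percentEncoded); // %E5%91%86100

// Step 2: swap each "%" for a literal backslash followed by "x".
// At this point the backslashes are ordinary characters, not escapes yet.
var literalBody = percentEncoded.replace(/%/g, '\\x');
console.log(literalBody); // \xE5\x91\x86100

// Only when this text is parsed as the body of a string literal does each
// \xNN turn into the single character with code NN.
```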
function str2utf8(str) {
  // encodeURI escapes " and \ (but not '), so wrap the literal in double quotes
  return eval('"' + encodeURI(str).replace(/%/g, '\\x') + '"');
}
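If eval is undesirable (for example under a strict Content Security Policy), the same idea can be expressed by decoding each percent escape directly. This is an alternative sketch, not the method described above: it swaps in encodeURIComponent and a replace callback, and the name str2utf8NoEval is hypothetical.

```javascript
// Hypothetical eval-free variant of the encodeURI approach.
function str2utf8NoEval(str) {
  // encodeURIComponent emits %NN for every UTF-8 byte it escapes,
  // including a literal "%" (as %25), so every "%" in its output starts an escape.
  return encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function (match, hex) {
    // Turn the two hex digits back into one character with that byte value.
    return String.fromCharCode(parseInt(hex, 16));
  });
}

var binary = str2utf8NoEval('呆滞的慢板今天挣了100块钱');
console.log(binary.length);        // 36
console.log(binary.charCodeAt(0)); // 229
console.log(str2utf8NoEval("it's 50%") === "it's 50%"); // true: pure ASCII is unchanged
```

Because nothing is evaluated as code, quotes and other punctuation in the input need no special handling here.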
3. Combined use
Putting the two together, we can define a version that works in both situations:
var str2utf8 = window.TextEncoder ? function (str) {
  var encoder = new TextEncoder();
  var bytes = encoder.encode(str);
  var result = '';
  // Build a single-byte string: one character per UTF-8 byte
  for (var i = 0; i < bytes.length; ++i) {
    result += String.fromCharCode(bytes[i]);
  }
  return result;
} : function (str) {
  return eval('"' + encodeURI(str).replace(/%/g, '\\x') + '"');
};
Original link: http://www.huangwenchao.com.cn/2015/09/javascript-utf8-encoding.html ("JavaScript string UTF8 encoding method")