JavaScript strings behave as String objects, so you can use the length property to obtain a string's length. However, length counts a Chinese character, a full-width character, and an English character as 1 each, which is not the same as strlen() in PHP, which returns the number of bytes.
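As a quick illustration of the point above, the following sketch prints the length of a few strings. The sample strings are assumptions chosen for illustration; they are not taken from the original text.

```javascript
// length counts UTF-16 code units, so an English letter, a Chinese
// character, and a full-width letter each contribute exactly 1.
var samples = ["a", "中", "Ａ", "中国a"];
var lengths = samples.map(function (s) {
    return s.length;
});
console.log(lengths.join(",")); // 1,1,1,3
```

None of these values says anything about how many bytes the string occupies in a particular encoding.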
The following function counts Chinese and full-width characters as two:
function strlen(str) {
    var s = 0;
    for (var i = 0; i < str.length; i++) {
        // characters in the range U+0391..U+FFE5 count as two
        if (str.charAt(i).match(/[\u0391-\uFFE5]/)) {
            s += 2;
        } else {
            s++;
        }
    }
    return s;
}
Each character is matched against the range of full-width and Chinese characters; a match is counted as two characters, and anything else is counted as one. A variant based on character codes is as follows:
<script>
alert(fucCheckLength("中国a"));
function fucCheckLength(strTemp)
{
    var i, sum;
    sum = 0;
    for (i = 0; i < strTemp.length; i++)
    {
        // codes 0..255 count as one, everything else as two
        if ((strTemp.charCodeAt(i) >= 0) && (strTemp.charCodeAt(i) <= 255))
            sum = sum + 1;
        else
            sum = sum + 2;
    }
    return sum;
}
</script>
The result is 5. But is that the length in bytes? Note the difference between bytes and characters: the byte length depends on the encoding. For example, "中国a" is 5 bytes in GBK/GB2312, but 7 bytes in UTF-8 (a Chinese character usually takes 3 bytes in UTF-8).
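To check the UTF-8 figure, one option is the standard TextEncoder API, which is available in modern browsers and Node.js and always encodes to UTF-8. This is only a sketch for verification; TextEncoder cannot produce GBK, so the GBK figure is not checked here. The sample string "中国a" is an assumption based on the byte counts quoted above.

```javascript
// Encode the string to UTF-8 and compare the byte count with length.
var text = "中国a";
var utf8Bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes

console.log(text.length);      // 3 (UTF-16 code units)
console.log(utf8Bytes.length); // 7 (3 + 3 + 1 bytes)
```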
We can convert all the characters to a single encoding before performing operations. For instance, the following function decodes a string in which each character holds one UTF-8 byte into a Unicode string:
function Utf8ToUnicode(strUtf8)
{
    var bstr = "";
    var nTotalChars = strUtf8.length; // total chars to be processed
    var nOffset = 0; // processing point on strUtf8
    var nRemainingBytes = nTotalChars; // how many bytes left to be converted
    var nOutputPosition = 0;
    var iCode, iCode1, iCode2; // the value of the unicode
    while (nOffset < nTotalChars)
    {
        iCode = strUtf8.charCodeAt(nOffset);
        if ((iCode & 0x80) == 0) // 1 byte
        {
            if (nRemainingBytes < 1) // not enough data
                break;
            bstr += String.fromCharCode(iCode & 0x7F);
            nOffset++;
            nRemainingBytes -= 1;
        }
        else if ((iCode & 0xE0) == 0xC0) // 2 bytes
        {
            iCode1 = strUtf8.charCodeAt(nOffset + 1);
            if (nRemainingBytes < 2 || // not enough data
                (iCode1 & 0xC0) != 0x80) // invalid pattern
            {
                break;
            }
            bstr += String.fromCharCode(((iCode & 0x3F) << 6) | (iCode1 & 0x3F));
            nOffset += 2;
            nRemainingBytes -= 2;
        }
        else if ((iCode & 0xF0) == 0xE0) // 3 bytes
        {
            iCode1 = strUtf8.charCodeAt(nOffset + 1);
            iCode2 = strUtf8.charCodeAt(nOffset + 2);
            if (nRemainingBytes < 3 || // not enough data
                (iCode1 & 0xC0) != 0x80 || // invalid pattern
                (iCode2 & 0xC0) != 0x80)
            {
                break;
            }
            bstr += String.fromCharCode(((iCode & 0x0F) << 12) |
                ((iCode1 & 0x3F) << 6) |
                (iCode2 & 0x3F));
            nOffset += 3;
            nRemainingBytes -= 3;
        }
        else // 4 or more bytes -- unsupported
            break;
    }
    if (nRemainingBytes != 0)
    {
        // Bad UTF8 string.
        return "";
    }
    return bstr;
}
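The bit arithmetic of the 3-byte branch can be followed on a single character. The sketch below works through the UTF-8 bytes of "中" (U+4E2D), namely 0xE4 0xB8 0xAD; the variable names are hypothetical and the character is an example chosen for illustration.

```javascript
// UTF-8 bytes of "中" (U+4E2D): 0xE4 0xB8 0xAD
var b0 = 0xE4, b1 = 0xB8, b2 = 0xAD;

// A 3-byte sequence starts with 1110xxxx, followed by two 10xxxxxx bytes.
var isThreeByteLead = (b0 & 0xF0) === 0xE0;
var continuationsOk = (b1 & 0xC0) === 0x80 && (b2 & 0xC0) === 0x80;

// Keep 4 payload bits from the lead byte and 6 from each continuation byte.
var codePoint = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);

console.log(isThreeByteLead, continuationsOk); // true true
console.log(codePoint.toString(16));           // 4e2d
console.log(String.fromCharCode(codePoint));   // 中
```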
So how do we solve this problem? The rest of this article shows how to use JavaScript to get the length of Chinese text.
First, we define a new function getBytes() to get the number of bytes of a string. This is not a standard JavaScript function, so we add it to String.prototype ourselves:
String.prototype.getBytes = function () {
    // collect every character outside the range \x00-\xff
    var cArr = this.match(/[^\x00-\xff]/ig);
    return this.length + (cArr == null ? 0 : cArr.length);
};
function paramCheck(cur) {
    if (cur.value.getBytes() > 64) {
        alert("more than 64 characters");
        return false;
    }
    return true;
}
getBytes uses a regular expression to find the Chinese characters contained in a string. All the matches are placed in the array cArr, so the length of cArr is the total number of Chinese characters. The getBytes method returns the string's length plus the number of Chinese characters, which is the total number of bytes when each Chinese character is counted as two bytes.
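The same counting rule can be tried without touching String.prototype. The helper name countDoubleByte below is hypothetical, and the sample strings are assumptions chosen for illustration; this is a minimal sketch of the rule described above, not a true byte count for any specific encoding.

```javascript
// Hypothetical standalone helper using the same rule as getBytes:
// length plus one extra unit for every character above \xff.
function countDoubleByte(str) {
    var wide = str.match(/[^\x00-\xff]/g); // null when nothing matches
    return str.length + (wide === null ? 0 : wide.length);
}

console.log(countDoubleByte("abc"));   // 3
console.log(countDoubleByte("中国a")); // 5
console.log(countDoubleByte(""));      // 0
```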
Using only [^\x00-\xff] is a bit crude, because it also matches some special characters that are not Chinese.
But if you use [^\u4E00-\u9FA5] instead, it does not match Chinese at all, because the leading ^ negates the range.
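The difference between these character classes can be checked directly. In the sketch below the variable names are hypothetical, and "€" is used as an assumed example of a non-Chinese special character above \xff; the original text does not name one.

```javascript
var notLatin1 = /[^\x00-\xff]/;  // any character above U+00FF
var notCjk = /[^\u4E00-\u9FA5]/; // negated: any character outside U+4E00..U+9FA5
var cjk = /[\u4E00-\u9FA5]/;     // only characters inside U+4E00..U+9FA5

console.log(notLatin1.test("中")); // true
console.log(notLatin1.test("€"));  // true, although "€" is not Chinese
console.log(notCjk.test("中"));    // false, the negated class rejects Chinese
console.log(notCjk.test("a"));     // true
console.log(cjk.test("中"));       // true
console.log(cjk.test("€"));        // false
```

Dropping the ^ gives a class that matches only characters in that range, at the cost of no longer counting full-width punctuation as wide.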
You can test the following methods:
Method one:
function _length(str) {
    var len = 0;
    for (var i = 0; i < str.length; i++) {
        // any character that sorts after '~' (the last printable ASCII character) counts as two
        if (str.charAt(i) > '~') { len += 2; } else { len++; }
    }
    return len;
}
Method two:
String.prototype.gblen = function () {
    var len = 0;
    for (var i = 0; i < this.length; i++) {
        // non-ASCII characters (and '^', code 94) count as two
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len += 2;
        } else {
            len++;
        }
    }
    return len;
};
String.prototype.gbtrim = function (len, s) {
    var str = '';
    var sp = s || '';
    var len2 = 0;
    // first pass: measure the whole string
    for (var i = 0; i < this.length; i++) {
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len2 += 2;
        } else {
            len2++;
        }
    }
    if (len2 <= len) {
        return this;
    }
    len2 = 0;
    // reserve room for the suffix
    len = (len > sp.length) ? len - sp.length : len;
    // second pass: copy characters until the limit is exceeded
    for (var i = 0; i < this.length; i++) {
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len2 += 2;
        } else {
            len2++;
        }
        if (len2 > len) {
            str += sp;
            break;
        }
        str += this.charAt(i);
    }
    return str;
};
var str1 = "World's best #%& World's Best #% ";
document.write('str1 = ' + str1 + '<br/>');
document.write('length = ' + str1.gblen() + '<br/>');
document.write('gbtrim(10) = ' + str1.gbtrim(10) + '<br/>');
document.write('gbtrim(10, \'...\') = ' + str1.gbtrim(10, '...') + '<br/>');
document.write('gbtrim(12, \'-\') = ' + str1.gbtrim(12, '-') + '<br/>');
// gbtrim(len, s): len is the truncation length, measured in English (single-byte) units; s is the string appended after truncation, such as "...".
// Note: a Chinese character counts as two units, so when len in gbtrim is 10, at most five Chinese characters are displayed.
// When there are more than five Chinese characters, only four are displayed, because "…" is appended after the truncation.
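The behaviour described in these notes can be checked with Chinese input. The following is a simplified standalone sketch, not the gbtrim method itself: the names displayWidth and truncateByWidth are hypothetical, the special case for '^' is omitted, and the sample string is an assumption chosen for illustration.

```javascript
// Hypothetical helper: non-ASCII characters count as two units.
function displayWidth(ch) {
    return ch.charCodeAt(0) > 127 ? 2 : 1;
}

// Hypothetical helper: truncate str to maxWidth units, appending suffix
// when something was cut. The suffix is budgeted by its length, as in gbtrim.
function truncateByWidth(str, maxWidth, suffix) {
    suffix = suffix || "";
    var total = 0;
    for (var i = 0; i < str.length; i++) {
        total += displayWidth(str.charAt(i));
    }
    if (total <= maxWidth) {
        return str; // already short enough
    }
    var budget = maxWidth > suffix.length ? maxWidth - suffix.length : maxWidth;
    var out = "";
    var used = 0;
    for (var j = 0; j < str.length; j++) {
        used += displayWidth(str.charAt(j));
        if (used > budget) {
            return out + suffix;
        }
        out += str.charAt(j);
    }
    return out;
}

console.log(truncateByWidth("一二三四五六", 10));      // 一二三四五
console.log(truncateByWidth("一二三四五六", 10, "…")); // 一二三四…
console.log(truncateByWidth("abc", 10, "…"));          // abc
```

With no suffix, five Chinese characters fit in ten units; with the one-character suffix "…", the budget drops to nine units and only four fit.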