JavaScript functions for determining the length of strings that contain Chinese characters

Source: Internet
Author: User

JavaScript strings are String objects, and you can read a string's length property to get its length. However, length counts every character as 1, whether it is a Chinese character, a full-width character, or an English character. This differs from strlen() in PHP, which counts bytes.
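As a quick illustration of this behaviour, the sketch below reads length on three strings. The sample strings are assumptions chosen for the example, not taken from the original article.

```javascript
// length counts UTF-16 code units, not bytes or display columns.
var ascii = "abc";        // three ASCII letters
var chinese = "中国";      // two Chinese characters
var fullWidth = "ＡＢ";    // two full-width Latin letters (U+FF21, U+FF22)

var lengths = [ascii.length, chinese.length, fullWidth.length];
console.log(lengths.join(", ")); // prints "3, 2, 2"
```

Every character here lies in the Basic Multilingual Plane, so each one contributes exactly 1 to length.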

The code is as follows:

function strlen(str) {
    var s = 0;
    for (var i = 0; i < str.length; i++) {
        // Full-width and Chinese characters count as 2
        if (str.charAt(i).match(/[\u0391-\uFFE5]/)) {
            s += 2;
        } else {
            s++;
        }
    }
    return s;
}

The function examines each character in turn: if it matches a full-width or Chinese character (the range U+0391 to U+FFE5), it counts as two characters; otherwise it counts as one. A second approach tests the character code instead. The code is as follows:

<script>
alert(fucCheckLength("中国a"));
function fucCheckLength(strTemp)
{
    var i, sum;
    sum = 0;
    for (i = 0; i < strTemp.length; i++)
    {
        // Char codes 0 to 255 count as 1; everything else counts as 2
        if ((strTemp.charCodeAt(i) >= 0) && (strTemp.charCodeAt(i) <= 255))
            sum = sum + 1;
        else
            sum = sum + 2;
    }
    return sum;
}
</script>
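The two approaches above do not always agree. The following sketch puts them side by side. The function names widthByRange and widthByLatin1 are hypothetical, and the second sample character is an assumption picked to show where the results differ.

```javascript
// Hypothetical names; each function mirrors one of the two approaches above.
function widthByRange(str) {
    var n = 0;
    for (var i = 0; i < str.length; i++) {
        // Characters in U+0391..U+FFE5 count as 2
        n += /[\u0391-\uFFE5]/.test(str.charAt(i)) ? 2 : 1;
    }
    return n;
}

function widthByLatin1(str) {
    var n = 0;
    for (var i = 0; i < str.length; i++) {
        // Char codes 0..255 count as 1, everything else as 2
        n += str.charCodeAt(i) <= 255 ? 1 : 2;
    }
    return n;
}

var a = widthByRange("中国a");   // 2 + 2 + 1 = 5
var b = widthByLatin1("中国a");  // 2 + 2 + 1 = 5

// U+0101 (a with macron) is above 255 but below U+0391.
var c = widthByRange("\u0101");  // 1
var d = widthByLatin1("\u0101"); // 2
console.log(a, b, c, d);
```

For text that contains only ASCII and Chinese characters the two give the same result; they diverge for characters between U+0100 and U+0390.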

The result is 5. But is that the length in bytes? Note the difference between bytes and characters: the byte length depends on the encoding. For example, "中国a" is 5 bytes in GBK/GB2312 but 7 bytes in UTF-8 (a Chinese character usually takes 3 bytes in UTF-8).
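The UTF-8 figure can be checked with the standard TextEncoder API, which is available as a global in modern browsers and Node.js. This is a minimal sketch; the helper name utf8ByteLength is hypothetical. TextEncoder only produces UTF-8, so it cannot be used to confirm the GBK figure.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
    return new TextEncoder().encode(str).length;
}

var n = utf8ByteLength("中国a"); // 3 + 3 + 1
console.log(n); // prints 7
```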
One option is to convert all the characters to a single encoding, such as GBK, before operating on them. As an example of such a conversion, the following function decodes a string of UTF-8 bytes (one byte per char code) into a Unicode string. The code is as follows:

function Utf8ToUnicode(strUtf8)
{
    var bstr = "";
    var nTotalChars = strUtf8.length; // total chars to be processed
    var nOffset = 0; // processing point on strUtf8
    var nRemainingBytes = nTotalChars; // how many bytes left to be converted
    var iCode, iCode1, iCode2; // the value of the unicode
    while (nOffset < nTotalChars)
    {
        iCode = strUtf8.charCodeAt(nOffset);
        if ((iCode & 0x80) == 0) // 1 byte
        {
            if (nRemainingBytes < 1) // not enough data
                break;
            bstr += String.fromCharCode(iCode & 0x7F);
            nOffset++;
            nRemainingBytes -= 1;
        }
        else if ((iCode & 0xE0) == 0xC0) // 2 bytes
        {
            iCode1 = strUtf8.charCodeAt(nOffset + 1);
            if (nRemainingBytes < 2 || // not enough data
                (iCode1 & 0xC0) != 0x80) // invalid pattern
            {
                break;
            }
            bstr += String.fromCharCode(((iCode & 0x3F) << 6) | (iCode1 & 0x3F));
            nOffset += 2;
            nRemainingBytes -= 2;
        }
        else if ((iCode & 0xF0) == 0xE0) // 3 bytes
        {
            iCode1 = strUtf8.charCodeAt(nOffset + 1);
            iCode2 = strUtf8.charCodeAt(nOffset + 2);
            if (nRemainingBytes < 3 || // not enough data
                (iCode1 & 0xC0) != 0x80 || // invalid pattern
                (iCode2 & 0xC0) != 0x80)
            {
                break;
            }
            bstr += String.fromCharCode(((iCode & 0x0F) << 12) |
                ((iCode1 & 0x3F) << 6) |
                (iCode2 & 0x3F));
            nOffset += 3;
            nRemainingBytes -= 3;
        }
        else // 4 or more bytes -- unsupported
            break;
    }
    if (nRemainingBytes != 0)
    {
        // Bad UTF8 string.
        return "";
    }
    return bstr;
}
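For comparison, here is a sketch of the same kind of decoding with the standard TextDecoder API, next to the bit arithmetic that the three-byte branch above performs. The byte values are the UTF-8 encoding of "中国a" and are an assumption chosen for the example. Swapping in TextDecoder is a modern alternative, not the method used in the function above.

```javascript
// UTF-8 bytes for "中国a": E4 B8 AD, E5 9B BD, 61
var bytes = new Uint8Array([0xE4, 0xB8, 0xAD, 0xE5, 0x9B, 0xBD, 0x61]);

// TextDecoder is a global in modern browsers and Node.js.
var text = new TextDecoder("utf-8").decode(bytes);

// Manual decoding of the first character, using the same masks and shifts
// as the three-byte branch: 4 bits, then 6 bits, then 6 bits.
var codePoint = ((0xE4 & 0x0F) << 12) | ((0xB8 & 0x3F) << 6) | (0xAD & 0x3F);

console.log(text);                           // prints "中国a"
console.log(codePoint.toString(16));         // prints "4e2d"
console.log(String.fromCharCode(codePoint)); // prints "中"
```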

So how do we solve this problem? The rest of this article shows how to use JavaScript to get the length of Chinese text.

First, we define a new function, getBytes(), that returns the number of bytes in a string. JavaScript has no such standard method, so we add it to String.prototype. The code is as follows:

String.prototype.getBytes = function () {
    // Collect every character outside the range 0x00 to 0xFF
    var cArr = this.match(/[^\x00-\xff]/g);
    return this.length + (cArr == null ? 0 : cArr.length);
}
function paramCheck(cur) {
    if (cur.value.getBytes() > 64) {
        alert("more than 64 characters");
        return false;
    }
    return true;
}

getBytes uses a regular expression to find the Chinese characters in the string and collects them in the array cArr, so the length of cArr is the number of Chinese characters. The method returns the string length plus that count, which is the total number of bytes when each Chinese character takes two bytes (as in GBK).
Using only [^\x00-\xff] is a little crude, because it also matches some special characters that are not Chinese.
But if you use [^\u4E00-\u9FA5] instead, it cannot match Chinese characters at all, because the negated class excludes exactly that range.
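The behaviour of these character classes can be checked directly. In the sketch below, the ellipsis character U+2026 stands in for a "special character"; that choice is an assumption, since the original example character is not given.

```javascript
var nonLatin1 = /[^\x00-\xff]/;   // anything above U+00FF
var cjk = /[\u4E00-\u9FA5]/;      // common Chinese characters
var notCjk = /[^\u4E00-\u9FA5]/;  // negated: everything except that range

var results = {
    ellipsisNonLatin1: nonLatin1.test("\u2026"), // true: not Chinese, still matched
    ellipsisCjk: cjk.test("\u2026"),             // false
    hanCjk: cjk.test("中"),                       // true
    hanNotCjk: notCjk.test("中")                  // false: the negated class skips Chinese
};
console.log(results);
```

The non-negated class [\u4E00-\u9FA5] is therefore the narrower test when only Chinese characters should count double.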

You can try the following methods.
Method one. The code is as follows:

function _length(str) {
    var len = 0;
    for (var i = 0; i < str.length; i++) {
        // Any character that sorts after '~' (U+007E) counts as 2
        if (str.charAt(i) > '~') { len += 2; } else { len++; }
    }
    return len;
}
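The first method relies on string comparison: JavaScript compares strings by UTF-16 code unit, so any character above '~' (U+007E) counts as 2. A small sketch of the consequence follows; the function name widthByTilde and the sample inputs are assumptions for illustration.

```javascript
// Hypothetical standalone copy of the first method.
function widthByTilde(str) {
    var len = 0;
    for (var i = 0; i < str.length; i++) {
        len += str.charAt(i) > "~" ? 2 : 1;
    }
    return len;
}

var w1 = widthByTilde("中国a");  // 2 + 2 + 1 = 5
var w2 = widthByTilde("\u00e9"); // 2: an accented Latin letter also counts double
var w3 = widthByTilde("~");      // 1: '~' itself is not greater than '~'
console.log(w1, w2, w3);
```

Accented Latin letters are a single byte in Latin-1 and GBK does not encode most of them, so this threshold is only a reasonable estimate for text made of ASCII and Chinese characters.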

Method two. The code is as follows:

String.prototype.gblen = function () {
    var len = 0;
    for (var i = 0; i < this.length; i++) {
        // Char codes above 127, and code 94 ('^'), count as 2
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len += 2;
        } else {
            len++;
        }
    }
    return len;
}
String.prototype.gbtrim = function (len, s) {
    var str = '';
    var sp = s || '';
    var len2 = 0;
    // First pass: measure the whole string
    for (var i = 0; i < this.length; i++) {
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len2 += 2;
        } else {
            len2++;
        }
    }
    if (len2 <= len) {
        return this;
    }
    // Second pass: copy characters until the limit (minus the suffix) is exceeded
    len2 = 0;
    len = (len > sp.length) ? len - sp.length : len;
    for (var i = 0; i < this.length; i++) {
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len2 += 2;
        } else {
            len2++;
        }
        if (len2 > len) {
            str += sp;
            break;
        }
        str += this.charAt(i);
    }
    return str;
}
var str1 = "World's best #%& World's Best #% ";
document.write('str1 = ' + str1 + '<br>');
document.write('length = ' + str1.gblen() + '<br>');
document.write('gbtrim(10) = ' + str1.gbtrim(10) + '<br>');
document.write('gbtrim(10, \'...\') = ' + str1.gbtrim(10, '...') + '<br>');
document.write('gbtrim(12, \'-\') = ' + str1.gbtrim(12, '-') + '<br>');

// gbtrim(len, s): len is the truncation length, measured in single-byte (English) units; s is the string appended after truncation, such as "..."
// Note: a Chinese character counts as length 2, so when len is 10, at most five Chinese characters are displayed.
// When there are more than five Chinese characters, the "…" appended after truncation takes up space, so only four Chinese characters are displayed.
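The counts in the note above can be traced with a standalone sketch of the same truncation logic. The names charWidth and trimByWidth are hypothetical, the sample strings are assumptions, and the special case for char code 94 is left out for brevity.

```javascript
// Hypothetical standalone version; it does not modify String.prototype.
function charWidth(code) {
    return code > 127 ? 2 : 1;
}

function trimByWidth(str, len, suffix) {
    var sp = suffix || "";
    var total = 0;
    for (var i = 0; i < str.length; i++) {
        total += charWidth(str.charCodeAt(i));
    }
    if (total <= len) {
        return str; // already short enough
    }
    // Reserve room for the suffix, as gbtrim does
    var budget = len > sp.length ? len - sp.length : len;
    var out = "";
    var used = 0;
    for (var j = 0; j < str.length; j++) {
        used += charWidth(str.charCodeAt(j));
        if (used > budget) {
            return out + sp;
        }
        out += str.charAt(j);
    }
    return out;
}

var r1 = trimByWidth("一二三四五", 10);        // width 10 fits: "一二三四五"
var r2 = trimByWidth("一二三四五六", 10);      // width 12, no suffix: "一二三四五"
var r3 = trimByWidth("一二三四五六", 10, "…"); // budget 9: "一二三四…"
console.log(r1, r2, r3);
```

With the one-character suffix "…" the budget drops from 10 to 9, which leaves room for four Chinese characters (width 8) before the suffix is appended.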
