JavaScript function code for determining the length of strings containing Chinese characters

Last Update:2018-12-08 Source: Internet

Author: User

In JavaScript, every string has a length property that returns its length. However, length counts each Chinese character, full-width character, and English character as one unit. This differs from strlen() in PHP, which returns the number of bytes.
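
As a quick illustration of this point, the sketch below compares length for ASCII, Chinese, and full-width text. The sample strings are chosen here for illustration and do not come from the original article.

```javascript
// String.prototype.length counts UTF-16 code units, not bytes, so a Chinese
// or full-width character contributes 1, just like an ASCII letter.
const samples = ["abc", "中国", "中国a", "ＡＢＣ"]; // illustrative strings (assumed)

const lengths = samples.map(function (s) {
    return s.length;
});

console.log(lengths.join(", ")); // 3, 2, 3, 3
```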

The code is as follows:

function strlen(str) {
    var s = 0;
    for (var i = 0; i < str.length; i++) {
        // Anything from U+0391 to U+FFE5 (which covers Chinese and full-width characters) counts as 2.
        if (str.charAt(i).match(/[\u0391-\uFFE5]/)) {
            s += 2;
        } else {
            s++;
        }
    }
    return s;
}

Each character is examined in turn: if it matches a full-width or Chinese character it is counted as two, and otherwise as one. A variant that tests the character code directly is as follows:

<script>
alert(fucCheckLength("中国a"));
function fucCheckLength(strTemp)
{
    var i, sum;
    sum = 0;
    for (i = 0; i < strTemp.length; i++)
    {
        // Character codes 0 to 255 count as one unit; everything else counts as two.
        if ((strTemp.charCodeAt(i) >= 0) && (strTemp.charCodeAt(i) <= 255))
            sum = sum + 1;
        else
            sum = sum + 2;
    }
    return sum;
}
</script>

The result is 5. But is that the length in bytes? Note the difference between bytes and characters: the byte length depends on the encoding. For example, "中国a" occupies five bytes in GBK/GB2312 but seven bytes in UTF-8 (where a Chinese character usually takes three bytes).
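
The UTF-8 figure can be checked directly in environments that provide the standard TextEncoder API (current browsers and Node.js). This is a sketch of one way to do it, not the article's original method. The helper name utf8ByteLength is hypothetical, and the sample string 中国a (two Chinese characters plus one letter) is assumed to be the one the text refers to. TextEncoder only produces UTF-8, so the GBK count is not computed here.

```javascript
// Hypothetical helper: count the bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
    // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
    return new TextEncoder().encode(str).length;
}

console.log(utf8ByteLength("中国a")); // 7: two 3-byte characters plus one ASCII byte
console.log("中国a".length);          // 3: UTF-16 code units, not bytes
```
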
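We can convert all the characters to one known encoding before performing the count. For instance, the following function decodes a UTF-8 byte sequence (a string in which each character code holds one byte) into a Unicode string. The code is as follows:

function Utf8ToUnicode(strUtf8)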
{
    var bstr = "";
    var nTotalChars = strUtf8.length; // total chars to be processed.
    var nOffset = 0; // processing point on strUtf8
    var nRemainingBytes = nTotalChars; // how many bytes left to be converted
    var nOutputPosition = 0;
    var iCode, iCode1, iCode2; // the value of the unicode.
    while (nOffset < nTotalChars)
    {
        iCode = strUtf8.charCodeAt(nOffset);
        if ((iCode & 0x80) == 0) // 1 byte.
        {
            if (nRemainingBytes < 1) // not enough data
                break;
            bstr += String.fromCharCode(iCode & 0x7F);
            nOffset++;
            nRemainingBytes -= 1;
        }
        else if ((iCode & 0xE0) == 0xC0) // 2 bytes
        {
            iCode1 = strUtf8.charCodeAt(nOffset + 1);
            if (nRemainingBytes < 2 || // not enough data
                (iCode1 & 0xC0) != 0x80) // invalid pattern
            {
                break;
            }
            bstr += String.fromCharCode(((iCode & 0x3F) << 6) | (iCode1 & 0x3F));
            nOffset += 2;
            nRemainingBytes -= 2;
        }
        else if ((iCode & 0xF0) == 0xE0) // 3 bytes
        {
            iCode1 = strUtf8.charCodeAt(nOffset + 1);
            iCode2 = strUtf8.charCodeAt(nOffset + 2);
            if (nRemainingBytes < 3 || // not enough data
                (iCode1 & 0xC0) != 0x80 || // invalid pattern
                (iCode2 & 0xC0) != 0x80)
            {
                break;
            }
            bstr += String.fromCharCode(((iCode & 0x0F) << 12) |
                ((iCode1 & 0x3F) << 6) |
                (iCode2 & 0x3F));
            nOffset += 3;
            nRemainingBytes -= 3;
        }
        else // 4 or more bytes -- unsupported
            break;
    }
    if (nRemainingBytes != 0)
    {
        // Bad UTF8 string.
        return "";
    }
    return bstr;
}
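
For comparison, current browsers and Node.js ship a standard TextDecoder that performs the same kind of UTF-8 decoding. The sketch below is an alternative to the hand-written function, not part of the original article. The name decodeUtf8Bytes is hypothetical, and the code assumes the same input convention: a string in which each character code holds one byte.

```javascript
// Hypothetical helper: decode a "byte string" (one UTF-8 byte per char code)
// into a regular JavaScript string using the standard TextDecoder.
function decodeUtf8Bytes(byteString) {
    const bytes = new Uint8Array(byteString.length);
    for (let i = 0; i < byteString.length; i++) {
        bytes[i] = byteString.charCodeAt(i) & 0xFF;
    }
    try {
        // fatal: true makes malformed input throw instead of yielding U+FFFD.
        return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    } catch (e) {
        return ""; // mirror the hand-written version: empty string on bad input
    }
}

// "中" is U+4E2D, which UTF-8 encodes as the bytes E4 B8 AD.
const decoded = decodeUtf8Bytes("\xE4\xB8\xADa");
console.log(decoded); // 中a
```

Unlike the hand-written version, TextDecoder also accepts 4-byte sequences.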

So how can this problem be solved? This article introduces how to use JavaScript to get the length of Chinese text.

First, we define a new function, getBytes(), that returns the number of bytes in a string. JavaScript has no standard function for this, so it is added to String.prototype. The code is as follows:

String.prototype.getBytes = function () {
    // Collect every character outside the range \x00-\xff.
    var cArr = this.match(/[^\x00-\xff]/ig);
    return this.length + (cArr == null ? 0 : cArr.length);
}
function paramCheck(cur) {
    if (cur.value.getBytes() > 64) {
        alert("more than 64 characters");
        return false;
    }
    return true;
}

getBytes uses a regular expression to find the Chinese characters contained in the string and places them all in the array cArr, so the length of cArr is the total number of Chinese characters. The method returns the string length plus the number of Chinese characters, which is the total number of bytes when each Chinese character occupies two bytes (as in GBK).
Using only [^\x00-\xff] is somewhat crude, because some special characters that are not Chinese are matched as well.
Using [^\u4E00-\u9FA5] instead does not work either: because the class is negated, it matches everything except Chinese characters.
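
The difference between these character classes is easier to see on a small sample. The sketch below is illustrative only: the test string and the helper name countMatches are assumptions, not part of the original article. The range \u4E00-\u9FA5 covers the basic CJK Unified Ideographs and is also shown without the negation.

```javascript
// Hypothetical helper: count how many characters of str match a global regex.
function countMatches(str, regex) {
    const found = str.match(regex);
    return found === null ? 0 : found.length;
}

// 2 Chinese characters, then € (U+20AC) and Ω (U+03A9), then 2 ASCII letters.
const sample = "中文\u20AC\u03A9ab";

// Everything above U+00FF: the Chinese characters, but also € and Ω.
const aboveLatin1 = countMatches(sample, /[^\x00-\xff]/g);

// Only the basic CJK ideograph range.
const cjkOnly = countMatches(sample, /[\u4E00-\u9FA5]/g);

// The negated class matches everything that is NOT in that range.
const notCjk = countMatches(sample, /[^\u4E00-\u9FA5]/g);

console.log(aboveLatin1, cjkOnly, notCjk); // 4 2 4
```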

You can test the following methods.
Method one. The code is as follows:

function _length(str) {
    var len = 0;
    for (var i = 0; i < str.length; i++) {
        // Any character that sorts after '~' (the last printable ASCII character) counts as 2.
        if (str.charAt(i) > '~') { len += 2; } else { len++; }
    }
    return len;
}

Method two. The code is as follows:

String.prototype.gblen = function () {
    var len = 0;
    for (var i = 0; i < this.length; i++) {
        // Character codes above 127, and code 94 ('^'), count as 2.
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len += 2;
        } else {
            len++;
        }
    }
    return len;
}
String.prototype.gbtrim = function (len, s) {
    var str = '';
    var sp = s || '';
    var len2 = 0;
    // First pass: measure the full length, counting double-byte characters as 2.
    for (var i = 0; i < this.length; i++) {
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len2 += 2;
        } else {
            len2++;
        }
    }
    // Short enough already: nothing to truncate.
    if (len2 <= len) {
        return this;
    }
    len2 = 0;
    // Reserve room for the suffix when the limit allows it.
    len = (len > sp.length) ? len - sp.length : len;
    // Second pass: copy characters until the limit is exceeded, then append the suffix.
    for (var i = 0; i < this.length; i++) {
        if (this.charCodeAt(i) > 127 || this.charCodeAt(i) == 94) {
            len2 += 2;
        } else {
            len2++;
        }
        if (len2 > len) {
            str += sp;
            break;
        }
        str += this.charAt(i);
    }
    return str;
}
var str1 = "World's best #%& World's Best #% ";
document.write('str1 = ' + str1 + '<br />');
document.write('length = ' + str1.gblen() + '<br />');
document.write('gbtrim(10) = ' + str1.gbtrim(10) + '<br />');
document.write('gbtrim(10, \'...\') = ' + str1.gbtrim(10, '...') + '<br />');
document.write('gbtrim(12, \'-\') = ' + str1.gbtrim(12, '-') + '<br />');

// gbtrim(len, s): len is the truncation length, measured in single-byte (English) units; s is the string appended after truncation, such as "...".
// Note: each Chinese character counts as two units, so when len in gbtrim is 10, at most five Chinese characters are displayed.
// When there are more than five Chinese characters, "…" is added after the truncation, so only four Chinese characters are displayed.
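
The rule in these notes can be tried out with a standalone variant that does not extend String.prototype. This is a sketch under stated assumptions rather than the article's gbtrim: the names displayWidth and truncateByWidth are hypothetical, the sample strings are invented for illustration, and the special case for character code 94 is left out.

```javascript
// Hypothetical helper: width of a string where char codes above 127 count as 2.
function displayWidth(str) {
    let width = 0;
    for (let i = 0; i < str.length; i++) {
        width += str.charCodeAt(i) > 127 ? 2 : 1;
    }
    return width;
}

// Hypothetical helper: cut str to at most maxWidth units and append suffix
// when something was cut off. As in gbtrim, suffix.length is reserved first.
function truncateByWidth(str, maxWidth, suffix) {
    suffix = suffix || "";
    if (displayWidth(str) <= maxWidth) {
        return str; // fits already, returned unchanged
    }
    const budget = maxWidth > suffix.length ? maxWidth - suffix.length : maxWidth;
    let width = 0;
    let result = "";
    for (let i = 0; i < str.length; i++) {
        width += str.charCodeAt(i) > 127 ? 2 : 1;
        if (width > budget) {
            break;
        }
        result += str.charAt(i);
    }
    return result + suffix;
}

console.log(truncateByWidth("一二三四五", 10, "…"));     // 一二三四五 (fits exactly)
console.log(truncateByWidth("一二三四五六七", 10, "…")); // 一二三四…
console.log(truncateByWidth("hello world", 8, "..."));  // hello...
```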

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
