http://www.mr3g.net/?p=220
This is based on Sina's JS version. Most of the time went into the ASCII check on the Java side: the JS version can simply call isascii(), while in Java I had to find another way. After consulting two articles, I finally worked out the check below.
JS version code (as posted here, the syntax is Objective-C):
- (int)sinaCountWord:(NSString *)s
{
    int i, n = [s length], l = 0, a = 0, b = 0;
    unichar c;
    for (i = 0; i < n; i++) {
        c = [s characterAtIndex:i];
        if (isblank(c)) {
            b++;
        } else if (isascii(c)) {
            a++;
        } else {
            l++;
        }
    }
    if (a == 0 && l == 0) return 0;
    return l + (int)ceilf((float)(a + b) / 2.0);
}
Following the JS version, the Java implementation:
/**
 * Sina Weibo content counter
 *
 * @param s the text to count
 * @return the count: non-ASCII characters count as 1, ASCII characters and whitespace as 1/2 (rounded up)
 */
public static int sinaCountWord(String s) {
    int i, n = s.length(), l = 0, a = 0, b = 0;
    char c;
    for (i = 0; i < n; i++) {
        c = s.charAt(i);
        if (Character.isWhitespace(c)) {
            b++; // whitespace
        } else if (c >= 0 && c <= 127) { // } else if (!Character.isLetter(c)) {
            a++; // ASCII
        } else {
            l++; // everything else (e.g. Chinese characters)
        }
    }
    if (a == 0 && l == 0)
        return 0;
    return l + (int) Math.ceil((float) (a + b) / 2.0);
}
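To make the counting rule concrete, here is a minimal, self-contained sketch of the same logic with a few sample inputs. The class name WeiboCounterSketch and the variable names are made up for illustration. The rule it assumes is the one in the method above: each non-ASCII character counts as 1, ASCII characters and whitespace count as 1/2 each (rounded up), and input with no non-whitespace characters counts as 0.

```java
// Hypothetical wrapper class, for illustration only.
final class WeiboCounterSketch {

    // Same rule as the counter above: wide chars count 1, ASCII and blanks count 1/2.
    static int sinaCountWord(String s) {
        int wide = 0, ascii = 0, blank = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                blank++;
            } else if (c <= 127) {
                ascii++;
            } else {
                wide++;
            }
        }
        if (ascii == 0 && wide == 0) {
            return 0; // empty or whitespace-only input
        }
        return wide + (int) Math.ceil((ascii + blank) / 2.0);
    }

    public static void main(String[] args) {
        // 5 ASCII characters: ceil(5 / 2.0) = 3
        System.out.println(sinaCountWord("hello"));
        // 2 Chinese characters (written as escapes): 2
        System.out.println(sinaCountWord("\u5fae\u535a"));
        // 2 ASCII + 1 blank + 2 Chinese characters: 2 + ceil(3 / 2.0) = 4
        System.out.println(sinaCountWord("hi \u5fae\u535a"));
        // whitespace only: 0
        System.out.println(sinaCountWord("   "));
    }
}
```

Note that blanks only contribute when there is at least one other character, so trailing spaces after real text still add to the count.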
Reference articles: the two use the same principle for the check, but one implementation compares against decimal values and the other against hexadecimal values.

Java: determining whether a string is ASCII
// This check compares against decimal numbers, i.e. the decimal values of the hexadecimal bounds starting at 0x00
if (ch >= 127 || ch < 0) return false;

CSDN: a function for counting the number of characters in UTF-8 encoded text
// This check compares the byte against hexadecimal values (starting at 0x00)
#define UTF8_ASCII(byte) (((unsigned char)(byte) >= 0x00) && ((unsigned char)(byte) <= 0x7f))
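The decimal and hexadecimal forms of the check can be compared directly in Java. The sketch below is only an illustration: the class and method names are invented, and it uses the bounds 0 to 127 (0x00 to 0x7F) from the counter and the macro. Two details worth keeping in mind: a Java char is an unsigned 16-bit value, so the `c >= 0` half of the test is always true, and the quoted `ch >= 127` check treats 127 (DEL) as non-ASCII, which differs from `c <= 127` at that one value.

```java
// Hypothetical helper class, for illustration only.
final class AsciiCheckSketch {

    // Decimal form, as in the Java counter.
    static boolean isAsciiDecimal(char c) {
        return c >= 0 && c <= 127;
    }

    // Hexadecimal form, using the same bounds as the UTF8_ASCII macro.
    static boolean isAsciiHex(char c) {
        return c >= 0x00 && c <= 0x7F;
    }

    public static void main(String[] args) {
        // Walk every possible char value and confirm both forms agree.
        for (int v = Character.MIN_VALUE; v <= Character.MAX_VALUE; v++) {
            char c = (char) v;
            if (isAsciiDecimal(c) != isAsciiHex(c)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println(isAsciiDecimal('A'));      // an ASCII letter
        System.out.println(isAsciiDecimal('\u00e9')); // a non-ASCII letter
    }
}
```

Since 127 and 0x7F are the same number, the choice between the two forms is purely a matter of readability.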
Java implementation of a Sina Weibo content counter