1,
const char *str = "test tests";
int count = 0;   /* number of Chinese characters found */
while (*str)
{
    /* Provided the input is a valid GBK string, it is enough to check that
       the first byte is greater than 0x80: such a byte always forms one
       Chinese character together with the byte that follows, so the second
       byte need not be checked. Again, this assumes valid GBK input. */
    if ((unsigned char)*str > 0x80)
    {
        count++;    /* Chinese character */
        str += 2;   /* a Chinese character takes two bytes, so skip both */
    }
    else
    {
        str++;
    }
}
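The loop above can also be packaged as a function. The following is only a sketch under the same assumption (valid GBK input); the name `count_gbk_hanzi` is made up for illustration. It reads the bytes as `unsigned char` so that the comparison with 0x80 works whether plain `char` is signed or not, and it does not step past the terminator if the string ends on a dangling first byte.

```c
#include <stddef.h>

/* Count the two-byte characters in a NUL-terminated GBK string.
   Hypothetical helper; assumes valid GBK input. */
size_t count_gbk_hanzi(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        if (*p > 0x80 && p[1] != 0) {
            count++;   /* first byte plus the byte after it */
            p += 2;
        } else {
            p++;       /* ASCII byte, or a dangling first byte at the end */
        }
    }
    return count;
}
```

For example, `count_gbk_hanzi("ab\xD6\xD0\xCE\xC4")` counts the two characters encoded as D6 D0 and CE C4.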
2,
Refer to the string conversion function below.
/**
 * Uses getBytes(encoding), which returns the string as a byte array.
 * When b[0] is 63 ('?'), a transcoding error has probably occurred.
 * A. A Chinese string that is not garbled:
 *   1. encoded with GB2312, every byte is negative;
 *   2. encoded with ISO8859_1, every b[i] is 63.
 * B. A garbled Chinese string:
 *   1. encoded with ISO8859_1, every byte is also negative;
 *   2. encoded with GB2312, most b[i] are 63.
 * C. An English string:
 *   1. encoded with ISO8859_1 or GB2312, every byte is greater than 0.
 * Summary: given a string, call getBytes("ISO8859_1"):
 *   1. if some b[i] is 63, no transcoding is needed (case A-2);
 *   2. if every b[i] is greater than 0, it is an English string and no transcoding is needed (case C-1);
 *   3. if some b[i] is less than 0, the string is already garbled and must be transcoded (case B-1).
 */
// Requires: import java.io.UnsupportedEncodingException;
private static String toGb2312(String str) {
    if (str == null) return null;
    String retStr = str;
    byte[] b;
    try {
        b = str.getBytes("ISO8859_1");
        for (int i = 0; i < b.length; i++) {
            byte b1 = b[i];
            if (b1 == 63)
                break; // 1
            else if (b1 > 0)
                continue; // 2
            else if (b1 < 0) { // b1 cannot be 0: 0 is the string terminator
                retStr = new String(b, "GB2312");
                break;
            }
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return retStr;
}
3,
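const unsigned char *str = (const unsigned char *)"test tests";
int length;
int i;
int count = 0;   /* number of Chinese characters found */
length = strlen((const char *)str);
for (i = 0; i < length - 1; i++)
{
    /* Test the byte at position i, not *str. Testing *str re-checks the
       first character on every pass, so a string made of one Chinese
       character followed by "a" gives a result of 2 instead of 1. */
    if (str[i] >= 0x81 && str[i] <= 0xFE
        && str[i + 1] >= 0x40 && str[i + 1] <= 0xFE)
    {
        count++;   /* Chinese character */
        i++;       /* skip its second byte */
    }
}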
It has been said: "A GBK Chinese character occupies two chars (two bytes), and the value of the first byte is less than 0. This can be used to tell whether a character is Chinese."
1, Why is the value of the first byte less than 0?
2, If we test only whether the first byte is less than 0 and then assume that this byte and the next one form a Chinese character, is that logic unreliable?
3, I have also seen it said that a GBK-encoded Chinese character has a high byte and a low byte (the first one is the low byte, right?), and that the first byte must be between 160 and 254 and the second between 64 and 254. Is that safer than the method in 2?
4, If the database character set is SIMPLIFIED CHINESE_CHINA.ZHS16GBK, is that the GBK character set? GBK is compatible with GB2312.
It seems that some characters in some character sets take up three bytes.
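On question 1, a short note: the first byte of a two-byte GBK character always has its top bit set. Where plain `char` is a signed 8-bit type, such a byte reads as a negative number. Whether `char` is signed is implementation-defined in C, so a `< 0` test is not portable. The sketch below tests the bit itself instead; the helper names are made up for illustration.

```c
/* Return 1 if the byte has its top bit set, whether or not plain char is
   signed on this platform. Hypothetical helper. */
int has_high_bit(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}

/* The same byte viewed as a signed 8-bit value. On the usual two's
   complement platforms, bytes 0x80-0xFF come out negative here. */
int as_signed_byte(char c)
{
    return (signed char)c;
}
```

With this, `has_high_bit('\xD6')` is 1 and `has_high_bit('a')` is 0, and on common compilers `as_signed_byte('\xD6')` is negative, which is where the "less than 0" rule comes from.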
"By determining if the first byte is less than 0, the byte and the next byte form a Chinese character."
GBK internal code ranges for Chinese characters (first byte, then the two second-byte ranges, in hex):
81-A0, 40-7E 80-FE
AA-AF, 40-7E 80-A0
B0-D6, 40-7E 80-FE
D7,    40-7E 80-F9
D8-F7, 40-7E 80-FE
F8-FE, 40-7E 80-A0
For example, take the row 81-A0, 40-7E 80-FE:
the first byte must lie in the interval 129-160, and the second byte in either 64-126 or 128-254.
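The range table can be turned into a lookup directly. This is a minimal sketch that encodes exactly the six rows listed above and nothing else; the names `gbk_row`, `GBK_ROWS` and `is_gbk_hanzi` are assumptions made for the example.

```c
#include <stddef.h>

/* One table row: the first-byte range plus the upper bound of the second
   range for the second byte. In every row the second byte may be
   0x40-0x7E, or 0x80 up to trail_hi. */
struct gbk_row { unsigned char lead_lo, lead_hi, trail_hi; };

static const struct gbk_row GBK_ROWS[] = {
    {0x81, 0xA0, 0xFE},
    {0xAA, 0xAF, 0xA0},
    {0xB0, 0xD6, 0xFE},
    {0xD7, 0xD7, 0xF9},
    {0xD8, 0xF7, 0xFE},
    {0xF8, 0xFE, 0xA0},
};

/* Return 1 if (lead, trail) falls inside one of the rows, else 0. */
int is_gbk_hanzi(unsigned char lead, unsigned char trail)
{
    size_t i;

    for (i = 0; i < sizeof GBK_ROWS / sizeof GBK_ROWS[0]; i++) {
        const struct gbk_row *r = &GBK_ROWS[i];
        if (lead >= r->lead_lo && lead <= r->lead_hi) {
            return (trail >= 0x40 && trail <= 0x7E)
                || (trail >= 0x80 && trail <= r->trail_hi);
        }
    }
    return 0;   /* first byte is in none of the listed ranges */
}
```

Checking both bytes this way rejects pairs such as (0x81, 0x7F) or (0xD7, 0xFA), which a test on the first byte alone would accept.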
4,
At work I needed to truncate a string for display on the screen. Because the string contains Chinese characters, a badly placed cut produces garbled text, so I wrote the following function.
It passes testing under uClinux and VC6.0.
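#include <string.h>

/* Split a string
   name:  the string to split; on return it holds the remainder
   store: receives the leading part
   len:   the number of bytes to take
*/
void Split_name(char *name, char *store, int len)
{
    int i = 0;
    /* The original condition was cut off in extraction; "the whole string
       fits within len" is assumed here. */
    if (strlen(name) <= (size_t)len)
    {
        strcpy(store, name); *name = 0;
        return;
    }
    /* Start from the first byte. */
    while (i < len)
    {
        /* Both bytes have the high bit set, i.e. name[i] < 0 && name[i+1] < 0.
           (A GBK second byte may also be 0x40-0x7E, which this test misses.) */
        if (name[i]>>7&1 && name[i+1]>>7&1)
            i = i + 2;
        else
            i = i + 1;
    }
    i = i > len ? i - 3 : i - 1;
    strncpy(store, name, i + 1);   /* take the first i+1 bytes */
    *(store + i + 1) = 0;
    /* memmove stands in for the original temporary buffer, whose size
       macro did not survive extraction. */
    memmove(name, name + i + 1, strlen(name + i + 1) + 1);
}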
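A variant that only computes where to cut may be easier to reuse. This is a sketch, not the function above: `gbk_prefix_len` is a made-up name, and it assumes valid GBK input in which any byte above 0x80 is a first byte followed by exactly one more byte (the same assumption as in item 1).

```c
#include <stddef.h>
#include <string.h>

/* Return how many bytes of s, at most max, can be taken without cutting a
   two-byte character in half. Hypothetical helper; assumes valid GBK. */
size_t gbk_prefix_len(const char *s, size_t max)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len = strlen(s);
    size_t i = 0;

    if (len <= max)
        return len;            /* the whole string fits */
    while (i < max) {
        size_t step = (p[i] > 0x80) ? 2 : 1;
        if (i + step > max)
            break;             /* the next character would not fit */
        i += step;
    }
    return i;
}
```

A caller would then copy that many bytes, for example `n = gbk_prefix_len(name, len); memcpy(store, name, n); store[n] = 0;`, with `store` sized to hold at least `len + 1` bytes.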