1,
The code is as follows:
const char *str = "test";
while (*str)
{
    // For a valid GBK string it is enough to check whether the first byte is greater than 0x80:
    // such a byte always combines with the next byte to form one Chinese character,
    // so the next byte does not need to be checked.
    // Again, this only holds if the input is a valid GBK string.
    // The cast is needed because plain char may be signed, in which case *str > 0x80 is never true.
    if ((unsigned char)*str > 0x80)
    {
        // Chinese character: counter++
        str += 2; // a Chinese character occupies two bytes, so advance by 2
    }
    else
    {
        str++;
    }
}
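As a minimal sketch, the loop above can be wrapped into a function that returns the count. `count_gbk_chars` is a hypothetical name, and the code assumes the input is a valid GBK string. The extra check on the second byte is not part of the original loop; it only guards against stepping past the terminator if the input is cut off in the middle of a character.

```c
#include <stddef.h>

/* Counts two-byte (Chinese) characters in a GBK string.
 * Assumes valid GBK input; count_gbk_chars is a hypothetical helper name. */
size_t count_gbk_chars(const char *s)
{
    /* Read bytes as unsigned so that values above 0x7F compare correctly. */
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        if (*p > 0x80 && p[1] != 0) {
            /* First byte of a two-byte character: count it, skip both bytes. */
            count++;
            p += 2;
        } else {
            /* Single-byte character, or a dangling first byte at the end. */
            p++;
        }
    }
    return count;
}
```

Called on the bytes 0xBA 0xBA followed by "A", this returns 1; called on "test", it returns 0.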
2,
See the following string conversion function.
The code is as follows:
/**
 * Uses getBytes(encoding), which returns the string as a byte array.
 * When b[0] is 63 (the character '?'), a transcoding error has probably occurred.
 * A. Correct (non-garbled) Chinese string:
 *   1. With encoding GB2312, every byte is negative;
 *   2. With encoding ISO8859_1, every b[i] is 63.
 * B. Garbled Chinese string:
 *   1. With encoding ISO8859_1, every byte is negative;
 *   2. With encoding GB2312, most b[i] are 63.
 * C. English string:
 *   1. With encoding ISO8859_1 or GB2312, every byte is greater than 0.
 * Conclusion: given a string, call getBytes("ISO8859_1"):
 *   1. If some b[i] is 63, no transcoding is needed (case A-2);
 *   2. If every b[i] is greater than 0, it is an English string and no transcoding is needed (case C-1);
 *   3. If some b[i] is less than 0, the string is garbled and must be transcoded (case B-1).
 */
// Requires: import java.io.UnsupportedEncodingException;
private static String toGb2312(String str) {
    if (str == null) return null;
    String retStr = str;
    byte b[];
    try {
        b = str.getBytes("ISO8859_1");
        for (int i = 0; i < b.length; i++) {
            byte b1 = b[i];
            if (b1 == 63)
                break; // 1
            else if (b1 > 0)
                continue; // 2
            else if (b1 < 0) { // b1 cannot be 0, because 0 is the string terminator
                retStr = new String(b, "GB2312");
                break;
            }
        }
    } catch (UnsupportedEncodingException e) {
        // e.printStackTrace();
    }
    return retStr;
}
3,
The code is as follows:
#include <string.h>

const unsigned char *str = (const unsigned char *)"test";
int length;
int i;
length = (int)strlen((const char *)str);
for (i = 0; i < length - 1; i++)
{
    if (str[i] >= 0x81 && str[i] <= 0xFE
        && str[i + 1] >= 0x40 && str[i + 1] <= 0xFE)
    {
        // Chinese character
    }
}
// Replace the string with "Han A" (one Chinese character followed by the letter A): the result is 2,
// although the string contains only one Chinese character.
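The loop above moves forward one byte at a time, so the second byte of a character can be tested again as if it were a first byte. A sketch of one way to avoid that, keeping the same byte ranges (first byte 0x81-0xFE, second byte 0x40-0xFE), is to skip both bytes after a match. `count_gbk_pairs` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Counts byte pairs that fall in the ranges used above, skipping the
 * second byte of every match. count_gbk_pairs is a hypothetical name. */
size_t count_gbk_pairs(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len = strlen(s);
    size_t count = 0;
    size_t i = 0;

    while (i + 1 < len) {
        if (p[i] >= 0x81 && p[i] <= 0xFE &&
            p[i + 1] >= 0x40 && p[i + 1] <= 0xFE) {
            count++;
            i += 2;   /* consume both bytes of the character */
        } else {
            i++;      /* single-byte character */
        }
    }
    return count;
}
```

For the bytes 0xBA 0xBA followed by "A" this returns 1, whereas testing every byte offset matches at offsets 0 and 1 and reports 2.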
Someone said: "A GBK Chinese character occupies two chars (two bytes), and the value of the first byte is less than 0, so this can be used to decide whether it is a Chinese character."
1. Why is the value of the first byte less than 0?
2. If the first byte is less than 0, that byte and the next byte form one Chinese character. Is this logic safe?
3. Some people say that a GBK-encoded Chinese character has a high byte and a low byte. Does the low byte come first? The first byte must be in the range 160-254 and the second byte in the range 64-254. Is checking this safer than the method in question 2?
4. If the DB character set is SIMPLIFIED CHINESE_CHINA.ZHS16GBK, is this the GBK character set? GBK is compatible with GB2312.
It seems that in some character sets a Chinese character occupies three bytes.
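On question 1: every GBK first byte is 0x81 or higher, so its top bit is set, and when plain `char` is a signed 8-bit type that bit pattern reads as a negative number. Whether `char` is signed depends on the compiler, so a sketch that does not rely on it can test the top bit directly. `has_high_bit` is a hypothetical name.

```c
/* Returns 1 if the top bit of the byte is set (byte value 0x80-0xFF),
 * regardless of whether plain char is signed on this compiler.
 * has_high_bit is a hypothetical helper name. */
int has_high_bit(char c)
{
    return ((unsigned char)c & 0x80) != 0;
}
```

Where `char` is signed, `has_high_bit(c)` gives the same answer as `c < 0`; where it is unsigned, `c < 0` is never true but this test still works.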
"If the first byte is smaller than 0, the byte and the next byte constitute a Chinese character"
// Internal code ranges of GBK Chinese characters (first byte, then second byte)
// 81-A0, 40-7E 80-FE
// AA-AF, 40-7E 80-A0
// B0-D6, 40-7E 80-FE
// D7, 40-7E 80-F9
// D8-F7, 40-7E 80-FE
// F8-FE, 40-7E 80-A0
Example: the row 81-A0, 40-7E 80-FE means that the first byte must be in the range 129-160 and the second byte in the range 64-126 or 128-254.
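The range table above can be turned into a lookup, sketched below. The rows are copied from the table as listed; `gbk_row`, `gbk_rows` and `gbk_is_hanzi` are hypothetical names. In every row the second byte is either 0x40-0x7E or 0x80 up to a row-specific limit, so only that limit is stored.

```c
#include <stddef.h>

/* One row of the table: first-byte range, and the upper end of the
 * second range for the second byte (0x40-0x7E or 0x80-trail_hi). */
struct gbk_row {
    unsigned char lead_lo, lead_hi;
    unsigned char trail_hi;
};

static const struct gbk_row gbk_rows[] = {
    { 0x81, 0xA0, 0xFE },
    { 0xAA, 0xAF, 0xA0 },
    { 0xB0, 0xD6, 0xFE },
    { 0xD7, 0xD7, 0xF9 },
    { 0xD8, 0xF7, 0xFE },
    { 0xF8, 0xFE, 0xA0 },
};

/* Returns 1 if (lead, trail) falls inside one of the rows above, else 0. */
int gbk_is_hanzi(unsigned char lead, unsigned char trail)
{
    size_t i;

    for (i = 0; i < sizeof gbk_rows / sizeof gbk_rows[0]; i++) {
        if (lead >= gbk_rows[i].lead_lo && lead <= gbk_rows[i].lead_hi) {
            return (trail >= 0x40 && trail <= 0x7E) ||
                   (trail >= 0x80 && trail <= gbk_rows[i].trail_hi);
        }
    }
    return 0;   /* first byte is not in any row */
}
```

For example, the pair 0xBA 0xBA is accepted by the B0-D6 row, while 0x81 0x7F is rejected because 0x7F lies between the two second-byte ranges.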
4,
At work I needed to truncate a string for display on the screen. Because the string contains Chinese characters, cutting it at the wrong byte produces garbled characters, so I wrote the following function.
It has been tested under uClinux and VC6.0.
The code is as follows:
/* Truncates a string.
   name:  the string to be truncated; on return it holds the remainder
   store: receives the truncated part
   len:   the length to cut, in bytes
*/
void split_name(char *name, char *store, int len)
{
    int i = 0;
    char strTemp[L(NAMEL)] = {0};
    if (strlen(name) <= (size_t)len) // comparison lost in the source; "name already fits in len" is assumed
    {
        strcpy(store, name); *name = 0;
        return;
    }
    // Start from the first byte
    while (i < len)
    {
        if ((name[i] >> 7 & 1) && (name[i + 1] >> 7 & 1)) // i.e. if (name[i] < 0 && name[i + 1] < 0)
            i = i + 2;
        else
            i = i + 1;
    }
    i = i > len ? i - 3 : i - 1; // step back if the last character crossed the len boundary
    strncpy(store, name, i + 1); // copy the first i + 1 bytes
    *(store + i + 1) = 0;
    strcpy(strTemp, name + i + 1);
    strcpy(name, strTemp);
}
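The same idea can be sketched without modifying the source string: copy at most a given number of bytes and stop before any two-byte character that would be cut in half. `gbk_truncate` and its parameters are hypothetical names, not part of the function above. The sketch treats any byte with the top bit set as the first byte of a two-byte character, which holds only for valid GBK input.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most max_bytes bytes of src into dst without splitting a
 * two-byte character. dst must have room for max_bytes + 1 bytes.
 * Returns the number of bytes copied, not counting the terminator. */
size_t gbk_truncate(const char *src, char *dst, size_t max_bytes)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t i = 0;

    while (p[i] != 0) {
        /* Width of the character that starts at offset i. */
        size_t width = ((p[i] & 0x80) && p[i + 1] != 0) ? 2 : 1;

        if (i + width > max_bytes)
            break;          /* the next character would not fit */
        i += width;
    }
    memcpy(dst, src, i);
    dst[i] = '\0';
    return i;
}
```

With a source of two-byte character, "A", two-byte character (5 bytes) and `max_bytes` set to 4, the function copies 3 bytes, because the second two-byte character would otherwise be cut after its first byte.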