Truncating strings that contain Chinese characters in C/C++

Source: Internet
Author: User
Tags: strlen

1,

The code is as follows:

const char *str = "test tests";
int count = 0;  /* number of Chinese characters found */

while (*str)
{
    /* Only the first byte needs to be tested against 0x80, provided the
       input is a valid GBK string: a first byte greater than 0x80 always
       forms one Chinese character together with the byte that follows,
       so the second byte need not be checked. The cast is needed because
       plain char may be signed; bytes above 0x7F would then be negative
       and the comparison would never succeed. */
    if ((unsigned char)*str > 0x80)
    {
        count++;    /* Chinese character: increment the counter */
        str += 2;   /* a Chinese character takes two bytes, so skip both */
    }
    else
    {
        str++;
    }
}
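
The loop above can be wrapped in a small function. The following is a minimal sketch in C, assuming the input is valid GBK as the text requires; the name count_gbk_double_byte is hypothetical and does not come from the original post. As an extra precaution it stops at a dangling lead byte at the end of the string instead of stepping past the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the two-byte characters in a GBK string.
 * Assumes valid GBK input, so only the first byte of each pair is tested. */
size_t count_gbk_double_byte(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    while (*s) {
        /* Cast first: plain char may be signed, making 0x81..0xFE negative. */
        unsigned char c = (unsigned char)*s;

        if (c > 0x80 && s[1] != '\0') {
            count++;    /* lead byte plus trail byte form one character */
            s += 2;
        } else {
            s++;        /* ASCII byte, or a dangling lead byte at the end */
        }
    }
    return count;
}
```

For instance, passing "test" followed by the GBK bytes B2 E2 CA D4 (two Chinese characters) should give 2, while a pure ASCII string gives 0.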

2,

Refer to the string conversion function below.

The code is as follows:

import java.io.UnsupportedEncodingException;

/**
 * Uses getBytes(encoding), which returns the byte array of a string.
 * When b[0] is 63 ('?'), a transcoding error has probably occurred.
 * A. String that is not garbled:
 *    1. encoded with GB2312, every byte is negative;
 *    2. encoded with ISO8859_1, every b[i] is 63.
 * B. Garbled string:
 *    1. encoded with ISO8859_1, every byte is also negative;
 *    2. encoded with GB2312, most b[i] are 63.
 * C. English string:
 *    1. encoded with ISO8859_1 or GB2312, every byte is greater than 0.
 * Summary: given a string, call getBytes("ISO8859_1"):
 *    1. if any b[i] is 63, no transcoding is needed (case A-2);
 *    2. if every b[i] is greater than 0, it is an English string and
 *       no transcoding is needed (case C-1);
 *    3. if any b[i] is less than 0, the string is already garbled and
 *       must be transcoded (case B-1).
 */
private static String toGb2312(String str) {
    if (str == null) return null;
    String retStr = str;
    byte[] b;
    try {
        b = str.getBytes("ISO8859_1");

        for (int i = 0; i < b.length; i++) {
            byte b1 = b[i];
            if (b1 == 63) {
                break;      // 1
            } else if (b1 > 0) {
                continue;   // 2
            } else if (b1 < 0) {    // cannot be 0: 0 is the string terminator
                retStr = new String(b, "GB2312");
                break;
            }
        }
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return retStr;
}

3,

The code is as follows:

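const unsigned char *str = (const unsigned char *)"test tests";
int length;
int i;
int count = 0;  /* number of Chinese characters found */

length = (int)strlen((const char *)str);
for (i = 0; i < length - 1; i++)
{
    if (str[i] >= 0x81 && str[i] <= 0xFE
        && str[i + 1] >= 0x40 && str[i + 1] <= 0xFE)
    {
        count++;    /* Chinese character */
        i++;        /* skip its second byte */
    }
}

Note that each test must use str[i] rather than *str, and the index must skip the second byte after a match. Otherwise, changing the string to "汉a" gives a result of 2 instead of 1.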

It is said that "a GBK Chinese character occupies two chars (two bytes), and the value of the first byte is less than 0, so this can be used to decide whether it is a Chinese character."
1, Why is the value of the first byte less than 0?
2, If we only check that the first byte is less than 0 and conclude that this byte and the next one form a Chinese character, is that logic reliable?
3, I have also seen it said that a GBK Chinese character has a high byte and a low byte (the first one is the low byte, right?), and that the first byte must be in 160-254 and the second byte in 64-254. Is that check safer than the method in 2?
4, If the character set in the database is SIMPLIFIED CHINESE_CHINA.ZHS16GBK, is that the GBK character set? GBK is compatible with GB2312.
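
On question 1: a GBK first byte lies in 0x81-0xFE, so its top bit is always set. On platforms where plain char is signed, such a byte reads as a negative number. The sketch below illustrates this in C; both function names are hypothetical, and the negative values assume the usual two's-complement conversion, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: the value of a byte when viewed as signed char.
 * On common two's-complement platforms 0x80..0xFF map to -128..-1. */
int as_signed(unsigned char byte)
{
    assert(CHAR_BIT == 8);      /* the sketch assumes 8-bit bytes */
    return (signed char)byte;
}

/* Hypothetical helper: lead-byte test that does not depend on whether
 * plain char is signed, because it compares the unsigned value. */
int is_gbk_lead_byte(char c)
{
    unsigned char u = (unsigned char)c;
    return u >= 0x81 && u <= 0xFE;
}
```

Comparing the unsigned value, as is_gbk_lead_byte does, avoids relying on the "less than 0" behaviour, which does not hold where plain char is unsigned.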

It seems that in some character sets, some characters take three bytes.

Regarding the claim that "if the first byte is less than 0, that byte and the next byte form a Chinese character":

Inner code ranges of Chinese characters in GBK (hexadecimal):

First byte    Second byte
81-A0         40-7E, 80-FE
AA-AF         40-7E, 80-A0
B0-D6         40-7E, 80-FE
D7            40-7E, 80-F9
D8-F7         40-7E, 80-FE
F8-FE         40-7E, 80-A0

For example, the row "81-A0, 40-7E 80-FE" means that the first byte must lie in 129-160 and the second byte in 64-126 or 128-254 (decimal).
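
The ranges listed above can be checked directly in code. The following is a minimal sketch in C that transcribes them row by row; the function name is_gbk_hanzi is hypothetical, and the ranges are taken from this article rather than from an independent reading of the GBK standard.

```c
/* Hypothetical helper: returns nonzero if (lead, trail) falls inside the
 * Chinese-character ranges listed above. */
int is_gbk_hanzi(unsigned char lead, unsigned char trail)
{
    unsigned char hi;   /* upper bound of the second trail-byte range */

    if (lead >= 0x81 && lead <= 0xA0)      hi = 0xFE;
    else if (lead >= 0xAA && lead <= 0xAF) hi = 0xA0;
    else if (lead >= 0xB0 && lead <= 0xD6) hi = 0xFE;
    else if (lead == 0xD7)                 hi = 0xF9;
    else if (lead >= 0xD8 && lead <= 0xF7) hi = 0xFE;
    else if (lead >= 0xF8 && lead <= 0xFE) hi = 0xA0;
    else return 0;                         /* lead byte not in the list */

    /* Every row shares the first trail range 40-7E; 7F is never valid. */
    return (trail >= 0x40 && trail <= 0x7E) || (trail >= 0x80 && trail <= hi);
}
```

Because the second byte may lie in 40-7E, a check that requires both bytes to be negative would miss some of the characters covered by these ranges.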

4,
At work I needed to truncate a string for display on a screen. Because the string contains Chinese characters, cutting it in the wrong place produces garbled text, so I wrote the following function.

It has been tested under uClinux and VC6.0.


The code is as follows:

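#include <string.h>

/* Truncate a string.

   name:  the string to truncate; on return it holds the remainder
   store: buffer that receives the truncated part
   len:   the number of bytes to truncate to
*/
void Split_name(char *name, char *store, int len)
{
    int i = 0;

    /* The condition is cut off in the source; "<= len" is assumed here. */
    if ((int)strlen(name) <= len)
    {
        strcpy(store, name);
        *name = 0;
        return;
    }

    /* Start from the first byte. */
    while (i < len)
    {
        /* Both bytes have the high bit set,
           i.e. name[i] < 0 && name[i + 1] < 0 */
        if ((name[i] >> 7 & 1) && (name[i + 1] >> 7 & 1))
            i = i + 2;
        else
            i = i + 1;
    }

    /* If the last character straddled the limit, drop it.
       Afterwards i is the index of the last byte to keep. */
    i = i > len ? i - 3 : i - 1;

    strncpy(store, name, i + 1);    /* copy the first i + 1 bytes */
    *(store + i + 1) = 0;

    /* Shift the remainder to the front of name. memmove is used here
       because the source's temporary buffer declaration is garbled. */
    memmove(name, name + i + 1, strlen(name + i + 1) + 1);
}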
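
As a point of comparison, here is a minimal sketch in C of a truncation helper that leaves the source string untouched. It is not the author's function: the name gbk_truncate is hypothetical, and it uses the first-byte test from section 1 (any byte above 0x80 starts a two-byte character, valid GBK assumed) instead of requiring both bytes to have the high bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most max_bytes bytes of the GBK string
 * src into dst without splitting a two-byte character, then terminates
 * dst. dst must have room for max_bytes + 1 bytes.
 * Returns the number of bytes copied. */
size_t gbk_truncate(const char *src, char *dst, size_t max_bytes)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (src[i] != '\0') {
        /* A byte above 0x80 is treated as a lead byte (valid GBK assumed). */
        size_t width =
            ((unsigned char)src[i] > 0x80 && src[i + 1] != '\0') ? 2 : 1;

        if (i + width > max_bytes)
            break;      /* the next character would not fit whole */
        i += width;
    }
    memcpy(dst, src, i);
    dst[i] = '\0';
    return i;
}
```

With the input "ab" followed by the GBK bytes B2 E2 CA D4 and a limit of 3 bytes, only "ab" is copied, because the first Chinese character would otherwise be cut in half.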
