Finding Chinese characters in a C++ string

Source: Internet
Author: User
Tags ord

Example: return the number of Chinese characters in the input string:

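int GetChineseCharacterCount(const char *pstr)
{
    int retcnt = 0;
    int i = 0;
    while (pstr[i] != 0)
    {
        // High bit set: first byte of a two-byte Chinese character.
        // Also check that a second byte exists, so we never step past
        // the terminating 0.
        if ((pstr[i] & 0x80) && pstr[i + 1] != 0)
        {
            retcnt++;
            i++; // skip the second byte, because a Chinese character takes two bytes
        }
        i++;
    }
    return retcnt;
}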
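The counting loop above can also be sketched with std::string. This is a minimal sketch, not the original author's code: the name countDoubleByteChars is made up for illustration, and it assumes the text is in a double-byte encoding such as GBK/GB2312, where every byte with the high bit set starts a two-byte character. For UTF-8 text the count would be wrong, because Chinese characters there occupy three or more bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts two-byte characters in a string that is
// ASSUMED to be GBK/GB2312-encoded (high bit set = start of a 2-byte char).
std::size_t countDoubleByteChars(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Convert to unsigned char so the test does not depend on whether
        // plain char is signed on this platform.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        // Only count when a second byte is actually present.
        if ((byte & 0x80) && i + 1 < s.size()) {
            ++count;
            ++i; // skip the second byte of the character
        }
    }
    return count;
}
```

For example, passing the bytes "ab\xD6\xD0" "c\xCE\xC4" (two GBK characters mixed with three ASCII letters) should give a count of 2. The literal is split in two so that the letter c is not read as part of the hex escape.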

The following is collected from:

Http://blog.163.com/[email protected]/blog/static/791554782011523103550237/

Use ord($str) & 0x80 to detect Chinese characters. 0x80 in binary is 1000 0000; a byte whose highest bit is 1 belongs to a Chinese character. This Chinese character encoding is commonly referred to as the "10 format".
A Chinese character occupies 2 bytes, but represents only one character.

"In Windows, the Simplified Chinese character set is encoded with both 1-byte and 2-byte forms. When the high byte is in the range 0x00~0x7f, the character takes one byte; when the high byte is 0x80 or above, the character takes 2 bytes."

Note: the numbers in parentheses below are all binary.

When the value of a byte is greater than 0x7f, it must be part of a Chinese character (combined with one more byte). How do we determine that it is greater than 0x7f?
The number after 0x7f (1111111) is 0x80 (10000000), so for a byte to be greater than 0x7f its highest bit must be 1. We only need to test whether the highest bit is 1.

Test method:
Bitwise AND (a result bit is 1 only when both input bits are 1, otherwise 0):
For example, to test whether the third bit of a number is 1, AND it with 4 (100); to test whether the second bit is 1, AND it with 2 (10).
In the same way, to test whether the eighth bit is 1, AND it with (10000000), which is 0x80.
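The bit test described above can be sketched in C++ as follows. The helper name isBitSet is hypothetical, and bits are numbered from 1 (lowest) to 8 (highest) to match the text.

```cpp
#include <cstdint>

// Hypothetical helper: returns true when bit number n of value is 1.
// Bits are numbered as in the text: 1 = lowest bit, 8 = highest bit.
bool isBitSet(std::uint8_t value, int n)
{
    // n = 2 -> mask 10, n = 3 -> mask 100, n = 8 -> mask 10000000 (0x80)
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << (n - 1));
    return (value & mask) != 0;
}
```

With this helper, isBitSet(b, 8) is the same test as (b & 0x80) != 0, which is the check used to spot the first byte of a Chinese character.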

Why not use > 0x7f here? It may work in PHP, but in strongly typed languages the highest bit of a signed one-byte type is the sign bit. A byte with that bit set is read as a negative number, and a negative number can never be greater than 0x7f (the largest value a signed byte can hold).
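The difference can be seen in a small C++ sketch. The three function names are made up for illustration. Whether plain char is signed depends on the platform, which is why the comparison is only reliable after a conversion to unsigned char, while the mask test works either way.

```cpp
// Naive comparison on a signed byte: never true, because the largest
// value a signed char can hold is 127 (0x7F).
bool naiveCompare(signed char c)
{
    const int value = c;   // 0xD6 stored in a signed char becomes -42
    return value > 0x7F;
}

// Comparison made reliable by converting to unsigned char first.
bool isHighByteByCompare(char c)
{
    return static_cast<unsigned char>(c) > 0x7F;
}

// Mask test: checks only the highest bit, so signedness does not matter.
bool isHighByteByMask(char c)
{
    return (c & 0x80) != 0;
}
```

For the byte 0xD6, naiveCompare returns false, while both isHighByteByCompare and isHighByteByMask return true.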


Let me give you an example:
a's ASCII code is 97 (1100001)
A's ASCII code is 65 (1000001)

b's ASCII code is 98 (1100010)
B's ASCII code is 66 (1000010)

There is a pattern: for the letters a-z and A-Z, the sixth bit of a lowercase letter is always 1 and the sixth bit of an uppercase letter is always 0. We can use this to determine the case.
Just AND the letter with 0x20 (100000) and test the result:
if (ord($a) & 0x20) {
    // lowercase
}

How do I change a letter to uppercase? Just change the sixth bit from 1 to 0:
$a = 'a';
$a = chr(ord($a) & (~0x20));
echo $a;
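The same two tricks can be sketched in C++. The function names are hypothetical. The sketch adds a guard that the character is an ASCII letter, because the 0x20 bit is also set in many non-letters (for example the digit '1', 0x31), where clearing it would produce an unrelated character.

```cpp
// True for the ASCII letters A-Z and a-z only.
bool isAsciiLetter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

// A letter is lowercase when its sixth bit (0x20) is 1.
bool isLowerByBit(char c)
{
    return isAsciiLetter(c) && (c & 0x20) != 0;
}

// Clearing the sixth bit turns a lowercase letter into uppercase;
// anything that is not an ASCII letter is returned unchanged.
char toUpperByBit(char c)
{
    return isAsciiLetter(c) ? static_cast<char>(c & ~0x20) : c;
}
```

Here toUpperByBit('a') gives 'A', and characters that are already uppercase or are not letters pass through unchanged.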
