Determine whether a byte is the first byte of a Chinese character or the last byte.

Source: Internet
Author: User

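1> Use the IsDBCSLeadByte function to determine whether a byte is the lead (first) byte of a double-byte character.
2> Use _ismbslead and _ismbstrail, which are given the start of the string as well as the position to test, so they can tell a lead byte from a trail byte in context.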
3> GB2312/GBK forms each full-width (double-byte) character from a high byte (the zone code) followed by a low byte (the position code); GBK extends the code ranges. Because in GB2312 both the zone byte and the position byte are above 128, there is no reliable way to tell the two apart by looking at a single byte (I explain the reason below). For an exact search you cannot start from the middle of the string: if you are off by one byte, everything after it decodes as garbage. As a result, algorithms such as binary search do not work.
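The misalignment problem described above can be seen in a short Python sketch. The sample text 中文 and the use of Python's built-in "gb2312" codec are assumptions made for illustration; neither comes from the original discussion.

```python
# Illustration only: the sample text and the "gb2312" codec are assumptions.
data = "中文".encode("gb2312")   # four bytes: D6 D0 CE C4

# Decoding from the true start of the string yields the original text.
aligned = data.decode("gb2312")

# Starting one byte late pairs the trail byte of the first character with
# the lead byte of the second. The pair D0 CE is still a valid GB2312 code,
# so the decoder returns a different character without raising an error.
misaligned = data[1:3].decode("gb2312")

print(data.hex())                 # the raw bytes in hexadecimal
print(misaligned not in aligned)  # the misaligned character is a new one
```

Nothing in the bytes themselves signals that the second decode started in the wrong place, which is why the position has to be established by scanning from a known boundary.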
Of course, the problem disappears if you use a fixed-width encoding, such as Unicode, or GB2312/GBK text made up entirely of double-byte characters (with English letters and punctuation stored in their full-width forms). Windows uses double-byte Unicode internally. The cost is much higher memory consumption, because every character, including plain ASCII, then takes the full two bytes.
Note: GBK is an extension of GB2312, and one of the biggest problems with GB2312 is that the zone code and the position code follow exactly the same rule. In the original GB2312, zones are numbered 1 to 94 and positions are also numbered 1 to 94; the internal (machine) code is the zone number plus 160 followed by the position number plus 160, each occupying one byte. So, in theory, if you start at an arbitrary byte of a string, there is no reliable algorithm to determine whether that byte is a zone code or a position code, and the bytes that follow cannot be interpreted either. An exact string search therefore has to scan honestly from the first byte. When it sees a byte above 128, it treats that byte as the zone code of a Chinese character and from there on parses two bytes at a time; when it reaches an ASCII byte, it switches to byte-by-byte ASCII parsing until it again finds a byte above 128, and so on. A loop like this is certainly inefficient.
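The zone/position arithmetic and the scan-from-the-start loop described above can be sketched in Python as follows. The function names `quwei_to_bytes` and `classify_bytes` are hypothetical, and the sketch assumes well-formed GB2312 input in which every byte of 0x80 or above belongs to a two-byte character; GBK trail bytes below 0x80 are handled the same way only because the scan always consumes the byte after a lead byte.

```python
def quwei_to_bytes(zone: int, position: int) -> bytes:
    """Convert a GB2312 zone/position pair (each 1..94) to its two-byte
    internal code by adding 160 (0xA0) to each number."""
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must be in 1..94")
    return bytes([zone + 160, position + 160])


def classify_bytes(data: bytes) -> list:
    """Scan from the first byte and label each byte 'ascii', 'lead',
    'trail', or 'truncated' (a lead byte with nothing after it)."""
    roles = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            roles.append("ascii")       # single-byte character
            i += 1
        elif i + 1 < len(data):
            roles.append("lead")        # zone byte
            roles.append("trail")       # position byte, consumed unseen
            i += 2
        else:
            roles.append("truncated")   # input ended mid-character
            i += 1
    return roles


sample = b"a" + quwei_to_bytes(54, 48) + b"b"   # zone 54, position 48 is 中
print(classify_bytes(sample))
```

To answer the original question for the byte at offset `k`, look up `classify_bytes(data)[k]`. The scan is linear in the length of the string, which is the inefficiency the note refers to.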
