Character Encodings and String Structures

Source: Internet
Author: User

First of all, my apologies for taking up space on the home page.

Character encoding and string structure are not the same thing. Many people do not realize this and ask how BSTR relates to ASCII, ANSI and Unicode.

As I understand it, a character encoding is a table that maps the characters of a character set to numbers. For example, in the ASCII table, 97 is lowercase a, 98 is lowercase b, 65 is A, 66 is B, 63 is the ? symbol, and 64 is the @ symbol.
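The table lookup described above can be sketched in C++ as follows. This is only a minimal illustration and assumes an ASCII-compatible execution character set (true on mainstream compilers, but not guaranteed by the language); the helper names code_of and char_of are made up for this example.

```cpp
#include <cassert>

// Hypothetical helper: the number the encoding table assigns to a character.
// Going through unsigned char avoids negative values where char is signed.
int code_of(char c) {
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: the character that a 7-bit ASCII code stands for.
char char_of(int code) {
    assert(code >= 0 && code <= 127);  // ASCII only defines codes 0..127
    return static_cast<char>(code);
}
```

With these helpers, code_of('a') gives 97 and char_of(64) gives '@', matching the values listed above.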

Common character encodings include the following:
1) ASCII: seven bits per character (stored in a single byte), covering 128 characters (codes 0 to 127).
2) Extended ASCII (sometimes written "asc2"): eight bits per character (single byte), covering 256 characters. The first 128 characters are the same as ASCII; the upper 128 add extra symbols and letters, which vary by code page.
3) ANSI: on Windows this name refers to the system code page. For Western locales it is an eight-bit, single-byte encoding of 256 characters whose first 128 are the same as ASCII, while the upper 128 sit in slightly different positions than in other extended ASCII tables. For locales such as Chinese or Japanese, the ANSI code page is a multi-byte encoding.
4) Unicode: in the Windows context this usually means the 16-bit (double-byte) form, UTF-16. One 16-bit unit can take 65,536 values; characters beyond that range are stored as a pair of 16-bit units. Its range is large enough to represent languages such as Chinese, Japanese and Arabic.
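To make the difference between "one character" and "how many bytes it takes" concrete, here is a small sketch that encodes a single Unicode code point two ways. UTF-8 is not discussed in the list above; it is included only as a contrast to the 16-bit form. The function names to_utf8 and to_utf16 are invented for this example, and the code is a sketch rather than a validated converter.

```cpp
#include <cassert>
#include <string>

// True for values that are valid Unicode scalar values (no surrogates).
static bool is_scalar(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one code point as UTF-8 (1 to 4 bytes).
std::string to_utf8(char32_t cp) {
    assert(is_scalar(cp));
    std::string out;
    if (cp < 0x80) {                       // plain ASCII: one byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // two bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // three bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // four bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one code point as UTF-16 (one 16-bit unit, or a surrogate pair).
std::u16string to_utf16(char32_t cp) {
    assert(is_scalar(cp));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        char32_t v = cp - 0x10000;         // 20 bits split across two units
        out += static_cast<char16_t>(0xD800 + (v >> 10));
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    }
    return out;
}
```

For example, the letter A (U+0041) is one byte in UTF-8 and one 16-bit unit in UTF-16, the Chinese character U+4E2D is three bytes in UTF-8 but still one 16-bit unit, and U+1F600 needs two 16-bit units.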

String structures are a separate matter, and there are a great many of them, especially in C++. This is one of the reasons many C++ users are put off: there are simply too many string types.
1) char * can hold an ASCII string. A 0 byte marks the end of the string, and it is used together with functions such as strlen() and strcmp().
2) wchar_t * holds a wide-character string (two bytes per character on Windows). A 16-bit zero (00 00) marks the end of the string, and it is used together with functions such as wcslen() and wcscmp().
3) TCHAR * adapts to the build environment. TCHAR is just a macro-selected type: wchar_t when UNICODE is defined, char otherwise. Win32 defines LPTSTR for the pointer type.
4) std::string is the string class of the C++ standard library. It stores char elements and is not inherently a Unicode string; the encoding of its bytes is up to the program. Because it is a class, the caller does not need to care whether it uses a leading length or a trailing terminator (it keeps track of its own length, and c_str() returns a zero-terminated view). It is easy to use.
5) CString is the string class in MFC. It is built on TCHAR, so it is a Unicode string only in Unicode builds.
6) BSTR is the string type used by COM components. When calling a COM component, string parameters should be converted to BSTR. The string length is stored in a prefix in front of the character data.
In addition, the String type of VB also stores its length in a leading prefix.
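The two layouts in the list above, terminator at the end versus length at the front, can be compared with a small sketch. The length-prefixed buffer here is a hypothetical layout built for illustration (a 4-byte length followed by the bytes); it is not the real BSTR, which must be created through the COM allocation functions. All function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Terminator style: the length is found by scanning for the 0 byte,
// which is essentially what strlen() does.
std::size_t scan_length(const char* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Prefix style (hypothetical layout): 4-byte length, then the raw bytes.
std::vector<unsigned char> make_prefixed(const std::string& text) {
    std::uint32_t len = static_cast<std::uint32_t>(text.size());
    std::vector<unsigned char> buf(sizeof len + text.size());
    std::memcpy(buf.data(), &len, sizeof len);
    std::memcpy(buf.data() + sizeof len, text.data(), text.size());
    return buf;
}

// Prefix style: the length is read directly, with no scan of the data.
std::uint32_t prefixed_length(const std::vector<unsigned char>& buf) {
    assert(buf.size() >= sizeof(std::uint32_t));
    std::uint32_t len = 0;
    std::memcpy(&len, buf.data(), sizeof len);
    return len;
}
```

One practical consequence: a string with an embedded 0 byte, such as std::string("a\0b", 3), is cut short to length 1 by the scanning approach, while the prefixed layout still reports 3.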

I will keep writing about character encodings and string structures in the future. If you spot a mistake in this article, you are welcome to leave a comment with a correction, and I will add or revise it promptly.
