Windows Core Programming (3): Character Encoding in Detail

Last Update:2018-04-01 Source: Internet

Author: User


I. A detailed introduction to character encoding

1. A byte is a unit of measurement for computer storage capacity; one byte is 8 bits.

2. Characters: the letters, digits, text, and symbols used in computer text, such as 1, 2, 3, 4, ~, @, !, %, ^, and so on.

3. The relationship between characters and bytes differs from one encoding to another:

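A. In ASCII, an English letter (upper or lower case) takes one byte (8 bits). ASCII itself contains no Chinese characters; in the ANSI code pages built on top of it, such as GBK, a Chinese character takes two bytes (16 bits).

B. In UTF-8, an English character takes one byte, and a commonly used Chinese character (Simplified or Traditional) takes three bytes.

C. In Unicode encoding as Windows uses the term (UTF-16), an English letter takes two bytes, and a commonly used Chinese character (Simplified or Traditional) also takes two bytes.

D. In the ANSI code pages, English (half-width) punctuation takes one byte and Chinese (full-width) punctuation takes two bytes.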
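
The byte counts for UTF-8 can be checked with a small encoder. The following is a minimal sketch in portable C++, not taken from the original article; the helper name encode_utf8 is hypothetical, and invalid input is reported by returning an empty string, which is an assumption made for brevity.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8.
// encode_utf8 is a hypothetical helper name used only for this illustration.
// Returns an empty string if cp is not a valid Unicode scalar value.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx (ASCII, e.g. English letters)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Surrogate values are reserved for UTF-16 and are not characters.
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (most Chinese characters)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this sketch, encode_utf8(U'A') yields one byte, and encode_utf8(0x4E2D), the Chinese character for "middle", yields the three bytes E4 B8 AD.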

4. Multiple character sets: national text encodings overlap one another, and these encoding conflicts cause garbled text.

A. At first there was only one character set in computing: ANSI's ASCII character set. It uses 7 bits to represent a character, for a total of 128 characters, including English letters, digits, and punctuation. It was later extended to use 8 bits per character, which can represent 256 characters, adding some special characters on top of the original 7-bit set.

B. Later, as other countries joined in, ASCII could no longer meet the demand, and each country developed its own character set on the basis of ASCII. These character sets derived from the ANSI standard are customarily called ANSI character sets; their official name is MBCS (Multi-Byte Character System). Each language has its own character set, and that is why the Unicode character set was created.

Unicode originally used a fixed 16 bits (two bytes, one WORD) to represent a character, which gives 65,536 code values, enough to cover the commonly used characters of the world's languages. The 16-bit encoding form that Windows uses is called UTF-16; in current versions of Unicode, characters beyond the first 65,536 are encoded as a pair of 16-bit units (a surrogate pair).

Later, so that double-byte Unicode text could be transmitted correctly through existing systems that process single bytes, UTF-8 appeared, which encodes Unicode in a variable-length, MBCS-like way.

UTF-8 is an encoding; it belongs to the Unicode character set.
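
To make the 16-bit encoding form concrete, here is a minimal sketch of a UTF-16 encoder in portable C++. It is not from the original article; the helper name encode_utf16 is hypothetical, and returning an empty string for invalid input is an assumption made for brevity. It shows that common characters fit in one 16-bit unit, while code points above U+FFFF need two.

```cpp
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// encode_utf16 is a hypothetical helper name used only for this illustration.
// Returns an empty string if cp is not a valid Unicode scalar value.
std::u16string encode_utf16(char32_t cp)
{
    std::u16string out;
    if (cp > 0x10FFFF) return out;                 // beyond the Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // reserved surrogate values

    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit.
        out += static_cast<char16_t>(cp);
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        char32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

For example, encode_utf16(0x4E2D) produces the single unit 0x4E2D, while encode_utf16(0x1F600) produces the pair 0xD83D 0xDE00.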

5. Windows defines a number of data types:

A. wchar_t is two bytes wide on Windows; the type names containing W are based on this type.

B. WCHAR: a Unicode character; it is actually wchar_t.

C. PWSTR: pointer to a Unicode string, wchar_t *

D. PCWSTR: pointer to a constant Unicode string, const wchar_t *

E. The corresponding multibyte types are CHAR, LPSTR, and LPCSTR.

F. ANSI/Unicode universal data types:

TCHAR is char in a multibyte character set build and wchar_t in a Unicode build.

PTSTR is char * in a multibyte character set build and wchar_t * in a Unicode build.

LPCTSTR is const char * in a multibyte character set build and const wchar_t * in a Unicode build.

G. An A in the name means multibyte character set, W means Unicode (wide characters), and T means universal.
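
The universal types work by conditional compilation. The sketch below imitates that mechanism in portable C++ so it can be tried without the Windows headers. Every name in it (DEMO_UNICODE, demo_tchar, demo_pctstr, DEMO_TEXT, demo_length) is a hypothetical stand-in, not the real Windows definition; the real TCHAR and _TEXT are switched by the UNICODE and _UNICODE macros.

```cpp
#include <cstddef>

// Hypothetical stand-ins that imitate how TCHAR and _TEXT are switched.
// Defining DEMO_UNICODE before this point selects the wide-character variants.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_TEXT(s) L##s      // turns "abc" into L"abc"
#else
typedef char demo_tchar;
#define DEMO_TEXT(s) s         // leaves "abc" unchanged
#endif

// Pointer types built on the universal character type,
// in the same spirit as PTSTR and LPCTSTR.
typedef demo_tchar*       demo_ptstr;
typedef const demo_tchar* demo_pctstr;

// Code written against the universal types compiles in either mode.
// Counts characters up to, but not including, the terminating zero.
std::size_t demo_length(demo_pctstr s)
{
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}
```

A call such as demo_length(DEMO_TEXT("ABC")) compiles unchanged in both modes, which is the point of writing code against the T variants.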

7. Windows APIs for converting between multibyte character sets and Unicode:

A. WideCharToMultiByte maps a Unicode string to a multibyte string.

B. MultiByteToWideChar maps a multibyte string to a Unicode string.

8. The functions above are fairly complex to use, so you can use the following ATL conversion macros instead.

USES_CONVERSION must be declared before the macros are used.

A2W: multibyte to wide character

USES_CONVERSION; CString str; const char* aChar = "ABCDEFG"; wchar_t* wChar = A2W(aChar); str = wChar;

W2A: wide character to multibyte

USES_CONVERSION; const wchar_t* wChar = L"ABCDEFG ah"; char* aChar = W2A(wChar);

A2T: multibyte to T, where T follows the project's character set setting

USES_CONVERSION; const char* pChar = "char to CString"; CString cTemp = A2T(pChar);

T2A: T (the project's character set setting) to multibyte

USES_CONVERSION; CString cTemp = _T("Char to CString"); const char* pChar = T2CA(cTemp);

9. Use the conversion macros above sparingly.

A. If you use these macros in a loop, they may cause a stack overflow. If you look at the code, you will find that the macros call _alloca to request memory, which is allocated on the stack of the calling function and is not released until that function returns. The default stack size with the VC toolchain is 1 MB, so calling the macros in a loop keeps consuming stack memory.

B. The best solution is to use WideCharToMultiByte and MultiByteToWideChar directly. Wrapping these two APIs in your own helper functions makes them convenient to use.
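
One possible shape for such wrappers is sketched below. This is an illustration rather than the original author's code: the function names WideToMulti and MultiToWide and the choice of CP_UTF8 as the default code page are assumptions. Each API is called twice, first to ask for the required size and then to convert, and the result lives on the heap in a std::string or std::wstring, so calling the helpers in a loop does not grow the stack. The code is Windows-only; the #ifdef guard just keeps the file compilable elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <string>

// Hypothetical wrapper: Unicode (UTF-16) string to multibyte string.
// codePage defaults to UTF-8; pass CP_ACP for the system ANSI code page.
std::string WideToMulti(const std::wstring& wide, UINT codePage = CP_UTF8)
{
    if (wide.empty()) return std::string();
    const int wideLen = static_cast<int>(wide.size());

    // First call: no output buffer, returns the number of bytes required.
    int bytes = WideCharToMultiByte(codePage, 0, wide.data(), wideLen,
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return std::string();

    // Second call: convert into a heap-allocated buffer of that size.
    std::string out(static_cast<size_t>(bytes), '\0');
    WideCharToMultiByte(codePage, 0, wide.data(), wideLen,
                        &out[0], bytes, nullptr, nullptr);
    return out;
}

// Hypothetical wrapper: multibyte string to Unicode (UTF-16) string.
std::wstring MultiToWide(const std::string& multi, UINT codePage = CP_UTF8)
{
    if (multi.empty()) return std::wstring();
    const int multiLen = static_cast<int>(multi.size());

    // First call: returns the number of wide characters required.
    int chars = MultiByteToWideChar(codePage, 0, multi.data(), multiLen,
                                    nullptr, 0);
    if (chars <= 0) return std::wstring();

    // Second call: convert into a heap-allocated buffer of that size.
    std::wstring out(static_cast<size_t>(chars), L'\0');
    MultiByteToWideChar(codePage, 0, multi.data(), multiLen,
                        &out[0], chars);
    return out;
}
#endif // _WIN32
```

Because the explicit length is passed instead of -1, the APIs do not write a terminating null into the output, so the returned strings have exactly the reported size.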

10. Use TCHAR and _TEXT so that the same code accommodates both Unicode and multibyte character sets.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
