DBCS, wide characters, and Unicode

Source: Internet
Author: User

In the early days of small computers, the 8-bit byte became firmly established, so a one-byte character code can represent at most 256 characters. The ideographic writing systems of Chinese, Japanese, and Korean, however, have about 21,000 characters, far more than ASCII or any one-byte code can express. The double-byte character set (DBCS) was devised to handle them. A DBCS starts with 256 one-byte codes, and some of those codes act as lead bytes that are always followed by a second byte; the two bytes together define one character. Windows provides DBCS code pages for Japanese, Simplified Chinese, Korean, and Traditional Chinese, and only the Windows versions produced for those markets support DBCS. Note that "double-byte" does not mean that every character takes two bytes: some characters take one byte and others take two, which creates additional programming problems.
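The programming problem mentioned above can be illustrated with character counting: in a DBCS string, the number of bytes and the number of characters differ. The following is a minimal sketch, not a definitive implementation. The helper names `is_lead_byte` and `dbcs_char_count` are hypothetical, and the lead-byte ranges assume Shift-JIS (code page 932); other DBCS code pages use different ranges. On Windows, the API function IsDBCSLeadByte is available for this kind of check.

```c
#include <stddef.h>

/* Hypothetical helper: lead-byte ranges assumed for Shift-JIS
   (code page 932). Other DBCS code pages use different ranges. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters (not bytes) in a null-terminated DBCS string. */
size_t dbcs_char_count(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        /* A lead byte is followed by a trail byte; step over both,
           unless the string is cut off right after the lead byte. */
        if (is_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        count++;
    }
    return count;
}
```

For a pure ASCII string the result equals what strlen would return; for a string that contains two-byte characters it is smaller than the byte length.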

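Unicode is a better solution. As originally designed, Unicode is a uniform 16-bit system: every character is 16 bits wide, which allows 65,536 code values and was intended to cover the world's written languages. (Unicode has since grown beyond 16 bits, but wide characters on Windows remain 16-bit units.) Its first 128 characters correspond exactly to the ASCII characters. The disadvantage is that text which would fit in 8-bit characters takes twice as much memory.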
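Both points, the ASCII correspondence and the doubled memory use, can be seen by widening an ASCII string into 16-bit units. This is a sketch under stated assumptions: `ascii_to_wide16` is a hypothetical name, `uint16_t` stands in for a 16-bit wide character so the example does not depend on the platform's `wchar_t` size, and the input is assumed to contain only codes below 128.

```c
#include <stddef.h>
#include <stdint.h>

/* Widen an ASCII string into 16-bit units.
   n is the capacity of dst in 16-bit units, including the terminator.
   Returns the number of units written, not counting the terminator. */
size_t ascii_to_wide16(const char *src, uint16_t *dst, size_t n)
{
    size_t i = 0;

    if (n == 0)
        return 0;
    while (src[i] != '\0' && i + 1 < n) {
        /* For ASCII, the 16-bit code value equals the 8-bit one. */
        dst[i] = (uint16_t)(unsigned char)src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}
```

"Hello!" needs 7 bytes as a char string (including the terminating null) and 14 bytes in this 16-bit form.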

C supports multi-byte character sets through wide characters. Compare the two data types:

The char data type

char c = 'A'; // 8 bits

char *p = "Hello!";

 

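The wide character data type

typedef unsigned short wchar_t; // 16 bits

wchar_t c = L'A'; // stored in memory as 0x41 0x00 (little-endian)

wchar_t *p = L"Hello!"; // the string occupies 14 bytes: 7 wide characters, including the terminating null, at 2 bytes each; sizeof(p) itself is only the size of the pointer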

 

Wide characters need different library functions

char *pc = "Hello!";

size_t len = strlen(pc);

len is 6, as expected.

However:

wchar_t *pw = L"Hello!";

size_t lenw = strlen(pw);

The compiler reports:

'function' : incompatible types - from 'unsigned short *' to 'const char *'

This warning or error says that strlen expects a pointer to char (const char *) but was passed a pointer to unsigned short (unsigned short *), which is the type behind wchar_t here.

The same problem arises with many other string functions, but there is no need to worry: wide-character versions of these functions have already been written for us.

For example:

size_t __cdecl strlen(const char *);

Wide-character version:

size_t __cdecl wcslen(const wchar_t *);
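As a short illustration of the wide-character version, the sketch below uses the standard `wcslen` from `<wchar.h>` to compute how many bytes a wide string occupies. The function name `wide_string_bytes` is hypothetical. The result depends on the platform's `wchar_t` size: with a 16-bit `wchar_t`, as on Windows, L"Hello!" takes 14 bytes; with a 32-bit `wchar_t` it takes 28.

```c
#include <stddef.h>
#include <wchar.h>

/* Bytes occupied by a wide string, including its terminating null.
   wcslen counts wide characters, not bytes. */
size_t wide_string_bytes(const wchar_t *s)
{
    return (wcslen(s) + 1) * sizeof(wchar_t);
}
```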

Note that sizeof returns a size in bytes. To get the number of elements in a string array, divide by the size of one element:

char szc[] = "abcdef";

wchar_t szw[] = L"abcdef";

int sizec = sizeof(szc) / sizeof(char); // 7, including the terminating null

int sizew = sizeof(szw) / sizeof(wchar_t); // also 7
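The division above is often wrapped in a macro so that the element type does not have to be repeated. The sketch below assumes a hypothetical macro name, `ARRAY_COUNT`; it works only on true arrays, not on pointers, because sizeof applied to a pointer gives the size of the pointer.

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in a true array.
   Do not use it on a pointer. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Element count of a narrow string array, terminator included. */
size_t narrow_count(void)
{
    char szc[] = "abcdef";
    return ARRAY_COUNT(szc);
}

/* Element count of a wide string array, terminator included. */
size_t wide_count(void)
{
    wchar_t szw[] = L"abcdef";
    return ARRAY_COUNT(szw);
}
```

Both functions return the same count regardless of how many bytes one `wchar_t` occupies.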

 

A solution that avoids switching by hand: TCHAR

#ifndef _UNICODE

typedef char TCHAR;

#define _tcslen strlen

#else

typedef wchar_t TCHAR;

#define _tcslen wcslen

#endif

This work has also been done for us already, in the header tchar.h; we only need to use it.
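The same pattern can be tried outside Windows with stand-in names. The sketch below is an assumption-laden illustration, not the contents of tchar.h: `MY_UNICODE`, `my_tchar`, `my_tcslen`, `MY_T`, and `greeting_length` are all hypothetical names that mirror `_UNICODE`, `TCHAR`, `_tcslen`, and the `_T()` literal macro that tchar.h provides.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for _UNICODE, TCHAR, _tcslen and _T(). */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define my_tcslen wcslen
#define MY_T(s) L##s        /* turns "text" into L"text" */
#else
typedef char my_tchar;
#define my_tcslen strlen
#define MY_T(s) s
#endif

/* The same source line compiles to narrow or wide code,
   depending on whether MY_UNICODE is defined. */
size_t greeting_length(void)
{
    const my_tchar *p = MY_T("Hello!");
    return my_tcslen(p);
}
```

Compiling with or without `-DMY_UNICODE` (or the equivalent compiler option) selects the character type; the calling code does not change, and the length is 6 either way.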
