ANSI and Unicode Encoding

Source: Internet
Author: User

Brief description:

  • ANSI refers to a family of locale-dependent character encodings. To let computers support more languages, these encodings typically use two bytes in the 0x80 to 0xFF range to represent one character (for example, GB2312 for Simplified Chinese).
  • Unicode (also called the Uniform Code or Universal Code) is an industry standard in computer science that covers both a character set and encoding schemes. It was created to overcome the limitations of traditional character encodings: it assigns a single, unique binary code to every character of every language, meeting the needs of cross-language and cross-platform text conversion and processing. Development started in December 1990, and the standard was officially published in December 1994.

 

Advantages and disadvantages:
  • ANSI uses char to represent a character, which occupies one byte. A single byte can hold at most 256 distinct values, which is enough for English but not for Chinese, Japanese, Korean, and similar languages (hence the two-byte scheme described above).
  • On Windows, Unicode text uses wchar_t (historically a typedef for unsigned short), which occupies two bytes; with GCC and Clang on Linux, wchar_t is four bytes. Unicode covers most of the world's natural languages. Disadvantage: text takes up more space (twice as much for English on Windows), which also increases network traffic.
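To make the capacity point concrete, here is a small portable C++ sketch (not from the original article; the names kByteValues, kZhong and fitsInOneByte are illustrative). It shows that one byte has only 256 possible values, while a single wchar_t can hold the code point of a CJK character such as 中 (U+4E2D).

```cpp
#include <cassert>
#include <climits>

// A byte (char) has 2^CHAR_BIT possible values, which is 256 on all
// mainstream platforms: too few for CJK scripts.
constexpr long kByteValues = 1L << CHAR_BIT;

// A wchar_t is wide enough for a BMP code point such as U+4E2D.
// sizeof(wchar_t) is 2 with MSVC on Windows and 4 with GCC/Clang on Linux.
const wchar_t kZhong = L'\u4E2D';

// True if the code point fits in a single byte; false means a
// single-byte encoding cannot hold it on its own.
bool fitsInOneByte(long codePoint)
{
    return codePoint >= 0 && codePoint < kByteValues;
}
```

For example, fitsInOneByte('A') is true, while fitsInOneByte(kZhong) is false.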

 

 

ANSI encoding = narrow characters

Unicode = wide characters

 

 

 

◆ Character data types:

● ANSI: char, char *, const char * (standard C/C++)

CHAR, (PCHAR, PSTR, LPSTR), and LPCSTR (Windows/VC++ typedefs)

 

● Unicode: wchar_t, wchar_t *, const wchar_t *

WCHAR, (PWCHAR, PWSTR, LPWSTR), and LPCWSTR (Windows/VC++ typedefs)

 

● Generic (T) types: TCHAR, (TCHAR *, PTCHAR, PTSTR, LPTSTR), and LPCTSTR

 

In these names, P means pointer, STR means string, L means long (far) pointer, a distinction that can be ignored on Win32, C means const, and W means wide (wide characters). T marks the generic type.

TCHAR is a generic character type that Microsoft defines for convenience: it becomes wchar_t when the _UNICODE macro is defined and char when it is not. (<tchar.h> checks _UNICODE, while the Windows headers check UNICODE for types such as LPTSTR, so projects normally define both.)
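The selection mechanism can be sketched portably, since <tchar.h> itself exists only on Windows. MYTCHAR and my_tcslen below are hypothetical stand-ins for TCHAR and _tcslen, chosen the same way: by whether _UNICODE is defined.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical MYTCHAR/my_tcslen, modeled on TCHAR/_tcslen from <tchar.h>.
// The preprocessor picks one branch, based on whether _UNICODE was defined
// before this point.
#ifdef _UNICODE
typedef wchar_t MYTCHAR;   // Unicode build: wide characters
inline std::size_t my_tcslen(const MYTCHAR *s) { return std::wcslen(s); }
#else
typedef char MYTCHAR;      // ANSI build: narrow characters
inline std::size_t my_tcslen(const MYTCHAR *s) { return std::strlen(s); }
#endif

// Code written against MYTCHAR compiles unchanged in both builds;
// only the element size (and therefore the byte count) differs.
std::size_t bufferBytes(std::size_t chars) { return chars * sizeof(MYTCHAR); }
```

Compiling with -D_UNICODE (or /D_UNICODE in Visual C++) switches MYTCHAR to wchar_t, and bufferBytes(100) then returns 100 * sizeof(wchar_t) instead of 100.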

 

 

◆ Definition of string objects:

● ANSI: const char *pAnsiStr = "hello";

● Unicode: const wchar_t *pUnicodeStr = L"hello";

● Generic: const TCHAR *pTStr = _T("hello"); or const TCHAR *pTStr = _TEXT("hello");

● Dynamic allocation: TCHAR *pszBuf = new TCHAR[100]; // the element type matters: use TCHAR, not char, so the buffer holds 100 characters in either build; release it with delete[] pszBuf;
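As an alternative to a manual new[]/delete[] pair, here is a sketch that lets std::unique_ptr<wchar_t[]> release the buffer. It assumes a Unicode build, so wchar_t stands in for TCHAR, and makeWideBuffer is a hypothetical helper. Note that the count is in characters (elements), not bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <memory>

// Allocate a zero-filled buffer of `count` wide characters (elements, not
// bytes) and copy at most count - 1 characters of src into it, so the
// result is always null-terminated. unique_ptr<wchar_t[]> calls delete[].
std::unique_ptr<wchar_t[]> makeWideBuffer(std::size_t count, const wchar_t *src)
{
    assert(count > 0 && src != nullptr);
    std::unique_ptr<wchar_t[]> buf(new wchar_t[count]());  // () zero-fills
    std::wcsncpy(buf.get(), src, count - 1);               // last slot stays L'\0'
    return buf;
}
```

makeWideBuffer(100, L"hello") plays the role of new TCHAR[100] followed by a copy. The memory is freed automatically when the returned pointer goes out of scope.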

 

Here, _TEXT and _T are equivalent and are defined as follows (simplified):

    #define _TEXT(x) _T(x)      // _TEXT just forwards to _T

    // The final definition of _T:
    #ifdef _UNICODE
    #define _T(x) L ## x        // prepend L: wide (Unicode) literal
    #else
    #define _T(x) x             // unchanged: narrow (ANSI) literal
    #endif

 

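Here ## is the preprocessor's token-pasting operator: it joins L and the string literal, so _T("hello") becomes L"hello" in a Unicode build. The real <tchar.h> routes _T and _TEXT through an internal __T macro; this extra level of indirection makes the preprocessor expand a macro argument before pasting.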
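The effect of ## and of the extra level of indirection can be tried with portable macros. MY_WIDEN_, MY_WIDEN, MY_T, MY_TEXT and GREETING are hypothetical names modeled on _T/_TEXT; this is not the text of <tchar.h>.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// MY_WIDEN_ pastes L onto its argument with ##; MY_WIDEN adds one level of
// indirection so that a macro argument is expanded before pasting.
#define MY_WIDEN_(x) L ## x
#define MY_WIDEN(x)  MY_WIDEN_(x)

#ifdef _UNICODE
#define MY_T(x) MY_WIDEN(x)   // "hello" -> L"hello"
#else
#define MY_T(x) x             // "hello" -> "hello"
#endif
#define MY_TEXT(x) MY_T(x)    // same as MY_T

// Thanks to the indirection, a macro that names a literal can be widened:
// MY_WIDEN(GREETING) -> MY_WIDEN_("hello") -> L"hello".
// (Calling MY_WIDEN_(GREETING) directly would paste the identifier LGREETING.)
#define GREETING "hello"
const wchar_t *kWideGreeting = MY_WIDEN(GREETING);
```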

 

◆ Common string-handling functions (see MSDN for details):

String length:

● ANSI: size_t strlen(const char *str); // returns the length in characters. Mnemonic: "str" = string, "len" = length; the wide version uses "wcs" (wide-character string) and the generic version "_tcs". You can also look them up on MSDN.

● Unicode: size_t wcslen(const wchar_t *str);

● Generic: size_t _tcslen(const TCHAR *str);

 

● ANSI: int atoi(const char *str); // converts a string to an integer. Naming pattern: source string type + "to" + target type (atoi, _wtoi, _tstoi).

● Unicode: int _wtoi(const wchar_t *str);

● Generic: int _tstoi(const TCHAR *str);
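A portable sketch of the same functions. The wrapper names are illustrative. Since _wtoi and _tstoi are Microsoft CRT extensions, this assumes std::wcstol (base 10) as the standard C++ substitute for _wtoi.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>

// Lengths are counted in characters, never in bytes.
std::size_t narrowLength(const char *s)  { return std::strlen(s); }
std::size_t wideLength(const wchar_t *s) { return std::wcslen(s); }

// String-to-integer conversion; parsing stops at the first non-digit.
int narrowToInt(const char *s) { return std::atoi(s); }
int wideToInt(const wchar_t *s)
{
    return static_cast<int>(std::wcstol(s, nullptr, 10));
}
```

Both wideLength(L"hello") and narrowLength("hello") return 5, even though the wide string occupies more bytes.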

 

String copy:

● ANSI: char *strcpy(char *strDestination, const char *strSource); // copies a string; "cpy" = copy, with the same wcs/_tcs naming pattern as above.

● Unicode: wchar_t *wcscpy(wchar_t *strDestination, const wchar_t *strSource);

● Generic: TCHAR *_tcscpy(TCHAR *strDestination, const TCHAR *strSource);

 

These functions are unsafe because they never check the size of the destination buffer, so a source string that is too long overflows it. Visual Studio 2005 and later flag them with deprecation warning C4996 and provide secure versions (not available in VC++ 6.0):

● ANSI: errno_t strcpy_s(char *strDestination, size_t numberOfElements, const char *strSource); // the _s suffix stands for "secure"

● Unicode: errno_t wcscpy_s(wchar_t *strDestination, size_t numberOfElements, const wchar_t *strSource);

● Generic: errno_t _tcscpy_s(TCHAR *strDestination, size_t numberOfElements, const TCHAR *strSource);

 

numberOfElements: the size of the destination buffer in elements (char units for strcpy_s, wchar_t units for wcscpy_s), not in bytes. For an array, pass _countof(buf) rather than sizeof(buf).
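A portable sketch of the strcpy_s/wcscpy_s contract. The real functions are Microsoft CRT (or optional C11 Annex K) and are not available in glibc, so safeCopy is a hypothetical template. Unlike the CRT, it simply returns an error code rather than calling the invalid-parameter handler.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// numberOfElements counts CharT elements, not bytes.
template <typename CharT>
int safeCopy(CharT *dest, std::size_t numberOfElements, const CharT *src)
{
    if (dest == nullptr || numberOfElements == 0)
        return EINVAL;
    if (src == nullptr) {
        dest[0] = CharT();                 // leave an empty string
        return EINVAL;
    }
    std::size_t len = std::char_traits<CharT>::length(src);
    if (len + 1 > numberOfElements) {      // + 1 for the terminator
        dest[0] = CharT();                 // like strcpy_s: empty result
        return ERANGE;
    }
    std::char_traits<CharT>::copy(dest, src, len + 1);
    return 0;
}
```

For a wide array such as wchar_t buf[6], the correct size argument is 6 (sizeof(buf) / sizeof(buf[0])). Passing sizeof(buf) would overstate the capacity by a factor of sizeof(wchar_t).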

 

size_t: MSDN describes it as the unsigned integer type of the result of the sizeof operator. It exists because the sizes of int, long, and similar types are not the same on every platform and operating system (32-bit vs. 64-bit), so size_t is defined differently per platform, much like TCHAR:

    #ifndef _SIZE_T_DEFINED
    #ifdef _WIN64
    typedef unsigned __int64 size_t;   // 8 bytes on 64-bit Windows
    #else
    typedef _W64 unsigned int size_t;  // 4 bytes on 32-bit Windows
    #endif
    #define _SIZE_T_DEFINED
    #endif
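The guarantees that hold on every platform can be checked at compile time. elementBytes is a hypothetical helper. The 4-byte/8-byte widths are typical values, not guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// size_t is the unsigned type produced by sizeof; its width follows the
// target (typically 4 bytes on 32-bit and 8 bytes on 64-bit platforms).
static_assert(std::is_unsigned<std::size_t>::value, "size_t is unsigned");
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields a size_t");

// Byte size of n elements of T (for example a TCHAR-style buffer), kept in
// size_t because that is the type sizeof and the allocation functions use.
template <typename T>
constexpr std::size_t elementBytes(std::size_t n) { return n * sizeof(T); }
```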

 

 

 

◆ Number of bytes occupied by the string:

● ANSI:

char szStr[] = "abc";

Bytes used: sizeof(szStr); // 4: three characters plus the terminating '\0'

 

const char *psz = "defgh";

Bytes used: (strlen(psz) + 1) * sizeof(char); // sizeof(psz) would only give the pointer size; strlen excludes the terminator, hence + 1

 

● Unicode:

wchar_t szwStr[] = L"abc";

Bytes used: sizeof(szwStr); // 4 * sizeof(wchar_t), terminator included

 

const wchar_t *pwsz = L"defgh";

Bytes used: (wcslen(pwsz) + 1) * sizeof(wchar_t);

 

● Generic:

TCHAR szStr[] = _T("abc");

Bytes used: sizeof(szStr);

 

const TCHAR *psz = _T("defgh");

Bytes used: (_tcslen(psz) + 1) * sizeof(TCHAR);
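The rules above can be checked with a portable sketch. storageBytes is a hypothetical helper, and the wide results depend on sizeof(wchar_t), which is 2 on Windows and usually 4 on Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Arrays: sizeof counts every element, including the terminating null.
const char    szStr[]  = "abc";    // 4 bytes
const wchar_t szwStr[] = L"abc";   // 4 * sizeof(wchar_t) bytes

// Pointers: sizeof(p) is only the pointer size, so count the characters
// and add one element for the terminator.
std::size_t storageBytes(const char *psz)
{
    return (std::strlen(psz) + 1) * sizeof(char);
}
std::size_t storageBytes(const wchar_t *pwsz)
{
    return (std::wcslen(pwsz) + 1) * sizeof(wchar_t);
}
```

With the + 1, storageBytes(szStr) agrees with sizeof(szStr), so both methods measure the same thing.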

◆ The most fundamental API functions used for conversion:

WideCharToMultiByte converts a wide-character (UTF-16) string to a multibyte (narrow) string. // See MSDN for the full parameter list.

MultiByteToWideChar converts a multibyte (narrow) string to a wide-character string.

 

For WideCharToMultiByte, the code page parameter identifies the encoding of the output (converted multibyte) string;

For MultiByteToWideChar, the code page parameter identifies the encoding of the input multibyte string.
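To make "code page" concrete: the same Unicode character turns into different bytes depending on the target encoding. The sketch below shows what a conversion to UTF-8 (CP_UTF8) produces for one code point. encodeUtf8 is a hypothetical helper that implements the UTF-8 bit layout, not the Win32 code. It assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not a surrogate).

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as 1 to 4 UTF-8 bytes.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, 中 (U+4E2D) becomes the three bytes E4 B8 AD in UTF-8, whereas the Simplified Chinese ANSI code page encodes it in two bytes.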

 

[1] Common code pages are CP_ACP and CP_UTF8. CP_ACP (the system's current ANSI code page) is used to convert between ANSI and Unicode;

The CP_UTF8 code page is used to convert between UTF-8 and Unicode.

[2] The dwFlags parameter provides additional control, but it is rarely needed; usually you simply pass 0.

[3] lpDefaultChar and pfUsedDefaultChar (named lpUsedDefaultChar in MSDN):

WideCharToMultiByte uses these two parameters only when it encounters a wide character that cannot be represented in the code page specified by the uCodePage parameter. In that case it substitutes the character pointed to by lpDefaultChar. If lpDefaultChar is NULL (the usual case), the function uses the system's default character, which is usually a question mark. This is dangerous for file names, because ? is a wildcard. pfUsedDefaultChar points to a BOOL variable: the function sets it to TRUE if at least one character in the Unicode string could not be converted to an equivalent multibyte character, and to FALSE if every character converted successfully. After the call returns, test this variable to check whether the conversion was lossless. For CP_UTF8 (and CP_UTF7), both parameters must be NULL.
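The substitution behavior can be modeled with a toy example. toAsciiWithDefault is hypothetical and treats plain ASCII (0x00 to 0x7F) as the target "code page". It mirrors only the roles of lpDefaultChar and pfUsedDefaultChar; the real WideCharToMultiByte works with real code pages.

```cpp
#include <cassert>
#include <string>

// Characters outside ASCII are replaced by defaultChar, and *usedDefault
// (if non-null) reports whether any substitution happened.
std::string toAsciiWithDefault(const std::wstring &wide, char defaultChar,
                               bool *usedDefault)
{
    std::string out;
    bool used = false;
    for (wchar_t wc : wide) {
        if (static_cast<unsigned long>(wc) < 0x80) {
            out += static_cast<char>(wc);   // representable: copy as-is
        } else {
            out += defaultChar;             // unrepresentable: substitute
            used = true;
        }
    }
    if (usedDefault)
        *usedDefault = used;
    return out;
}
```

Choosing a default such as '_' instead of '?' avoids creating wildcard characters when the result is used as a file name.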

● Examples of two conversion functions:

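    #include <windows.h>

    // Caller must release the returned buffer with delete[].
    char *cctryWideCharToAnsi(const wchar_t *pWideChar)
    {
        if (!pWideChar) return NULL;
        char *pszBuf = NULL;
        // First call: required size in bytes (-1 = null-terminated input, so '\0' is counted).
        int needBytes = WideCharToMultiByte(CP_ACP, 0, pWideChar, -1, NULL, 0, NULL, NULL);
        if (needBytes > 0)
        {
            pszBuf = new char[needBytes + 1];
            ZeroMemory(pszBuf, (needBytes + 1) * sizeof(char));
            // Second call: the actual conversion.
            WideCharToMultiByte(CP_ACP, 0, pWideChar, -1, pszBuf, needBytes, NULL, NULL);
        }
        return pszBuf;
    }

    // Caller must release the returned buffer with delete[].
    wchar_t *cctryAnsiCharToWide(const char *pChar)
    {
        if (!pChar) return NULL;
        wchar_t *pszBuf = NULL;
        // Required size in wchar_t units, including the terminator.
        int needWChar = MultiByteToWideChar(CP_ACP, 0, pChar, -1, NULL, 0);
        if (needWChar > 0)
        {
            pszBuf = new wchar_t[needWChar + 1];
            // ZeroMemory counts bytes, so scale by sizeof(wchar_t).
            ZeroMemory(pszBuf, (needWChar + 1) * sizeof(wchar_t));
            MultiByteToWideChar(CP_ACP, 0, pChar, -1, pszBuf, needWChar);
        }
        return pszBuf;
    }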

Never forget to release the returned buffers with delete[] after use.
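A portable analogue of the two Win32 helpers above (not a drop-in replacement), assuming the C library's current locale (set with std::setlocale) plays the role of CP_ACP. The helper names are hypothetical. It follows the same two-pass pattern: query the length, then convert. Because std::string and std::wstring own the memory, nothing needs delete[]. Passing a null destination to query the length is supported by POSIX and MSVC.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

std::string wideToNarrow(const std::wstring &wide)
{
    std::size_t n = std::wcstombs(nullptr, wide.c_str(), 0);  // bytes, no '\0'
    if (n == static_cast<std::size_t>(-1))
        return std::string();              // a character has no mapping
    std::string out(n + 1, '\0');          // + 1 for the terminator
    std::wcstombs(&out[0], wide.c_str(), n + 1);
    out.resize(n);
    return out;
}

std::wstring narrowToWide(const std::string &narrow)
{
    std::size_t n = std::mbstowcs(nullptr, narrow.c_str(), 0);  // chars, no L'\0'
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();             // invalid multibyte sequence
    std::wstring out(n + 1, L'\0');
    std::mbstowcs(&out[0], narrow.c_str(), n + 1);
    out.resize(n);
    return out;
}
```

In this sketch an empty result means either empty input or a failed conversion. Production code would report the error separately.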

◆ Macro Conversion

Using the A2W, W2A, T2A, and T2W macros, and precautions:
[1] These macros allocate memory with the alloca() function, so the returned buffer lives on the stack and does not need to be freed explicitly. This raises a scope issue: the memory is released only when the calling function returns (see MSDN for details).
You can simply think of this as "backward compatibility".
[2] Do not use A2W or other character conversion macros inside a loop body: every call allocates more stack space that is not released until the function returns, which can cause a stack overflow.
For example, the following loop eventually overflows the stack:

    #include <atlconv.h>

    void func()
    {
        while (true)
        {
            {
                USES_CONVERSION;
                testFunc(A2W("abc"));  // each A2W call grows the stack via alloca
            }
        }
    }
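One safe pattern is to convert once, outside the loop, into storage that is freed at the end of its scope. On Windows that could mean hoisting the A2W call or using an ATL conversion class such as CA2W. Below is a portable sketch: testFunc is a hypothetical stand-in for the callee above, and the input is assumed to be ASCII so widening each char to wchar_t is a valid conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical callee standing in for testFunc in the example above.
std::size_t testFunc(const wchar_t *s) { return std::wstring(s).size(); }

std::size_t callManyTimes(const std::string &ansi, int iterations)
{
    const std::wstring wide(ansi.begin(), ansi.end());  // converted once
    std::size_t total = 0;
    for (int i = 0; i < iterations; ++i)
        total += testFunc(wide.c_str());  // no per-iteration conversion
    return total;                         // wide is freed here
}
```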

 
