The difference between Unicode and ANSI

Last Update:2018-10-16 Source: Internet

Author: User



What are ANSI and Unicode? They are two different character encoding standards. As the terms are used on Windows, an ANSI character is 8 bits wide and a Unicode character is 16 bits wide. More precisely, ANSI stores English characters in a single byte and Chinese characters in two bytes, while Unicode (UTF-16 on Windows) stores both English and Chinese characters in two bytes. Unicode is an international standard encoding, and it is not compatible with the ANSI encodings.

Unicode is now used on networks, in Windows, and in many large software applications. A single-byte 8-bit ANSI code can represent only 256 characters. That is more than enough for the 26 English letters, but not for non-Western scripts with thousands of characters, such as Chinese and Korean, which is why the Unicode standard was introduced.
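The storage difference can be sketched in a few lines of standard C. This is only an illustration: the function names narrow_bytes and wide_bytes are made up here, and the wide size is computed from sizeof(wchar_t) rather than hard-coded, because wchar_t is 16 bits on Windows but is commonly 32 bits on other platforms.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Bytes occupied by a narrow (char) string, excluding the terminator. */
size_t narrow_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s) * sizeof(char);
}

/* Bytes occupied by a wide (wchar_t) string, excluding the terminator. */
size_t wide_bytes(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s) * sizeof(wchar_t);
}
```

For the same English text, wide_bytes(L"hello") is sizeof(wchar_t) times larger than narrow_bytes("hello"), which is the price paid for one uniform character width.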

In software development, and especially when using the C string-handling functions, ANSI and Unicode must be treated differently. How are ANSI and Unicode characters defined and used, and how do you convert between them?

One. Definitions: ANSI: char str[1024]; used with the string-handling functions strcpy(), strcat(), strlen(), and so on. Unicode: wchar_t str[1024]; used with the corresponding wide-character string-handling functions.

Two. Available functions: ANSI: for char, the string-handling functions are strcat(), strcpy(), strlen(), and the other functions that begin with str. Unicode: for wchar_t, the string-handling functions are wcscat(), wcscpy(), wcslen(), and the other functions that begin with wcs.
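The parallel between the str and wcs families can be seen by writing the same routine twice. The sketch below uses only standard C library calls; join_narrow and join_wide are hypothetical helper names, and cap is the destination capacity counted in characters, not bytes.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Builds "<first> <second>" in dst with the str* functions; returns its length. */
size_t join_narrow(char *dst, size_t cap, const char *first, const char *second)
{
    assert(strlen(first) + 1 + strlen(second) < cap);
    strcpy(dst, first);
    strcat(dst, " ");
    strcat(dst, second);
    return strlen(dst);
}

/* The same routine for wchar_t: only the type, the L prefix and the
   str -> wcs function names change. */
size_t join_wide(wchar_t *dst, size_t cap, const wchar_t *first, const wchar_t *second)
{
    assert(wcslen(first) + 1 + wcslen(second) < cap);
    wcscpy(dst, first);
    wcscat(dst, L" ");
    wcscat(dst, second);
    return wcslen(dst);
}
```

Both functions return a length in characters, so the two results are equal for the same text even though the wide buffer occupies more bytes.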

Three. System support: Windows 98 supports only ANSI. Windows 2000 supports both ANSI and Unicode. Windows CE supports only Unicode.

Notes:

1. COM supports only Unicode.

2. Windows 2000 is Unicode-based throughout, so using ANSI on Windows 2000 has a cost. You do not have to convert the encoding yourself, but the system converts it behind the scenes, and that hidden conversion consumes system resources (CPU, memory).

3. To use Unicode on Windows 98, you have to convert the encoding manually yourself.

Four. How to distinguish between the two:

Software often has to support both ANSI and Unicode, and it is not practical to rewrite every string type and every string function call each time the character set changes. For this reason, the standard C run-time library and Windows provide macro definitions that switch between the two.

The C run-time library provides the _UNICODE macro (with a leading underscore) and Windows provides the UNICODE macro (without an underscore). When both _UNICODE and UNICODE are defined, the code is compiled as the Unicode version; otherwise it is compiled and run as ANSI.

Defining the macros alone does not convert anything automatically. The code must also use a set of generic character definitions.

1. TCHAR: If the UNICODE macro is defined, TCHAR is defined as wchar_t: typedef wchar_t TCHAR; Otherwise TCHAR is defined as char: typedef char TCHAR;

2. LPTSTR: If the UNICODE macro is defined, LPTSTR is defined as LPWSTR: typedef LPWSTR LPTSTR; (I did not know what LPWSTR was before; now it finally makes sense.) Otherwise LPTSTR is defined as LPSTR: typedef LPSTR LPTSTR;
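The mechanism behind TCHAR and LPTSTR can be imitated in portable C. The sketch below is not the Windows header code: DEMO_UNICODE, XCHAR, XTEXT and xcslen are hypothetical stand-ins for UNICODE/_UNICODE, TCHAR, TEXT and _tcslen, chosen so the example compiles without the Windows SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* DEMO_UNICODE, XCHAR, XTEXT and xcslen are hypothetical names that
   imitate UNICODE, TCHAR, TEXT and _tcslen; they are not Windows names. */
#ifdef DEMO_UNICODE
typedef wchar_t XCHAR;
#define XTEXT(s) L##s
#define xcslen   wcslen
#else
typedef char XCHAR;
#define XTEXT(s) s
#define xcslen   strlen
#endif

/* The same source line compiles to a narrow or a wide call,
   depending only on whether DEMO_UNICODE is defined. */
size_t greeting_length(void)
{
    const XCHAR *greeting = XTEXT("hello");

    /* The literal's element type always matches XCHAR. */
    assert(sizeof(XTEXT("a")) == 2 * sizeof(XCHAR));
    return xcslen(greeting);
}
```

Compiling the file once as is and once with -DDEMO_UNICODE yields the two builds from a single source, which is the point of the generic definitions.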

Addendum: UTF-8 is well suited to byte-stream transmission. Unicode is an encoding scheme; my understanding is that UTF-8 is one concrete implementation of Unicode, and that UTF-16 and others are similar implementations.

ANSI/Unicode characters and strings: TChar.h is a counterpart to String.h that is used to create strings that work for both ANSI and Unicode.

On Windows, each character of a Unicode string is 16 bits (one UTF-16 code unit).

Win9x supports only ANSI; Windows 2000/XP/2003 support both ANSI and Unicode; Windows CE supports only Unicode. Note: some Unicode functions can also be called on Win9x, but unexpected errors can occur.

wchar_t is the data type of the Unicode character.

All Unicode string functions begin with wcs, and the ANSI string functions begin with str.

ANSI C specifies that the C run-time library supports both ANSI and Unicode characters and strings:

ANSI function                             Unicode function
char *strcat(char *, const char *)        wchar_t *wcscat(wchar_t *, const wchar_t *)
char *strchr(const char *, int)           wchar_t *wcschr(const wchar_t *, wchar_t)
int strcmp(const char *, const char *)    int wcscmp(const wchar_t *, const wchar_t *)
char *strcpy(char *, const char *)        wchar_t *wcscpy(wchar_t *, const wchar_t *)
size_t strlen(const char *)               size_t wcslen(const wchar_t *)

L"Wash": the L prefix tells the compiler to store the string literal as a Unicode string.

_TEXT("Wash") produces an ANSI or a Unicode literal depending on whether UNICODE or _UNICODE is defined.

Note: _UNICODE is for the C run-time headers; UNICODE is for the Windows header files.

Common ANSI/Unicode data types

Generic (ANSI/Unicode)    ANSI      Unicode
LPCTSTR                   LPCSTR    LPCWSTR
LPTSTR                    LPSTR     LPWSTR
PCTSTR                    PCSTR     PCWSTR
PTSTR                     PSTR      PWSTR
TBYTE (TCHAR)             CHAR      WCHAR

When designing a DLL, it is best to provide both an ANSI and a Unicode version of each function. The ANSI function should only allocate memory, convert the string to Unicode, and then call the Unicode function.
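A portable sketch of that layering is shown below. The names CountWordsW and CountWordsA are invented for illustration, and the conversion uses the standard mbstowcs function under the current C locale; a real Windows DLL would more likely call MultiByteToWideChar instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical "W" worker: counts space-separated words in a wide string. */
size_t CountWordsW(const wchar_t *text)
{
    size_t words = 0;
    int in_word = 0;

    assert(text != NULL);
    for (; *text != L'\0'; ++text) {
        if (*text == L' ') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            ++words;
        }
    }
    return words;
}

/* Hypothetical "A" entry point: allocate, convert to wide, forward, free.
   Returns (size_t)-1 if allocation or conversion fails. */
size_t CountWordsA(const char *text)
{
    size_t capacity, converted, result;
    wchar_t *wide;

    assert(text != NULL);

    /* A multibyte string never yields more wide characters than it has
       bytes, so strlen(text) + 1 elements are always enough. */
    capacity = strlen(text) + 1;
    wide = malloc(capacity * sizeof *wide);
    if (wide == NULL)
        return (size_t)-1;

    converted = mbstowcs(wide, text, capacity);
    if (converted == (size_t)-1) {
        free(wide);
        return (size_t)-1;
    }

    result = CountWordsW(wide);   /* all real work happens in the W version */
    free(wide);
    return result;
}
```

Keeping the logic in the wide function only means there is a single implementation to maintain, and the narrow wrapper stays a thin conversion layer.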

It is best to use operating-system functions and to use the C run-time functions sparingly or not at all.

For example, the operating-system string functions (ShlWApi.h) include StrCat(), StrChr(), StrCmp(), and StrCpy(). Note that their names are case-sensitive and that they also come in ANSI and Unicode versions.

Note: the ANSI version of a function appends an uppercase A to the original function name, and the Unicode version appends an uppercase W.

Making an application ANSI- and Unicode-ready

• Treat a text string as an array of characters, not as an array of chars or an array of bytes.

• Use generic data types such as TCHAR and PTSTR for text characters and strings.

• Use explicit data types such as BYTE and PBYTE for bytes, byte pointers, and data buffers.

• Use the TEXT macro for literal characters and strings.

• Fix string arithmetic. For example, pass a buffer size in characters as sizeof(szBuffer) / sizeof(TCHAR) rather than sizeof(szBuffer), and allocate memory in bytes as malloc(charNum * sizeof(TCHAR)) rather than malloc(charNum).
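The string-arithmetic point is the one most often missed, so here is a small sketch in standard C with wchar_t standing in for TCHAR. COUNT_OF, alloc_wide and copy_bounded are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of elements (characters), not bytes, in a fixed-size array. */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Allocates a zero-filled buffer for char_count wide characters plus a
   terminator. The size passed on is in elements times element size. */
wchar_t *alloc_wide(size_t char_count)
{
    return calloc(char_count + 1, sizeof(wchar_t));
}

/* Copies src into dst, where dst_chars is the capacity in characters.
   Truncates if needed and returns the number of characters copied. */
size_t copy_bounded(wchar_t *dst, size_t dst_chars, const wchar_t *src)
{
    size_t n;

    assert(dst != NULL && src != NULL && dst_chars > 0);
    n = wcslen(src);
    if (n >= dst_chars)
        n = dst_chars - 1;          /* leave room for the terminator */
    wmemcpy(dst, src, n);
    dst[n] = L'\0';
    return n;
}
```

Calling copy_bounded(buf, sizeof(buf), src) instead of copy_bounded(buf, COUNT_OF(buf), src) would overstate the capacity by a factor of sizeof(wchar_t), which is exactly the bug the bullet warns about.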

Windows also provides functions for manipulating Unicode strings, each available in ANSI and Unicode versions: lstrcat(), lstrcmp()/lstrcmpi() (which call CompareString() internally), lstrcpy(), and lstrlen(). These are implemented as macros.

C run-time function    Windows function
tolower()              PTSTR CharLower(PTSTR pszString)
toupper()              PTSTR CharUpper(PTSTR pszString)
isalpha()              BOOL IsCharAlpha(TCHAR ch)
                       BOOL IsCharAlphaNumeric(TCHAR ch)
islower()              BOOL IsCharLower(TCHAR ch)
isupper()              BOOL IsCharUpper(TCHAR ch)
sprintf()              wsprintf()

Converting a buffer:

DWORD CharLowerBuff(PTSTR pszString, DWORD cchString)
DWORD CharUpperBuff(PTSTR pszString, DWORD cchString)

You can also convert a single character by passing it in place of the pointer, for example: TCHAR cLowerCaseChar = (TCHAR)(UINT_PTR) CharLower((PTSTR) szString[0]);
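For readers without the Windows SDK at hand, the buffer conversion can be approximated with the standard towlower function from wctype.h. The function lower_buffer below is a hypothetical analogue written for this example; it is not how CharLowerBuff is implemented, and its results depend on the C locale rather than on Windows language settings.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Lowercases the first count characters of text in place and returns the
   number of characters processed, loosely mirroring CharLowerBuff. */
size_t lower_buffer(wchar_t *text, size_t count)
{
    size_t i;

    assert(text != NULL);
    for (i = 0; i < count; ++i)
        text[i] = (wchar_t)towlower((wint_t)text[i]);
    return count;
}
```

As with the Windows function, the length is given in characters, so a caller with a fixed array would pass sizeof(buffer) / sizeof(buffer[0]) at most.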

Determining whether text is ANSI or Unicode

BOOL IsTextUnicode(
    const VOID *pBuffer,  // input buffer to be examined
    int cb,               // size of input buffer
    LPINT lpi             // options
);

Note: this function has no implementation on Win9x and always returns FALSE there.

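Conversion between Unicode and ANSI

With the Microsoft C run-time library, sprintf and swprintf can convert between the two: the uppercase %S format means a string of the opposite width to the function being called.

char szA[40];
WCHAR szW[40];

// Normal sprintf: all strings are ANSI
sprintf(szA, "%s", "ANSI str");

// Convert a Unicode string to ANSI
sprintf(szA, "%S", L"Unicode str");

// Normal swprintf: all strings are Unicode
swprintf(szW, sizeof(szW) / sizeof(szW[0]), L"%s", L"Unicode str");

// Convert an ANSI string to Unicode
swprintf(szW, sizeof(szW) / sizeof(szW[0]), L"%S", "ANSI str");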

int MultiByteToWideChar(
    UINT uCodePage,        // code page, 0
    DWORD dwFlags,         // character-type options, 0
    PCSTR pMultiByteStr,   // source string address
    int cchMultiByte,      // source string length in bytes
    PWSTR pWideCharStr,    // destination string address
    int cchWideChar        // destination size in characters
);

The uCodePage parameter identifies the code page associated with the multibyte string. The dwFlags parameter provides additional control over characters with diacritical marks such as accents. These flags are usually not needed, so 0 is passed for dwFlags. The pMultiByteStr parameter specifies the string to be converted, and the cchMultiByte parameter gives the length of that string in bytes. If -1 is passed for cchMultiByte, the function determines the length of the source string itself. The converted Unicode string is written to the buffer whose address is given by the pWideCharStr parameter. The maximum size of that buffer, measured in characters, must be given in the cchWideChar parameter. If you call MultiByteToWideChar and pass 0 for cchWideChar, the function does not perform the conversion; instead it returns the buffer size needed for the conversion to succeed.

You can convert a multibyte string to its Unicode equivalent with the following steps:

1) Call MultiByteToWideChar, passing NULL for the pWideCharStr parameter and 0 for the cchWideChar parameter.

2) Allocate a block of memory large enough to hold the converted Unicode string. The required size is the value returned by the previous call to MultiByteToWideChar.

3) Call MultiByteToWideChar again, this time passing the address of the buffer as the pWideCharStr parameter and the size returned by the first call as the cchWideChar parameter.

4) Use the converted string.

5) Free the memory block occupied by the Unicode string.
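The same measure, allocate, convert pattern exists in standard C, which makes it easy to try without Windows. The sketch below substitutes mbsrtowcs for MultiByteToWideChar and converts according to the current C locale rather than a code page number; to_wide is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts a multibyte string (current C locale) to a newly allocated
   wide string. Returns NULL on failure; the caller frees the result. */
wchar_t *to_wide(const char *text)
{
    mbstate_t state;
    const char *src;
    size_t needed;
    wchar_t *wide;

    assert(text != NULL);

    /* Step 1: a NULL destination asks only for the required length. */
    memset(&state, 0, sizeof state);
    src = text;
    needed = mbsrtowcs(NULL, &src, 0, &state);
    if (needed == (size_t)-1)
        return NULL;                    /* invalid multibyte sequence */

    /* Step 2: allocate room for the characters plus the terminator. */
    wide = malloc((needed + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* Step 3: convert for real, passing the size obtained in step 1. */
    memset(&state, 0, sizeof state);
    src = text;
    mbsrtowcs(wide, &src, needed + 1, &state);

    return wide;    /* steps 4 and 5 (use, then free) belong to the caller */
}
```

One difference worth noting: mbsrtowcs reports the length without the terminator, so the sketch adds 1 itself, whereas the Windows function's result depends on whether the source length passed in included the terminator.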

int WideCharToMultiByte(
    UINT CodePage,              // code page
    DWORD dwFlags,              // performance and mapping flags
    LPCWSTR lpWideCharStr,      // wide-character string
    int cchWideChar,            // number of chars in string
    LPSTR lpMultiByteStr,       // buffer for new string
    int cbMultiByte,            // size of buffer
    LPCSTR lpDefaultChar,       // default for unmappable chars
    LPBOOL lpUsedDefaultChar    // set when default char used
);
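The last two parameters deal with wide characters that have no representation in the target encoding. A rough portable illustration of that idea, using the standard wcrtomb function and the current C locale in place of a code page, might look like this. The function to_multibyte and its parameter names are hypothetical and only loosely modeled on the signature above.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts wide text to multibyte text in dst (capacity dst_bytes).
   Characters the current locale cannot represent are replaced by
   default_char and *used_default is set to 1. Output is truncated if dst
   is too small. Returns bytes written, excluding the terminator. */
size_t to_multibyte(const wchar_t *text, char *dst, size_t dst_bytes,
                    char default_char, int *used_default)
{
    char piece[MB_LEN_MAX];
    mbstate_t state;
    size_t out = 0;

    assert(text != NULL && dst != NULL && dst_bytes > 0 && used_default != NULL);
    memset(&state, 0, sizeof state);
    *used_default = 0;

    for (; *text != L'\0'; ++text) {
        size_t n = wcrtomb(piece, *text, &state);
        if (n == (size_t)-1) {
            /* Unmappable character: substitute and reset the state. */
            piece[0] = default_char;
            n = 1;
            *used_default = 1;
            memset(&state, 0, sizeof state);
        }
        if (out + n >= dst_bytes)
            break;                      /* keep room for the terminator */
        memcpy(dst + out, piece, n);
        out += n;
    }
    dst[out] = '\0';
    return out;
}
```

Checking the flag after the call tells the caller whether the conversion was lossy, which is the role lpUsedDefaultChar plays in the Windows function.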

Https://www.cnblogs.com/lizhenlin/p/6242483.html


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
