The difference between Unicode and ANSI

Source: Internet
Author: User
Tags coding standards strcmp

What are ANSI and Unicode? They are two different character encoding standards. ANSI uses 8-bit units: English characters take a single byte and Chinese characters take two bytes. Unicode, as the term is used in Windows programming, uses 16-bit units, so both English and Chinese characters are stored in two bytes. Unicode is an international standard encoding, and this two-byte form is not compatible with ANSI.

Unicode is now used on networks, in Windows, and in many large software applications. An 8-bit ANSI code can represent only 256 characters. That is more than enough for the 26 English letters, but not for non-Western scripts with thousands of characters, such as Chinese and Korean, which is why the Unicode standard was introduced.

In software development, especially when using the C string-handling functions, ANSI and Unicode are treated differently. How are ANSI and Unicode characters defined, how are they used, and how do you convert between them?

One. Definitions:
ANSI: char str[1024];
Unicode: wchar_t str[1024];

Two. Available functions:
ANSI: with char, use the string-handling functions strcat(), strcpy(), strlen(), and the other functions that begin with str.
Unicode: with wchar_t, use the string-handling functions wcscat(), wcscpy(), wcslen(), and the other functions that begin with wcs.
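As a minimal sketch of the two function families side by side, the following builds the same text once with char and the str functions and once with wchar_t and the wcs functions. The function names build_narrow and build_wide are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Builds "Hello, world" in a narrow (char) buffer and returns its length. */
size_t build_narrow(char *dst, size_t cap)
{
    assert(cap >= 13);            /* 12 characters + terminating '\0' */
    strcpy(dst, "Hello, ");
    strcat(dst, "world");
    return strlen(dst);           /* length counted in char units */
}

/* Same steps with the wide-character (wchar_t) counterparts. */
size_t build_wide(wchar_t *dst, size_t cap)
{
    assert(cap >= 13);            /* 12 characters + terminating L'\0' */
    wcscpy(dst, L"Hello, ");
    wcscat(dst, L"world");
    return wcslen(dst);           /* length counted in wchar_t units */
}
```

Both functions report the same character count, but the wide buffer occupies sizeof(wchar_t) bytes per character rather than one.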

Three. System support:
Windows 98: supports ANSI only.
Windows 2000: supports both ANSI and Unicode.
Windows CE: supports Unicode only.

Notes:

1. COM supports only Unicode.

2. Windows 2000 is Unicode-based throughout, so using ANSI under Windows 2000 has a cost. You do not have to convert the encoding yourself, but the conversion still happens behind the scenes and consumes system resources (CPU and memory).

3. To use Unicode under Windows 98, you must convert the encoding yourself.

Four. How to tell them apart:

Software often has to support both ANSI and Unicode, and it is impractical to change the string types and the string functions by hand every time the character type changes. For this reason, the standard C run-time library and Windows both provide macros for the purpose.

The C run-time library uses the _UNICODE macro (with an underscore) and the Windows headers use the UNICODE macro (without an underscore). When both _UNICODE and UNICODE are defined, the code is compiled as the Unicode version; otherwise it is compiled and run as ANSI.

Defining the macros alone does not convert anything automatically; a set of generic character definitions is also required.

1. TCHAR
If the UNICODE macro is defined, TCHAR is defined as wchar_t:
    typedef wchar_t TCHAR;
Otherwise TCHAR is defined as char:
    typedef char TCHAR;

2. LPTSTR
If the UNICODE macro is defined, LPTSTR is defined as LPWSTR (a pointer to a wide-character string):
    typedef LPWSTR LPTSTR;
Otherwise LPTSTR is defined as LPSTR:
    typedef LPSTR LPTSTR;
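The macro switch described above can be imitated in portable C. This is only a sketch of the mechanism, not the contents of the Windows headers: MY_UNICODE, my_tchar, MY_TEXT and my_tcslen are hypothetical stand-ins for UNICODE/_UNICODE, TCHAR, TEXT and _tcslen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for UNICODE, TCHAR, TEXT and _tcslen. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s
#define my_tcslen  wcslen
#else
typedef char my_tchar;
#define MY_TEXT(s) s
#define my_tcslen  strlen
#endif

/* Written once; compiles as either a char or a wchar_t function. */
size_t greeting_length(void)
{
    const my_tchar greeting[] = MY_TEXT("hello");
    /* The literal's element type follows the macro: 5 characters + terminator. */
    assert(sizeof(greeting) == 6 * sizeof(my_tchar));
    return my_tcslen(greeting);
}
```

Compiling the same file with and without -DMY_UNICODE yields the wide and the narrow build respectively, which is the idea behind generic types such as TCHAR.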

Addendum: UTF-8 can be used for true streaming. Unicode is an encoding scheme; my understanding is that UTF-8 is one specific implementation of Unicode, and UTF-16 and others are similar implementations.
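To make the relationship between Unicode and UTF-8 concrete, here is a small sketch that encodes a single Unicode code point as a UTF-8 byte sequence. The function name utf8_encode is hypothetical, and the sketch covers encoding only, not decoding or validation of existing data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogate values are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes: 11110xxx then three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}
```

An English letter such as U+0041 stays a single byte, while a Chinese character such as U+4E2D becomes the three bytes E4 B8 AD, which is why UTF-8 text has a variable number of bytes per character.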

ANSI/Unicode characters and strings

TChar.h supplements String.h and is used to create strings that work for both ANSI and Unicode.

Each character of a Unicode string is 16 bits.

Win9x supports only ANSI; Windows 2000/XP/2003 support both ANSI and Unicode; Windows CE supports only Unicode. Note: some Unicode functions can also be called on Win9x, but unexpected errors may occur.

wchar_t is the data type of a Unicode character.

All Unicode functions begin with wcs, and the ANSI functions begin with str.

ANSI C specifies that the C run-time library supports both ANSI and Unicode.

ANSI                                      Unicode
char *strcat(char *, const char *)        wchar_t *wcscat(wchar_t *, const wchar_t *)
char *strchr(const char *, int)           wchar_t *wcschr(const wchar_t *, wchar_t)
int strcmp(const char *, const char *)    int wcscmp(const wchar_t *, const wchar_t *)
char *strcpy(char *, const char *)        wchar_t *wcscpy(wchar_t *, const wchar_t *)
size_t strlen(const char *)               size_t wcslen(const wchar_t *)

L"string": the L prefix makes the compiler store the literal as a Unicode string.

_TEXT("string") compiles the literal as ANSI or Unicode depending on whether UNICODE or _UNICODE is defined.

Note: _UNICODE is for the C run-time library; UNICODE is for the Windows header files.

ANSI/Unicode common data types

Generic (ANSI/Unicode)    ANSI      Unicode
LPCTSTR                   LPCSTR    LPCWSTR
LPTSTR                    LPSTR     LPWSTR
PCTSTR                    PCSTR     PCWSTR
PTSTR                     PSTR      PWSTR
TBYTE (TCHAR)             CHAR      WCHAR

When designing a DLL, it is best to provide both ANSI and Unicode versions of each function. The ANSI version should only allocate memory, convert the string to Unicode, and then call the Unicode version.
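A rough sketch of that wrapper pattern in portable C follows. The names count_spaces_w and count_spaces_a are hypothetical, and the standard mbstowcs function is used here as a stand-in for the Windows conversion API, so this illustrates the structure rather than an actual DLL export.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Wide version: contains the real logic. */
size_t count_spaces_w(const wchar_t *text)
{
    size_t n = 0;
    assert(text != NULL);
    for (; *text != L'\0'; ++text)
        if (*text == L' ')
            ++n;
    return n;
}

/* Narrow version: allocate, convert, forward to the wide version, free.
   Returns (size_t)-1 if allocation or conversion fails. */
size_t count_spaces_a(const char *text)
{
    size_t cap, result;
    wchar_t *wide;

    assert(text != NULL);
    /* A converted string never has more characters than the source has bytes. */
    cap = strlen(text) + 1;
    wide = malloc(cap * sizeof *wide);
    if (wide == NULL)
        return (size_t)-1;
    if (mbstowcs(wide, text, cap) == (size_t)-1) {
        free(wide);
        return (size_t)-1;
    }
    result = count_spaces_w(wide);
    free(wide);
    return result;
}
```

Keeping the logic in one place means a bug fix in the wide version automatically applies to the narrow entry point as well.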

It is best to use operating system functions, and to use C run-time functions sparingly or not at all.

For example, the operating system string functions declared in ShlWApi.h: StrCat(), StrChr(), StrCmp(), StrCpy(). Note that their names are case-sensitive, and that they also come in ANSI and Unicode versions.

Note: the ANSI version of a function has an uppercase A appended to the original function name, and the Unicode version has an uppercase W appended.

Making code ANSI- and Unicode-compliant

• Treat a text string as an array of characters, not as an array of chars or an array of bytes.

• Use generic data types (such as TCHAR and PTSTR) for text characters and strings.

• Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.

• Use the TEXT macro for literal characters and strings.

• Fix string arithmetic. For example, replace sizeof(szBuffer) with sizeof(szBuffer) / sizeof(TCHAR), and replace malloc(CharNum) with malloc(CharNum * sizeof(TCHAR)).
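The string arithmetic point can be sketched in portable C with wchar_t standing in for a Unicode TCHAR. ARRAY_COUNT, alloc_wide and wide_buffer_capacity are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of elements in a fixed-size array: bytes divided by element size. */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Allocates room for char_count wide characters plus a terminator.
   The byte count is the character count times the character size. */
wchar_t *alloc_wide(size_t char_count)
{
    wchar_t *p = malloc((char_count + 1) * sizeof(wchar_t));
    if (p != NULL)
        p[0] = L'\0';
    return p;
}

/* Returns how many characters fit in a 64-element wide buffer. */
size_t wide_buffer_capacity(void)
{
    wchar_t buffer[64];
    /* sizeof(buffer) is 64 * sizeof(wchar_t) bytes, not 64. */
    assert(sizeof(buffer) == 64 * sizeof(wchar_t));
    return ARRAY_COUNT(buffer);
}
```

Passing sizeof(buffer) where a character count is expected would overstate the capacity by a factor of sizeof(wchar_t), which is the kind of overflow this rule is meant to prevent.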

Windows also provides functions for manipulating Unicode strings (each available in ANSI and Unicode versions): lstrcat(), lstrcmp()/lstrcmpi() (which call CompareString() internally), lstrcpy(), and lstrlen(). These are implemented as macros.

C run-time function    Windows function
tolower()              PTSTR CharLower(PTSTR pszString)
toupper()              PTSTR CharUpper(PTSTR pszString)
isalpha()              BOOL IsCharAlpha(TCHAR ch), BOOL IsCharAlphaNumeric(TCHAR ch)
islower()              BOOL IsCharLower(TCHAR ch)
isupper()              BOOL IsCharUpper(TCHAR ch)
sprintf()              wsprintf()

Converting a buffer:

DWORD CharLowerBuff(PTSTR pszString, DWORD cchString)
DWORD CharUpperBuff(PTSTR pszString, DWORD cchString)

You can also convert a single character, for example:

TCHAR cLowerCaseChar = CharLower((PTSTR) szString[0]);

Determining whether text is ANSI or Unicode:

BOOL IsTextUnicode(
    CONST VOID *pBuffer,  // input buffer to be examined
    int cb,               // size of input buffer
    LPINT lpi             // options
);

Note: this function is not implemented on Win9x, where it always returns FALSE.

Conversion between Unicode and ANSI

char szA[40];
WCHAR szW[40];

// Normal sprintf: all strings are ANSI
sprintf(szA, "%s", "ANSI str");

// Convert a Unicode string to ANSI
sprintf(szA, "%S", L"Unicode str");

// Normal swprintf: all strings are Unicode
swprintf(szW, 40, L"%s", L"Unicode str");

// Convert an ANSI string to Unicode
swprintf(szW, 40, L"%S", "ANSI str");
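Microsoft's C run-time uses a non-standard %S specifier for this kind of mixed-width formatting. As a hedged, portable sketch of the same idea, standard C offers "%ls" in the narrow printf family and "%s" in the wide printf family. The example assumes a standard-conforming C library such as glibc (Microsoft's run-time has historically treated "%s" in wide functions as a wide string), and the helper names wide_to_narrow and narrow_to_wide are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Wide -> narrow: "%ls" makes snprintf convert a wchar_t string.
   Returns the number of chars written, or a negative value on error. */
int wide_to_narrow(char *dst, size_t cap, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    return snprintf(dst, cap, "%ls", src);
}

/* Narrow -> wide: in standard swprintf, "%s" takes a char string and
   converts it. Returns the number of wide chars written, or negative. */
int narrow_to_wide(wchar_t *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);
    return swprintf(dst, cap, L"%s", src);
}
```

Both helpers depend on the current C locale for the conversion, so text outside plain ASCII needs a suitable setlocale call first.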

int MultiByteToWideChar(
    UINT uCodePage,       // code page, 0
    DWORD dwFlags,        // character-type options, 0
    PCSTR pMultiByteStr,  // source string address
    int cchMultiByte,     // source string length in bytes
    PWSTR pWideCharStr,   // destination string address
    int cchWideChar       // destination buffer size in characters
);

The uCodePage parameter identifies the code page number associated with the multibyte string. The dwFlags parameter specifies additional control that affects characters with diacritical marks such as accents. These flags are usually not used, and 0 is passed for dwFlags. The pMultiByteStr parameter specifies the string to convert, and the cchMultiByte parameter gives its length in bytes. If -1 is passed for cchMultiByte, the function determines the length of the source string itself. The converted Unicode string is written to the buffer whose address is given by the pWideCharStr parameter. The maximum size of that buffer (in characters) must be given in the cchWideChar parameter. If you call MultiByteToWideChar and pass 0 for cchWideChar, the function does not perform the conversion; instead it returns the buffer size required for the conversion to succeed.

You can convert a multibyte string to its Unicode equivalent with the following steps:

1) Call MultiByteToWideChar, passing NULL for the pWideCharStr parameter and 0 for the cchWideChar parameter.

2) Allocate a block of memory large enough to hold the converted Unicode string. The required size is the value returned by the previous call to MultiByteToWideChar.

3) Call MultiByteToWideChar again, this time passing the address of the buffer as the pWideCharStr parameter and the size returned by the first call as the cchWideChar parameter.

4) Use the converted string.

5) Free the memory block occupied by the Unicode string.
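The same measure, allocate, convert sequence can be sketched in portable C. This is an analogue under stated assumptions, not the Windows call itself: the standard mbsrtowcs function plays the role of MultiByteToWideChar, the conversion follows the current C locale instead of a code page argument, and the function name to_wide is hypothetical. Unlike the Windows function, mbsrtowcs reports the length without the terminator, so one extra character is allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts a multibyte string to a newly allocated wide string.
   Returns NULL on failure; the caller frees the result (step 5). */
wchar_t *to_wide(const char *src)
{
    mbstate_t state;
    const char *p = src;
    size_t needed;
    wchar_t *wide;

    assert(src != NULL);

    /* Step 1: a NULL destination asks only for the required length. */
    memset(&state, 0, sizeof state);
    needed = mbsrtowcs(NULL, &p, 0, &state);
    if (needed == (size_t)-1)
        return NULL;

    /* Step 2: allocate that many characters plus the terminator. */
    wide = malloc((needed + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;

    /* Step 3: convert for real into the allocated buffer. */
    p = src;
    memset(&state, 0, sizeof state);
    if (mbsrtowcs(wide, &p, needed + 1, &state) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;   /* Step 4 happens in the caller. */
}
```

A caller would use the returned string and then pass it to free, which corresponds to steps 4 and 5.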

int WideCharToMultiByte(
    UINT CodePage,            // code page
    DWORD dwFlags,            // performance and mapping flags
    LPCWSTR lpWideCharStr,    // wide-character string
    int cchWideChar,          // number of chars in string
    LPSTR lpMultiByteStr,     // buffer for new string
    int cbMultiByte,          // size of buffer
    LPCSTR lpDefaultChar,     // default for unmappable chars
    LPBOOL lpUsedDefaultChar  // set when default char used
);

Https://www.cnblogs.com/lizhenlin/p/6242483.html
