What is ANSI and what is Unicode? They are two different character encoding standards. ANSI text uses 8-bit units: English characters take a single byte and Chinese characters take two bytes. Unicode text, as the term is used on Windows, uses 16-bit characters, so English and Chinese characters alike are stored in two bytes. Unicode is an international standard encoding, and its two-byte representation is not compatible with ANSI code.
Unicode is now used on networks, in Windows, and in many large software applications. An 8-bit ANSI code can represent only 256 characters. That is more than enough for the 26 English letters, but not for non-Western scripts with thousands of characters, such as Chinese and Korean, which is why the Unicode standard was introduced.
In software development, and especially with the C string-handling functions, ANSI and Unicode strings are handled differently. So how are ANSI and Unicode characters defined, how are they used, and how do you convert between ANSI and Unicode?
One. Definitions:
ANSI: char str[1024];
Unicode: wchar_t str[1024];
Two. Available functions:
ANSI: char strings use strcat(), strcpy(), strlen(), and the other functions that begin with str.
Unicode: wchar_t strings use wcscat(), wcscpy(), wcslen(), and the other functions that begin with wcs.
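As a minimal sketch of the two definitions and their matching function families, the following builds the same text once in a char buffer and once in a wchar_t buffer. The helper names build_narrow and build_wide are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Builds "Hello, world" in a narrow (char) buffer with the str* family.
   cap is the buffer capacity in characters, including the terminator. */
size_t build_narrow(char *dst, size_t cap) {
    assert(dst != NULL && cap >= 13);
    strcpy(dst, "Hello, ");
    strcat(dst, "world");
    return strlen(dst);              /* length in chars */
}

/* Same steps with a wide (wchar_t) buffer and the wcs* family. */
size_t build_wide(wchar_t *dst, size_t cap) {
    assert(dst != NULL && cap >= 13);
    wcscpy(dst, L"Hello, ");
    wcscat(dst, L"world");
    return wcslen(dst);              /* length in wide characters, not bytes */
}
```

Both functions report the same character count, but the wide buffer occupies sizeof(wchar_t) bytes per character.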
Three. System support: Windows 98 supports only ANSI. Windows 2000 supports both ANSI and Unicode. Windows CE supports only Unicode.
Notes:
1. COM supports only Unicode.
2. Windows 2000 is Unicode-based throughout, so using ANSI under Windows 2000 has a cost: you do not have to write any conversion code yourself, but the conversion still happens behind the scenes and consumes system resources (CPU, memory).
3. To use Unicode under Windows 98, you have to convert the encoding manually yourself.
Four. How to distinguish them:
Software often has to support both ANSI and Unicode, and it is not practical to change every string type and every string function call by hand whenever the character type changes. For this reason, the standard C run-time library and Windows provide macro definitions.
The C run-time library uses the _UNICODE macro (with an underscore) and Windows uses the UNICODE macro (without an underscore). When both _UNICODE and UNICODE are defined, the code is compiled as the Unicode version; otherwise it is compiled and run as ANSI.
Defining the macros alone does not convert anything automatically; a set of generic character type definitions is needed as well.
1. TCHAR: If the UNICODE macro is defined, TCHAR is defined as wchar_t: typedef wchar_t TCHAR; Otherwise TCHAR is defined as char: typedef char TCHAR;
2. LPTSTR: If the UNICODE macro is defined, LPTSTR is defined as LPWSTR (I used to have no idea what LPWSTR was; now I finally understand): typedef LPWSTR LPTSTR; Otherwise LPTSTR is defined as LPSTR: typedef LPSTR LPTSTR;
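To make the macro switch concrete, here is a portable sketch of the same mechanism. The names MY_UNICODE, MY_TCHAR, MY_T, my_tcslen, my_tcscpy and copy_generic are hypothetical stand-ins chosen so the sketch compiles anywhere; the real definitions live in tchar.h and the Windows headers and are more elaborate.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T() and the _tcs* mappings.
   Compile with -DMY_UNICODE to switch the whole file to wide characters. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(s)    L##s      /* paste an L prefix onto the literal */
#define my_tcslen  wcslen
#define my_tcscpy  wcscpy
#else
typedef char MY_TCHAR;
#define MY_T(s)    s         /* leave the literal narrow */
#define my_tcslen  strlen
#define my_tcscpy  strcpy
#endif

/* Generic code: written once, compiled as either ANSI or Unicode. */
size_t copy_generic(MY_TCHAR *dst, size_t cch, const MY_TCHAR *src) {
    assert(dst != NULL && src != NULL);
    assert(my_tcslen(src) < cch);    /* cch counts characters, not bytes */
    my_tcscpy(dst, src);
    return my_tcslen(dst);
}
```

The calling code never names char or wchar_t directly, which is the point of the generic types.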
Addendum: UTF-8 is well suited to streaming and transmission. Unicode is a character encoding scheme, and as I understand it UTF-8 is one concrete implementation of Unicode; UTF-16 and others are similar implementations.
ANSI/Unicode characters and strings: TChar.h is a counterpart to String.h that is used to create generic ANSI/Unicode strings.
Each character of a Unicode string is 16 bits.
Win9x supports only ANSI; Windows 2000/XP/2003 support both ANSI and Unicode; Windows CE supports only Unicode. Note: some Unicode functions can also be called on Win9x, but unexpected errors can occur.
wchar_t is the data type of the Unicode character.
All Unicode string functions begin with wcs, and the ANSI string functions begin with str.
ANSI C specifies that the C run-time library supports both ANSI and Unicode characters and strings:
ANSI                                    Unicode
char *strcat(char *, const char *)      wchar_t *wcscat(wchar_t *, const wchar_t *)
char *strchr(const char *, int)         wchar_t *wcschr(const wchar_t *, wchar_t)
int strcmp(const char *, const char *)  int wcscmp(const wchar_t *, const wchar_t *)
char *strcpy(char *, const char *)      wchar_t *wcscpy(wchar_t *, const wchar_t *)
size_t strlen(const char *)             size_t wcslen(const wchar_t *)
L"Wash": the L prefix tells the compiler to store the literal as a Unicode string.
_TEXT("Wash"): compiled as an ANSI or a Unicode string depending on whether UNICODE or _UNICODE is defined.
Note: _UNICODE is used by the C run-time headers; UNICODE is used by the Windows header files.
ANSI/Unicode common data types
Generic (ANSI/Unicode)   ANSI      Unicode
LPCTSTR                  LPCSTR    LPCWSTR
LPTSTR                   LPSTR     LPWSTR
PCTSTR                   PCSTR     PCWSTR
PTSTR                    PSTR      PWSTR
TBYTE (TCHAR)            CHAR      WCHAR
When designing a DLL, it is best to provide both ANSI and Unicode functions. The ANSI function should only allocate memory, convert its strings to Unicode, and then call the Unicode function.
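The wrapper pattern can be sketched in portable C as follows. CountSpacesA and CountSpacesW are hypothetical functions invented for this illustration, and the standard mbstowcs() stands in for the Windows conversion call; passing NULL as its destination to query the length is POSIX and Microsoft behaviour, which this sketch assumes.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical "W" function: all of the real work is done on wide strings. */
size_t CountSpacesW(const wchar_t *s) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != L'\0'; ++s)
        if (*s == L' ')
            ++n;
    return n;
}

/* Hypothetical "A" function: allocate, convert to wide, call the W version. */
size_t CountSpacesA(const char *s) {
    assert(s != NULL);
    size_t len = mbstowcs(NULL, s, 0);          /* wide chars needed, no terminator */
    if (len == (size_t)-1)
        return 0;                               /* invalid multibyte input */
    wchar_t *w = malloc((len + 1) * sizeof(wchar_t));
    if (w == NULL)
        return 0;
    mbstowcs(w, s, len + 1);                    /* convert, including terminator */
    size_t n = CountSpacesW(w);
    free(w);
    return n;
}
```

Keeping the logic in one place means a fix to the W function automatically applies to the A function as well.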
It is best to use the operating system's string functions and to use the C run-time functions sparingly or not at all.
For example, the operating system string functions (ShlWApi.h) include StrCat(), StrChr(), StrCmp(), and StrCpy(). Note that they are case-sensitive, and that they also come in ANSI and Unicode versions.
Note: the ANSI version of a function appends an uppercase A to the original function name; the Unicode version appends an uppercase W.
Making code ANSI- and Unicode-compliant
- Treat text strings as arrays of characters, not as arrays of chars or arrays of bytes.
- Use generic data types such as TCHAR and PTSTR for text characters and strings.
- Use explicit data types such as BYTE and PBYTE for bytes, byte pointers, and data buffers.
- Use the TEXT macro for literal characters and strings.
- Fix string arithmetic problems.
For example: where a character count is needed, use sizeof(szBuffer) / sizeof(TCHAR) instead of sizeof(szBuffer); when allocating, use malloc(charNum * sizeof(TCHAR)) instead of malloc(charNum).
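The two arithmetic fixes can be sketched with wchar_t standing in for TCHAR. COUNT_OF and dup_wide are illustrative names, not library functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Number of elements in an array: bytes divided by the size of one element. */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Duplicates a wide string. The allocation is sized in bytes, so the
   character count must be multiplied by the size of one character. */
wchar_t *dup_wide(const wchar_t *src) {
    assert(src != NULL);
    size_t cch = wcslen(src) + 1;                 /* characters, with terminator */
    wchar_t *copy = malloc(cch * sizeof(wchar_t));
    if (copy != NULL)
        wmemcpy(copy, src, cch);                  /* wmemcpy counts wide characters */
    return copy;                                  /* caller frees */
}
```

Writing malloc(cch) here would allocate too few bytes whenever a character is wider than one byte.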
Windows also provides functions for manipulating Unicode strings (each available in ANSI and Unicode versions): lstrcat(), lstrcmp()/lstrcmpi() [which internally call CompareString()], lstrcpy(), and lstrlen(). These are implemented as macros.
C run-time function   Windows function
tolower()             PTSTR CharLower(PTSTR pszString)
toupper()             PTSTR CharUpper(PTSTR pszString)
isalpha()             BOOL IsCharAlpha(TCHAR ch), BOOL IsCharAlphaNumeric(TCHAR ch)
islower()             BOOL IsCharLower(TCHAR ch)
isupper()             BOOL IsCharUpper(TCHAR ch)
sprintf()             wsprintf()
Converting a buffer:
DWORD CharLowerBuff(PTSTR pszString, DWORD cchString);
DWORD CharUpperBuff(PTSTR pszString, DWORD cchString);
You can also convert a single character by passing it in place of the string pointer, for example: TCHAR cLowerCaseChar = CharLower((PTSTR) szString[0]);
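For comparison, a counted in-place conversion can be sketched with the standard C function towlower(). lower_buffer is a hypothetical helper; it mirrors the shape of the buffer functions above (pointer plus character count) but is not the Windows implementation and follows the C locale rather than the Windows language settings.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Lowercases the first cch characters of s in place and returns the number
   of characters processed. The buffer need not be zero-terminated. */
size_t lower_buffer(wchar_t *s, size_t cch) {
    assert(s != NULL || cch == 0);
    for (size_t i = 0; i < cch; ++i)
        s[i] = (wchar_t)towlower((wint_t)s[i]);
    return cch;
}
```

Because the length is passed explicitly, only part of a string can be converted.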
Determining whether text is ANSI or Unicode
BOOL IsTextUnicode(
const VOID *pBuffer, // input buffer to be examined
int cb, // size of input buffer
LPINT lpi); // options
Note: this function is not implemented on Win9x and always returns FALSE there.
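One simple signal that a buffer holds 16-bit Unicode text is a leading byte-order mark. The sketch below checks only for the little-endian mark (bytes FF FE); has_utf16le_bom is a hypothetical helper and does not reproduce the heuristics of IsTextUnicode.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the buffer starts with the UTF-16 little-endian
   byte-order mark (0xFF 0xFE), otherwise 0. */
int has_utf16le_bom(const void *buf, size_t cb) {
    assert(buf != NULL || cb == 0);
    const unsigned char *p = buf;
    return cb >= 2 && p[0] == 0xFF && p[1] == 0xFE;
}
```

Text without a mark can still be Unicode, so a negative result from this check proves nothing on its own.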
Conversion between Unicode and ANSI
char szA[40]; WCHAR szW[40];
// With the Microsoft run-time library, %S means a string of the opposite width.
// Normal sprintf: all strings are ANSI
sprintf(szA, "%s", "ANSI str");
// Convert a Unicode string to ANSI
sprintf(szA, "%S", L"Unicode str");
// Normal swprintf: all strings are Unicode
swprintf(szW, 40, L"%s", L"Unicode str");
// Convert an ANSI string to Unicode
swprintf(szW, 40, L"%S", "ANSI str");
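Outside the Microsoft run-time library the format specifiers differ: in standard C, %ls always means a wide string and, inside the wide printf functions, %s means a narrow string. The sketch below assumes a standard-conforming C library; wide_to_narrow and narrow_to_wide are illustrative names.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Formats a wide string into a narrow buffer of cb bytes.
   Returns the number of bytes written, or a negative value on error. */
int wide_to_narrow(char *dst, size_t cb, const wchar_t *src) {
    assert(dst != NULL && src != NULL);
    return snprintf(dst, cb, "%ls", src);
}

/* Formats a narrow string into a wide buffer of cch characters.
   Returns the number of wide characters written, or a negative value on error. */
int narrow_to_wide(wchar_t *dst, size_t cch, const char *src) {
    assert(dst != NULL && src != NULL);
    return swprintf(dst, cch, L"%s", src);
}
```

Both helpers take the destination size, so an oversized source cannot overrun the 40-element buffers used above.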
int MultiByteToWideChar(
UINT uCodePage, // code page, 0
DWORD dwFlags, // character-type options, 0
PCSTR pMultiByteStr, // source string address
int cchMultiByte, // source string length in bytes
PWSTR pWideCharStr, // destination string address
int cchWideChar); // destination buffer size in characters
The uCodePage parameter identifies the code page number associated with the multibyte string. The dwFlags parameter sets additional options that affect characters with diacritical marks such as accents. These flags are usually not needed, and 0 is passed for dwFlags. The pMultiByteStr parameter specifies the string to be converted, and the cchMultiByte parameter gives the length of that string in bytes. If -1 is passed for cchMultiByte, the function determines the length of the source string itself. The converted Unicode string is written to the memory buffer whose address is given by the pWideCharStr parameter. The maximum size of this buffer, measured in characters, must be given in the cchWideChar parameter. If you call MultiByteToWideChar and pass 0 for cchWideChar, the function does not perform the conversion; instead it returns the buffer size needed for the conversion to succeed.
You can convert a multibyte string to its Unicode equivalent with the following steps:
1) Call MultiByteToWideChar, passing NULL for the pWideCharStr parameter and 0 for the cchWideChar parameter.
2) Allocate a block of memory large enough to hold the converted Unicode string. The required size is returned by the preceding call to MultiByteToWideChar.
3) Call MultiByteToWideChar again, this time passing the address of the buffer as the pWideCharStr parameter and the size returned by the first call as the cchWideChar parameter.
4) Use the converted string.
5) Free the memory block occupied by the Unicode string.
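The same query, allocate, convert sequence can be sketched in portable C with mbstowcs(), which plays a similar role to MultiByteToWideChar but uses the current C locale instead of a code page argument. to_wide is a hypothetical helper, and calling mbstowcs() with a NULL destination to obtain the length is POSIX and Microsoft behaviour that this sketch relies on.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts a multibyte string to a newly allocated wide string.
   Returns NULL on invalid input or allocation failure. */
wchar_t *to_wide(const char *src) {
    assert(src != NULL);
    /* Step 1: ask how many wide characters are needed (terminator excluded). */
    size_t cch = mbstowcs(NULL, src, 0);
    if (cch == (size_t)-1)
        return NULL;
    /* Step 2: allocate room for the characters plus the terminator. */
    wchar_t *dst = malloc((cch + 1) * sizeof(wchar_t));
    if (dst == NULL)
        return NULL;
    /* Step 3: convert for real, passing the size from the first call. */
    if (mbstowcs(dst, src, cch + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;   /* Steps 4 and 5: the caller uses the string, then frees it. */
}
```

A caller would use the result and release it with free(), matching steps 4 and 5.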
int WideCharToMultiByte(
UINT CodePage, // code page
DWORD dwFlags, // performance and mapping flags
LPCWSTR lpWideCharStr, // wide-character string
int cchWideChar, // number of chars in string
LPSTR lpMultiByteStr, // buffer for new string
int cbMultiByte, // size of buffer
LPCSTR lpDefaultChar, // default for unmappable chars
LPBOOL lpUsedDefaultChar); // set when default char used
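The reverse direction can be sketched with the standard wcstombs(). to_narrow is a hypothetical helper; unlike WideCharToMultiByte it has no default-character option, so a character that cannot be represented in the current locale makes the conversion fail. As before, the NULL-destination length query is assumed to be supported (POSIX and Microsoft libraries do).

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Converts a wide string to a newly allocated multibyte string.
   Returns NULL if a character cannot be converted or allocation fails. */
char *to_narrow(const wchar_t *src) {
    assert(src != NULL);
    size_t cb = wcstombs(NULL, src, 0);     /* bytes needed, terminator excluded */
    if (cb == (size_t)-1)
        return NULL;                        /* unconvertible character */
    char *dst = malloc(cb + 1);
    if (dst == NULL)
        return NULL;
    if (wcstombs(dst, src, cb + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;                             /* caller frees */
}
```

Note that the destination size is measured in bytes here, while the source length is measured in wide characters.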
https://www.cnblogs.com/lizhenlin/p/6242483.html
The difference between Unicode and ANSI