Summary of Unicode, ANSI character set, and related string operations
Q How to display Unicode strings
A
If the program defines the _UNICODE macro, use the string directly:
const WCHAR *str = L"unicodestring";
TextOut(0, 0, str);
Otherwise, a type conversion is required:
#include <comdef.h>
const WCHAR *str = L"unicodestring";
_bstr_t str1 = str;
TextOut(0, 0, (char *)str1);
Q How to convert between ANSI and Unicode
A
Convert ANSI to Unicode
(1) Use the L prefix, for example: CLSIDFromProgID(L"MAPI.Folder", &clsid);
(2) Use the MultiByteToWideChar function, for example:
char *szProgID = "MAPI.Folder";
WCHAR szWideProgID[128];
CLSID clsid;
// The last argument is the buffer size in wide characters, not bytes; one slot is kept for the terminator.
long lLen = MultiByteToWideChar(CP_ACP, 0, szProgID, (int)strlen(szProgID), szWideProgID, sizeof(szWideProgID) / sizeof(WCHAR) - 1);
szWideProgID[lLen] = L'\0';
(3) Use the A2W macro, for example:
USES_CONVERSION;
CLSIDFromProgID(A2W(szProgID), &clsid);
Convert Unicode to ANSI
(1) Use WideCharToMultiByte, for example:
// Assume that you already have a Unicode string wszSomeString...
char szANSIString[MAX_PATH];
WideCharToMultiByte(CP_ACP, WC_COMPOSITECHECK, wszSomeString, -1, szANSIString, sizeof(szANSIString), NULL, NULL);
(2) Use the W2A macro, for example:
USES_CONVERSION;
pTemp = W2A(wszSomeString);
Note the possible problems during conversion:
When ANSI is converted to Unicode with A2W or MultiByteToWideChar (with CP_ACP as the first parameter), the input ANSI string is treated as a multi-byte string and converted through the system default code page. On a Chinese system (where the default Windows code page is Chinese), a byte greater than 0x7F may be combined with the next byte and treated as one Chinese character, which is then converted to the Unicode code of that character. If no corresponding code is found, a default character (usually the question mark "?") is substituted. This makes the conversion complex and possibly irreversible. If the data you convert is an encrypted ANSI buffer, its bytes are effectively arbitrary, many of them are greater than 0x7F, and the conversion will not round-trip.
In that case we recommend widening the bytes one by one instead:
char lpAnsi[COUNT];
WCHAR lpUnicode[COUNT];
int i = 0;
while (lpAnsi[i] != '\0') {
    // Cast through unsigned char so that bytes above 0x7F are not sign-extended.
    lpUnicode[i] = (WCHAR)(unsigned char)lpAnsi[i];
    i++;  // advance to the next byte
}
lpUnicode[i] = L'\0';
Then convert it back in the same way. For ANSI bytes in the range 0 to 0x7F, the Unicode code is the same value stored in 16 bits. For the other bytes, the string is encrypted anyway, so it does not need to map to a displayable character. Likewise, if the intermediate string never has to be displayed or otherwise processed as text, you can return (LPWSTR)lpAnsi directly and cast it back on the receiving side.
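The byte-by-byte round trip described above can be sketched in portable C++. This is only an illustration: widen_bytes and narrow_bytes are hypothetical helper names, and std::wstring stands in for a WCHAR buffer.

```cpp
#include <string>

// Widen each byte to one wide unit without consulting any code page.
// The cast through unsigned char avoids sign extension for bytes above 0x7F.
std::wstring widen_bytes(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}

// Reverse of widen_bytes: keep only the low 8 bits of each wide unit.
std::string narrow_bytes(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t w : wide) {
        out.push_back(static_cast<char>(static_cast<unsigned char>(w & 0xFF)));
    }
    return out;
}
```

Because no code page is involved, narrow_bytes(widen_bytes(data)) returns the original bytes for any input, including embedded zero bytes, which the null-terminated loop above cannot handle.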
Q How to make the program support Unicode
A
The kernel of NT systems uses Unicode. Projects created by VC are ANSI by default (for compatibility with Win9x). On NT, when an ANSI program calls a Windows API, the system performs an ANSI-to-Unicode conversion internally. For example, MoveWindowA actually calls MoveWindowW. If the program does not need to support Win9x (which will be obsolete sooner or later), compile it as Unicode directly; this avoids the conversion and should improve execution efficiency. Details:
(0) VC build options. In VC 7.0 and later, select "Use Unicode Character Set" under "Character Set" on the project property page. VC 6.0 takes a little more work: first copy the Unicode versions of the VC runtime libraries into the VC library path. Each ANSI xxx.lib has a corresponding xxxu.lib, which is not installed by a default VC installation.
(0).1 Change the character set definition:
Change _MBCS to _UNICODE in "Preprocessor definitions" on the "C/C++" page of Project Settings.
(0).2 Change the entry point:
Add /entry:"wWinMainCRTStartup" to "Project Options" on the "Link" page.
(1) In code, handle characters through the macros in tchar.h. For example, replace strcpy with _tcscpy and char with TCHAR, and
use TCHAR m_myStr[] = _T("XXXX"); in place of char m_myStr[] = "XXXX";
(2) To debug Unicode programs, select all options when installing VC; otherwise the Unicode dynamic libraries and the corresponding .lib files will be missing.
Q How to obtain the number of characters in a string that contains both single-byte and double-byte characters?
A
Use _mbslen from the Microsoft Visual C++ runtime library, which operates on multi-byte strings (strings containing both single-byte and double-byte characters).
strlen cannot tell you how many characters such a string contains. It only reports the number of bytes before the terminating 0.
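To make the difference between bytes and characters concrete, here is a minimal portable sketch. It is not _mbslen itself: count_dbcs_chars and is_lead_byte_cp936 are hypothetical names, and the lead-byte range 0x81 to 0xFE is an assumption that matches code page 936 (GBK) only.

```cpp
#include <cstddef>
#include <cstring>

// Assumption: in code page 936 (GBK), lead bytes lie in the range 0x81..0xFE.
bool is_lead_byte_cp936(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Counts characters, treating a lead byte plus the following byte as one character.
std::size_t count_dbcs_chars(const char* s) {
    std::size_t count = 0;
    while (*s != '\0') {
        if (is_lead_byte_cp936(static_cast<unsigned char>(*s)) && s[1] != '\0') {
            s += 2;  // double-byte character
        } else {
            s += 1;  // single-byte character (or a dangling lead byte)
        }
        ++count;
    }
    return count;
}
```

For the four bytes 'A', 0xD6, 0xD0, 'B' (the letter A, one GBK double-byte character, and the letter B), strlen reports 4 while count_dbcs_chars reports 3.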
Q How do I operate DBCS strings?
A
Function                            Description
PTSTR CharNext(LPCTSTR);            Returns the address of the next character in the string.
PTSTR CharPrev(LPCTSTR, LPCTSTR);   Returns the address of the previous character in the string.
BOOL IsDBCSLeadByte(BYTE);          Returns a nonzero value if the byte is the first byte of a DBCS character.
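One typical use of such functions is cutting a DBCS string at a byte limit without splitting a double-byte character. The sketch below shows the idea in portable C++. It does not call the Windows functions: looks_like_lead_byte is a hypothetical stand-in for IsDBCSLeadByte that assumes the code page 936 lead-byte range, and truncate_dbcs is an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for IsDBCSLeadByte; assumes code page 936 (0x81..0xFE).
bool looks_like_lead_byte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Returns the longest prefix of s that fits in max_bytes
// without cutting a double-byte character in half.
std::string truncate_dbcs(const std::string& s, std::size_t max_bytes) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        bool is_double = looks_like_lead_byte(static_cast<unsigned char>(s[pos]))
                         && pos + 1 < s.size();
        std::size_t step = is_double ? 2 : 1;  // same stepping rule CharNext would apply
        if (pos + step > max_bytes) {
            break;
        }
        pos += step;
    }
    return s.substr(0, pos);
}
```

Stepping forward from the start of the string is what keeps the scan aligned on character boundaries; testing an arbitrary byte in the middle of the string is not reliable, because a trail byte can fall in the lead-byte range.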
Q Why use Unicode?
A
(1) It makes exchanging data between different languages easy.
(2) It lets you distribute a single .exe or DLL file that supports all languages.
(3) It improves the running efficiency of applications.
Windows 2000 was developed from scratch using Unicode. If you call a Windows function and pass it an ANSI string, the system must first convert the string to Unicode and then pass the Unicode string to the operating system. If you expect the function to return an ANSI string, the system converts the Unicode string to ANSI before returning the result to your application. These conversions cost time and memory. By developing with Unicode from the start, you can make your application run more efficiently.
Windows CE is itself a Unicode operating system and does not support the ANSI Windows functions at all.
Windows 98 supports only ANSI, so you can develop only ANSI applications for it.
When Microsoft ported COM from 16-bit Windows to Win32, it decided that all COM interface methods that take strings accept only Unicode strings.
Q How to Write Unicode source code?
A
Microsoft designed the Windows API for Unicode so as to minimize the impact on your code. In fact, you can write a single source file that compiles with or without Unicode. You only need to define two macros (UNICODE and _UNICODE) and recompile the source file.
The _UNICODE macro is used by the C runtime header files, while the UNICODE macro is used by the Windows header files. When compiling a source module, define both macros together.
Q What Unicode data types are defined in Windows?
A
Data type   Description
WCHAR       Unicode character
PWSTR       Pointer to a Unicode string
PCWSTR      Pointer to a constant Unicode string
The corresponding ANSI data types are CHAR, LPSTR, and LPCSTR.
The generic ANSI/Unicode data types are TCHAR, PTSTR, and LPCTSTR.
Q How to operate on Unicode strings?
A
Character set   Characteristic                                  Example
ANSI            Functions start with str                        strcpy
Unicode         Functions start with wcs                        wcscpy
MBCS            Functions start with _mbs                       _mbscpy
ANSI/Unicode    Functions start with _tcs (C runtime library)   _tcscpy
ANSI/Unicode    Functions start with lstr (Windows function)    lstrcpy
All new and non-obsolete functions have both ANSI and Unicode versions in Windows 2000. ANSI versions end with A, and Unicode versions end with W. The Windows headers define the generic name as follows:
#ifdef UNICODE
#define CreateWindowEx CreateWindowExW
#else
#define CreateWindowEx CreateWindowExA
#endif // !UNICODE
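The same selection mechanism can be reproduced in a few lines of portable C++. The sketch below is only an imitation of the pattern: TextLengthA, TextLengthW, MYTCHAR, and MYTEXT are hypothetical names invented for the illustration, not Windows or C runtime identifiers.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical function pair standing in for an ANSI/Unicode API pair.
std::size_t TextLengthA(const char* s)    { return std::strlen(s); }
std::size_t TextLengthW(const wchar_t* s) { return std::wcslen(s); }

// One macro switch selects the character type, the literal prefix,
// and the function version, as the Windows headers do with UNICODE.
#ifdef UNICODE
typedef wchar_t MYTCHAR;
#define MYTEXT(s) L##s
#define TextLength TextLengthW
#else
typedef char MYTCHAR;
#define MYTEXT(s) s
#define TextLength TextLengthA
#endif

// Written once; compiles against either version depending on UNICODE.
std::size_t greeting_length() {
    const MYTCHAR* greeting = MYTEXT("hello");
    return TextLength(greeting);
}
```

Calling code never names the A or W version; rebuilding with UNICODE defined switches every such call at once.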
Q How do I represent a Unicode string constant?
A
Character set   Example
ANSI            "string"
Unicode         L"string"
ANSI/Unicode    _T("string") or _TEXT("string"), for example: if (szError[0] == _TEXT('J')) { }
Q Why should I prefer the operating system's string functions?
A
It can help run-time performance slightly. Because these functions are used heavily, they are probably already loaded into RAM when the application runs.
Examples include StrCat, StrChr, StrCmp, and StrCpy.
Q How do I write applications that compile for both ANSI and Unicode?
A
(1) Treat a text string as an array of characters, not as an array of chars or bytes.
(2) Use generic data types (such as TCHAR and PTSTR) for text characters and strings.
(3) Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.
(4) Use the TEXT macro for literal characters and strings.
(5) Perform global replacements (for example, replace PSTR with PTSTR).
(6) Modify string arithmetic. For example, functions usually expect a buffer size in characters, not bytes. This means passing sizeof(szBuffer) / sizeof(TCHAR) rather than sizeof(szBuffer). Conversely, if you need to allocate a memory block for a string and you have the number of characters, remember that memory is allocated in bytes. That is, call
malloc(nCharacters * sizeof(TCHAR)) instead of malloc(nCharacters).
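The character-count versus byte-count distinction in point (6) can be sketched in portable C++, with wchar_t standing in for TCHAR in a Unicode build. The names char_count and duplicate_wide are hypothetical helpers chosen for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Number of characters a fixed-size buffer holds,
// equivalent to sizeof(buf) / sizeof(buf[0]).
template <typename T, std::size_t N>
constexpr std::size_t char_count(const T (&)[N]) {
    return N;
}

// Copies a wide string into freshly allocated memory.
// The character count is multiplied by sizeof(wchar_t) because malloc works in bytes.
// The caller releases the result with std::free.
wchar_t* duplicate_wide(const wchar_t* src) {
    std::size_t n_chars = std::wcslen(src) + 1;  // include the terminator
    wchar_t* copy = static_cast<wchar_t*>(std::malloc(n_chars * sizeof(wchar_t)));
    if (copy != nullptr) {
        std::wmemcpy(copy, src, n_chars);
    }
    return copy;
}
```

For a buffer declared as wchar_t buf[16], char_count(buf) is 16 while sizeof(buf) is 16 * sizeof(wchar_t); passing the larger byte count to a function that expects characters invites a buffer overrun.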
Q How to compare the selected strings?
A
Call CompareString. Its comparison flags are:
Flag                   Meaning
NORM_IGNORECASE        Ignores case.
NORM_IGNOREKANATYPE    Does not distinguish hiragana from katakana.
NORM_IGNORENONSPACE    Ignores nonspacing characters.
NORM_IGNORESYMBOLS     Ignores symbols.
NORM_IGNOREWIDTH       Does not distinguish between the single-byte and double-byte forms of the same character.
SORT_STRINGSORT        Treats punctuation the same as symbols.
Q How can I determine whether a text file is ANSI or Unicode?
A
Check the first two bytes of the text file. If they are 0xFF and 0xFE, the file is Unicode; otherwise it is ANSI.
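The check can be written as a small function over the bytes already read from the file. This is a minimal sketch; starts_with_unicode_bom is a hypothetical name, and reading the file is left to the caller.

```cpp
#include <cstddef>

// Returns true if the buffer begins with the byte-order mark 0xFF 0xFE.
bool starts_with_unicode_bom(const unsigned char* data, std::size_t size) {
    return size >= 2 && data[0] == 0xFF && data[1] == 0xFE;
}
```

The size check matters: a file shorter than two bytes cannot carry the mark, so it is treated as ANSI.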
Q How do I determine whether a string is ANSI or Unicode?
A
Use IsTextUnicode. IsTextUnicode applies a series of statistical and deterministic tests to guess the content of the buffer. Because this is not an exact science, IsTextUnicode may return an incorrect result.
Q how to convert a string between Unicode and ANSI?
A
The Windows function MultiByteToWideChar converts a multi-byte string to a wide string. The function WideCharToMultiByte converts a wide string to the equivalent multi-byte string.
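For code that must also build outside Windows, the standard C library offers a rough analogue in std::mbstowcs. The sketch below uses it instead of MultiByteToWideChar, so it is a substitute technique, not the Windows call. The helper name to_wide is hypothetical, and passing a null destination to query the required length is a common POSIX extension that is assumed to be available.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts a multi-byte string to a wide string using the current C locale.
// Returns an empty string if the input contains an invalid sequence.
std::wstring to_wide(const char* narrow) {
    // First call: ask how many wide characters are needed (terminator excluded).
    std::size_t needed = std::mbstowcs(nullptr, narrow, 0);
    if (needed == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }
    // Second call: convert into a buffer with room for the terminator.
    std::vector<wchar_t> buffer(needed + 1);
    std::mbstowcs(buffer.data(), narrow, buffer.size());
    return std::wstring(buffer.data(), needed);
}
```

The query-then-convert pattern mirrors the usual way of calling MultiByteToWideChar with a zero-length output buffer first. The result depends on the active C locale; in the default "C" locale only plain ASCII input is guaranteed to convert.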
Q How To Get The Unicode encoding of Chinese Characters
A
#include "comdef.h"
char *str1 = "hello";
_bstr_t str = str1;
WCHAR *str2 = str;
str2 now points to the Unicode encoding you want.
Q How to Implement the conversion between the encoding #21592 #24037 #36873 #25321 and Chinese characters?
A
CString str = "#21592 #24037 #36873 #25321";
str += '#';
CString str1 = "";
WCHAR str2[5] = {0};
int j = 0;
do
{
    str1 = str.Left(str.Find('#', 1));
    str = str.Mid(str.Find('#', 1));
    int n = 0;  // sscanf with %d needs an int, not a 16-bit WCHAR
    sscanf(str1, "#%d", &n);
    str2[j] = (WCHAR)n;
    j++;
} while (str1 != "");
_bstr_t str3 = str2;
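The same decoding can be sketched without MFC. The version below is an alternative in standard C++, not the code above: decode_hash_codes is a hypothetical name, std::u16string stands in for a WCHAR buffer, and values above 0xFFFF are skipped on the assumption that each code fits in one 16-bit unit.

```cpp
#include <cstddef>
#include <string>

// Parses "#<decimal>" tokens and returns the corresponding 16-bit code units.
std::u16string decode_hash_codes(const std::string& text) {
    std::u16string out;
    std::size_t pos = 0;
    while ((pos = text.find('#', pos)) != std::string::npos) {
        ++pos;  // skip the '#'
        unsigned long value = 0;
        std::size_t digits = 0;
        while (pos < text.size() && text[pos] >= '0' && text[pos] <= '9') {
            if (value <= 0xFFFF) {  // stop growing once out of range, so no overflow
                value = value * 10 + static_cast<unsigned long>(text[pos] - '0');
            }
            ++pos;
            ++digits;
        }
        if (digits > 0 && value <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(value));
        }
    }
    return out;
}
```

For the input "#21592 #24037 #36873 #25321" the result holds four code units with the values 21592, 24037, 36873, and 25321. Unlike the fixed WCHAR str2[5] buffer, the output grows with the input, so longer sequences cannot overrun it.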