The first hurdle for beginners in Windows development is understanding strings.
I. A brief history of character sets
A character is a concept that exists independently of computers. Even without a computer there are still characters: the written symbols of English, Chinese, Japanese, and so on.
A string is an array of characters.
Early on, computers used a single byte to represent a single character. A byte is an 8-bit unsigned number. Single-byte ASCII (which actually uses only 7 of those bits) can represent uppercase and lowercase English letters, digits, and some punctuation marks.
Later, to display Chinese, Japanese, and other Asian characters, double-byte encodings (DBCS) were developed, which use either one or two bytes to represent a character.
Along with them came code pages: the same two bytes stand for different characters under different code pages. The Simplified Chinese code page is GBK, the Traditional Chinese one is Big5, and so on.
Because such a string mixes single-byte and double-byte characters, it is hard to count the number of characters in it.
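To make the counting problem concrete, here is a minimal sketch in portable C. The helper gbk_char_count is a hypothetical name introduced for illustration, and it assumes the simplified rule that any byte in the range 0x81 to 0xFE is a GBK lead byte followed by exactly one trail byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts characters (not bytes) in a GBK-encoded string.
   Assumption: a byte in 0x81..0xFE is a lead byte and is followed by one
   trail byte; every other byte is a single-byte character. */
size_t gbk_char_count(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        if (*p >= 0x81 && *p <= 0xFE && p[1] != '\0') {
            p += 2;   /* double-byte character */
        } else {
            p += 1;   /* single-byte character, or a truncated lead byte */
        }
        ++count;
    }
    return count;
}
```

For the GBK bytes "\xD6\xD0\xCE\xC4" "A" (two Chinese characters followed by the letter A), strlen reports 5 bytes while gbk_char_count reports 3 characters. In a real Windows program you would more likely rely on existing facilities such as _mbslen or IsDBCSLeadByteEx than on a hand-written loop like this.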
Then came Unicode. In the UTF-16 encoding of Unicode, a character is represented by a 16-bit code unit (two bytes). This covers the commonly used characters of the world's languages without any need to distinguish code pages, because each character has its own 16-bit code in UTF-16. (Strictly speaking, characters outside the Basic Multilingual Plane take two 16-bit units, a surrogate pair.) Storage space increases, but processing efficiency clearly improves.
II. String-processing function libraries
1. String handling in the C Runtime Library:
str* series: the C Runtime uses strlen, strcpy, and the other str* functions to process char strings.
wcs* series: once the C compiler gained the built-in wide-character type wchar_t (16 bits on Windows), the wcslen, wcscpy, and other wcs* functions were added to process wchar_t strings.
_tcs* series: the C Runtime Library defines _tcslen, _tcscpy, and the other _tcs* names as macros. At compile time, if the _UNICODE macro is defined they resolve to the wcs* functions; otherwise they resolve to the str* functions (or to the _mbs* functions if _MBCS is defined).
_tcs*_s series: the newest, secure string functions of the C Runtime Library (for example _tcscpy_s). They take the size of the destination buffer so that buffer overflows can be prevented.
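The compile-time mapping behind the _tcs* names can be sketched in portable C roughly as follows. MY_TCHAR, MY_T, my_tcslen, my_tcscpy, and copy_and_measure are hypothetical stand-ins invented for this illustration; the real definitions live in <tchar.h> and are more elaborate.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T, _tcslen and _tcscpy from <tchar.h>.
   The MY_/my_ prefixes mark them as illustrative names, not real CRT names. */
#ifdef _UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(s)    L##s      /* "abc" becomes L"abc" */
#define my_tcslen  wcslen
#define my_tcscpy  wcscpy
#else
typedef char MY_TCHAR;
#define MY_T(s)    s         /* "abc" stays "abc" */
#define my_tcslen  strlen
#define my_tcscpy  strcpy
#endif

/* Written once against the generic names. It compiles as a char routine
   or as a wchar_t routine depending on whether _UNICODE is defined.
   The caller must supply a destination large enough for src. */
size_t copy_and_measure(MY_TCHAR *dst, const MY_TCHAR *src)
{
    my_tcscpy(dst, src);
    return my_tcslen(dst);
}
```

Calling copy_and_measure(buf, MY_T("hello")) with a 16-element MY_TCHAR buffer returns 5 in either build mode. With the Microsoft C Runtime, the secure counterpart of the copy is _tcscpy_s(dst, dst_count, src), which additionally receives the destination size in elements.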
2. String handling by the Windows development team:
In winnt.h the Windows development team defines the new data types CHAR (char), WCHAR (wchar_t), and TCHAR. At compile time, TCHAR becomes WCHAR if the UNICODE macro is defined and CHAR otherwise.
lstr*A series: exported from kernel32.dll of the Windows operating system. They process CHAR strings and are really just a wrapper layer around lstr*W.
lstr*W series: exported from kernel32.dll of the Windows operating system. They process WCHAR strings.
lstr* series: these names are likewise macros that resolve at compile time to the lstr*A functions or the lstr*W functions, depending on whether the UNICODE macro is defined.
In practice I usually use the lstr* series, because a program written purely against the Windows API then does not need to link the C Runtime Library.
Note that the C Runtime Library's _UNICODE macro and Windows' UNICODE macro should either both be defined or both be undefined. The leading underscore in the C Runtime Library's macro is there to comply with the damn C++ standard (macros that are not part of the standard are supposed to carry a leading underscore), while Windows does not follow that convention.
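One way to enforce that rule is a preprocessor check placed near the top of a source file or shared header. This is only a sketch; string_build_mode is a hypothetical helper added here for illustration.

```c
/* Fail the build early if only one of the two macros is defined. */
#if defined(UNICODE) != defined(_UNICODE)
#error "Define both UNICODE and _UNICODE, or neither."
#endif

/* Hypothetical helper: reports which mode this file was compiled in. */
const char *string_build_mode(void)
{
#if defined(UNICODE) && defined(_UNICODE)
    return "Unicode";
#else
    return "ANSI";
#endif
}
```

Because the check runs in the preprocessor, a mismatched build stops with an error message instead of silently mixing char and wchar_t versions of the same generic name.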
==========================================================
Although string handling looks messy, mastering one family is enough for any single application. The currently recommended standard is the _tcs*_s series, but you can choose according to your needs. The string types of the STL and MFC libraries are not covered here, because those two are C++ libraries, not C libraries.