Original article. If you repost it, please credit the author and source.
First published at:
http://blog.csdn.net/beyondcode
http://www.cnblogs.com/beyond-code/
http://hi.baidu.com/beyondcode
Today I am starting my second article. In it, I will introduce the Unicode and ASCII encoding issues that come up in Windows programming.
I don't know whether you have run into this problem: you create a Windows application and call the MessageBox function to pop up a message, but the compiler reports an error saying that it cannot convert const char * or const char [] to const wchar_t *, or something similar. Many people who are new to Windows API programming get stuck here and don't know how to fix it. In fact, this is a matter of Unicode encoding versus ASCII encoding, which I will explain step by step.
I will not describe in detail how Unicode and ASCII encode characters. If you want to understand that further, there are many articles on the topic. Here I will only cover the parts of Unicode and ASCII encoding that matter for programming on the Windows platform.
We all know that, as far as Windows programming is concerned, the biggest difference between Unicode and ASCII is the size of a character. Windows stores Unicode text as UTF-16, in which each unit (a wchar_t) occupies two bytes. English letters, Chinese characters, and most text from other languages fit in a single two-byte unit; only less common characters outside the Basic Multilingual Plane need two units. ASCII uses one byte to store one character, which is sufficient for English. Chinese characters do not fit in one byte, so the single-byte (char) version has to rely on a multi-byte code page such as GBK, in which one Chinese character is represented by two char values.
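The size difference can be made concrete with a small sketch. It assumes the wide-character side is UTF-16, which is what the Windows W functions use; the function name utf16_encode is made up for this illustration and is not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 sixteen-bit units to out and returns how many were written.
   Returns 0 if cp is not a valid code point to encode
   (a surrogate value, or anything above 0x10FFFF). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* reserved for surrogates */
    if (cp > 0x10FFFF) return 0;                  /* beyond the Unicode range */

    if (cp < 0x10000) {                           /* fits in one 2-byte unit */
        out[0] = (uint16_t)cp;
        return 1;
    }

    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate: bottom 10 bits */
    return 2;
}
```

With this sketch, 'A' (U+0041) and the Chinese character U+4E2D each come out as one two-byte unit, while a character such as U+1F600 comes out as two units, four bytes in total.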
When writing a program, we are bound to deal with characters: we need to input, retrieve, and display them. Whether to use Unicode or ASCII characters is your own choice, but Unicode is recommended, both to keep the program universal and to follow the mainstream direction of current operating systems. Because a Unicode string takes roughly twice the space of the same English text in ASCII, the compiled program and the memory it uses will be somewhat larger, but this is not a big problem. To support both choices, Microsoft's current SDK keeps two sets of APIs: one for programs that process characters as Unicode and one for programs that process characters as ASCII. For example, the MessageBox we mentioned above is not actually a function name but a macro. Let's first look at how it is defined and then discuss it.
#ifdef UNICODE
#define MessageBox  MessageBoxW
#else
#define MessageBox  MessageBoxA
#endif
Do you see? Simple, isn't it? If the UNICODE macro is defined, MessageBox is defined as MessageBoxW; if it is not defined, MessageBox is defined as MessageBoxA. The W and A after MessageBox stand for wide characters (Unicode) and ANSI (the single-byte, ASCII-style version). So the functions that actually exist in the SDK are MessageBoxW and MessageBoxA.
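The same selection mechanism can be sketched in portable C, so that it compiles without the Windows SDK. Everything named here (show_text, show_text_a, show_text_w, MY_UNICODE) is a hypothetical stand-in for MessageBox, MessageBoxA, MessageBoxW, and UNICODE; the stand-in functions only return the length of the text rather than display anything.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for MessageBoxA: accepts narrow (char) text. */
size_t show_text_a(const char *text)
{
    return strlen(text);
}

/* Hypothetical stand-in for MessageBoxW: accepts wide (wchar_t) text. */
size_t show_text_w(const wchar_t *text)
{
    return wcslen(text);
}

/* The unsuffixed name is not a function at all. The preprocessor replaces
   it with one of the two real functions before compilation proper. */
#ifdef MY_UNICODE
#define show_text show_text_w
#else
#define show_text show_text_a
#endif
```

In a build without MY_UNICODE, show_text("hello") is really a call to show_text_a. Defining MY_UNICODE turns the same line into a call to show_text_w, and passing a char string then fails to compile, which is the same kind of error described at the start of this article.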
MessageBox is just a macro, so you can use any of the three names in your program. Note, however, that if you call MessageBoxA, every string you pass to it must consist of single-byte characters, that is, ASCII, which is char in the program. If you call MessageBoxW, every string must be Unicode, which is wchar_t in the program. This has a serious drawback: if you use the W series of functions, all the characters in your program are Unicode, and if you later need to compile the same source code as an ASCII program, the changes are too large, because every operation that involves characters has to be changed. Is there a better way, so that the same code can be compiled as an ASCII version without any changes?
Of course there is. When programming, we try to use the macro names without a suffix (MessageBox in the example above), and we do not declare parameters explicitly as char or wchar_t but use the TCHAR data type defined by Microsoft. It is defined much like the MessageBox macro above: whether TCHAR becomes char or wchar_t depends on whether the UNICODE macro is defined. The TCHAR type is therefore variable; it resolves to the final character type according to the project settings, so our program can easily be compiled as the other version without any changes. String literals need the same treatment: wrap them in the TEXT macro, which adds the L prefix only in a Unicode build. Very convenient, isn't it?
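Here is a minimal sketch of how such a switchable character type can work, again in portable C so that it compiles without the Windows SDK. The names my_tchar, MY_TEXT, my_tcslen, and MY_UNICODE are hypothetical stand-ins; in real Windows code the corresponding names are TCHAR, TEXT, and UNICODE, and the C runtime's tchar.h offers _tcslen, which is switched by _UNICODE.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* One switch decides the character type, the literal prefix,
   and which string function the neutral name maps to. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s      /* "abc" becomes L"abc" */
#define my_tcslen  wcslen
#else
typedef char my_tchar;
#define MY_TEXT(s) s         /* "abc" stays "abc" */
#define my_tcslen  strlen
#endif

/* Written once against the neutral names; compiles as either version. */
size_t title_length(void)
{
    const my_tchar *title = MY_TEXT("Hello, Windows");
    return my_tcslen(title);
}
```

The function body never mentions char or wchar_t, so defining or removing MY_UNICODE is the only change needed to switch versions. The result is 14 characters in both builds; only the storage per character differs.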
These first two articles contain a lot of plain text, because much of the material is conceptual and needs to be understood. In the following articles I will use some simple API functions together with small sample programs, and explain the related concepts alongside them. So if you are not yet comfortable with the first two articles, don't worry; it will not matter much. As you study further, you will gradually understand what was covered here.
By-deathcode