Unicode Programming Basics

If you write programs for users in non-English-speaking countries and regions, such as China, Japan, Eastern Europe, and the Middle East, you must be familiar with the Unicode character set. This is especially true when you use Visual C++/MFC to write programs for users in those regions. If you want your applications to be widely used, you must make your code Unicode-compatible, that is, able to build and run in both ASCII and Unicode modes. This article introduces some basic knowledge about Unicode programming and clears up misunderstandings that many people (myself included) have about it. Anyone who programs with Visual C++ and/or MFC should find it worth reading.

What is Unicode?

Unicode is a widely used solution to the limited size of the ASCII character set. Standard ASCII defines only 128 characters (codes 0 to 127), and the 8-bit extended character sets built on it can hold at most 256 characters (codes 0 to 255). These include uppercase and lowercase letters, digits, and a few special characters such as punctuation marks and currency symbols. For most languages written in the Latin alphabet, these characters are enough. However, many Asian and other languages use thousands of characters, far more than 256. Unicode came into being to break through this limit and make it simple to write programs for such languages. Windows stores Unicode text as 16-bit (two-byte) values, so a much wider range of numeric codes can be mapped to the characters of many languages. (Strictly speaking, Windows uses UTF-16, in which characters outside the Basic Multilingual Plane take two 16-bit units.)

The Visual C++ Solution

How can you, as a software developer, use Unicode skillfully and effectively? If you write programs with Visual C++, Unicode compatibility determines whether your program can be internationalized, that is, whether your application targets only the local market or the international market as well. Once you have made that decision, you must implement the details in your code. Fortunately, Visual C++ provides many built-in facilities for Unicode that you can use when you create a project. Before generating the application framework code, AppWizard lets you choose whether to support Unicode. The Win32 SDK contains data types that follow the Unicode encoding rules, and MFC provides macros that map generic text to Unicode data types. With only small changes to their coding habits, developers can easily write applications that support Unicode.

Strings

C programmers generally declare a string array with the char keyword, like this:

char str[100];

and string functions have prototypes like this:

char *strcpy(char *dest, const char *src);

To change these declarations to support the two-byte Unicode character set, you can write:

wchar_t str[100];

or

wchar_t *wcscpy(wchar_t *dest, const wchar_t *src);

In addition, Microsoft provides preprocessor directives for Unicode support. When you create a new project with Visual C++, AppWizard inserts preprocessor directives into the header files according to the character set you choose to support. These directives tell the compiler which character set the program supports, so that when you use the generic data types provided by VC++, the compiler replaces them with the types for that character set. This makes it easy to recompile the same code for a different character set.
To enable Unicode in Visual C++ 6.0, open the project and choose Project | Settings from the main menu to open the Project Settings dialog box, select the C/C++ tab, and add UNICODE and _UNICODE to the Preprocessor definitions edit box, as shown in Figure 1:


Figure 1 Project Settings dialog box

What is the difference between UNICODE and _UNICODE? UNICODE, without the underscore, is used by the Windows header files; _UNICODE, with the leading underscore, is used by the C run-time header files. Define both so that the two sets of headers agree.
In your code, use TCHAR wherever you would use the char keyword and LPTSTR wherever you would use char*, and wrap string constants in double quotation marks (such as "vckbase online journal") in the TEXT macro:

TEXT("vckbase online journal")

When UNICODE is defined, the TEXT macro marks the string as a wide (two-byte) string; otherwise, it is an ANSI string. TEXT is documented as follows:

TEXT(
    LPTSTR string  // ANSI or Unicode string
);

Here, string is a pointer to the string to be interpreted as either Unicode or ANSI.

Microsoft also provides several generic data types that are compatible with both ASCII and Unicode. For more information, see the section on generic data types in Microsoft's online documentation.

Sample Code

Here are some simple examples to further explore Unicode programming.

First, here is an MFC "Hello, World!" program that uses the ASCII character set:

//*********************************
// "Hello World!" implemented with MFC
//*********************************

// hello.cpp
#include <afxwin.h>

// Declare the application class
class CHelloApp : public CWinApp
{
public:
    virtual BOOL InitInstance();
};

// Create an instance of the application class
CHelloApp HelloApp;

// Declare the main window class
class CHelloWindow : public CFrameWnd
{
    CStatic* cs;
public:
    CHelloWindow();
};

// The InitInstance function is called each
// time the application first executes.
BOOL CHelloApp::InitInstance()
{
    m_pMainWnd = new CHelloWindow();
    m_pMainWnd->ShowWindow(m_nCmdShow);
    m_pMainWnd->UpdateWindow();
    return TRUE;
}

// The constructor for the window class
CHelloWindow::CHelloWindow()
{
    // Create the window itself
    Create(NULL, "Hello World!", WS_OVERLAPPEDWINDOW, CRect(0, 0, 200, 200));
    // Create a static label
    cs = new CStatic();
    cs->Create("Hello World", WS_CHILD | WS_VISIBLE | SS_CENTER,
               CRect(50, 80, 150, 150), this);
}

To make this code support the Unicode character set, each string constant must be changed to the corresponding Unicode string. The way to do this is to wrap string constants in the TEXT macro, which lets the preprocessor decide how to compile them based on the character set in use. Only the window class constructor needs to change:

// The constructor for the window class
CHelloWindow::CHelloWindow()
{
    // Create the window itself
    Create(NULL, TEXT("Hello World!"), WS_OVERLAPPEDWINDOW, CRect(0, 0, 200, 200));
    // Create a static label
    cs = new CStatic();
    cs->Create(TEXT("Hello World!"), WS_CHILD | WS_VISIBLE | SS_CENTER,
               CRect(50, 80, 150, 150), this);
}

When the preprocessor encounters a generic data type or macro, it checks whether UNICODE or _UNICODE is defined (the relevant headers are pulled in through afxwin.h) and substitutes the corresponding data type.

The next example uses a Win32 API function to set the volume label of drive C, first with plain char data and then with generic data types.

//******************
// Set the volume label of drive C
//******************

// drvsvl.cpp
#include <windows.h>
#include <iostream>
#include <iomanip>

int main()
{
    BOOL success;
    char volumeName[MAX_PATH];
    std::cout << "Enter the new volume label for drive C: ";
    std::cin >> std::setw(MAX_PATH) >> volumeName;   // setw prevents buffer overflow
    success = SetVolumeLabel("C:\\", volumeName);
    if (success)
        std::cout << "Success\n";
    else
        std::cout << "Error code: " << GetLastError() << std::endl;
    return 0;
}

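Use the TCHAR data type to declare the character array, so that it holds two-byte characters in a Unicode build, and use the TEXT macro again for the string constants. Because cout and cin handle only char strings, the Unicode build must also switch to the wide streams wcout and wcin. In the version below, tcout and tcin are our own helper macros, not part of the Windows SDK:

// Same #include lines as drvsvl.cpp above
#ifdef UNICODE
#define tcout std::wcout
#define tcin  std::wcin
#else
#define tcout std::cout
#define tcin  std::cin
#endif

int main()
{
    BOOL success;
    TCHAR volumeName[MAX_PATH];
    tcout << TEXT("Enter the new volume label for drive C: ");
    tcin >> std::setw(MAX_PATH) >> volumeName;
    success = SetVolumeLabel(TEXT("C:\\"), volumeName);
    if (success)
        tcout << TEXT("Success\n");
    else
        tcout << TEXT("Error code: ") << GetLastError() << std::endl;
    return 0;
}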

Generic Data Types in Visual C++

Visual C++ provides several generic data types, usable in MFC programs, for creating internationalized applications. These definitions are generic and work with Unicode, ASCII, DBCS (double-byte character set), and MBCS (multibyte character set). For reasons of space, this article does not cover all of these character sets; see the relevant documentation for details. MFC handles these character sets transparently: which character set the generic data types map to, and how, is determined by the project settings. The default is ASCII mode; the other options are MBCS, DBCS, and Unicode. Because this article focuses on Unicode, the table below lists only the mappings for ASCII and Unicode:

Table 1:

| Generic data type | Maps to (ASCII) | Maps to (Unicode) | Notes |
|---|---|---|---|
| _TCHAR | char | wchar_t | Generic-text mapping: maps to wchar_t when _UNICODE is defined and to char otherwise. |
| _T or _TEXT | char string constant | wchar_t string constant | Behaves like a macro. In ASCII mode it is ignored (removed by the preprocessor); when _UNICODE is defined, it converts the string constant to its Unicode equivalent (an L"..." literal). |
| LPTSTR | char*, LPSTR (Win32) | wchar_t* | Portable string pointer; its character type follows the project setting. |
| LPCTSTR | const char*, LPCSTR (Win32) | const wchar_t* | Portable constant string pointer; its character type follows the project setting. |

With the generic data types in Table 1, developers do not tie a project to any one character set. Each generic type acts as a placeholder that is replaced with a concrete type at compile time, so the same application can be built to run in either ASCII or Unicode mode. Note, however, that these generic data types are Microsoft-specific and are not part of the ANSI C/C++ standard. For detailed descriptions of them, see the MSDN Library documentation.

Technical Notes

To build an MFC program that supports Unicode, you must link with the Unicode versions of the MFC libraries. These libraries are an optional component that you select during a custom installation of Visual C++.
Importantly, adopting Unicode does not change how a program appears to behave: the code shown above builds into a working program whether or not _UNICODE is defined. Differences arise only when developers call Win32 API functions that exist in more than one version.
When you call such a function (any Win32 API function that takes characters or strings as parameters), the header files select the version to call based on whether UNICODE is defined: a generic name such as SetVolumeLabel maps to SetVolumeLabelW in a Unicode build and to SetVolumeLabelA otherwise. If UNICODE is not defined, the ANSI (ASCII) version is called by default.

Conclusion

In summary, writing a Unicode version of a program is not difficult; you just need to remember a few small changes in how you declare strings and call functions. The extensions Microsoft provides let developers choose the character set transparently, opening the door to internationalized application software.
Jeffrey Richter devotes a chapter to Unicode in his book Windows Core Programming (the Chinese edition, translated by Wang Jianhua, Zhang Huansheng, Hou Likun, and others, is also a good translation). If you are interested, it is worth reading.
