Favorites: Unicode Programming

Source: Internet
Author: User
Unicode Programming[Date: 16:57:00 | Author: four-year margin]

Sender: Gege (I am making a fortune), email area: C++
Title: Unicode Programming
Mailing station: Ethereal cloudification room (Thu Apr 25 21:00:30 2002), internal mail

This is a topic that many people (myself included) have been, or still are, confused about. (Here we discuss only UTF-16, that is, the two-byte encoding.)
1. About Unicode
First, Unicode on Windows mainly uses the wide character type wchar_t (WCHAR), which older compilers define as unsigned short. As the definition shows, this is a two-byte type, so each character occupies 2 bytes and up to 65,536 distinct characters can be represented. (Characters outside this range exist and are encoded in UTF-16 as a pair of 16-bit units, but they are not discussed here.) The ASCII codes are located at 0x0000-0x007F (Latin-1 fills the range up to 0x00FF), while Chinese characters (including those covered by Big5) are located at 0x4E00-0x9FFF. Unicode contains almost all of the world's writing systems. For more information about Unicode, see the following webpage:
http://www.unicode.org/unicode/standard/translations/s-chinese.html
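
The code ranges described in this section can be checked with a few comparisons. The following is only a minimal sketch: Utf16Unit and the three helper functions are hypothetical names invented for this illustration and are not part of any Windows header.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.
// A UTF-16 code unit is 16 bits wide, so std::uint16_t stands in for WCHAR here.
typedef std::uint16_t Utf16Unit;

// ASCII proper occupies 0x0000-0x007F; 0x0080-0x00FF is the Latin-1 range.
inline bool isAsciiUnit(Utf16Unit u) { return u <= 0x007F; }

// The CJK Unified Ideographs block occupies 0x4E00-0x9FFF.
inline bool isCjkUnit(Utf16Unit u) { return u >= 0x4E00 && u <= 0x9FFF; }

// Units in 0xD800-0xDFFF are surrogates: two of them together encode
// one character outside the 16-bit range.
inline bool isSurrogateUnit(Utf16Unit u) { return u >= 0xD800 && u <= 0xDFFF; }
```

For example, isAsciiUnit(0x0041) is true for the letter A, and isCjkUnit(0x4E2D) is true for the Chinese character at code point 0x4E2D.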
2. Why Unicode?
1) COM: the COM specification requires Unicode strings, which reflects Microsoft's cross-platform considerations. This is why the BSTR (wchar_t *) type appears so often in COM.
2) Win2000 and WinNT: on these two platforms the system processes characters as Unicode by default. Even if you write a non-Unicode (multibyte) program, the system still converts your strings at run time, which wastes CPU time. Using Unicode directly avoids that conversion and improves efficiency (on these two platforms only). The same will of course apply to XP.
3) Versatility: with Unicode we no longer need to treat Chinese and English characters differently (both take two bytes).
3. How to use Unicode
1) The recommended type is TCHAR (the generic character type). When the UNICODE macro is defined, TCHAR is WCHAR; when it is not defined, TCHAR is char, which is a neat trick. (The C runtime header tchar.h checks the corresponding _UNICODE macro, so the two are normally defined together.) Let's take a look at the definition of TCHAR:
    #ifdef UNICODE                  // r_winnt
    typedef WCHAR TCHAR, *PTCHAR;
    #else   /* UNICODE */           // r_winnt
    typedef char TCHAR, *PTCHAR;
    #endif  /* UNICODE */
The code above comes from winnt.h, with some irrelevant parts removed. Now everything is clear.
With TCHAR, we only need the following code:
    TCHAR tstr[] = _T("T code");
    MessageBox(tstr);
This compiles in both the Unicode and the multibyte build. The _T macro turns the literal into a TCHAR string.
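
To see the mechanism without the Windows headers, the switch can be imitated in a few lines. This is only a sketch under stated assumptions: DEMO_UNICODE, DemoTchar, DEMO_T, kGreeting, and greetingLength are hypothetical names made up for this illustration; the real names are UNICODE/_UNICODE, TCHAR, and _T.

```cpp
#include <cstddef>

// DEMO_UNICODE is a hypothetical stand-in for the real UNICODE/_UNICODE macros.
// Comment this line out to get the narrow (char) build of the same source.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTchar;      // wide build: one wchar_t per character
#define DEMO_T(x) L##x          // pastes the L prefix onto the literal
#else
typedef char DemoTchar;         // narrow build: plain char
#define DEMO_T(x) x             // leaves the literal unchanged
#endif

// The same source line compiles in either mode.
static const DemoTchar kGreeting[] = DEMO_T("T code");

// Number of characters in kGreeting, excluding the terminating zero.
inline std::size_t greetingLength()
{
    return sizeof(kGreeting) / sizeof(kGreeting[0]) - 1;
}
```

Because the element type and the literal prefix change together, the code that uses kGreeting does not need to know which mode was selected.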
2) Other string handling
The first is the commonly used CString, which supports Unicode itself. The following example illustrates its usage:
    CString *pFileName = new CString(_T("C:\\tmpfile.txt"));
    #ifdef _UNICODE
    // AllocSysString() returns a newly allocated BSTR that the caller must free.
    BSTR bstrName = pFileName->AllocSysString();
    m_hFile = CreateFile(bstrName,
                         GENERIC_READ | GENERIC_WRITE,
                         FILE_SHARE_READ,
                         NULL,
                         OPEN_EXISTING,
                         FILE_ATTRIBUTE_NORMAL,
                         NULL);
    SysFreeString(bstrName);
    #else
    m_hFile = CreateFile(pFileName->GetBuffer(pFileName->GetLength()),
                         GENERIC_READ | GENERIC_WRITE,
                         FILE_SHARE_READ,
                         NULL,
                         OPEN_EXISTING,
                         FILE_ATTRIBUTE_NORMAL,
                         NULL);
    // Every GetBuffer() call should be paired with ReleaseBuffer().
    pFileName->ReleaseBuffer();
    #endif
3) To assign a string constant in Unicode mode, we can use the L prefix, for example:
    const wchar_t *wcsStr = L"Unicode";
Such an assignment is simple, but a string literal with the L prefix is always Unicode. If you assign it to a multibyte string, the characters may be truncated. (A real BSTR should be allocated with SysAllocString rather than assigned from a literal.)
In addition, Windows provides conversion functions such as WideCharToMultiByte and MultiByteToWideChar, and VC provides some macros to support conversion. See MSDN for details.
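
As a rough illustration of the conversion functions, the sketch below wraps WideCharToMultiByte in the usual two-call pattern: the first call asks for the required size, the second call fills the buffer. The helper name narrowFromWide is hypothetical, and the choice of CP_ACP (the system ANSI code page) is an assumption. So that the sketch also compiles outside Windows, it falls back to the standard std::wcstombs there, which is a stand-in and not what the post describes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a wide string to a multibyte string.
// Returns an empty string if the conversion fails.
std::string narrowFromWide(const std::wstring &wide)
{
    if (wide.empty())
        return std::string();
#ifdef _WIN32
    // First call: a NULL output buffer with size 0 returns the byte count needed.
    int bytes = WideCharToMultiByte(CP_ACP, 0, wide.c_str(),
                                    static_cast<int>(wide.size()),
                                    NULL, 0, NULL, NULL);
    if (bytes <= 0)
        return std::string();
    std::string out(static_cast<std::size_t>(bytes), '\0');
    // Second call: perform the conversion into the buffer.
    WideCharToMultiByte(CP_ACP, 0, wide.c_str(),
                        static_cast<int>(wide.size()),
                        &out[0], bytes, NULL, NULL);
    return out;
#else
    // Portable stand-in: std::wcstombs converts using the current C locale.
    std::size_t bytes = std::wcstombs(NULL, wide.c_str(), 0);
    if (bytes == static_cast<std::size_t>(-1))
        return std::string();
    std::string out(bytes, '\0');
    std::wcstombs(&out[0], wide.c_str(), bytes);
    return out;
#endif
}
```

MultiByteToWideChar follows the same pattern in the opposite direction. Characters that the target code page cannot represent are replaced or rejected, so the round trip is not always lossless.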
4. Compiler settings
First, add _UNICODE (and UNICODE, which the Windows headers check) to the preprocessor definitions on the C/C++ property page under Project -> Settings. Then, on the Link property page, select Output in the Category list and enter wWinMainCRTStartup as the entry-point symbol. Our Unicode project is now set up.

Sender: olddog (Wang Wangwang), email area: C++
Title: Re: Unicode Programming
Mailing station: Ethereal cloud (Thu Apr 25 21:44:26 2002), Email Forwarding

Some additions:

Character sets include Unicode, ASCII, MBCS, and so on.

Unicode is an extension of ASCII and is encoded with 16 bits per character.
MBCS is an alternative to Unicode in which a character is represented by one or two bytes. When two bytes are used, the first byte is the lead byte, which indicates that it and the following byte together represent one character. The range of lead-byte values depends on the character set (code page): for example, the range n1-n2 indicates Japanese, and the range n3-n4 indicates Chinese.
If the program is to be released internationally, use MBCS or Unicode, or write the program so that it can be compiled in either mode.
DBCS is the most common case of MBCS.
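
The lead-byte rule can be sketched as a loop that steps over one or two bytes per character. This is only an illustration: isLeadByteDemo and dbcsCharCount are hypothetical names, and the lead-byte range 0x81-0xFE is an assumption matching code page 936 (GBK). Other code pages use different ranges.

```cpp
#include <cstddef>

// Assumption: lead bytes lie in 0x81-0xFE, as in code page 936 (GBK).
inline bool isLeadByteDemo(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

// Counts the characters (not the bytes) in a zero-terminated DBCS string.
inline std::size_t dbcsCharCount(const unsigned char *s)
{
    std::size_t count = 0;
    while (*s != 0) {
        // A lead byte followed by a non-zero byte forms one two-byte character.
        if (isLeadByteDemo(*s) && s[1] != 0)
            s += 2;
        else
            s += 1;
        ++count;
    }
    return count;
}
```

For the byte sequence 'A', 0xD6, 0xD0, 'B' the function counts three characters in four bytes. On Windows, the IsDBCSLeadByte function performs the lead-byte check for the active code page, so real code would normally call it instead of hard-coding a range.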

When using wide characters, pay attention to the following:
1. File names
2. Character operations (deleting a character, moving one character with the right arrow key, ...)
3. String length
4. The program entry function
The CRT and MFC support single-byte, MBCS, and Unicode character sets.
String processing functions generally come in the following versions:
str...    single-byte
_mbs...   MBCS
wcs...    Unicode
MFC class member functions generally use the portable _... functions.

Portability between the three character types
The _tcs prefix in tchar.h unifies the three families of string processing functions. Different macro switches defined at compile time select which family is used, so you can choose the compilation mode as needed.
tchar.h also defines the _TCHAR type: it is wchar_t when compiling for Unicode, and char when compiling for single-byte or MBCS.

In general, we use the _tcs... functions to operate on _TCHAR strings.
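
The mapping idea can be imitated with standard functions. The sketch below is not the contents of tchar.h: DEMO_UNICODE, DemoTchar, DEMO_T, demo_tcslen, demo_tcscmp, and demoBufferBytes are hypothetical names for illustration. It also shows why string length needs care: the character count and the byte count differ in the wide build.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// DEMO_UNICODE is a hypothetical stand-in for the real _UNICODE switch.
#define DEMO_UNICODE

#ifdef DEMO_UNICODE
typedef wchar_t DemoTchar;
#define DEMO_T(x)   L##x
#define demo_tcslen std::wcslen     // wide versions of the functions
#define demo_tcscmp std::wcscmp
#else
typedef char DemoTchar;
#define DEMO_T(x)   x
#define demo_tcslen std::strlen     // single-byte versions of the functions
#define demo_tcscmp std::strcmp
#endif

// Bytes needed to store s, including the terminating zero.
// The length functions return a character count, not a byte count.
inline std::size_t demoBufferBytes(const DemoTchar *s)
{
    return (demo_tcslen(s) + 1) * sizeof(DemoTchar);
}
```

With the switch defined, demo_tcslen(DEMO_T("abc")) is 3 while demoBufferBytes(DEMO_T("abc")) is 4 * sizeof(wchar_t). Passing a character count where an API expects a byte count, or the reverse, is a common mistake when porting to wide characters.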

Differences between Unicode and MBCS:
Unicode: available on the Win NT and Win 2K platforms, but not on Win 95 (strings must use 16 bits per character)
MBCS: any Win32 platform (each character can be 1 or 2 bytes)
