Favorites: Unicode Programming

Last Update:2018-12-04 Source: Internet

Author: User

Unicode Programming[Date: 16:57:00 | Author: four-year margin]

Sender: Gege (I am making a fortune), email area: C++
Title: Unicode Programming
Mailing station: Ethereal cloudification room (Thu Apr 25 21:00:30 2002), internal mail

This is a topic that has confused many people (myself included), and still does. (Only UTF-16, the two-byte encoding, is discussed here.)
1. About Unicode
First, Unicode programs on Windows mainly use the WCHAR character type, which is defined as unsigned short (wchar_t in current headers). As the definition shows, it is a two-byte type, so each character occupies 2 bytes and 65,536 distinct values can be represented. The ASCII codes occupy 0x0000-0x007F (Latin-1 extends this to 0x00FF), while Chinese characters (including those from Big5) fall between 0x4E00 and 0x9FFF. Unicode covers almost all the writing systems of the world; characters that do not fit in the 16-bit range are encoded in UTF-16 as a pair of 16-bit units (a surrogate pair). For more information about Unicode, see the following web page:
http://www.unicode.org/unicode/standard/translations/s-chinese.html
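To make the "two bytes per character" point concrete, here is a minimal portable sketch. It uses the standard C++11 type char16_t instead of WCHAR so that it builds without the Windows headers; the helper names (code_units, byte_size, is_surrogate) are made up for this illustration and are not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One UTF-16 code unit is 16 bits wide; char16_t is the portable type for it.
static_assert(sizeof(char16_t) == 2, "a UTF-16 code unit occupies 2 bytes");

// Number of 16-bit code units in a UTF-16 string (not bytes, not characters).
std::size_t code_units(const std::u16string& s) { return s.size(); }

// Number of bytes occupied by the string's code units (terminator excluded).
std::size_t byte_size(const std::u16string& s) { return s.size() * sizeof(char16_t); }

// True if the unit is one half of a surrogate pair, i.e. part of a
// character that lies outside the 16-bit range.
bool is_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDFFF; }
```

For example, u"A\u4E2D" (the letter A followed by a Chinese character) is two code units and four bytes, while a character such as U+20000 needs two code units on its own.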
2. Why Unicode?
1) COM: the COM specification requires Unicode strings, a consequence of Microsoft designing COM with cross-platform use in mind. This is why the BSTR (WCHAR *) type appears so often in COM.
2) Win2000 and WinNT: on these two platforms the native character handling is Unicode. Even if you write a non-Unicode (multibyte) program, the system still converts your strings at run time, which wastes CPU time. Building for Unicode avoids that conversion and so improves efficiency (on these two platforms only). The same will of course apply to XP.
3) Versatility: with Unicode we no longer need to distinguish between Chinese and English characters (both take two bytes).
3. How to use Unicode
1) The recommended type is TCHAR (the generic character type). When the UNICODE macro is defined, TCHAR is WCHAR; when it is not, TCHAR is char. It is as simple as that. (The Windows headers test UNICODE, while the C runtime's tchar.h tests _UNICODE, so a Unicode build normally defines both.) Here is the definition of TCHAR:
#ifdef UNICODE                     // r_winnt
typedef WCHAR TCHAR, *PTCHAR;
#else   /* UNICODE */              // r_winnt
typedef char TCHAR, *PTCHAR;
#endif /* !_TCHAR_DEFINED */
The code above comes from winnt.h, with the irrelevant parts removed. Now everything is clear.
With TCHAR, the following code is all we need:
TCHAR tstr[] = _T("T code");
MessageBox(tstr);
It compiles in both the Unicode and the multibyte build. The _T macro turns the literal into a TCHAR string.
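The switch can be imitated without the Windows headers. The sketch below is only an illustration of the mechanism: DEMO_UNICODE, demo_tchar, DEMO_T and demo_tcslen are hypothetical stand-ins for UNICODE, TCHAR, _T and _tcslen, chosen so that the example builds with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for UNICODE / TCHAR / _T / _tcslen.
// Compile with -DDEMO_UNICODE (or /DDEMO_UNICODE) to get the wide build.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(x) L##x   // paste the L prefix onto the literal
inline std::size_t demo_tcslen(const demo_tchar* s) { return std::wcslen(s); }
#else
typedef char demo_tchar;
#define DEMO_T(x) x      // leave the literal as a narrow string
inline std::size_t demo_tcslen(const demo_tchar* s) { return std::strlen(s); }
#endif

// The same source line compiles in both builds.
const demo_tchar greeting[] = DEMO_T("T code");
```

In either build demo_tcslen(greeting) is 6; only the element size of the array changes.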
2) about other processing
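The first is the commonly used CString, which itself supports Unicode. The following example shows the usage:
CString *pFileName = new CString("C:\\tmpfile.txt");
// Note: CString also converts implicitly to LPCTSTR, so passing *pFileName
// directly would work in both builds without the #ifdef.
#ifdef _UNICODE
// AllocSysString() returns a newly allocated BSTR; release it with SysFreeString().
m_hFile = CreateFile(pFileName->AllocSysString(),
    GENERIC_READ | GENERIC_WRITE,
    FILE_SHARE_READ,
    NULL,
    OPEN_EXISTING,
    FILE_ATTRIBUTE_NORMAL,
    NULL);
#else
// GetBuffer() should be paired with ReleaseBuffer() once the pointer is no longer needed.
m_hFile = CreateFile(pFileName->GetBuffer(pFileName->GetLength()),
    GENERIC_READ | GENERIC_WRITE,
    FILE_SHARE_READ,
    NULL,
    OPEN_EXISTING,
    FILE_ATTRIBUTE_NORMAL,
    NULL);
#endif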
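3) When we need to assign a string constant in Unicode mode, we can use the L prefix, for example:
const WCHAR *wcsstr = L"Unicode";
Such an assignment is simple, but a literal marked with L is always a wide (Unicode) string. If you hand it to something that expects a multibyte string, the text may be truncated, because the second byte of a wide ASCII character is zero and is read as the end of the string. (A genuine BSTR should be allocated with SysAllocString rather than assigned from a literal, because a BSTR carries a length prefix.)
In addition, VC provides functions such as WideCharToMultiByte and MultiByteToWideChar, along with some other macros, to support conversion. See MSDN for details.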
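As a rough idea of what such a conversion looks like, here is a sketch that uses the standard C library functions mbstowcs and wcstombs rather than the Win32 calls named above, so that it builds on any platform. The wrapper names to_wide and to_narrow are invented for this example, and the result depends on the current C locale (the default "C" locale handles plain ASCII).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Multibyte -> wide, using the current C locale.
// Returns an empty string if the input is not valid in that locale.
std::wstring to_wide(const std::string& narrow) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buf(narrow.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();
    return std::wstring(buf.data(), n);
}

// Wide -> multibyte, using the current C locale.
// Returns an empty string if a character cannot be represented.
std::string to_narrow(const std::wstring& wide) {
    // Each wide character needs at most MB_CUR_MAX bytes.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();
    return std::string(buf.data(), n);
}
```

On Windows, MultiByteToWideChar and WideCharToMultiByte play the same role but take an explicit code page instead of relying on the locale.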
4. Compiler settings
First, add _UNICODE to the Preprocessor definitions on the C/C++ page of Project -> Settings. Then, on the Link page, choose Output under Category and enter wWinMainCRTStartup as the entry-point symbol. The Unicode project is now set up.

Sender: olddog (Wang Wangwang), email area: C++
Title: Re: Unicode Programming
Mailing station: Ethereal cloud (Thu Apr 25 21:44:26 2002), Email Forwarding

Add:

Character sets include Unicode, ASCII, MBCS, and so on.

Unicode is an extension of ASCII that encodes each character in 16 bits.
MBCS is an alternative to Unicode in which a character is represented by one or two bytes. When two bytes are used, the first is the lead byte, which indicates that it and the following byte together represent a single character. Which values count as lead bytes depends on the character set (code page): for example, the range n1-n2 marks a lead byte in the Japanese code page, and the range n3-n4 in the Chinese one.
If the program is to be released internationally, it should use MBCS or Unicode, or be written so that it can be built in more than one mode.
DBCS is the most common form of MBCS.
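The lead-byte rule can be sketched as a small character counter. This example assumes code page 936 (GBK), whose lead bytes lie in the range 0x81-0xFE; other DBCS code pages use other ranges. The function names are invented for the illustration. On Windows one would normally ask the system with IsDBCSLeadByte instead of hard-coding a range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: code page 936 (GBK), where lead bytes are 0x81..0xFE.
bool is_lead_byte_cp936(unsigned char b) { return b >= 0x81 && b <= 0xFE; }

// Counts characters (not bytes) in a DBCS-encoded byte string.
std::size_t dbcs_char_count(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++count) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        // A lead byte consumes the following trail byte too, if there is one.
        i += (is_lead_byte_cp936(b) && i + 1 < bytes.size()) ? 2 : 1;
    }
    return count;
}
```

This is also why operations such as deleting a character or moving the caret one position need care in MBCS: stepping by one byte can land in the middle of a two-byte character.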

When using wide characters, pay attention to the following:
1. File names 2. Character operations (delete, moving one character with the right arrow key, ...) 3. String length
4. Program entry functions
The CRT and MFC support single-byte, MBCS, and Unicode strings.
String processing functions generally come in the following versions:
str...   single-byte
_mbs...  MBCS
wcs...   Unicode
MFC class member functions are generally built on the portable _... functions.
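A small sketch of the str... and wcs... families side by side, using only standard C++ headers. The two helper names are made up for this example. Both strlen and wcslen return a length in units (char or wchar_t), so a byte count has to be multiplied by the unit size; wchar_t is 2 bytes on Windows and commonly 4 bytes elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Bytes occupied by a narrow string's characters (terminator excluded).
std::size_t narrow_bytes(const char* s) { return std::strlen(s) * sizeof(char); }

// Bytes occupied by a wide string's characters (terminator excluded).
std::size_t wide_bytes(const wchar_t* s) { return std::wcslen(s) * sizeof(wchar_t); }
```

Passing the unit count where a byte count is expected (or the reverse) is a typical source of buffer-size bugs when moving code to wide characters.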

Portability between the three character types
tchar.h uses the _tcs prefix to unify the three families of string functions; a macro switch defined at compile time selects which family is used, so you can choose the build mode as needed.
tchar.h also defines _TCHAR: it is wchar_t when compiling for Unicode and char when compiling for single-byte or MBCS.

In general, we use the _tcs... functions to operate on _TCHAR strings.

Differences between Unicode and MBCS:
Unicode: usable on the WinNT and Win2K platforms but not on Win95 (strings must use 16 bits per character)
MBCS: usable on any Win32 platform (each character may be 1 or 2 bytes)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
