Conversion between fullwidth and halfwidth

Source: Internet
Author: User

1. Difference between fullwidth and halfwidth

Chinese text processing introduces two basic and very important concepts: fullwidth and halfwidth. Visually, if an English character (such as a) occupies one position on the screen, which is the halfwidth, then a Chinese character occupies the space of two English characters, which is why it is called fullwidth.
In an English input method, every letter, symbol, and digit occupies one English character position, that is, halfwidth. A Chinese input method, however, offers two modes: fullwidth and halfwidth. The mode does not affect Chinese characters, which always occupy two English character positions, but it matters a great deal for the symbols, digits, and English letters entered in that mode, as shown below:
China
Ｃｈｉｎａ
The former was entered in halfwidth mode and the latter in fullwidth mode. In fullwidth mode, letters, symbols, and digits are all treated like Chinese characters, without exception. They look awkward because each one occupies the width of two English characters.

2. Conversion between fullwidth and halfwidth

On a Simplified Chinese system, the ANSI multi-byte character set uses GBK encoding for Chinese (Chinese characters and Chinese symbols) and ASCII for English (English letters and English symbols). ANSI-encoded text is generally convenient for programs to process. Before performing string operations on such text, we recommend unifying it to either fullwidth or halfwidth. For example, the fullwidth space "　" has the GBK encoding A1A1, while the halfwidth space " " has the ASCII code \x20. If a text mixes the two, finding the position of a space or splitting a paragraph on spaces runs into unnecessary trouble, because a search for one kind of space misses the other. The content of a text can be divided into the following four types:

| Category | Explanation | Conversion method |
|---|---|---|
| Chinese characters and Chinese symbols | Chinese characters, such as the characters for "China" and "good", and Chinese-only symbols, such as the Chinese ellipsis and dash (shown here as "..." and "--"). These exist only in fullwidth form, so there is no conversion problem. | None |
| English letters and symbols | Characters 33-126 in the ASCII table, for example ! * + 0 1 2 a b c. They have two forms, fullwidth and halfwidth, such as Ｃｈｉｎａ and China. | Halfwidth to fullwidth: prepend the byte A3 and set the highest bit of the original byte to 1. Fullwidth to halfwidth: discard the first byte and clear the highest bit of the second byte to 0. |
| Space | The ASCII space is \x20 and the GBK fullwidth space is A1A1. | Does not follow the rule for English letters and symbols, so it must be handled as a special case. |
| Control characters | Characters 0-31 in the ASCII table, used to control the file format. For example, pressing Enter to break a line produces \x0d\x0a (CR LF). These exist only in halfwidth form and do not need to be converted. | None |
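
The byte rules listed above can be sketched for a single character as follows. This is a minimal illustration rather than the author's code: the helper names to_fullwidth_gbk, to_halfwidth_gbk, and demo_gbk_rules are hypothetical, and the sketch assumes GBK-encoded input in which fullwidth letters and symbols occupy the range A3A1 to A3FE.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: GBK bytes of the fullwidth form of one ASCII character.
std::string to_fullwidth_gbk(unsigned char c) {
    if (c == 0x20) {
        return std::string("\xA1\xA1");  // the space is the special case
    }
    if (c >= 33 && c <= 126) {
        std::string out;
        out += static_cast<char>(0xA3);      // prepend the byte A3
        out += static_cast<char>(c | 0x80);  // set the highest bit
        return out;
    }
    return std::string(1, static_cast<char>(c));  // control characters stay as-is
}

// Hypothetical helper: halfwidth form of one two-byte GBK sequence.
// Returns 0 when the sequence has no halfwidth counterpart.
char to_halfwidth_gbk(unsigned char lead, unsigned char trail) {
    if (lead == 0xA1 && trail == 0xA1) {
        return ' ';  // fullwidth space
    }
    if (lead == 0xA3 && trail >= 0xA1 && trail <= 0xFE) {
        return static_cast<char>(trail & 0x7F);  // drop A3, clear the highest bit
    }
    return 0;  // Chinese character or Chinese-only symbol
}

// Usage example: call this from main() to exercise both directions.
void demo_gbk_rules() {
    assert(to_fullwidth_gbk('!') == "\xA3\xA1");
    assert(to_halfwidth_gbk(0xA3, 0xA1) == '!');
    assert(to_fullwidth_gbk(' ') == "\xA1\xA1");
    assert(to_halfwidth_gbk(0xA1, 0xA1) == ' ');
}
```

Working on unsigned char values keeps the comparisons independent of whether plain char is signed on the target platform.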

As a result, the program for converting fullwidth to halfwidth is as follows:

    /*
     * Author: Hou Kai
     * Description: mutual conversion between fullwidth and halfwidth
     * Date:
     */
    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    const char Sbc_high = -93;   // The first byte of a fullwidth letter or symbol is A3.
    const char Sbc_space = -95;  // The fullwidth space is A1A1.

    // Fullwidth to halfwidth
    string Sbc2dbc(const string& sbc) {
        string dbc;
        const size_t len = sbc.length();
        for (size_t i = 0; i < len; ++i) {
            const unsigned char lead = static_cast<unsigned char>(sbc[i]);
            if (lead < 0x80 || i + 1 >= len) {
                // Single-byte or control character (or a dangling last byte): copy as-is
                dbc.append(1, sbc[i]);
                continue;
            }
            const unsigned char trail = static_cast<unsigned char>(sbc[i + 1]);
            if (sbc[i] == Sbc_high && trail >= 0xA1 && trail <= 0xFE) {
                // Fullwidth English letter or symbol, such as the fullwidth ! (A3A1)
                dbc.append(1, static_cast<char>(trail & 0x7f));
            } else if (sbc[i] == Sbc_space && sbc[i + 1] == Sbc_space) {
                // The space is handled separately
                dbc.append(1, ' ');
            } else {
                // Chinese characters and Chinese-only symbols
                dbc += sbc.substr(i, 2);
            }
            ++i;  // skip the second byte of the double-byte character
        }
        return dbc;
    }

The program for converting halfwidth to fullwidth is as follows:

    // Halfwidth to fullwidth
    string Dbc2sbc(const string& dbc) {
        string sbc;
        const size_t len = dbc.length();
        for (size_t i = 0; i < len; ++i) {
            const unsigned char c = static_cast<unsigned char>(dbc[i]);
            if (c >= 0x80) {
                // Double-byte character (Chinese character or symbol): copy both bytes
                sbc += dbc.substr(i, 2);
                ++i;
            } else if (c == ' ') {
                // The space is handled separately: emit the fullwidth space A1A1
                sbc += "\xA1\xA1";
            } else if (c >= 33 && c <= 126) {
                // English letter or symbol: prepend A3 and set the highest bit
                sbc.append(1, Sbc_high);
                sbc.append(1, static_cast<char>(c | 0x80));
            } else {
                // Control character
                sbc.append(1, dbc[i]);
            }
        }
        return sbc;
    }

An example program that converts a text file is as follows:

    // A simple example of processing a TXT file
    void Processfile(const char* filename) {
        ifstream infile(filename);
        if (!infile) {
            return;  // nothing to convert if the file cannot be opened
        }
        string strline;
        string strresult;
        // Testing getline directly avoids the extra line that an eof() loop appends
        while (getline(infile, strline)) {
            strresult += strline + "\n";
        }
        infile.close();

        strresult = Sbc2dbc(strresult);  // Conversion

        ofstream outfile(filename);
        outfile.write(strresult.c_str(), strresult.length());
        outfile.flush();
        outfile.close();
    }

3. Unicode Conversion

In practice you may also encounter Unicode-encoded text. The conversion between halfwidth and fullwidth then follows the same approach as above: only "English letters and English symbols" and "spaces" need to be processed. In the two-byte Unicode encoding used here (UTF-16, as held in wchar_t on Windows), each of these characters occupies exactly two bytes, for example \x0020 for the halfwidth space and \x3000 for the fullwidth space (for Unicode encoding, see: http://www.cnblogs.com/houkai/archive/2013/06/04/3116955.html ). Because the fullwidth and halfwidth forms have the same size, no bytes have to be added or dropped, which makes processing easier. You can download the Unicode encoding table to look up the code points. Conversion method:
A. The fullwidth space is 12288 (U+3000), and the halfwidth space is 32 (U+0020).
B. For the other characters, the halfwidth code points (33-126) correspond one-to-one to the fullwidth code points (65281-65374, that is, U+FF01 to U+FF5E); each pair differs by 65248 (0xFEE0).
The program implementation is relatively simple. Building on the Changetxtencoding function, which converts UTF-8 to ANSI, the following is the rewritten example:
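
Rules A and B can be sketched on plain code points, independent of any platform API. This is a minimal illustration under stated assumptions: the text is already decoded into a std::u32string, and the names to_halfwidth, to_fullwidth, normalize_to_halfwidth, and demo_unicode_rules are hypothetical helpers, not part of the original program.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: fullwidth code point -> halfwidth code point.
char32_t to_halfwidth(char32_t cp) {
    if (cp == 0x3000) {
        return 0x0020;  // rule A: 12288 -> 32
    }
    if (cp >= 0xFF01 && cp <= 0xFF5E) {
        return static_cast<char32_t>(cp - 0xFEE0);  // rule B: subtract 65248
    }
    return cp;  // everything else is left unchanged
}

// Hypothetical helper: halfwidth code point -> fullwidth code point.
char32_t to_fullwidth(char32_t cp) {
    if (cp == 0x0020) {
        return 0x3000;  // rule A: 32 -> 12288
    }
    if (cp >= 0x21 && cp <= 0x7E) {
        return static_cast<char32_t>(cp + 0xFEE0);  // rule B: add 65248
    }
    return cp;
}

// Apply the fullwidth-to-halfwidth mapping to every code point of a string.
std::u32string normalize_to_halfwidth(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        out.push_back(to_halfwidth(cp));
    }
    return out;
}

// Usage example: call this from main() to exercise the mapping.
void demo_unicode_rules() {
    // Fullwidth A, fullwidth B, fullwidth space, fullwidth 1
    assert(normalize_to_halfwidth(U"\uFF21\uFF22\u3000\uFF11") == U"AB 1");
    assert(to_fullwidth(U'!') == 0xFF01);
}
```

Because every affected character maps to exactly one code point of the same size, the string length never changes, which is the simplification the Unicode approach offers over the GBK byte rules.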

    // The original function converts the UTF-8 string szU8 to Unicode and then Unicode to ANSI.
    // Here, a "fullwidth to halfwidth" step is added after the UTF-8 to Unicode conversion.
    // Requires <windows.h>. The caller releases the returned buffer with delete[].
    char* Changetxtencoding(char* szU8) {
        int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, NULL, 0);
        if (wcsLen == 0) {
            return NULL;  // conversion failed
        }
        wchar_t* wszString = new wchar_t[wcsLen];
        ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, wszString, wcsLen);

        // Fullwidth to halfwidth step
        // wszString is Unicode (UTF-16) encoded.
        for (int i = 0; i < wcsLen; i++) {
            if (wszString[i] == 12288) {
                // Space
                wszString[i] = 32;
            } else if (wszString[i] >= 65281 && wszString[i] <= 65374) {
                // Other characters
                wszString[i] -= 65248;
            }
        }

        int ansiLen = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
        char* szAnsi = new char[ansiLen];
        ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szAnsi, ansiLen, NULL, NULL);
        delete[] wszString;
        return szAnsi;
    }
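
The function above depends on the Windows API. Where that API is not available, the same fullwidth-to-halfwidth step can be applied directly to UTF-8 bytes. The following is a rough sketch of that alternative, not the author's method: the names utf8_to_halfwidth and demo_utf8 are hypothetical, the input is assumed to be valid UTF-8, and the result stays in UTF-8 instead of being converted to ANSI.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace fullwidth spaces, letters, and symbols in a
// UTF-8 string with their halfwidth forms. Assumes valid UTF-8 input.
std::string utf8_to_halfwidth(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        // U+3000 and U+FF01..U+FF5E are all encoded as three-byte sequences,
        // whose lead byte has the form 1110xxxx.
        if ((b0 & 0xF0) == 0xE0 && i + 2 < in.size()) {
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            const unsigned char b2 = static_cast<unsigned char>(in[i + 2]);
            if ((b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80) {
                // Decode the three bytes into one code point
                const unsigned cp = ((b0 & 0x0Fu) << 12) |
                                    ((b1 & 0x3Fu) << 6) |
                                    (b2 & 0x3Fu);
                if (cp == 0x3000) {
                    out += ' ';  // fullwidth space -> halfwidth space
                } else if (cp >= 0xFF01 && cp <= 0xFF5E) {
                    out += static_cast<char>(cp - 0xFEE0);  // subtract 65248
                } else {
                    out.append(in, i, 3);  // any other character: copy unchanged
                }
                i += 3;
                continue;
            }
        }
        out += in[i];  // all other bytes are copied through one at a time
        ++i;
    }
    return out;
}

// Usage example: call this from main() to exercise the conversion.
void demo_utf8() {
    // Fullwidth A (EF BC A1), fullwidth space (E3 80 80), fullwidth 1 (EF BC 91)
    assert(utf8_to_halfwidth("\xEF\xBC\xA1\xE3\x80\x80\xEF\xBC\x91") == "A 1");
}
```

Unlike the UTF-16 loop, this version shrinks each converted character from three bytes to one, so it builds a new string instead of editing in place.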
