Conversion between full-width and half-width characters

Last Update:2016-05-12 Source: Internet

Author: User



1. The difference between full-width and half-width

The nature of Chinese characters introduces two basic and important concepts: full-width and half-width. Put simply, under an English input method an English character (such as a) occupies one position on the screen, which is called half-width, while a Chinese character occupies a position equal to two English characters, which is called full-width.
Under an English input method, letters, symbols, and digits always occupy one English character position, that is, they are half-width. A Chinese input method, however, offers a choice between full-width and half-width. The choice has no effect on Chinese characters, which always occupy two English character positions, but it matters for the symbols, digits, and English letters entered in that mode, as shown below:
China
Ｃｈｉｎａ
The first was entered in half-width mode and the second in full-width mode. In full-width mode, letters, symbols, and digits are no exception: they are treated like Chinese characters, and because each one occupies two English character positions they look awkwardly spaced.

2. Full-width and half-width conversion

In the ANSI multi-byte character set, Chinese text (Chinese characters and Chinese symbols) uses the GBK encoding and English text (English letters and English symbols) uses ASCII. ANSI-encoded text is generally convenient for programs to process. Before working with strings in such text, it is recommended to normalize them to either full-width or half-width. Take the full-width space "　" (GBK encoding A1A1) and the half-width space " " (ASCII encoding \x20): if a text mixes the two, finding the position of a space or splitting a paragraph on spaces becomes needlessly complicated. The content of a text can be divided into the following four categories:

Category	Description	Conversion method
Chinese characters and Chinese symbols	Chinese characters such as 中国 ("China") and 好 ("good"), and Chinese symbols such as the ellipsis "……" and the Chinese dash. These exist only in full-width form, so no conversion is needed.	None
English letters and English symbols	The characters 33-126 in the ASCII table, such as !*+012abc{}. Each has both a half-width and a full-width form, for example China and Ｃｈｉｎａ.	Half-width to full-width: prepend the byte A3 and set the highest bit of the original byte to 1. Full-width to half-width: discard the first byte and clear the highest bit of the second byte to 0.
Space	The ASCII space is \x20; the GBK full-width space is A1A1.	Does not follow the rule for English letters and symbols, so it must be handled as a special case.
Control characters	The characters 0-31 in the ASCII table, used to control the file format. For example, a line break is \x0d\x0a (CR LF). These exist only in half-width form, so no conversion is needed.	None
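
As a worked instance of the byte rule in the table, the sketch below converts a single character in each direction. The helper names toFullWidthGbk and toHalfWidthGbk are hypothetical and do not appear in the original article; the sketch assumes GBK bytes and handles only the A3 row, not the space.

```cpp
#include <string>

// Hypothetical helper: half-width ASCII (33-126) to its two-byte GBK
// full-width form. Prepend 0xA3 and set the high bit of the original byte.
std::string toFullWidthGbk(unsigned char c) {
    std::string out;
    if (c >= 33 && c <= 126) {
        out.push_back(static_cast<char>(0xA3));
        out.push_back(static_cast<char>(c | 0x80));
    } else {
        out.push_back(static_cast<char>(c));  // outside 33-126: left unchanged
    }
    return out;
}

// Hypothetical helper: two-byte GBK full-width form back to half-width ASCII.
// Discard the lead byte and clear the high bit of the second byte.
// Returns false when the pair is not in the range A3A1-A3FE.
bool toHalfWidthGbk(unsigned char lead, unsigned char trail, char& out) {
    if (lead != 0xA3 || trail < 0xA1 || trail > 0xFE) {
        return false;
    }
    out = static_cast<char>(trail & 0x7F);
    return true;
}
```

For example, the letter A (0x41) becomes the bytes A3 C1, and the pair A3 A1 maps back to ! (0x21).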

Based on this, the full-width to half-width conversion program is as follows:

/**
 * Houkai
 * Description: full-width and half-width conversion
 * Date: 2013-6-9
 */
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

const char SBC_HIGH = static_cast<char>(0xA3);   // the first byte of a full-width character is A3
const char SBC_SPACE = static_cast<char>(0xA1);  // the full-width space is A1A1

// Full-width to half-width
string SBC2DBC(const string& sbc)
{
    string dbc = "";
    int len = sbc.length();
    for (int i = 0; i < len; ++i)
    {
        // Compare as unsigned so the test does not depend on whether char is signed
        if (static_cast<unsigned char>(sbc[i]) < 0x80 || i + 1 >= len)
        {
            // Already a single-byte character or a control character
            // (a dangling lead byte at the end of the input is also copied unchanged)
            dbc.append(1, sbc[i]);
        }
        else
        {
            if (sbc[i] == SBC_HIGH)  // full-width English letter or symbol, such as ！ (A3A1)
            {
                dbc.append(1, static_cast<char>(sbc[i + 1] & 0x7f));
            }
            else if (sbc[i] == SBC_SPACE && sbc[i + 1] == SBC_SPACE)  // handle the space separately
            {
                dbc.append(1, ' ');
            }
            else  // Chinese characters and Chinese symbols such as ……
            {
                dbc += sbc.substr(i, 2);
            }
            ++i;  // skip the second byte of the pair
        }
    }
    return dbc;
}

The half-width to full-width conversion program is as follows:

// Half-width to full-width
string DBC2SBC(const string& dbc)
{
    string sbc = "";
    int len = dbc.length();
    for (int i = 0; i < len; ++i)
    {
        // Compare as unsigned so the test does not depend on whether char is signed
        if (static_cast<unsigned char>(dbc[i]) >= 0x80)
        {
            // Already a double-byte character: a Chinese character, a Chinese symbol,
            // or a full-width English letter or symbol
            sbc += dbc.substr(i, 2);
            ++i;
        }
        else if (dbc[i] == ' ')  // handle the space separately
        {
            sbc += "\xA1\xA1";  // GBK full-width space
        }
        else
        {
            if (dbc[i] >= 33 && dbc[i] <= 126)  // half-width English letter or symbol
            {
                sbc.append(1, SBC_HIGH);
                sbc.append(1, static_cast<char>(dbc[i] | 0x80));
            }
            else
            {
                sbc.append(1, dbc[i]);  // control characters
            }
        }
    }
    return sbc;
}

A sample program that converts a text file is as follows:

// A simple example of processing a txt file in place
void processFile(const char* filename)
{
    ifstream infile;
    string strLine = "";
    string strResult = "";
    infile.open(filename);
    if (infile)
    {
        // Loop on getline itself; testing eof() first would append a spurious empty line
        while (getline(infile, strLine))
        {
            strResult += strLine + "\n";
        }
    }
    infile.close();
    strResult = SBC2DBC(strResult);  // conversion
    ofstream outfile;
    outfile.open(filename);  // overwrites the input file
    outfile.write(strResult.c_str(), strResult.length());
    outfile.flush();
    outfile.close();
}
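
The space-splitting problem mentioned at the start of this section can also be handled directly on mixed text. The sketch below is only an illustration under stated assumptions: the input is GBK-encoded, every byte of 0x80 or above starts a two-byte character, and the name splitOnSpacesGbk is hypothetical. It walks the text one character at a time, because a plain byte search for A1A1 can match across a character whose second byte is A1 followed by one whose first byte is A1.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split GBK bytes on half-width (0x20) and full-width
// (0xA1 0xA1) spaces, stepping over double-byte characters as whole units.
std::vector<std::string> splitOnSpacesGbk(const std::string& text) {
    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(text[i]);
        if (b < 0x80) {                        // single-byte character
            if (b == 0x20) {
                if (!current.empty()) parts.push_back(current);
                current.clear();
            } else {
                current.push_back(text[i]);
            }
        } else if (i + 1 < text.size()) {      // first byte of a double-byte character
            unsigned char t = static_cast<unsigned char>(text[i + 1]);
            if (b == 0xA1 && t == 0xA1) {      // full-width space
                if (!current.empty()) parts.push_back(current);
                current.clear();
            } else {
                current.append(text, i, 2);
            }
            ++i;                               // consume the second byte
        } else {
            current.push_back(text[i]);        // truncated final byte: kept as-is
        }
    }
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```

Empty fields between consecutive spaces are dropped here; that is a design choice of the sketch, not something the article specifies.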

3. Conversion under Unicode encoding

In practice you may also encounter Unicode-encoded text. The conversion between half-width and full-width follows the same approach as above: only the "English letters and English symbols" and "space" categories need to be processed. With Unicode (as 16-bit units), every character involved is represented by two bytes: the half-width space is \x0020 and the full-width space is \x3000 (for Unicode codes see http://www.cnblogs.com/houkai/archive/2013/06/04/3116955.html). This avoids adding or discarding bytes and makes processing easier. Unicode code tables are also available for download. Conversion method:
A. The full-width space is 12288 and the half-width space is 32.
B. For the other characters, the half-width codes (33-126) and the full-width codes (65281-65374) differ by a constant 65248.
The implementation is fairly simple. Based on the changeTxtEncoding function from the UTF-8 to ANSI conversion, the example is rewritten as follows:

// The original function converts the UTF-8 string szU8 to Unicode, then Unicode to ANSI.
// Here a "full-width to half-width" section is added after the UTF-8 to Unicode step.
// Requires the Windows API. The caller must release the result with delete[].
#include <windows.h>

char* changeTxtEncoding(char* szU8)
{
    int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, NULL, 0);
    wchar_t* wszString = new wchar_t[wcsLen];
    ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, wszString, wcsLen);
    // Full-width to half-width; wszString is Unicode-encoded
    for (int i = 0; i < wcsLen; i++)
    {
        if (wszString[i] == 12288)  // space
        {
            wszString[i] = 32;
        }
        if ((wszString[i] >= 65281) && (wszString[i] <= 65374))  // other characters
        {
            wszString[i] -= 65248;
        }
    }
    int ansiLen = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
    char* szAnsi = new char[ansiLen];
    ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szAnsi, ansiLen, NULL, NULL);
    delete[] wszString;
    return szAnsi;
}
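
Rules A and B do not depend on the Windows API. A minimal portable sketch, assuming the text is already available as UTF-32 code points in a std::u32string (C++11), is shown below. The function names toHalfWidth and toFullWidth are hypothetical and are not part of the original article.

```cpp
#include <string>

// Hypothetical helper: full-width to half-width on Unicode code points.
std::u32string toHalfWidth(std::u32string s) {
    for (char32_t& c : s) {
        if (c == 12288) {                        // rule A: full-width space
            c = 32;
        } else if (c >= 65281 && c <= 65374) {   // rule B: constant offset 65248
            c -= 65248;
        }
    }
    return s;
}

// Hypothetical helper: half-width to full-width, the inverse mapping.
std::u32string toFullWidth(std::u32string s) {
    for (char32_t& c : s) {
        if (c == 32) {                           // rule A in reverse
            c = 12288;
        } else if (c >= 33 && c <= 126) {        // rule B in reverse
            c += 65248;
        }
    }
    return s;
}
```

Characters outside both ranges, such as Chinese characters, pass through unchanged in either direction.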


