1. The difference between full-width and half-width
Because of the nature of Chinese characters, we run into two basic and very important concepts: full-width and half-width. Put simply, when you use an English input method, an English character (such as a) occupies one position on the screen; this is called half-width. A Chinese character occupies the space of two English characters; this is called full-width.
With an English input method, every letter, symbol, or digit you type occupies exactly one English character position, that is, it is half-width. A Chinese input method, however, lets you choose between full-width and half-width. The choice has no effect on Chinese characters, which always occupy two English character positions, but it matters when you type symbols, digits, and English letters, as shown below:
China
Ｃｈｉｎａ
The first line was typed in half-width mode and the second in full-width mode. In full-width mode, even letters, symbols, and digits are treated like Chinese characters: each one occupies two English character positions, which looks noticeably awkward.
2. Full-width and half-width conversion
In ANSI multi-byte Chinese text, Chinese content (Chinese characters and Chinese symbols) is GBK-encoded and English content (English letters and English symbols) is ASCII-encoded. ANSI-encoded text is generally convenient for programs to process. Before processing the strings in a text, it is advisable to convert them uniformly to full-width or to half-width. Take the full-width space "　" (GBK encoding A1A1) and the half-width space " " (ASCII encoding \x20): if a text mixes the two, finding the position of a space or splitting a paragraph on spaces becomes needlessly complicated, because there are two kinds of whitespace to deal with. The content of a text falls into the following four categories:
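To make the mixed-space problem concrete, here is a minimal sketch. The helper names NormalizeGbkSpaces and SplitOnSpace are hypothetical and not part of the original program. It assumes well-formed GBK text in which a byte of 0x80 or above starts a two-byte character, so it walks the text one character at a time; a plain byte search for A1 A1 could otherwise match across two adjacent double-byte characters such as B0 A1 followed by A1 B0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Replace each GBK full-width space (bytes A1 A1) with an ASCII space.
// Assumes well-formed GBK: a byte >= 0x80 starts a two-byte character, so the
// loop advances one whole character at a time and only matches A1 A1 on a
// character boundary.
std::string NormalizeGbkSpaces(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80 || i + 1 >= s.size()) {  // ASCII byte, or a stray final byte
            out += s[i];
            continue;
        }
        if (c == 0xA1 && static_cast<unsigned char>(s[i + 1]) == 0xA1)
            out += ' ';               // full-width space -> half-width space
        else
            out.append(s, i, 2);      // any other double-byte character is kept
        ++i;                          // skip the second byte
    }
    return out;
}

// Split on ASCII spaces only, dropping empty fields.
std::vector<std::string> SplitOnSpace(const std::string& s) {
    std::vector<std::string> parts;
    std::string cur;
    for (char ch : s) {
        if (ch == ' ') {
            if (!cur.empty()) parts.push_back(cur);
            cur.clear();
        } else {
            cur += ch;
        }
    }
    if (!cur.empty()) parts.push_back(cur);
    return parts;
}
```

For the GBK bytes "ab", A1 A1, "cd ef", SplitOnSpace alone returns two fields (the full-width space stays glued inside the first one), while SplitOnSpace(NormalizeGbkSpaces(...)) returns the three fields "ab", "cd", and "ef".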
Category | Description | Conversion method
--- | --- | ---
Chinese characters and Chinese symbols | Chinese characters such as "中国" (China) and "好" (good), and Chinese symbols such as "……" and "——". These exist only in full-width form, so there is nothing to convert. | None
English letters and English symbols | The characters at codes 33-126 of the ASCII table, such as ! * + 0 1 2 a b c { }. They have both a full-width and a half-width form, for example China and Ｃｈｉｎａ. | Half-width to full-width: prepend the byte A3 and set the highest bit of the original byte to 1. Full-width to half-width: discard the first byte and set the highest bit of the second byte to 0.
Space | The ASCII space is \x20; the GBK full-width space is A1A1. | Unlike English letters and English symbols, the space does not follow the A3 rule and must be handled separately.
Control characters | The characters at codes 0-31 of the ASCII table, used to control the file format; for example, a carriage return and line feed correspond to \x0d\x0a (CR LF). These exist only in half-width form. | None; no conversion is needed.
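The byte rules for English letters and symbols can be checked on single characters. This is a minimal sketch; the helper names HalfToFullGbk and FullToHalfGbk are hypothetical, and the input is assumed to be an ASCII character in 33-126 (for the first) or the second byte of a character whose first byte is A3 (for the second).

```cpp
#include <cassert>
#include <string>

// Half-width -> full-width for one ASCII character in 33..126:
// prepend the byte A3, then set the highest bit of the original byte.
std::string HalfToFullGbk(char c) {
    std::string out;
    out += static_cast<char>(0xA3);
    out += static_cast<char>(static_cast<unsigned char>(c) | 0x80);
    return out;
}

// Full-width -> half-width for a character whose first byte is A3:
// the first byte is discarded by the caller; clear the highest bit of the second.
char FullToHalfGbk(char second) {
    return static_cast<char>(static_cast<unsigned char>(second) & 0x7F);
}
```

For example, 'A' (0x41) becomes A3 C1, the GBK code of the full-width Ａ, and clearing the highest bit of C1 gives 0x41 back.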
Accordingly, the full-width to half-width program is as follows:
/*
 * houkai
 * Description: full-width and half-width conversion
 * Date: 2013-6-9
 */
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

// Bytes are compared as unsigned char so the code also works where char is unsigned.
const unsigned char SBC_HIGH = 0xA3;   // first byte of a full-width English letter or symbol
const unsigned char SBC_SPACE = 0xA1;  // the full-width space is A1A1

// Full-width to half-width
string SBC2DBC(const string& sbc)
{
    string dbc;
    size_t len = sbc.length();
    for (size_t i = 0; i < len; ++i)
    {
        unsigned char c = static_cast<unsigned char>(sbc[i]);
        if (c < 0x80 || i + 1 >= len)  // already a single-byte character or a control character
        {
            dbc.append(1, sbc[i]);
        }
        else
        {
            unsigned char next = static_cast<unsigned char>(sbc[i + 1]);
            if (c == SBC_HIGH)  // full-width English letter or symbol, such as ！ (A3A1)
            {
                dbc.append(1, static_cast<char>(next & 0x7F));
            }
            else if (c == SBC_SPACE && next == SBC_SPACE)  // the space is handled separately
            {
                dbc.append(1, ' ');
            }
            else  // Chinese characters and Chinese symbols such as …… are kept as-is
            {
                dbc += sbc.substr(i, 2);
            }
            ++i;  // skip the second byte
        }
    }
    return dbc;
}
The half-width to full-width program is as follows:
// Half-width to full-width
string DBC2SBC(const string& dbc)
{
    string sbc;
    size_t len = dbc.length();
    for (size_t i = 0; i < len; ++i)
    {
        unsigned char c = static_cast<unsigned char>(dbc[i]);
        if (c >= 0x80)  // already a double-byte character (Chinese character, Chinese or full-width symbol)
        {
            sbc += dbc.substr(i, 2);
            ++i;
        }
        else if (c == ' ')  // the space is handled separately: emit the full-width space A1A1
        {
            sbc.append(2, static_cast<char>(SBC_SPACE));
        }
        else if (c >= 33 && c <= 126)  // half-width English letter or symbol
        {
            sbc.append(1, static_cast<char>(SBC_HIGH));
            sbc.append(1, static_cast<char>(c | 0x80));
        }
        else  // control characters are copied unchanged
        {
            sbc.append(1, dbc[i]);
        }
    }
    return sbc;
}
A sample program that converts a text file is as follows:
// A simple example of converting a .txt file in place
void ProcessFile(const char* filename)
{
    ifstream infile(filename);
    if (!infile)  // do not overwrite the file if it could not be read
    {
        return;
    }
    string strLine;
    string strResult;
    while (getline(infile, strLine))  // unlike looping on eof(), this adds no extra empty line
    {
        strResult += strLine + "\n";
    }
    infile.close();

    strResult = SBC2DBC(strResult);  // conversion

    ofstream outfile(filename);
    outfile.write(strResult.c_str(), strResult.length());
    outfile.close();
}
3. Conversion under Unicode encoding
In practice, you may also encounter Unicode-encoded text. The conversion follows the same approach as above: only "English letters and English symbols" and "space" need to be processed. In Unicode (UTF-16), however, every character involved here is represented by one two-byte code unit; for example, the half-width space is \x0020 and the full-width space is \x3000 (for the Unicode code table, see http://www.cnblogs.com/houkai/archive/2013/06/04/3116955.html). This removes the need to add or discard bytes and makes processing simpler. Unicode code tables can also be downloaded. The conversion rules are:
A. The full-width space is 12288 (U+3000) and the half-width space is 32 (U+0020).
B. For the other characters, the half-width range (33-126) and the full-width range (65281-65374) differ by exactly 65248: full-width = half-width + 65248.
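Rules A and B map directly onto UTF-16 code units. Below is a minimal, portable sketch that uses std::u16string instead of the Windows wchar_t API; the function names FullToHalf and HalfToFull are hypothetical.

```cpp
#include <cassert>
#include <string>

const char16_t kFullSpace = 0x3000;  // 12288, rule A
const char16_t kHalfSpace = 0x0020;  // 32, rule A
const char16_t kOffset = 0xFEE0;     // 65248, rule B

// Full-width -> half-width, one UTF-16 code unit at a time.
std::u16string FullToHalf(std::u16string s) {
    for (char16_t& c : s) {
        if (c == kFullSpace)
            c = kHalfSpace;
        else if (c >= 0xFF01 && c <= 0xFF5E)  // 65281..65374
            c = static_cast<char16_t>(c - kOffset);
    }
    return s;
}

// Half-width -> full-width; control characters and everything else pass through.
std::u16string HalfToFull(std::u16string s) {
    for (char16_t& c : s) {
        if (c == kHalfSpace)
            c = kFullSpace;
        else if (c >= 0x21 && c <= 0x7E)      // 33..126
            c = static_cast<char16_t>(c + kOffset);
    }
    return s;
}
```

Surrogate code units (0xD800-0xDFFF) never fall into either range, so characters outside the Basic Multilingual Plane pass through unchanged.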
The implementation is fairly simple. The following example is adapted from the changeTxtEncoding function in "UTF8 to ANSI":
// The original function converts the UTF-8 string szU8 to Unicode (UTF-16) and then to ANSI.
// Here, a "full-width to half-width" step is added after the UTF-8 to Unicode conversion.
// Windows only. The caller must free the returned buffer with delete[].
#include <windows.h>

char* changeTxtEncoding(const char* szU8)
{
    int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, NULL, 0);
    wchar_t* wszString = new wchar_t[wcsLen];
    ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, wszString, wcsLen);

    // Full-width to half-width; wszString is UTF-16 encoded
    for (int i = 0; i < wcsLen; i++)
    {
        if (wszString[i] == 12288)  // full-width space
        {
            wszString[i] = 32;
        }
        else if (wszString[i] >= 65281 && wszString[i] <= 65374)  // other full-width characters
        {
            wszString[i] -= 65248;
        }
    }

    int ansiLen = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
    char* szAnsi = new char[ansiLen];
    ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szAnsi, ansiLen, NULL, NULL);
    delete[] wszString;
    return szAnsi;
}
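changeTxtEncoding depends on the Windows API. On other platforms, a similar full-width to half-width pass can be made directly on UTF-8 bytes. The sketch below is a swapped-in technique, not part of the original article, and the name Utf8FullToHalf is hypothetical. It relies on two properties of UTF-8: U+3000 and U+FF01-U+FF5E are all three-byte sequences (first byte E3 or EF), and a lead byte never occurs inside another character, so no GBK-style boundary tracking is needed. Unlike changeTxtEncoding, it keeps the output in UTF-8 rather than converting it to ANSI.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch: full-width -> half-width on a UTF-8 string; the output stays UTF-8.
std::string Utf8FullToHalf(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        // U+3000 starts with E3; U+FF01..U+FF5E start with EF (3-byte sequences).
        if ((b0 == 0xE3 || b0 == 0xEF) && i + 2 < in.size()) {
            unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            unsigned char b2 = static_cast<unsigned char>(in[i + 2]);
            if ((b1 & 0xC0) == 0x80 && (b2 & 0xC0) == 0x80) {  // both are continuation bytes
                char32_t cp = (char32_t(b0 & 0x0F) << 12) |
                              (char32_t(b1 & 0x3F) << 6) | char32_t(b2 & 0x3F);
                if (cp == 0x3000) {                         // full-width space (rule A)
                    out += ' ';
                    i += 2;
                    continue;
                }
                if (cp >= 0xFF01 && cp <= 0xFF5E) {         // letters and symbols (rule B)
                    out += static_cast<char>(cp - 0xFEE0);  // subtract 65248
                    i += 2;
                    continue;
                }
            }
        }
        out += in[i];  // everything else is copied byte by byte
    }
    return out;
}
```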