Conversion between full-width and half-width characters

1. The difference between full-width and half-width

The nature of Chinese characters confronts us with two basic and very important concepts: full-width and half-width. Put simply, when you type with an English input method, an English character (such as a) occupies one character cell on screen, which is called half-width. A Chinese character occupies the space of two English characters, which is called full-width.
With an English input method, letters, symbols and digits always occupy a single English character cell; that is, they are half-width. A Chinese input method, however, offers a choice between full-width and half-width. The choice has no effect on Chinese characters, which always occupy two English character cells, but it matters for symbols, digits and English letters typed in that mode, as shown below:
China
Ｃｈｉｎａ
The first line was typed in half-width mode and the second in full-width mode. In full-width mode, even letters, symbols and digits are treated like Chinese characters. Each one occupies two English character cells, which looks noticeably awkward.

2. Converting between full-width and half-width

In ANSI multi-byte text on a Chinese system, Chinese content (Chinese characters and Chinese symbols) is GBK-encoded, and English content (English letters and English symbols) is ASCII-encoded. In general, ANSI-encoded text is convenient for programs to process. Before processing strings in such text, it is advisable to normalize them to all full-width or all half-width. Consider spaces. The full-width space "　" is A1A1 in GBK, and the half-width space " " is \x20 in ASCII. If a text mixes the two, then finding a space or splitting a paragraph on spaces means dealing with two kinds of whitespace, which causes needless trouble. The content of a text can be divided into the following four categories:

Chinese characters and Chinese symbols
  Description: Chinese characters such as 中国 ("China") and 好 ("good"), and Chinese punctuation such as the ellipsis "……" and the Chinese dash. These exist only in full-width form, so there is nothing to convert.
  Conversion: none.

English letters and English symbols
  Description: the characters at positions 33-126 of the ASCII table, such as ! * + 0 1 2 a b c { }. Each has both a half-width and a full-width form, for example "China" and "Ｃｈｉｎａ".
  Conversion: half-width to full-width: prepend the byte A3 and set the highest bit of the original byte. Full-width to half-width: discard the first byte and clear the highest bit of the second byte.

Space
  Description: the ASCII space is \x20; the GBK full-width space is A1A1.
  Conversion: unlike English letters and symbols, spaces do not follow the A3 rule and must be handled separately.

Control characters
  Description: positions 0-31 of the ASCII table, used to control file layout. For example, a line break is \x0d\x0a (CR LF).
  Conversion: none; these exist only in half-width form.
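The four categories map directly onto GBK byte patterns, which also makes the space problem described earlier straightforward to handle. The sketch below assumes GBK-encoded std::string input. The names CharCategory, GbkChar, classifyGbk and splitGbkOnSpaces are hypothetical. classifyGbk treats A1A1 as the full-width space and A3A1 to A3FE as the full-width forms of ASCII 33-126. Every other double-byte character counts as Chinese-only. Counting 127 (DEL) as a control character is an assumption, since the table lists only 0-31.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names for the four categories in the table
enum class CharCategory { ChineseOnly, LetterOrSymbol, Space, Control };

struct GbkChar
{
    CharCategory category;
    std::size_t width;  // 1 for a single byte, 2 for a double-byte GBK character
};

// Classify the GBK character that starts at text[i]; requires i < text.size().
GbkChar classifyGbk(const std::string& text, std::size_t i)
{
    const unsigned char c = static_cast<unsigned char>(text[i]);
    if (c < 0x80)  // single-byte ASCII
    {
        if (c == 0x20) return {CharCategory::Space, 1};
        if (c < 0x20 || c == 0x7F) return {CharCategory::Control, 1};  // 0-31; DEL is an assumption
        return {CharCategory::LetterOrSymbol, 1};                     // 33-126
    }
    if (i + 1 >= text.size())  // lone lead byte at the end: treat it as an opaque unit
        return {CharCategory::ChineseOnly, 1};
    const unsigned char next = static_cast<unsigned char>(text[i + 1]);
    if (c == 0xA1 && next == 0xA1)
        return {CharCategory::Space, 2};           // full-width space
    if (c == 0xA3 && next >= 0xA1 && next <= 0xFE)
        return {CharCategory::LetterOrSymbol, 2};  // full-width forms of ASCII 33-126
    return {CharCategory::ChineseOnly, 2};         // Chinese characters and Chinese symbols
}

// Split on both kinds of space, stepping over one whole character at a time.
std::vector<std::string> splitGbkOnSpaces(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (std::size_t i = 0; i < text.size();)
    {
        const GbkChar ch = classifyGbk(text, i);
        if (ch.category == CharCategory::Space)
        {
            if (!current.empty())
            {
                tokens.push_back(current);
                current.clear();
            }
        }
        else
        {
            current.append(text, i, ch.width);  // copy the whole character
        }
        i += ch.width;
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

The splitter steps through whole characters instead of searching for the byte pair A1A1. This matters because a GBK trail byte can equal a lead byte. In "啊、" (B0A1 A1A2), the bytes A1 A1 occur across the character boundary without being a space.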

The full-width to half-width conversion program is therefore as follows:

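/*
 * Houkai
 * Description: Full-width and half-width conversion
 * Date: 2013-6-9
 */
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

// Bytes are compared as unsigned char so the code works whether or not char is signed.
const unsigned char sbc_high = 0xA3;   // the first byte of a full-width English letter or symbol is A3
const unsigned char sbc_space = 0xA1;  // the full-width space is A1A1

// Full-width to half-width
string SBC2DBC(const string& sbc)
{
    string dbc;
    const size_t len = sbc.length();
    for (size_t i = 0; i < len; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(sbc[i]);
        if (c < 0x80)  // already a single-byte character, or a control character
        {
            dbc.append(1, sbc[i]);
        }
        else if (i + 1 < len)  // first byte of a double-byte GBK character
        {
            const unsigned char next = static_cast<unsigned char>(sbc[i + 1]);
            if (c == sbc_high && next >= 0xA1 && next <= 0xFE)  // full-width English letter or symbol, e.g. "！" (A3A1)
            {
                dbc.append(1, static_cast<char>(next & 0x7F));
            }
            else if (c == sbc_space && next == sbc_space)  // handle the full-width space separately
            {
                dbc.append(1, ' ');
            }
            else  // Chinese characters and Chinese symbols such as "……": keep both bytes
            {
                dbc += sbc.substr(i, 2);
            }
            ++i;  // skip the second byte
        }
        else  // a lone lead byte at the end of the string: copy it unchanged
        {
            dbc.append(1, sbc[i]);
        }
    }
    return dbc;
}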

The half-width to full-width conversion program is as follows:

// Half-width to full-width
string DBC2SBC(const string& dbc)
{
    string sbc;
    const size_t len = dbc.length();
    for (size_t i = 0; i < len; ++i)
    {
        const unsigned char c = static_cast<unsigned char>(dbc[i]);
        if (c >= 0x80)  // already a double-byte character (Chinese character, Chinese symbol or full-width character): keep both bytes
        {
            sbc += dbc.substr(i, 2);
            ++i;
        }
        else if (c == ' ')  // handle the space separately: write the GBK full-width space A1A1
        {
            sbc.append(2, static_cast<char>(sbc_space));
        }
        else if (c >= 33 && c <= 126)  // half-width English letter or symbol
        {
            sbc.append(1, static_cast<char>(sbc_high));
            sbc.append(1, static_cast<char>(c | 0x80));
        }
        else  // control characters stay half-width
        {
            sbc.append(1, dbc[i]);
        }
    }
    return sbc;
}
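Both functions hinge on one byte rule from the table. A printable ASCII character c (33-126) corresponds to the GBK pair A3, c | 0x80. The sketch below isolates that rule in two small helpers, asciiToGbkFullWidth and gbkFullWidthToAscii (hypothetical names, not from the original program). This makes it easy to check exhaustively that the two directions invert each other for all 94 characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: half-width character (ASCII 33-126 only) to its GBK full-width pair.
// Lead byte A3; trail byte = original byte with the highest bit set.
std::string asciiToGbkFullWidth(char c)
{
    std::string out;
    out += static_cast<char>(0xA3);
    out += static_cast<char>(static_cast<unsigned char>(c) | 0x80);
    return out;
}

// Hypothetical helper: GBK full-width pair (A3A1-A3FE) back to ASCII.
// Drops the lead byte and clears the trail byte's highest bit; returns '\0' for any other pair.
char gbkFullWidthToAscii(unsigned char lead, unsigned char trail)
{
    if (lead != 0xA3 || trail < 0xA1 || trail > 0xFE)
    {
        return '\0';
    }
    return static_cast<char>(trail & 0x7F);
}
```

Clearing the high bit works because the trail bytes A1 to FE are exactly 0x21 to 0x7E with 0x80 added.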

A sample program that converts a text file is as follows:

// A simple example of processing a .txt file
void ProcessFile(const char* filename)
{
    ifstream infile(filename);
    if (!infile)
    {
        return;  // could not open the file: do not overwrite it with an empty result
    }
    string strLine;
    string strResult;
    while (getline(infile, strLine))  // testing getline() itself avoids appending an extra "\n" at end of file
    {
        strResult += strLine + "\n";
    }
    infile.close();

    strResult = SBC2DBC(strResult);  // conversion

    ofstream outfile(filename);
    outfile.write(strResult.c_str(), strResult.length());
    outfile.close();
}
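ProcessFile reads line by line in text mode. As a result, its output always ends with a newline, and line endings pass through the platform's text-mode translation. A minimal alternative sketch reads the whole file as raw bytes instead, so line endings and a missing final newline survive unchanged. It also refuses to write when the file cannot be read. The helper convertFileInPlace and its function-pointer parameter are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: read the whole file as raw bytes, apply convert(), write the result back.
// Returns false (and leaves the file untouched) if the file cannot be read.
bool convertFileInPlace(const char* filename, std::string (*convert)(const std::string&))
{
    std::ifstream infile(filename, std::ios::binary);
    if (!infile)
    {
        return false;
    }
    std::ostringstream buffer;
    buffer << infile.rdbuf();  // every byte, including \r\n pairs, is copied as-is
    infile.close();

    const std::string result = convert(buffer.str());

    std::ofstream outfile(filename, std::ios::binary | std::ios::trunc);
    if (!outfile)
    {
        return false;
    }
    outfile.write(result.data(), static_cast<std::streamsize>(result.size()));
    return static_cast<bool>(outfile);
}
```

With the functions defined earlier, convertFileInPlace("a.txt", SBC2DBC) would perform the same full-width to half-width conversion as ProcessFile.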

3. Conversion under Unicode encoding

In practice you may also encounter Unicode-encoded text. Here the conversion follows the same approach as above: only the "English letters and English symbols" and "space" categories need processing. In UTF-16, which Windows uses for wchar_t strings, every character involved here occupies a single 16-bit code unit. The half-width space is U+0020 and the full-width space is U+3000 (for the Unicode code table, see http://www.cnblogs.com/houkai/archive/2013/06/04/3116955.html). As a result, no bytes have to be added or discarded, which makes processing simpler. Unicode code charts are available for download. The conversion rules are:
A. The full-width space is 12288 (U+3000), and the half-width space is 32 (U+0020).
B. For the other characters, the half-width forms (33-126) and the full-width forms (65281-65374, i.e. U+FF01 to U+FF5E) differ by exactly 65248 (0xFEE0).
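Rules A and B do not depend on the Windows API. The portable sketch below assumes the text is already held as UTF-16 in a std::u16string; the function names toHalfWidth and toFullWidth are hypothetical. In both directions, the conversion reduces to a range check and an offset.

```cpp
#include <cassert>
#include <string>

// Full-width to half-width on UTF-16 text (rules A and B).
std::u16string toHalfWidth(std::u16string s)
{
    for (char16_t& ch : s)
    {
        if (ch == 12288)                      // rule A: U+3000 full-width space
            ch = 32;
        else if (ch >= 65281 && ch <= 65374)  // rule B: U+FF01..U+FF5E
            ch = static_cast<char16_t>(ch - 65248);
    }
    return s;
}

// Half-width to full-width: the same rules applied in reverse.
std::u16string toFullWidth(std::u16string s)
{
    for (char16_t& ch : s)
    {
        if (ch == 32)                         // rule A: space to U+3000
            ch = 12288;
        else if (ch >= 33 && ch <= 126)       // rule B: ASCII 33-126 to U+FF01..U+FF5E
            ch = static_cast<char16_t>(ch + 65248);
    }
    return s;
}
```

Surrogate code units (D800-DFFF) fall outside both ranges. Characters outside the Basic Multilingual Plane therefore pass through untouched.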
The implementation is fairly simple. The following example is adapted from the ChangeTxtEncoding function in the earlier "UTF8 to ANSI" article:

// The original function converts the UTF-8 encoded string szU8 to Unicode (UTF-16), then Unicode to ANSI.
// Here, a full-width to half-width step is added after the UTF-8 to Unicode conversion.
#include <windows.h>

char* ChangeTxtEncoding(const char* szU8)
{
    int wcsLen = ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, NULL, 0);
    wchar_t* wszString = new wchar_t[wcsLen];
    ::MultiByteToWideChar(CP_UTF8, 0, szU8, -1, wszString, wcsLen);

    // Full-width to half-width; wszString holds UTF-16 text
    for (int i = 0; i < wcsLen; i++)
    {
        if (wszString[i] == 12288)  // space
        {
            wszString[i] = 32;
        }
        else if (wszString[i] >= 65281 && wszString[i] <= 65374)  // other characters
        {
            wszString[i] -= 65248;
        }
    }

    int ansiLen = ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, NULL, 0, NULL, NULL);
    char* szAnsi = new char[ansiLen];
    ::WideCharToMultiByte(CP_ACP, 0, wszString, -1, szAnsi, ansiLen, NULL, NULL);
    delete[] wszString;
    return szAnsi;  // the caller is responsible for delete[]
}
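The function above is Windows-specific and routes the text through UTF-16 and then the ANSI code page. If the text should stay in UTF-8, rules A and B can be applied directly to the UTF-8 bytes instead. In UTF-8, U+3000 is E3 80 80, and U+FF01 to U+FF5E are the three-byte sequences EF BC 81 through EF BD 9E. The sketch below assumes valid UTF-8 input; the name utf8ToHalfWidth is hypothetical. UTF-8 is self-synchronizing, so matching these patterns cannot straddle a character boundary, unlike a raw byte search in GBK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: full-width to half-width directly on UTF-8 bytes (no Windows API).
std::string utf8ToHalfWidth(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        if (i + 2 < in.size())
        {
            const unsigned char b1 = static_cast<unsigned char>(in[i + 1]);
            const unsigned char b2 = static_cast<unsigned char>(in[i + 2]);
            if (b0 == 0xE3 && b1 == 0x80 && b2 == 0x80)  // rule A: U+3000 full-width space
            {
                out += ' ';
                i += 2;
                continue;
            }
            if (b0 == 0xEF && (b1 == 0xBC || b1 == 0xBD) && b2 >= 0x80 && b2 <= 0xBF)
            {
                // Decode the three-byte sequence; the result lies in U+FF00..U+FF7F
                const unsigned cp = 0xF000u | ((b1 & 0x3Fu) << 6) | (b2 & 0x3Fu);
                if (cp >= 65281 && cp <= 65374)  // rule B: U+FF01..U+FF5E
                {
                    out += static_cast<char>(cp - 65248);
                    i += 2;
                    continue;
                }
            }
        }
        out += static_cast<char>(b0);  // every other byte is copied unchanged
    }
    return out;
}
```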
