VC: converting text to its internal (in-machine) code. Input Chinese characters and digits, output a hexadecimal string (digits -> ASCII code, Chinese characters -> GB internal code)

Source: Internet
Author: User

This program works; it converts text into its internal (in-machine) code. Internal code = GB code + 8080h. The program relies on the fact that Chinese characters are already held in memory as internal code, so the bytes can be read out directly, which is actually quite simple. The input is a string mixing Chinese characters and digits; after conversion, the output is the corresponding hexadecimal string (digits -> ASCII code, Chinese characters -> GB internal code).

    CString temp;
    GetDlgItemText(IDC_EDIT1, m_hanzi);       // save the input text in m_hanzi
    // allocate an unsigned char array and point b at it
    unsigned char *b = new unsigned char[m_hanzi.GetLength() + 1];
    // copy the bytes of m_hanzi into the buffer b
    memcpy(b, m_hanzi, m_hanzi.GetLength());
    // convert byte by byte (a Chinese character takes two bytes)
    for (int i = 0; i < m_hanzi.GetLength(); i++)
    {
        m_shuzi.Format("%x", b[i]);           // format the in-memory byte as hexadecimal
        temp = temp + m_shuzi;                // append to the result string
    }
    m_shu.SetWindowText(temp);                // display in the text box
    delete[] b;                               // release the buffer allocated with new[]
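The snippet above depends on MFC (CString, dialog controls) and on a multi-byte build. As a minimal sketch of the same byte-to-hex step without MFC, the helper below uses only the standard library. The name toHex is hypothetical and not part of the original program, and it pads every byte to two hex digits, which the original "%x" format does not do.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from the original program): returns the bytes of
// `text` as a lowercase hexadecimal string, two digits per byte.
std::string toHex(const std::string& text)
{
    std::string out;
    char buf[3];  // two hex digits plus the terminating NUL
    for (unsigned char byte : text) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(byte));
        out += buf;
    }
    return out;
}
```

Given the digits "12" followed by the GB2312 internal-code bytes D6 D0, the helper returns "3132d6d0": the digits appear as their ASCII codes and the Chinese character as its two internal-code bytes.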

Chinese character encoding in VC

Reprinted from: http://m.blog.csdn.net/blog/wwxl1986622/7227266

English characters in VC/C++ are still ASCII encoded. Presumably, when programmers in other countries enter their native characters in a VC/C++ program, VC/C++ handles those characters with that country's character encoding.

I personally suspect that the VC installer ships code tables for different countries, which would certainly take up a lot of space.

II. Background knowledge needed to solve the problem

Two areas of knowledge are needed. The first is character encoding, especially the encoding of Chinese characters, together with language and tool support. The second is how memory is allocated under the Multi-Byte Character Set and the Wide Character Set in VC/C++.

III. Chinese character encoding methods and their handling in VC/C++

1. Introduction to Chinese character encoding

For English text, the characters of the 7-bit ASCII character set are sufficient, and entering and displaying English characters on a computer is very simple. The input, storage, internal processing, and output of English characters can therefore all use the same encoding (such as ASCII).

Chinese, by contrast, is a logographic script with a very large number of characters (modern Chinese has six to seven thousand characters in common use, and more than 50,000 in total). The glyphs are complex, every character has three elements (sound, form, and meaning), and there are many homophones and variant characters. All of this makes computer processing of Chinese characters difficult. To process Chinese characters on a computer, the following problems must be solved. First, input: how to enter structurally complex, square-shaped Chinese characters into the computer, which is the key to Chinese character processing. Second, how Chinese characters are represented and stored in the computer, and how they stay compatible with Latin text. Finally, how to output the processed Chinese characters from the computer.

Chinese characters therefore have to be encoded. Corresponding to the three main stages of Chinese character processing (input, internal processing, and output), the encodings involved are the input code, the interchange code, the internal code, and the glyph code. In a Chinese character information processing system, the following code conversions take place when characters are processed: input code → interchange code → internal code → glyph code.

(1) Input code: used to enter Chinese characters with an existing standard Western keyboard. The input code is also called the external code. There are four main categories:

a) Numeric code: every Chinese character is numbered with a fixed-length string of digits, and that number is used as its input code. The region-position code and the telegraph code are both numeric codes.

b) Pinyin code: an input code based on the pronunciation of Chinese characters.

c) Glyph code: an input code based on the glyph structure of Chinese characters, for example the Wubi code (Wang code).

d) Phonetic-shape code: an input code that takes both the pronunciation and the glyph of a Chinese character into account.

(2) Interchange code: used to convert between the external code and the internal code of Chinese characters. The national standard interchange code is GB2312-80.

(3) Internal code: the basic representation of Chinese characters inside the computer; it is the encoding the computer uses to identify, store, process, and transmit them. The internal code is also a double-byte encoding: setting the highest bit of each of the two bytes of the GB code to "1" yields the internal code of the Chinese character.

(4) Glyph code: an encoding of the glyph information of a Chinese character (its structure, shape, strokes, and so on), used for computer output of Chinese characters (display and printing).
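The rule in item (3), setting the highest bit of both bytes, can be sketched in a few lines. The function names below are hypothetical and chosen only for illustration. Because both bytes of a GB2312 interchange code lie in the range 0x21 to 0x7E, setting the high bits is the same as adding 8080h.

```cpp
#include <cstdint>

// Hypothetical helper: GB2312 interchange code -> internal code.
// Sets the highest bit of each byte (equivalent to adding 0x8080 here).
constexpr std::uint16_t gbToInternal(std::uint16_t gbCode)
{
    return static_cast<std::uint16_t>(gbCode | 0x8080);
}

// Hypothetical helper: internal code -> GB2312 interchange code.
// Clears the highest bit of each byte.
constexpr std::uint16_t internalToGb(std::uint16_t internalCode)
{
    return static_cast<std::uint16_t>(internalCode & 0x7F7F);
}
```

For example, the GB code 5650h maps to the internal code D6D0h (bytes 214 208), and the internal code B9FAh (bytes 185 250) maps back to the GB code 397Ah.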

2. Chinese character encoding in VC

VC/C++ uses the GB2312 internal code as its encoding for Chinese characters, so the various input and output facilities in VC/C++, such as cin/wcin, cout/wcout, scanf/wscanf, and printf/wprintf, are all based on GB2312. If the internal code of a Chinese character is not in this encoding, these facilities will not parse it correctly.

Look closely at the 8-bit extended ASCII range: the byte values from 161 onward are seldom used, and negative values are not used at all. GB2312 takes advantage of this and uses the byte values 161 to 254 (0xA1 to 0xFE, or -95 to -2 when read as a signed char) to mark Chinese characters. Since those 94 values are far too few to hold the Chinese character set, two such bytes are combined (that is, one Chinese character occupies two bytes), and 94 * 94 = 8836 code positions basically cover the commonly used Chinese characters. When the computer processes characters and meets two consecutive bytes greater than 160, it treats those two bytes as one Chinese character. The following demo program simulates how VC/C++ outputs Chinese characters.

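    #include <iostream>
    using namespace std;

    int main()
    {
        unsigned char input[50];
        cin >> input;                 // reads up to the first whitespace
        int flag = 0;                 // 1 after the first byte of a Chinese character
        for (int i = 0; i < 50; i++)
        {
            if (input[i] > 0xa0 && input[i] != 0)
            {
                if (flag == 1)
                {
                    // second byte above 0xA0: the pair is one Chinese character
                    cout << "Chinese character" << endl;
                    flag = 0;
                }
                else
                {
                    flag++;
                }
            }
            else if (input[i] == 0)
            {
                break;                // end of the input string
            }
            else
            {
                cout << "English character" << endl;
            }
        }
        return 0;
    }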

Input: Hello中国 ("中国" corresponds to the GB2312 internal code bytes 214 208, 185 250)

Output:

English character
English character
English character
English character
English character
Chinese character
Chinese character
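The same scan can be written as a function that is easy to check without console input. This is a sketch only: the name countCharacters is hypothetical, and unlike the demo it looks ahead one byte instead of keeping a flag, so a lone high byte is counted as a single-byte character.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: scans GB2312-encoded bytes and returns
// {number of single-byte characters, number of double-byte characters}.
std::pair<int, int> countCharacters(const std::string& bytes)
{
    int singles = 0;
    int doubles = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char first = static_cast<unsigned char>(bytes[i]);
        const bool hasNext = i + 1 < bytes.size();
        if (first > 0xA0 && hasNext &&
            static_cast<unsigned char>(bytes[i + 1]) > 0xA0) {
            ++doubles;  // two consecutive bytes above 0xA0 form one character
            ++i;        // skip the second byte of the pair
        } else {
            ++singles;
        }
    }
    return {singles, doubles};
}
```

Called with the bytes of "Hello" followed by D6 D0 B9 FA, it returns 5 single-byte characters and 2 double-byte characters, matching the output listed above.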

3. The new internal code standard: Unicode

Unicode (also called the unified code, universal code, or single code) is a character encoding used on computers. It assigns a uniform, unique binary code to every character in every language, to meet the needs of cross-language, cross-platform text conversion and processing. Development began in 1990 and it was officially announced in 1994. As computing power has grown, Unicode has become widespread in the ten-plus years since its debut. At the time of writing, the latest version was Unicode 4.1.0, released on March 31, 2005; a 5.0 beta was also released on December 12, 2005 for evaluation by members.

The Unicode system can be divided into two levels: encoding and implementation.

Encoding: Unicode uses the same encoding as the Universal Character Set (UCS) of ISO 10646. The version of Unicode in practical use corresponds to UCS-2, which uses a 16-bit code space, so each character occupies 2 bytes. In theory this can represent at most 2^16 = 65,536 characters, which basically covers the needs of all languages. In fact, the current version of Unicode does not fill the 16-bit space; a lot of room is reserved for special uses or future expansion.

Implementation: how Unicode is implemented differs from how it is encoded. The Unicode code of a character is fixed, but in actual transmission, because platforms are not designed identically and to save space, Unicode codes are implemented in different ways. These implementations are called Unicode Transformation Formats (UTF). For example, UTF-8 is a variable-length encoding: the basic 7-bit ASCII characters are still represented by their 7-bit code in a single byte (with the top bit set to 0), while other Unicode characters are converted by a fixed algorithm into 1 to 3 bytes each, with the leading bits of each byte (0 or 1) marking its role.
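The UTF-8 conversion described above can be sketched for the 16-bit (UCS-2) range as follows. The function name encodeUtf8Bmp is hypothetical, and the sketch assumes the input is an ordinary 16-bit code point; it does not treat surrogate values specially.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one 16-bit code point as 1 to 3 UTF-8 bytes.
// Assumption: surrogate values (0xD800 to 0xDFFF) are not handled specially.
std::string encodeUtf8Bmp(std::uint16_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 0xxxxxxx: 7-bit ASCII stays a single byte
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For the code point 25105 (0x6211) this produces the three bytes E6 88 91, while the letter 'A' (0x41) stays a single byte.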

Both Java and C# use Unicode: a char defined in these languages is stored in memory as the two-byte Unicode code of that character, as shown below:

char a = '我'; => the Unicode code stored in memory is: 25105

----

Regarding "unsigned char input[50]" above: this is a declaration in the Unicode encoding format, so your later sentence, "the problem arises again", makes no sense. Just my personal opinion.
