The pain of encoding (Part 1): the ins and outs of character encoding

Source: Internet
Author: User
Tags: coding standards

The origin of encoding

The growth of character encoding mirrors the spread of the computer itself: from the individual to the group, and from the nation to the world.

First, the earliest encoding

The earliest notion of encoding is the one everyone associates with the first computers: strings of 0s and 1s.

Second, the origin of the byte

"The byte was originally meant to denote one complete character. Early computers had poor performance and little storage, so they generally used 4-bit BCD code (a code that predates the computer and was first used on punch cards). BCD is fine for representing digits but poorly suited to letters or symbols, each of which needs several codes. The 6-bit BCD code (BCDIC) evolved later, followed by the 7-bit ASCII encoding that is still widely used today. What finally settled the size of the byte, however, was the famous System/360. At that time IBM designed a set of 8-bit EBCDIC codes for System/360, covering digits, uppercase and lowercase letters, and most commonly used symbols, while remaining compatible with the 6-bit BCDIC codes widely used on punch cards. System/360 was a great success, and it established 8 bits as the unit of character storage (that is, one character is stored in one byte of 8 bits). This is the origin of 1 byte = 8 bits." From: http://www.guokr.com/question/542532/

Third, the origin of ANSI (ASCII), the American code

The eight bits of a byte can be combined into 256 (2 to the 8th power) different states. The first 32 states, numbered from 0, were given special purposes: whenever a terminal or printer receives one of these agreed-upon bytes, it performs an agreed-upon action. On 0x0A the terminal starts a new line; on 0x07 the terminal beeps at the user; on 0x1B, for example, the printer prints reverse-video text, or the terminal displays letters in color. People thought this was good, so they called the byte states below 0x20 "control codes". They then assigned consecutive byte states to the space, punctuation marks, digits, and uppercase and lowercase letters, numbering them up to 127, so that computers could store English text using different bytes. Everyone who saw this thought it was good, so the scheme came to be called the ANSI "ASCII" code (American Standard Code for Information Interchange). At that time all the computers in the world used the same ASCII scheme to store English text.

Later, as with the building of the Tower of Babel, computers came into use all over the world. Many countries do not use English, and their letters are not in ASCII. To store their own text on computers, they decided to use the unused positions after 127 for new letters and symbols, and they also added many horizontal lines, vertical lines, crosses, and other shapes needed for drawing tables, numbering them all the way up to the last state, 255. The characters numbered 128 to 255 are called the "extended character set". After that, greedy humanity had no new states left to use. The American imperialists probably never imagined that people in third-world countries would also want to use computers!
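As a rough illustration of the ranges described above, the following Python sketch sorts a byte value into control code, printable ASCII, or extended character. The function name classify is made up for this example, and treating 0x7F (DEL) as a control code is a detail of ASCII that the text above does not mention.

```python
def classify(byte: int) -> str:
    """Sort one byte value into the three ranges described in the text."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte holds a value from 0 to 255")
    if byte < 0x20 or byte == 0x7F:
        # 0x00-0x1F are the control codes; 0x7F (DEL) is one as well.
        return "control"
    if byte < 0x7F:
        # Space, punctuation, digits, and letters.
        return "printable"
    # 128-255: the "extended character set".
    return "extended"


# Bell, line feed, escape, space, a letter, and one extended value.
for value in (0x07, 0x0A, 0x1B, ord(" "), ord("A"), 0xE9):
    print(f"0x{value:02X} -> {classify(value)}")
```

What a value in the extended range actually displays depends on which extended character set the machine uses, which is exactly the problem the rest of the article deals with.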

Fourth, the encoding of the Chinese people

GB2312

By the time the Chinese people got computers, there were no byte states left to represent Chinese characters, and more than 6,000 commonly used Chinese characters needed to be stored. But this could not stump the resourceful Chinese people. Without hesitation they discarded the strange symbols after 127 and stipulated: a byte less than 127 keeps its original meaning, but two bytes greater than 127 in a row together represent one Chinese character. The first byte (called the high byte) ranges from 0xA1 to 0xF7, and the following byte (the low byte) ranges from 0xA1 to 0xFE, which gives room for more than about 7,000 simplified Chinese characters. These codes also include mathematical symbols, Roman and Greek letters, and Japanese kana. Even the digits, punctuation marks, and letters that already existed in ASCII were re-encoded as two-byte codes; these are the so-called "full-width" characters, while the originals below 127 are called "half-width" characters.
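The byte ranges above can be checked with a short Python sketch. It relies on Python's built-in "gb2312" codec; the helper name in_gb2312_range and the constant SLOTS are made up for this illustration.

```python
def in_gb2312_range(high: int, low: int) -> bool:
    """True if the byte pair falls in the GB2312 two-byte range from the text."""
    return 0xA1 <= high <= 0xF7 and 0xA1 <= low <= 0xFE


# Number of two-byte slots in that range. Not every slot is assigned,
# which is why the character count is lower than this figure.
SLOTS = (0xF7 - 0xA1 + 1) * (0xFE - 0xA1 + 1)

half_width = "A".encode("gb2312")       # ASCII letter: one byte, unchanged
full_width = "\uff21".encode("gb2312")  # full-width letter A: two bytes
hanzi = "中".encode("gb2312")            # a Chinese character: two bytes

print("slots:", SLOTS)
print("half-width A:", half_width.hex())
print("full-width A:", full_width.hex())
print("hanzi:", hanzi.hex(), in_gb2312_range(hanzi[0], hanzi[1]))
```

The half-width letter stays a single byte below 127, while both the full-width letter and the Chinese character come out as two bytes that are each greater than 127.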

The Chinese people thought this was good, so they called the scheme "GB2312". GB2312 is a Chinese extension of ASCII.

But there are too many Chinese characters, and it soon turned out that many people's names could not be typed at all, which was especially awkward when those people were national leaders. So the code points that GB2312 had left unused had to be dug out and put to use without ceremony.

GBK

Later even that was not enough, so the requirement that the low byte be a code above 127 was simply dropped: as long as the first byte is greater than 127, it always marks the start of a Chinese character, whether or not the byte that follows falls in the extended character set. The resulting expanded scheme is called the GBK standard. GBK includes everything in GB2312 and adds nearly 20,000 new Chinese characters (including traditional characters) and symbols. Later still, the ethnic minorities also wanted to use computers, so the standard was expanded again with several thousand new minority-script characters, and GBK grew into GB18030. From then on, the culture of the Chinese nation could be passed on in the computer age.
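The relationship between the three standards can be observed with Python's built-in codecs, as in the sketch below. The helper name encodes_in is made up for this illustration, and the sample pair (the simplified character 汉 and its traditional form 漢) is chosen here as an example; it does not come from the article.

```python
def encodes_in(text: str, codec: str) -> bool:
    """True if every character of text exists in the given encoding."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


simplified = "汉"   # simplified form, present in GB2312
traditional = "漢"  # traditional form, added by GBK

for ch in (simplified, traditional):
    print(
        ch,
        "gb2312:", encodes_in(ch, "gb2312"),
        "gbk:", encodes_in(ch, "gbk"),
        "gb18030:", encodes_in(ch, "gb18030"),
    )

# GBK keeps the GB2312 codes unchanged for characters both contain.
same_code = simplified.encode("gbk") == simplified.encode("gb2312")
gbk_bytes = traditional.encode("gbk")
print("same code in both:", same_code)
print("traditional form in GBK:", gbk_bytes.hex())
```

The traditional character is rejected by the GB2312 codec but encodes to two bytes in GBK, with a first byte greater than 127 as the rule above requires.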
Chinese programmers thought this series of Chinese character encoding standards was good, so they called them "DBCS" (Double Byte Character Set). The most notable feature of the DBCS family is that two-byte Chinese characters and one-byte English characters coexist in the same encoding scheme. A program that supports Chinese therefore has to examine the value of every byte in a string: if the value is greater than 127, a character from the double-byte character set is assumed to start there. In those days, all the computer monks who had been initiated into programming had to recite the following mantra every day:
"A Chinese character counts as two English characters! A Chinese character counts as two English characters ..."

Fifth, the encoding of the world's people

At that time every country, like China, devised its own encoding standard, and as a result nobody could understand anyone else's encoding.

An international organization called ISO (the International Organization for Standardization) decided to tackle the problem. Their approach was simple: scrap all the regional encoding schemes and create a new code that includes every culture, letter, and symbol on Earth! They intended to call it the "Universal Multiple-Octet Coded Character Set", UCS for short, commonly known as "Unicode". For the Unicode versions, refer to http://www.cnblogs.com/wpcockroach/p/3907324.html

Reference: http://blog.chinaunix.net/uid-20446794-id-1677389.html

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
