The pain of encoding (part 1): the ins and outs of character encoding

Last Update:2015-07-29 Source: Internet

Author: User

Tags: coding standards

The origin of encoding

The growth of character encoding has witnessed the computer's spread from the individual to the group, and from the nation to the whole world.

First, the initial encoding

In the beginning, a computer's idea of encoding was the one everyone associates with the first computers: 0101.

Second, the origin of the byte

"The so-called byte originally meant the unit used to represent one complete character. Early computers had poor performance and little storage, so they generally used 4-bit BCD code (a code that predates the computer and was first used on punch cards). BCD code is fine for representing digits, but it is awkward for letters or symbols, which each need several codes. It later evolved into the 6-bit BCD code (BCDIC), and then came the 7-bit ASCII encoding that is still widely used today. What finally settled the size of the byte, however, was the famous System/360. At that time, IBM designed the 8-bit EBCDIC code for System/360, which covers digits, uppercase and lowercase letters, and most commonly used symbols, while remaining compatible with the 6-bit BCDIC code widely used on punch cards. System/360 was a great success, and it established the 8-bit length for the character storage unit (the 'character' here being the byte in its original sense, represented by 8 bits). This is the origin of 1 byte = 8 bits." From: http://www.guokr.com/question/542532/

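To make the 4-bit BCD idea concrete, here is a minimal Python sketch. The function names are hypothetical, and the "two digits per 8-bit byte" packing is an assumption added for illustration; the quoted text only says that BCD used 4 bits per digit.

```python
def to_bcd_nibbles(number: int) -> list:
    """Return the 4-bit BCD pattern of each decimal digit of a non-negative integer."""
    return [format(int(digit), "04b") for digit in str(number)]


def pack_bcd(number: int) -> bytes:
    """Pack decimal digits two per 8-bit byte (assumed 'packed BCD' layout)."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


print(to_bcd_nibbles(1964))   # ['0001', '1001', '0110', '0100']
print(pack_bcd(1964).hex())   # 1964
```

Each digit fits in 4 bits, but a letter has no natural 4-bit form, which is why wider codes were needed.
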
Third, the origin of ANSI (ASCII), the American code

Eight bits together can be combined into 256 (2 to the power of 8) different states, and such a group is called a byte. The first 32 states, numbered from 0, were given special purposes: whenever a terminal or printer received one of these agreed-upon bytes, it had to perform the agreed action. On 0x0A the terminal starts a new line; on 0x07 the terminal beeps at people; on 0x1B, for example, the printer prints inverted (white-on-black) text, or the terminal displays letters in color. They saw that this was good, so they called the byte states below 0x20 "control codes". They then assigned all the spaces, punctuation marks, digits, and uppercase and lowercase letters to consecutive byte states, numbering them up to 127, so that computers could store English text using different bytes. Everyone saw that this was good, so they called this scheme the ANSI "ASCII" code (American Standard Code for Information Interchange). At that time, all the computers in the world used the same ASCII scheme to save English text.

Later, as with the building of the Tower of Babel, computers came into use all over the world. Many countries do not use English, and their letters are not in ASCII. To store their own text in the computer, they decided to use the vacant positions after 127 to represent these new letters and symbols, and they also added many horizontal lines, vertical lines, crosses, and other shapes needed for drawing tables, numbering all the way up to the last state, 255. The characters numbered 128 to 255 are called the "extended character set". After that, greedy humanity had no new states left to use. The American imperialists probably never imagined that people in third-world countries would also want to use computers!
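
The three ranges described above can be sketched in Python as follows. The function name `classify` is hypothetical. Treating 0x7F (DEL) as a control code is a detail the text does not mention, and it is labeled as such in the code.

```python
def classify(byte_value: int) -> str:
    """Place a single byte value into one of the ranges described above."""
    if not 0 <= byte_value <= 255:
        raise ValueError("a byte holds values from 0 to 255")
    # Below 0x20: control codes. 0x7F (DEL) is also a control code in ASCII,
    # a detail added here that the article does not mention.
    if byte_value < 0x20 or byte_value == 0x7F:
        return "control code"
    if byte_value < 0x80:
        return "ASCII printable"
    return "extended (meaning depends on the regional scheme)"


for value in (0x07, 0x0A, 0x1B, ord("A"), 0xE9):
    print(f"0x{value:02X}: {classify(value)}")
```

Only the first two ranges mean the same thing everywhere; values from 128 to 255 were reused differently by each regional scheme.
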
Fourth, the encoding of the Chinese people

GB2312

By the time the Chinese people got computers, there were no byte states left to represent Chinese characters, and more than 6,000 commonly used Chinese characters needed to be stored. But this did not stump the resourceful Chinese people. We unceremoniously scrapped the strange symbols after 127 and stipulated: a byte of 127 or less means the same as before, but two bytes greater than 127 placed together represent one Chinese character. The first byte (called the high byte) runs from 0xA1 to 0xF7, and the second byte (the low byte) runs from 0xA1 to 0xFE, which lets us assemble about 7,000 characters, most of them simplified Chinese characters. Into these codes we also put mathematical symbols, Roman and Greek letters, and Japanese kana. Even the digits, punctuation, and letters already in ASCII were all re-encoded as two-byte codes. These are the so-called "full-width" characters, while the originals at 127 and below are called "half-width" characters.

The Chinese people saw that this was good, so they called this scheme "GB2312". GB2312 is a Chinese extension to ASCII.

But there are too many Chinese characters, and we soon found that many people's names could not be typed with it, which was especially troublesome when those people were national leaders. So we had to dig out the code positions that GB2312 had left unused and press them into service without ceremony.
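
As a small illustration of the GB2312 rules above, the following Python sketch uses the standard `gb2312` codec to show the bytes of a half-width letter, its full-width counterpart, and a Chinese character. The helper names are hypothetical.

```python
def gb2312_hex(ch: str) -> str:
    """Hex form of one character encoded with Python's built-in gb2312 codec."""
    return ch.encode("gb2312").hex()


def is_gb2312_double_byte(raw: bytes) -> bool:
    """Check the high-byte and low-byte ranges stated in the text."""
    return len(raw) == 2 and 0xA1 <= raw[0] <= 0xF7 and 0xA1 <= raw[1] <= 0xFE


half_width_a = "A"        # ordinary ASCII letter
full_width_a = "\uff21"   # the full-width form of the letter A
chinese = "中"

print(gb2312_hex(half_width_a))  # 41    (one byte, same as ASCII)
print(gb2312_hex(full_width_a))  # a3c1  (two bytes, both above 127)
print(gb2312_hex(chinese))       # d6d0  (two bytes, both above 127)
print(is_gb2312_double_byte(chinese.encode("gb2312")))  # True
```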

GBK

Later this was still not enough, so we simply dropped the requirement that the low byte be a code after 127: as long as the first byte is greater than 127, it always marks the beginning of a Chinese character, whether or not the byte that follows falls in the extended character set. The resulting expanded scheme is called the GBK standard. GBK includes everything in GB2312 and adds nearly 20,000 new Chinese characters (including traditional characters) and symbols. Later, ethnic minorities also needed to use computers, so we extended it again, adding several thousand new characters for minority scripts, and GBK grew into GB18030. From then on, the culture of the Chinese nation could be passed on in the computer age.

Chinese programmers saw that this series of Chinese character encoding standards was good, so they called them "DBCS" (Double Byte Character Set). The most distinctive feature of the DBCS family is that two-byte Chinese characters and one-byte English characters coexist in the same encoding scheme. Programs that support Chinese text therefore have to check the value of every byte in a string: if a value is greater than 127, a character from the double-byte character set is assumed to begin there. In those days, every computer monk who had been initiated into programming had to recite the following mantra over and over every day:

"One Chinese character counts as two English characters! One Chinese character counts as two English characters ..."

Fifth, the encoding of the world's people

Because every country at that time, like China, created its own encoding standard, nobody could understand anybody else's encoding.
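
A minimal Python sketch of what "not understanding each other's encoding" looks like in practice: bytes written under one scheme are read back under another. The function name `misread` is hypothetical, and Latin-1 is chosen only as an example of a Western European scheme; the text does not name one.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one scheme, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


garbled = misread("中文", "gbk", "latin-1")
print(garbled)  # ÖÐÎÄ

# The bytes themselves are intact; reading them with the right scheme recovers the text.
print(garbled.encode("latin-1").decode("gbk"))  # 中文
```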

An international organization called ISO (the International Organization for Standardization) decided to tackle the problem. Their approach was simple: scrap all the regional encoding schemes and create a new code that includes every culture, letter, and symbol on Earth! They planned to call it the "Universal Multiple-Octet Coded Character Set", or UCS for short, commonly known as "Unicode". For the Unicode versions, refer to http://www.cnblogs.com/wpcockroach/p/3907324.html
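
To illustrate the idea of one code for all characters, here is a small Python sketch. Python strings are sequences of Unicode code points, so `ord` returns the same number for a character no matter which regional scheme its bytes came from. The helper name `code_point` is hypothetical, and Big5 is used only as an example of a second regional scheme.

```python
def code_point(ch: str) -> str:
    """Format the Unicode code point of one character as U+XXXX."""
    return f"U+{ord(ch):04X}"


ch = "中"
gbk_bytes = ch.encode("gbk")
big5_bytes = ch.encode("big5")

print(gbk_bytes != big5_bytes)  # True: the regional schemes use different bytes
print(gbk_bytes.decode("gbk") == big5_bytes.decode("big5"))  # True: same character
print(code_point(ch))           # U+4E2D
```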

Reference: http://blog.chinaunix.net/uid-20446794-id-1677389.html

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
