ASCII
In a computer, all data is stored and processed as binary numbers, because the hardware represents 1 and 0 with high and low voltage levels. The 52 letters (uppercase and lowercase, such as a, b, C, D), the digits such as 0 and 1, and commonly used symbols such as *, #, and @ are therefore also stored as binary numbers. Which binary number stands for which symbol is a matter of convention: anyone can define their own mapping (this is called an encoding). But if people want to exchange data without confusion, they must all use the same encoding rules. For this reason, the American standards body introduced ASCII, which fixes the binary number used to represent each of the symbols above.
ASCII, the American Standard Code for Information Interchange, is a standard single-byte character encoding scheme for text-based data, developed by the American National Standards Institute (ANSI). Work on it began in the late 1950s and it was finalized in 1967. It was originally a United States national standard that gave different computers a common character encoding for communicating with each other; it was later adopted by the International Organization for Standardization (ISO) as the ISO 646 standard. It covers the basic Latin alphabet.
ASCII uses 7-bit or 8-bit binary numbers to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7 bits to represent all uppercase and lowercase letters, the digits 0 through 9, punctuation marks, and the special control characters used in American English. Within this range:
0~31 and 127 (33 in total) are control characters or communication-specific characters; the rest are displayable characters. Control characters include LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell); communication-specific characters include SOH (start of heading), EOT (end of transmission), and ACK (acknowledgement). The ASCII values 8, 9, 10, and 13 correspond to backspace, tab, line feed, and carriage return, respectively. They have no graphical form of their own; their effect on how text is displayed depends on the application.
32~126 (95 in total) are printable characters (32 is the space), of which 48~57 are the ten Arabic numerals 0 through 9.
65~90 are the 26 uppercase English letters and 97~122 are the 26 lowercase English letters; the rest are punctuation marks, arithmetic symbols, and so on.
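The ranges above can be checked directly with a short Python sketch. The function name `classify_ascii` and its category labels are made up for illustration; only `ord` and `format` are standard built-ins.

```python
def classify_ascii(code: int) -> str:
    """Return the category of a 7-bit ASCII code, following the ranges above.

    Hypothetical helper for illustration; the labels are not standard terms.
    """
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    if code <= 31 or code == 127:
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "punctuation/symbol"


# Show the decimal code, the 7-bit pattern, and the category of a few characters.
for ch in ["A", "a", "7", " ", "*", "\n"]:
    code = ord(ch)
    print(repr(ch), code, format(code, "07b"), classify_ascii(code))
```

Counting the codes that the function labels "control" gives 33, and the remaining codes number 95, matching the totals stated above.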
Also note that in standard ASCII, the highest bit of the byte (b7) is used as a parity bit. Parity checking is a method of detecting errors that occur while a code is being transmitted, and it comes in two forms: odd parity and even parity. Odd parity rule: the number of 1 bits in a correct byte must be odd; if the 7-bit code does not already contain an odd number of 1s, the highest bit b7 is set to 1. Even parity rule: the number of 1 bits in a correct byte must be even; if it is not, the highest bit b7 is set to 1.
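As a minimal sketch of the two parity rules, the following Python code sets b7 on a 7-bit code and then verifies a received byte. The names `add_parity` and `check_parity` are hypothetical helpers, not part of any standard library.

```python
def add_parity(code: int, odd: bool = False) -> int:
    """Return an 8-bit value: the 7-bit ASCII code with a parity bit in b7.

    Hypothetical helper. Even parity is the default; pass odd=True for odd parity.
    """
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    if odd:
        need_bit = ones % 2 == 0   # make the total number of 1s odd
    else:
        need_bit = ones % 2 == 1   # make the total number of 1s even
    return code | 0x80 if need_bit else code


def check_parity(byte: int, odd: bool = False) -> bool:
    """Return True if the 8-bit value satisfies the chosen parity rule."""
    ones = bin(byte).count("1")
    return ones % 2 == (1 if odd else 0)


a = ord("A")  # 0x41 = 1000001, two 1 bits
c = ord("C")  # 0x43 = 1000011, three 1 bits
print(hex(add_parity(a)), hex(add_parity(a, odd=True)))
print(hex(add_parity(c)), hex(add_parity(c, odd=True)))
```

A single flipped bit changes the count of 1s by one, so `check_parity` rejects it; two flipped bits cancel out and go undetected, which is the usual limitation of a single parity bit.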
The upper 128 codes (128~255) are called extended ASCII. Many x86-based systems support extended (or "high") ASCII. Extended ASCII uses the 8th bit of each byte to define an additional 128 special symbols, foreign letters, and graphic symbols.
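The text does not say which extended set a given system uses, and in practice several exist. The sketch below picks two as an assumption for illustration, the IBM PC code page (`cp437`) and ISO 8859-1 (`latin-1`), both of which ship with Python's standard codecs, and shows that the same high byte is read differently by each.

```python
# Two bytes: 0x41 is in the standard 7-bit range, 0xE9 has the 8th bit set.
raw = bytes([0x41, 0xE9])

# The lower 128 codes agree everywhere; the upper 128 depend on the code page.
as_latin1 = raw.decode("latin-1")
as_cp437 = raw.decode("cp437")
print(as_latin1, as_cp437)

# Strict 7-bit ASCII rejects any byte above 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("valid 7-bit ASCII:", ascii_ok)
```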
ANSI
ASCII is a United States standard. As computers spread to more and more countries, ASCII clearly could not meet the needs of other languages, so different countries and regions defined their own standards, such as GB2312, BIG5, and JIS. These extended encodings, which use 2 bytes to represent one character such as a Chinese character, are collectively called ANSI encodings. "ANSI" here means the system's default encoding: on a Simplified Chinese system the ANSI encoding is GB2312, while on a Japanese system it is a JIS encoding. Different ANSI encodings are incompatible with each other, so when information is exchanged internationally, text in two different languages cannot be stored in the same ANSI-encoded text. An ANSI encoding represents an English character with one byte and a Chinese character with two bytes.
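The incompatibility can be illustrated with Python's built-in codecs. This is only a sketch: it assumes `gb2312` stands in for the Simplified Chinese ANSI encoding and `shift_jis` for the Japanese one, and the sample strings are arbitrary.

```python
# One byte for an English letter, two bytes for a Chinese character.
len_english = len("A".encode("gb2312"))
len_chinese = len("\u4e2d".encode("gb2312"))  # U+4E2D is the character "中"
print(len_english, len_chinese)

# The same bytes read under a different ANSI encoding give different text.
chinese = "\u4e2d\u6587"                      # "中文"
data = chinese.encode("gb2312")
misread = data.decode("shift_jis", errors="replace")
print(data, repr(misread))

# Text mixing two languages may not fit in one ANSI encoding at all:
# the Korean syllable U+D55C has no code in GB2312.
try:
    "\u4e2d\ud55c".encode("gb2312")
    mixed_ok = True
except UnicodeEncodeError:
    mixed_ok = False
print("mixed text fits in GB2312:", mixed_ok)
```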
GB2312/GBK
GB2312 is an extension of ASCII. It stipulates that a byte with a value of 127 or less keeps its original ASCII meaning, but two consecutive bytes greater than 127 together represent one Chinese character. The first byte (the high byte) ranges from 0xA1 to 0xF7 and the second byte (the low byte) from 0xA1 to 0xFE, which gives room for about 7,000 Simplified Chinese characters. These codes also include mathematical symbols, Roman and Greek letters, and Japanese kana. Even the digits, punctuation marks, and letters that already exist in ASCII were re-encoded as two-byte codes; these are the so-called "full-width" characters, while the original characters below 128 are called "half-width" characters. This Chinese character scheme is called "GB2312".
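The byte ranges described above can be checked against Python's built-in `gb2312` codec. The helper `is_gb2312_pair` is a hypothetical name introduced here; it tests only the ranges given in the text, not whether a code is actually assigned.

```python
def is_gb2312_pair(high: int, low: int) -> bool:
    """Check a two-byte code against the GB2312 ranges given in the text."""
    return 0xA1 <= high <= 0xF7 and 0xA1 <= low <= 0xFE


half_width = "A".encode("gb2312")        # ASCII letter, unchanged single byte
full_width = "\uff21".encode("gb2312")   # U+FF21, the full-width letter A
han = "\u4e2d".encode("gb2312")          # U+4E2D, the character "中"
kana = "\u3042".encode("gb2312")         # U+3042, hiragana "あ"
greek = "\u03b1".encode("gb2312")        # U+03B1, Greek small alpha

print(half_width, full_width, han, kana, greek)
for pair in (full_width, han, kana, greek):
    print(pair.hex(), is_gb2312_pair(pair[0], pair[1]))
```

The half-width and full-width forms of the same letter are distinct characters with different lengths in bytes, which is why mixing them in text can break string comparisons.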
But there are too many Chinese characters, and the code space soon proved insufficient. The requirement that the low byte be greater than 127 was therefore dropped: as long as the first byte is greater than 127, it marks the start of a Chinese character, regardless of whether the byte that follows falls in the extended range. The resulting expanded scheme is called the GBK standard. GBK includes everything in GB2312 and adds nearly 20,000 new Chinese characters (including traditional characters) and symbols. Later, to support the scripts of China's ethnic minorities, the scheme was extended again with several thousand new characters, and GBK grew into GB18030.
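A small sketch of the relaxed low-byte rule, using Python's built-in `gb2312`, `gbk`, and `gb18030` codecs. The sample character U+4E02 is an arbitrary choice of a character that GBK adds and GB2312 lacks.

```python
rare = "\u4e02"     # a character outside GB2312 but inside GBK
common = "\u4e2d"   # "中", present in GB2312

# GB2312 has no code for the rare character.
try:
    rare.encode("gb2312")
    in_gb2312 = True
except UnicodeEncodeError:
    in_gb2312 = False

# GBK encodes it in two bytes; only the first byte has to exceed 127.
gbk_bytes = rare.encode("gbk")
print(in_gb2312, gbk_bytes.hex(), gbk_bytes[0] > 0x7F, gbk_bytes[1] > 0x7F)

# Characters already in GB2312 keep the same bytes in GBK and GB18030.
same_everywhere = (
    common.encode("gb2312") == common.encode("gbk") == common.encode("gb18030")
)
print(same_everywhere)
```

Because a GBK low byte may fall in the ASCII range, code that scans such text byte by byte has to track whether it is inside a two-byte character before treating a byte as ASCII.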