C language programming: Introduction to ASCII (American Standard Code for Information Interchange)

Source: Internet
Author: User
American Standard Code for Information Interchange (ASCII)

In a computer, all data is stored and processed in binary, because the hardware can only reliably distinguish two states, 0 and 1. Likewise, the 52 letters a, b, c, d, ... (uppercase and lowercase), the digits 0, 1, 2, ..., and common symbols such as *, # and @ must each be represented by a particular binary number when they are stored in a computer. Anyone could agree on their own mapping (this is called an encoding), but if people want to communicate without confusion, they must all use the same encoding rules. The American standards body therefore introduced ASCII, which specifies which binary number represents each of these common symbols.
The American Standard Code for Information Interchange is a standard single-byte character encoding for text data, developed by the American standards body now known as the American National Standards Institute (ANSI). Work on it began in the late 1950s, and it was finalized in 1967. It was originally a US national standard, serving as a common encoding for Western characters so that different computers could communicate with each other. It was later adopted by the International Organization for Standardization (ISO) as an international standard, ISO 646. It covers the basic Latin alphabet.
ASCII uses combinations of 7 or 8 binary digits to represent 128 or 256 possible characters. Standard ASCII, also called basic ASCII, uses 7-bit binary numbers to represent all uppercase and lowercase letters, the digits 0 to 9, punctuation marks, and the special control characters used in American English. Specifically:
0 ~ 31 and 127 (33 in total) are control characters or special communication characters; the others can be displayed. Examples of control characters are LF (line feed), CR (carriage return), FF (form feed), DEL (delete), BS (backspace), and BEL (bell); examples of communication characters are SOH (start of heading), EOT (end of transmission), and ACK (acknowledge). The codes 8, 9, 10, and 13 are backspace, horizontal tab, line feed, and carriage return respectively. These characters have no visible glyph of their own; their effect on how text is displayed varies from application to application.
32 is the space; 33 ~ 126 (94 in total) are printable characters, of which 48 ~ 57 are the ten Arabic digits 0 to 9;
65 ~ 90 are the 26 uppercase English letters, 97 ~ 122 are the 26 lowercase English letters, and the rest are punctuation marks and operator symbols.
In standard ASCII, the highest bit of a byte (b7) is used as a parity bit. Parity checking is a method for detecting errors that occur while a code is being transmitted, and it comes in two kinds: odd parity and even parity. Under odd parity, a correct byte must contain an odd number of 1 bits; if the seven data bits contain an even number of 1s, the highest bit b7 is set to 1. Under even parity, a correct byte must contain an even number of 1 bits; if the seven data bits contain an odd number of 1s, b7 is set to 1.
The upper 128 codes (128 ~ 255) are the extended ASCII codes. Many x86-based systems support extended (or "high") ASCII, which uses the eighth bit of each byte to define an additional 128 special characters, foreign letters, and graphical symbols. The standard ASCII table is as follows:

Bin        Dec  Hex  Char  Description
0000 0000    0  00   NUL   null character
0000 0001    1  01   SOH   start of heading
0000 0010    2  02   STX   start of text
0000 0011    3  03   ETX   end of text
0000 0100    4  04   EOT   end of transmission
0000 0101    5  05   ENQ   enquiry
0000 0110    6  06   ACK   acknowledge
0000 0111    7  07   BEL   bell
0000 1000    8  08   BS    backspace
0000 1001    9  09   HT    horizontal tab
0000 1010   10  0A   LF    line feed / new line (NL)
0000 1011   11  0B   VT    vertical tab
0000 1100   12  0C   FF    form feed / new page (NP)
0000 1101   13  0D   CR    carriage return
0000 1110   14  0E   SO    shift out
0000 1111   15  0F   SI    shift in
0001 0000   16  10   DLE   data link escape
0001 0001   17  11   DC1   device control 1
0001 0010   18  12   DC2   device control 2
0001 0011   19  13   DC3   device control 3
0001 0100   20  14   DC4   device control 4
0001 0101   21  15   NAK   negative acknowledge
0001 0110   22  16   SYN   synchronous idle
0001 0111   23  17   ETB   end of transmission block
0001 1000   24  18   CAN   cancel
0001 1001   25  19   EM    end of medium
0001 1010   26  1A   SUB   substitute
0001 1011   27  1B   ESC   escape
0001 1100   28  1C   FS    file separator
0001 1101   29  1D   GS    group separator
0001 1110   30  1E   RS    record separator
0001 1111   31  1F   US    unit separator
0010 0000   32  20   SP    space
0010 0001   33  21   !
0010 0010   34  22   "
0010 0011   35  23   #
0010 0100   36  24   $
0010 0101   37  25   %
0010 0110   38  26   &
0010 0111   39  27   '
0010 1000   40  28   (
0010 1001   41  29   )
0010 1010   42  2A   *
0010 1011   43  2B   +
0010 1100   44  2C   ,
0010 1101   45  2D   -
0010 1110   46  2E   .
0010 1111   47  2F   /
0011 0000   48  30   0
0011 0001   49  31   1
0011 0010   50  32   2
0011 0011   51  33   3
0011 0100   52  34   4
0011 0101   53  35   5
0011 0110   54  36   6
0011 0111   55  37   7
0011 1000   56  38   8
0011 1001   57  39   9
0011 1010   58  3A   :
0011 1011   59  3B   ;
0011 1100   60  3C   <
0011 1101   61  3D   =
0011 1110   62  3E   >
0011 1111   63  3F   ?
0100 0000   64  40   @
0100 0001   65  41   A
0100 0010   66  42   B
0100 0011   67  43   C
0100 0100   68  44   D
0100 0101   69  45   E
0100 0110   70  46   F
0100 0111   71  47   G
0100 1000   72  48   H
0100 1001   73  49   I
0100 1010   74  4A   J
0100 1011   75  4B   K
0100 1100   76  4C   L
0100 1101   77  4D   M
0100 1110   78  4E   N
0100 1111   79  4F   O
0101 0000   80  50   P
0101 0001   81  51   Q
0101 0010   82  52   R
0101 0011   83  53   S
0101 0100   84  54   T
0101 0101   85  55   U
0101 0110   86  56   V
0101 0111   87  57   W
0101 1000   88  58   X
0101 1001   89  59   Y
0101 1010   90  5A   Z
0101 1011   91  5B   [
0101 1100   92  5C   \
0101 1101   93  5D   ]
0101 1110   94  5E   ^
0101 1111   95  5F   _
0110 0000   96  60   `
0110 0001   97  61   a
0110 0010   98  62   b
0110 0011   99  63   c
0110 0100  100  64   d
0110 0101  101  65   e
0110 0110  102  66   f
0110 0111  103  67   g
0110 1000  104  68   h
0110 1001  105  69   i
0110 1010  106  6A   j
0110 1011  107  6B   k
0110 1100  108  6C   l
0110 1101  109  6D   m
0110 1110  110  6E   n
0110 1111  111  6F   o
0111 0000  112  70   p
0111 0001  113  71   q
0111 0010  114  72   r
0111 0011  115  73   s
0111 0100  116  74   t
0111 0101  117  75   u
0111 0110  118  76   v
0111 0111  119  77   w
0111 1000  120  78   x
0111 1001  121  79   y
0111 1010  122  7A   z
0111 1011  123  7B   {
0111 1100  124  7C   |
0111 1101  125  7D   }
0111 1110  126  7E   ~
0111 1111  127  7F   DEL   delete
Codes 128 ~ 255 make up the extended ASCII characters, which are not listed in this table.
A brief history of character sets
6000 BC: hieroglyphics
3000 BC: the alphabet
From 1838 to 1854, Samuel F. B. Morse developed the telegraph, in which each letter of the alphabet corresponds to a series of short and long pulses.
From 1821 to 1824, Louis Braille invented Braille, a six-bit code that encodes letters, common letter combinations, common words, and punctuation.
A special escape code indicates that the following character codes should be interpreted as uppercase letters, and a special shift code causes the following codes to be interpreted as digits.
In 1931, the CCITT standardized the telex codes, including Baudot #2; these are all five-bit codes covering letters and digits.
Punched-card codes dating from the early 1890s led to the six-bit Binary-Coded Decimal Interchange Code (BCDIC) used as the character code of early computers.
In the 1960s, BCDIC was extended to the 8-bit EBCDIC, the standard code on IBM mainframes.
1967: the American Standard Code for Information Interchange (ASCII).
There was much debate over whether the character length should be 6, 7, or 8 bits. From a reliability standpoint, shift characters should not be used, so ASCII could not be a 6-bit code; but an 8-bit scheme was also ruled out because of cost (storage was still expensive at the time).
The final code therefore contains 26 lowercase letters, 26 uppercase letters, 10 digits, 32 symbols, 33 control codes, and a space, for a total of 128 codes.
ASCII is now documented in ANSI X3.4-1986, the 7-bit American National Standard Code for Information Interchange (7-bit ASCII), published by the American National Standards Institute.
The ASCII code chart shown in Figure 2-1 follows a layout similar to the one in the ANSI document.



Internationalization problems with ASCII
ASCII is an American standard, so it does not even meet the needs of other English-speaking countries. For example, where is the British pound sign (£)? It also lacks:
accented letters used with the Latin alphabet;
the Greek, Hebrew, and Arabic alphabets, and the Cyrillic (Slavic) alphabet used for Russian;
the ideographic characters of the Chinese writing system, also used in Japan and Korea.

In 1967, the International Organization for Standardization (ISO) recommended a variant of ASCII in which codes 0x40, 0x5B, 0x5C, 0x5D, 0x7B, 0x7C, and 0x7D are "reserved for national use", while codes 0x5E, 0x60, and 0x7E are marked "available for other graphic symbols when special national characters require 8, 9, or 10 positions". This is clearly not the best international solution, because it does not guarantee consistency. It does, however, show how hard people tried to encode different languages.

Extended ASCII
1981: the IBM PC ROM 256-character set, known as the IBM extended character set.
1985: the Windows character set, called the "ANSI character set", which follows an ANSI draft and an ISO standard (ANSI/ISO 8859-1-1987, "Latin 1" for short).
The original version of the ANSI character set:
April 1987: code pages are introduced in MS-DOS 3.3, and the original IBM extended character set becomes code page 437.

Double-Byte Character Set
The double-byte character set (DBCS) is used for Chinese, Japanese, and Korean while remaining compatible with ASCII.
Like an 8-bit extended ASCII code page, a DBCS starts from 256 codes, and as in any well-behaved code page, the first 128 codes are ASCII.
However, some of the upper 128 codes are always followed by a second byte.
These two bytes together (called the lead byte and the trail byte) define a single character, usually a complex ideograph.

Ref: http://baike.baidu.com/view/15482.htm
