# Computer knowledge that programmers should understand (3): Information representation and encoding

Source: Internet
Author: User

Introduction

We have already briefly introduced the binary number system, so we will not go into detail here. A computer can only recognize binary. A computer is built from many electronic components, and these components usually have two states (for example, powered on or powered off), just as the binary system has only two digits, 0 and 1. Likewise, the machine instruction set inside the CPU uses 0 and 1 to indicate the states of its electronic components. This is why the computer can only recognize binary.

A program is a sequence of binary numbers. How the sequence is arranged and combined depends on the specific CPU. The instruction sequence maps onto the CPU's instruction set and is decoded into specific commands that control the computer to complete its work. This is the principle of program execution.

All of this preamble leads to one question: if a computer can only identify binary values consisting of 0 and 1, how can it present all kinds of information to us? The answer is encoding (sometimes loosely called compilation). Every program, file, symbol, value, and other piece of information in the computer is encoded in binary, and the CPU then drives the electronic components in different ways to produce different effects.

Numeric data representation

The values discussed here consist of the digits of a number system together with the plus and minus signs (±) and the decimal point (.). A numeric value expresses a quantity, in contrast to non-numeric symbols.

The true value of a piece of data is its magnitude in the mathematical sense. True values can be divided into positive numbers, 0, and negative numbers, or into integers and fractions. In a computer, however, data can only be simulated by electronic signals or stored as combinations of component states. The representation of a number inside the computer is called a machine number (or machine code): binary data, simulated by machine components and signals, that stands for a real-world value. Machine numbers are generally divided into unsigned and signed numbers, and signed numbers can be further divided into positive numbers, negative numbers, fractions, and so on. They can also be divided into fixed-point and floating-point numbers. Each classification has its own criteria. The following describes machine number representation, that is, how numeric data is encoded in the computer (the explanations mainly use decimal values as examples).

1. Unsigned numbers

Here, unsigned numbers mainly refer to unsigned integers. (The author treats the decimal point and the plus and minus signs as "symbols", so an unsigned number here is simply a non-negative integer in the mathematical sense. If you understand the term differently, please do not be confused.)

In a computer, one byte equals eight bits. Each bit can be 0 or 1, so there are $2^8 = 256$ different combinations, and one byte can represent 256 different pieces of information. Suppose 0 is represented by 00000000, 1 by 00000001, 2 by 00000010, and so on up to the maximum, 11111111. Converted to decimal, these are 0, 1, 2, ..., 255. This is how a computer represents and stores unsigned integers. The range of an unsigned integer in one byte is therefore 0 to 255 (0 to $2^8 - 1$). When programming, you will find that every numeric type has a value range, and this is the reason.

In fact, the value range of machine numbers is limited by the machine's processing capability, mainly the CPU word length. On an 8-bit CPU, unsigned machine numbers range from 0 to $2^8 - 1$ and use one byte per value: 00000000 represents 0, 00000001 represents 1, 00000010 represents 2, and so on. On a 16-bit CPU, the range is 0 to $2^{16} - 1$ and each value uses two bytes: 00000000 00000000 represents 0, 00000000 00000001 represents 1, 00000000 00000010 represents 2, and so on. On a 32-bit CPU, the range is 0 to $2^{32} - 1$ and each value uses four bytes. On a 64-bit CPU, the range is 0 to $2^{64} - 1$ and each value uses eight bytes. The longer the CPU word length, the larger the range of values it can represent and the higher the precision available in a single operation, though each value then occupies more storage.

The representation of unsigned data depends on the number of bits of the machine. The value range is 0 to $2^n - 1$, where n is the number of machine bits. Unsigned data is represented very simply in a computer: given an unsigned binary number, it can easily be converted to its corresponding decimal value.
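The range formula and the binary/decimal conversion above can be sketched in a few lines of Python. This is only an illustration; the function names `unsigned_range`, `to_bits`, and `from_bits` are made up for this example and do not come from the original article.

```python
def unsigned_range(n_bits):
    """Return (min, max) for an n-bit unsigned integer: 0 .. 2^n - 1."""
    return 0, 2 ** n_bits - 1


def to_bits(value, n_bits):
    """Encode a non-negative integer as an n-bit binary string."""
    low, high = unsigned_range(n_bits)
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {n_bits} unsigned bits")
    return format(value, f"0{n_bits}b")


def from_bits(bits):
    """Decode an unsigned binary string back to its decimal value."""
    return int(bits, 2)


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, unsigned_range(n))
    print(to_bits(2, 8))          # 00000010
    print(from_bits("11111111"))  # 255
```

Asking `to_bits` for a value outside the range raises an error, which mirrors the point that every numeric type has a limited value range.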

2. Signed numbers

Here, signed numbers mainly refer to positive and negative numbers and fractions. (The author treats the decimal point and the plus and minus signs as "symbols", so signed numbers are positive and negative numbers and fractions. If you understand the term differently, please do not be confused.)

Because the computer can only recognize the symbols 0 and 1, it cannot recognize mathematical symbols such as the plus and minus signs ("+", "-") or the decimal point ("."). These can only be replaced by 0 or 1. And because 0 and 1 represent numeric values by default, signs and the decimal point can only be described by reserving specific bit positions for them. This process is called the "digitization of symbols".

Values can be divided into two categories according to how the decimal point is handled: fixed-point and floating-point. A fixed-point number has its decimal point at a fixed position, and a floating-point number does not.

2.1. Fixed-point numbers

A fixed-point number is data whose decimal point position is fixed. It can represent integers and pure fractions (greater than 0 and less than 1, or greater than -1 and less than 0). For an integer, the decimal point is fixed at the right end of the value. For a pure fraction, it is fixed at the left end of the value.

Numbers can be positive or negative. To digitize the sign, the computer specifies that the highest bit of the binary sequence is the sign bit, which determines whether the value is positive or negative: "0" means the positive sign ("+") and "1" means the negative sign ("-"). Because the highest bit is a sign bit and carries no numeric weight, the apparent value of the bit pattern differs from its true value. For example, on an 8-bit machine, the binary value 00000001 represents +1 in decimal and 10000001 represents -1. Signed integers can be described by the following general format:

[Figure: signed integer]

"S" is the sign bit (0 means "+", 1 means "-") and occupies one bit. "M" is the magnitude and occupies the remaining bits. Together they represent a positive or negative binary value.

A fixed-point fraction is also stored with a sign. Its highest bit is the sign bit, the other bits are magnitude bits, and the decimal point is fixed between the sign bit and the magnitude bits. We will not dwell on it here because its representation is very similar to that of an integer. For example, 00000001 represents the binary fraction +0.0000001 (that is, $2^{-7}$), while 10000001 represents -0.0000001.

From the introduction of unsigned integers, we know that one byte can represent 256 different values, and the unsigned range is 0 to $2^8 - 1$ (0 to 255). For signed values, the highest bit of each pattern is treated as the sign bit and carries no magnitude, so the minimum value is 11111111 (-127) and the maximum value is 01111111 (+127). The range of signed integers in one byte is therefore -127 to +127 ($-2^7 + 1$ to $+2^7 - 1$). By the same reasoning, the range of signed fixed-point fractions in one byte is $-(1 - 2^{-7})$ to $+(1 - 2^{-7})$. In general, the range of signed values varies with the CPU word length of the machine.
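A small Python sketch of the sign-bit scheme just described may help. It assumes the layout from the text (one sign bit followed by magnitude bits); the function names are hypothetical and chosen only for this example.

```python
def encode_sign_magnitude(value, n_bits=8):
    """Encode an integer as a sign bit followed by n_bits - 1 magnitude bits."""
    limit = 2 ** (n_bits - 1) - 1          # 127 for 8 bits
    if abs(value) > limit:
        raise ValueError(f"{value} is outside -{limit}..+{limit}")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")


def decode_sign_magnitude(bits):
    """Decode a sign-and-magnitude bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


if __name__ == "__main__":
    print(encode_sign_magnitude(1))           # 00000001
    print(encode_sign_magnitude(-1))          # 10000001
    print(decode_sign_magnitude("11111111"))  # -127
    print(decode_sign_magnitude("01111111"))  # 127
```

Note that both 00000000 and 10000000 decode to zero, which is why this scheme covers only 255 distinct values in one byte.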

2.2. Floating-point representation

The preceding section mainly describes integer representation. For fractions, not only the sign but also the position of the decimal point must be taken into account. The topic is complex and is well covered in other documents, so only a brief introduction is given here.

Fractions can be divided into two categories: fixed-point fractions and floating-point fractions. As mentioned above, a fixed-point fraction has a fixed decimal point position, so its representation is very similar to that of an integer. This section describes the representation of floating-point numbers.

A floating-point number is one whose decimal point position is not fixed, so it cannot be represented the way a fixed-point fraction is. First, it can be positive or negative, so one bit must serve as the sign bit. Second, because the decimal point position varies, several bits must be used to describe that position. Third, it must represent the magnitude of the value. The general format is as follows:

[Figure: signed floating point number representation]

"S" is the sign bit (0 means "+", 1 means "-") and occupies one bit. "P" is the order code (exponent), which describes the position of the decimal point. "M" is the magnitude (mantissa), which occupies multiple bits. Together they represent a positive or negative binary floating-point value.

The order code of a floating-point number is usually expressed as a shift code (a biased value). In a computer, the order code is an n-bit integer that gives the exponent of 2 and can be signed. For an order code N with n bits, the shift code is defined as follows.

For an integer:

$$[N]_{\text{shift}} = 2^{n-1} + N \quad (-2^{n-1} \le N < 2^{n-1})$$

For a fixed-point fraction:

$$[N]_{\text{shift}} = 1 + N \quad (-1 \le N < 1)$$

N is the value of the order code, and n is the number of bits in the order code. Readers can explore the details further.
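As a rough illustration of the integer shift-code formula, the following Python sketch adds the offset $2^{n-1}$ when encoding and subtracts it when decoding. The function names are hypothetical.

```python
def to_shift_code(order, n_bits):
    """Shift code of an integer order: 2^(n-1) + N, for -2^(n-1) <= N < 2^(n-1)."""
    half = 2 ** (n_bits - 1)
    if not -half <= order < half:
        raise ValueError(f"{order} does not fit in a {n_bits}-bit shift code")
    return format(half + order, f"0{n_bits}b")


def from_shift_code(bits):
    """Recover the signed order by subtracting the offset 2^(n-1)."""
    return int(bits, 2) - 2 ** (len(bits) - 1)


if __name__ == "__main__":
    for order in (-8, -1, 0, 7):
        print(order, to_shift_code(order, 4))
```

A convenient property of this form is that larger orders always map to larger unsigned bit patterns, so orders can be compared as plain unsigned integers.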

In a computer, real numbers are represented by approximations. Similar to scientific notation in decimal, a floating-point number in the computer can be expressed as $N = 2^{P} \times S$, where P is the exponent (order code) of N and indicates the position of the decimal point, and S is the tail (mantissa) of N, which carries the sign and the significant digits. To represent floating-point numbers consistently, the standardized format is as follows:

[Figure: signed floating point number representation]

The order sign and the order code together indicate the decimal point position, and the order code must be an integer. The tail sign and the tail code indicate the precision. The tail code must be a fixed-point fraction whose absolute value is at least 0.1 in binary (0.5 in decimal) and less than 1. The order sign and the tail sign each occupy one bit, where 0 means positive and 1 means negative. The number of bits in the order code and the tail code varies with the machine and with the floating-point precision. For example, the two representations $0.1011 \times 2^{100}$ and $0.01011 \times 2^{101}$ (digits and exponents written in binary) denote the same value, but the machine has a limited number of bits. If the tail code has only four bits, the second form loses its last digit. To reduce this error, the absolute value of the tail code is required to be at least 0.1 in binary and less than 1. For example, on an 8-bit machine, the machine number corresponding to the binary floating-point number $N = 2^{10} \times 0.1010$ is 01001010:

[Figure: floating point number example]

Similarly, the machine number corresponding to $N = 2^{10} \times (-0.1010)$ is 01011010, and the machine number corresponding to $N = 2^{-10} \times (-0.1010)$ is 11011010.
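The three examples above are consistent with an 8-bit layout of one order-sign bit, two order-code bits, one tail-sign bit, and four tail-code bits. That layout is inferred from the examples (the original figure is not available), so the following Python sketch should be read under that assumption. The function names are hypothetical.

```python
from fractions import Fraction


def encode_toy_float(order, tail_bits, negative=False):
    """Pack: order sign | 2-bit order magnitude | tail sign | 4-bit tail code.

    order is an integer in -3..3; tail_bits holds the four binary digits
    after the point (for example "1010" stands for 0.1010 in binary).
    """
    if not -3 <= order <= 3:
        raise ValueError("order must fit in two magnitude bits")
    if len(tail_bits) != 4 or set(tail_bits) - {"0", "1"}:
        raise ValueError("tail_bits must be four binary digits")
    order_sign = "1" if order < 0 else "0"
    tail_sign = "1" if negative else "0"
    return order_sign + format(abs(order), "02b") + tail_sign + tail_bits


def decode_toy_float(bits):
    """Unpack the assumed 8-bit layout and return the exact value."""
    if len(bits) != 8:
        raise ValueError("expected 8 bits")
    order = int(bits[1:3], 2)
    if bits[0] == "1":
        order = -order
    tail = Fraction(int(bits[4:], 2), 16)  # 0.xxxx in binary
    if bits[3] == "1":
        tail = -tail
    return tail * Fraction(2) ** order


if __name__ == "__main__":
    print(encode_toy_float(2, "1010"))                  # 01001010
    print(encode_toy_float(2, "1010", negative=True))   # 01011010
    print(encode_toy_float(-2, "1010", negative=True))  # 11011010
    print(decode_toy_float("01001010"))                 # 5/2
```

Here the binary order 10 is the decimal integer 2, so the first example decodes to 0.625 × 4 = 2.5.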

The order code of a binary floating-point number determines the range of numbers that can be represented. Unless the number of bits it occupies is specified, the range of values cannot be determined. When the tail code of a binary floating-point number is 0, the machine number represents zero. When the order code takes its minimum value, the floating-point number is closest to zero and can be treated as approximately zero within the representable range. The tail code determines the precision of the floating-point number and is expressed as a fixed-point fraction. As with integers, the number of bits available to the tail code determines how many significant digits can be kept.

Numerical encoding and Computation

Values in a computer can be divided into two types: unsigned and signed. Unsigned values are 0 and the positive integers. Signed values are positive numbers, negative numbers, and 0. Encoding an unsigned value is simple: it is expressed directly as a binary sequence with the number of bits the machine specifies. Its arithmetic is also simple, because every bit takes part in the operation. A signed integer or fixed-point fraction must give up one bit as the sign bit, which leaves one fewer magnitude bit and makes the computation more complicated. A signed floating-point number needs not only a sign bit but also several bits to describe the decimal point position, so its arithmetic is more complicated still.

In practice, we mostly use signed values. So how does a computer store them and compute with them? Next we discuss how a computer performs arithmetic on signed values.

1. Original code, inverse code, and complement code

There are three encodings for signed values in the computer: original code, inverse code, and complement code (known in English as sign-magnitude, ones' complement, and two's complement). The following is a brief introduction.

1.1. Original code

The original code represents the sign of a value explicitly. In the binary sequence, the highest bit is the sign bit, where 0 means positive and 1 means negative, and all remaining bits give the magnitude of the value. For example, on an 8-bit machine, the original code of 1 is 00000001 and the original code of -1 is 10000001. Only the highest (leftmost) bit differs. It is easy to see that 0 has two original code representations: 00000000 and 10000000. Mathematically, the original code can be obtained with the following formulas:

• Original code of an integer:

$$[N]_{\text{orig}} = N \quad (0 \le N < 2^{n-1})$$

$$[N]_{\text{orig}} = 2^{n-1} + |N| \quad (-2^{n-1} < N \le 0)$$

• Original code of a fixed-point fraction:

$$[N]_{\text{orig}} = N \quad (0 \le N < 1)$$

$$[N]_{\text{orig}} = 2^{0} + |N| \quad (-1 < N \le 0)$$

Here N is the value to be represented as an n-bit binary machine number, and n is the machine word length.

The original code is the simplest encoding for signed data in a computer. It is mainly used for input and output. Because its bit pattern differs from the true value, it cannot be used directly in arithmetic. For a machine whose word length is n, the number of distinct patterns is $2^n$. The unsigned range that can be expressed is 0 to $2^n - 1$, and the signed range is $-2^{n-1} + 1$ to $+2^{n-1} - 1$.

Can the original code take part in arithmetic? Consider using it to compute -1 + 1. On an 8-bit machine, the original code of -1 is 10000001 and the original code of 1 is 00000001. Adding them gives 10000001 + 00000001 = 10000010, which converts to -2 in decimal. This is obviously wrong. Why? The highest bit of the original code is the sign of the value, not part of its magnitude, yet the addition above treats it as an ordinary digit. We therefore conclude that the original code cannot be used directly in arithmetic. If it must be, the sign bit has to be kept out of the operation, and the sign of the result has to be determined separately by comparing the magnitudes of the operands.
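The failed calculation can be reproduced with a short Python sketch. It simply adds the two bit patterns as if they were unsigned numbers, which is an assumption about what "letting the sign bit take part" means here. The function names are hypothetical.

```python
def encode_original(value, n_bits=8):
    """Original code: sign bit followed by the magnitude of the value."""
    if abs(value) > 2 ** (n_bits - 1) - 1:
        raise ValueError("value out of range")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")


def decode_original(bits):
    """Read an original-code bit string back as a signed integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def naive_add(a_bits, b_bits):
    """Add two bit patterns as plain unsigned numbers, keeping n bits."""
    n = len(a_bits)
    total = (int(a_bits, 2) + int(b_bits, 2)) % (2 ** n)
    return format(total, f"0{n}b")


if __name__ == "__main__":
    result = naive_add(encode_original(-1), encode_original(1))
    print(result, decode_original(result))  # 10000010 -2, not the expected 0
```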

1.2. Inverse code

The example above used the original code to compute -1 + 1 and obtained -2, which is wrong. We know that -1 + 1 = 0. The original code gives -2 mainly because its sign bit took part in the operation. When processing data, a computer can only treat each bit as a digit, and extracting the sign bit for separate handling is too complicated. After some research, a new representation of signed data emerged: the inverse code.

The inverse code is another way for a computer to represent a value, and it exists mainly to represent negative numbers. By definition, the inverse code of a positive number is the same as its original code. The inverse code of a negative number is obtained from its original code by inverting every bit except the sign bit. "Inverting" here is the logical NOT operation in binary: a bit that is 1 becomes 0, and a bit that is 0 becomes 1. For example, on an 8-bit machine, the inverse code of 1 is 00000001 and the inverse code of -1 is 11111110. It is easy to see that 0 has two inverse code representations: 00000000 and 11111111. Mathematically, the inverse code can be obtained with the following formulas:

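• Inverse code of an integer:

$$[N]_{\text{inv}} = N \quad (0 \le N < 2^{n-1})$$

$$[N]_{\text{inv}} = (2^{n} - 1) + N \quad (-2^{n-1} < N \le 0)$$

• Inverse code of a fixed-point fraction:

$$[N]_{\text{inv}} = N \quad (0 \le N < 1)$$

$$[N]_{\text{inv}} = (2 - 2^{-(n-1)}) + N \quad (-1 < N \le 0)$$

Here N is the value to be represented as an n-bit binary machine number, and n is the machine word length.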

The inverse code is defined relative to the original code, and the range of data it can represent is exactly the same as that of the original code. The difference is that the inverse code can take part in arithmetic directly. Let's look at an example. The inverse code of -1 is 11111110 and the inverse code of 1 is 00000001, so -1 + 1 in inverse code is 11111110 + 00000001 = 11111111. The corresponding original code is 10000000, which is -0 in decimal. Now let's compute 1 - 2. On an 8-bit machine, the original code form is 00000001 + 10000010, and the inverse code form is 00000001 + 11111101 = 11111110. The corresponding original code is 10000001, which is -1 as a signed decimal value. Clearly, the inverse code solves the problem that the sign bit of the original code cannot take part in arithmetic.
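The two inverse-code additions above can be checked with the following Python sketch. The function names are hypothetical. One detail goes beyond the article: when an addition produces a carry out of the highest bit, ones' complement arithmetic conventionally adds that carry back into the lowest bit (the "end-around carry"). Neither example in the text triggers it, but it is included so that sums such as 3 + (-1) also come out right.

```python
def encode_inverse(value, n_bits=8):
    """Inverse code: same as the magnitude for N >= 0, (2^n - 1) + N for N < 0."""
    if abs(value) > 2 ** (n_bits - 1) - 1:
        raise ValueError("value out of range")
    if value >= 0:
        return format(value, f"0{n_bits}b")
    return format((2 ** n_bits - 1) + value, f"0{n_bits}b")


def decode_inverse(bits):
    """Decode an inverse-code string; 11111111 (negative zero) decodes to 0."""
    raw = int(bits, 2)
    if bits[0] == "0":
        return raw
    return raw - (2 ** len(bits) - 1)


def add_inverse(a_bits, b_bits):
    """Add two inverse-code values, applying the end-around carry (assumption)."""
    n = len(a_bits)
    total = int(a_bits, 2) + int(b_bits, 2)
    if total >= 2 ** n:
        total = total - 2 ** n + 1  # wrap the carry back into the lowest bit
    return format(total, f"0{n}b")


if __name__ == "__main__":
    print(add_inverse(encode_inverse(-1), encode_inverse(1)))  # 11111111 (-0)
    print(add_inverse(encode_inverse(1), encode_inverse(-2)))  # 11111110 (-1)
```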

1.3. Complement code

You may have noticed that although the inverse code solves the essential problem of the original code (the sign bit could not take part in arithmetic), the result of computing -1 + 1 in inverse code is -0. This does not match real-world logic, because zero has no positive or negative form. The desired result is obtained, but not in a reasonable way. For a more reasonable operation, another data representation was introduced: the complement code. What is the complement code? Before answering, let's look at the concept of a "modulus".

In algebra, a modulus describes a system that wraps around like a ring. The simplest example is a clock. The dial counts from 1 to 12, so its modulus is 12. For example, 3 o'clock can be reached by turning back 9 hours from 12, or by turning forward 3 hours from 12. That is, on a clock, 12 - 9 = 3 and 12 + (12 - 9) = 15, which also reads as 3. Subtracting 9 can therefore be replaced by adding 3. It follows that every subtraction in a modular system can be converted to a corresponding addition.

From another point of view, on a counting device with modulus 12, the numbers 3 and 15 share a feature: both leave a remainder of 3 when divided by 12. We say that 3 and 15 are congruent. In a modulo-N counting device, two numbers A and B are congruent if they leave the same remainder when divided by N. Complements are also a concept of such a device: in a modulo-N system, A and N - A are a pair of complements, and their characteristic is that subtraction can be converted into addition, that is, X - A = X + (N - A) (mod N).

In the computer, the modulus is the counting range of a counting system. An n-bit machine counts over 0 to $2^n - 1$, so its modulus is $2^n$. The computer's modulus is thus very similar to the clock's. Using the idea that subtraction can be converted into addition, and in order to further simplify circuit design, the concept of the complement code was born.
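The idea of replacing subtraction with addition of a complement can be tried out directly. The following Python sketch uses a hypothetical helper `sub_via_add` and applies it first to the clock (modulus 12) and then to an 8-bit machine (modulus $2^8$).

```python
def sub_via_add(x, a, modulus):
    """Compute (x - a) mod modulus using only an addition of a's complement."""
    complement = modulus - a  # a and modulus - a are a pair of complements
    return (x + complement) % modulus


if __name__ == "__main__":
    # Clock: turning back 9 hours from 12 equals turning forward 3 hours.
    print(sub_via_add(12, 9, 12))                   # 3
    # 8-bit machine: 3 - 5 becomes 3 + 251, giving the bit pattern of -2.
    print(format(sub_via_add(3, 5, 2 ** 8), "08b"))  # 11111110
```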

The complement code is a way for the computer to represent a signed value. By definition, the complement code of a positive number is the same as its original code. The complement code of a negative number is obtained by adding 1 to the lowest bit of its inverse code (the sign bit is kept) and discarding any carry out of the highest bit. For example, on an 8-bit machine, the original code of 1 is 00000001 and its complement code is also 00000001. The original code of -1 is 10000001, its inverse code is 11111110, and its complement code is 11111111. We can see that 0 has only one complement code. Since zero has only one representation, the pattern 10000000, which would otherwise be negative zero, is used to represent -128 (in 8 bits, -128 has no corresponding original code or inverse code). Therefore, on an 8-bit machine, the complement code covers -128 to 127. For an n-bit machine, the range of the complement code differs slightly from that of the original code and the inverse code: it is $-2^{n-1}$ to $2^{n-1} - 1$, a total of $2^n$ values. Mathematically, the complement code can be obtained with the following formulas:

• Complement code of an integer:

$$[N]_{\text{comp}} = N \quad (0 \le N < 2^{n-1})$$

$$[N]_{\text{comp}} = 2^{n} + N \quad (-2^{n-1} \le N < 0)$$

• Complement code of a fixed-point fraction:

$$[N]_{\text{comp}} = N \quad (0 \le N < 1)$$

$$[N]_{\text{comp}} = 2 + N \quad (-1 \le N < 0)$$

Here N is the value to be represented as an n-bit binary machine number, and n is the machine word length.

The complement code is built on the inverse code and derives mainly from the concept of complements. Using it further simplifies binary computation in the computer, and the sign bit can take part in the computation, which is more reasonable than with the inverse code. Let's look at -1 + 1 in complement code. The original code form is 10000001 + 00000001, the inverse code form is 11111110 + 00000001, and the complement code form is 11111111 + 00000001 = 1 00000000. The carry out of the highest bit is discarded, leaving 00000000. Its sign bit is 0, so its original code is also 00000000, that is, 0.

From the above example, we can see that the complement code operation is more reasonable than the inverse code operation. For this reason, signed integer data in the computer is generally represented, stored, and computed in complement code.
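A minimal Python sketch of complement-code encoding, decoding, and addition follows, using the formula $2^n + N$ for negative values and discarding the carry out of the highest bit. The function names are hypothetical.

```python
def encode_complement(value, n_bits=8):
    """Complement code: N for N >= 0, 2^n + N for N < 0."""
    low, high = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}..{high}")
    return format(value % (2 ** n_bits), f"0{n_bits}b")


def decode_complement(bits):
    """Read a complement-code string back as a signed integer."""
    raw = int(bits, 2)
    return raw - 2 ** len(bits) if bits[0] == "1" else raw


def add_complement(a_bits, b_bits):
    """Add two complement-code values, discarding the carry out of the top bit."""
    n = len(a_bits)
    total = (int(a_bits, 2) + int(b_bits, 2)) % (2 ** n)
    return format(total, f"0{n}b")


if __name__ == "__main__":
    print(encode_complement(-1))    # 11111111
    print(encode_complement(-128))  # 10000000
    print(add_complement(encode_complement(-1), encode_complement(1)))  # 00000000
```

Unlike the inverse code, there is a single pattern for zero, and the pattern 10000000 is free to stand for -128.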

1.4. Summary

The range of values a computer can represent depends on the machine word length, and the range of a data type in a programming language depends on the number of bytes it occupies. The computer has three encodings for signed values: original code, inverse code, and complement code. The complement code is the one mainly used for calculation and storage. Because the sign bit takes part directly in the computation in complement code, every subtraction can also be converted to an addition.

• The original code takes the highest bit (the n-th bit) of signed binary data as the sign bit and the other n-1 bits as magnitude bits. If the sign bit is 0, the true value is positive. If the sign bit is 1, the true value is negative.

• For the inverse code, when the true value is positive, the inverse code is the same as the original code. When the true value is negative, the inverse code is the original code with its magnitude bits inverted.

• For the complement code, when the true value is positive, the complement code is the same as the original code. When the true value is negative, the complement code is the inverse code plus 1 in the lowest bit, and any carry beyond the machine word length is discarded.

• For a positive number, the original code, inverse code, and complement code are identical, and their magnitude bits equal the value itself. For a negative number, the sign bit is 1 in all three codes. The magnitude bits of the original code equal the absolute value, the magnitude bits of the inverse code are the bitwise inverse of those, and the magnitude bits of the complement code are those of the inverse code plus 1.

• The inverse code of the inverse code of a value equals its original code, and the complement code of the complement code of a value equals its original code.

• In a computer, signed data is represented in complement code. After a computation in complement code, the result is also in complement code and must be converted to obtain the true value. If the sign bit of the result is 0, the sign is positive ("+") and the magnitude bits stay unchanged. If the sign bit of the result is 1, the sign is negative ("-") and the magnitude bits are obtained either by subtracting 1 and then inverting, or by inverting and then adding 1.

2. Fixed-point calculation

Fixed-point calculation is relatively simple: the operation only needs to be performed in complement code. Take 3 - 5 on an 8-bit machine as an example. First, here are the original code, inverse code, and complement code of 3 and -5:

| True value | Original code | Inverse code | Complement code |
|---|---|---|---|
| 3 | 00000011 | 00000011 | 00000011 |
| -5 | 10000101 | 11111010 | 11111011 |

Computing 3 - 5 amounts to adding the complement codes of 3 and -5: 3 - 5 = 3 + (-5) = 00000011 + 11111011 = 11111110. Converting this result to inverse code gives 11111101, and converting that to original code gives 10000010, which is -2 in decimal.

In short, the computer performs fixed-point arithmetic by adding the corresponding complement codes and then converting the result to original code to obtain the final value.
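The table and the 3 - 5 walkthrough can be reproduced with the following Python sketch. The helper names `three_codes` and `complement_to_original` are made up for this illustration.

```python
def flip(bits):
    """Invert every bit of a binary string."""
    return "".join("1" if b == "0" else "0" for b in bits)


def three_codes(value, n_bits=8):
    """Return (original, inverse, complement) codes of a signed integer."""
    if abs(value) > 2 ** (n_bits - 1) - 1:
        raise ValueError("value has no original code at this width")
    magnitude = format(abs(value), f"0{n_bits - 1}b")
    if value >= 0:
        code = "0" + magnitude
        return code, code, code
    original = "1" + magnitude
    inverse = "1" + flip(magnitude)
    complement = format(int(inverse, 2) + 1, f"0{n_bits}b")
    return original, inverse, complement


def complement_to_original(bits):
    """Convert a complement-code result back to original code."""
    if bits[0] == "0":
        return bits
    if bits == "1" + "0" * (len(bits) - 1):
        raise ValueError("the most negative value has no original code")
    inverse = format(int(bits, 2) - 1, f"0{len(bits)}b")  # subtract 1
    return "1" + flip(inverse[1:])                         # then invert


if __name__ == "__main__":
    print(three_codes(3))   # ('00000011', '00000011', '00000011')
    print(three_codes(-5))  # ('10000101', '11111010', '11111011')
    total = (int(three_codes(3)[2], 2) + int(three_codes(-5)[2], 2)) % 256
    result = format(total, "08b")
    print(result, complement_to_original(result))  # 11111110 10000010
```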

3. Floating Point Calculation

Floating-point calculation is relatively complex. Here we only briefly introduce the idea behind it.

A floating-point number consists mainly of an order code and a tail code. The order code determines the range of the floating-point number, and the tail code determines its precision. Operands with different order codes therefore cannot be combined directly. When the operands have the same order code, the corresponding complement code operation can be performed directly on their tail codes, and the result is then converted to original code to give the final result. When the operands have different order codes, the first step is order matching: one operand is converted to the same order as the other, within the available precision, by dividing or multiplying its tail code by $2^1$ for each step of the order code. The tail codes are then combined as before, and the result is converted to original code to give the final result.
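The order-matching idea can be sketched as follows. This Python example is a simplification under stated assumptions: values are written as tail × 2^order, the tail is kept as an exact `Fraction` (a real machine would drop bits that fall off the end of the tail code), and the result is renormalized so that the tail is at least 1/2 and less than 1 in absolute value. The function name `align_and_add` is hypothetical.

```python
from fractions import Fraction


def align_and_add(order_a, tail_a, order_b, tail_b):
    """Add tail_a * 2^order_a and tail_b * 2^order_b after matching orders."""
    # Order matching: raise the smaller order, halving its tail at each step.
    while order_a < order_b:
        order_a += 1
        tail_a /= 2
    while order_b < order_a:
        order_b += 1
        tail_b /= 2
    order, tail = order_a, tail_a + tail_b
    # Normalization: keep 1/2 <= |tail| < 1 unless the sum is zero.
    while abs(tail) >= 1:
        tail /= 2
        order += 1
    while tail != 0 and abs(tail) < Fraction(1, 2):
        tail *= 2
        order -= 1
    return order, tail


if __name__ == "__main__":
    # 0.1010b * 2^2 (2.5) plus 0.1100b * 2^1 (1.5)
    order, tail = align_and_add(2, Fraction(5, 8), 1, Fraction(3, 4))
    print(order, tail, tail * 2 ** order)  # 3 1/2 4
```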

4. Overflow

"Overflow" is a commonly used term in computing. Simply put, it means that a value cannot be expressed in the storage available: the binary form of the value needs more bits than the machine can hold. More precisely, overflow occurs when an operation on two signed values produces a result outside the range of signed data that the machine can express. For an n-bit machine, the signed range in original code is $-2^{n-1} + 1$ to $+2^{n-1} - 1$ (this includes positive and negative zero, which we will not discuss further here), and any result outside this range overflows. How can the overflow problem be solved? Overflow is a property of the machine itself, so what we need is effective prevention.

When a carry occurs during addition and the carry goes beyond the length of the magnitude bits, "upper overflow" occurs. Likewise, when a borrow during subtraction goes beyond the length of the magnitude bits, "lower overflow" occurs.
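One way to guard against overflow is to check whether the wrapped n-bit result still equals the mathematical sum. The following Python sketch does this for complement-code addition, whose 8-bit range is -128 to 127. The function name is hypothetical, and the approach is only one possible check, not the mechanism a particular CPU uses.

```python
def add_with_overflow_check(a, b, n_bits=8):
    """Add two signed integers in n-bit complement code.

    Returns (wrapped_result, overflowed).
    """
    low, high = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    if not (low <= a <= high and low <= b <= high):
        raise ValueError("operands must fit in the given width")
    raw = (a + b) % (2 ** n_bits)  # what the n-bit adder keeps
    wrapped = raw - 2 ** n_bits if raw >= 2 ** (n_bits - 1) else raw
    return wrapped, wrapped != a + b


if __name__ == "__main__":
    print(add_with_overflow_check(3, -5))     # (-2, False)
    print(add_with_overflow_check(127, 1))    # (-128, True)
    print(add_with_overflow_check(-128, -1))  # (127, True)
```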

Non-numeric data encoding

The information processed by a computer includes numeric values, text, symbols, speech, graphics, and images. All information in a computer must be transmitted, stored, and processed in binary-encoded form, so whatever the information is, it has to be converted into a binary encoding. With a small number of binary digits and certain combination rules, encoding can express a large amount of complex and varied information. The following describes three encoding methods: decimal value encoding, English character encoding, and Chinese character encoding. This is only a brief introduction, and the underlying principles deserve further study.

1. BCD code

The computer represents all data in binary, while people are used to decimal. Besides the machine numbers that represent binary values, a computer system sometimes needs to use a group of binary digits to represent a decimal number so that decimal values can be represented and processed directly. To meet this need, the BCD code was born: it uses binary codes to represent decimal digits.

BCD (Binary-Coded Decimal) code uses a four-bit binary code to represent one decimal digit. The four-bit code is the true value of the corresponding decimal digit. From high to low, the weights of the four bits are $2^3$, $2^2$, $2^1$, and $2^0$, which is why it is also called the 8421 code. The correspondence between the 8421 code and the decimal digits is as follows:

[Figure: 8421 BCD code table]

| Decimal digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 8421 code | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001 |

For example, the BCD code of decimal 3 is 0011, and the BCD code of decimal 931 is 1001 0011 0001. This is an encoding of the decimal digits, not the binary original code of the number. If the computer uses BCD code for decimal values, the values are stored in BCD form, and the computer needs built-in BCD encoding and decoding logic to process them.
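The digit-by-digit nature of the 8421 code is easy to see in code. The following Python sketch uses hypothetical helpers `to_bcd` and `from_bcd`, with each decimal digit written as its own four-bit group.

```python
def to_bcd(number):
    """Encode a non-negative integer as space-separated 8421 BCD groups."""
    if number < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return " ".join(format(int(digit), "04b") for digit in str(number))


def from_bcd(code):
    """Decode space-separated 8421 BCD groups back to an integer."""
    digits = []
    for group in code.split():
        value = int(group, 2)
        if value > 9:
            raise ValueError(f"{group} is not a valid 8421 digit")
        digits.append(str(value))
    return int("".join(digits))


if __name__ == "__main__":
    print(to_bcd(3))                   # 0011
    print(to_bcd(931))                 # 1001 0011 0001
    print(from_bcd("1001 0011 0001"))  # 931
    print(format(931, "b"))            # 1110100011, the plain binary form
```

The last line shows that the BCD code of 931 is different from its ordinary binary representation.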

2. ASCII Encoding

Because the computer uses high and low voltage levels to represent 1 and 0, it can only store and transmit binary numbers. To express digits, letters, and common symbols in a unified way, another encoding that became an international standard was created: ASCII.

ASCII (American Standard Code for Information Interchange) is a computer coding system based on the Latin alphabet. It is used to encode modern English and other Western European languages. It is a single-byte character encoding scheme, used mainly to encode text data and to exchange information between computers and between computers and peripherals.

ASCII stores each character in one 8-bit byte, which can represent 128 or 256 possible characters; this is why it is called a single-byte encoding. In standard ASCII the highest bit is fixed at 0, and the remaining 7 bits represent all the uppercase and lowercase letters, the digits 0-9, the punctuation marks, and the special control characters used in American English. With the highest bit fixed at 0, the remaining 7 bits give 2^7 = 128 valid codes, corresponding to the decimal values 0 to 127. The following table maps the standard ASCII codes to characters:

[Figure: Standard ASCII encoding table]

In the table, the first row gives the first four bits of the code and the first column gives the last four bits. Combining the two gives the code of the symbol in the highlighted area, and converting the combined value to decimal gives a number from 0 to 127. The 33 codes 0 to 31 and 127 are computer control characters or special communication characters. The 95 codes 32 to 126 are the printable characters (SP denotes the space character), that is, characters with a specific visible form. Among the printable characters, 48 to 57 are the ten digits 0 to 9, 65 to 90 are the 26 uppercase English letters, 97 to 122 are the 26 lowercase English letters, and the rest are common punctuation marks and operators. For example, the string "Hello" in ASCII is stored in memory as 01001000 01100101 01101100 01101100 01101111.
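The "Hello" example and the code ranges can be checked with a short Python sketch. It relies on the built-in `str.encode("ascii")`; the `classify` helper is hypothetical and exists only to name the ranges described in the text.

```python
text = "Hello"

# str.encode("ascii") produces one byte per character for standard ASCII text.
codes = list(text.encode("ascii"))
bits = " ".join(format(c, "08b") for c in codes)

print(codes)  # [72, 101, 108, 108, 111]
print(bits)   # 01001000 01100101 01101100 01101100 01101111


def classify(code: int) -> str:
    """Name the standard ASCII range a code value falls in (illustrative helper)."""
    if not 0 <= code <= 127:
        raise ValueError("standard ASCII covers 0..127 only")
    if code <= 31 or code == 127:
        return "control"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "punctuation/space"


print(classify(ord("H")))  # uppercase
print(classify(127))       # control
```

Every byte in the output starts with 0, which is the fixed highest bit of standard ASCII.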

Standard ASCII can represent only 128 characters, which is far from enough in practice, so the highest bit is also used as an encoding bit. This adds another 128 codes, called extended ASCII, which can represent most Western European symbols. Because ASCII is so widely used, single-byte characters are commonly referred to as ASCII characters.

3. Chinese character encoding

Chinese characters are pictographic, with one syllable per character; there are a great many of them, and their shapes are complex. Representing a Chinese character as a binary sequence in the computer is therefore much more complicated than ASCII. For this reason, the codes used for Chinese characters during input, output, storage, and processing are not the same.

Like everything else, Chinese characters must be represented in the computer by fixed binary codes. According to their purpose, Chinese character codes fall into four types: external code (input code), interchange code, internal code (machine code), and font code (glyph code).

3.1. External code (input code)

An external code, also called an input code, is the group of keyboard symbols used to enter a Chinese character into the computer, that is, the sequence of characters typed on the keyboard. Common input codes include pinyin, Wubi (five-stroke), Ziran (natural) code, table-shape code, cognitive code, zone-position code, and telegraph code.

There are many Chinese input methods, and the external code is managed by the input method in use. For example, typing zhong1 in pinyin selects the character "中" (zhong, "middle"), so "zhong1" is an external code of "中". An external code is only a way of identifying a Chinese character so that the input method can look it up. However the external code of a character varies from one input method to another, its internal code (a binary number) stays the same. After the external code is typed, the input method program converts it into the corresponding internal code.

3.2. Interchange code (national standard code)

The computer represents a Chinese character in binary, but raw binary is inconvenient to work with, so a more convenient code that maps onto it is needed. This is the interchange code. The Chinese character information interchange code is commonly called the national standard code ("Country Code"), that is, the GB2312-80 encoding standard. The standard contains 6763 commonly used Chinese characters (3755 level-1 characters and 3008 level-2 characters), plus 682 other characters such as English, Russian, and Japanese letters and various symbols, for a total of more than 7000 symbols.

The encoding rules of the national standard code are as follows. Each Chinese character is encoded in 2 bytes (16 bits). The highest bit of each byte is "0", and the remaining 7 bits form the code values. To avoid colliding with ASCII control characters, each byte excludes the 34 codes occupied by the control characters and the space, leaving 94 usable values per byte. The two bytes form a two-dimensional structure: the first byte is called the "zone" and the second byte is called the "position", so each character can also be identified by its zone-position code. This gives 94 × 94 = 8836 code positions for Chinese characters and other symbols. A little over 7000 are occupied, and the rest are reserved.
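The two-dimensional zone/position layout can be sketched in Python. This is a minimal illustration that assumes the usual GB2312 convention: zone and position numbers each run from 1 to 94, and adding 0x20 (32) to each skips the ASCII control characters and the space. The function name `zone_position_to_national` is hypothetical.

```python
ZONES = 94
POSITIONS = 94
TOTAL_POSITIONS = ZONES * POSITIONS  # 94 x 94 code positions

OFFSET = 0x20  # assumption: skips the control characters 0x00-0x1F and the space 0x20


def zone_position_to_national(zone: int, position: int) -> bytes:
    """Map a zone-position pair (each 1..94) to the 2-byte national standard code."""
    if not (1 <= zone <= ZONES and 1 <= position <= POSITIONS):
        raise ValueError("zone and position must each be in 1..94")
    return bytes([zone + OFFSET, position + OFFSET])


print(TOTAL_POSITIONS)                           # 8836
print(zone_position_to_national(16, 1).hex())    # 3021
print(zone_position_to_national(94, 94).hex())   # 7e7e
```

Both bytes always fall in 0x21 to 0x7E, so the highest bit of each byte is 0, as the rules above require.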

3.3. Internal code (machine code)

The internal code of a Chinese character is the code the computer actually uses to store and process that character. Whatever input code is used to enter a character, it must be converted into one consistent internal code so that storage and processing are uniform.

The encoding rule for the internal code is: starting from the national standard code, change the highest bit of each of the two bytes from "0" to "1". A byte whose highest bit is "1" is treated as part of a Chinese character, and a byte whose highest bit is "0" is treated as an ASCII character. As a result, the interchange code and the internal code of a Chinese character differ, while for an ASCII character the two are identical.
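The highest-bit rule can be sketched in Python and checked against Python's built-in `gb2312` codec, which produces the internal-code bytes. The helper names below are hypothetical; only `str.encode("gb2312")` is a standard library feature.

```python
def national_to_internal(national: bytes) -> bytes:
    """Set the highest bit of each byte (national standard code -> internal code)."""
    return bytes(b | 0x80 for b in national)


def internal_to_national(internal: bytes) -> bytes:
    """Clear the highest bit of each byte (internal code -> national standard code)."""
    return bytes(b & 0x7F for b in internal)


internal = "中".encode("gb2312")    # bytes as stored by the gb2312 codec
national = internal_to_national(internal)

print(internal.hex())  # d6d0
print(national.hex())  # 5650

# In mixed text, the highest bit tells ASCII bytes and Chinese bytes apart.
mixed = "A中".encode("gb2312")
kinds = ["chinese" if b & 0x80 else "ascii" for b in mixed]
print(kinds)  # ['ascii', 'chinese', 'chinese']
```

The ASCII letter "A" keeps its single byte 0x41 unchanged, while each byte of the Chinese character has its highest bit set.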

3.4. Font code (glyph code)

The font code is the output code of a Chinese character, also known as the glyph code used to display and print Chinese characters. The computer outputs Chinese characters as graphics. However many strokes a character has, it can be written inside a square block of the same size. Chinese characters are typically displayed as a 16 × 16 dot matrix.
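A 16 × 16 dot matrix holds 256 dots, one bit each, so one glyph takes 32 bytes. The Python sketch below illustrates the idea with a hand-made plus-shaped bitmap. The bitmap is invented for this illustration and is not data from any real font; the row layout (2 bytes per row, most significant bit on the left) is likewise an assumption.

```python
ROWS = 16
COLS = 16


def make_cross_glyph() -> bytes:
    """Build a hand-made 16x16 plus-shaped bitmap (illustrative, not real font data)."""
    rows = []
    for r in range(ROWS):
        if r in (7, 8):
            row = 0b0111111111111110  # horizontal bar
        elif 1 <= r <= 14:
            row = 0b0000000110000000  # vertical bar
        else:
            row = 0                   # blank top and bottom margin
        rows.append(row.to_bytes(2, "big"))
    return b"".join(rows)


def render(glyph: bytes) -> str:
    """Render a 32-byte glyph as text, '#' for a set dot and '.' for a clear dot."""
    if len(glyph) != ROWS * COLS // 8:
        raise ValueError("expected 32 bytes for a 16x16 glyph")
    lines = []
    for r in range(ROWS):
        row = int.from_bytes(glyph[2 * r:2 * r + 2], "big")
        lines.append("".join("#" if row & (1 << (COLS - 1 - c)) else "."
                             for c in range(COLS)))
    return "\n".join(lines)


glyph = make_cross_glyph()
print(len(glyph))  # 32
print(render(glyph))
```

A real font file stores one such bitmap per character, and the display routine looks up the bitmap for a given internal code before drawing it.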

Summary

The computer represents the binary digits 1 and 0 with the voltage levels of its electronic components. Any information stored in the computer must take the form of binary digits before it can be processed. The raw binary sequence is called the machine code of the information, and binary numbers can be computed on cleanly only in complement form. To describe a wider range of symbols, many symbol encoding standards exist. Encoding is the process of converting any symbol into a binary sequence that the computer can recognize.

Finally, much of this article may seem elementary to readers who already know the subject. I came to IT engineering from another field and have only a limited grasp of computer fundamentals. The content here was summarized from a variety of documents and certainly contains mistakes, so please don't laugh; criticism and corrections are welcome.
