Chapter II Representation and processing of information

Last Update:2015-10-08 Source: Internet

Author: User

Tags integer division modulus


1. Three important numerical representations

(1) Unsigned numbers, signed numbers, floating point

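For a positive number, the sign-magnitude (original code), ones' complement (inverse code), and two's complement representations are all the number itself.
For a negative number, the sign-magnitude form sets the sign bit to 1 and keeps the magnitude, the ones' complement inverts every bit of the sign-magnitude form except the sign bit, and the two's complement is the ones' complement plus 1.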

(2) Why two's complement is used

It unifies the representations of +0 and -0.
In sign-magnitude, the binary representation of +0 is 0 000 0000 and that of -0 is 1 000 0000.
In ones' complement, the binary representation of +0 is 0 000 0000 and that of -0 is 1 111 1111.
In two's complement, the binary representation of +0 is 0 000 0000, while -0 would be 1 111 1111 + 1 = 1 0000 0000; because the computer truncates the result and keeps only the low 8 bits, the two's complement representation of 0 is always 0000 0000.
Two's complement covers a larger range than sign-magnitude or ones' complement. With 8 bits it represents -128 to 127: 0 to 127 are encoded as 00000000 to 01111111, -127 to -1 as 10000001 to 11111111, and the remaining pattern 10000000 denotes -128.
Arithmetic on signed integers can treat the sign bit as an ordinary value bit.

If the sign bit had to be handled separately, CPU instructions would need to examine the highest bit explicitly, complicating the most basic circuitry of the computer.

(3) Integer overflow vulnerabilities

Reference: integer overflow and program security

An integer has a fixed length, so the maximum value it can store is fixed; an integer overflow occurs when a program attempts to store a value greater than this maximum.
In C, signed integer overflow results in "undefined behavior". Most compilers do not check for integer overflow, so an overflow is not immediately noticeable and the program has no direct way to tell whether previously computed results are correct. This can lead to several classes of bugs, including buffer overflows.

2. Information storage

(1) Decimal, binary, octal, and hexadecimal conversion exercises

(2) Words

Each computer has a word size that indicates the nominal size of integer and pointer data. Because a virtual address is encoded in one such word, the most important system parameter determined by the word size is the maximum size of the virtual address space.
For a machine with a word size of w bits, virtual addresses range from 0 to 2^w-1, so a program can access at most 2^w bytes.

(3) gcc -m32 / -m64

Without the -m32 or -m64 option, gcc normally generates code that matches the bit width of the operating system.
gcc -m32 generates 32-bit code on a 64-bit machine, for example in the lab environment.

(4) Byte order

Byte order is fundamental to network programming. It refers to the order in which the bytes of a multi-byte data type are stored in memory; the two usual byte orders are little endian and big endian.

In little-endian byte order, the low-order byte is stored at the lowest memory address and the high-order byte at the highest memory address.
In big-endian byte order, the high-order byte is stored at the lowest address and the low-order byte at the highest address.

(5) Boolean algebra

Logical operations
The result is 1 or 0.
Every logical function can be expressed with AND, OR, and NOT (as a sum of minterms or a product of maxterms), and AND, OR, and NOT can themselves be built from NAND alone or from NOR alone, so a single operation is enough to implement all logical operations.
```
Logical AND (&&): 0 if either operand is 0; logical OR (||): 1 if either operand is 1; logical NOT (!): 0 becomes 1, and 1 becomes 0.
```
Bitwise operations

The result is a bit vector.

Bitwise AND (&): each result bit is 0 if either input bit is 0; bitwise OR (|): each result bit is 1 if either input bit is 1; bitwise XOR (^): 0^0=0, 0^1=1, 1^0=1, 1^1=0; bitwise NOT (~): every bit is inverted.

Mask operations

Masking is an important application of bitwise operations. A mask is a bit pattern that selects a set of bits within a word; it can be used to set specific bits to 1 or to clear them to 0.

Logical operations versus bitwise operations
The result of a logical operation is 1 or 0, whereas the result of a bitwise operation is a bit vector.
If evaluating the first operand is enough to determine the result of the expression, a logical operator does not evaluate its second operand.

3. Integer representation

(1) Integer data types

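- char: character data, 1 byte
- unsigned char: unsigned character data, 1 byte
- short: short integer data, 2 bytes
- unsigned short: unsigned short integer data, 2 bytes
- int: integer data, 2 bytes
- unsigned int: unsigned integer data, 2 bytes
- long: long integer data, 4 bytes
- unsigned long: unsigned long integer data, 4 bytes

These sizes are those of a 16-bit compiler. The C standard only fixes minimum sizes, and on common 32-bit and 64-bit platforms int occupies 4 bytes.

The data type long long was introduced in ISO C99 (compile with gcc -std=c99).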

(2) Encoding of unsigned numbers

Every bit vector of length w corresponds to a unique value; conversely, every integer between 0 and 2^w-1 has a unique binary representation as a bit vector of length w.

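(3) Two's complement encoding

The most common representation of signed integers is the two's complement form.

Three features of the two's complement form:

1) The two's complement range is asymmetric: |TMin| = |TMax| + 1, that is, the minimum value has no positive counterpart, which easily leads to subtle bugs in programs.
2) For the same number of bits, the maximum unsigned value is exactly twice the maximum two's complement value plus 1: UMax = 2TMax + 1.
3) In two's complement, -1 has the same bit representation as UMax: a string of all ones.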

The ANSI C standard does not require signed integers to be represented in two's complement, but almost all machines do so. Two's complement exploits the fixed width of registers to simplify arithmetic, reducing the arithmetic operations to addition so that they can all be carried out with an adder.

(4) Two other standard representations of signed integers

Ones' complement: the same as two's complement except for the weight of the most significant bit, which is -(2^(w-1)-1) rather than -2^(w-1).
Sign-magnitude: the most significant bit is a sign bit that determines whether the remaining bits are given negative or positive weight.

Both of these representations have a curious property: there are two encodings of the number 0. In both, [00..0] is interpreted as +0, while -0 is represented as [11..1] in ones' complement and as [10..0] in sign-magnitude.
Although machines based on ones' complement have existed, almost all modern machines use two's complement.
Sign-magnitude encoding is used in the representation of floating-point numbers.

(5) Conversion between signed and unsigned numbers

On almost all C implementations, a cast between unsigned and signed integers of the same size leaves the bit pattern of the object unchanged and only changes how the bits are interpreted. The general rule for converting between signed and unsigned numbers of the same word size is: the numeric value may change, but the bit pattern does not.

T2U: conversion from two's complement to unsigned.
U2T: conversion from unsigned to two's complement.

When a signed number is converted to unsigned, negative numbers become large positive numbers (the original value plus 2^w), while nonnegative numbers remain unchanged.
```
x < 0:  T2Uw(x) = x + 2^w
x >= 0: T2Uw(x) = x
(w is the number of bits in the data type)
```
When an unsigned number is converted to signed, small numbers keep their value and large numbers become negative (the original value minus 2^w).
```
u < 2^(w-1):  U2Tw(u) = u
u >= 2^(w-1): U2Tw(u) = u - 2^w
(w is the number of bits in the data type)
```

(6) Signed and unsigned numbers in C

Constants are treated as signed by default; to create an unsigned constant, add the suffix 'U' or 'u'.
C allows conversion between unsigned and signed numbers, and the principle of the conversion is that the underlying bits remain unchanged.
```
Explicit conversion: a cast. Implicit conversion: an expression of one type is assigned to a variable of another type.
```
When C evaluates an expression that mixes unsigned and signed operands, it implicitly converts the signed operand to unsigned, in effect assuming both operands are nonnegative, and then performs the operation. This makes little difference for standard arithmetic operations, but for relational operators such as "<" and ">" it can produce counterintuitive results.
Example: when the signed number -1 and the unsigned number 0U are compared, -1 is implicitly converted to the unsigned value 4294967295 (assuming a 32-bit two's complement machine), so the negative number becomes a large positive one and the expression "-1 < 0u" evaluates to 0.

Reference: explicit type conversion (casts) in C

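(7) Extending the bit representation of a number

Zero extension: to convert an unsigned number to a larger data type, simply add leading zeros in front of the most significant bit.
Sign extension: to convert a two's complement number to a larger data type, add copies of the most significant bit to the representation.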

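(8) Truncation of numbers

Truncating a number may change its value, which is also a form of overflow.

Unsigned number x: truncating it to k bits yields x mod 2^k.
Signed number x: first truncate its bit pattern as if it were unsigned, then convert the k-bit result to a signed number.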

4. Integer arithmetic

(1) Unsigned addition

"Word size inflation"
In a series of successive additions, the sum of the previous result and the current operand may exceed the range of the current word size; to store the full result, the word size would have to grow.
Continued "word size inflation" means that representing the results of arithmetic exactly would place no limit on the word size. Some programming languages, such as Lisp, support arbitrary-precision arithmetic, allowing integers of any length. More commonly, programming languages support only fixed-precision (fixed word size) arithmetic.
Unsigned arithmetic can be viewed as a form of modular arithmetic. Unsigned addition is equivalent to computing the sum modulo 2^w, which is done simply by discarding the bits that exceed the word size.
Arithmetic overflow
- The full integer result does not fit within the word size of the data type.
- When a C program runs, overflow is not signaled as an error.
- To decide whether an unsigned addition s = x + y (s, x, y unsigned) has overflowed, a reliable test is s < x (or, equivalently, s < y).
- Modular addition forms a mathematical structure called an abelian group: it is commutative and associative, has the identity element 0, and every element has an additive inverse.

(2) Two's complement addition

Most computers use the same machine instruction for unsigned and signed addition.
Two's complement addition likewise discards the bits of the result that exceed the word size; the bit pattern equals that of the modular (unsigned) sum, but the signed value changes sign when overflow occurs.
```
Take z = x + y as an example, where x, y, and z are signed numbers of w bits.
```
- Positive overflow: x + y >= 2^(w-1)
```
z = x + y - 2^w; x and y are positive, but the final result z is negative.
```
- Negative overflow: x + y < -2^(w-1)
```
z = x + y + 2^w; x and y are negative, but the final result z is nonnegative.
```

(3) Two's complement negation

Two's complement negation
For x in the range [-2^(w-1), 2^(w-1)), negation distinguishes two cases:
```
x = -2^(w-1): the result is -2^(w-1); x > -2^(w-1): the result is -x
```
Bit-level two's complement negation
Method 1: complement every bit, then add 1 to the result.
Method 2: let k be the position of the rightmost 1, then complement all the bits to the left of position k.

(4) Multiplication

Unsigned multiplication in C is defined as the low-order w bits of the full integer product, which is equivalent to taking the product modulo 2^w.
Signed multiplication on two's complement machines is generally carried out by truncating the 2w-bit product to w bits, that is, reducing it mod 2^w, and interpreting the result as a two's complement number.

For unsigned and two's complement multiplication, the bit-level representation of the truncated product is the same.

(5) Multiplication by constants

On most machines, the multiply instruction is much slower than addition, subtraction, bitwise operations, and shifts. An important compiler optimization is therefore to replace multiplication by a constant factor with a combination of shifts and additions.
When the constant is 2^k, the operand is simply shifted left by k bits.
When the constant is not an integer power of 2, the constant c is written as a sum of integer powers of 2 and the product is computed with a combination of shifts and additions.

For an unsigned variable x

x << k is equivalent to x * 2^k. (In particular, 1U << k computes 2^k.)

For a two's complement number x
```
x << k is equivalent to x * 2^k.
```
At the bit level, this equivalence holds even when the operation overflows.

(6) Division by powers of 2

On most machines, division is even slower than multiplication. When the divisor is an integer power of 2, division can be done with a right shift, and the shift must distinguish between unsigned and two's complement numbers.
```
Note: integer division always rounds toward zero.
```
Unsigned numbers, logical right shift: dividing an unsigned number by 2^k is equivalent to a logical right shift by k bits.
Two's complement numbers, arithmetic right shift: here the sign of the number matters, because integer division always rounds toward zero. Unsigned numbers are never negative, so there is nothing to worry about, but two's complement numbers can be positive or negative: positive numbers must round down toward zero and negative numbers must round up toward zero, whereas an arithmetic right shift alone always rounds down. Negative values therefore need a bias added before the shift.
```
That is: when x >= 0, dividing by 2^k is equivalent to an arithmetic right shift of x by k bits; when x < 0, first add 2^k - 1 to x, then shift right arithmetically by k bits.
```
Unlike multiplication, this right-shift method cannot be generalized to division by an arbitrary constant c.

(7) Reflections on integer arithmetic

The "integer arithmetic" performed by a computer is actually modular arithmetic.
The finite word size used to represent numbers limits the range of values they can take.
Whether the operands are in unsigned or two's complement form, the operations have exactly the same or very similar bit-level behavior.

5. Floating point

(1) Overview

Floating-point representation encodes rational numbers of the form V = x * 2^y.
It is suited to very large numbers (|V| >> 0), numbers very close to 0 (|V| << 1), and, more generally, approximations of real-number arithmetic.
IEEE floating-point standard: IEEE Standard 754

(2) Binary fractions

Digits to the left of the binary point have weights of the form 2^i (i >= 0), and digits to the right have weights of the form 1/2^i (i >= 1).
Increasing the length of the binary representation increases the precision of the representation.

(3) IEEE floating-point format

The IEEE floating-point standard represents a number in the form V = (-1)^s * M * 2^E:
Sign: s determines whether the number is negative (s = 1) or positive (s = 0); the interpretation of the sign bit for the value 0 is handled as a special case.
Significand (mantissa): M is a binary fraction ranging either from 1 to 2-ε or from 0 to 1-ε, where ε = 2^-n.
Exponent: E weights the value by a power of 2, namely 2^E (E may be negative).
The bit representation of a floating-point number is divided into three fields that encode these values:
- The single sign bit s directly encodes the sign s.
- The k-bit exponent field exp = e(k-1) ... e1 e0 encodes the exponent E.
- The n-bit fraction field frac = f(n-1) ... f1 f0 encodes the significand M, but the value encoded also depends on whether the exponent field equals 0.
Floating-point values fall into three cases:
- Normalized values
- Denormalized values
- Special values

a) Normalized values

This case applies when the bit pattern of exp is neither all zeros nor all ones; it is the most common case.

The exponent field is interpreted as a signed integer in biased form.
Exponent: E = e - Bias
e: the unsigned integer with bit representation e(k-1) ... e1 e0
Bias: the bias value, Bias = 2^(k-1) - 1
The fraction field frac is interpreted as the fractional value f, with the binary point to the left of the most significant bit of the field, so 0 <= f < 1.
Significand: M = 1 + f (sometimes called the implied leading 1 representation)

b) Denormalized values

These are the numbers whose exponent field is all zeros.

Exponent: E = 1 - Bias
Significand: M = f (the value of the fraction field, without the implied leading 1)
Purpose:
- Provide a way to represent the value 0.
- Represent numbers very close to 0.0, providing the property of "gradual underflow".

c) Special values

Special values occur when the exponent field is all ones.

When the fraction field is all zeros, the value represents infinity: positive infinity if the sign bit is 0, negative infinity if the sign bit is 1.
When the fraction field is nonzero, the value is "NaN" (not a number). Such values denote results that are not real numbers, such as (-1)^0.5, the square root of -1.

(4) Rounding

Because the representation has limited range and precision, floating-point arithmetic can only approximate real arithmetic.
The task of rounding: for a value x, find the closest matching value x' that can be represented in the desired floating-point format.

The IEEE floating-point format defines four rounding modes:

- Round-to-even (default): rounds to the nearest representable value; in halfway cases it rounds up or down so that the least significant digit of the result is even. It also applies to binary fractions.
- Round-toward-zero: rounds positive numbers down and negative numbers up.
- Round-down: rounds both positive and negative numbers down.
- Round-up: rounds both positive and negative numbers up.

The default round-to-even mode finds the closest match; the other three modes produce guaranteed bounds on the actual value.

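(5) Floating-point arithmetic

Floating-point addition

- Commutative
- Not associative: the most important group property that is missing
- Most values have an inverse under floating-point addition; the exceptions are infinities and NaN
- Satisfies monotonicity

Floating-point multiplication

- Commutative
- Not associative: overflow may occur, or precision may be lost through rounding
- The multiplicative identity is 1.0
- Does not distribute over addition
- Satisfies monotonicity under certain conditions (unsigned and two's complement multiplication have no such monotonicity properties)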

(6) Floating-point numbers in C

C provides two floating-point types, float and double; on machines that support the IEEE floating-point standard they correspond to single-precision and double-precision floating point.
The C standard does not require the machine to use the IEEE standard to represent floating-point numbers.

Casts between int, float, and double

- int → float: the number cannot overflow, but it may be rounded
- int/float → double: the exact value is preserved
- double → float: because float has a smaller range and lower precision, the value may overflow to ±∞ or may be rounded
- float/double → int: the value is rounded toward zero and may overflow

Problems encountered and solutions

1. How is Perl code actually compiled and run?

1) Create XXX.pl and edit it.
2) Run perl XXX.pl.
3) Run ./XXX.pl.

2. General rule for converting between signed and unsigned numbers of the same length

The numeric value may change, but the bit pattern does not.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
