1.1.1. Information Storage
Information processed and stored by computers is represented by binary symbols. A single binary digit (a bit) carries little meaning on its own, but when bits are combined and given an interpretation, they can represent the information we want. The way bits are combined and interpreted is the encoding method. Let's first look at the three most important numeric encodings:
Unsigned encoding: the traditional binary representation, used for numbers greater than or equal to zero.
Two's-complement encoding: the most common way to represent signed integers.
Floating-point encoding: a base-2 version of scientific notation for representing real numbers.
A computer uses a limited number of bits to encode a number. When the result of an operation exceeds the representable range, the operation overflows.
1.1.1.1. Basic concepts:
Byte: a block of 8 bits, the smallest addressable unit of memory.
Virtual memory: machine-level programs view memory as a very large array of bytes.
Address: each byte in memory is identified by a unique number.
Virtual address space: the set of all possible addresses.
Word: each computer has a word size, indicating the nominal size of integer and pointer data. On a machine with an n-bit word size, virtual addresses range from 0 to 2^n - 1, so a program can access at most 2^n bytes.
A pointer in C, no matter what it points to, holds the virtual address of the first byte of a block of storage. The C compiler associates type information with each pointer, so it can generate different machine-level code, depending on the pointer type, to access the value stored at the location the pointer designates. The C compiler maintains this type information, but the machine-level program it generates contains no information about data types. It simply treats each program object as a block of bytes, and the program itself as a sequence of bytes.
Sizes of the numeric data types in C (unit: bytes)
C declaration | Typical 32-bit machine | Compaq Alpha machine
char          | 1                      | 1
short int     | 2                      | 2
int           | 4                      | 4
long int      | 4                      | 8
char *        | 4                      | 8
float         | 4                      | 4
double        | 8                      | 8
1.1.1.2. Hexadecimal
The value of a byte ranges from 00000000 to 11111111 in binary, which is 00 to FF in hexadecimal. In C, numbers starting with 0x or 0X are interpreted as hexadecimal values.
To convert a decimal number x to hexadecimal, divide repeatedly by 16: [x = q_0 * 16 + r_0] -> [q_0 = q_1 * 16 + r_1] -> ... -> [q_(n-1) = 0 * 16 + r_n]. The result is the digit sequence [r_n r_(n-1) ... r_1 r_0]. Of course, each r_i must be written as a hexadecimal digit.
1.1.1.3. Addressing and byte order
For a program object that spans multiple bytes, we must establish two conventions: what is the address of the object, and how are its bytes ordered in memory?
Answer: 1. A multi-byte object is stored as a contiguous sequence of bytes. The address of the object is the smallest address among the bytes used.
2. There are two conventions for ordering the bytes of an object: big-endian and little-endian. Big-endian: the most significant byte comes first (at the lowest address). Little-endian: the least significant byte comes first. For the hexadecimal number 0x01234567 stored at addresses 0x100 through 0x103, the layouts are shown in the following tables:
Big-endian
Address | ... | 0x100 | 0x101 | 0x102 | 0x103 | ...
Value   | ... | 01    | 23    | 45    | 67    | ...
Little-endian
Address | ... | 0x100 | 0x101 | 0x102 | 0x103 | ...
Value   | ... | 67    | 45    | 23    | 01    | ...
Byte order becomes very important in network communication. Different computers may use different byte orders, big-endian or little-endian. Therefore, before sending multi-byte data, the sender needs to convert it from host byte order to network byte order. After receiving the data, the receiver needs to convert it from network byte order back to host byte order before processing it.
Print the byte representation of a program object:
#ifndef SHOW_BYTE_H
#define SHOW_BYTE_H
#include <stdio.h>
typedef unsigned char *byte_pointer;
class cshowbytes
{
public:
    // Print len bytes starting at start, each as two hexadecimal digits.
    void show_bytes(byte_pointer start, int len)
    {
        int i;
        for (i = 0; i < len; i++)
        {
            printf(" %.2x", start[i]);
        }
        printf("\n");
    }
    // Each helper passes the address of its argument, viewed as raw bytes.
    void show_int(int x)
    {
        show_bytes((byte_pointer) &x, sizeof(int));
    }
    void show_float(float x)
    {
        show_bytes((byte_pointer) &x, sizeof(float));
    }
    void show_pointer(void *x)
    {
        show_bytes((byte_pointer) &x, sizeof(void *));
    }
    void show_string(char start[], int len)
    {
        int i;
        for (i = 0; i < len; i++)
        {
            // Cast so that a negative char is not sign-extended when printed.
            printf(" %.2x", (unsigned char) start[i]);
        }
        printf("\n");
    }
};
#endif
1.1.1.4. String
A C string is an array of characters terminated by a null character. Why does only "some" data, rather than all of it, need to be converted between host byte order and network byte order during network communication? Because strings are platform independent. Why?
Because each character is represented by a standard encoding, most commonly ASCII. The same ASCII text produces the same bytes on any system, regardless of byte order and word size. Compare how numbers and strings are stored in memory: a character occupies exactly one byte, so byte order does not affect it, whereas a multi-byte number depends on the order of its bytes.
1.1.1.5. Boolean algebra and rings; bit-level and logical operations in C
Binary is at the core of how computers encode, store, and operate on information, so the Boolean operations and ring structure over 0 and 1 are very important. The Boolean algebra <{0, 1}, |, &, ~, 0, 1> shares many properties with ordinary arithmetic, such as commutativity, associativity, and identity elements. For example, if each of the red, green, and blue primary colors takes a value in {0, 1}, applying different Boolean operations to the resulting colors produces a variety of other colors.
The bit-level and logical operations of C make full use of Boolean algebra. I remember seeing a written test question on the Internet: exchange the values of two variables x and y without introducing a third variable. With a good grasp of Boolean algebra, the following solution comes easily.
void inplace_swap(int *x, int *y)
{
    *x = *x ^ *y;
    *y = *x ^ *y;
    *x = *x ^ *y;
}
This relies on the properties a ^ a = 0 and a ^ 0 = a.
The difference between logical operations and bit-level operations is that a logical operator does not evaluate its second operand if the first operand already determines the result of the expression. For example, p && *p++ never dereferences a null pointer. For the underlying reason, see how logical operations are represented in assembly, in the discussion of the assembly representation of C.
As for C's shift operations, the left shift is relatively simple: the vacated bits on the right are filled with 0. The right shift comes in two forms: logical right shift and arithmetic right shift. A logical right shift fills the vacated bits on the left with 0, while an arithmetic right shift fills them with copies of the most significant bit. Almost all compiler/machine combinations use arithmetic right shifts for signed numbers. Shift operations also matter for performance. Multiplication typically takes much longer on the CPU than addition or subtraction, so to improve efficiency, a multiplication can often be replaced by shifts and additions.