2. Representing and Manipulating Information
This chapter starts with binary notation, word size, and byte order, moves on to Boolean algebra and bit-level operations, and finishes with the representation and arithmetic of unsigned integers, signed integers, and floating-point numbers. Admittedly, some of the mathematical proofs are tedious, but overall the chapter is packed with substance!
2.1 Decimal vs. Binary Notation
We are used to decimal, perhaps simply because we have 10 fingers, so binary does not come naturally to us. But two-valued signals have a great advantage for representation, storage, and transmission: they can be the presence or absence of a hole in a punched tape (encoding), a high or low voltage on a wire (data transmission), or a magnetic domain oriented clockwise or counterclockwise (storage on disk).
2.2 Words
Word size is the amount of data the CPU can handle at a time. It typically corresponds to the width of the integers the CPU operates on, as well as the width of the CPU's data paths (address bus, data bus).
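As a rough check, the width of a pointer is a common proxy for the word size of the platform a program was compiled for. That link between pointers and word size is an assumption added here, not something stated above, and `pointer_bits` is a hypothetical helper name. A minimal sketch:

```c
#include <limits.h>

/* Width of a data pointer in bits on the platform this was compiled for.
 * Assumption: pointer width is used here as a stand-in for "word size". */
int pointer_bits(void) {
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Printing `pointer_bits()` with `printf("%d\n", pointer_bits());` typically reports 32 or 64, depending on how the program was built.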
2.3 Addressing & Byte Ordering
Byte order, big-endian, little-endian: familiar terms that we seem to learn over and over again. But what is this knowledge actually good for? Quite a lot, as it turns out:
- Transferring binary data between different machines : If the receiver does not know whether the data is big-endian or little-endian, it will parse it incorrectly.
- Reading integer data in a disassembled program : Byte order does not matter for the instruction itself, but if you do not know the byte order of the operand that follows it, you will get it wrong! For example, suppose a disassembled program contains the bytes 01 05 64 94 04 08. Here 01 05 encodes an add of register %eax, so the byte order of the operand 64 94 04 08 that follows is critical. On a little-endian machine it stands for 0x08049464; on a big-endian machine the bytes would be read in the opposite order, giving 0x64940408.
- Bypassing the type system to access the underlying bytes directly : For example, C lets you use a cast to reference an object as a completely different type from the one it was created with. This is certainly not recommended for application programming, but it is very useful in systems programming!
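The operand from the disassembly example can be decoded by hand in both byte orders. The helper names `read_le32` and `read_be32` are hypothetical, chosen only for this sketch:

```c
#include <stdint.h>

/* Interpret 4 bytes as a little-endian 32-bit value:
 * p[0] is the least significant byte. */
uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Interpret the same 4 bytes as big-endian:
 * p[0] is the most significant byte. */
uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

Applied to the bytes 64 94 04 08, `read_le32` yields 0x08049464 and `read_be32` yields 0x64940408. Because both functions use shifts rather than casts, they give the same result whatever the byte order of the host.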
Let's go over byte order once more. Suppose the 4-byte integer 0x01234567 is stored at address 0x100, occupying addresses 0x100 through 0x103. A big-endian machine stores the bytes 01 23 45 67 at addresses 0x100, 0x101, 0x102, 0x103, while a little-endian machine stores 67 45 23 01. Note that "big" and "little" are defined relative to memory addresses. Going from low address to high address, a big-endian machine puts the most significant byte at the lowest address and the least significant byte at the highest address; the "big end" of the number comes first, hence the name.
If addresses grow from left to right, big-endian order matches the way we normally write numbers.
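To see which layout the current machine uses, the bytes of 0x01234567 can be inspected through an `unsigned char` view of the integer. This is a minimal sketch; `int_bytes` and `is_little_endian` are hypothetical names, and it assumes the host is either purely big-endian or purely little-endian:

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of x into out, lowest address first. */
void int_bytes(uint32_t x, unsigned char out[4]) {
    memcpy(out, &x, sizeof x);
}

/* Returns 1 if the byte at the lowest address of the value 1 is 1,
 * i.e. the least significant byte comes first (little-endian). */
int is_little_endian(void) {
    uint32_t one = 1;
    return *(unsigned char *)&one == 1;
}
```

On a little-endian host, `int_bytes(0x01234567, b)` fills `b` with 67 45 23 01; on a big-endian host it fills it with 01 23 45 67.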
Strings, however, are not affected by this distinction. For example, the bytes of the string "12345" are 31 32 33 34 35 00 (the last one is the terminator). On any machine that uses ASCII as its character encoding the result is the same, regardless of byte order, word size, and so on. A string is a sequence of single-byte characters, so there is no byte ordering within an individual character. This is why character data is more platform-independent than binary data.
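The claim about "12345" can be checked directly by comparing the string's bytes against the expected sequence. A small sketch, assuming the compiler's execution character set is ASCII-compatible; `string_bytes_match` is a hypothetical name:

```c
#include <string.h>

/* Returns 1 if the bytes of "12345", terminator included,
 * are 31 32 33 34 35 00 (assumes an ASCII execution character set). */
int string_bytes_match(void) {
    static const unsigned char expected[] = {0x31, 0x32, 0x33, 0x34, 0x35, 0x00};
    const char *s = "12345";
    return memcmp(s, expected, sizeof expected) == 0;
}
```

Unlike the integer case, this returns the same answer on big-endian and little-endian machines.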
Aside: what about Unicode encodings that use multiple bytes per character? UTF-8 uses a single byte as its code unit, so it has no byte-order problem. UTF-16 uses two bytes as its code unit, so before interpreting UTF-16 text you first have to know the byte order of each code unit. For example, the Unicode code point of "奎" (kui) is 594E, and the code point of "乙" (yi) is 4E59. If we receive the UTF-16 byte stream "594E", is that "奎" or "乙"?
The way the Unicode specification recommends marking byte order is the BOM, the byte order mark. The BOM is a rather clever idea: UCS contains a character called "ZERO WIDTH NO-BREAK SPACE" whose code is FEFF, whereas FFFE is not a character in UCS and so should never appear in actual transmission. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the recipient then receives FEFF, the byte stream is big-endian; if it receives FFFE, the byte stream is little-endian. That is why the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
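The BOM check described above comes down to looking at the first two bytes of the stream. The following is a minimal sketch, not a full UTF-16 decoder: the enum and function names are hypothetical, and surrogate pairs are ignored.

```c
#include <stddef.h>

enum utf16_order { UTF16_UNKNOWN = 0, UTF16_BIG_ENDIAN, UTF16_LITTLE_ENDIAN };

/* Inspect the first two bytes of a UTF-16 stream for a byte order mark. */
enum utf16_order detect_utf16_bom(const unsigned char *buf, size_t len) {
    if (len < 2) return UTF16_UNKNOWN;
    if (buf[0] == 0xFE && buf[1] == 0xFF) return UTF16_BIG_ENDIAN;    /* FEFF as sent */
    if (buf[0] == 0xFF && buf[1] == 0xFE) return UTF16_LITTLE_ENDIAN; /* FEFF byte-swapped */
    return UTF16_UNKNOWN;
}

/* Decode one 16-bit code unit starting at p using the given byte order.
 * Anything other than UTF16_LITTLE_ENDIAN is treated as big-endian. */
unsigned decode_unit(const unsigned char *p, enum utf16_order order) {
    if (order == UTF16_LITTLE_ENDIAN)
        return (unsigned)p[0] | ((unsigned)p[1] << 8);
    return ((unsigned)p[0] << 8) | (unsigned)p[1];
}
```

With a leading FE FF, the following bytes 59 4E decode to 0x594E; with a leading FF FE, the same two bytes decode to 0x4E59.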
The story behind the word "endian" is also entertaining, so here is an excerpt by way of a break:
"The two great empires of Lilliput and Blefuscu have been at war for the past 36 months. It began in the following way. Everyone agrees that the original way to break an egg before eating it was at the larger end. But the present emperor's grandfather, while still a boy, happened to cut his finger when breaking an egg in the ancient way. His father, the emperor at the time, then issued an edict ordering all his subjects, under heavy penalty, to break their eggs at the smaller end. The people resented this law so much that, as our histories tell us, there have been six rebellions over it, in which one emperor lost his life and another his crown. These rebellions were mostly stirred up by the rulers of Blefuscu, and once they had been put down, the exiles fled to that empire for refuge. It is reckoned that, on several occasions, 11,000 people in all chose to die rather than break their eggs at the smaller end. Hundreds of books have been published on this dispute, but the books of the Big-Endians have long been banned, and the law bars the whole party from holding office." (Excerpt from Jonathan Swift's Gulliver's Travels, Chapter 4)
Swift was satirizing the continuing conflicts between England (Lilliput) and France (Blefuscu). Danny Cohen, an early pioneer of network protocols, was the first to use these terms to refer to byte order.
2.4 Bit-Level Operations & Logical Operations
Shannon founded information theory and was the first to make the connection between Boolean algebra and digital logic. Boolean operations can be extended to bit vectors, which have two useful applications:
1) Representing a finite set compactly. Any subset A of the set {0, 1, ..., w-1} can be encoded as a bit vector [a_(w-1), ..., a_1, a_0], where a_i is set to 1 when i belongs to A. For example, [01101001] denotes {0, 3, 5, 6}, and [01010101] denotes {0, 2, 4, 6}. In other words, the bit vector reads "backwards": it is written with a_(w-1) on the left, so element 0 corresponds to the rightmost bit.
2) Signal masks. Different signals can interrupt the execution of a program, so a bit vector can be used as a mask to enable or disable individual signals.
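Both uses can be sketched with a plain `unsigned` as the bit vector. All function names here are hypothetical, and the sketch assumes every element or signal number is smaller than the number of bits in `unsigned`:

```c
/* Build a bit vector from a list of elements: bit i is 1 when i is in the set.
 * Assumes each element is smaller than the bit width of unsigned. */
unsigned set_from(const int *elems, int n) {
    unsigned v = 0;
    for (int i = 0; i < n; i++)
        v |= 1u << elems[i];
    return v;
}

/* Membership test: is element i in the set encoded by v? */
int set_contains(unsigned v, int i) {
    return (int)((v >> i) & 1u);
}

/* Mask-style use: switch one bit on or off without touching the others. */
unsigned mask_enable(unsigned mask, int bit)  { return mask | (1u << bit); }
unsigned mask_disable(unsigned mask, int bit) { return mask & ~(1u << bit); }
```

With this encoding, {0, 3, 5, 6} becomes 0x69 (binary 01101001) and {0, 2, 4, 6} becomes 0x55 (binary 01010101). Set intersection is then just `&` (0x69 & 0x55 = 0x41, the set {0, 6}) and set union is `|`.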
Classic bitwise operations:
1) Masking a value: x & 0xFF (keeps only the lowest byte and clears everything else)
2) Setting bits: x | 0x33
3) Portable "all ones": ~0
4) Inverting bits: x ^ 0xF (XOR property 1: XOR with 1 flips a bit, 101 ^ 111 = 010; XOR property 2: XOR with 0 leaves a bit unchanged, 101 ^ 000 = 101)
5) Zeroing: xor %edx,%edx (XOR property 3: a value XORed with itself is 0. Why not just set the register to 0 directly? Because the machine code generated this way is only two bytes, whereas setting the value directly takes five bytes.)
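The same idioms written as small C functions. The names are hypothetical, and `~0u` is used instead of `~0` so that the result is unsigned:

```c
/* 1) Keep only the lowest byte. */
unsigned low_byte(unsigned x) { return x & 0xFF; }

/* 2) Force the bits in 0x33 to 1, leave the others alone. */
unsigned set_bits(unsigned x) { return x | 0x33; }

/* 3) All ones, whatever the width of unsigned happens to be. */
unsigned all_ones(void) { return ~0u; }

/* 4) Flip the lowest four bits, leave the others alone. */
unsigned flip_low_nibble(unsigned x) { return x ^ 0xF; }

/* 5) A value XORed with itself is always 0. */
unsigned clear_by_xor(unsigned x) { return x ^ x; }
```

For instance, `low_byte(0x12345678)` is 0x78 and `flip_low_nibble(0x5)` is 0xA. Writing `~0u` rather than 0xFFFFFFFF is what makes item 3 portable: it does not assume a 32-bit word.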
Logical operations are often confused with the bitwise operations above, but the two are quite different:
1) A logical operation treats any nonzero argument as true and returns only 1 or 0. So a logical operation behaves the same as the corresponding bitwise operation only when the arguments are restricted to 0 and 1.
2) Logical operations short-circuit. Once the first part of a logical expression is enough to determine whether the whole expression is true or false, the rest is not evaluated. Bitwise operations do not behave this way.
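Both differences can be observed with a function that counts its own calls. This is a small sketch with hypothetical names; `traced` exists only to make evaluation visible:

```c
#include <stddef.h>

static int calls = 0;   /* how many times traced() has run */

/* Records that it was called, then returns its argument unchanged. */
int traced(int v) { calls++; return v; }
int call_count(void) { return calls; }

int logical_and(int a, int b) { return a && b; }   /* result is 0 or 1 */
int bitwise_and(int a, int b) { return a & b; }    /* result is bit by bit */

/* Short-circuiting makes this safe: *p is never read when p is NULL. */
int nonzero_at(const int *p) { return p && *p; }
```

`logical_and(0x69, 0x55)` is 1 while `bitwise_and(0x69, 0x55)` is 0x41. In `traced(0) && traced(1)` only the first call runs, whereas `traced(0) & traced(1)` runs both.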
2.5 Integer Representation
Unsigned integers are very simple: bit i has weight 2^i, and the value is the sum of the weights of the bits that are set.
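In CSAPP's notation this is B2U(x) = sum of x_i * 2^i for i from 0 to w-1. A minimal sketch that evaluates it for a bit pattern written as a string, most significant bit first; `b2u` is a hypothetical helper and the input is assumed to contain only '0' and '1' and to fit in an `unsigned long`:

```c
/* Value of a bit pattern given as a string of '0'/'1' characters,
 * most significant bit first. Each step doubles the running value
 * (shifting every earlier bit up one weight) and adds the new bit. */
unsigned long b2u(const char *bits) {
    unsigned long v = 0;
    for (; *bits; bits++)
        v = v * 2 + (unsigned long)(*bits - '0');
    return v;
}
```

For example, `b2u("1011")` is 8 + 2 + 1 = 11.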
For signed integers, let's start with the binary representation that comes to mind first: use the highest bit as a sign bit, 0 for positive numbers and 1 for negative numbers. But this approach has an important defect: the integer 0 has two representations, +0 and -0. Modern computers use a different binary representation, one we are all familiar with: two's complement! Here the highest bit carries the negative of its usual weight, so when it is 1 it contributes -2^(w-1) to the value. This is why we so often see values like 0xffff...xx in assembly or machine code: because the highest bit has negative weight, many of the high bits have to be set to 1 to represent a negative number of small magnitude. For example, 0xfffffec8 = -312. On a little-endian machine it is stored as C8 FE FF FF.
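The weights can be summed explicitly to confirm the 0xfffffec8 example. This is a minimal sketch with a hypothetical helper `b2t`, assuming the width w is between 1 and 63:

```c
/* Two's-complement value of the low w bits of `bits`:
 * bits 0..w-2 carry weight +2^i, bit w-1 carries weight -2^(w-1).
 * Assumes 1 <= w <= 63. */
long long b2t(unsigned long long bits, int w) {
    long long v = 0;
    for (int i = 0; i < w - 1; i++)
        if ((bits >> i) & 1)
            v += 1LL << i;
    if ((bits >> (w - 1)) & 1)
        v -= 1LL << (w - 1);
    return v;
}
```

For w = 32, the low 31 bits of 0xfffffec8 sum to 2^31 - 312, and subtracting 2^31 for the top bit leaves -312. The 4-bit pattern 1011 gives 2 + 1 - 8 = -5.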