Information Security System Design Fundamentals: Third Week Study Summary

Last Update:2015-10-03 Source: Internet

Author: User

Chapter 2: Representing and Manipulating Information

First, preface

1. Binary digits are called bits.

2. Three number representations matter most: unsigned encoding, two's-complement encoding (signed), and floating point (a base-2 version of scientific notation).

3. In floating-point arithmetic, overflow produces the special value +∞, but the product of a set of positive numbers is always positive. Because the representation has limited precision, floating-point arithmetic is not associative. Integer and floating-point arithmetic have different mathematical properties because they handle the finiteness of their representations differently: an integer representation can encode only a comparatively small range of values, but does so exactly, while a floating-point representation can encode a wide range of values, but only approximately.

Second, information storage

1. Most computers use an 8-bit block, or byte, as the smallest addressable unit of memory. A machine-level program views memory as a very large array of bytes, called virtual memory. Every byte of memory is identified by a unique number, called its address, and the set of all possible addresses is called the virtual address space.

2. The compiler and run-time system divide this memory space into more manageable units to hold the different program objects, that is, program data, instructions, and control information. The value of a pointer in C (whether it points to an integer, a struct, or some other program object) is the virtual address of the first byte of a block of storage.

3. Hexadecimal notation

A byte consists of 8 bits. In binary notation its value ranges from 00000000 to 11111111 (base 2); as a decimal integer it ranges from 0 to 255. Bit patterns are usually written in base 16, as hexadecimal numbers. Hexadecimal ("hex") uses the digits '0' to '9' and the characters 'A' to 'F' to represent the 16 possible values. In hexadecimal notation, the value of a byte ranges from 00 to FF (base 16).

In C, a numeric constant that begins with 0x or 0X is interpreted as hexadecimal. The characters 'A' to 'F' may be written in uppercase, lowercase, or a mix of both.

A common task when working with machine-level programs is to convert bit patterns by hand between decimal, binary, and hexadecimal.

When x is a power of 2, that is, x = 2^n, and n is written in the form i + 4j with 0 ≤ i ≤ 3, the hexadecimal form of x is a leading digit of 1 (i = 0), 2 (i = 1), 4 (i = 2), or 8 (i = 3), followed by j hexadecimal zeros. For example, 2048 = 2^11 and 11 = 3 + 4·2, so 2048 = 0x800.

4. Every computer has a word size, which indicates the nominal size of integer and pointer data. Because a virtual address is encoded by such a word, the most important system parameter determined by the word size is the maximum size of the virtual address space. For a machine with a w-bit word size, virtual addresses range from 0 to 2^w - 1, so a program can access at most 2^w bytes.

5. Data size

The C data type char represents a single byte. Although the name "char" comes from its use for storing a single character of a text string, it can also be used to store integer values.

The C data type int can be preceded by the qualifiers short, long, and the more recent long long, giving integer representations of various sizes.

The exact number of bytes depends on the machine and the compiler. Typically a short is 2 bytes and an unqualified int is 4 bytes. A long uses the full word size of the machine. The long long data type introduced in ISO C99 allows 64-bit integers.

Floating-point formats come in single precision (declared in C as float) and double precision (declared in C as double), which use 4 bytes and 8 bytes respectively.
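
Since the sizes depend on the machine and the compiler, the simplest check is to ask the compiler. The sketch below is only an illustration; the function names print_type_sizes and word_size_bits are made up for this note, and taking the pointer width as the word size is an assumption that holds on common platforms.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Print the size in bytes of the C types discussed above.
   Only sizeof(char) == 1 is fixed by the language; the other
   values depend on the compiler and the target machine. */
void print_type_sizes(void)
{
    printf("char:      %zu\n", sizeof(char));
    printf("short:     %zu\n", sizeof(short));
    printf("int:       %zu\n", sizeof(int));
    printf("long:      %zu\n", sizeof(long));
    printf("long long: %zu\n", sizeof(long long));
    printf("float:     %zu\n", sizeof(float));
    printf("double:    %zu\n", sizeof(double));
    printf("void *:    %zu\n", sizeof(void *));
}

/* Nominal word size in bits, taken here (an assumption) to be
   the width of a pointer. */
size_t word_size_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

Calling print_type_sizes() from a 32-bit build and from a 64-bit build of the same program shows which sizes change with the word size.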

6. Addressing and byte order

For a program object that spans several bytes, two conventions must be fixed: what the address of the object is, and how its bytes are ordered in memory.

Some machines store an object in memory ordered from the least significant byte to the most significant, while others store it from the most significant byte to the least. The former convention, with the least significant byte first, is called little endian; most Intel-compatible machines use it. The latter convention, with the most significant byte first, is called big endian; most IBM and Sun Microsystems machines use it.

Code written for networking applications must follow established byte-order conventions: the sending machine converts its internal representation to the network standard, and the receiving machine converts the network standard back to its internal representation.
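
One way to follow such a convention without caring about the host's own byte order is to build the byte sequence with shifts. The following is a minimal sketch, assuming the network standard is big endian (as it is for the Internet protocols); put_be32 and get_be32 are hypothetical helper names, not library functions.

```c
#include <stdint.h>

/* Store v into buf in big-endian (network) order.
   Shifts work on values, so the result is the same on
   little-endian and big-endian hosts. */
void put_be32(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);   /* most significant byte first */
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;           /* least significant byte last */
}

/* Rebuild the value from four bytes received in big-endian order. */
uint32_t get_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

On POSIX systems the functions htonl and ntohl from <arpa/inet.h> do the same job for 32-bit values.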

A disassembler is a tool that determines the sequence of instructions represented by an executable program file.

When reading machine-level program representations generated for little-endian machines, bytes often appear in reverse order. The natural way to write a byte sequence puts the lowest-numbered byte on the left and the highest on the right, which is the opposite of the usual way of writing numbers, with the most significant digit on the left and the least significant on the right.

A third case where byte order becomes visible is when a program circumvents the normal type system. In C, a cast allows an object to be referenced as a data type different from the one it was created with.

Printing the bytes of an object through such a cast shows that Linux 32, Windows, and Linux 64 are little-endian machines, while Sun is a big-endian machine.

Multibyte objects are stored as a contiguous sequence of bytes, and the address of the object is the smallest address of the bytes used.
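
The byte-printing idea can be sketched as follows. This is written in the spirit of the textbook's byte-display routine, but it is my own reconstruction rather than the book's exact code; is_little_endian is a helper name introduced here.

```c
#include <stddef.h>
#include <stdio.h>

/* Print the bytes of an object in increasing address order.
   Casting to unsigned char * is the permitted way to look at
   the raw bytes of any object. */
void show_bytes(const void *obj, size_t len)
{
    const unsigned char *p = (const unsigned char *)obj;
    for (size_t i = 0; i < len; i++)
        printf(" %.2x", p[i]);
    printf("\n");
}

/* Return 1 if the byte at the lowest address of an int holds
   the least significant byte (little endian), 0 otherwise. */
int is_little_endian(void)
{
    unsigned int one = 1;
    return *(const unsigned char *)&one == 1;
}
```

For example, with `int x = 0x01234567;` the call `show_bytes(&x, sizeof x)` prints the bytes starting from the object's address, so a little-endian machine shows 67 first and a big-endian machine shows 01 first.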

Linux 32, Windows, and Sun machines use 4-byte addresses, while Linux 64 uses 8-byte addresses.

The ASCII code for the decimal digit x is exactly 0x3x, and the hexadecimal representation of the terminating byte is 0x00. Any system that uses ASCII code as a character code will get the same result, regardless of the byte order and word size rules. As a result, text data has greater platform independence than binary data.

The Java programming language uses Unicode to represent strings. Unicode-enabled libraries are also available for the C language.

7. Boolean algebra

Binary values are at the core of how computers encode, store, and manipulate information.

The simplest Boolean algebra is defined over the two-element set {0, 1}.

Claude Shannon (1916-2001), who founded the field of information theory, was the first to connect Boolean algebra with digital logic.

The Boolean operations extend to bit vectors, which are strings of 0s and 1s of some fixed length w. An operation on bit vectors is defined by applying it to each pair of corresponding elements of its arguments.

A useful application of bit vectors is to represent finite sets.

The Boolean operations | and & correspond to set union and intersection respectively, and ~ corresponds to set complement.

For example, a bit-vector mask can represent a set of enabled signals: a 1 in bit position i means that signal i is enabled.
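
To make the set view concrete, here is a small sketch in which bit i of an 8-bit value says whether element i (0 to 7) belongs to the set. The type name set8 and the function names are invented for this illustration.

```c
#include <stdint.h>

typedef uint8_t set8;   /* bit i set <=> element i is in the set */

set8 set_union(set8 a, set8 b)        { return (set8)(a | b); }
set8 set_intersection(set8 a, set8 b) { return (set8)(a & b); }
set8 set_complement(set8 a)           { return (set8)~a; }

/* Membership test: 1 if element i (0..7) is in the set, else 0. */
int set_contains(set8 a, unsigned i)  { return (a >> i) & 1; }
```

With a = 0x69 (bits 01101001, the set {0, 3, 5, 6}) and b = 0x55 (bits 01010101, the set {0, 2, 4, 6}), the intersection is 0x41, which is the set {0, 6}.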

8. Bit-level operations in C

The best way to determine the result of a bit-level expression is to expand the hexadecimal arguments into binary, perform the operation in binary, and then convert the result back to hexadecimal.

A common use of bit-level operations is masking, where a mask is a bit pattern that indicates a selected set of bits within a word.

The mask 0xFF (the lowest 8 bits set to 1) selects the low-order byte of a word.
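
A few typical mask operations on a 32-bit word are sketched below. The function names are my own, chosen only for the example.

```c
#include <stdint.h>

/* Keep only the low-order byte; all other bits become 0. */
uint32_t low_byte(uint32_t x)       { return x & 0xFFu; }

/* Keep everything except the low-order byte. */
uint32_t clear_low_byte(uint32_t x) { return x & ~(uint32_t)0xFF; }

/* Force the low-order byte to all ones, leave the rest unchanged. */
uint32_t set_low_byte(uint32_t x)   { return x | 0xFFu; }
```

For x = 0x89ABCDEF these give 0x000000EF, 0x89ABCD00, and 0x89ABCDFF. Writing the inverted mask as ~(uint32_t)0xFF rather than 0xFFFFFF00 keeps the expression correct if the word type is changed later.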

9. Logical operations in C

10. Shift operations in C

Machines support two forms of right shift: logical and arithmetic. A logical right shift by k fills the left end with k zeros; an arithmetic right shift by k fills the left end with k copies of the most significant bit.

The C standard does not define precisely which kind of right shift is used for signed values. For unsigned data (integer objects declared with the qualifier unsigned), right shifts must be logical. For signed data (the default), either an arithmetic or a logical shift may be used.

Almost all compiler/machine combinations use arithmetic right shifts for signed data, and many programmers assume that this is the case.

Java, by contrast, defines right shifts precisely. The expression x >> k shifts x arithmetically by k positions, while x >>> k shifts it logically.
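
Because C leaves the signed case to the implementation, one way to get a definite result is to do both shifts on unsigned bit patterns. The sketch below assumes 32-bit words and a shift count k between 0 and 31; shr_logical and shr_arithmetic are names made up for this note.

```c
#include <stdint.h>

/* Logical right shift: the vacated high bits are filled with 0.
   Right shifts of unsigned values are always logical in C. */
uint32_t shr_logical(uint32_t bits, unsigned k)
{
    return bits >> k;
}

/* Arithmetic right shift of a 32-bit pattern: the vacated high
   bits are filled with copies of the original top bit. When the
   top bit is 1, inverting, shifting in zeros, and inverting again
   shifts in ones. */
uint32_t shr_arithmetic(uint32_t bits, unsigned k)
{
    if (bits & 0x80000000u)
        return (uint32_t)~((uint32_t)~bits >> k);
    return bits >> k;
}
```

For the pattern 0xFFFFFFF0 (which is -16 as a two's-complement number) and k = 2, the logical shift gives 0x3FFFFFFC and the arithmetic shift gives 0xFFFFFFFC (which is -4).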

Third, integer representation

1. Integer data types

The C language supports a variety of integral data types, each representing a limited range of integers.

To use the long long type from C99, compile with gcc -std=c99.

2. Unsigned encoding (pp. 39-44)

Suppose an integer data type has w bits. We write a bit vector either as x→, to denote the whole vector, or as [x_(w-1), x_(w-2), ..., x_0], to denote the individual bits. Interpreting x→ as a number written in binary gives its unsigned value: B2U_w(x→) = x_(w-1)·2^(w-1) + ... + x_1·2 + x_0.

The unsigned binary representation has an important property: every integer between 0 and 2^w - 1 has a unique w-bit encoding, so the function B2U_w is a bijection.

3. Conversion between signed and unsigned numbers (pp. 44-47)

The C language allows casting between different numeric data types. One might expect that converting a negative number to unsigned yields 0, and that converting an unsigned number too large for two's complement yields TMax. In most C implementations, however, the result follows from the bit-level view instead.

When C converts between signed and unsigned numbers of the same word size, the underlying bit representation stays the same; only its interpretation changes.
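
The "same bits, different interpretation" rule can be tried out directly. This is a sketch for 32-bit values; t2u and u2t are names I chose after the textbook's notation, and memcpy is used for the unsigned-to-signed direction because the C standard leaves an out-of-range cast in that direction implementation-defined.

```c
#include <stdint.h>
#include <string.h>

/* Signed to unsigned. C defines this conversion as reduction
   modulo 2^32, which for two's complement keeps every bit. */
uint32_t t2u(int32_t x)
{
    return (uint32_t)x;
}

/* Unsigned to signed, by reinterpreting the same 32 bits.
   int32_t is required to be two's complement without padding,
   so copying the bytes is well defined. */
int32_t u2t(uint32_t u)
{
    int32_t x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

So -1 becomes 4294967295 (UMax) rather than 0, and 4294967295 becomes -1 rather than TMax.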

4. Signed and unsigned numbers in C (pp. 47-49)

5. Expanding the bit representation of a number

To convert an unsigned number to a larger data type, simply add leading zeros to the representation; this is called zero extension. To convert a two's-complement number to a larger data type, perform sign extension, which adds copies of the most significant bit to the representation.
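
In C both extensions happen automatically when a narrower integer is converted to a wider one of the same signedness. A minimal sketch, with function names invented for the example:

```c
#include <stdint.h>

/* Zero extension: the 16 new high bits are all 0. */
uint32_t zero_extend16(uint16_t u)
{
    return u;
}

/* Sign extension: the 16 new high bits copy the old sign bit,
   so the numeric value is preserved. */
int32_t sign_extend16(int16_t x)
{
    return x;
}
```

The 16-bit pattern 0xCFC7 is 53191 as an unsigned number and -12345 in two's complement. Zero extension gives 0x0000CFC7, while sign extension of -12345 gives the pattern 0xFFFFCFC7.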

6. Truncating numbers (p. 51)

Suppose that, rather than extending a value with extra bits, we reduce the number of bits that represent it. When a w-bit number x = [x_(w-1), x_(w-2), ..., x_0] is truncated to k bits, the high-order w - k bits are dropped, leaving the bit vector [x_(k-1), x_(k-2), ..., x_0]. Truncating a number can change its value, which is a form of overflow.
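
Truncation from 32 to 16 bits can be sketched like this. The function names are hypothetical; the signed view is obtained by copying the bits, since casting an out-of-range value to a signed type is implementation-defined in C.

```c
#include <stdint.h>
#include <string.h>

/* Drop the high 16 bits. For unsigned values this is the same
   as computing x mod 2^16. */
uint16_t truncate_to_16(uint32_t x)
{
    return (uint16_t)x;
}

/* The same 16 remaining bits, read as a two's-complement number. */
int16_t truncate_to_16_signed(uint32_t x)
{
    uint16_t low = (uint16_t)x;
    int16_t result;
    memcpy(&result, &low, sizeof result);
    return result;
}
```

The value 53191 fits in 32 bits as a positive number, but its low 16 bits are 0xCFC7, so the signed 16-bit result is -12345: the value changed even though no bit below position 16 was touched.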

Note: Implicit conversion of signed numbers to unsigned leads to some non-intuitive behavior. Non-intuitive features often cause program bugs, and bugs that hinge on the subtleties of implicit conversion are hard to find. Because the conversion happens without any explicit indication in the code, programmers often overlook its effects.
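
A classic instance is a comparison between an int and an unsigned value. In the sketch below the cast is written out so that the effect is visible; it is the same conversion the compiler applies silently when the expression is written as `a < b`. Both function names are made up for this note.

```c
/* What "a < b" means when a is int and b is unsigned: a is first
   converted to unsigned, so a negative a becomes a huge value. */
int less_than_mixed(int a, unsigned b)
{
    return (unsigned)a < b;
}

/* A version that matches mathematical intuition: any negative a
   is smaller than every unsigned b. */
int less_than_safe(int a, unsigned b)
{
    return a < 0 || (unsigned)a < b;
}
```

With a = -1 and b = 0, the mixed comparison reports that -1 is not less than 0, because -1 has been turned into UMax.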

7. Two's-complement encoding

The most common computer representation of signed numbers is two's complement. In this encoding the most significant bit of the word has negative weight: B2T_w(x→) = -x_(w-1)·2^(w-1) + x_(w-2)·2^(w-2) + ... + x_0.

The representable range is [-2^(w-1), 2^(w-1) - 1]. Within this range every number has a unique w-bit two's-complement encoding, so the function B2T_w is a bijection.
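
The two interpretations of a w-bit pattern, unsigned and two's complement, differ only in the weight given to the top bit. The following sketch computes both for any width from 1 to 32; the names b2u and b2t follow the textbook's B2U and B2T notation, but the code itself is only an illustration.

```c
#include <stdint.h>

/* Unsigned value of the low w bits: bit i has weight 2^i. */
uint64_t b2u(uint32_t bits, unsigned w)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < w; i++)
        if ((bits >> i) & 1u)
            value += (uint64_t)1 << i;
    return value;
}

/* Two's-complement value of the low w bits: bits 0..w-2 have
   weight 2^i, but bit w-1 has weight -2^(w-1). */
int64_t b2t(uint32_t bits, unsigned w)
{
    int64_t value = 0;
    for (unsigned i = 0; i + 1 < w; i++)
        if ((bits >> i) & 1u)
            value += (int64_t)1 << i;
    if ((bits >> (w - 1)) & 1u)
        value -= (int64_t)1 << (w - 1);
    return value;
}
```

For w = 4, the pattern 1011 is 11 as an unsigned number and -8 + 2 + 1 = -5 in two's complement.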

Note:

- Two's complement exploits the fixed length of a register to simplify arithmetic. Think of a clock: going back 1 hour is the same as going forward 11 hours. In the same way, two's complement turns subtraction into addition, so a single adder is enough to implement all arithmetic operations.
- The two's-complement range is asymmetric: |TMin| = |TMax| + 1, which means TMin has no positive counterpart. This gives two's-complement arithmetic some peculiar properties and can easily cause subtle bugs. The asymmetry arises because half of the bit patterns (those with the sign bit set to 1) represent negative numbers and half (those with the sign bit set to 0) represent non-negative numbers. Since 0 is non-negative, there is one fewer positive number than there are negative numbers.
- The largest unsigned value is just over twice the largest two's-complement value: UMax_w = 2·TMax_w + 1. All bit patterns that represent negative numbers in two's complement become positive numbers in the unsigned representation.

8. Other representations of signed numbers

Ones' complement: the same as two's complement, except that the most significant bit has weight -(2^(w-1) - 1) rather than -2^(w-1).

Sign-magnitude: the most significant bit is a sign bit that determines whether the remaining bits are given negative or positive weight.

Fourth, integer arithmetic

1. Unsigned addition

Consider two non-negative integers x and y with 0 ≤ x, y ≤ 2^w - 1. Each can be represented as a w-bit unsigned number. Their sum, however, lies in the range 0 ≤ x + y ≤ 2^(w+1) - 2, so representing it may require w + 1 bits. This continuing "word size inflation" means that no fixed word size can fully represent the results of arithmetic operations.

Unsigned arithmetic can be viewed as a form of modular arithmetic. Unsigned addition is equivalent to computing the sum modulo 2^w. This value can be obtained by simply discarding the high-order bit of the (w+1)-bit representation of x + y.

Overflow: an arithmetic operation overflows when the full integer result cannot fit within the word size limits of the data type.
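
For 32-bit unsigned numbers the modular behaviour and an overflow test can be sketched as below. The names uadd and uadd_ok are my own; the test relies on the fact that a wrapped sum is always smaller than either operand.

```c
#include <stdint.h>

/* 32-bit unsigned addition: the result is (x + y) mod 2^32. */
uint32_t uadd(uint32_t x, uint32_t y)
{
    return (uint32_t)(x + y);
}

/* Return 1 if x + y fits in 32 bits, 0 if it overflowed.
   If the sum wrapped around, it equals x + y - 2^32, which is
   less than x because y < 2^32. */
int uadd_ok(uint32_t x, uint32_t y)
{
    return uadd(x, y) >= x;
}
```

For example, 0xFFFFFFFF + 1 yields 0, and uadd_ok reports the overflow.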

2. Two's-complement addition

3. Two's-complement negation

Every number x in the range -2^(w-1) ≤ x < 2^(w-1) has an additive inverse under two's-complement addition +_w^t. For x in this range, the two's-complement negation -_w^t x is defined as follows: it equals -2^(w-1) when x = -2^(w-1), and -x when x > -2^(w-1).
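
At the bit level, negation can be carried out by inverting every bit and adding 1, a standard identity for two's complement. The sketch below works on 32-bit patterns held in an unsigned type so that the wrap-around is well defined; tneg_bits is a name invented for this example.

```c
#include <stdint.h>

/* Two's-complement negation of a 32-bit pattern: ~bits + 1,
   computed modulo 2^32. */
uint32_t tneg_bits(uint32_t bits)
{
    return (uint32_t)(~bits + 1u);
}
```

The pattern for 5 maps to 0xFFFFFFFB, the pattern for -5. The pattern 0x80000000 (TMin) maps to itself, which is the special case in the definition: TMin is its own additive inverse.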

4. Unsigned multiplication

Integers x and y in the range 0 ≤ x, y ≤ 2^w - 1 can be represented as w-bit unsigned numbers, but their product x · y can range from 0 to (2^w - 1)^2 = 2^(2w) - 2^(w+1) + 1. Representing it may require up to 2w bits. Unsigned multiplication in C, however, is defined to yield a w-bit value, namely the low-order w bits of the 2w-bit integer product. This is equivalent to computing the product modulo 2^w.

Therefore, the result of the w-bit unsigned multiplication operation *_w^u is: x *_w^u y = (x · y) mod 2^w.

5. Two's-complement multiplication

Signed multiplication in C is performed by truncating the 2w-bit product to w bits.
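
The truncation can be made visible by computing the full 64-bit product of two 32-bit operands and then keeping only the low half. This is a sketch with hypothetical function names; the signed version returns the resulting bit pattern as an unsigned value so that no implementation-defined conversion is involved.

```c
#include <stdint.h>

/* Unsigned case: exact 64-bit product, then keep the low 32 bits,
   which is the product modulo 2^32. */
uint32_t umul32(uint32_t x, uint32_t y)
{
    uint64_t full = (uint64_t)x * y;
    return (uint32_t)full;
}

/* Signed case: exact 64-bit product, then the low 32 bits as a
   bit pattern. Converting to uint32_t reduces modulo 2^32. */
uint32_t tmul32_bits(int32_t x, int32_t y)
{
    int64_t full = (int64_t)x * y;
    return (uint32_t)full;
}
```

For instance, 60000 · 200000 = 12000000000 does not fit in 32 bits, and the truncated result is 3410065408. The low-order bits are the same whether the operands are read as signed or unsigned: -1 · 5 and 0xFFFFFFFF · 5 both give the pattern 0xFFFFFFFB.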

6. Multiplying by constants

Compilers apply an important optimization: they try to replace multiplication by a constant factor with a combination of shift and addition operations. The constant is written as a sum of powers of 2, each term is computed with a left shift, and the results are added together. Similarly, for non-negative numbers, an arithmetic right shift by k bits gives the same result as dividing by 2^k.
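
As a small illustration, take the constant 14 = 2^3 + 2^2 + 2^1. The sketch below shows the shift-and-add form and an equivalent form using a subtraction, 14 = 2^4 - 2^1; the function names are made up, and unsigned operands are used so that wrap-around on overflow is well defined.

```c
#include <stdint.h>

/* x * 14 as a sum of shifted copies: 14 = 8 + 4 + 2. */
uint32_t times14_sum(uint32_t x)
{
    return (x << 3) + (x << 2) + (x << 1);
}

/* x * 14 with one shift fewer: 14 = 16 - 2. */
uint32_t times14_diff(uint32_t x)
{
    return (x << 4) - (x << 1);
}
```

Both functions agree with x * 14 modulo 2^32 for every x. Which form a compiler actually emits depends on the relative cost of shifts, additions, and multiplications on the target machine.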

For a negative number, the same right shift rounds down instead of toward zero. We can correct this improper rounding by "biasing" the value before shifting. The technique relies on the property that, for integers x and y with y > 0, ⌈x/y⌉ = ⌊(x + y - 1)/y⌋. For example, when x = -30 and y = 4, we have x + y - 1 = -27, and ⌈-30/4⌉ = -7 = ⌊-27/4⌋. When x = -32 and y = 4, we have x + y - 1 = -29, and ⌈-32/4⌉ = -8 = ⌊-29/4⌋. To see that the relation holds in general, write x = ky + r with 0 ≤ r < y. Then (x + y - 1)/y = k + (r + y - 1)/y, so ⌊(x + y - 1)/y⌋ = k + ⌊(r + y - 1)/y⌋. The last term equals 0 when r = 0 and 1 when r > 0. That is, by adding a bias of y - 1 to x and then rounding the division down, we get k when y divides x and k + 1 otherwise. Therefore, for x < 0, adding 2^k - 1 to x before the right shift gives a correctly rounded result.
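
The biasing trick can be sketched as follows. The code assumes that >> on a negative int is an arithmetic shift, which the C standard does not guarantee but which holds for the common compilers; div_pow2 is a name chosen for this example and k is assumed to be between 0 and 30.

```c
/* Compute x / 2^k with rounding toward zero, as C division does.
   Assumption: right-shifting a negative int is arithmetic. */
int div_pow2(int x, unsigned k)
{
    /* For negative x add the bias 2^k - 1 so that the shift,
       which rounds down, ends up rounding toward zero. */
    int bias = (x < 0) ? (1 << k) - 1 : 0;
    return (x + bias) >> k;
}
```

With x = -30 and k = 2 the bias is 3, and (-30 + 3) >> 2 is -7, the same as -30 / 4 in C. Without the bias the shift alone would give -8.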

Fifth, floating-point numbers

A floating-point representation encodes rational numbers of the form V = x × 2^y. It is useful for computations involving very large numbers (|V| >> 0), numbers very close to 0 (|V| << 1), and more generally as an approximation to real arithmetic.

1. Fractional binary numbers

2. IEEE floating-point representation

The IEEE floating-point standard represents a number in the form V = (-1)^s × M × 2^E:

Sign: s determines whether the number is negative (s = 1) or positive (s = 0); the interpretation of the sign bit for the value 0 is handled as a special case.

Significand (mantissa): M is a fractional binary number that ranges either from 1 to 2 - ε or from 0 to 1 - ε.

Exponent: E weights the value by a power of 2, namely 2^E (E may be negative).

The bit representation of a floating-point number is divided into three fields that encode these values:

A single sign bit s directly encodes the sign s.

The k-bit exponent field exp = e_(k-1) ... e_1 e_0 encodes the exponent E.

The n-bit fraction field frac = f_(n-1) ... f_1 f_0 encodes the significand M, but the value encoded also depends on whether the exponent field equals 0.

Given a bit representation, the encoded value falls into one of three cases, depending on the value of exp:

Case 1: Normalized values

This case applies when the bit pattern of exp is neither all zeros (value 0) nor all ones (value 255 for single precision, 2047 for double precision). Here the exponent field is interpreted as a signed integer in biased form: the exponent value is E = e - Bias, where e is the unsigned number with bit representation e_(k-1) ... e_1 e_0, and Bias is a bias value equal to 2^(k-1) - 1 (127 for single precision, 1023 for double precision). This yields exponent ranges of -126 to +127 for single precision and -1022 to +1023 for double precision.

Case 2: Denormalized values

When the exponent field is all zeros, the number represented is in denormalized form. In this case the exponent value is E = 1 - Bias and the significand is M = f, the value of the fraction field without an implied leading 1. Setting the exponent to 1 - Bias rather than simply -Bias may seem counterintuitive, but it provides a smooth transition from denormalized to normalized values.

Denormalized numbers serve two purposes:

First, they provide a way to represent the value 0, since with normalized numbers we always have M ≥ 1 and therefore cannot represent 0.

Second, they represent numbers that are very close to 0.0. They provide a property known as gradual underflow, in which the possible values are spaced evenly near 0.0.

Case 3: Special values

This case applies when the exponent field is all ones. When the fraction field is all zeros, the resulting value represents infinity, either +∞ when s = 0 or -∞ when s = 1. Infinity can represent results that overflow, as when we multiply two very large numbers or divide by zero. When the fraction field is nonzero, the resulting value is NaN ("not a number").
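
The three cases can be explored by pulling the fields out of a single-precision value. The sketch below assumes that float is the 32-bit IEEE format (1 sign bit, k = 8 exponent bits, n = 23 fraction bits), which is true on the usual platforms but is not required by the C standard; the struct and function names are invented for this note.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;   /* 1 bit   */
    unsigned exp;    /* 8 bits  */
    uint32_t frac;   /* 23 bits */
} float_fields;

/* Split a float into its sign, exponent, and fraction fields.
   memcpy copies the raw bits without converting the value. */
float_fields decode_float(float f)
{
    uint32_t bits;
    float_fields out;
    memcpy(&bits, &f, sizeof bits);
    out.sign = bits >> 31;
    out.exp  = (bits >> 23) & 0xFFu;
    out.frac = bits & 0x7FFFFFu;
    return out;
}

/* 0 = denormalized (includes zero), 1 = normalized,
   2 = infinity, 3 = NaN. */
int classify_fields(float_fields v)
{
    if (v.exp == 0)
        return 0;
    if (v.exp != 0xFFu)
        return 1;
    return v.frac == 0 ? 2 : 3;
}
```

For 1.0f the fields are sign 0, exp 127, frac 0, so E = 127 - 127 = 0 and M = 1. For 0.15625f, which is 1.01 (binary) × 2^-3, the exponent field is 124 and the fraction field is 0x200000.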

3. Rounding

Rounding: because the representation limits the range and precision of floating-point numbers, floating-point arithmetic can only approximate real arithmetic. For a value x, we therefore generally want a systematic method of finding the "closest" matching value x' that can be represented in the desired floating-point format.

A key question is which direction to round a value that lies halfway between two possibilities. An alternative approach is to maintain a lower and an upper bound on the actual number. For example, we can determine representable values x- and x+ such that x is guaranteed to lie between them: x- ≤ x ≤ x+.

The IEEE floating-point format defines four different rounding modes.

The default mode finds the closest match (round-to-nearest, also called round-to-even), while the other three can be used to compute upper and lower bounds.

The other three modes produce guaranteed bounds on the actual value, which is useful in some numerical applications. Round-toward-zero rounds positive numbers down and negative numbers up, giving a value x^ such that |x^| ≤ |x|. Round-down rounds both positive and negative numbers down, giving a value x- such that x- ≤ x. Round-up rounds both positive and negative numbers up, giving a value x+ such that x ≤ x+.
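
The default mode can be observed when an integer that needs more than 24 significant bits is converted to float. This sketch assumes an IEEE single-precision float and that the default round-to-nearest mode is in effect; the C standard itself only promises one of the two neighbouring values. The function name to_float is made up, and the volatile store is there to force the value into a real 32-bit float.

```c
#include <stdint.h>

/* Convert a 32-bit integer to float. Above 2^24 = 16777216 only
   every second integer is representable, so odd values must be
   rounded to a neighbour. */
float to_float(int32_t n)
{
    volatile float f = (float)n;
    return f;
}
```

Under these assumptions 16777217 lies exactly halfway between 16777216 and 16777218 and rounds to 16777216, while 16777219 lies halfway between 16777218 and 16777220 and rounds to 16777220. In both cases the tie goes to the neighbour whose fraction field ends in 0, which is why the mode is also called round-to-even.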

4. Floating-point arithmetic

The IEEE standard specifies a simple rule for determining the result of arithmetic operations such as addition and multiplication. Viewing the floating-point values x and y as real numbers, and some operation ⊙ as defined over the reals, the computation yields Round(x ⊙ y), the result of applying rounding to the exact result of the real operation. When one of the arguments is a special value (such as -0, -∞, or NaN), the standard specifies conventions that attempt to be reasonable. For example, 1/-0 is defined to yield -∞, and 1/+0 is defined to yield +∞.

Floating-point addition is not associative; this is the most important group property that it lacks.

Floating-point addition satisfies the following monotonicity property: if a ≥ b, then x + a ≥ x + b for any values of a, b, and x other than NaN. Unsigned and two's-complement addition do not obey this property of real (and integer) addition.

For any values of a, b, and c other than NaN, floating-point multiplication satisfies the following monotonicity properties: if a ≥ b and c ≥ 0, then a * c ≥ b * c; if a ≥ b and c ≤ 0, then a * c ≤ b * c.
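
The lack of associativity is easy to reproduce with double-precision values of very different magnitude. In this sketch the function names are my own, and the intermediate results are stored in volatile variables so that each step is rounded to a 64-bit double even on machines that compute in a wider internal format.

```c
/* (a + b) + c */
double add_left(double a, double b, double c)
{
    volatile double t = a + b;
    volatile double r = t + c;
    return r;
}

/* a + (b + c) */
double add_right(double a, double b, double c)
{
    volatile double t = b + c;
    volatile double r = a + t;
    return r;
}

/* (a * b) * c */
double mul_left(double a, double b, double c)
{
    volatile double t = a * b;
    volatile double r = t * c;
    return r;
}

/* a * (b * c) */
double mul_right(double a, double b, double c)
{
    volatile double t = b * c;
    volatile double r = a * t;
    return r;
}
```

With a = 3.14, b = 1e20, c = -1e20, add_left gives 0.0 because 3.14 is lost when it is added to 1e20, while add_right gives 3.14. With a = b = 1e200 and c = 1e-200, mul_left overflows to +∞ at the first step, while mul_right stays close to 1e200.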

5. Floating point in C

All versions of C provide two different floating-point data types: float and double. On machines that support the IEEE floating-point format, these data types correspond to single-precision and double-precision floating point.

Newer versions of C, including ISO C99, add a third floating-point data type, long double. For many machines and compilers this data type is equivalent to double. For Intel-compatible machines, however, GCC implements it using the 80-bit "extended precision" format, which provides a much larger range and precision than the standard 64-bit format.

============ Problems encountered ==================

My understanding of the formulas is still not good enough. Although the teacher said the formulas can be skipped, without them a large part of the content cannot be understood, and the exercises become troublesome as well. So I spent a lot of time on the formulas, but in the end my grasp is still not as good as the simple understanding of number systems I gained from the earlier courses (compilation and introduction to computers). Many parts of the code also have to be constructed from scratch. Without the answers, I think I might not know how to write them.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
