Discussion on signed and unsigned numbers

Last Update:2018-12-04 Source: Internet

Author: User



The question of signed versus unsigned numbers looks easy, but it holds a few surprises once you think about it carefully.
Below I try to dig into the topic a little more deeply.

I. There is only one standard!

At the assembly-language level there is no difference between signed and unsigned when you declare a variable. The assembler treats every integer literal you type as a signed number, converts it to two's complement, and stores that; this is the only standard it has. It does not first decide whether the literal is signed or unsigned and then apply one of two encodings. Everything is treated as signed, and everything is assembled into two's complement. That is why DB -20 assembles to EC, and DB 236 also assembles to EC.

There is a small puzzle here. DB allocates one byte, and the signed range of a byte is -128 to +127, so DB 236 is out of range. How is that handled? The two's complement of +236 does not fit in one byte; it needs two bytes (more would also work), giving 00 EC. One byte cannot hold it, but remember truncation: the assembled result is cut down to the low byte, so 00 EC becomes EC. This is a "beautiful error", because if you regard 236 as an unsigned number, its one-byte encoding is exactly EC as well. So although the assembler applies a single rule, truncation makes the result agree with both interpretations. Given one byte, if you enter a number within -128 to +127, such as -20, the assembled result is correct as a signed number; if you enter 236, you must be treating it as unsigned (236 is outside the range a signed byte can express), and the result is correct as an unsigned number. This creates the illusion that the assembler has two sets of rules, one for signed and one for unsigned, and assembles each separately. Actually, you have been fooled. :-)

II. There are two sets of instructions!

The first point says only that the assembler uses a single method to turn an integer literal into the actual machine number. It does not mean that the computer makes no distinction between signed and unsigned numbers. On the contrary, the distinction is very clear at the instruction level: for certain operations the processor provides two sets of instructions, one for signed numbers and one for unsigned numbers. It must be emphasized, though, that the computer does not know whether a given number is signed or unsigned; that is up to you. When you consider the number signed, use the signed instructions; when you consider it unsigned, use the unsigned instructions. Addition and subtraction have only one set of instructions, because they work for both signed and unsigned numbers. MUL, DIV, MOVZX, ... handle unsigned numbers, while IMUL, IDIV, MOVSX, ... handle signed numbers.
For example:
Memory holds a byte x with the value 0xEC and a byte y with the value 0x02. Treated as signed numbers, x = -20 and y = +2; treated as unsigned numbers, x = 236 and y = 2. Adding them with the ADD instruction gives 0xEE, which is -18 as a signed number and 238 as an unsigned number. So one addition instruction serves both cases. (Haha, why use two's complement at all? Precisely for this. :-))
Multiplication is different, and two sets of instructions are needed. Treating the operands as signed and using IMUL gives 0xFFD8, which is -40. Using MUL, the unsigned instruction, gives 0x01D8, which is 472. (See the routine in Appendix 2.)

III. The lovable and terrifying C language.

Why C again? Because most people who run into signed/unsigned problems do so through the signed and unsigned declarations in C. Then why start from assembly? Because a C compiler, whether GCC or VC6's CL, translates C code into assembly code and then has an assembler turn it into machine code. Understanding assembly therefore means understanding C at its foundation. Besides, working in assembly forces you to think about problems the way the machine does. (Whenever I meet a strange C problem, I usually compile it to assembly and look.)

C is lovable because it follows the KISS principle and abstracts the machine to just the right degree. It raises our level of thinking (it is more humane than the machine level of assembly) without moving too far from the machine (as C# and Java do). K&R C was originally a high-level assembly language... :-)

C is terrifying because it faithfully reflects everything at the machine level, and the signed/unsigned issue is one example (Java does not have this problem, because its byte, short, int, and long types are all signed). To illustrate how terrifying it can be:

#include <stdio.h>
#include <string.h>

int main()
{
    int x = 2;
    char *str = "ABCD";
    int y = (x - strlen(str)) / 2;  /* int mixed with size_t */

    printf("%d\n", y);
}

The expected result is -1, but the program prints 2147483647. Why? Because strlen returns size_t, which is an unsigned int on the 32-bit compilers used here, and when an int is mixed with an unsigned type of the same width the signed operand is automatically converted to unsigned. The result is naturally not what you expected. (On platforms where size_t is wider than int, the printed value may differ.)
Look at the compiled code: the division instruction is DIV, the unsigned division.
The fix is to write int y = (int)(x - strlen(str)) / 2; which forces the conversion in the signed direction (by default the compiler converts the other way), so that the division is compiled into IDIV.
We know that for two memory cells in the same state, the results from signed instructions such as IMUL and IDIV are completely different from the results from unsigned instructions such as MUL and DIV. So whenever a computation involves both signed and unsigned values, and especially when that nasty automatic conversion is in play, be careful! (Neither GCC nor CL gives any warning about the automatic conversion here!)

To avoid these errors, I recommend making sure that all the variables taking part in a computation are signed.

IV. How C handles it.

When it comes to signed and unsigned handling, C is friendlier at the language level: a variable declaration can carry the signed or unsigned keyword. Assembly makes no such distinction, so you must keep track of it yourself. For example, if you want to store a signed number in a byte, stay within -128 to +127; if you want an unsigned number, stay within 0 to 255. If you enter 236 and mean it as a signed number, you are certainly making a mistake, because the signed number 236 needs at least two bytes (00 EC). Do not underestimate that 00 byte: under signed multiplication, the two-byte 00 EC and the one-byte EC give completely different results when multiplied by the same number!

Let's look at a concrete example (generated with the CL compiler of VC6):

C source:
......
char x;
unsigned char y;
int z;

x = 3;
y = 236;

z = x * y;
......

Assembly produced by the compiler:
......
_x$ = -4
_y$ = -8
_z$ = -12
......
mov BYTE PTR _x$[ebp], 3        ; assignments
mov BYTE PTR _y$[ebp], 236

movsx eax, BYTE PTR _x$[ebp]    ; sign-extend the signed x
mov ecx, DWORD PTR _y$[ebp]     ; zero-extend the unsigned y:
and ecx, 255                    ; load, then mask to the low byte

imul eax, ecx
mov DWORD PTR _z$[ebp], eax
......

As the assignments show (the two mov byte ptr instructions, highlighted in green in the original), the compiled code behaves exactly as described in section I: whether a value is signed is something you keep track of yourself, and C is no better than assembly here, because nothing in the stored bytes reflects it. This is understandable: C is ultimately compiled into assembly, and assembly cannot distinguish signed from unsigned when a variable is declared, so C cannot either. However, since C does provide signed and unsigned declarations, the compiled code must reflect them somewhere, and it does in the instructions highlighted in red in the original: the signed x is sign-extended (movsx), and the unsigned y is zero-extended (mov followed by and ecx, 255). For the convenience of the example, signed and unsigned numbers are mixed in one operation here; this should be avoided in real programs.

(End)

Appendix:

1. A computer uses only one encoding for signed integers. It is not the case that positive numbers are stored in sign-magnitude form while negative numbers are stored in two's complement. On most computers every signed integer, positive or negative, is encoded in two's complement! The two's complement of a positive number or of zero is identical to its sign-magnitude form, while the two's complement of a negative number is obtained by taking the binary form of its absolute value, inverting every bit, and adding one.

2. A routine showing the results of the two multiplication instructions:

; Save this program as x.s

extern printf
global main

section .data
str1: db "%x", 0x0d, 0x0a, 0
n:    db 0x02

section .text
main:
    xor eax, eax
    mov al, 0xec
    mul byte [n]        ; unsigned multiply; the signed instruction is imul

    push eax            ; second argument: the product
    push str1           ; first argument: the format string
    call printf

    add esp, 8          ; remove both 4-byte arguments from the stack
    ret

Build steps:
1. nasm -felf x.s
2. gcc x.o

Built with NASM and GCC on Ubuntu 7.04; the results match those described in this article.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
