The distinction between signed and unsigned numbers in assembly

Origin: http://blog.chinaunix.net/uid-28458801-id-3576608.html
Reproduced from: http://hi.baidu.com/asmsky/blog/item/7290d20076cab6da277fb5b8.html
1. There is only one standard


At the assembly language level, variable declarations carry no signed or unsigned attribute. The assembler treats every integer literal you write as a signed number and converts it into a machine number that way: there is only this one standard. It does not first tell signed from unsigned and then apply two different rules; everything is handled as signed, and everything is encoded in two's complement. So db -20 assembles to EC, and db 236 also assembles to EC.

There is a small catch here, and careful readers will spot it: db allocates one byte, and the signed range a byte can represent is -128 to +127, so db 236 is out of range. How can that work? Indeed, the two's complement of +236 does not fit in one byte; it needs two bytes (more is fine too): 00 EC. But remember truncation: the result is cut down to the space that was allocated, so the two-byte 00EC is truncated to EC. This is a "beautiful mistake". Why? Because if you mean 236 as an unsigned number, its correct encoding is also EC. The assembler uses only one standard, but thanks to the happy accident of truncation the result satisfies both standards.

In other words, given one byte: if you enter a signed number such as -20, the assembled result is correct; if you enter 236, you must mean it as an unsigned number (236 is outside the signed range of a byte), and the result is correct again. That creates the illusion that the assembler has two sets of standards, tells signed from unsigned and assembles each separately. In fact, you have been fooled. :-)


2. There are two sets of instructions


As explained in part 1, the assembler has only one way of turning integer literals into machine numbers. That does not mean the computer ignores the difference between signed and unsigned numbers. On the contrary, it distinguishes them very clearly: for some operations it provides two sets of instructions, one prepared for signed numbers and one for unsigned numbers. Note, however, that the computer does not know whether a particular number is signed or unsigned; that is up to you. When you treat a number as signed, use the signed instructions; when you treat it as unsigned, use the unsigned ones. Addition and subtraction have only one set of instructions, because the same instructions work for both. The instructions mul, div, movzx ... handle unsigned numbers, while imul, idiv, movsx ... handle signed numbers.
For example:
Suppose memory holds a byte x = 0xEC and a byte y = 0x02. Viewed as signed numbers, x = -20 and y = +2; viewed as unsigned numbers, x = 236 and y = 2. Adding them with the add instruction gives 0xEE, which is -18 as a signed number and 238 as an unsigned number. So a single add instruction serves both the signed and the unsigned case. (Heh, this is exactly why computers use two's complement in the first place. :-))
Multiplication is different: it needs two sets of instructions. In the signed case, imul gives 0xFFD8, which is -40. In the unsigned case, mul gives 0x01D8, which is 472. (See the program in Appendix 2.)


3. The lovely and terrible C language


Why come back to C? Because most people who run into signed/unsigned problems hit them through the signed and unsigned declarations in C. Then why start with assembly? Because the C compilers we use, whether GCC or the CL compiler of VC6, turn C code into assembly code (at least conceptually) and then into machine code. Understanding the assembly means understanding C at its root, and to think about a problem the way the machine does, you have to look at the assembly. (Whenever I run into a strange C problem, I compile it to assembly and take a look; gcc -S and cl /FA both produce an assembly listing.)


C is lovely because it follows the KISS principle. Its abstraction of the machine is just right: it raises the level at which we think (friendlier than the machine level of assembly) without drifting too far from the machine (as C#, Java and the like do). The original K&R C was really a high-level assembler... :-)


C is scary because it exposes everything at the machine level, including the question of signed versus unsigned. (Java does not have this problem, because its integer types byte, short, int and long are all signed.) To illustrate how scary C can be, here is an example:


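#include <stdio.h>
#include <string.h>

int main(void)
{
    int x = 2;
    const char *str = "ABCD";
    /* strlen returns size_t, an unsigned type, so x is converted to
       unsigned and both the subtraction and the division are unsigned. */
    int y = (x - strlen(str)) / 2;

    printf("%d\n", y);
    return 0;
}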


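The expected result is -1, but the program prints 2147483647. Why? Because strlen returns size_t, which on a 32-bit platform is unsigned int. When it is mixed with an int, the int is automatically converted to unsigned, and the result is naturally not what you expect: 2 - 4 wraps around to 0xFFFFFFFE, and dividing that by 2 gives 0x7FFFFFFF, i.e. 2147483647. (On a 64-bit platform size_t is 64 bits wide, so the exact value printed depends on how the compiler converts the result back to int, but the subtraction and the division are still unsigned.)
Looking at the compiled code, the division instruction is div, i.e. an unsigned division (an optimizing compiler may use the equivalent unsigned shift, shr, instead).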
The workaround is a cast: int y = (int)(x - strlen(str)) / 2; This forces a conversion to signed (the compiler's implicit conversion goes the other way, toward unsigned), so the division becomes a signed one and is compiled as idiv. As we have seen, the same bits in memory give very different results under the signed instructions (imul, idiv, ...) and the unsigned ones (mul, div, ...). So be careful whenever a calculation mixes signed and unsigned values, especially when a nasty implicit conversion is involved. (Neither GCC nor CL warns about the implicit conversion, at least at their default warning levels...)




To avoid these mistakes, it is best to make sure that the variables in a calculation are all signed.






Appendix
1: Volume 2 of the IA-32 Intel Architecture Software Developer's Manual (PDF) uses the phrase "sign-extended imm8", which refers to sign extension. Sign extension means this: when an operand is widened, it becomes longer but its value must not change, so the sign bit is copied into the new high bits. For example, if AL holds 0xEC, then after movsx ax, al the value of AX is 0xFFEC: the operand is longer, but the value is unchanged, -20 in both cases.


2: Example program comparing the results of the two multiplication instructions


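;; Program is stored as x.s
;; Build (assuming 32-bit Linux with NASM and GCC):
;;   nasm -f elf32 x.s && gcc -m32 -no-pie x.o -o x
;; --start--------------------------------------------------------

extern printf
global main

section .data
str1:   db "%x", 0x0d, 0x0a, 0
n:      db 0x02

section .text
main:
        xor     eax, eax        ; clear EAX so only AX holds the product
        mov     al, 0xec
        mul     byte [n]        ; unsigned: AX = 0x01d8 (472)
                                ; signed version: imul byte [n], AX = 0xffd8 (-40)

        push    eax
        push    str1
        call    printf

        add     esp, 8          ; pop both arguments (2 x 4 bytes)
        xor     eax, eax        ; return 0
        ret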
