Sign extension when a char is converted to int

Last Update:2015-08-19 Source: Internet

Author: User


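Binary representation of negative numbers:

The original code (sign-magnitude form) is the direct representation: a sign bit followed by the absolute value.

The inverse code (ones' complement) of a negative number inverts every bit of its original code except the sign bit (the highest bit).

The complement (two's complement) = inverse code + 1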

One byte can represent only 256 distinct values. For a signed type the range is therefore -128 to 127. How is such a value stored in the computer? A first idea is to treat the highest bit as a sign bit (0 for positive, 1 for negative) and to use the remaining 7 bits for the absolute value. Seven bits give 2^7 = 128 magnitudes, and with the two signs that makes 2^7 × 2 = 256 bit patterns.

Zero is stored as 00000000. Positive numbers are converted just like unsigned numbers: 00000001 through 01111111 represent 1 through 127. These bit patterns are the sign-magnitude form (the "original code") of those numbers.

Many people assume that 10000001 through 11111111 represent -1 through -127 in that order. That would give only 255 values, because the pattern 10000000 is not accounted for. In fact 10000000 represents the smallest negative integer, -128, and the order is the reverse of what was assumed: 10000001 through 11111111 represent -127 through -1.

The reason is that negative integers are stored in two's complement form. To explain it we need one more concept, the ones' complement (the "inverse code"). Start from the sign-magnitude form of the negative number, which is the sign-magnitude form of its absolute value with the sign bit set to 1, and invert every bit except the sign bit: each 1 becomes 0 and each 0 becomes 1. For example, the sign-magnitude form of 1 is 00000001, so the sign-magnitude form of -1 is 10000001 and the ones' complement of -1 is 11111110. The two's complement is the ones' complement plus 1, so the two's complement of -1 is 11111110 + 1 = 11111111. That is how -1 is stored in the computer: as 11111111.

To sum up, the computer stores signed integers in two's complement form. For 0 and for positive numbers, the sign-magnitude form, the ones' complement and the two's complement are all the same bit pattern (treat this as a special case). For a negative number, the two's complement is its ones' complement plus 1. A few more examples follow.
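
The storage rule can be checked directly. The following is a minimal sketch, assuming a two's complement machine (which covers practically every current platform); the helper name stored_bits is hypothetical and only serves the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of an 8-bit signed value (hypothetical helper). */
uint8_t stored_bits(int8_t value)
{
    uint8_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);   /* copy the byte exactly as stored */
    return bits;
}
```

On such a machine stored_bits(-1) yields 0xFF (11111111), stored_bits(-127) yields 0x81 (10000001) and stored_bits(-128) yields 0x80 (10000000), matching the ordering described above.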

Decimal → binary (if you do not know how to do this conversion, consult any introductory computer science book)

47→101111

Signed integer	Sign-magnitude	Ones' complement	Two's complement	Note
47	00101111	00101111	00101111	For a positive number all three forms are identical; the names should not be taken literally
-47	10101111	11010000	11010001	For a negative number the two's complement is the ones' complement plus 1
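
The three encodings in the table can be derived step by step. This is an illustrative sketch only: the function names are hypothetical, the input is limited to -127..127 because sign-magnitude cannot express -128 in 8 bits, and the arithmetic follows the definitions given earlier in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-magnitude: sign bit plus 7-bit absolute value. Valid for -127..127. */
uint8_t sign_magnitude8(int v)
{
    assert(v >= -127 && v <= 127);
    return (uint8_t)(v < 0 ? 0x80 | -v : v);
}

/* Ones' complement: for a negative value, invert every bit except the sign bit. */
uint8_t ones_complement8(int v)
{
    uint8_t sm = sign_magnitude8(v);
    return (uint8_t)(v < 0 ? sm ^ 0x7F : sm);
}

/* Two's complement: for a negative value, ones' complement plus 1. */
uint8_t twos_complement8(int v)
{
    uint8_t oc = ones_complement8(v);
    return (uint8_t)(v < 0 ? oc + 1 : oc);
}
```

For -47 the three functions return 0xAF, 0xD0 and 0xD1, which are the bit patterns 10101111, 11010000 and 11010001 from the table. For 47 all three return 0x2F.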

///////////////////////////

Lengths of C data types

C only guarantees that short is no wider than int and int is no wider than long (in practice sizeof(short) ≤ sizeof(int) ≤ sizeof(long)), with short and int at least 16 bits wide and long at least 32 bits wide. Whether int is 16-bit or 32-bit depends on the platform and the language implementation (the compiler). In 32-bit environments such as VC++ (x86), int and long are both 32-bit signed integers with the same range.
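
A program can verify these properties on its own platform instead of assuming them. The sketch below uses only sizeof and CHAR_BIT from limits.h; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if short is no larger than int and int is no larger than long. */
int sizes_are_ordered(void)
{
    return sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long);
}

/* Width in bits of an object of the given size in bytes. */
unsigned long width_in_bits(unsigned long size_in_bytes)
{
    return size_in_bytes * CHAR_BIT;
}

/* Aborts if the platform does not provide the minimum widths required by C. */
void check_minimum_widths(void)
{
    assert(width_in_bits(sizeof(short)) >= 16);
    assert(width_in_bits(sizeof(int)) >= 16);
    assert(width_in_bits(sizeof(long)) >= 32);
}
```

Printing width_in_bits(sizeof(int)) and width_in_bits(sizeof(long)) is a quick way to see which data model a given compiler uses.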

//////////////////////////

Code example:

static int get_utili(const char *p)
{
    int util;
    ...
    while (isspace((int) *p))    /* skip whitespace */
        ++p;
    util = (int) *p++;
    ...
}

Symptoms and consequences:

When the parameter p points to a byte such as 0x9a or 0xab (any value whose highest bit is 1), the int variable util receives the wrong value: the char is sign-extended, so 0x9a (154 in decimal) becomes -102. This causes data-processing errors at run time.

Bug analysis:

Whether plain char is signed, and therefore whether it is sign-extended, depends on the compiler. On the x86 platform, however, the major compilers treat char as signed by default, so it is sign-extended. When the code above assigns the char value *p to the int variable util, *p must first be converted to unsigned char so that the highest bit of the char is not propagated as a sign.

The faulty code sign-extends as follows:

The narrower type being widened is signed: char x = 10011010b (that is, 0x9a).

So int y = (int) x performs sign extension: the sign bit of the narrower type fills the high-order bits of the wider type (the bits the wider type has beyond the narrower one). Shown for 16 bits, y is 11111111 10011010b, which is -102 in decimal.

If the narrower type is made unsigned first, unsigned char x = 10011010b (that is, 0x9a),

then int y = (int) x performs zero extension: the high-order bits of the wider type are filled with zeros. Shown for 16 bits, y is 00000000 10011010b, which is 154 in decimal.

Corrected code:

Change util = (int) *p++; to util = (int) (unsigned char) *p++;
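
As a self-contained illustration of the fix, here is a minimal sketch. The function next_byte_value is a hypothetical stand-in, not the original get_utili, whose elided parts are unknown. It also casts the argument of isspace, because passing a negative char value to the ctype functions is undefined behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the function in the example: skip leading
   whitespace and return the next byte as a value in 0..255. */
int next_byte_value(const char *p)
{
    assert(p != NULL);
    while (isspace((unsigned char) *p))   /* cast keeps the argument non-negative */
        ++p;
    return (int) (unsigned char) *p;      /* zero-extended, never negative */
}
```

With this version a byte of 0x9a is returned as 154 regardless of whether the compiler treats plain char as signed.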

Finding the bug:

The bug was discovered during code review.

Sign-extension problems with char are hard to find if testing does not include a case that triggers them. Careful code review is essential for this class of problem: the review may expose the bug directly, or it may prompt the reviewers to add the missing test cases. Either way, code review should be an indispensable step.

About sign extension

I. Widening a narrower data type to a wider data type

1. The narrower type is signed

Sign extension is performed: the sign bit of the narrower type fills the high-order bits of the wider type (the bits beyond the width of the narrower type), which guarantees that the numeric value is unchanged.

Example 1: char x = 10001001b; short y = x; then y is 11111111 10001001b.

Example 2: char x = 00001001b; short y = x; then y is 00000000 00001001b.

2. The narrower type is unsigned

Zero extension is performed: the high-order bits of the wider type are filled with zeros.

Example 1: unsigned char x = 10001001b; short y = x; then y is 00000000 10001001b.

Example 2: unsigned char x = 00001001b; short y = x; then y is 00000000 00001001b.
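
The four examples above can be reproduced in standard C. Binary literals such as 10001001b are notation rather than portable C, so this sketch uses the equivalent values (10001001b is 0x89, or -119 as a signed 8-bit number). The function names are hypothetical, and signed char is spelled out so the result does not depend on the signedness of plain char.

```c
#include <assert.h>

/* Widen a signed 8-bit value to short: the compiler sign-extends. */
short widen_signed(signed char x)
{
    return x;
}

/* Widen an unsigned 8-bit value to short: the compiler zero-extends. */
short widen_unsigned(unsigned char x)
{
    return x;
}

/* Low 16 bits of a short, for comparison with the bit patterns in the text. */
unsigned low16(short y)
{
    return (unsigned) y & 0xFFFFu;
}

/* The examples from the text, expressed as checks. */
void check_widening_examples(void)
{
    assert(low16(widen_signed(-119)) == 0xFF89u);    /* 11111111 10001001b */
    assert(low16(widen_signed(0x09)) == 0x0009u);    /* 00000000 00001001b */
    assert(low16(widen_unsigned(0x89)) == 0x0089u);  /* 00000000 10001001b */
    assert(low16(widen_unsigned(0x09)) == 0x0009u);  /* 00000000 00001001b */
}
```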

II. Narrowing a wider data type to a narrower data type

If the high-order bytes of the wider type are all 1s or all 0s and agree with the highest bit of the part that is kept, the low-order bytes can simply be truncated into the narrower type and the value is preserved. Otherwise the value does not fit and the conversion produces a wrong result.

III. Converting between signed and unsigned types of the same width

The bits in memory are taken unchanged and interpreted as the target type, so the numeric value may change. When a narrower type is widened and the two types also differ in signedness, the value is first widened according to rule I (using the signedness of the source type), and the resulting bits are then interpreted as the target type according to this rule.
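
A short sketch of rule III combined with rule I, using the fixed-width types from stdint.h so the widths are unambiguous. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 8-bit to unsigned 32-bit: sign-extended first, then the bits are
   read as an unsigned number. */
uint32_t signed8_to_unsigned32(int8_t x)
{
    return (uint32_t) x;
}

/* Unsigned 8-bit to signed 32-bit: zero-extended, so the value is unchanged. */
int32_t unsigned8_to_signed32(uint8_t x)
{
    return (int32_t) x;
}

/* Same width, different signedness: the bits are kept, so the value changes
   whenever the highest bit is set. */
uint8_t signed8_to_unsigned8(int8_t x)
{
    return (uint8_t) x;
}

/* Worked values for the bit pattern 10001001b (0x89). */
void check_mixed_conversions(void)
{
    assert(signed8_to_unsigned32(-119) == 0xFFFFFF89u);
    assert(unsigned8_to_signed32(0x89) == 137);
    assert(signed8_to_unsigned8(-119) == 0x89);
}
```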

Appendix: conversions from signed types (assuming a signed char, a 16-bit short and a 32-bit long)

From	To	Method
char	short	Sign-extend
char	long	Sign-extend
char	unsigned char	Bit pattern preserved; the highest bit is no longer a sign bit and becomes a data bit
char	unsigned short	Sign-extend to short, then convert short to unsigned short
char	unsigned long	Sign-extend to long, then convert long to unsigned long
char	float	Sign-extend to long, then convert long to float
char	double	Sign-extend to long, then convert long to double
char	long double	Sign-extend to long, then convert long to long double
short	char	Keep the low-order byte
short	long	Sign-extend
short	unsigned char	Keep the low-order byte
short	unsigned short	Bit pattern preserved; the highest bit is no longer a sign bit and becomes a data bit
short	unsigned long	Sign-extend to long, then convert long to unsigned long
short	float	Sign-extend to long, then convert long to float
short	double	Sign-extend to long, then convert long to double
short	long double	Sign-extend to long, then convert long to long double
long	char	Keep the low-order byte
long	short	Keep the low-order 16 bits
long	unsigned char	Keep the low-order byte
long	unsigned short	Keep the low-order 16 bits
long	unsigned long	Bit pattern preserved; the highest bit is no longer a sign bit and becomes a data bit
long	float	Represented as a single-precision floating-point number; precision may be lost
long	double	Represented as a double-precision floating-point number; precision may be lost if the value cannot be represented exactly
long	long double	Represented as a long double; precision may be lost if the value cannot be represented exactly

Conversions from unsigned types

From	To	Method
unsigned char	char	Bit pattern preserved; the highest bit becomes the sign bit
unsigned char	short	Zero-extend
unsigned char	long	Zero-extend
unsigned char	unsigned short	Zero-extend
unsigned char	unsigned long	Zero-extend
unsigned char	float	Convert to long, then convert long to float
unsigned char	double	Convert to long, then convert long to double
unsigned char	long double	Convert to long, then convert long to long double
unsigned short	char	Keep the low-order byte
unsigned short	short	Bit pattern preserved; the highest bit becomes the sign bit
unsigned short	long	Zero-extend
unsigned short	unsigned char	Keep the low-order byte
unsigned short	unsigned long	Zero-extend
unsigned short	float	Convert to long, then convert long to float
unsigned short	double	Convert to long, then convert long to double
unsigned short	long double	Convert to long, then convert long to long double
unsigned long	char	Keep the low-order byte
unsigned long	short	Keep the low-order 16 bits
unsigned long	long	Bit pattern preserved; the highest bit becomes the sign bit
unsigned long	unsigned char	Keep the low-order byte
unsigned long	unsigned short	Keep the low-order 16 bits
unsigned long	float	Convert to long, then convert long to float
unsigned long	double	Convert directly to double
unsigned long	long double	Convert to long, then convert long to long double

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

Sign extension, zero extension, and contraction

Many modern high-level programming languages allow programmers to use expressions that contain integer objects of different sizes. So what happens when the two operands of an expression differ in size? Some languages report an error, while others automatically convert the operands to a common format. This conversion has a cost, so if you do not want the compiler to insert conversions into your otherwise clean code without your knowledge, you need to know how the compiler handles these expressions.

In a two's complement system, the same negative number has different representations at different sizes. You cannot arbitrarily use an 8-bit signed number in an expression that involves 16-bit numbers; a conversion is required. This conversion and its inverse (converting a 16-bit number to 8 bits) are the sign extension and contraction operations.

Take -64 as an example (the $ prefix denotes hexadecimal). Its 8-bit two's complement representation is $C0, while the equivalent 16-bit representation is $FFC0. Clearly the bit patterns differ. Now look at +64: its 8-bit and 16-bit representations are $40 and $0040 respectively. Evidently, sign-extending a negative number works differently from sign-extending a non-negative number.

Sign-extending a value from some number of bits to a larger number of bits is simple: copy the sign bit into all the new high-order bits of the larger format. For example, to sign-extend an 8-bit number to 16 bits, copy bit 7 of the 8-bit number into bits 8 through 15 of the 16-bit number. To sign-extend a 16-bit number to a double word, copy bit 15 into bits 16 through 31 of the double word.
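
The bit-copying rule can be written out by hand on unsigned bit patterns. This is a sketch for illustration; in real C code the compiler performs the same operation whenever a signed type is widened. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend 8 -> 16 bits by copying bit 7 into bits 8..15. */
uint16_t sign_extend_8_to_16(uint8_t b)
{
    return (b & 0x80u) ? (uint16_t)(0xFF00u | b) : (uint16_t) b;
}

/* Sign-extend 16 -> 32 bits by copying bit 15 into bits 16..31. */
uint32_t sign_extend_16_to_32(uint16_t w)
{
    return (w & 0x8000u) ? (0xFFFF0000u | w) : (uint32_t) w;
}

/* The -64 and +64 examples from the text. */
void check_sign_extension(void)
{
    assert(sign_extend_8_to_16(0xC0) == 0xFFC0);   /* -64 */
    assert(sign_extend_8_to_16(0x40) == 0x0040);   /* +64 */
}
```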

You must use sign extension when working with signed numbers of different lengths. For example, to add a byte quantity to a word quantity, you must sign-extend the byte to 16 bits before adding. Other operations may require sign extension to 32 bits.

Table 2-5 Examples of sign extension

8-bit	16-bit	32-bit	Binary (two's complement)
$80	$FF80	$FFFF_FF80	1111_1111_1111_1111_1111_1111_1000_0000
$28	$0028	$0000_0028	0000_0000_0000_0000_0000_0000_0010_1000
$9A	$FF9A	$FFFF_FF9A	1111_1111_1111_1111_1111_1111_1001_1010
$7F	$007F	$0000_007F	0000_0000_0000_0000_0000_0000_0111_1111
N/A	$1020	$0000_1020	0000_0000_0000_0000_0001_0000_0010_0000
N/A	$8086	$FFFF_8086	1111_1111_1111_1111_1000_0000_1000_0110

When dealing with unsigned binary numbers, use zero extension to extend a smaller unsigned number to a larger one. Zero extension is very simple: fill the high-order bytes of the larger operand with zeros. For example, to zero-extend the 8-bit value $82 to 16 bits, insert zeros in the high-order byte, giving $0082.

Table 2-6 Examples of zero extension

8-bit	16-bit	32-bit	Binary
$80	$0080	$0000_0080	0000_0000_0000_0000_0000_0000_1000_0000
$28	$0028	$0000_0028	0000_0000_0000_0000_0000_0000_0010_1000
$9A	$009A	$0000_009A	0000_0000_0000_0000_0000_0000_1001_1010
$7F	$007F	$0000_007F	0000_0000_0000_0000_0000_0000_0111_1111
N/A	$1020	$0000_1020	0000_0000_0000_0000_0001_0000_0010_0000
N/A	$8086	$0000_8086	0000_0000_0000_0000_1000_0000_1000_0110

Most high-level language compilers handle sign extension and zero extension automatically. The following C example shows how this works:

signed char sbyte;    // the character type in C is one byte
short int sword;      // short integers in C are typically 16 bits
long int sdword;      // long integers in C are typically 32 bits
. . .
sword = sbyte;        // automatically sign-extends the 8-bit value to 16 bits
sdword = sbyte;       // automatically sign-extends the 8-bit value to 32 bits
sdword = sword;       // automatically sign-extends the 16-bit value to 32 bits

Some languages (such as Ada) require an explicit conversion (an explicit cast) when converting from a smaller data type to a larger one. Check the reference manual for your language to see whether an explicit conversion is necessary. The advantage of a language that requires explicit conversions is that the compiler never does anything behind the programmer's back. If you omit a necessary conversion, the compiler issues a diagnostic message to tell you that the program needs attention.

The important thing to realize about sign extension and zero extension is that they are not free. Assigning a smaller integer to a larger integer may require more machine instructions (and take longer to execute) than moving data between integer variables of the same size. Therefore, be careful about mixing variables of different sizes in an arithmetic expression or an assignment statement.

Sign contraction, converting a value with some number of bits to the same value with fewer bits, is more troublesome. Sign extension never fails: an m-bit signed number can always be converted to an n-bit number (where n > m). Unfortunately, when n > m an n-bit number cannot always be converted to m bits. For example, the 16-bit hexadecimal representation of -448 is $FE40. The magnitude of this number is too large for 8 bits, so it cannot be sign-contracted to 8 bits.

To sign-contract a value correctly, you must examine the high-order bytes that will be discarded. First, these bytes must all be 0 or all be $FF; if they contain any other value, the number cannot be sign-contracted. Second, the highest bit of the result must match every bit that was discarded. Here are some examples of converting 16-bit numbers to 8-bit numbers:

$FF80 (1111_1111_1000_0000) can be sign-contracted to $80 (1000_0000).

$0040 (0000_0000_0100_0000) can be sign-contracted to $40 (0100_0000).

$FE40 (1111_1110_0100_0000) cannot be sign-contracted to 8 bits.

$0100 (0000_0001_0000_0000) cannot be sign-contracted to 8 bits.
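
The two conditions can be combined into a single test on the bit pattern. The sketch below works on a 16-bit pattern held in an unsigned type; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the 16-bit two's complement pattern w can be sign-contracted to
   8 bits: the discarded high byte must be all 0s or all 1s and must match
   bit 7 of the byte that is kept. */
bool can_contract_16_to_8(uint16_t w)
{
    uint8_t high = (uint8_t)(w >> 8);
    bool sign = (w & 0x80u) != 0;
    return sign ? high == 0xFF : high == 0x00;
}

/* The four examples from the text. */
void check_contraction_examples(void)
{
    assert(can_contract_16_to_8(0xFF80));
    assert(can_contract_16_to_8(0x0040));
    assert(!can_contract_16_to_8(0xFE40));
    assert(!can_contract_16_to_8(0x0100));
}
```

Note that $0080 also fails the test: its high byte is all zeros, but bit 7 of the low byte is 1, so the 8-bit result would read as -128 instead of +128.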

Contraction is awkward to use in high-level languages. Some languages, such as C, simply store the low-order part of the expression in the smaller variable and discard the high-order part (at best, the C compiler may warn during compilation about a possible loss of precision). You can take steps to silence the compiler, but it still does not check the validity of the value. The following is typical C code for sign contraction:

signed char sbyte;    // the character type in C is one byte
short int sword;      // short integers in C are typically 16 bits
long int sdword;      // long integers in C are typically 32 bits
. . .
sbyte = (signed char) sword;
sbyte = (signed char) sdword;
sword = (short int) sdword;

In C, the only safe solution is to compare the value of the expression against the upper and lower bounds before storing it in the smaller variable. Unfortunately, if you need to do this often, the code becomes unwieldy. Here is the conversion code with these checks added:

if (sword >= -128 && sword <= 127)
{
    sbyte = (signed char) sword;
}
else
{
    // report the error
}

Another approach uses assertions:

assert(sword >= -128 && sword <= 127);
sbyte = (signed char) sword;

assert(sdword >= -32768 && sdword <= 32767);
sword = (short int) sdword;

As you can see, this makes the code ugly. You may prefer to write the checks as a macro (#define) in C or as a function in C++ to make the code more readable.
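
One possible shape for such a helper, written as plain C functions rather than macros. This is a sketch under assumptions: the names narrow_to_schar and narrow_to_short are hypothetical, and the bounds come from limits.h rather than being hard-coded, so they follow the platform's actual type widths.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Store v in *out only if it fits in a signed char; report success. */
bool narrow_to_schar(long v, signed char *out)
{
    assert(out != NULL);
    if (v < SCHAR_MIN || v > SCHAR_MAX)
        return false;
    *out = (signed char) v;
    return true;
}

/* Store v in *out only if it fits in a short; report success. */
bool narrow_to_short(long v, short *out)
{
    assert(out != NULL);
    if (v < SHRT_MIN || v > SHRT_MAX)
        return false;
    *out = (short) v;
    return true;
}
```

A caller would write if (!narrow_to_schar(sword, &sbyte)) followed by its error handling, which keeps the range check out of the main line of the code. On a failed check the destination is left unchanged.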

Some high-level languages, such as Pascal and Delphi/Kylix, perform sign contraction automatically and check the result to make sure it fits in the destination operand. These languages raise some kind of exception (or stop the program) when a range violation occurs. Of course, if you want to recover from the error, you either need to write exception-handling code or use the same sequence of if statements as in the C example above.

