[C] Semantic differences in converting between strings and numeric values

Source: Internet
Author: User

Converting between strings and numeric values is such a common operation that everyone takes it for granted. Apart from the authors of program libraries, probably no one would agonize over it for a day or two the way I did.

 

C provides a family of functions for converting between strings and numeric values, including itoa, atoi, and strtol. To simplify the discussion, I abstract them into the following two pseudo-C functions:

 
string c_inttostr(int value, int radix);
int c_strtoint(string str, int radix);

 

Their names say what they do. Note that this article discusses only integer values; floating-point numbers are out of scope.

 

For decimal conversions, both functions behave as expected. For non-decimal conversions of non-negative values, they also behave as expected, so there is nothing to discuss. Non-decimal conversion of negative numbers is a different story. Consider the following call:

 
string str = c_inttostr(-1, 16);

After execution, str holds "ffffffff". So far, so good.

 

In my ideal world, c_inttostr and c_strtoint would be inverses of each other; that is, after the following calls:

 
int a = -1;
string str = c_inttostr(a, 16);
int b = c_strtoint(str, 16);

both a and b should be -1. In actual code, however, b ends up as 2147483647, the maximum value of the int type. If you check errno, you will find that it is ERANGE, which means an overflow occurred.

 

This raises several questions: Isn't ffffffff equal to -1? Why does it overflow? Why is the returned value 2147483647 rather than -2147483648? ...

 

Before answering these questions, we need to be clear about one thing: when we say that ffffffff equals -1, we are thinking like computer scientists. As we all know, computers represent negative numbers in two's complement. In two's complement, -1 has every bit set to 1, which for a 32-bit int is ffffffff in hexadecimal. You may have noticed that at the level of internal storage there are no positive or negative numbers: every stored value is just an unsigned bit pattern, and negative numbers are encoded in a particular way. So we naturally read ffffffff as a description of how a value is stored inside the computer.
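The "all bits are 1" claim can be checked with a small sketch. The helper name int_to_bit_string is hypothetical and not part of any library, and the code assumes an unsigned int without padding bits (32 bits wide on the platforms discussed here).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in an unsigned int (32 on the platforms the article assumes). */
#define UINT_BITS (sizeof(unsigned int) * CHAR_BIT)

/* Hypothetical helper: write the stored bit pattern of `value` as a string of
 * '0'/'1' characters, most significant bit first.
 * `out` must have room for UINT_BITS + 1 bytes. */
void int_to_bit_string(int value, char *out)
{
    assert(out != NULL);
    /* Converting to unsigned is defined as reduction modulo 2^N, which yields
     * exactly the two's complement bit pattern of a negative value. */
    unsigned int bits = (unsigned int)value;
    for (size_t i = 0; i < UINT_BITS; i++) {
        unsigned int mask = 1u << (UINT_BITS - 1 - i);
        out[i] = (bits & mask) ? '1' : '0';
    }
    out[UINT_BITS] = '\0';
}
```

With a 32-bit int, calling int_to_bit_string(-1, out) fills out with 32 '1' characters. Every group of four set bits is one hexadecimal digit f, which is where the eight digits of ffffffff come from.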

 

From a mathematician's point of view (assuming he knows nothing about computers), decimal -1 converted to hexadecimal is still -1, hexadecimal ffffffff is decimal 4294967295, and hexadecimal -ffffffff is decimal -4294967295. In pure mathematics, a number is positive or negative in its own right, regardless of the computer or operating system.
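The mathematician's reading can be reproduced with the standard strtoll function, whose long long result is at least 64 bits wide and so holds these values without overflow. This is only a sketch: the wrapper name parse_math_value is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: read `text` as a signed number written in `radix`,
 * the way a mathematician would. long long is at least 64 bits wide, so
 * eight hex digits plus a sign always fit. */
long long parse_math_value(const char *text, int radix)
{
    assert(text != NULL && radix >= 2 && radix <= 36);
    return strtoll(text, NULL, radix);
}
```

Under these assumptions, parse_math_value("-1", 16) gives -1, parse_math_value("ffffffff", 16) gives 4294967295, and parse_math_value("-ffffffff", 16) gives -4294967295.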

 

This gives us two semantics for converting between strings and numeric values: computer semantics and mathematical semantics. Under computer semantics, a string in any radix other than decimal represents how the value is stored in the computer. Under mathematical semantics, a string in any radix represents the value itself. Computer semantics treats decimal separately because a storage pattern written in another radix exists to represent a decimal number, and the sign belongs to that decimal number. This means that a string in a non-decimal radix cannot carry a minus sign.

 

If c_inttostr and c_strtoint used the same semantics, they would be inverses of each other. Unfortunately, they do not: c_inttostr uses computer semantics, while c_strtoint uses mathematical semantics. As a result, c_inttostr never produces a non-decimal string with a minus sign, and c_strtoint never returns a negative number unless the string begins with a minus sign.
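To illustrate what "the same semantics on both sides" could look like, here is a minimal sketch of two consistent pairs for radix 16. All four function names are hypothetical, a 32-bit int is assumed, and overflow handling is deliberately left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Computer semantics: the string is the stored bit pattern. */
int bits_to_hex(int value, char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%x", (unsigned int)value);
}

int hex_to_bits(const char *text)
{
    assert(text != NULL);
    unsigned long raw = strtoul(text, NULL, 16);
    assert(raw <= UINT_MAX);             /* at most 8 hex digits expected */
    unsigned int bits = (unsigned int)raw;
    /* Map the pattern back to a signed value without relying on the
     * implementation-defined unsigned-to-int conversion. */
    if (bits <= (unsigned int)INT_MAX)
        return (int)bits;
    return -(int)(UINT_MAX - bits) - 1;
}

/* Mathematical semantics: the string is the value, sign included. */
int value_to_hex(int value, char *buf, size_t size)
{
    assert(buf != NULL);
    /* Compute the magnitude in unsigned arithmetic so INT_MIN is safe. */
    unsigned int magnitude = value < 0 ? 0u - (unsigned int)value
                                       : (unsigned int)value;
    return snprintf(buf, size, "%s%x", value < 0 ? "-" : "", magnitude);
}

int hex_to_value(const char *text)
{
    assert(text != NULL);
    long parsed = strtol(text, NULL, 16);
    assert(parsed >= INT_MIN && parsed <= INT_MAX);  /* no clamping in this sketch */
    return (int)parsed;
}
```

Within each pair the round trip holds: bits_to_hex(-1, ...) writes "ffffffff" and hex_to_bits("ffffffff") returns -1, while value_to_hex(-1, ...) writes "-1" and hex_to_value("-1") returns -1. The trouble described above appears only when a function from one pair is combined with a function from the other.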

 

Now we can answer the three questions above. Under computer semantics, c_inttostr(-1, 16) obviously yields "ffffffff". Under mathematical semantics, ffffffff is 4294967295, which exceeds 2147483647, the maximum value of int, so c_strtoint("ffffffff", 16) reports an overflow. Finally, on overflow c_strtoint returns either the maximum or the minimum int value depending on the sign of the string: "ffffffff" is positive, so it returns the maximum, 2147483647; for "-ffffffff" it would return the minimum, -2147483648.
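The overflow behaviour described here can be sketched on top of the standard strtol. The name strtoint_clamped is a hypothetical stand-in for c_strtoint. The explicit clamping is an assumption added so that the result is the same whether long is 32 bits wide (as on MSC) or 64 bits wide (as on many other platforms, where strtol alone would not overflow for this input).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the article's c_strtoint: parse with strtol,
 * then saturate to the int range and report ERANGE through errno. */
int strtoint_clamped(const char *text, int radix)
{
    assert(text != NULL);
    errno = 0;
    long parsed = strtol(text, NULL, radix);
    /* When long is wider than int, out-of-range values arrive here
     * unclamped and have to be saturated by hand. */
    if (parsed > INT_MAX) {
        errno = ERANGE;
        return INT_MAX;
    }
    if (parsed < INT_MIN) {
        errno = ERANGE;
        return INT_MIN;
    }
    /* When long is as wide as int, strtol has already clamped the value
     * and set errno to ERANGE on overflow; both are passed through. */
    return (int)parsed;
}
```

With a 32-bit int, strtoint_clamped("ffffffff", 16) returns 2147483647 and strtoint_clamped("-ffffffff", 16) returns -2147483648, both with errno set to ERANGE, while strtoint_clamped("-1", 16) returns -1 and leaves errno at 0.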

 

The analysis above is based on MSC; I do not know how other C runtime libraries behave. I am writing a C++ function library that includes tostring and fromstring functions wrapping the corresponding C functions. At first, the inconsistency between c_inttostr and c_strtoint confused me a great deal. I struggled with this problem for a long time and have finally figured it out.

 

Finally, if you write your own set of functions for converting between strings and numeric values, decide which semantics they use and do not mix the two. If you must support both (which is unlikely), the best practice is to provide two sets of functions, one for computer semantics and one for mathematical semantics. With computer semantics, also pay attention to the size of the type in bytes, because numeric types include not only int but also char, short, and long.
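The point about type size can be made concrete with the fixed-width types from <stdint.h>. This is a sketch with hypothetical helper names; it shows that under computer semantics the same value -1 produces a different string for each width.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers: each one formats the stored bit pattern of a
 * fixed-width signed integer as lowercase hex digits and returns the
 * number of digits written. */
int bits8_to_hex(int8_t value, char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRIx8, (uint8_t)value);
}

int bits16_to_hex(int16_t value, char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRIx16, (uint16_t)value);
}

int bits32_to_hex(int32_t value, char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "%" PRIx32, (uint32_t)value);
}
```

For the value -1, the three helpers write "ff", "ffff", and "ffffffff" respectively, so a computer-semantics parser has to know the width of the target type before it can turn such a string back into a negative number.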
