Precautions for porting 32-bit code to a 64-bit Platform


Reading tips: With the arrival of low-cost 64-bit platforms and falling memory and disk prices, porting 32-bit programs to 64-bit hardware has become an attractive undertaking. Scientific computing applications, databases, and other programs that consume large amounts of memory or perform intensive floating-point work stand to benefit most. This article discusses some of the issues to watch for when porting existing 32-bit code to a 64-bit platform.

The latest 64-bit platforms are binary compatible with 32-bit applications, so existing programs can keep running unchanged. Many programs that run well on 32-bit platforms do not need to be ported at all unless they have one of the following requirements:
· They need more than 4 GB of memory.
· They routinely work with files larger than 2 GB.
· They perform intensive floating-point operations that benefit from a 64-bit architecture.
· They can benefit from the math libraries optimized for the 64-bit platform.
In many cases, simply recompiling is enough. Most well-written programs can be ported to a 64-bit platform with little effort, provided that you are familiar with the issues discussed in this article.
ILP32 and LP64 data models
The 32-bit environment uses the "ILP32" data model, so named because the C types int, long, and pointer are all 32 bits wide. The 64-bit environment uses a different data model: long and pointer are 64 bits wide while int stays at 32 bits. This is called the "LP64" data model.
Currently, all 64-bit UNIX platforms use the LP64 data model, while 64-bit Windows uses the LLP64 data model, in which only pointers become 64 bits wide and the other basic types are unchanged. This article discusses porting from ILP32 to LP64. Table 1 shows the differences between the ILP32 and LP64 data models.
When porting code to 64-bit, one simple rule applies: never assume that int, long, and pointer have the same size. Code that breaks this rule can fail in various ways under LP64, and the cause is often hard to track down. Example 1 breaks the rule in several places and has to be rewritten before it can be ported to a 64-bit platform.
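The rule above can be checked on a given toolchain by looking at the actual sizes. The following is a minimal sketch; the enum and function names are hypothetical and introduced only for illustration.

```c
#include <stddef.h>

/* Hypothetical names, introduced only for this sketch. */
enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_LLP64, MODEL_OTHER };

/* Classify a data model from the byte sizes of int, long and pointer. */
enum data_model classify_data_model(size_t int_size, size_t long_size,
                                    size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4)
        return MODEL_ILP32;
    if (int_size == 4 && long_size == 8 && ptr_size == 8)
        return MODEL_LP64;
    if (int_size == 4 && long_size == 4 && ptr_size == 8)
        return MODEL_LLP64;
    return MODEL_OTHER;
}

/* Classify the platform this code is being compiled for. */
enum data_model current_data_model(void)
{
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Calling current_data_model() from a small test program built once with and once without the 64-bit option is a quick way to confirm which model a build actually uses.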
Example 1:

 1  int *myfunc(int i)
 2  {
 3          return (&i);
 4  }
 5
 6  int main(void)
 7  {
 8          int     myint;
 9          long    mylong;
10          int     *myptr;
11
12          char *name = (char *) getlogin();
13
14          printf("Enter a number %s: ", name);
15          (void) scanf("%d", &mylong);
16          myint = mylong;
17          myptr = myfunc(mylong);
18          printf("mylong: %d  pointer: %x \n", mylong, myptr);
19          myint = (int) mylong;
20          exit(0);
21
22  }

The first step is to let the compiler catch porting problems. The options vary by compiler; for the IBM XL compiler family, useful options include -qwarn64 and -qinfo=pro. To produce a 64-bit executable, use -q64 (with GCC the option is -m64; other useful GCC options are listed in Table 2). Figure 1 shows the result of compiling the code in Example 1.

Figure 1: Compiling the code in Example 1

Truncation caused by a missing prototype
If a function is called without a prototype in scope, the compiler assumes that it returns a 32-bit int. Code that omits prototypes can therefore suffer unexpected truncation, which may lead to a segmentation fault. The compiler catches this error on line 12 of Example 1:
char *name = (char *) getlogin();
The compiler assumes that getlogin() returns an int and truncates the returned pointer. This line works under ILP32, where int and pointer have the same size, but not necessarily under LP64. Even the cast cannot prevent the error, because the return value of getlogin() has already been truncated by the time the cast is applied.
To fix the problem, include the header file <unistd.h>, which contains the prototype of getlogin().
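A minimal sketch of the corrected call follows, assuming a POSIX system. The helper name login_name_or is hypothetical; it also covers the case where getlogin() returns NULL, which Example 1 does not check.

```c
#include <stddef.h>
#include <unistd.h>   /* declares char *getlogin(void) */

/* Hypothetical helper: return the login name, or the given fallback
   when getlogin() cannot determine it and returns NULL. */
const char *login_name_or(const char *fallback)
{
    /* With the prototype in scope, the full pointer is returned and
       no cast is needed. */
    const char *name = getlogin();
    return name != NULL ? name : fallback;
}
```

With the prototype visible, the compiler knows the return type is a pointer, so nothing is truncated under LP64.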
Format specifiers
Using a 32-bit format specifier for a 64-bit long or pointer causes errors. The compiler catches this on line 15 of Example 1:
(void) scanf("%d", &mylong);
Here scanf stores a 32-bit value into mylong and leaves the remaining 4 bytes untouched. To fix the problem, use the %ld specifier in scanf.
Line 18 shows a similar problem with printf:
printf("mylong: %d  pointer: %x \n", mylong, myptr);
To correct it, use %ld for mylong and %p instead of %x for myptr.
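The corrected specifiers can be sketched as two small helpers. The function names are hypothetical; sscanf and snprintf are used in place of scanf and printf only so that the behavior is easy to check.

```c
#include <stdio.h>

/* Parse a decimal long from text with %ld.
   Returns 1 on success, 0 otherwise. */
int parse_long(const char *text, long *out)
{
    return sscanf(text, "%ld", out) == 1;
}

/* Format a long with %ld and a pointer with %p.
   Returns the result of snprintf. */
int format_long_and_pointer(char *buf, size_t size, long value,
                            const void *ptr)
{
    return snprintf(buf, size, "mylong: %ld  pointer: %p",
                    value, (void *) ptr);
}
```

Both specifiers follow the width of the argument type, so the same source behaves correctly under ILP32 and LP64.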
Assignment truncation
The compiler finds an assignment truncation on line 16:
myint = mylong;
This causes no problem under ILP32, where int and long are both 32 bits wide. Under LP64, if the value of mylong exceeds the largest 32-bit integer, it is truncated when assigned to myint.
Parameter truncation
The next error the compiler finds is on line 17. The function myfunc accepts an int parameter, but it is called with a long argument, which is silently truncated when passed.
Cast truncation
Cast truncation occurs when a long is cast to an int, as on line 19 of Example 1:

myint = (int) mylong;

The cause is that int and long no longer have the same size. Casts of this kind often appear in code such as:

int length = (int) strlen(str);

strlen returns size_t (which is unsigned long under LP64), so assigning the result to an int can truncate it. In practice, truncation happens only when the length of str exceeds 2 GB. Even so, prefer the appropriate abstract types (such as size_t and uintptr_t) so that the underlying base type does not matter.
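Where a long really does have to be narrowed to an int, one option is to check the range first instead of relying on a cast. This is a minimal sketch; the helper name is hypothetical and not part of Example 1.

```c
#include <limits.h>

/* Hypothetical helper: store value in *out only if it fits in an int.
   Returns 1 on success, 0 if the value would be truncated. */
int long_to_int_checked(long value, int *out)
{
    if (value < INT_MIN || value > INT_MAX)
        return 0;            /* would not survive the narrowing */
    *out = (int) value;      /* safe: the range was checked above */
    return 1;
}
```

Under ILP32 the check always succeeds, so the helper costs little there, while under LP64 it turns a silent truncation into a condition the caller can handle.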
Other subtle issues
The compiler can catch many porting problems, but you cannot count on it to find every error.
Constants written in hexadecimal or binary are usually 32 bits wide. For example, the unsigned 32-bit constant 0xffffffff is often used to test for -1:

#define invalid_pointer_value 0xffffffff

On a 64-bit system, however, this value is not -1 but 4294967295; there, -1 is 0xffffffffffffffff. To avoid the problem, declare the constant with const and an explicit signed or unsigned type:

const signed int invalid_pointer_value = 0xffffffff;

With this declaration the constant has the value -1 on both 32-bit and 64-bit systems, at least with the usual two's-complement compilers.
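The widening behavior can be seen directly, and for a pointer-sized sentinel there is an alternative to the const declaration: converting -1 to uintptr_t. This is a sketch of that alternative, not the approach taken in the article; both function names are hypothetical.

```c
#include <stdint.h>

/* 0xffffffff has type unsigned int when int is 32 bits wide, so widening
   it to a 64-bit long gives 4294967295 rather than -1. */
long widen_hex_constant(void)
{
    return 0xffffffff;
}

/* All-ones sentinel at pointer width: converting -1 to uintptr_t yields
   UINTPTR_MAX regardless of the data model. */
uintptr_t invalid_pointer_bits(void)
{
    return (uintptr_t) -1;
}
```

Because the conversion of -1 to an unsigned type is defined by the C standard, the second form does not depend on how many f digits are written out.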
Other problems with hard-coded constants come from assumptions that hold only under ILP32, for example:

int **p; p = (int **) malloc(4 * no_elements);

This code assumes that a pointer is 4 bytes long, which is wrong under LP64, where it is 8 bytes. The correct approach is to use sizeof():

int **p; p = (int **) malloc(sizeof(*p) * no_elements);

Also watch for incorrect assumptions involving sizeof(), such as:

sizeof(int) == sizeof(int *);

This does not hold under LP64.
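The sizeof-based allocation can be wrapped in a small function. This is a minimal sketch with a hypothetical function name; it uses calloc rather than malloc so that the table starts zero-filled.

```c
#include <stdlib.h>

/* Allocate a zero-filled table of `count` int pointers. The element size
   comes from the pointer's own type, so it is 4 bytes under ILP32 and
   8 bytes under LP64 without any hard-coded constant. */
int **alloc_int_ptr_table(size_t count)
{
    int **p = calloc(count, sizeof *p);
    return p;   /* NULL on failure; the caller releases it with free() */
}
```

Writing sizeof *p instead of naming the type also keeps the allocation correct if the element type of p is changed later.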
Sign extension
Avoid arithmetic that mixes signed and unsigned operands. When an int is combined with a long, the result can differ between LP64 and ILP32, and because sign extension is involved the problem is hard to spot. It is reliably avoided only when both operands are signed or both are unsigned.
Example 2:

long k;
int i = -2;
unsigned int j = 1;
k = i + j;
printf("Answer: %ld\n", k);

You would expect the answer in Example 2 to be -1. When the program is compiled for LP64, however, the answer is 4294967295. The expression (i + j) has type unsigned int, so when its value is assigned to k it is zero-extended rather than sign-extended. To fix this, make both operands signed or both unsigned, for example:

k = i + (int) j;
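The difference can be reproduced with two small functions. The second one shows an alternative fix, widening both operands to long before adding; this is a sketch under the assumption of an LP64 target, and the function names are hypothetical.

```c
/* Mirrors Example 2: i + j is computed as unsigned int, then widened. */
long add_as_unsigned(int i, unsigned int j)
{
    return i + j;
}

/* Alternative fix: widen both operands to long first, so the addition
   is signed 64-bit arithmetic under LP64. */
long add_widened(int i, unsigned int j)
{
    return (long) i + (long) j;
}
```

Unlike the (int) cast on j, widening to long also keeps values of j above INT_MAX intact under LP64, at the cost of doing the addition in the wider type.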

Unions
A union whose members have different sizes can cause problems. Example 3, taken from a common open-source package, works under ILP32 but not under LP64. The code assumes that an array of two unsigned short values occupies the same space as an unsigned long, which is not true on an LP64 platform.
Example 3:

typedef struct {
    unsigned short bom;
    unsigned short cnt;
    union {
        unsigned long bytes;
        unsigned short len[2];
    } size;
} _ucheader_t;

To make this work under LP64, replace unsigned long with unsigned int. Check every union in your code carefully to make sure that members meant to overlay each other have the same size under LP64.
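Another way to keep the union members the same size is to use the fixed-width types from <stdint.h>. The following is a sketch of that variant; the type name ucheader_fixed_t and the helper function are hypothetical and do not come from the package in question.

```c
#include <stdint.h>

/* Variant of Example 3 using fixed-width types. Both union members
   occupy 4 bytes under ILP32 and under LP64. */
typedef struct {
    uint16_t bom;
    uint16_t cnt;
    union {
        uint32_t bytes;
        uint16_t len[2];
    } size;
} ucheader_fixed_t;

/* Size of the union member in bytes, usable in a run-time check. */
unsigned long ucheader_union_size(void)
{
    return (unsigned long) sizeof(((ucheader_fixed_t *) 0)->size);
}
```

With fixed-width members, the overlay no longer depends on which data model the compiler targets.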
Byte order (endianness)
A 32-bit program may also fail after porting because the byte order of the 64-bit machine is different. Intel and other CISC chips, as used in IBM PC compatibles, are little-endian, while chips such as those used by Apple (PowerPC at the time of writing) are big-endian. Little-endian byte order often hides truncation bugs during porting.
Example 4:

long k;
int *ptr;
int main(void)
{
    k = 2;
    ptr = &k;
    printf("k has the value %ld, value pointed to by ptr is %ld\n", k, *ptr);
    return 0;
}

Example 4 is a clear instance of this problem. A pointer to int is declared but inadvertently made to point to a long. Under ILP32 the code prints 2 for both values, because int and long have the same size. Under LP64 the value read through the pointer is truncated, because int and long differ in size. On a little-endian system the code still prints the correct answer, 2, for the value read through ptr, but on a big-endian system it prints 0.


Table 2: GCC options for detecting 64-bit porting problems

Table 3 shows why truncation produces different answers on machines with different byte orders. In little-endian order, the truncated bytes at the higher addresses are all 0, so the answer is still 2. In big-endian order, the truncated bytes at the higher addresses contain the value 2, so the result is 0. Truncation is a bug in both cases, but be aware that little-endian order hides the truncation of small values; the error shows up only when the code is ported to a big-endian system.
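The effect described above can be observed without the invalid pointer assignment by copying the first bytes of a long into an int. This is a minimal sketch; both function names are hypothetical.

```c
#include <string.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* inspect the byte at the lowest address */
    return first == 1;
}

/* Read the first sizeof(int) bytes of a long. This is what dereferencing
   an int pointer aimed at a long amounts to in Example 4; memcpy is used
   so that the sketch itself avoids the mismatched pointer. */
int first_int_of_long(long k)
{
    int value;
    memcpy(&value, &k, sizeof value);
    return value;
}
```

For k = 2 under LP64, first_int_of_long returns 2 on a little-endian machine and 0 on a big-endian one, matching the behavior described for Example 4.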

Performance degradation after porting to a 64-bit platform
After porting to a 64-bit platform, you may find that performance has actually dropped. The causes are related to the larger pointers and data under LP64: lower cache hit rates, larger data structures, and data alignment.
In a 64-bit environment, pointers take up more bytes, so code that ran well as 32-bit can suffer from cache effects to varying degrees, which shows up as slower execution. A profiling tool that reports cache hit rates can confirm whether this is the cause.
After a port to LP64, the size of data structures can change, so the program may need more memory and disk space. For example, the structure in Figure 2 needs only 16 bytes under ILP32 but 32 bytes under LP64, an increase of 100%. This is because long is now 64 bits wide and the compiler inserts padding to keep the members aligned.
Reordering the members of the structure reduces this effect and the storage required. If the two 32-bit int members are placed next to each other, no padding is needed between them, and the whole structure takes only 24 bytes.
Before rearranging a data structure, measure how often its members are accessed, so that the new layout does not itself cost performance through a lower cache hit rate.
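The padding effect can be measured with sizeof. Figure 2 is not reproduced in the text, so the two structures below are hypothetical layouts chosen only because they match the sizes quoted above (16 bytes under ILP32, 32 and 24 bytes under LP64 with the usual 8-byte alignment of long).

```c
#include <stddef.h>

/* Hypothetical layout: each int is followed by a long, so under LP64
   4 bytes of padding follow each int. */
struct interleaved {
    int  a;
    long b;
    int  c;
    long d;
};

/* Same members with the two ints grouped, so no padding is needed. */
struct grouped {
    int  a;
    int  c;
    long b;
    long d;
};

size_t interleaved_size(void) { return sizeof(struct interleaved); }
size_t grouped_size(void)     { return sizeof(struct grouped); }
```

Printing both sizes in a 32-bit and a 64-bit build shows how much of the growth is padding rather than payload.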
How to generate 64-bit code
In some cases, 32-bit and 64-bit programs have to be distinguished at the source level. Many header files use test macros for this. Unfortunately, these macros depend on the platform, the compiler, or even the compiler version. For example, GCC 3.4 and later define __LP64__ on all 64-bit platforms when code is compiled with -m64, whereas in GCC versions before 3.4 the macro depends on the platform and operating system.
Your compiler may use a macro other than __LP64__. For example, the IBM XL compiler defines __64BIT__ when a program is compiled with -q64, and other platforms define _LP64 or can be tested with __WORDSIZE. Consult your compiler's documentation to find the most suitable macro. Example 5 works with several platforms and compilers:
Example 5:

#if defined(__LP64__) || defined(__64BIT__) || defined(_LP64) || (__WORDSIZE == 64)
    printf("I am LP64\n");
#else
    printf("I am ILP32\n");
#endif
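When no compiler-specific macro is available, the limits defined in the standard headers can be tested instead. This is a sketch of that alternative, not something the article prescribes; the function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 when long is wider than int (the case for LP64, but not for
   ILP32 or LLP64), decided by the preprocessor from <limits.h>. */
int long_is_wider_than_int(void)
{
#if LONG_MAX > INT_MAX
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when pointers are wider than 32 bits, using UINTPTR_MAX
   from <stdint.h>. */
int pointers_are_wider_than_32_bits(void)
{
#if UINTPTR_MAX > 0xFFFFFFFF
    return 1;
#else
    return 0;
#endif
}
```

These tests rely only on C99 headers, so they do not change when the compiler or its version does.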

Shared data
A typical problem when porting to a 64-bit platform is how to read and share data between 32-bit and 64-bit programs. For example, a 32-bit program may have stored struct objects in binary files on disk. When 64-bit code reads those files, the different structure sizes under LP64 can cause problems.
For new programs that must run on both 32-bit and 64-bit platforms, avoid data types (such as long) whose size differs between LP64 and ILP32, and use the fixed-width integer types from the header file <inttypes.h> instead. Data can then be shared at the binary level between 32-bit and 64-bit programs, whether through files or over a network.
Example 6:

#include <stdio.h>
#include <inttypes.h>
struct on_disk
{
    /* When shared between ILP32 and LP64, this should be int32_t */
    long foo;
};
int main()
{
    FILE *file;
    struct on_disk data;
#ifdef write
    file = fopen("test", "w");
    data.foo = 65535;
    fwrite(&data, sizeof(struct on_disk), 1, file);
#else
    file = fopen("test", "r");
    fread(&data, sizeof(struct on_disk), 1, file);
    printf("data: %ld\n", data.foo);
#endif
    fclose(file);
}

Consider Example 6. Ideally, this program would run on both 32-bit and 64-bit platforms and each build could read the data written by the other. In practice it cannot, because the size of long differs between ILP32 and LP64. The member foo of the on_disk structure should be declared as int32_t. This fixed-width type guarantees data of the same size under both the ILP32 and LP64 data models.
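The suggested fix can be sketched as a pair of helpers around a fixed-width record. The structure name on_disk_fixed and the two function names are hypothetical, and error handling is reduced to a return code.

```c
#include <stdint.h>
#include <stdio.h>

/* Fixed-width variant of the record in Example 6. */
struct on_disk_fixed {
    int32_t foo;   /* 4 bytes under both ILP32 and LP64 */
};

/* Write one record; returns 1 on success, 0 on failure. */
int write_record(FILE *file, int32_t value)
{
    struct on_disk_fixed data;
    data.foo = value;
    return fwrite(&data, sizeof data, 1, file) == 1;
}

/* Read one record into *value; returns 1 on success, 0 on failure. */
int read_record(FILE *file, int32_t *value)
{
    struct on_disk_fixed data;
    if (fread(&data, sizeof data, 1, file) != 1)
        return 0;
    *value = data.foo;
    return 1;
}
```

A value read this way can be printed with the PRId32 macro from <inttypes.h>. Note that a fixed width settles only the size of the record; files exchanged between machines with different byte orders still need an agreed byte order.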
Mixing Fortran and C
Many scientific programs call Fortran routines from C/C++. Fortran itself has no 64-bit porting problem, because Fortran data types have explicit sizes. When Fortran and C are mixed, however, problems can arise: in Example 7, a C program calls the Fortran subroutine shown in Example 8.
Example 7:

void foo(long *l);
int main(void)
{
    long l = 5000;
    foo(&l);
    return 0;
}

Example 8:

subroutine foo(i)
integer i
write(*,*) 'In Fortran'
write(*,*) i
return
end subroutine foo

Example 9:

% gcc -m64 -c cfoo.c
% /opt/absoft/bin/f90 -m64 cfoo.o foo.f90 -o out
% ./out
 In Fortran
 0

When the two files are built and linked under ILP32, the program prints the value of i as "5000". Under LP64 it prints "0", as Example 9 shows, because the C code passes the address of a 64-bit long to foo, while the Fortran subroutine expects a 32-bit integer. To correct this, declare the variable i in the Fortran subroutine as integer*8, which has the same size as long in C under LP64.
Conclusion
64-bit platforms hold promise for solving large, complex scientific and business problems. Most well-written programs can be ported to the new platform easily, as long as you pay attention to the differences between the ILP32 and LP64 data models.
From: http://www.51cto.com/art/200604/24942.htm
