Porting Linux applications to 64-bit systems
Tips and techniques for smooth migration
Linux is one of the few cross-platform operating systems that can run on 64-bit processors, and 64-bit systems are now common on both servers and desktops. Many developers therefore need to migrate their applications from a 32-bit environment to a 64-bit environment. With the introduction of Intel Itanium and other 64-bit processors, preparing software for the 64-bit environment has become increasingly important.
Like UNIX and other UNIX-like operating systems, Linux uses the LP64 model, in which pointers and long integers are 64 bits wide while plain integers remain 32 bits wide. Some high-level languages are unaffected by these size differences, but others (such as C) are affected.
Porting an application from a 32-bit system to a 64-bit system can be trivial or difficult, depending on how the application was written and maintained. Many subtle issues can cause trouble even in a well-written, highly portable application. This article summarizes those issues and offers suggestions for dealing with them.
Advantages of 64-bit
32-bit platforms have limitations that hinder developers of large applications (such as databases), especially those who want to take full advantage of the hardware. Scientific computing usually relies on floating-point arithmetic, whereas some applications (such as financial calculations) need a narrower range of numbers but higher precision than floating-point numbers provide. 64-bit arithmetic offers more precise fixed-point computation together with a sufficient numeric range. The limits of the 32-bit address space are widely discussed in the computer industry: a 32-bit pointer can address only 4 GB of virtual address space. This restriction can be worked around, but application development becomes very complicated and performance drops significantly.
In terms of language implementation, the current C language standard requires the "long long" data type to be at least 64 bits wide. An implementation may, however, define it as larger.
Another improvement concerns dates. On 32-bit Linux, a date is represented by a signed 32-bit integer that counts the seconds elapsed since January 1, 1970. This representation overflows in 2038. On a 64-bit system, the date is represented by a signed 64-bit integer, which greatly expands the usable range.
In short, 64-bit has the following advantages:
64-bit applications can directly access 4 EB of virtual memory, and the Intel Itanium processor provides a contiguous linear address space.
64-bit Linux allows file sizes of up to 8 EB (2 to the power of 63 bytes). An important advantage is that very large databases can be handled.
Linux 64-bit architecture
Unfortunately, the C programming language does not provide a mechanism for adding new basic data types. To provide 64-bit addressing and integer arithmetic, it is therefore necessary either to change the bindings or mappings of the existing data types or to add new data types to the C language.
Table 1. 32-bit and 64-bit data models (type widths in bits)
Type        ILP32   LP64    LLP64   ILP64
char        8       8       8       8
short       16      16      16      16
int         32      32      32      64
long        32      64      32      64
long long   64      64      64      64
pointer     32      64      64      64
The three 64-bit models (LP64, LLP64, and ILP64) differ in their non-floating-point data types. When the width of one or more C data types changes from one model to another, applications can be affected in many ways. These effects fall into two categories:
The size of data objects. Compilers align data types on natural boundaries. In other words, a 32-bit data type is aligned on a 32-bit boundary and a 64-bit data type is aligned on a 64-bit boundary on a 64-bit system. This means that the size of data objects such as structures and unions differs between 32-bit and 64-bit systems.
The size of the basic data types. Common assumptions about the relationships between the basic data types no longer hold in a 64-bit data model. Applications that depend on these relationships may fail when compiled on a 64-bit platform. For example, the assumption sizeof(int) == sizeof(long) == sizeof(pointer) is valid for the ILP32 data model but not for the others.
In short, the compiler aligns data types on natural boundaries, which means that it inserts padding to enforce that alignment, as it does in C structures and unions. The members of a structure or union are aligned based on the widest member. Listing 1 shows such a structure.
Listing 1. C structure
struct test {
    int i1;
    double d;
    int i2;
    long l;
};
Table 2 shows the size of each member in this structure and the size of this structure on 32-bit and 64-bit systems.
Table 2. Size of the structure and its members
Structure member    Size on a 32-bit system     Size on a 64-bit system
struct test {
    int i1;         32 bits                     32 bits
                                                32 bits of padding
    double d;       64 bits                     64 bits
    int i2;         32 bits                     32 bits
                                                32 bits of padding
    long l;         32 bits                     64 bits
};                  Structure size: 20 bytes    Structure size: 32 bytes
Note: On a 32-bit system, the compiler may not align the variable d on a 64-bit boundary even though it is a 64-bit object, because the hardware treats it as two 32-bit objects. A 64-bit system, however, aligns both d and l, which adds two 4-byte padding fields.
Porting from a 32-bit system to a 64-bit System
This section describes how to solve some common problems:
Declarations
Expression
Assignment
Numeric constant
Endianism
Type Definition
Bit shifting
String formatting
Function Parameters
Declarations
To make your code work on both 32-bit and 64-bit systems, note the following about declarations:
Declare integer constants with the "L" or "U" suffix as appropriate.
Use unsigned integers where possible to prevent sign-extension problems.
If a variable must be 32 bits wide on both platforms, declare it as int.
If a variable should be 32 bits wide on a 32-bit system and 64 bits wide on a 64-bit system, declare it as long.
For alignment and performance, declare numeric variables as int or long. Do not try to save bytes by using char or short.
Declare character pointers and character bytes as unsigned to avoid sign-extension problems with 8-bit characters.
Expression
In C/C++, expressions are evaluated according to associativity, operator precedence, and a set of arithmetic conversion rules. To make expressions work correctly on both 32-bit and 64-bit systems, keep the following rules in mind:
Adding two signed ints yields a signed int.
Adding an int and a long yields a long.
If one operand is an unsigned int and the other is a signed int, the result of the expression is unsigned.
Adding an int and a double yields a double; the int is converted to double before the addition.
Assignment
Because pointers, int, and long are no longer the same size on a 64-bit system, problems can arise depending on how variables are assigned and used in an application. The following are some tips for assignments:
Do not use int and long interchangeably, because this may truncate the high-order digits. For example, do not do the following:
int i;
long l;
i = l;
Do not use the int type to store pointers. The following example works on a 32-bit system but fails on a 64-bit system, because a 32-bit integer cannot hold a 64-bit pointer. For example, do not do the following:
unsigned int i, *ptr;
i = (unsigned) ptr;
Do not use pointers to store int values. For example, do not do the following:
int *ptr;
int i;
ptr = (int *) i;
If an expression mixes unsigned and signed 32-bit integers and is assigned to a signed long, cast one of the operands to a 64-bit type. This causes the other operand to be promoted to 64 bits as well, so no further conversion is needed when the expression is assigned. Another solution is to cast the entire expression so that sign extension occurs on assignment. For example, consider the problem with the following usage:
long n;
int i = -2;
unsigned k = 1;
n = i + k;
Arithmetically, the result of the expression in the last line above should be -1. However, because the expression is unsigned, no sign extension takes place. The solution is either to cast one operand to a 64-bit type (the first line below) or to cast the entire expression (the second line below):
n = (long) i + k;
n = (int) (i + k);
Numeric constant
Hexadecimal constants are commonly used as masks or as special bit values. A hexadecimal constant without a suffix that fits in 32 bits and has its high-order bit set is defined as an unsigned int.
For example, the constant 0xFFFFFFFFL is a signed long. On a 32-bit system this sets all bits, but on a 64-bit system only the low-order 32 bits are set, so the value is 0x00000000FFFFFFFF.
If we want all bits to be set, a portable method is to define a signed constant with the value -1. This sets all bits because two's-complement arithmetic is used.
long x = -1L;
Another possible problem is setting the most significant bit. On a 32-bit system, the constant 0x80000000 is used. A more portable method is to use a shift expression:
1L << ((sizeof(long) * 8) - 1);
Endianism
Endianism refers to the way data is stored in memory. It defines how bytes are addressed within integer and floating-point data types.
Little-endian means that the low-order byte is stored at the lowest memory address and the high-order byte at the highest address.
Big-endian means that the high-order byte is stored at the lowest memory address and the low-order byte at the highest address.
Table 3 shows an example layout of a 64-bit long integer.
Table 3. Layout of a 64-bit long int
                Low address                                                  High address
Little endian   Byte 0   Byte 1   Byte 2   Byte 3   Byte 4   Byte 5   Byte 6   Byte 7
Big endian      Byte 7   Byte 6   Byte 5   Byte 4   Byte 3   Byte 2   Byte 1   Byte 0
For example, the 32-bit word 0x12345678 is laid out on a big-endian machine as follows:
Table 4. Layout of 0x12345678 on a big-endian system
Memory offset    0      1      2      3
Memory content   0x12   0x34   0x56   0x78
If we view 0x12345678 as two halfwords, 0x1234 and 0x5678, we see the following on a big-endian machine:
Table 5. 0x12345678 viewed as two halfwords on a big-endian system
Memory offset    0        2
Memory content   0x1234   0x5678
On a little-endian machine, however, the word 0x12345678 is laid out as follows:
Table 6. Layout of 0x12345678 on a little-endian system
Memory offset    0      1      2      3
Memory content   0x78   0x56   0x34   0x12
Similarly, the two halfwords 0x1234 and 0x5678 appear as follows:
Table 7. 0x12345678 viewed as two halfwords on a little-endian system
Memory offset    0        2
Memory content   0x3412   0x7856
The following C program illustrates the difference in byte order between big-endian and little-endian machines. It prints "Big endian" when compiled and run on a big-endian machine and "Little endian" when compiled and run on a little-endian machine.
Listing 2. Big endian versus little endian
#include <stdio.h>
int main(void) {
    int i = 0x12345678;
    /* Inspect the byte stored at the lowest address of i */
    if (*(char *)&i == 0x12)
        printf("Big endian\n");
    else if (*(char *)&i == 0x78)
        printf("Little endian\n");
    return 0;
}
Endianism is important in the following situations:
When bit masks are used
When an indirect pointer addresses part of an object
C and C++ provide bit fields to help with endian issues. I recommend using bit fields instead of mask fields or hexadecimal constants. Several functions convert 16-bit and 32-bit data between host byte order and network byte order. For example, htonl(3) and ntohl(3) convert 32-bit integers, and htons(3) and ntohs(3) convert 16-bit integers. There is, however, no standard set of functions for 64-bit integers. On both big-endian and little-endian systems, Linux provides the following macros:
bswap_16
bswap_32
bswap_64
Type Definition
We recommend that you do not write applications using the C/C++ data types whose size changes on 64-bit systems. Instead, use type definitions or macros that explicitly state the size and kind of data a variable holds. The following definitions make code more portable.
ptrdiff_t:
A signed integer type that holds the result of subtracting two pointers.
size_t:
An unsigned integer type that holds the result of the sizeof operator. It is used when passing arguments to functions such as malloc(3), and it is returned by functions such as fread(3).
int32_t and uint32_t:
Integer types with a predefined width.
intptr_t and uintptr_t:
Integer types to which any valid pointer can be converted.
Example 1:
In the following statement, the 64-bit value returned by sizeof is truncated to 32 bits when it is assigned to bufferSize.
int bufferSize = (int) sizeof(something);
The solution is to cast the return value to size_t and assign it to a bufferSize declared as size_t, as shown below:
size_t bufferSize = (size_t) sizeof(something);
Example 2:
On a 32-bit system, int and long are the same size. Because of this, some developers use the two types interchangeably, which can lead to pointers being assigned to ints or vice versa. On a 64-bit system, however, assigning a pointer to an int truncates the high-order 32 bits.
The solution is to store pointers in pointer types or in the special types defined for this purpose, such as intptr_t and uintptr_t.
Bit shifting
An integer constant without a suffix is of type int. This can cause truncation during shifting.
For example, in the following code, a can be at most 31, because 1 << a is of type int.
long t = 1 << a;
To shift correctly on a 64-bit system, use 1L, as shown below:
long t = 1L << a;
String formatting
printf(3) and related functions can be a major source of problems. For example, on a 32-bit platform %d prints either an int or a long, but on a 64-bit platform it causes a long value to be truncated to its low-order 32 bits. The correct specifier for a long is %ld.
Similarly, when a small integer (char, short, int) is passed to printf(3), it is widened to 64 bits, with the sign extended as appropriate. In the following example, the printf(3) call assumes that a pointer is 32 bits wide.
char *ptr = &something;
printf("%x", ptr);
This code fails on a 64-bit system: only the low-order 4 bytes are displayed.
The solution is to use %p, as shown below; this works on both 32-bit and 64-bit systems:
char *ptr = &something;
printf("%p", ptr);
Function Parameters
When passing parameters to a function, keep the following in mind:
When the data type of a parameter is defined by a function prototype, the argument is converted to that type according to the standard rules.
If the parameter type is not specified, the argument is promoted to a larger type.
On a 64-bit system, integer types are converted to 64-bit integer values, and single-precision floating-point types are promoted to double precision.
If the return type is not specified, the return type of the function defaults to int.
A problem occurs when the sum of a signed and an unsigned integer is passed as a long. Consider the following:
Listing 3. Passing the sum of a signed and an unsigned integer as a long
long function(long l);
int main() {
    int i = -2;
    unsigned k = 1U;
    long n = function(i + k);
}
The above code fails on a 64-bit system because the expression (i + k) is an unsigned 32-bit expression; when it is converted to long, the sign is not extended. The solution is to cast one of the operands to a 64-bit type.
Register-based systems raise another problem: they pass function arguments in registers rather than on the stack. Consider the following example:
float f = 1.25;
printf("The hex value of %f is %x", f, f);
On a stack-based system, the corresponding hexadecimal value is printed. On a register-based system, however, the hexadecimal value is read from an integer register rather than from a floating-point register.
The solution is to cast the address of the floating-point variable to a pointer to an integer type and dereference it, as shown below:
printf("The hex value of %f is %x", f, *(int *)&f);
Conclusion
Mainstream hardware vendors have recently expanded their 64-bit products because the 64-bit platform provides better performance, value, and scalability. The 32-bit system restrictions, especially the 4 GB virtual memory ceiling, have greatly stimulated many companies to consider migrating to the 64-bit platform. Understanding how to port an application to a 64-bit architecture can help us write code with better portability and efficiency.