Original URL: http://www.ibm.com/developerworks/cn/linux/l-port64.html
With the growing popularity of 64-bit architectures, preparing your Linux® software for 64-bit systems has become more important than ever. In this article, you'll learn how to prevent portability flaws in declarations, assignments, bit shifts, type conversions, string formatting, and more.
Harsha S. Adiga, software engineer, IBM
May 18, 2006
Linux is one of the cross-platform operating systems that can run on 64-bit processors, and 64-bit systems are now common on both servers and desktops. Many developers now need to migrate their applications from a 32-bit environment to a 64-bit environment. With the introduction of Intel® Itanium® and other 64-bit processors, making software ready for 64-bit environments has become increasingly important.
Like UNIX® and other UNIX-like operating systems, Linux uses the LP64 data model, in which pointers and long integers are 64 bits wide while plain integers (int) remain 32 bits. Some high-level languages are unaffected by these size differences, but others, such as C, are affected.
Porting an application from a 32-bit system to a 64-bit system can be trivial or very difficult, depending on how the application was written and maintained. Many seemingly trivial issues can cause problems even in a well-written, highly portable application, so this article summarizes those issues and offers advice on how to deal with them.
Advantages of 64-bit
The limitations of 32-bit platforms frustrate developers of large applications such as databases, especially developers who want to take full advantage of advances in computer hardware. Scientific calculations typically rely on floating-point arithmetic, while some applications (such as financial calculations) need a narrower range of numbers but higher precision than floating point provides. 64-bit arithmetic provides this higher-precision fixed-point math together with an adequate numeric range. There is also much discussion in the computer industry about the limit imposed by 32-bit addresses: a 32-bit pointer can address only 4GB of virtual address space. This limit can be worked around, but application development becomes much more complex and performance drops significantly.
On the language side, the current C standard requires the long long data type to be at least 64 bits wide; an implementation may make it larger.
Another area that needs improvement is dates. On 32-bit Linux, a date is represented as a 32-bit integer holding the number of seconds elapsed since January 1, 1970, a representation that overflows in 2038. On a 64-bit system, the date is represented as a signed 64-bit integer, which greatly extends the usable range.
In summary, 64-bit has the following advantages:
- A 64-bit application can directly access 4EB of virtual memory, and the Intel Itanium processor provides a contiguous linear address space.
- 64-bit Linux allows file sizes of up to 2 to the power of 63 bytes (8EB), an important advantage for applications that access large databases.
Linux 64-bit architecture
Unfortunately, the C programming language provides no mechanism for adding new fundamental data types. Providing 64-bit addressing and integer arithmetic therefore requires either changing the bindings or mappings of existing data types or adding new data types to the language.
Table 1. 32-bit and 64-bit data models (sizes in bits)
Data type | ILP32 | LP64 | LLP64 | ILP64
char | 8 | 8 | 8 | 8
short | 16 | 16 | 16 | 16
int | 32 | 32 | 32 | 64
long | 32 | 64 | 32 | 64
long long | 64 | 64 | 64 | 64
pointer | 32 | 64 | 64 | 64
The three 64-bit models (LP64, LLP64, and ILP64) differ in their non-floating-point data types. When the width of one or more C data types changes from one model to another, applications can be affected in many ways. These effects fall into two main categories:
- The size of data objects. The compiler aligns data types on natural boundaries; in other words, on a 64-bit system 32-bit data types are aligned on 32-bit boundaries and 64-bit data types are aligned on 64-bit boundaries. This means that the size of data objects such as structs or unions differs between 32-bit and 64-bit systems.
- The size of fundamental data types. Common assumptions about the relationships between the fundamental data types no longer hold in a 64-bit data model. Applications that depend on those relationships often fail when built for a 64-bit platform. For example, the assumption sizeof(int) = sizeof(long) = sizeof(pointer) is valid for the ILP32 data model but not for the others.
In summary, the compiler aligns data types on natural boundaries, which means that it inserts padding to enforce that alignment, as it does in C structs and unions. The members of a struct or union are aligned according to its widest member. Listing 1 illustrates such a structure.
Listing 1. C structure
struct Test {
    int i1;
    double d;
    int i2;
    long l;
};
Table 2 shows the size of each member in the structure, and the size of the structure on 32-bit systems and 64-bit systems.
Table 2. Size of structure and struct members
Struct member | Size on 32-bit system | Size on 64-bit system
struct Test { | |
int i1; | 32-bit | 32-bit
(padding) | | 32-bit padding
double d; | 64-bit | 64-bit
int i2; | 32-bit | 32-bit
(padding) | | 32-bit padding
long l; | 32-bit | 64-bit
}; | Structure size is 20 bytes | Structure size is 32 bytes
Note that on a 32-bit system, the compiler might not align the variable d, even though it is a 64-bit object, because the hardware treats it as two 32-bit objects. A 64-bit system, however, aligns both d and l, adding two 4-byte pads.
Migrating from 32-bit systems to 64-bit systems
This section describes how to fix some common problems in the following areas:
- Declarations
- Expressions
- Assignments
- Numeric constants
- Endianism
- Type definitions
- Bit shifting
- String formatting
- Function parameters
Declarations
To make your code work on both 32-bit and 64-bit systems, keep the following in mind when writing declarations:
- Use the "L" or "U" suffix, as appropriate, when declaring integer constants.
- Use unsigned integers where appropriate to avoid sign-extension problems.
- If a variable must be 32 bits wide on both platforms, declare it as int.
- If a variable should be 32 bits wide on 32-bit systems and 64 bits wide on 64-bit systems, declare it as long.
- For alignment and performance reasons, declare numeric variables as int or long. Do not try to save bytes by using char or short.
- Declare character pointers and character bytes as unsigned to avoid sign-extension problems with 8-bit characters.
Expressions
In C/C++, expressions are evaluated according to associativity, operator precedence, and a set of arithmetic rules. To make expressions work correctly on both 32-bit and 64-bit systems, note the following rules:
- Adding two signed integers yields a signed integer.
- Adding an int and a long yields a long.
- If one operand is an unsigned integer and the other is a signed int, the result of the expression is unsigned.
- Adding an int and a double yields a double. The int is converted to double before the addition is performed.
Assignments
Because pointers, int, and long are no longer the same size on 64-bit systems, problems can occur depending on how variables of these types are assigned and used in an application. Here are some tips on assignments:
- Do not use int and long interchangeably, because this can truncate the high-order bits. For example, do not do the following:
int i;
long l;
i = l;
- Do not use an int to store a pointer. The following example works on a 32-bit system but fails on a 64-bit system, because a 32-bit integer cannot hold a 64-bit pointer. For example, do not do the following:
unsigned int i, *ptr;
i = (unsigned) ptr;
- Do not use a pointer to hold an int value. For example, do not do the following:
int *ptr;
int i;
ptr = (int *) i;
- If an expression mixes unsigned and signed 32-bit integers and is assigned to a signed long, cast one of the operands to a 64-bit type. This causes the other operand to be promoted to 64 bits as well, so no further conversion is needed when the expression is assigned. Another solution is to cast the entire expression so that sign extension occurs on assignment. For example, consider the problem in the following code:
long n;
int i = -2;
unsigned k = 1;
n = i + k;
Mathematically, the result of the expression i + k should be -1. However, because the expression is unsigned, no sign extension takes place. The solution is to cast one operand to a 64-bit type (the first line below) or to cast the entire expression (the second line below):
n = (long) i + k;
n = (int) (i + k);
Numeric constants
Hexadecimal constants are commonly used as masks or as special bit values. A hexadecimal constant without a suffix that is 32 bits wide and has its high-order bit set is defined as an unsigned int.
For example, the constant 0xFFFFFFFFL is a long constant. On a 32-bit system it sets all the bits, but on a 64-bit system only the low-order 32 bits are set, giving the value 0x00000000FFFFFFFF.
If you want all the bits set, a portable approach is to define a signed constant with a value of -1. Because two's complement arithmetic is used, this sets all the bits:
long x = -1L;
Another problem that may arise is setting the most significant bit. On a 32-bit system, the constant 0x80000000 is used. A more portable way is to use a shift expression:
1L << ((sizeof(long) * 8) - 1);
Endianism
Endianism refers to the way data is stored in memory; it defines how bytes are addressed within integer and floating-point data types.
Little endian means that the least significant byte is stored at the lowest memory address and the most significant byte at the highest memory address.
Big endian means that the most significant byte is stored at the lowest memory address and the least significant byte at the highest memory address.
Table 3 shows an example of the layout of a 64-bit long integer.
Table 3. Layout of a 64-bit long int
 | Low address | | | | | | | High address
Little endian | Byte 0 | Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7
Big endian | Byte 7 | Byte 6 | Byte 5 | Byte 4 | Byte 3 | Byte 2 | Byte 1 | Byte 0
For example, the 32-bit word 0x12345678 is laid out as follows on a big endian machine:
Table 4. Layout of 0x12345678 on a big endian system
Memory offset | 0 | 1 | 2 | 3
Memory contents | 0x12 | 0x34 | 0x56 | 0x78
If 0x12345678 is viewed as two half-words, 0x1234 and 0x5678, a big endian machine shows the following:
Table 5. 0x12345678 viewed as two half-words on a big endian system
Memory offset | 0 | 2
Memory contents | 0x1234 | 0x5678
On a little endian machine, however, the word 0x12345678 is laid out as follows:
Table 6. Layout of 0x12345678 on a little endian system
Memory offset | 0 | 1 | 2 | 3
Memory contents | 0x78 | 0x56 | 0x34 | 0x12
Similarly, the two half-words 0x1234 and 0x5678 look like this:
Table 7. 0x12345678 viewed as two half-words on a little endian system
Memory offset | 0 | 2
Memory contents | 0x3412 | 0x7856
The following example illustrates the difference in byte order between big endian and little endian machines.
The C program below prints "Big endian" when compiled and run on a big endian machine and "Little endian" when compiled and run on a little endian machine.
Listing 2. Big endian and Little endian
#include <stdio.h>
int main(void) {
    int i = 0x12345678;
    /* Inspect the byte stored at the lowest address of i. */
    if (*(char *)&i == 0x12)
        printf("Big endian\n");
    else if (*(char *)&i == 0x78)
        printf("Little endian\n");
    return 0;
}
Endianism is important in the following situations:
- When using bit masks
- When indirectly addressing portions of an object through pointers
C and C++ have bit fields that help deal with endian problems. I recommend using bit fields rather than mask fields or hexadecimal constants. Several functions convert 16-bit and 32-bit data between "host byte order" and "network byte order". For example, htonl(3) and ntohl(3) convert 32-bit integers; similarly, htons(3) and ntohs(3) convert 16-bit integers. There is no standard set of functions for 64-bit integers, but Linux provides several macros that work on both big endian and little endian systems:
- bswap_16
- bswap_32
- bswap_64
Type definitions
Rather than writing your application directly in terms of data types that change size on 64-bit systems, it is recommended that you use type definitions or macros that state explicitly the size and kind of data a variable holds. Several definitions make code more portable:
- ptrdiff_t:
A signed integer type that is the result of subtracting two pointers.
- size_t:
An unsigned integer type that is the result of the sizeof operator. It is used when passing arguments to functions such as malloc(3) and is returned by functions such as fread(3).
- int32_t, uint32_t, etc.:
Integer types with a predefined width.
- intptr_t and uintptr_t:
Integer types to which any valid pointer can be converted.
Example 1:
In the following statement, the 64-bit value returned by sizeof is truncated to 32 bits when it is assigned to bufferSize.
int bufferSize = (int) sizeof (something);
The solution is to cast the return value to size_t and assign it to a bufferSize declared as size_t, as follows:
size_t bufferSize = (size_t) sizeof (something);
Example 2:
On 32-bit systems, int and long are the same size, so some developers use the two types interchangeably. This can lead to pointers being assigned to ints, or vice versa. On a 64-bit system, however, assigning a pointer to an int truncates the high-order 32 bits.
The solution is to store pointers in pointer types or in the special types defined for this purpose, such as intptr_t and uintptr_t.
Bit shifting
An integer constant without a suffix is of type int (or unsigned int). This can cause unexpected truncation during shifts.
For example, in the following code, the maximum value of a is 31, because 1 << a is of type int.
long t = 1 << a;
To make the shift work on a 64-bit system, use 1L, as follows:
long t = 1L << a;
String formatting
The printf(3) function and its relatives can be a major source of problems. For example, on a 32-bit system you can use %d to print either an int or a long, but on a 64-bit platform this truncates a long value to its low-order 32 bits. For variables of type long, the correct specifier is %ld.
Similarly, when a small integer (char, short, int) is passed to printf(3), it is widened to 64 bits, with sign extension as appropriate. In the following example, printf(3) assumes that the pointer is 32 bits.
char *ptr = &something;
printf("%x\n", ptr);
The code above fails on a 64-bit system: it displays only the low-order 4 bytes.
The solution is to use %p, as follows; this works well on both 32-bit and 64-bit systems:
char *ptr = &something;
printf("%p\n", ptr);
Function parameters
There are several things to keep in mind when passing arguments to functions:
- When the data type of a parameter is defined by a function prototype, the argument is converted to that type according to the standard rules.
- When the parameter type is not specified, the argument is promoted to a larger type.
- On 64-bit systems, integer types are converted to 64-bit integers, and single-precision floating-point types are promoted to double precision.
- If the return type is not specified, the function's return value defaults to int.
A problem occurs when the sum of a signed integer and an unsigned integer is passed as a long. Consider the following scenario:
Listing 3. Passing the sum of a signed and an unsigned integer as a long
long function(long l);
int main(void) {
    int i = -2;
    unsigned k = 1U;
    long n = function(i + k);   /* i + k is unsigned int: no sign extension */
    return 0;
}
The code above fails on a 64-bit system because (i + k) is an unsigned 32-bit expression, and no sign extension takes place when it is converted to long. The solution is to cast one of the operands to a 64-bit type.
There is also a problem specific to register-based systems, which pass function arguments in registers rather than on the stack. Consider the following example:
float f = 1.25;
printf("The hex value of %f is %x", f, f);
On a stack-based system, this prints the corresponding hexadecimal value. On a register-based system, however, the hexadecimal value is read from an integer register rather than from a floating-point register.
The solution is to cast the address of the floating-point variable to a pointer to an integer type and dereference it, as follows:
printf("The hex value of %f is %x", f, *(int *)&f);
Conclusion
Mainstream hardware vendors have recently expanded their 64-bit offerings because 64-bit platforms provide better performance, value, and scalability. The limitations of 32-bit systems, especially the 4GB virtual memory limit, have pushed many companies to consider migrating to 64-bit platforms. Understanding how to port applications to a 64-bit architecture helps us write more portable and more efficient code.