GCC and Linux are a great pair. Although they are independent pieces of software, Linux depends entirely on GCC to run on new architectures. Linux also exploits features of GCC, called extensions, to gain additional functionality and optimization. This article explores some of the important extensions and shows how they are used in the Linux kernel.
The current stable GCC release (version 4.3.2) supports three versions of the C standard:
- The original International Organization for Standardization (ISO) C standard (ISO C89, also known as C90)
- ISO C90 with amendment 1
- The current ISO C99 (the standard assumed throughout this article)
Note: This article assumes the ISO C99 standard. If you specify a standard older than ISO C99, some of the extensions described here may not be available. Use the -std option to select the standard GCC compiles against; GCC 4.3 itself defaults to gnu89 (C90 plus GNU extensions), not to C99. The GCC manual (see References) describes which extensions each standard version supports.
Available versions
This article explores GCC extensions as used in the 2.6.27.1 Linux kernel with GCC 4.3.2. For each C extension, the article names the file in the Linux kernel source where you can find the example.
You can classify the available C extensions in several ways. This article divides them into two broad categories:
- Functionality extensions bring new features to C.
- Optimization extensions help the compiler generate more efficient code.
Functionality extensions
Let's start by exploring some of the GCC extensions that add features to standard C.
Type discovery
GCC lets you obtain the type of a variable or expression with the typeof keyword. This operation enables generic programming, and similar functionality exists in many modern programming languages, such as C++, Ada, and Java. Linux uses typeof to build type-dependent operations such as min and max. Listing 1 shows how typeof is used to build a generic macro (from ./linux/include/linux/kernel.h).
Listing 1. Using typeof to build a generic macro
```c
/* The pointer comparison makes GCC warn if x and y have different types. */
#define min(x, y) ({				\
	typeof(x) _min1 = (x);			\
	typeof(y) _min2 = (y);			\
	(void) (&_min1 == &_min2);		\
	_min1 < _min2 ? _min1 : _min2; })
```
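The same pattern works in ordinary user-space code. The sketch below defines two hypothetical macros, my_max and my_swap (they are not kernel APIs), using __typeof__, the spelling of typeof that GCC also accepts under strict -std modes, together with GCC statement expressions:

```c
/* Hypothetical type-generic max(), modeled on the kernel's min():
 * each argument is evaluated exactly once, and comparing the addresses
 * of the two temporaries makes GCC warn if x and y differ in type. */
#define my_max(x, y) ({				\
	__typeof__(x) _max1 = (x);		\
	__typeof__(y) _max2 = (y);		\
	(void) (&_max1 == &_max2);		\
	_max1 > _max2 ? _max1 : _max2; })

/* Hypothetical swap: __typeof__ declares a temporary of the operand's type. */
#define my_swap(a, b) do {			\
	__typeof__(a) _tmp = (a);		\
	(a) = (b);				\
	(b) = _tmp;				\
} while (0)
```

Because each argument is copied into a temporary first, my_max(i++, j) increments i only once, which a plain ((a) > (b) ? (a) : (b)) macro cannot guarantee.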
Range extension
GCC supports ranges, which can be used in several areas of the C language. One of these is the case statement within a switch/case block. In complex conditional structures, you might use cascading if statements to achieve the same result as Listing 2 (from ./linux/drivers/scsi/sd.c), but Listing 2 is more concise. Using switch/case also lets the compiler optimize the dispatch, for example with a jump table.
Listing 2. Using ranges within case statements
```c
static int sd_major(int major_idx)
{
	switch (major_idx) {
	case 0:
		return SCSI_DISK0_MAJOR;
	case 1 ... 7:
		return SCSI_DISK1_MAJOR + major_idx - 1;
	case 8 ... 15:
		return SCSI_DISK8_MAJOR + major_idx - 8;
	default:
		BUG();
		return 0;	/* shut up gcc */
	}
}
```
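A minimal user-space sketch of case ranges follows; classify_char and enum char_class are hypothetical names, and the classification assumes ASCII. GCC requires spaces around the ... when the bounds are integers, because something like 1...5 would be tokenized incorrectly:

```c
/* Hypothetical character classes for the sketch. */
enum char_class { CC_DIGIT, CC_LETTER, CC_CONTROL, CC_OTHER };

/* Classify an ASCII character using GCC case ranges (low ... high, inclusive). */
static enum char_class classify_char(int c)
{
	switch (c) {
	case '0' ... '9':
		return CC_DIGIT;
	case 'a' ... 'z':
	case 'A' ... 'Z':
		return CC_LETTER;
	case 0 ... 31:		/* note the spaces around "..." */
	case 127:
		return CC_CONTROL;
	default:
		return CC_OTHER;
	}
}
```

Overlapping ranges in the same switch are a compile-time error, so GCC also checks that the ranges are disjoint.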
Ranges can also be used in initializers, as shown below (from ./linux/arch/cris/arch-v32/kernel/smp.c). In this example, an array of spinlock_t with LOCK_COUNT elements is created, and each element is initialized to the value SPIN_LOCK_UNLOCKED.
```c
/* Vector of locks used for various atomic operations */
spinlock_t cris_atomic_locks[] = {
	[0 ... LOCK_COUNT - 1] = SPIN_LOCK_UNLOCKED
};
```
Ranges also support more complex initializations. For example, the following code specifies initial values for several sub-ranges of an array.
```c
int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
```
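To see how the widths[] initializer above behaves, the sketch below wraps it in a hypothetical decimal_width helper. The array has 101 elements because its length comes from the highest designated index, [100]:

```c
/* The widths[] table from the text: number of decimal digits for 0..100. */
static const int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

/* Element count, implied by the highest designated index (100). */
#define WIDTHS_LEN (sizeof(widths) / sizeof(widths[0]))

/* Hypothetical helper: decimal width of n, or -1 if n is outside the table. */
static int decimal_width(unsigned int n)
{
	return n < WIDTHS_LEN ? widths[n] : -1;
}
```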
Zero-length arrays
In standard C, an array must have at least one element. This requirement often complicates code design. GCC, however, supports zero-length arrays, which are particularly useful in structure definitions. The concept is similar to the flexible array member of ISO C99, but uses a different syntax.
The following example declares an array with no elements at the end of a structure (from ./linux/drivers/ieee1394/raw1394-private.h). This lets the last member of the structure refer to the memory that immediately follows the structure instance, which is useful when you need a variable number of array elements.
```c
struct iso_block_store {
	atomic_t refcount;
	size_t data_size;
	quadlet_t data[0];
};
```
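The sketch below shows the usual allocation pattern in user space: a single malloc covers the header plus the trailing array. struct block_store and block_store_alloc are hypothetical, and uint32_t stands in for the kernel's 32-bit quadlet_t:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of struct iso_block_store. */
struct block_store {
	int refcount;
	size_t data_size;	/* number of bytes in data[] */
	uint32_t data[0];	/* GNU zero-length array; C99 would write data[] */
};

/* Allocate the header and n 32-bit words of trailing storage in one block. */
static struct block_store *block_store_alloc(size_t n)
{
	struct block_store *s = malloc(sizeof(*s) + n * sizeof(s->data[0]));

	if (!s)
		return NULL;
	s->refcount = 1;
	s->data_size = n * sizeof(s->data[0]);
	memset(s->data, 0, s->data_size);
	return s;
}
```

In new code, the C99 form uint32_t data[]; does the same job portably.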
Determining the call address
In many cases, it is useful to know who called a given function. GCC provides the built-in function __builtin_return_address for this purpose. It is typically used for debugging, but the kernel has many other uses for it.
As the following prototype shows, __builtin_return_address takes an argument called level, which selects the level of the call stack whose return address you want. A level of 0 gives the return address of the current function; a level of 1 gives the return address of the function that called the current function, and so on. The argument must be a constant integer. Nonzero levels are not reliable on every machine: the GCC manual warns that they can return 0 or an arbitrary value, or even crash the program.
```c
void *__builtin_return_address(unsigned int level);
```
In the following example (from ./linux/kernel/softirq.c), the local_bh_disable function disables soft interrupts on the local processor, preventing softirqs, tasklets, and bottom halves from running on the current processor. The return address is captured with __builtin_return_address so that it can be used later for tracing.
```c
void local_bh_disable(void)
{
	__local_bh_disable((unsigned long)__builtin_return_address(0));
}
```
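A user-space sketch of the same idea: traced_operation records the address it was called from, much as local_bh_disable passes its caller's address along. The names are hypothetical, and the noinline attribute is an assumption added so that level 0 always refers to a real call:

```c
#include <stdio.h>

/* Hypothetical trace slot: the most recent caller of traced_operation(). */
static void *last_caller;

/* noinline keeps this a real call; if it were inlined, level 0 would be the
 * return address of whichever function it was inlined into. */
__attribute__((noinline)) void traced_operation(void)
{
	last_caller = __builtin_return_address(0);
}

/* Print the recorded code address for a quick trace. */
static void report_caller(void)
{
	printf("traced_operation() was called from %p\n", last_caller);
}
```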
Constant detection
GCC provides a built-in function that lets you determine at compile time whether a value is a constant. This information is valuable because you can use it to construct expressions that the compiler can simplify through constant folding. The function that performs this check is __builtin_constant_p.
The prototype for __builtin_constant_p is shown below. It returns 1 if GCC can prove that exp is a compile-time constant and 0 otherwise. A result of 0 does not mean the value varies: GCC cannot always prove that a value is constant, so __builtin_constant_p does not detect every constant.
```c
int __builtin_constant_p(exp)
```
Linux uses constant detection frequently. In Listing 3 (from ./linux/include/linux/log2.h), constant detection optimizes the roundup_pow_of_two macro. If the argument is a constant, the macro expands to a constant expression that the compiler can fold. Otherwise, it calls __roundup_pow_of_two to round the value up to a power of 2 at run time.
Listing 3. Using constant detection to optimize macro functions
```c
#define roundup_pow_of_two(n)			\
(						\
	__builtin_constant_p(n) ? (		\
		(n == 1) ? 1 :			\
		(1UL << (ilog2((n) - 1) + 1))	\
				   ) :		\
	__roundup_pow_of_two(n)			\
 )
```
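The following sketch mirrors the structure of Listing 3 in user space. ROUNDUP_POW2 and roundup_pow2_slow are hypothetical names; the constant branch replaces the kernel's ilog2 with GCC's __builtin_clzll and assumes a 64-bit unsigned long long and n <= 2^63:

```c
/* Run-time fallback: smallest power of two >= n.
 * Assumes n <= 2^63; larger values would make p overflow. */
static unsigned long long roundup_pow2_slow(unsigned long long n)
{
	unsigned long long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Constant branch: 1 << (ilog2(n - 1) + 1), with ilog2(x) written as
 * 63 - __builtin_clzll(x). GCC can fold this when n is a constant. */
#define ROUNDUP_POW2(n)							\
	(__builtin_constant_p(n)					\
	 ? ((n) <= 1 ? 1ULL						\
	    : 1ULL << (64 - __builtin_clzll((unsigned long long)(n) - 1))) \
	 : roundup_pow2_slow(n))
```

Both branches return the same value, so correctness never depends on what __builtin_constant_p reports; a constant argument lets GCC reduce the expression to a literal, while a variable argument takes the loop.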
Function attributes
GCC provides a variety of function-level attributes that give the compiler extra information to help it optimize. This section describes the attributes associated with functionality; the next section describes attributes that affect optimization.
As Listing 4 shows, the kernel aliases these attributes with its own symbolic definitions (from ./linux/include/linux/compiler-gcc3.h). Use the listing as a reference for the attribute usage shown below.
Listing 4. Function attribute definitions
```c
# define __inline__		__inline__	__attribute__((always_inline))
# define __deprecated		__attribute__((deprecated))
# define __attribute_used__	__attribute__((__used__))
# define __attribute_const__	__attribute__((__const__))
# define __must_check		__attribute__((warn_unused_result))
```
The definitions in Listing 4 cover some of the function attributes available in GCC, and they are among the most useful attributes in the Linux kernel. The following list explains how they are used:
always_inline
Tells GCC to inline the specified function regardless of whether optimization is enabled.
deprecated
Indicates that the function is obsolete and should no longer be used. Any attempt to use a deprecated function produces a warning. You can also apply this attribute to types and variables to discourage their use.
__used__
Tells the compiler that the function is used even if GCC finds no calls to it, so the function is still emitted. This is helpful for C functions that are called only from assembly code.
__const__
Tells the compiler that the function is stateless: its return value depends only on the arguments passed to it, and it neither reads global memory nor has side effects. GCC can therefore merge repeated calls with the same arguments.
warn_unused_result
Makes the compiler warn whenever a caller ignores the function's return value. This helps ensure that callers check the result and handle errors appropriately.
The following are examples of these attributes in the Linux kernel. The deprecated example comes from architecture-independent kernel code (./linux/kernel/resource.c), and the const example comes from the IA64 kernel source (./linux/arch/ia64/kernel/unwind.c).
```c
int __deprecated __check_region(struct resource *parent,
				unsigned long start, unsigned long n);

static enum unw_register_index __attribute_const__
decode_abreg(unsigned char abreg, int memory);
```
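The following sketch applies these attributes to ordinary user-space functions. The my_* macro names and the functions are hypothetical, chosen to mirror Listing 4; only the __attribute__ spellings come from GCC:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical aliases in the style of Listing 4. */
#define my_must_check	__attribute__((warn_unused_result))
#define my_const	__attribute__((const))
#define my_deprecated	__attribute__((deprecated))
#define my_used		__attribute__((used))

/* const: result depends only on x, so GCC may merge repeated calls. */
my_const int square(int x)
{
	return x * x;
}

/* warn_unused_result: callers that discard the status get a warning. */
my_must_check int parse_port(const char *s, int *port)
{
	char *end;
	long v = strtol(s, &end, 10);

	if (end == s || *end != '\0' || v < 1 || v > 65535)
		return -1;
	*port = (int)v;
	return 0;
}

/* deprecated: every call to old_parse_port() produces a warning. */
my_deprecated int old_parse_port(const char *s)
{
	return atoi(s);
}

/* used: emitted even though no C code calls it (e.g. only assembly does). */
static my_used void called_from_asm(void)
{
	puts("called from assembly");
}

/* always_inline: inlined even without optimization; used together with inline. */
static inline __attribute__((always_inline)) int add(int a, int b)
{
	return a + b;
}
```

Compiling a statement such as parse_port("80", &port); with the result discarded, or any call to old_parse_port, should produce a warning; neither attribute changes what the function computes.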
Optimization extensions
Now let's explore some of the GCC features that help generate better machine code.
Branch prediction hints
One of the most common optimization techniques in the Linux kernel is __builtin_expect. When working with conditional code, developers often know which branch is most likely to execute and which is rarely executed. If the compiler has this prediction information, it can generate the best code for the most likely path.
As shown below, __builtin_expect is the basis of two macros, likely and unlikely (from ./linux/include/linux/compiler.h).
```c
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)
```
With __builtin_expect, the compiler can make instruction-selection decisions that match the prediction information provided. This keeps the code most likely to execute close to the condition, and it can also improve cache and instruction-pipeline behavior.
For example, if a condition is marked "likely", the compiler can place the true part of the code immediately after the branch instruction, so that in the common case no branch is taken. The false part is then reached through a branch, which is less efficient, but that path is unlikely to execute. In this way, the code is optimized for the most likely case.
Listing 5 shows a function that uses both the likely and unlikely macros (from ./linux/net/core/datagram.c). The function expects the sum variable to be zero (the packet's checksum is valid) and the ip_summed variable not to equal CHECKSUM_HW.
Listing 5. Examples of likely and unlikely macros
```c
unsigned int __skb_checksum_complete(struct sk_buff *skb)
{
	unsigned int sum;

	sum = (u16)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum));
	if (likely(!sum)) {
		if (unlikely(skb->ip_summed == CHECKSUM_HW))
			netdev_rx_csum_fault(skb->dev);
		skb->ip_summed = CHECKSUM_UNNECESSARY;
	}
	return sum;
}
```
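A user-space sketch of the same macros follows. The !!(x) converts any nonzero value to exactly 1, so a test such as likely(flags & 0x4) still matches the expected value 1. sum_non_negative is a hypothetical function; the hints affect only code layout, never the result:

```c
#include <stddef.h>

/* User-space copies of the kernel macros. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical example: sum the non-negative entries of buf, treating a
 * NULL buffer and negative entries as the rare cases. */
static long sum_non_negative(const int *buf, size_t n)
{
	long total = 0;
	size_t i;

	if (unlikely(buf == NULL))
		return -1;
	for (i = 0; i < n; i++) {
		if (likely(buf[i] >= 0))
			total += buf[i];
	}
	return total;
}
```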
Prefetching
Another important way to improve performance is to keep the data you need in caches close to the processor. Caching can significantly reduce the time it takes to access data. Most modern processors have at least three classes of memory:
- Level 1 cache, the smallest and fastest, typically accessible in one or a few cycles
- Level 2 cache, larger but slower than level 1
- System memory, with much longer access times
To minimize access latency and improve performance, it is best to have data in the closest memory. Doing this by hand is called prefetching. GCC supports manual prefetching through the built-in function __builtin_prefetch, which you call to bring data into the cache shortly before it is needed. As shown below, __builtin_prefetch takes three arguments:
- The address of the data
- The rw argument, which indicates whether the data is being prefetched for reading (0) or for writing (1)
- The locality argument, which indicates how long the data should stay in the cache after use, from 0 (no temporal locality; the data need not remain cached) to 3 (high temporal locality; keep it in all cache levels)
The rw and locality arguments are optional (they default to 0 and 3) and must be compile-time constants.
```c
void __builtin_prefetch(const void *addr, int rw, int locality);
```
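A minimal sketch of prefetching inside a loop follows. PREFETCH_AHEAD and sum_with_prefetch are hypothetical, and the distance of 16 elements is an assumption that would need tuning on real hardware. The bounds check matters: a prefetch of an invalid address does not fault, but forming a pointer more than one element past the end of an array is undefined behavior in C:

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; the best value must be measured. */
#define PREFETCH_AHEAD 16

/* Sum an array while asking the hardware to start loading the element
 * PREFETCH_AHEAD positions ahead. rw = 0 (read), locality = 3 (keep cached). */
static long long sum_with_prefetch(const int *a, size_t n)
{
	long long total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
		total += a[i];
	}
	return total;
}
```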
Prefetching is used extensively in the Linux kernel, usually through macros and wrapper functions. Listing 6 shows a helper function that wraps the built-in (from ./linux/include/linux/prefetch.h). It prefetches a whole memory range for streaming operations, which typically reduces cache misses and stalls and thereby improves performance. Note that prefetch_range issues prefetches only when the architecture defines ARCH_HAS_PREFETCH; otherwise its body is empty.
Listing 6. Wrapper function for prefetching a range
```c
#ifndef ARCH_HAS_PREFETCH
#define prefetch(x) __builtin_prefetch(x)
#endif

static inline void prefetch_range(void *addr, size_t len)
{
#ifdef ARCH_HAS_PREFETCH
	char *cp;
	char *end = addr + len;

	for (cp = addr; cp < end; cp += PREFETCH_STRIDE)
		prefetch(cp);
#endif
}
```
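A user-space variant of prefetch_range is sketched below. MY_PREFETCH_STRIDE (64 bytes, a common cache-line size) and the function names are assumptions. Unlike Listing 6, it calls __builtin_prefetch unconditionally, which is safe because GCC emits no prefetch instruction on targets that lack one. It prefetches for writing (rw = 1) and steps by offset rather than by pointer, so it never forms an out-of-range pointer:

```c
#include <stddef.h>
#include <string.h>

/* Assumed cache-line size; the kernel gets PREFETCH_STRIDE from the architecture. */
#define MY_PREFETCH_STRIDE 64

/* Hypothetical helper: one write-intent prefetch per cache line in [addr, addr + len). */
static inline void prefetchw_range(void *addr, size_t len)
{
	char *base = addr;
	size_t off;

	for (off = 0; off < len; off += MY_PREFETCH_STRIDE)
		__builtin_prefetch(base + off, 1, 3);
}

/* Example use: warm the buffer for writing, then fill it. */
static void fill_buffer(char *buf, size_t len, char value)
{
	prefetchw_range(buf, len);
	memset(buf, value, len);
}
```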
Variable attributes
In addition to the function attributes discussed earlier, GCC provides attributes for variables and type definitions. One of the most important is the aligned attribute, which controls the alignment of an object in memory. Besides mattering for performance, alignment is required by some devices and hardware configurations. The aligned attribute takes a single argument that specifies the desired alignment in bytes.
The following example is used for software suspend (from ./linux/arch/i386/mm/init.c). It defines an array of PAGE_SIZE bytes that must be page-aligned.
```c
char __nosavedata swsusp_pg_dir[PAGE_SIZE]
	__attribute__ ((aligned (PAGE_SIZE)));
```
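The sketch below checks the effect of aligned on both a variable and a type. MY_PAGE_SIZE is an assumed 4096-byte page (real code could query sysconf(_SC_PAGESIZE) at run time), and struct cacheline_counter is hypothetical, assuming 64-byte cache lines:

```c
#include <stdint.h>

/* Assumed page size for the sketch. */
#define MY_PAGE_SIZE 4096

/* Variable attribute: the buffer starts on a MY_PAGE_SIZE boundary. */
static char page_buf[MY_PAGE_SIZE] __attribute__((aligned(MY_PAGE_SIZE)));

/* Type attribute: every struct cacheline_counter is 64-byte aligned and
 * padded to 64 bytes, so adjacent array elements never share a 64-byte line. */
struct cacheline_counter {
	long value;
} __attribute__((aligned(64)));
```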
The example in Listing 7 illustrates two things:
- The packed attribute packs the members of a structure so that they consume as little space as possible: each member gets the smallest possible alignment (one byte for an ordinary member, one bit for a bit field), so no padding is inserted between members. Here, that guarantees the header occupies exactly PAGE_SIZE bytes, because the reserved array is sized so that the members add up to one page.
- A single __attribute__ specification can define multiple attributes in a comma-separated list, as this source code does.
Listing 7. Structure packaging and setting multiple attributes
```c
static struct swsusp_header {
	char reserved[PAGE_SIZE - 20 - sizeof(swp_entry_t)];
	swp_entry_t image;
	char orig_sig[10];
	char sig[10];
} __attribute__((packed, aligned(PAGE_SIZE))) swsusp_header;
```
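The following sketch makes the effect of packed visible with offsetof and sizeof. The three structures are hypothetical; the exact padding of the unpacked version depends on the target ABI, so only the packed layouts are fixed:

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may insert padding after tag so that
 * value is suitably aligned. */
struct natural_rec {
	uint8_t  tag;
	uint32_t value;
};

/* packed: no padding, value starts right after tag. */
struct packed_rec {
	uint8_t  tag;
	uint32_t value;
} __attribute__((packed));

/* Two attributes in one list, as in Listing 7: members packed, whole
 * object aligned to 16 bytes (so its size rounds up to 16). */
struct packed_aligned_rec {
	uint8_t  tag;
	uint32_t value;
} __attribute__((packed, aligned(16)));
```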
Conclusion
This article has explored only a few of the GCC extensions used in the Linux kernel. You can learn more about all the available extensions for C and C++ in the GNU GCC manual (see References). Although the Linux kernel uses these extensions heavily, you can use them in your own applications as well. As GCC evolves, new extensions will appear that further improve the performance and extend the functionality of the Linux kernel.