1. Building a generic macro (./linux/include/linux/kernel.h)
#define min(x, y) ({ \
    typeof(x) _min1 = (x); \
    typeof(y) _min2 = (y); \
    (void) (&_min1 == &_min2); \
    _min1 < _min2 ? _min1 : _min2; })
The intent of the macro is clear, but I still have two questions:
(1) What is the line (void) (&_min1 == &_min2); used for?
(2) Why must the {} be wrapped in (), and why does compilation fail without the parentheses?
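On both questions, going by GCC's documented behavior: comparing pointers to two different types draws a compiler warning, so the (void) line acts as a compile-time check that x and y have the same type, and it generates no code. The ({ ... }) form is a GNU statement expression whose value is that of its last statement; a bare { ... } block is a statement, not an expression, so it cannot appear where a value is expected. The following is a minimal sketch, not kernel code. The names MIN_CHECKED and smaller_of are made up for illustration, and __typeof__ is used so the macro also compiles in GCC's strict ISO modes.

```c
/* Sketch of a type-checked min built on a GNU statement expression.
 * MIN_CHECKED and smaller_of are illustrative names, not kernel code. */
#define MIN_CHECKED(x, y) ({                                      \
    __typeof__(x) _a = (x);  /* evaluate x exactly once */        \
    __typeof__(y) _b = (y);  /* evaluate y exactly once */        \
    (void) (&_a == &_b);     /* warns if the two types differ */  \
    _a < _b ? _a : _b;       /* value of the whole ({ ... }) */   \
})

/* Because each argument is evaluated once, side effects are safe. */
int smaller_of(int a, int b)
{
    return MIN_CHECKED(a, b);
}
```

Because each argument is copied into a local first, a call such as MIN_CHECKED(i++, 10) increments i only once, which a plain ((x) < (y) ? (x) : (y)) macro would not guarantee.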
2. Range extensions
(1) switch statement
switch (n)
{
case 1 ... 3:
    printf("fafadsf ");
    break;
case 4 ... 8:
    printf("dsafaf ");
    break;
}
(2) Array initialization
int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
Both forms are used widely in the kernel.
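The two range forms can be tried outside the kernel as well. The sketch below assumes GCC or a compatible compiler; the names bucket and width_of are hypothetical. Note that the spaces around ... matter when the bounds are integers, since 1...3 would otherwise be lexed as a malformed number.

```c
/* Range designators: indices 0..9 hold 1, 10..99 hold 2, index 100 holds 3.
 * The array therefore has 101 elements. */
static const int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

/* Hypothetical accessor; the caller must pass n in the range 0..100. */
int width_of(int n)
{
    return widths[n];
}

/* Case ranges: each label matches every value in the inclusive range.
 * bucket is an illustrative name; it returns 0, 1, or -1. */
int bucket(int n)
{
    switch (n) {
    case 1 ... 3:
        return 0;
    case 4 ... 8:
        return 1;
    default:
        return -1;
    }
}
```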
3. Zero-length arrays
struct iso_block_store {
    atomic_t refcount;
    size_t data_size;
    quadlet_t data[0];
};
This lets the last member of the structure refer to the memory that immediately follows the structure instance. The feature is useful when you need an array member with a variable number of elements.
Example use:
struct iso_block_store *p = malloc(sizeof(struct iso_block_store) + data_size);
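The allocation pattern above can be sketched in user space as follows. The structure byte_store and the function byte_store_new are hypothetical stand-ins, since atomic_t and quadlet_t are kernel types. ISO C99 offers the same idea as a flexible array member, written data[] instead of data[0].

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure modeled on the pattern above; not kernel code. */
struct byte_store {
    size_t data_size;
    unsigned char data[0];  /* GNU zero-length array: adds no size itself */
};

/* Allocate the header and data_size payload bytes as one block.
 * Returns NULL if the allocation fails. */
struct byte_store *byte_store_new(size_t data_size)
{
    struct byte_store *p = malloc(sizeof(*p) + data_size);

    if (!p)
        return NULL;
    p->data_size = data_size;
    memset(p->data, 0, data_size);  /* payload starts right after the header */
    return p;
}
```

A single free(p) releases both the header and the payload, which is one reason the kernel favors this layout over a separately allocated buffer.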
4. Obtaining a function's return address
As the prototype below shows, __builtin_return_address takes a level argument, which selects the call-stack frame whose return address is returned. A level of 0 yields the return address of the current function, a level of 1 yields the return address of the current function's caller, and so on.
void * __builtin_return_address( unsigned int level );
In the following example (see ./linux/kernel/softirq.c), the local_bh_disable function disables soft interrupts on the local processor, which prevents softirqs, tasklets, and bottom halves from running on the current processor. It uses __builtin_return_address to capture the return address so that the address can be used later for tracing.
void local_bh_disable(void)
{
    __local_bh_disable((unsigned long)__builtin_return_address(0));
}
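The same idea can be tried in an ordinary program. This is only a sketch: note_caller, get_last_caller, and last_caller are hypothetical names, and the noinline attribute is an assumption added here so that the function keeps its own call frame and level 0 refers to a real return address.

```c
/* Address recorded by the most recent call to note_caller(). */
static void *last_caller;

/* Hypothetical helper: remembers where it will return to, that is,
 * the instruction after the call site in its caller. */
__attribute__((noinline)) void note_caller(void)
{
    last_caller = __builtin_return_address(0);
}

/* Returns the address captured by the last note_caller() call. */
void *get_last_caller(void)
{
    return last_caller;
}
```

The stored address could later be printed with the %p format or resolved to a symbol by a debugger, which is the kind of tracing the kernel example has in mind.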
5. Constant Detection
At compile time, you can use a built-in function provided by GCC to determine whether a value is a constant. This information is valuable because it lets you construct expressions that can be optimized through constant folding. The __builtin_constant_p function performs this check.
The prototype of __builtin_constant_p is shown below. Note that __builtin_constant_p cannot detect every constant, because GCC cannot easily prove that some values are constant.
int __builtin_constant_p( exp )
Linux uses constant detection quite frequently. In the example below (see ./linux/include/linux/log2.h), constant detection is used to optimize the roundup_pow_of_two macro. If the expression is a constant, the macro uses a constant expression that the compiler can fold. If the expression is not a constant, it calls another macro function to round the value up to a power of 2.
#define roundup_pow_of_two(n)               \
(                                           \
    __builtin_constant_p(n) ? (             \
        (n == 1) ? 1 :                      \
        (1UL << (ilog2((n) - 1) + 1))       \
                               ) :          \
    __roundup_pow_of_two(n)                 \
)
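A user-space sketch of the same dispatch is shown here. It is not the kernel's implementation: roundup_pow2_slow, ROUNDUP_POW2_CONST, and ROUNDUP_POW2 are hypothetical names, and the constant path uses GCC's __builtin_clzl in place of the kernel's ilog2. Both paths assume n >= 1.

```c
/* Run-time fallback (hypothetical helper):
 * smallest power of two that is >= n, for n >= 1. */
unsigned long roundup_pow2_slow(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Constant path: built only from operators and __builtin_clzl, so GCC
 * can fold it at compile time when n is a constant. Valid for n >= 1. */
#define ROUNDUP_POW2_CONST(n)                                          \
    ((n) == 1 ? 1UL :                                                  \
     1UL << (sizeof(unsigned long) * 8 - __builtin_clzl((n) - 1)))

/* Pick the foldable expression for constants, the function otherwise. */
#define ROUNDUP_POW2(n)                                                \
    (__builtin_constant_p(n) ? ROUNDUP_POW2_CONST(n)                   \
                             : roundup_pow2_slow(n))
```

With a literal argument such as ROUNDUP_POW2(5), the compiler can reduce the whole expression to a constant; with a run-time value it falls back to the loop.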
6. Function Attributes
GCC provides many function-level attributes that give the compiler extra information to help it optimize. This section describes the attributes associated with functions.
In the kernel, the attributes are given aliases through other symbol definitions, as shown below (see ./linux/include/linux/compiler-gcc3.h).
# define __inline__ __inline__ __attribute__((always_inline))
# define __deprecated __attribute__((deprecated))
# define __attribute_used__ __attribute__((__used__))
# define __attribute_const__ __attribute__((__const__))
# define __must_check __attribute__((warn_unused_result))
These definitions cover some of the function attributes available in GCC, which are also among the most useful in the Linux kernel. The attributes are used as follows:
always_inline
Makes GCC inline the specified function regardless of whether optimization is enabled.
deprecated
Indicates that the function is deprecated and should no longer be used. If you try to use a deprecated function, you receive a warning. You can also apply this attribute to types and variables to discourage developers from using them.
__used__
Tells the compiler that the function is used, whether or not GCC finds a call to it. This is helpful when C functions are called from assembly code.
__const__
Tells the compiler that a function is stateless (that is, it uses only the arguments passed to it to produce the result it returns).
warn_unused_result
Makes the compiler check that every caller uses the function's result. This ensures that callers examine the result and handle errors as appropriate.
The following are examples of these attributes in the Linux kernel. The deprecated example comes from the architecture-independent kernel (./linux/kernel/resource.c), and the const example comes from the IA64 kernel source (./linux/arch/ia64/kernel/unwind.c).
int __deprecated __check_region(struct resource *parent, unsigned long start, unsigned long n)
static enum unw_register_index __attribute_const__ decode_abreg(unsigned char abreg, int memory)
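To see the attributes outside the kernel, the sketch below applies them directly rather than through the kernel's alias macros. All function names (add_one, parse_digit, old_add_one, twice) are hypothetical.

```c
/* const: the result depends only on the arguments, so GCC may merge
 * repeated calls that have the same arguments. */
__attribute__((const)) int add_one(int x)
{
    return x + 1;
}

/* warn_unused_result: GCC warns if a caller ignores the return value.
 * Returns 0 and stores the digit in *out, or -1 if c is not a digit. */
__attribute__((warn_unused_result)) int parse_digit(char c, int *out)
{
    if (c < '0' || c > '9')
        return -1;
    *out = c - '0';
    return 0;
}

/* deprecated: any use of this function draws a compile-time warning. */
__attribute__((deprecated)) int old_add_one(int x)
{
    return x + 1;
}

/* always_inline: inlined even when optimization is turned off. */
static inline __attribute__((always_inline)) int twice(int x)
{
    return 2 * x;
}
```

Writing parse_digit('7', &d); as a statement on its own would trigger the unused-result warning, and any call to old_add_one would trigger the deprecation warning.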
7. Branch prediction hints
One of the most common optimization techniques in the Linux kernel is __builtin_expect. When developers write conditional code, they often know which branch is most likely to execute and which is rarely executed. If the compiler has this prediction information, it can generate the best code around the branch that is most likely to be taken.
As shown below, __builtin_expect is used through two macros, likely and unlikely (see ./linux/include/linux/compiler.h).
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
With __builtin_expect, the compiler can make instruction-selection decisions that match the supplied prediction, so that the code laid out for straight-line execution is the code that usually runs. It can also improve cache and instruction-pipeline behavior.
For example, if a condition is marked likely, the compiler can place the true part of the code directly after the branch instruction, so that the branch is not taken. The false part of the condition is reached through the branch instruction, which is not optimal, but it is rarely reached. In this way, the code is optimal for the most likely case.
The following function uses the likely and unlikely macros (see ./linux/net/core/datagram.c). It predicts that the sum variable will be zero (the packet's checksum is valid) and that the ip_summed variable is not equal to CHECKSUM_HW.
unsigned int __skb_checksum_complete(struct sk_buff *skb)
{
    unsigned int sum;

    sum = (u16)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum));
    if (likely(!sum)) {
        if (unlikely(skb->ip_summed == CHECKSUM_HW))
            netdev_rx_csum_fault(skb->dev);
        skb->ip_summed = CHECKSUM_UNNECESSARY;
    }
    return sum;
}
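The macros work the same way in user-space code. In the sketch below, sum_bytes is a hypothetical function that marks its error check as unlikely; the hint changes only how the compiler lays out the code, never the result.

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical function: sums len bytes from buf.
 * Returns -1 if buf is NULL, which is expected to be rare. */
long sum_bytes(const unsigned char *buf, size_t len)
{
    long total = 0;
    size_t i;

    if (unlikely(buf == NULL))  /* error path, hinted as rarely taken */
        return -1;
    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

The !! in the macro definitions normalizes any nonzero value to 1, so that the expression compares cleanly against the expected value passed as the second argument.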
8. Prefetching
Another important way to improve performance is to cache necessary data close to the processor. Caching can significantly reduce the time required to access data. Most modern processors have three types of memory:
- Level-1 cache, which usually supports single-cycle access
- Level-2 cache, which supports two-cycle access
- System memory, which has longer access times
To minimize access latency and improve performance, it is best to keep data in the closest memory. Doing this manually is called prefetching. GCC supports manual prefetching of data through the built-in function __builtin_prefetch. Use this function to bring data into the cache shortly before it is needed. As shown below, the __builtin_prefetch function takes three arguments:
- The address of the data
- The rw parameter, which indicates whether the data is being prefetched for a read or a write operation
- The locality parameter, which specifies whether the data should stay in the cache or be evicted after it is used
void __builtin_prefetch( const void *addr, int rw, int locality );
Prefetching is used often in the Linux kernel, usually through macros and wrapper functions. The following is an example of a helper function that wraps the built-in function (see ./linux/include/linux/prefetch.h). It implements a prefetching mechanism for streamed operations. Using this function can generally reduce cache misses and stalls, and so improve performance.
#ifndef ARCH_HAS_PREFETCH
#define prefetch(x) __builtin_prefetch(x)
#endif

static inline void prefetch_range(void *addr, size_t len)
{
#ifdef ARCH_HAS_PREFETCH
    char *cp;
    char *end = addr + len;

    for (cp = addr; cp < end; cp += PREFETCH_STRIDE)
        prefetch(cp);
#endif
}
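A streaming loop in user space might use the built-in as sketched here. The function sum_with_prefetch and the look-ahead distance PREFETCH_AHEAD are hypothetical, and the value 16 is an untuned assumption; a useful distance depends on the processor and the work done per element. In GCC, rw is 0 for a read and 1 for a write, and locality ranges from 0 (no need to keep the data cached) to 3 (keep it in all cache levels); both must be compile-time constants.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements; not a tuned value. */
#define PREFETCH_AHEAD 16

/* Sums n elements, asking for a[i + PREFETCH_AHEAD] before it is needed. */
long sum_with_prefetch(const long *a, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            /* rw = 0: prefetch for reading; locality = 1: low reuse */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);
        total += a[i];
    }
    return total;
}
```

A prefetch is only a hint: the result of the function is the same with or without it, and on targets without a prefetch instruction GCC simply generates no code for the call.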
9. Variable attributes
In addition to the function attributes discussed earlier in this article, GCC also provides attributes for variables and type definitions. One of the most important is the aligned attribute, which is used to align objects in memory. Besides being important for performance, object alignment is also required by some devices and hardware configurations. The aligned attribute takes a single argument that specifies the desired alignment.
The following example comes from software suspend (see ./linux/arch/i386/mm/init.c). It defines an object of PAGE_SIZE bytes that must be page aligned.
char __nosavedata swsusp_pg_dir[PAGE_SIZE] __attribute__ ((aligned (PAGE_SIZE)));
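The effect of aligned can be checked by inspecting an object's address. In this sketch, io_buf, get_io_buf, and is_aligned_64 are hypothetical names, and the 64-byte boundary is an assumed example value rather than anything taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical buffer that must start on a 64-byte boundary. */
static char io_buf[256] __attribute__((aligned(64)));

/* Returns the start of the aligned buffer. */
char *get_io_buf(void)
{
    return io_buf;
}

/* Returns 1 if p lies on a 64-byte boundary, 0 otherwise. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```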
The packed attribute packs the elements of a structure so that they take up as little space as possible. This means that a char member occupies no more than one byte (8 bits), and bit fields are compressed to their declared number of bits rather than consuming more storage.
The following source code is optimized with an __attribute__ declaration that specifies multiple attributes in a comma-separated list.
static struct swsusp_header {
char reserved[PAGE_SIZE - 20 - sizeof(swp_entry_t)];
swp_entry_t image;
char orig_sig[10];
char sig[10];
} __attribute__((packed, aligned(PAGE_SIZE))) swsusp_header;
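The space saved by packed is easy to measure with sizeof and offsetof. The two structures below are hypothetical and differ only in the attribute. The size of the unpacked one depends on the platform's padding rules, so the sketch makes no claim about its exact value.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding after tag so that
 * value is naturally aligned. */
struct plain_rec {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: no padding, so value starts at byte offset 1 and
 * the whole structure occupies 1 + 4 = 5 bytes. */
struct packed_rec {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));
```

The trade-off is that value in packed_rec may be misaligned, which can make access slower on some processors, so packed is best reserved for layouts that must match an on-disk or on-wire format.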