Some compiler optimization features of GCC

Source: Internet
Author: User

GCC and Linux are a great combination. Although they are independent pieces of software, Linux depends entirely on GCC to run on new architectures. Linux also exploits GCC features (called extensions) to gain functionality and optimization. This article discusses some of the most important extensions and shows how they are used in the Linux kernel.

The current stable version of GCC (version 4.3.2) supports three versions of the C standard:

  • The original International Organization for Standardization (ISO) C language standard (ISO C89 or C90)
  • ISO C90 with amendment 1
  • ISO C99, the current standard (this article assumes it; GCC 4.3.2 itself defaults to gnu89, the GNU dialect of C90, unless you pass the -std option)

Note: This article assumes the ISO C99 standard. If you specify a standard older than ISO C99, some of the extensions described in this article may not be available. You can use the -std option on the command line to specify the standard that GCC uses. The GCC manual lists which extensions are supported under which standard version (see Resources for links).

Applicable versions

This article focuses on the use of GCC extensions in the 2.6.27.1 Linux kernel with version 4.3.2 of GCC. For each C extension, the article points to a file in the Linux kernel source where an example can be found.

You can categorize the available C extensions in several ways. This article divides them into two broad categories:

  • Functional extensions provide new functionality.
  • Optimizing extensions help generate more efficient code.

Functional extensions

Let's discuss some of the GCC extensions that extend the standard C language.

Type discovery

GCC lets you identify a type through a reference to a variable. This kind of operation supports a form of generic programming. Similar functionality can be found in many modern programming languages, such as C++, Ada, and the Java™ language. Linux uses typeof to build type-dependent operations such as min and max. Listing 1 shows how typeof is used to build a generic macro (see ./linux/include/linux/kernel.h).
Listing 1. Building a generic macro using typeof

				
#define min(x, y) ({				\
	typeof(x) _min1 = (x);			\
	typeof(y) _min2 = (y);			\
	(void) (&_min1 == &_min2);		\
	_min1 < _min2 ? _min1 : _min2; })
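
To see why the macro copies its arguments into typed temporaries, here is a minimal sketch of the same pattern outside the kernel. The names GENERIC_MIN and min_with_side_effect are made up for this illustration, and the keyword is spelled __typeof__, which GCC also accepts when strict ISO mode disables plain typeof.

```c
/* Type-generic minimum built on a GNU statement expression.
 * GENERIC_MIN is a hypothetical name, not the kernel's macro. */
#define GENERIC_MIN(x, y) ({					\
	__typeof__(x) _a = (x);	/* evaluate x exactly once */	\
	__typeof__(y) _b = (y);	/* evaluate y exactly once */	\
	(void) (&_a == &_b);	/* warns if the types differ */	\
	_a < _b ? _a : _b; })

/* Because each argument is evaluated once, a side effect in an
 * argument also happens once. */
static int min_with_side_effect(int *counter, int limit)
{
	return GENERIC_MIN((*counter)++, limit);
}
```

A naive `((x) < (y) ? (x) : (y))` macro would increment the counter twice in min_with_side_effect; the temporaries avoid that.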

Range extensions

GCC supports ranges, which can be used in several areas of the C language. One of these is the case statement in a switch/case block. In a complex conditional structure, you would typically rely on nested if statements to achieve the same result as Listing 2 (see ./linux/drivers/scsi/sd.c), but Listing 2 is simpler. Using switch/case also lets the compiler optimize the code with a jump table implementation.
Listing 2. Using ranges in case statements

				
static int sd_major(int major_idx)
{
	switch (major_idx) {
	case 0:
		return SCSI_DISK0_MAJOR;
	case 1 ... 7:
		return SCSI_DISK1_MAJOR + major_idx - 1;
	case 8 ... 15:
		return SCSI_DISK8_MAJOR + major_idx - 8;
	default:
		BUG();
		return 0;	/* shut up gcc */
	}
}
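
The same technique works in ordinary user-space code. The following is a small sketch, not taken from the kernel: the classify function and the char_class enumeration are hypothetical, and the letter ranges assume an ASCII-compatible character set in which letters are contiguous.

```c
/* Character categories; all names here are made up for this sketch. */
enum char_class { CLASS_DIGIT, CLASS_LOWER, CLASS_UPPER, CLASS_OTHER };

static enum char_class classify(unsigned char c)
{
	switch (c) {
	case '0' ... '9':	/* keep spaces around "..." so that integer
				 * bounds are parsed correctly */
		return CLASS_DIGIT;
	case 'a' ... 'z':	/* assumes contiguous letters (ASCII) */
		return CLASS_LOWER;
	case 'A' ... 'Z':
		return CLASS_UPPER;
	default:
		return CLASS_OTHER;
	}
}
```

Each range stands in for what would otherwise be ten or twenty-six separate case labels, or a chain of comparisons.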

You can also use ranges for initialization, as shown below (see ./linux/arch/cris/arch-v32/kernel/smp.c). In this example, an array of spinlock_t is created with LOCK_COUNT elements. Each element of the array is initialized to the SPIN_LOCK_UNLOCKED value.

/* Vector of locks used for various atomic operations */
spinlock_t cris_atomic_locks[] = { [0 ... LOCK_COUNT - 1] = SPIN_LOCK_UNLOCKED };

Ranges also support more complex initialization. For example, the following code specifies initial values for several sub-ranges of an array.

int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
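
As a rough illustration of how range designators behave, the sketch below initializes two sub-ranges and leaves one element unnamed. TABLE_SIZE, table, and table_sum are hypothetical names chosen for this example.

```c
#define TABLE_SIZE 16	/* hypothetical size for this sketch */

static const int table[TABLE_SIZE] = {
	[0 ... 9]   = 1,	/* elements 0 through 9 */
	[10 ... 14] = 2,	/* elements 10 through 14 */
	/* element 15 is not named, so it is zero-initialized */
};

/* Adds up every element, to show what the initializer produced. */
static int table_sum(void)
{
	int i, sum = 0;

	for (i = 0; i < TABLE_SIZE; i++)
		sum += table[i];
	return sum;
}
```

Ten elements of 1 plus five elements of 2 give a total of 20; the unnamed last element contributes nothing.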

Zero-length arrays

Standard C requires an array to have at least one element. This requirement often complicates code design. GCC, however, supports the concept of a zero-length array, which is especially useful in structure definitions. The concept is similar to the flexible array member in ISO C99 but uses a different syntax.

The following example declares an array with no members at the end of a structure (see ./linux/drivers/ieee1394/raw1394-private.h). This allows the array member to refer to the memory that immediately follows the structure instance. The feature is useful when you need a variable number of array elements.

struct iso_block_store {
        atomic_t refcount;
        size_t data_size;
        quadlet_t data[0];
};
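
The declaration alone does not show how the trailing memory comes into being. A minimal user-space sketch, assuming a hypothetical struct message and helper message_new, is to allocate the structure and its payload in a single block:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical message type for this sketch. */
struct message {
	size_t len;
	char   data[0];	/* GNU zero-length array: adds no size */
};

/* Allocates header and payload together; returns NULL on failure.
 * The caller releases the whole object with a single free(). */
static struct message *message_new(const char *text)
{
	size_t len = strlen(text) + 1;		/* include the '\0' */
	struct message *m = malloc(sizeof(*m) + len);

	if (m) {
		m->len = len;
		memcpy(m->data, text, len);	/* payload follows header */
	}
	return m;
}
```

Because data contributes nothing to sizeof(struct message), the allocation size is simply the header plus the payload length.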

Determining the call address

In many cases it is useful or necessary to determine the caller of a given function. GCC provides the built-in function __builtin_return_address for this purpose. It is commonly used for debugging, but it has many other uses in the kernel.

As shown in the following code, __builtin_return_address takes an argument called level. This argument defines the level of the call stack for which you want the return address. For example, a level of 0 requests the return address of the current function. A level of 1 requests the return address of the calling function, and so on.

void *__builtin_return_address(unsigned int level);

In the following example (see ./linux/kernel/softirq.c), the local_bh_disable function disables soft interrupts on the local processor, which prevents softirqs, tasklets, and bottom halves from running on the current processor. The return address is captured with __builtin_return_address so that it can be used later for tracing.

void local_bh_disable (void)
{
        __local_bh_disable((unsigned long)__builtin_return_address(0));
}
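
Outside the kernel, the same idea can be sketched as follows. The names last_caller and traced_operation are hypothetical, and the noinline attribute is an assumption of this sketch: it keeps the function as a real call so that there is a return address to record.

```c
#include <stddef.h>

/* Address of the most recent call site; hypothetical name. */
static void *last_caller = NULL;

/* noinline keeps this a genuine call, so level 0 refers to the
 * instruction in the caller that follows the call. */
__attribute__((noinline))
static void traced_operation(void)
{
	last_caller = __builtin_return_address(0);
}
```

The recorded address can be printed with the %p format or resolved to a symbol by a debugger. Levels above 0 are less dependable, because they rely on the frames further up the stack being walkable.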

Constant detection

GCC provides a built-in function that you can use to determine at compile time whether a value is a constant. This information is valuable because it lets you construct expressions that the compiler can optimize through constant folding. The __builtin_constant_p function is used to test for constants.

The prototype of __builtin_constant_p is shown below. Note that __builtin_constant_p cannot detect all constants, because GCC cannot easily prove that certain values are constant.

int __builtin_constant_p(exp)

Linux uses constant detection quite frequently. In the example shown in Listing 3 (see ./linux/include/linux/log2.h), constant detection is used to optimize the roundup_pow_of_two macro. If the expression is found to be a constant, the macro uses a constant expression that the compiler can fold. If the expression is not a constant, the macro calls another function to round the value up to a power of 2.
Listing 3. Using constant detection to optimize macro functions

				
#define roundup_pow_of_two(n)			\
(						\
	__builtin_constant_p(n) ? (		\
		(n == 1) ? 1 :			\
		(1UL << (ilog2((n) - 1) + 1))	\
				   ) :		\
	__roundup_pow_of_two(n)			\
)
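
The dispatch pattern in Listing 3 can be reduced to a smaller sketch. Here IS_POW2 and is_pow2_runtime are hypothetical names: the macro uses an inline expression that the compiler can fold when the argument is a compile-time constant, and falls back to a function call otherwise.

```c
/* Runtime fallback; the name is made up for this sketch. */
static int is_pow2_runtime(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Constant arguments take the foldable branch; everything else is
 * passed to the function. Note that the constant branch names n
 * more than once, which is safe only because a constant has no
 * side effects. */
#define IS_POW2(n)						\
	(__builtin_constant_p(n)				\
		? ((n) != 0 && (((n) & ((n) - 1)) == 0))	\
		: is_pow2_runtime(n))
```

Which branch a non-literal argument takes can depend on the optimization level, since the compiler may or may not be able to prove that the value is constant. Both branches compute the same result, so the choice affects only the generated code.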

Function attributes

GCC provides a number of function-level attributes that let you give the compiler more information to assist its optimization. This section describes some of the attributes associated with functionality. The following section describes the attributes that affect optimization.

As shown in Listing 4, the attributes are aliased through other symbol definitions. You can use this listing as a guide when reading source code to understand how the attributes are used (see ./linux/include/linux/compiler-gcc3.h).
Listing 4. Function attribute definitions

				
# define __inline__		__inline__ __attribute__((always_inline))
# define __deprecated		__attribute__((deprecated))
# define __attribute_used__	__attribute__((__used__))
# define __attribute_const__	__attribute__((__const__))
# define __must_check		__attribute__((warn_unused_result))

The definitions shown in Listing 4 are some of the function attributes available in GCC. They are also among the most useful function attributes in the Linux kernel. The following explains how they are used:

  • always_inline tells GCC to inline the specified function regardless of whether optimization is enabled.
  • deprecated marks a function as deprecated, meaning that it should no longer be used. If you try to use a deprecated function, you receive a warning. This attribute can also be applied to types and variables to encourage developers to move away from them.
  • __used__ tells the compiler that the function is used regardless of whether GCC finds any calls to it. This is useful when C functions are called from assembly code.
  • __const__ tells the compiler that the function is stateless (that is, it uses only the arguments passed to it to generate the result it returns).
  • warn_unused_result makes the compiler warn when a caller ignores the function's return value. This helps ensure that callers check the result so that errors can be handled appropriately.

The following are examples of these attributes in use in the Linux kernel. The deprecated example comes from the architecture-independent kernel (./linux/kernel/resource.c), and the const example comes from the IA64 kernel source (./linux/arch/ia64/kernel/unwind.c).

int __deprecated __check_region(struct resource
    *parent, unsigned long start, unsigned long n)

static enum unw_register_index __attribute_const__
    decode_abreg(unsigned char abreg, int memory)
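
To make the attributes more concrete, here is a small sketch that applies three of them in user-space code. The macro aliases my_const, my_must_check, and my_deprecated, and the functions themselves, are hypothetical and only mirror the kernel's naming style.

```c
/* Hypothetical aliases in the style of Listing 4. */
#define my_const	__attribute__((const))
#define my_must_check	__attribute__((warn_unused_result))
#define my_deprecated	__attribute__((deprecated))

/* The result depends only on the argument, so the compiler may
 * merge repeated calls that pass the same value. */
static my_const int square(int x)
{
	return x * x;
}

/* Returns 0 on success and -1 on failure; a caller that ignores
 * the return value gets a compiler warning. */
static my_must_check int parse_digit(char c, int *out)
{
	if (c < '0' || c > '9')
		return -1;
	*out = c - '0';
	return 0;
}

/* Still callable, but every call site produces a deprecation
 * warning that steers developers toward square(). */
my_deprecated int old_square(int x)
{
	return square(x);
}
```

None of these attributes changes what the functions compute. They change what the compiler is allowed to assume and which diagnostics it issues.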


Optimizing extensions

Now, let's discuss some of the GCC features that help generate better machine code.

Branch prediction hints

One of the most widely used optimization techniques in the Linux kernel is __builtin_expect. When working with conditional code, developers often know which branch is most likely to execute and which is rarely taken. If the compiler has this prediction information, it can generate the best code around the most likely branch.

As shown below, __builtin_expect is used through two macros, likely and unlikely (see ./linux/include/linux/compiler.h).

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

With __builtin_expect, the compiler can make instruction selection decisions that follow the prediction information provided. This keeps the code that is most likely to execute close to the condition. It can also improve caching and instruction pipelining.

For example, if a condition is marked "likely," the compiler can place the true part of the code immediately after the test, so that no branch has to be taken to reach it. Reaching the false part of the conditional then requires a taken branch, which is not optimal, but that path is expected to run rarely. In this way, the code is optimized for the most likely case.

Listing 5 shows a function that uses the likely and unlikely macros (see ./linux/net/core/datagram.c). The function predicts that the sum variable will be 0 (the packet's checksum is valid) and that the ip_summed variable will not equal CHECKSUM_HW.
Listing 5. Example use of the likely and unlikely macros

				
unsigned int __skb_checksum_complete(struct sk_buff *skb)
{
        unsigned int sum;

        sum = (u16)csum_fold(skb_checksum(skb, 0, skb->len, skb->csum));
        if (likely(!sum)) {
                if (unlikely(skb->ip_summed == CHECKSUM_HW))
                        netdev_rx_csum_fault(skb->dev);
                skb->ip_summed = CHECKSUM_UNNECESSARY;
        }
        return sum;
}
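
The macros are easy to reuse outside the kernel. The following sketch defines them locally and marks an error check as the unlikely path; sum_bytes is a hypothetical function written only for this illustration.

```c
#include <stddef.h>

/* Local copies of the two macros, for user-space use. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Sums a buffer of bytes; returns -1 if no buffer is supplied. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
	long total = 0;
	size_t i;

	if (unlikely(buf == NULL))	/* rare error path */
		return -1;

	for (i = 0; i < len; i++)
		total += buf[i];
	return total;
}
```

The !! in the macros normalizes any nonzero value to 1, so the hint compares correctly against the expected value. The hints affect only code layout; the result of the condition is unchanged.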

Prefetching

Another important way to improve performance is to cache the necessary data close to the processor. Caching can significantly reduce the time it takes to access data. Most modern processors have three classes of memory:

  • Level 1 cache, which commonly supports single-cycle access
  • Level 2 cache, which supports two-cycle access
  • System memory, which has longer access times

To minimize access latency and thereby improve performance, it is best to have the data in the closest memory. Performing this task manually is called prefetching. GCC supports manual prefetching of data through the built-in function __builtin_prefetch. Use this function to pull data into the cache shortly before it is needed. As shown below, the __builtin_prefetch function takes three arguments:

  • The address of the data
  • The rw parameter, which indicates whether the data is being prefetched for a read or for a write
  • The locality parameter, which specifies whether the data should be left in the cache or purged after use

void __builtin_prefetch(const void *addr, int rw, int locality);

Prefetching is used extensively in the Linux kernel, most often through macros and wrapper functions. Listing 6 is an example of a helper function that uses a wrapper around the built-in function (see ./linux/include/linux/prefetch.h). The function implements a look-ahead mechanism for streamed operations. Using it can generally improve performance by reducing cache misses and stalls.
Listing 6. Wrapper function for range prefetching

#ifndef ARCH_HAS_PREFETCH
#define prefetch(x) __builtin_prefetch(x)
#endif

static inline void prefetch_range(void *addr, size_t len)
{
#ifdef ARCH_HAS_PREFETCH
	char *cp;
	char *end = addr + len;

	for (cp = addr; cp < end; cp += PREFETCH_STRIDE)
		prefetch(cp);
#endif
}
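
As a rough user-space counterpart, the sketch below prefetches a fixed distance ahead while walking an array. PREFETCH_AHEAD and sum_with_prefetch are hypothetical, and the distance of 16 elements is an arbitrary assumption; a useful value depends on the hardware and should be found by measurement.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements. */
#define PREFETCH_AHEAD 16

static long sum_with_prefetch(const int *values, size_t count)
{
	long total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		/* rw = 0 requests a prefetch for reading; locality = 1
		 * marks the data as having low temporal locality. */
		if (i + PREFETCH_AHEAD < count)
			__builtin_prefetch(&values[i + PREFETCH_AHEAD], 0, 1);
		total += values[i];
	}
	return total;
}
```

A prefetch is only a hint, so the function returns the same result with or without it. For a simple sequential scan like this one, the hardware prefetcher often does the job already, so a measurable gain is more likely with irregular access patterns.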
