Managing processor affinity


http://www.ibm.com/developerworks/cn/linux/l-affinity.html


Simply put, CPU affinity is the tendency of a process to run on a given CPU for as long as possible without being migrated to another processor. The Linux kernel process scheduler inherently enforces what is known as soft CPU affinity, which means that processes generally do not migrate frequently between processors. This is the behavior we want, because infrequent migration means less overhead.

The Linux 2.6 kernel also includes a mechanism that lets developers enforce hard CPU affinity programmatically. This means that an application can explicitly specify which processor (or set of processors) a process may run on.

What is Linux kernel hard affinity?

In the Linux kernel, every process has an associated data structure called task_struct. This structure is important for many reasons; the field most relevant to affinity is the cpus_allowed bitmask. The bitmask consists of n bits, one for each of the n logical processors in the system. A system with 4 physical CPUs has 4 bits. If those CPUs have Hyper-Threading enabled, the system has an 8-bit bitmask.

If a given bit is set for a given process, the process can run on the associated CPU. Therefore, if a process may run on any CPU and migrate between processors as needed, every bit in the bitmask is 1. In fact, this is the default state for processes in Linux.

The Linux kernel API provides calls that let users modify the bitmask or view the current bitmask: sched_setaffinity() (modifies the bitmask) and sched_getaffinity() (views the current bitmask).

Note that the affinity mask is inherited by child threads, so place the call to sched_setaffinity() accordingly.


Why use hard affinity?

Typically, the Linux kernel does a good job of scheduling processes, running them where they should run (that is, on available processors, with good overall performance). The kernel includes algorithms that detect workloads skewed across CPUs and migrates processes to relieve the pressure on busy processors.

In general, your applications should simply use the default scheduler behavior. However, you may want to change that default behavior to optimize performance. Let's look at three reasons for using hard affinity.

Reason 1. You have a lot of computation to do

Compute-heavy workloads are most common in scientific and academic computing, but they also occur in general-purpose computing. A common sign is that your application spends a large amount of compute time on a multiprocessor machine.

Reason 2. You are testing a complex application

Testing complex software is another reason to be interested in the kernel's affinity support. Consider an application that needs to be tested for linear scalability. Some products claim to perform better as more hardware is added.

Instead of buying multiple machines (one for each processor configuration), we can buy a single multiprocessor machine, incrementally increase the number of processors allocated to the application, measure the transactions per second, and plot the resulting scalability.

If the application scales linearly as CPUs are added, then the number of transactions per second and the number of CPUs should be linearly related (for example, a straight diagonal line when plotted; see the next section). Modeling behavior this way shows whether an application uses the underlying hardware effectively.

Amdahl's law

Amdahl's law describes the speedup gained by solving a problem on parallel processors relative to solving it on a single serial processor. The speedup equals the time of serial execution (using only one processor) divided by the time of parallel execution (using multiple processors):

      T(1)
S = ------
      T(j)

where T(j) is the time it takes to execute the program using j processors.

Amdahl's law says that this speedup may not be achieved in practice, although you can get very close to it. In most cases we can assume that every program has some serial component. As the problem set grows, the serial component eventually puts an upper limit on how far the solution time can be reduced.

Amdahl's law is especially relevant when you want to keep CPU cache hit rates high. If a process migrates to another processor, it loses the benefit of the data already in the CPU cache. In fact, when the CPU a process is running on needs to cache a particular piece of data for itself, all other CPUs invalidate any entry for that data in their own caches.

Therefore, if multiple threads need the same data, it makes sense to bind them to one particular CPU so that they all access the same cached data (or at least improve the cache hit ratio). Otherwise, the threads may run on different CPUs and constantly invalidate each other's cache entries.

Reason 3. You are running time-sensitive, deterministic processes

The final reason to be interested in CPU affinity is real-time (time-sensitive) processes. For example, you might use hard affinity to dedicate one processor on an 8-way host to such a process while letting the other 7 processors handle all normal system scheduling. This approach ensures that a long-running, time-sensitive application can run, while other applications share the remaining computing resources.

The sample application below shows how this works.


How to use hard affinity

Now let's design a program that keeps a Linux system very busy. You can build it using the system calls described earlier, together with another API that reports how many processors the system has. Our goal is to write a program that keeps every processor in the system busy for a few seconds. You can download the sample program from the "Downloads" section.
Listing 1. Make the processor busy

                
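/* Headers needed by the listings in this article. */
#define _GNU_SOURCE   /* for sched_setaffinity and the CPU_* macros */
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* This method will create threads, then bind each to its own CPU.
 * TRUE and FALSE are assumed to be defined as 1 and 0 by the sample program. */
bool do_cpu_stress(int numthreads)
{
   int ret = TRUE;
   int created_thread = 0;
   /* We need a thread for each CPU we have... */
   while (created_thread < numthreads - 1)
   {
      int mypid = fork();
      if (mypid == 0) /* Child process */
      {
         printf("\tCreating Child Thread: #%i\n", created_thread);
         break;
      }
      else /* Only parent executes this */
      {
         /* Continue looping until we spawned enough threads! */
         created_thread++;
      }
   }
   /* NOTE: All threads execute code from here down! */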

As you can see, this code simply creates a set of threads through the fork call (strictly speaking, fork creates child processes; this article calls them threads). Each thread executes the code that follows in this method. Next, we have each thread set its affinity to its own CPU.
Listing 2. Set CPU affinity for each thread

                
   cpu_set_t mask;
   /* CPU_ZERO initializes all the bits in the mask to zero. */
   CPU_ZERO(&mask);
   /* CPU_SET sets only the bit corresponding to cpu. */
   CPU_SET(created_thread, &mask);
   /* sched_setaffinity returns 0 on success and -1 on error. */
   if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
   {
      printf("WARNING: Could not set CPU affinity, continuing...\n");
   }

If the program gets this far, our thread has set its affinity. The call to sched_setaffinity sets the CPU affinity mask of the process identified by pid. If pid is 0, the current process is used.

The affinity mask is represented by the bitmask stored in mask. The lowest-order bit corresponds to the first logical processor in the system, and the highest-order bit corresponds to the last logical processor.

Each set bit corresponds to a CPU on which the process may legally be scheduled, while an unset bit corresponds to a CPU on which it may not be scheduled. In other words, a process is bound to run only on the processors whose bits are set. Typically, all the bits in the mask are set. The affinity of these threads is passed on to the child processes forked from them.

Note that you should not modify the bitmask directly. Use the following macros instead. Not all of them are used in our example, but they are listed here because you may need them in your own programs.
Listing 3. Macros that indirectly modify the bitmask

                
void CPU_ZERO(cpu_set_t *set)
Initializes the CPU set set to the empty set.
void CPU_SET(int cpu, cpu_set_t *set)
Adds the CPU cpu to the CPU set set.
void CPU_CLR(int cpu, cpu_set_t *set)
Removes the CPU cpu from the CPU set set.
int CPU_ISSET(int cpu, const cpu_set_t *set)
Returns a nonzero value (true) if the CPU cpu is a member of the CPU set set; otherwise returns 0 (false).

In this article, the sample code goes on to have each thread perform some computationally expensive operation.
Listing 4. Each thread performs a compute-intensive operation

                
    /* Now we have a single thread bound to each CPU on the system. */
    /* The argument a is kept as it appears in the source; its definition is not shown. */
    int computation_res = do_cpu_expensive_op(a);
    cpu_set_t mycpuid;
    /* Read back the affinity mask the kernel holds for this thread. */
    sched_getaffinity(0, sizeof(mycpuid), &mycpuid);
    if (check_cpu_expensive_op(computation_res))
    {
      printf("SUCCESS: Thread completed, and PASSED integrity check!\n");
      ret = TRUE;
    }
    else
    {
      printf("FAILURE: Thread failed integrity check!\n");
      ret = FALSE;
    }
   return ret;
}

Now you know the basics of setting CPU affinity in the Linux 2.6 kernel. Next, we wrap these methods in a main program that takes a user-specified parameter indicating how many CPUs to keep busy. We can use another call to determine how many processors the system has:

int num_procs = sysconf(_SC_NPROCESSORS_CONF);

This lets the program decide for itself how many processors to keep busy, for example keeping all of them busy by default while letting the user specify a smaller number within the range of processors actually present in the system.


Run the sample program

When you run the sample program described above, you can use a number of tools to check whether the CPUs are busy. For a simple test, use the Linux top command. While top is running, press the "1" key to see the utilization of each individual CPU.


Conclusion

Although this sample program is very simple, it shows the basics of using hard affinity as implemented in the Linux kernel. (Any application that uses this code will undoubtedly do something more meaningful.) Once you know the basics of the kernel's CPU affinity API, you can squeeze the last bit of performance out of a complex application.
