[Windows] Threads: Atomic Access and Thread Synchronization

Source: Internet
Author: User

This series is intended to record the knowledge points of Windows threads, including thread basics, thread scheduling, thread synchronization, TLS, and thread pools.

Multi-thread synchronization

We know that a single-core processor can execute only one instruction at a time; the operating system schedules multiple tasks and threads through time slices. During this process, the operating system may interrupt a thread at any point (the granularity of interruption is a single instruction). In other words, a thread may use up its time slice at an unpredictable moment, at which point control is handed to another thread. When that thread's time slice runs out, control switches back, but by then the value of a shared global variable may have changed! This may or may not have disastrous consequences. Accordingly, at the system level, whenever a thread's time slice runs out, the system saves the current CPU registers (such as the instruction pointer and stack pointer) into the thread's kernel object to "save the scene". When the thread is next given a time slice, the system restores that saved context from the kernel object back into the CPU registers.

It should be emphasized that the moment at which a thread is interrupted is completely unpredictable. From the CPU's point of view, a true "atomic operation" is a single instruction, not a high-level language statement. A C statement such as g_x++ compiles to roughly the following assembly instructions:

 
MOV eax, [g_x]
INC eax
MOV [g_x], eax

Suppose the thread's time slice expires just after the second instruction, before the new value of g_x has been written back to memory, and control passes to another thread that also operates on g_x. The result is unpredictable.

It can be seen that thread synchronization is harder than we might think. Fortunately, Windows, the various languages, and various class libraries all provide us with multithreaded synchronization methods. This topic begins the discussion of thread synchronization under Win32.

 

Atomic Access: The Interlocked Functions

To make g_x++ an atomic operation (that is, to ensure that the increment cannot be interrupted partway through), you can use the following approach:

 
long g_x = 0;

DWORD WINAPI ThreadFunc1(PVOID pvParam)
{
    InterlockedExchangeAdd(&g_x, 1);
    return 0;
}

DWORD WINAPI ThreadFunc2(PVOID pvParam)
{
    InterlockedExchangeAdd(&g_x, 1);
    return 0;
}

In the code above, InterlockedExchangeAdd ensures that the addition is performed as an atomic operation. InterlockedExchangeAdd works correctly across different CPUs. However, we must ensure that the variable addresses passed to these Interlocked functions are properly aligned.

"Aligned" means that the address of the data, divided by the size of the data, leaves a remainder of 0. For example, the starting address of a WORD should be divisible by 2, and the address of a DWORD should be divisible by 4. CPUs of the x86 architecture handle misaligned data automatically; IA-64 processors cannot, and instead raise a fault to Windows, which can decide whether to throw an exception or to help the CPU handle the misaligned access. In short, misaligned data will not normally cause errors, but because the CPU spends at least one extra memory read on it, it hurts program performance.

InterlockedExchange atomically sets a 32-bit value and returns its previous value. It can be used to implement a spinlock:

 
// Global variable indicating whether the shared resource is in use
BOOL g_fResourceInUse = FALSE;
...
void Func1()
{
    // Wait for the shared resource to be released
    while (InterlockedExchange(&g_fResourceInUse, TRUE) == TRUE)
        Sleep(0);

    // Access the shared resource
    ...

    // Release the shared resource when it is no longer needed
    InterlockedExchange(&g_fResourceInUse, FALSE);
}

 

Each pass through the while loop sets g_fResourceInUse to TRUE and examines the value that InterlockedExchange returns. If the returned value is TRUE, the resource was already occupied, so the thread calls Sleep(0): it immediately gives up the rest of its time slice, which lets the CPU schedule other threads. If the returned value is FALSE, the resource was not in use, and this thread may now access the shared resource. Be careful with this technique, however, because a spinlock wastes CPU time.
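The same spinlock pattern can be sketched portably with C11's `atomic_exchange` in place of InterlockedExchange and `sched_yield()` in place of Sleep(0); the function names here are my own:

```c
#include <sched.h>      /* sched_yield: POSIX analogue of Sleep(0) */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool g_resource_in_use;

/* Spin until the old value was false, i.e. until this thread is
 * the one that flipped the flag from "free" to "in use". */
void resource_lock(void) {
    while (atomic_exchange(&g_resource_in_use, true))
        sched_yield();  /* give up the rest of our time slice */
}

void resource_unlock(void) {
    atomic_store(&g_resource_in_use, false);
}
```

Like the Win32 version, this is only suitable when the lock is held very briefly; otherwise the spinning threads burn CPU time.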

Cache Lines and volatile

As we all know, the CPU has a cache, and cache size is one indicator of CPU performance. Current CPUs generally have three levels of cache, and the CPU always tries the first-level cache first; on a miss it reads from the second-level cache, and finally from memory. The CPU cache is made up of many cache lines. For CPUs of the x86 architecture, a cache line is typically 32 bytes, and when the CPU needs to read a variable, the 32-byte block of memory containing that variable is read into a cache line as a whole. Therefore, for programs with strict performance requirements, taking full advantage of cache lines is very important: align frequently accessed data so that it is read into the cache together, reducing data exchange between the CPU's upper-level caches and the lower-level caches and memory.

However, for computers with multiple CPUs, the situation is different. For example:

    1. CPU 1 reads a byte; that byte and its adjacent bytes are read into CPU 1's cache.
    2. CPU 2 does the same. Now the caches of CPU 1 and CPU 2 hold the same data.
    3. CPU 1 modifies the byte. The modified byte is placed back into CPU 1's cache line, but the change is not yet written to RAM.
    4. CPU 2 accesses the byte, but because CPU 1 has not written the data back to RAM, the two copies are out of sync.

Of course, CPU designers have accounted for this. When one CPU modifies bytes in a cache line, the other CPUs in the machine are notified and their copies of that cache line are marked invalid. So in the situation above, CPU 2 finds that the data in its cache is invalid; CPU 1 immediately writes its data back to RAM, and CPU 2 then rereads the data. As this shows, cache lines can work against us on multiprocessor machines.

The background above has at least two implications for programming:

1. Some compilers optimize access to variables, for example by keeping a value in a register or serving reads from the cache rather than from memory. When a variable is shared by multiple threads, such optimization can mean that one thread's update to the variable is never seen by another thread, because the other thread runs on another CPU and keeps reading a stale copy! The volatile keyword tells the compiler to generate code that always reads the variable from memory, and to forgo such optimizations.

2. In a multi-CPU environment, set up cache alignment sensibly to minimize cache synchronization between CPUs and improve performance. To align with the cache, first find out the cache line size of the target CPU, then use __declspec(align(#)) to tell the compiler to align the variable or structure to the cache line size. For example:

 
#define CACHE_ALIGN __declspec(align(32))

// Cache-align all instances of struct S1
struct CACHE_ALIGN S1 {
    int a, b, c, d;
};
struct S1 s1;   // s1 is 32-byte cache aligned

For more information, see: http://msdn.microsoft.com/en-us/library/83ythb65.aspx

 

Specifically, one goal of cache-line alignment can be: within a structure, separate the fields that are read frequently from the fields that are written frequently, so that the read-mostly fields and the write-heavy fields land in different cache lines. This reduces the number of cache-line synchronizations (the problem commonly known as false sharing) and improves performance to some extent.

This is original work; when reposting, please credit the source: http://www.cnblogs.com/P_Chou/archive/2012/06/17/interlocked-in-thread-sync.html

 
