Transferred from: http://msdn.microsoft.com/zh-cn/magazine/cc817398.aspx
Concurrency Hazards: Solving 11 Likely Problems in Your Multithreaded Code, by Joe Duffy
This article introduces the following:
- Basic concurrency concepts
- Concurrency problems and mitigations
- Safety patterns
- Cross-cutting concepts

This article uses the following technologies: Multithreading, .NET Framework
Contents

Data Races
Forgotten Synchronization
Granularity Mistakes
Read and Write Tearing
Lock-Free Reordering
Reentrancy
Deadlock
Lock Convoys
Stampedes
Two-Step Dance
Priority Inversion
Safety Patterns
Immutability
Purity
Isolation
Concurrency is everywhere. Server-side programs have had to deal with a fundamentally concurrent programming model for a long time, and as multicore processors become commonplace, client-side programs will have to as well. As concurrency grows, so do questions of safety; that is, programs must remain just as stable and reliable in the face of large amounts of logical concurrency and ever-changing degrees of physical hardware parallelism. Correctly engineered concurrent code must follow additional rules compared with its sequential counterpart. Reads and writes of memory and access to shared resources must be controlled with synchronization so that conflicts do not arise. In addition, threads often need to be coordinated to get a job done cooperatively. The direct result of these extra requirements is the need to fundamentally ensure that threads remain consistent and keep making smooth forward progress. Synchronization and coordination depend heavily on timing, which introduces nondeterminism and makes behavior hard to predict and test. These properties feel difficult mostly because one's way of thinking has to change: there is no single API to learn and no code snippet to copy and paste. There really is a set of foundational concepts to learn and become comfortable with. Over time, some languages and libraries will probably hide some of these concepts, but if you are doing concurrency today, you do not have that luxury. This article introduces some of the more common challenges to watch for and offers advice on how to cope with them in your software. First, I will discuss a category of problems that frequently plague concurrent programs. I call them "safety hazards" because they are easy to run into and their consequences, usually crashes and memory corruption, are severe.
Data Races. A data race (or race condition) occurs when data is accessed concurrently from multiple threads. In particular, it happens when one or more threads write a piece of data while one or more threads are also reading it. The problem arises because Windows programs (in C++ and the Microsoft .NET Framework alike) are fundamentally built on the idea of shared memory: all threads in a process may access data residing in the same virtual address space. Static variables and heap allocations can be used for sharing. Consider the following canonical example:
static class Counter {
    internal static int s_curr = 0;
    internal static int GetNext() {
        return s_curr++;
    }
}
Counter is meant to hand out a new unique number to each caller of GetNext. However, if two threads in the program call GetNext at the same time, both may be given the same number. The reason is that s_curr++ compiles into three separate steps:
- Read the current value from the shared s_curr variable into a processor register.
- Increment that register.
- Write the register value back to the shared s_curr variable.
Two threads executing this sequence can both read the same value from s_curr locally (say, 42), increment it to the same result (43), and then both publish that same value. GetNext thus returns the same number to both threads, breaking the algorithm. Although the simple statement s_curr++ appears to be atomic, it is not.
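To make the hazard concrete, here is a minimal repro sketch of my own (not from the original article), assuming the unsynchronized Counter above: two threads each draw many IDs, and any duplicate proves a lost update.

using System;
using System.Collections.Concurrent;
using System.Threading;

static class RaceDemo {
    static void Main() {
        var seen = new ConcurrentDictionary<int, bool>();
        int duplicates = 0;

        ThreadStart work = delegate {
            for (int i = 0; i < 100000; i++) {
                int n = Counter.GetNext();
                if (!seen.TryAdd(n, true))          // another thread already got n
                    Interlocked.Increment(ref duplicates);
            }
        };

        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // With the racy Counter this is very likely nonzero; with the
        // synchronized versions shown below it is always zero.
        Console.WriteLine("Duplicate IDs observed: " + duplicates);
    }
}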
Forgotten Synchronization. The simplest case of a data race is when synchronization is forgotten altogether. Such races are seldom benign; that is, even when they happen to produce correct results, that correctness rests on shaky foundations. The problem is often far from obvious. For example, an object may be part of a large, complicated object graph that is reachable through a static variable, or that becomes shared when an object is passed as part of a closure while creating a new thread or queuing work to a thread pool. Pay close attention whenever an object (graph) transitions from private to shared. This is called publication and is discussed later in the context of isolation; the reverse, turning an object (graph) from shared back to private, is called privatization. The answer to this problem is to add proper synchronization. In the counter example, I can use a simple interlocked operation:
static class Counter {
    internal static int s_curr = 0;
    internal static int GetNext() {
        // Interlocked.Increment is atomic and emits a full fence,
        // so the field does not also need to be marked volatile.
        return Interlocked.Increment(ref s_curr);
    }
}
This works because the update is confined to a single memory location and because (very conveniently) there is a hardware instruction (LOCK INC) that performs atomically exactly the operation I am trying to make atomic in software. Alternatively, I can use a full-fledged lock:
static class Counter {
    internal static int s_curr = 0;
    private static object s_currLock = new object();
    internal static int GetNext() {
        lock (s_currLock) {
            return s_curr++;
        }
    }
}
The lock statement ensures mutual exclusion among all threads trying to access GetNext, using the CLR's System.Threading.Monitor class. A C++ program would use a CRITICAL_SECTION for the same purpose. A lock isn't strictly required for this particular example, but it becomes necessary as soon as multiple operations must be combined, since there is seldom a single interlocked operation that covers them all.
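When a compound update can still be phrased as "compute a new value from the old one," a compare-and-swap loop is one lock-free middle ground. The sketch below is my own illustration, not code from the article; GetNextEven is a hypothetical variant that hands out even numbers.

using System.Threading;

static class EvenCounter {
    private static int s_curr = 0;

    internal static int GetNextEven() {
        while (true) {
            int observed = s_curr;      // read the current value
            int next = observed + 2;    // compute the desired new value
            // Publish only if s_curr is still 'observed'; otherwise retry.
            if (Interlocked.CompareExchange(ref s_curr, next, observed) == observed)
                return next;
        }
    }
}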
Granularity Mistakes. Even when correct synchronization is used to access shared state, the resulting behavior can still be incorrect. The granularity must be large enough to encapsulate the entire operation that needs to be atomic within the region. There is a tension between correctness and making regions small, because smaller regions reduce how long other threads wait on the synchronization to enter. For example, consider the bank account abstraction shown in Figure 1. All is well: the object's two methods, Deposit and Withdraw, appear free of concurrency bugs. Some banking application might use them, never worrying that the balance will be corrupted by concurrent access.
Figure 1 Bank Account
class BankAccount {
    private decimal m_balance = 0.0M;
    private object m_balanceLock = new object();
    internal void Deposit(decimal delta) {
        lock (m_balanceLock) {
            m_balance += delta;
        }
    }
    internal void Withdraw(decimal delta) {
        lock (m_balanceLock) {
            if (m_balance < delta)
                throw new Exception("Insufficient funds");
            m_balance -= delta;
        }
    }
}
But what if you want to add a Transfer method? A naive (and incorrect) line of thinking holds that since Deposit and Withdraw are each safe in isolation, they can simply be combined:
class BankAccount {
    internal static void Transfer(
        BankAccount a, BankAccount b, decimal delta) {
        a.Withdraw(delta);
        b.Deposit(delta);
    }
    // As before
}
This is incorrect. There is a window of time between the Withdraw and Deposit calls during which the money is missing from both accounts entirely. The right approach is to acquire the locks on both a and b up front and then make the method calls:
class BankAccount {
    internal static void Transfer(
        BankAccount a, BankAccount b, decimal delta) {
        lock (a.m_balanceLock) {
            lock (b.m_balanceLock) {
                a.Withdraw(delta);
                b.Deposit(delta);
            }
        }
    }
    // As before
}
As it turns out, this approach solves the granularity problem but is prone to deadlock. You'll see how to fix that shortly.
Read and Write Tearing. As mentioned earlier, benign races let you access variables without synchronization. For variables that are naturally sized and aligned, meaning no larger than a pointer, 32 bits (4 bytes) on a 32-bit processor and 64 bits (8 bytes) on a 64-bit processor, reads and writes are atomic. If a thread only reads a single variable that other threads write, and no more complicated invariants are involved, you can sometimes skip synchronization based on this guarantee. But beware: if you try this on a memory location that is misaligned or not naturally sized, read or write tearing can occur. Tearing happens because reading or writing such a location actually involves more than one physical memory operation. Concurrent updates can interleave between them, and the result can combine part of the old value with part of the new one. For example, suppose ThreadA sits in a loop writing alternately 0x0L and 0xaaaabbbbccccddddL to a 64-bit variable s_x, while ThreadB reads it in a loop (see Figure 2).
Figure 2 Tearing
internal static long s_x; // C# does not permit volatile on 64-bit fields

void ThreadA() {
    int i = 0;
    while (true) {
        s_x = (i & 1) == 0 ? 0x0L : unchecked((long)0xaaaabbbbccccddddL);
        i++;
    }
}

void ThreadB() {
    while (true) {
        long x = s_x;
        Debug.Assert(x == 0x0L || x == unchecked((long)0xaaaabbbbccccddddL));
    }
}
You may be surprised to learn that ThreadB's assert can fire. The reason is that ThreadA's write consists of two parts, the high 32 bits and the low 32 bits, in an order that depends on the compiler. The same is true of ThreadB's read. So ThreadB can witness the value 0xaaaabbbb00000000L or 0x00000000ccccddddL.
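If the variable must stay lock-free on a 32-bit platform, one fix I can sketch (my illustration, not from the article) is to route every access through Interlocked, whose 64-bit overloads are atomic even for 8-byte operands:

using System.Diagnostics;
using System.Threading;

static class NoTearing {
    private static long s_x;

    static void ThreadA() {
        int i = 0;
        while (true) {
            long v = (i & 1) == 0 ? 0x0L : unchecked((long)0xaaaabbbbccccddddL);
            Interlocked.Exchange(ref s_x, v);   // atomic 64-bit write
            i++;
        }
    }

    static void ThreadB() {
        while (true) {
            long x = Interlocked.Read(ref s_x); // atomic 64-bit read
            Debug.Assert(x == 0x0L || x == unchecked((long)0xaaaabbbbccccddddL));
        }
    }
}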
Lock-Free Reordering. Sometimes it is tempting to write lock-free code as a way to achieve better scalability and reliability. Doing so requires a deep understanding of the memory model of your target platform (for details, see the article "Memory Models: Understand the Impact of Low-Lock Techniques in Multithreaded Apps" at msdn.microsoft.com/magazine/cc163715). Failing to understand and heed these rules can lead to memory reordering bugs. These happen because compilers and processors are free to reorder memory operations while processing or optimizing your code. As an example, assume s_x and s_y are both initialized to 0, as follows:
internal static volatile int s_x = 0;
internal static volatile int s_xa = 0;
internal static volatile int s_y = 0;
internal static volatile int s_ya = 0;

void ThreadA() {
    s_x = 1;
    s_ya = s_y;
}

void ThreadB() {
    s_y = 1;
    s_xa = s_x;
}
Is it possible for both s_ya and s_xa to contain the value 0 after ThreadA and ThreadB have both finished? It seems absurd: either s_x = 1 or s_y = 1 must execute first, in which case the other thread should witness that update when it gets around to its own. In theory, at least. Unfortunately, processors are free to reorder this code so that the loads effectively happen before the writes. You can avoid the problem with an explicit memory barrier:
void ThreadA() {
    s_x = 1;
    Thread.MemoryBarrier();
    s_ya = s_y;
}
The .NET Framework offers this particular API for the purpose, and C++ offers _MemoryBarrier and similar macros. But the lesson of this example is not that you should sprinkle memory barriers everywhere. The lesson is to steer clear of lock-free code until you thoroughly understand the memory model, and even then to proceed with great care.
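One subtlety worth spelling out (my note, using the same variables as above): the fence is needed between the store and the load in each thread. Fencing only ThreadA still allows ThreadB's store and load to reorder, so the both-zero outcome remains possible:

void ThreadA() {
    s_x = 1;
    Thread.MemoryBarrier(); // order the store before the following load
    s_ya = s_y;
}

void ThreadB() {
    s_y = 1;
    Thread.MemoryBarrier(); // ThreadB needs its own fence, too
    s_xa = s_x;
}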
Reentrancy. In Windows (both Win32 and the .NET Framework), most locks support recursion. This simply means that if the current thread already holds a lock, its attempt to acquire the lock again will succeed. That makes it easier to compose larger atomic operations out of smaller ones. Indeed, the BankAccount example above relies on recursive acquisition: Transfer calls both Withdraw and Deposit, and each of them recursively acquires locks that Transfer already holds. However, recursion that happens when you did not actually intend it can be the root of real trouble. It can come from reentrancy, which arises either from explicit calls to dynamic code (such as virtual methods and delegates) or from implicitly reentrant code (such as STA message pumping and asynchronous procedure calls). It is therefore best not to call dynamically dispatched methods from within a locked region. For example, imagine a method that temporarily breaks an invariant and then calls a delegate:
class C {
    private int m_x = 0;
    private object m_xLock = new object();
    private Action m_action = ...;
    internal void M() {
        lock (m_xLock) {
            m_x++;
            try {
                m_action();
            }
            finally {
                Debug.Assert(m_x == 1);
                m_x--;
            }
        }
    }
}
C's method M maintains the invariant that m_x does not change, though for a brief window m_x is incremented and then decremented back. The m_action call looks innocent enough. Unfortunately, if the delegate was accepted from users of class C, it represents arbitrary code that can do whatever it pleases, and that includes calling back into the same instance's M method. If that happens, the assert in the finally block can fire: multiple activations of M can be live on the same stack (even though you never did this directly), which inevitably leads to m_x having a value greater than 1.
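A common mitigation, sketched here as my own variation on class C rather than the article's code, is to capture what you need under the lock, restore the invariant, and only then invoke the delegate with no locks held:

internal void M() {
    Action action;
    lock (m_xLock) {
        m_x++;
        action = m_action;      // capture the delegate under the lock
        Debug.Assert(m_x == 1);
        m_x--;                  // invariant restored before any callback
    }
    action();                   // reentrant calls to M are now harmless
}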
Deadlock. When a deadlock occurs, a program simply stops responding. Several MSDN Magazine articles have described the causes of deadlock and ways to make it tolerable, including my own article "No More Hangs: Advanced Techniques to Avoid and Detect Deadlocks in .NET Apps" (msdn.microsoft.com/magazine/cc163618) and Stephen Toub's October 2007 .NET Matters column (msdn.microsoft.com/magazine/cc163352). In short, whenever a circular wait chain exists, for example ThreadA waiting on a resource held by ThreadB while ThreadB waits (perhaps indirectly through a third ThreadC or some other resource) on a resource held by ThreadA, all forward progress can come to a halt.
A common source of this problem is mutually exclusive locks. In fact, the BankAccount example shown earlier suffers from it. If ThreadA tries to transfer $500 from account #1234 to account #5678 at the same time ThreadB tries to transfer $500 from #5678 to #1234, the code can deadlock. Using a consistent acquisition order avoids the deadlock, as shown in Figure 3. This logic can be generalized under names such as simultaneous lock acquisition: multiple lockable objects are given a dynamic ordering, and whenever two locks are acquired together, they are acquired in that same order. Another discipline, sometimes called lock leveling, can be used to reject lock acquisitions detected to occur in an inconsistent order.
Figure 3 Consistent Order
class BankAccount {
    private int m_id; // Unique bank account ID.
    internal static void Transfer(
        BankAccount a, BankAccount b, decimal delta) {
        if (a.m_id < b.m_id) {
            Monitor.Enter(a.m_balanceLock); // A first
            Monitor.Enter(b.m_balanceLock); // ...and then B
        }
        else {
            Monitor.Enter(b.m_balanceLock); // B first
            Monitor.Enter(a.m_balanceLock); // ...and then A
        }
        try {
            a.Withdraw(delta);
            b.Deposit(delta);
        }
        finally {
            Monitor.Exit(a.m_balanceLock);
            Monitor.Exit(b.m_balanceLock);
        }
    }
    // As before ...
}
Locks are not the only root cause of deadlock, though. Missed wake-ups are another phenomenon, where an event is missed and a thread sleeps forever. This commonly happens with Win32 auto-reset and manual-reset events, CONDITION_VARIABLEs, and CLR Monitor.Wait, Pulse, and PulseAll calls. A missed wake-up is usually a sign that synchronization is incorrect, that a wait condition is not being re-tested, or that a wake-single primitive (WakeConditionVariable or Monitor.Pulse) was used when a wake-all primitive (WakeAllConditionVariable or Monitor.PulseAll) would have been more appropriate. Another common cause is lost signals with auto-reset and manual-reset events. Because such an event can be in only one state, signaled or nonsignaled, redundant calls to set it are effectively ignored. If code assumes that two calls to set always translate into two awakened threads, a wake-up can be lost.
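The standard defense against missed wake-ups, sketched below as my own illustration, is to record the state change somewhere durable (here, the queue itself) and to re-test the wait condition in a loop; a waiter that wakes late or spuriously simply goes back to sleep:

using System.Collections.Generic;
using System.Threading;

class BlockingQueue {
    private readonly object m_lock = new object();
    private readonly Queue<string> m_items = new Queue<string>();

    public void Enqueue(string item) {
        lock (m_lock) {
            m_items.Enqueue(item);
            Monitor.Pulse(m_lock);        // wake one waiter; the item persists either way
        }
    }

    public string Dequeue() {
        lock (m_lock) {
            while (m_items.Count == 0)    // 'while', never 'if': always re-test
                Monitor.Wait(m_lock);
            return m_items.Dequeue();
        }
    }
}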
Lock Convoys. When the arrival rate at a lock is high compared with the rate at which it is acquired and released, a lock convoy can build up. In the extreme, more threads are waiting on a lock than can be serviced, leading to catastrophe. Server-side programs often hit this when demand rises for certain data structures that a lock protects. For example, consider this scenario: on average, 8 requests arrive every 100 milliseconds, and we use eight threads to service them (because we are on an 8-CPU machine). Each of those eight threads must acquire a lock and hold it for 20 milliseconds before it can get down to real work. Unfortunately, access to the lock is serialized, so it takes 160 milliseconds for all eight threads to enter and leave the lock; after the first exits, a ninth arrival must wait 140 milliseconds for its turn. This scheme inherently cannot scale, so the backlog of requests keeps growing. Over time, if the arrival rate does not fall, client requests begin to time out, and catastrophe strikes. Locks are known to convoy when fairness is used. The reason is that the lock is effectively held during periods when it is actually available: an arriving thread must wait until the chosen owner thread can wake up, context switch in, and then acquire and release the lock. To combat this, Windows has over time changed all of its internal locks to be unfair, and CLR monitors are unfair too. The only real solution to the fundamental convoy problem is to hold locks for less time and to decompose the system so that hot locks are rare, if they exist at all. That is easier said than done, but it is crucial for scalability.
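A sketch of the "hold the lock for less time" advice, with hypothetical Parse and Log helpers of my own invention; only the shared-state update remains inside the critical region:

using System.Collections.Generic;

class RequestCounter {
    private readonly object m_lock = new object();
    private readonly Dictionary<string, int> m_hits = new Dictionary<string, int>();

    // Hypothetical stand-ins for the expensive per-request work.
    private static string Parse(string raw) { return raw.Trim(); }
    private static void Log(string key) { /* write to a log; no lock needed */ }

    public void HandleRequest(string raw) {
        string key = Parse(raw);       // slow work done before taking the lock
        lock (m_lock) {                // lock held only for the tiny update
            int count;
            m_hits.TryGetValue(key, out count);
            m_hits[key] = count + 1;
        }
        Log(key);                      // slow work after releasing the lock
    }
}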
"Flocking" means that a large number of threads are awakened so that they all compete for attention from the Windows Thread scheduler at the same time. For example, if there are 100 blocked threads in a single manual setting event, and you set this event... Well, you may make a mess, especially when most of the threads have to wait again. One way to implement blocking queues is to manually set events. When the queue is empty, it becomes no signal, and when the queue is not empty, it becomes a signal. Unfortunately, if there are a large number of waiting threads in the transition from zero element to one element, it may flock. This is because only one thread will get this single element. This process will leave the queue empty and therefore the event must be reset. If there are 100 threads waiting, 99 of them will be awakened and the context will be switched (leading to the loss of all cache), all of which will only have to wait again.
Two-Step Dance. Sometimes you need to signal an event while holding a lock. That can be unfortunate if the thread being awakened needs to acquire the very lock being held, because after waking it will discover only that it must wait again. Waking it is a waste of resources and drives up the total number of context switches. This situation is called the two-step dance, and it can extend well beyond two steps if many locks and events are involved. Both Win32 and CLR condition variables intrinsically suffer from the two-step dance, and it is often unavoidable or simply too difficult to work around. The two-step dance is worse on a single-processor machine. When events are involved, the kernel applies a priority boost to the awakened thread, which virtually guarantees the thread preempts the setter before the setter can release the lock. This is the two-step dance at its extreme: the setting ThreadA is context switched out so the awakened ThreadB can attempt to acquire the lock; of course it cannot, so it context switches away so ThreadA can run again; eventually ThreadA releases the lock, which again boosts ThreadB so that it preempts ThreadA and runs. As you can see, several useless context switches are involved.
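Where the design permits, the dance can be avoided by updating state under the lock but signaling after releasing it; a sketch of mine, not the article's code:

using System.Threading;

class Handoff {
    private readonly object m_lock = new object();
    private readonly AutoResetEvent m_ready = new AutoResetEvent(false);
    private int m_value;

    public void Produce(int value) {
        lock (m_lock) {
            m_value = value;   // mutate shared state while holding the lock
        }
        m_ready.Set();         // signal with no locks held: the woken thread
                               // can take m_lock immediately instead of blocking
    }

    public int Consume() {
        m_ready.WaitOne();
        lock (m_lock) { return m_value; }
    }
}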
Priority Inversion. Modifying thread priorities is often asking for trouble. When many threads of different priorities share access to the same locks and resources, priority inversion can occur, in which a lower-priority thread actually holds up the progress of a higher-priority thread indefinitely. The moral here is simply to avoid changing thread priorities wherever possible. Here is an extreme example of priority inversion. Say low-priority ThreadA acquires some lock L. High-priority ThreadB then comes along and tries to acquire L, but cannot because ThreadA holds it. This is the "inversion": ThreadA has effectively (and temporarily) been given a priority higher than ThreadB's, because it holds a lock ThreadB needs. The situation resolves itself when ThreadA releases the lock. Unfortunately, imagine what happens if medium-priority ThreadC shows up. Although ThreadC does not need lock L, its mere presence can prevent ThreadA from running at all, which indirectly prevents high-priority ThreadB from ever running. Eventually the Windows Balance Set Manager thread notices the situation: even if ThreadC remains runnable forever, ThreadA will eventually (after about four seconds) receive a temporary priority boost from the operating system. With luck, that is enough for it to finish up and release the lock. But the latency here (four seconds!) is enormous, and if any user interface is involved, the application's users will certainly notice the problem.
Safety Patterns. Now that I have trotted out problem after problem, here is the good news: there are several design patterns you can follow to reduce how often the problems above (particularly the correctness hazards) occur. The key to most of these problems is that state is shared among multiple threads. Worse, that state can be manipulated at will, moving from a consistent state to an inconsistent one and then (hopefully) back again, with alarming regularity. All of this is useful when developers write code for single-threaded programs: on the way to the correct end goal, shared memory gets used as a kind of scratch space. Imperative, C-style languages have worked this way for many years. But with concurrency on the rise, you need to keep a careful eye on these habits. You can follow the lead of functional programming languages such as Haskell, LISP, Scheme, ML, and even F# (a new .NET-compliant language), and employ immutability, purity, and isolation as first-class design concepts.
Immutability. An immutable data structure is one that does not change after construction. This is a wonderful property for concurrent programs: if the data is not changing, there is no risk of conflict even when many threads access it at once, and synchronization is simply not a concern. C++ supports immutability via const, and C# via the readonly modifier. For example, a .NET type with only read-only fields is shallowly immutable. Going further, if each of those fields itself points only to types whose fields are all read-only (and that point only to deeply immutable types), the type is deeply immutable. The result is an entire object graph that is guaranteed never to change, which is enormously useful. F# creates immutable types by default unless the mutable modifier is used. All of this describes immutability as a static property. Objects can also be immutable by convention, meaning the state is guaranteed not to change during certain periods of time; this is a dynamic property. The Windows Presentation Foundation (WPF) freezable feature works exactly this way, and it also permits parallel access without synchronization (though it cannot be checked in the way statically supported immutability can). Dynamic immutability is often useful for objects that transition between mutable and immutable over their lifetime. Immutability has some downsides, too. Whenever something must change, a copy of the original object has to be made with the change applied along the way, and cycles in the object graph are generally impossible (aside from dynamic immutability). For example, say you have an ImmutableStack<T>, as shown in Figure 4. Rather than mutating Push and Pop methods, they return a new ImmutableStack<T> object containing the applied change. In some cases it is possible to be clever (as with the stack) and share memory among instances.
Figure 4 Using an ImmutableStack
public class ImmutableStack<T> {
    private readonly T m_value;
    private readonly ImmutableStack<T> m_next;
    private readonly bool m_empty;

    public ImmutableStack() {
        m_empty = true;
    }

    internal ImmutableStack(T value, ImmutableStack<T> next) {
        m_value = value;
        m_next = next;
        m_empty = false;
    }

    public ImmutableStack<T> Push(T value) {
        return new ImmutableStack<T>(value, this);
    }

    public ImmutableStack<T> Pop(out T value) {
        if (m_empty) throw new Exception("Empty.");
        value = m_value;
        return m_next;
    }
}
A new object must be allocated for each node that is pushed, just as in the standard linked-list implementation of a stack. But when you pop an element off the stack, the existing objects can be reused, because each node in the stack is itself immutable. Immutable types show up everywhere. The CLR's System.String class is immutable, and there is a design guideline that all new value types should be immutable. The advice here is to use immutability and resist the temptation to mutate where feasible and appropriate, and the latest generation of languages makes doing so very convenient.
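To see the memory sharing at work, here is a short usage sketch of mine against the Figure 4 type:

var s0 = new ImmutableStack<int>();  // empty
var s1 = s0.Push(1);                 // allocates a node for 1
var s2 = s1.Push(2);                 // allocates a node for 2; its m_next is s1
int top;
var s3 = s2.Pop(out top);            // top == 2; no allocation at all:
                                     // s3 is the very same object as s1
// s0, s1, and s2 remain valid and unchanged, so any number of threads
// can read all of them concurrently without synchronization.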
Purity. Even with immutable data types, most of what a program does is still method calls. Method calls can have side effects, and side effects are problematic in concurrent code because a side effect implies some form of mutation. Usually this just means writing to shared memory, but it can also mean a genuinely mutating operation such as a database transaction, a Web service call, or a file system operation. In many situations, I would like to be able to call a method without worrying about concurrency hazards at all. Good examples are simple methods like GetHashCode and ToString on System.Object: most people do not expect them to have side effects. A pure method can always be run in a concurrent setting without added synchronization. Although no popular language supports purity directly, it is easy to define a pure method by convention:
- It reads only from shared memory that is immutable or constant.
- It may write to local variables.
- It may call only other pure methods.
Purity thus allows only a very constrained set of actions. But combined with immutable types, it becomes both possible and very convenient. Some functional languages assume purity by default, most notably Haskell, in which everything is pure; anything that needs side effects must be wrapped in a special construct known as a monad. Most of us do not use Haskell, however, so we have to settle for purity by convention.
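Here is a small sketch of my own showing a method that is pure by those three rules: it reads only its arguments, writes only locals, and calls only another pure method.

static decimal CompoundValue(decimal principal, decimal monthlyRate, int months) {
    decimal factor = PowDecimal(1 + monthlyRate, months); // pure helper
    return principal * factor;                            // no shared state touched
}

static decimal PowDecimal(decimal x, int n) {
    decimal result = 1;                // local variables only
    for (int i = 0; i < n; i++)
        result *= x;
    return result;
}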
Isolation. Publication and privatization were mentioned only briefly above, yet they get at the heart of a very important issue. Synchronization is essential, and immutability and purity are interesting, precisely because state is ordinarily shared among many threads. But if state is instead confined to a single thread, no synchronization is needed at all, and the software becomes more scalable by nature. In fact, if state is isolated, it can be mutated freely. That is convenient, because mutation is the basic built-in mode of most C-style languages, and programmers are accustomed to it. Programming in a mostly functional style takes discipline and is quite difficult for most developers; give it a try, but do not pretend the world will switch to functional programming overnight. Ownership is a tricky thing to track. When does an object become shared? During initialization it is private to a single thread, and the object itself is not reachable from other threads. After a reference to it is stored in a static variable, in a location that is shared at the time a thread is created or a work item is queued, or in a field of an object reachable from such a location, the object becomes shared. Developers must pay close attention to these transitions between private and shared, and must handle all shared state with care.
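The transition is easy to see in code. In this sketch of mine, the list is private (and freely mutable) right up until the assignment to the static field, which publishes it:

using System.Collections.Generic;

static class Catalog {
    private static List<string> s_items;    // reachable from any thread

    internal static void Initialize() {
        var items = new List<string>();     // private: only this thread can see it
        items.Add("widget");                // ...so unsynchronized mutation is fine
        items.Add("gadget");
        s_items = items;                    // publication: from here on the list is
                                            // shared and must be treated accordingly
    }
}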
Joe Duffy is the development lead for Parallel Extensions to .NET at Microsoft. He spends most of his time hacking code, overseeing the library's design, and managing a dream team of developers. His latest book is Concurrent Programming on Windows.