From: http://blog.csdn.net/fengge8ylf/article/details/6896380
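Author: 王東
1.1 What are condition variables and condition waits?
In short:
A condition variable is a synchronization mechanism built on state shared between threads. It involves two actions: one thread suspends itself while it waits for some condition to become true, and another thread makes the condition true and notifies the waiting thread so that it can continue. To prevent races, a condition variable is always used together with a mutex.
Wikipedia defines it as follows: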
Conceptually a condition variable is a queue of threads, associated with a monitor, on which a thread may wait for some condition to become true. Thus each condition variable c is associated with an assertion P. While a thread is waiting on a condition variable, that thread is not considered to occupy the monitor, and so other threads may enter the monitor to change the monitor's state. In most types of monitors, these other threads may signal the condition variable c to indicate that assertion P is true in the current state[1].
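In other words, a condition variable is a special synchronization object: a queue of threads associated with a monitor (in practice, a mutex). Every condition variable is associated with an assertion P, because the threads in its queue are waiting for P to become true. While a thread is waiting on the condition variable it no longer occupies the monitor, so other threads can enter the critical section and change the state.
A condition variable supports two basic operations:
- wait: a thread blocks on the condition variable until assertion P becomes true; while it waits, it does not occupy the monitor;
- signal/notify: another thread, having made assertion P true, notifies the condition variable.
When one thread signals and another is woken, both would occupy the monitor at once, so one of them has to be chosen to hold it. This gives two kinds of condition variable: blocking condition variables (priority goes to the signaled thread) and nonblocking condition variables (priority goes to the signaling thread) [1].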
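A condition wait is used in the following situation:
Several threads access a resource inside a critical section. If the condition for obtaining the resource is not met, a thread has to wait until another thread releases the resource and signals the condition, at which point the waiting thread can continue. For example: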
Thread1:
    Lock (mutex)
    while (condition is false) {
        // Why while rather than if here?
        // See 1.2.1, the problem with condition variables
        Cond_wait(cond, mutex, timeout)
    }
    DoSomething()
    Unlock (mutex)
Thread2:
    Lock (mutex)
    …
    condition is true
    Cond_signal(cond)
    Unlock (mutex)
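The pseudocode above could be written with POSIX threads roughly as follows. This is a minimal sketch rather than a real pool: the names pool_acquire, pool_release and pool_demo, the in_use counter and the capacity POOL_SIZE are assumptions made for illustration, and the timeout from the pseudocode is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 50   /* hypothetical capacity of the shared resource */

static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_cond  = PTHREAD_COND_INITIALIZER;
static int in_use = 0;             /* shared state guarded by pool_mutex */

/* Thread1's role: block until a slot is free, then take it. */
void pool_acquire(void)
{
    pthread_mutex_lock(&pool_mutex);
    while (in_use == POOL_SIZE)    /* re-check the condition after every wakeup */
        pthread_cond_wait(&pool_cond, &pool_mutex);
    in_use++;
    pthread_mutex_unlock(&pool_mutex);
}

/* Thread2's role: give a slot back and notify one waiter. */
void pool_release(void)
{
    pthread_mutex_lock(&pool_mutex);
    assert(in_use > 0);
    in_use--;
    pthread_cond_signal(&pool_cond);
    pthread_mutex_unlock(&pool_mutex);
}

static void *borrower(void *arg)
{
    (void)arg;
    pool_acquire();                /* blocks while the pool is exhausted */
    pool_release();
    return NULL;
}

/* Exhaust the pool, start a borrower that has to wait, then release. */
int pool_demo(void)
{
    pthread_t t;
    for (int i = 0; i < POOL_SIZE; i++)
        pool_acquire();
    pthread_create(&t, NULL, borrower, NULL);
    pool_release();                /* makes the condition true for the borrower */
    pthread_join(t, NULL);
    for (int i = 0; i < POOL_SIZE - 1; i++)
        pool_release();
    return in_use;                 /* every slot has been returned */
}
```

Calling pool_demo() from main (and linking with -pthread) exercises both roles: the borrower thread can only get past pool_acquire() after the main thread has released a slot.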
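For example, Thread1 takes a connection from a connection pool of size 50. If 50 connections are already in use, the thread must wait on a condition. When Thread2 has finished with a connection, it returns it to the pool and then signals the condition, telling Thread1 that it can continue.
1.1.1 Condition variables and semaphores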
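A semaphore is a non-negative integer counter used for synchronization and mutual exclusion between processes or threads.
Semaphores make it possible to implement "P/V operations", a classic synchronization mechanism between processes or threads.
The P operation acquires a resource: if the semaphore's value is greater than 0, it is decremented by 1 and the thread proceeds with the resource; otherwise the thread blocks and sleeps until the resource it is waiting for is released by another thread.
The V operation releases a resource: it increments the semaphore's value by 1 and wakes one thread that is blocked in a P operation.
In its simplest form a semaphore can only take the values 0 and 1, which makes it similar to a mutex.
When the semaphore may take any non-negative value (greater than 1), its value represents the number of available resources.
A semaphore and a mutex can be combined to synchronize access to a pool and protect it: the mutex provides mutual exclusion on the pool's data, and the semaphore counts the resources.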
Thread that acquires a resource:
    sem_wait (semaphore1)
    Lock (mutex)
    …
    Unlock (mutex)
    sem_post (semaphore2)
Thread that releases a resource:
    sem_wait (semaphore2)
    Lock (mutex)
    …
    Unlock (mutex)
    sem_post (semaphore1)
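This model closely resembles the multithreaded producer/consumer model; semaphore2 is there to prevent more resources being released than were acquired.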
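The two-semaphore scheme can be sketched in C with POSIX unnamed semaphores. Everything here is illustrative: the pool is just a small stack of integer ids, and the names free_slots (semaphore1), used_slots (semaphore2), pool_init, pool_get, pool_put and pool_free_count, as well as the capacity of 4, are assumptions rather than anything taken from the original.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define POOL_SIZE 4                 /* hypothetical capacity */

static sem_t free_slots;            /* semaphore1: resources still available */
static sem_t used_slots;            /* semaphore2: resources handed out */
static pthread_mutex_t pool_mutex = PTHREAD_MUTEX_INITIALIZER;
static int pool[POOL_SIZE];         /* stack of free resource ids */
static int top;                     /* number of ids currently on the stack */

void pool_init(void)
{
    sem_init(&free_slots, 0, POOL_SIZE);
    sem_init(&used_slots, 0, 0);
    for (top = 0; top < POOL_SIZE; top++)
        pool[top] = top;
}

/* Acquire: P on semaphore1, touch the pool under the mutex, V on semaphore2. */
int pool_get(void)
{
    sem_wait(&free_slots);          /* blocks while nothing is free */
    pthread_mutex_lock(&pool_mutex);
    int id = pool[--top];
    pthread_mutex_unlock(&pool_mutex);
    sem_post(&used_slots);
    return id;
}

/* Release: P on semaphore2 guards against over-release, then V on semaphore1. */
void pool_put(int id)
{
    sem_wait(&used_slots);          /* blocks if nothing has been handed out */
    pthread_mutex_lock(&pool_mutex);
    assert(top < POOL_SIZE);
    pool[top++] = id;
    pthread_mutex_unlock(&pool_mutex);
    sem_post(&free_slots);
}

/* Current value of semaphore1, i.e. how many resources are free. */
int pool_free_count(void)
{
    int value = 0;
    sem_getvalue(&free_slots, &value);
    return value;
}
```

Note the design choice in pool_put(): a release with nothing outstanding blocks instead of corrupting the stack, which is one way to read "preventing over-release".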
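Compared with semaphores, condition variables can express more complex wait conditions. Conversely, a condition variable together with a mutex can also provide the functionality of a semaphore. (On Windows, condition variables can only synchronize threads within a process, not separate processes.)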
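To illustrate the claim that a mutex and a condition variable are enough to build a semaphore, here is a minimal sketch in C. The type cv_sem and the functions cv_sem_init, cv_sem_wait, cv_sem_post, cv_sem_value and cv_sem_poster are hypothetical names; error handling is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical counting semaphore built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;       /* signaled when count is incremented */
    unsigned        count;
} cv_sem;

void cv_sem_init(cv_sem *s, unsigned initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* P operation: wait until count > 0, then decrement it. */
void cv_sem_wait(cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    assert(s->count > 0);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* V operation: increment count and wake one waiter. */
void cv_sem_post(cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

unsigned cv_sem_value(cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    unsigned v = s->count;
    pthread_mutex_unlock(&s->lock);
    return v;
}

/* Thread entry point that performs a single V operation. */
void *cv_sem_poster(void *arg)
{
    cv_sem_post((cv_sem *)arg);
    return NULL;
}
```

A second cv_sem_wait() on a semaphore whose count is 0 blocks until some other thread, for instance one running cv_sem_poster, calls cv_sem_post().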
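The Posix.1 Rationale gives the following reason for providing semaphores in addition to mutexes and condition variables: "Semaphores are provided in this standard primarily to provide a means of synchronization for processes; these processes may or may not share memory. Mutexes and condition variables are specified as synchronization mechanisms between threads; these threads always share (some) memory. Both are synchronization paradigms that have been in widespread use for a number of years. Each set of primitives is particularly well matched to certain problems." Although semaphores are intended for synchronization between processes and mutexes and condition variables for synchronization between threads, semaphores can also be used between threads, and mutexes and condition variables between processes. The choice should depend on the actual situation. Semaphores are most useful for indicating the number of available resources [11].
My personal impression is that different origins led to two schools of thought. One champions condition variables and sees little use for semaphores (for example, the POSIX Threads model has no notion of a semaphore; POSIX semaphores do exist, but why were they not included from the start?). The other holds the opposite view (for example, Windows originally had semaphores but no condition variables).
Over time both have evolved, and today Linux and Windows each provide both.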
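1.2 Which condition-wait functions does Linux provide?
Linux provides the following condition-wait and notify functions:
- pthread_cond_timedwait(cond, mutex, abstime);
- pthread_cond_wait(cond, mutex);
- pthread_cond_signal(cond): unblocks at least one of the threads blocked on the condition variable.
- pthread_cond_broadcast(cond): unblocks all threads blocked on the condition variable.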
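The abstime argument of pthread_cond_timedwait() is an absolute deadline, not a duration, which is easy to get wrong. The following sketch shows one way to turn a relative timeout into a deadline; the helper names wait_ready and set_ready and the ready flag are assumptions, and the code relies on the default condition variable clock being CLOCK_REALTIME.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool ready = false;         /* the condition being waited for */

/* Wait until ready is set or timeout_ms elapses.
   Returns 0 if ready was set, otherwise the error code (e.g. ETIMEDOUT). */
int wait_ready(long timeout_ms)
{
    assert(timeout_ms >= 0);

    /* Build an absolute deadline: now + timeout. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&mtx);
    while (!ready && rc == 0)      /* loop: a wakeup alone proves nothing */
        rc = pthread_cond_timedwait(&cv, &mtx, &deadline);
    if (ready)                     /* the predicate decides, even after a timeout */
        rc = 0;
    pthread_mutex_unlock(&mtx);
    return rc;
}

/* Make the condition true and wake every waiter. */
void set_ready(void)
{
    pthread_mutex_lock(&mtx);
    ready = true;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
}
```

Because the deadline is computed once, before the loop, repeated wakeups do not extend the total waiting time.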
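A call to pthread_cond_wait() by thread 1 does three things:
1. it unlocks the mutex,
2. it waits for the condition cond to be signaled,
3. once notified, it locks the mutex again.
After pthread_cond_wait() is called, it unlocks the mutex and starts waiting on cond; unlocking and blocking must be a single atomic operation.
With the mutex now unlocked, other threads can enter the critical section and modify the condition.
At this point the pthread_cond_wait() call has not yet returned. Waiting on cond is a blocking operation: the thread sleeps and consumes no CPU cycles until it is woken [3].
Suppose another thread, thread 2, locks the mutex, changes the condition, and then calls pthread_cond_signal() to signal the condition. Thread 1 now wakes up and tries to lock the mutex. Because thread 2 has not yet unlocked it, thread 1 has to keep waiting; only after thread 2 unlocks the mutex can thread 1 lock it and then do its work.
One question remains: how can thread 1, rather than some other thread, get the mutex first? POSIX does not guarantee this in general, but the pseudocode for pthread_cond_wait and pthread_cond_signal in [4] shows one way an implementation can favor it: the signal function first makes a thread queued in wait runnable.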
pthread_cond_wait(mutex, cond):
    value = cond->value;
    pthread_mutex_unlock(mutex);
    pthread_mutex_lock(cond->mutex);
    if (value == cond->value) {
        me->next_cond = cond->waiter;
        cond->waiter = me;
        pthread_mutex_unlock(cond->mutex);
        unable_to_run(me);
    } else
        pthread_mutex_unlock(cond->mutex);
    pthread_mutex_lock(mutex);

pthread_cond_signal(cond):
    pthread_mutex_lock(cond->mutex);
    cond->value++;
    if (cond->waiter) {
        sleeper = cond->waiter;
        cond->waiter = sleeper->next_cond;
        able_to_run(sleeper);
    }
    pthread_mutex_unlock(cond->mutex);
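The following example shows sample code that uses a condition variable [2]:
One or more threads increment count (inc_count), while another thread watches count and reports as soon as it reaches COUNT_LIMIT (watch_count).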
void inc_count (void) {
    …
    pthread_mutex_lock(&count_mutex);
    count++;
    /* Signal the watcher once the threshold is reached; the mutex is held here. */
    if (count == COUNT_LIMIT) {
        pthread_cond_signal(&count_threshold_cv);
        printf("inc_count(): thread %ld, count = %d Threshold reached.\n",
               my_id, count);
    }
    printf("inc_count(): thread %ld, count = %d, unlocking mutex\n",
           my_id, count);
    pthread_mutex_unlock(&count_mutex);
    …
}
void watch_count (void) {
    …
    pthread_mutex_lock(&count_mutex);
    /* pthread_cond_wait unlocks count_mutex while waiting and relocks it before returning. */
    while (count<COUNT_LIMIT) {
        pthread_cond_wait(&count_threshold_cv, &count_mutex);
        printf("watch_count(): thread %ld Condition signal received.\n", my_id);
        count += 125;
        printf("watch_count(): thread %ld count now = %d.\n", my_id, count);
    }
    pthread_mutex_unlock(&count_mutex);
    …
}
1.2.1 A problem with condition variables: spurious wakeups
The Linux manual page notes that on a multiprocessor, pthread_cond_signal() may wake more than one of the threads blocked on the condition variable:
On a multi-processor, it may be impossible for an implementation of pthread_cond_signal() to avoid the unblocking of more than one thread blocked on a condition variable.
As a result, a single call to pthread_cond_signal() can cause several threads to return from pthread_cond_wait() or pthread_cond_timedwait(). This effect is called a "spurious wakeup" [4]:
The effect is that more than one thread can return from its call to pthread_cond_wait() or pthread_cond_timedwait() as a result of one call to pthread_cond_signal(). This effect is called "spurious wakeup". Note that the situation is self-correcting in that the number of threads that are so awakened is finite; for example, the next thread to call pthread_cond_wait() after the sequence of events above blocks.
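Spurious wakeups could be eliminated inside pthread_cond_wait, but sacrificing efficiency for a fringe condition that occurs only rarely was judged not worthwhile; fixing it would reduce the concurrency of every higher-level synchronization operation built on top of it. The implementation therefore does not address it: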
While this problem could be resolved, the loss of efficiency for a fringe condition that occurs only rarely is unacceptable, especially given that one has to check the predicate associated with a condition variable anyway. Correcting this problem would unnecessarily reduce the degree of concurrency in this basic building block for all higher-level synchronization operations.
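The standard remedy is therefore the following:
Change the test of the condition from if to while.
The while loop around pthread_cond_wait checks the condition not only before waiting on the condition variable but also after the wait returns.
Testing the condition one more time in this way is enough to guard against spurious wakeups.
That is why pthread_cond_wait() is wrapped in a while loop that tests whether the condition is still false.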
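A small C sketch can make the effect of the loop visible. Real spurious wakeups cannot be triggered on demand, so this example substitutes a superfluous pthread_cond_broadcast(): every waiter is woken for a single item, and the while loop sends the losers back to sleep. The names consumer and run_consumers, the items counter and the number of waiters are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define WAITERS 4                  /* hypothetical number of waiting threads */

static pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
static int items = 0;              /* predicate: items > 0 */
static int consumed = 0;

static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    /* With `if`, a thread woken after another one already took the item
       would continue with items == 0. The loop re-tests the predicate. */
    while (items == 0)
        pthread_cond_wait(&cv, &m);
    assert(items > 0);             /* holds after the loop, whatever woke us */
    items--;
    consumed++;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Produce WAITERS items one at a time, waking all waiters for each item. */
int run_consumers(void)
{
    pthread_t t[WAITERS];
    for (int i = 0; i < WAITERS; i++)
        pthread_create(&t[i], NULL, consumer, NULL);

    for (int i = 0; i < WAITERS; i++) {
        pthread_mutex_lock(&m);
        items++;
        pthread_cond_broadcast(&cv);   /* deliberately more wakeups than items */
        pthread_mutex_unlock(&m);
    }

    for (int i = 0; i < WAITERS; i++)
        pthread_join(t[i], NULL);
    assert(items == 0);            /* nothing consumed twice, nothing left over */
    return consumed;
}
```

Replacing the while with an if would allow the assertion in consumer() to fail whenever two woken threads compete for the same item.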
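Interestingly, the same issue shows up almost everywhere, including the Linux description of condition waits, the POSIX Threads description, the Windows API (condition variables), Java, and so on.
- The Linux manual page describes condition variables as follows [4]:
Adding the while check is considered to make programs more robust, and IEEE Std 1003.1-2001 explicitly allows spurious wakeups.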
An added benefit of allowing spurious wakeups is that applications are forced to code a predicate-testing-loop around the condition wait. This also makes the application tolerate superfluous condition broadcasts or signals on the same condition variable that may be coded in some other part of the application. The resulting applications are thus more robust. Therefore, IEEE Std 1003.1-2001 explicitly documents that spurious wakeups may occur.
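- In POSIX Threads [5]:
David R. Butenhof attributes spurious wakeups on multiprocessor systems to race conditions [8], and argues that eliminating them completely would inherently slow down condition variable operations.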
“…, but on some multiprocessor systems, making condition wakeup completely predictable might substantially slow all condition variable operations. The race conditions that cause spurious wakeups should be considered rare”
- In Windows condition variables [6]:
The MSDN documentation states that spurious wakeups exist here too, so the condition has to be rechecked.
Condition variables are subject to spurious wakeups (those not associated with an explicit wake) and stolen wakeups (another thread manages to run before the woken thread). Therefore, you should recheck a predicate (typically in a while loop) after a sleep operation returns.
- In Java [7], a wait is written as follows:
synchronized (obj) {
    while (<condition does not hold>)
        obj.wait();
    ... // Perform action appropriate to condition
}
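Effective Java makes the same point in Item 50: Never invoke wait outside a loop.
Spurious wakeups are clearly a real issue, yet they were only clarified in the third edition of the JLS, which was revised as part of JDK 5. The Javadoc for wait was updated in JDK 5 accordingly: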
A thread can also wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. While this will rarely occur in practice, applications must guard against it by testing for the condition that should have caused the thread to be awakened, and continuing to wait if the condition is not satisfied. In other words, waits should always occur in loops.
Apparently, the spurious wakeup is an issue (I doubt that it is a well known issue) that intermediate to expert developers know it can happen but it just has been clarified in JLS third edition which has been revised as part of JDK 5 development. The javadoc of wait method in JDK 5 has also been updated
1.3 &