In-depth parsing of the thread synchronization method in Python

Python can create multiple threads, but because of the GIL, Python threads cannot run in parallel, which makes synchronization between threads all the more important. Here we take an in-depth look at the thread synchronization methods Python provides for safe access to shared resources.

When using threads, a very important issue is avoiding conflicts between multiple threads accessing the same variable or other resource. If you are not careful, overlapping reads and writes of shared resources from multiple threads lead to all kinds of problems. Worse, these problems tend to show up only under extreme conditions (high concurrency, production servers, or even just hardware with better performance).
For example, you need to track the number of times an event is handled.

counter = 0

def process_item(item):
    global counter
    ... do something with item ...
    counter += 1

If you call this function from multiple threads at the same time, you will find that the counter value is not quite accurate. In most cases it is right, but sometimes it is a few less than it should be.
The reason is that the increment is actually executed in three steps:

  • The interpreter reads the current value of counter.
  • It computes the new value.
  • It writes the computed value back to the counter variable.

Consider this situation: the current thread reads the counter value, then another thread grabs the CPU, reads the same counter value, computes the new value, and completes the write-back. When the time slice returns to the first thread ("current" here is just a label, not an actual state), that thread still holds the stale value it read; after it completes the remaining two steps, the counter has effectively been incremented only once.
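The interleaving just described can be reproduced deterministically in a single thread by spelling out the three steps for two hypothetical threads (a sketch of the schedule, not actual concurrency):

```python
counter = 0

# thread 1 reads the current value, then is preempted
t1_read = counter          # step 1 for thread 1: reads 0
# thread 2 runs all three steps to completion
t2_read = counter          # thread 2 also reads 0
counter = t2_read + 1      # thread 2 writes back 1
# thread 1 resumes with its stale value
counter = t1_read + 1      # thread 1 also writes back 1

print(counter)  # 1, although two increments ran
```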
Another common case is incomplete or inconsistent access. This occurs mainly when one thread is initializing or updating data while another thread tries to read the data being changed.

Atomic operation
The simplest way to synchronize access to shared variables or other resources is to rely on the interpreter's atomic operations. An atomic operation is performed in a single step; no other thread can access the shared resource in the middle of it.
Generally, this form of synchronization is only valid for shared resources that consist of a single native data type, such as string variables, numbers, lists, or dictionaries. Here are several thread-safe operations:

  • Reading or replacing an instance attribute
  • Reading or replacing a global variable
  • Fetching an item from a list
  • Modifying a list in place (for example, adding an item with append)
  • Fetching an item from a dictionary
  • Modifying a dictionary in place (for example, adding an item, or calling the clear method)

Note: as mentioned above, reading a variable or attribute, modifying it, and writing it back is not thread-safe, because another thread may change the shared variable/attribute between the read and the write-back.
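One way to confirm that the read-modify-write is not a single step is to disassemble an increment with the standard dis module (Python 3.4+ shown here). The exact opcode names vary across versions, but the separate load, add, and store steps are always visible:

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1   # one statement, but not one step

# the bytecode makes the three separate steps visible
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)
```

Between the LOAD_GLOBAL and STORE_GLOBAL instructions another thread can run, which is exactly the window in which updates get lost.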

Lock

A lock is the most basic synchronization mechanism provided by Python's threading module. At any moment, a lock object is either held by exactly one thread or held by no thread. If a thread tries to acquire a lock object that is already held by another thread, it blocks until the lock is released.
Locks are usually used to synchronize access to shared resources: create a Lock object for each shared resource; when you need to access the resource, call the acquire method to take the lock (if another thread already holds it, the current thread waits until it is released); after accessing the resource, call the release method to release the lock:

lock = Lock()

lock.acquire() #: will block if lock is already held
... access shared resource
lock.release()

Note that the lock should be released even if an error occurs while accessing the shared resource; use try-finally:

lock.acquire()
try:
    ... access shared resource
finally:
    lock.release() #: release lock, no matter what

In Python 2.5 and later you can use the with statement. Used with a lock, the with statement automatically acquires the lock before entering the block and automatically releases it when the block is exited:

from __future__ import with_statement #: 2.5 only

with lock:
    ... access shared resource

The acquire method takes an optional blocking flag that controls whether to wait when another thread holds the lock. If you pass False, acquire does not block; instead, if the lock is held, it returns False:

if not lock.acquire(False):
    ... failed to lock the resource
else:
    try:
        ... access shared resource
    finally:
        lock.release()

You can use the locked method to check whether a lock object is currently held. Note that you cannot use it to predict whether acquire will block: another thread may take the lock between the locked() call and the next statement (such as acquire()).

if not lock.locked():
    #: another thread may take the lock before the next statement runs
    lock.acquire() #: may block

Disadvantages of simple lock
A standard lock object does not care which thread holds it. If the lock is held, any thread that tries to acquire it will block, including the thread that already holds it. Consider the following example:

lock = threading.Lock()

def get_first_part():
    lock.acquire()
    try:
        ... fetch the first part of the data from the shared object
    finally:
        lock.release()
    return data

def get_second_part():
    lock.acquire()
    try:
        ... fetch the second part of the data from the shared object
    finally:
        lock.release()
    return data

In this example, we have a shared resource and two functions that take the first part and the second part of the shared resource respectively. Both access functions use locks to ensure that no other threads modify the corresponding shared data when obtaining data.
Now, if we want to add a third function that obtains both parts of the data, we run into trouble. A simple approach is to call the two functions in sequence and return the combined result:

def get_both_parts():
    first = get_first_part()
    second = get_second_part()
    return first, second

The problem here is that if a thread modifies the shared resource between the two function calls, we end up with inconsistent data. The most obvious fix is to take the lock in this function as well:

def get_both_parts():
    lock.acquire()
    try:
        first = get_first_part()
        second = get_second_part()
    finally:
        lock.release()
    return first, second

However, this does not work: the two inner access functions will block, because the outer function already holds the lock. You could add flags so that the access functions skip locking when the outer function already holds the lock, but that easily gets out of hand and leads to errors. Fortunately, the threading module contains a more practical lock implementation: the re-entrant lock.
Re-Entrant Locks (RLock)

The RLock class is a variant of the simple lock. An RLock blocks only when it is held by another thread; if the current thread already holds an RLock object, it can acquire it again, whereas a simple lock can be acquired only once, even within the same thread.

lock = threading.Lock()
lock.acquire()
lock.acquire() #: blocks here

lock = threading.RLock()
lock.acquire()
lock.acquire() #: does not block

RLock is mainly used to solve the problem of nested access to shared resources, as in the preceding example. To fix that example, we only need to replace the Lock with an RLock object, and the nested calls work correctly.

lock = threading.RLock()

def get_first_part():
    ... see above

def get_second_part():
    ... see above

def get_both_parts():
    ... see above

In this way, two data parts can be accessed separately or simultaneously without being blocked by locks or inconsistent data.
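As a runnable sketch of the fixed example, assuming a hypothetical shared dictionary that holds the two parts of the data (the original's elided bodies are filled in for illustration only):

```python
import threading

# hypothetical shared object holding two parts of the data
shared = {"first": "A", "second": "B"}
lock = threading.RLock()

def get_first_part():
    with lock:
        return shared["first"]

def get_second_part():
    with lock:
        return shared["second"]

def get_both_parts():
    with lock:  # the nested acquires in the two calls below do not block an RLock
        return get_first_part(), get_second_part()

print(get_both_parts())  # ('A', 'B')
```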
Note that RLock tracks the recursion level, so every acquire must be matched by a release.
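The recursion-level bookkeeping can be observed directly: after two acquires, one release leaves the lock held, and only the second release frees it. The probe function and its non-blocking acquire below are illustrative assumptions:

```python
import threading

rlock = threading.RLock()
rlock.acquire()
rlock.acquire()      # same thread may acquire again; recursion level is now 2
rlock.release()      # level drops to 1 -- the lock is still held

results = []

def probe():
    # non-blocking acquire attempt from a different thread
    results.append(rlock.acquire(False))

t = threading.Thread(target=probe)
t.start()
t.join()             # fails: the first thread still holds the lock

rlock.release()      # level drops to 0 -- fully released

t2 = threading.Thread(target=probe)
t2.start()
t2.join()            # succeeds now

print(results)  # [False, True]
```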
Semaphores

Semaphores are a more advanced lock mechanism. A semaphore has an internal counter rather than the lock flag of a lock object, and a thread blocks only when more threads have acquired the semaphore than the counter allows. This lets a limited number of threads enter the same section of code at the same time.

semaphore = threading.BoundedSemaphore()

semaphore.acquire() #: counter decreases

... access shared resource

semaphore.release() #: counter increases

When the semaphore is acquired, the counter decreases; when it is released, the counter increases. If the counter is 0 when a thread tries to acquire the semaphore, that thread blocks. When the semaphore is released and the counter rises to 1, one of the blocked threads (if any) continues running.
Semaphores are usually used to restrict access to resources with limited capacities, such as a network connection or database server. In such scenarios, you only need to initialize the counter to the maximum value, and the semaphore implementation will complete the rest for you.

max_connections = 10

semaphore = threading.BoundedSemaphore(max_connections)


If you do not pass any initialization parameters, the counter value will be initialized to 1.
Python's threading module provides two semaphore implementations. The Semaphore class provides an unbounded semaphore: you can call release to increase the counter any number of times. To help catch errors, it is usually better to use the BoundedSemaphore class, which raises an error if release is called more times than acquire.
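A small sketch of the connection-pool idea above, with a hypothetical use_connection worker capped at three concurrent holders, plus the over-release error that BoundedSemaphore catches:

```python
import threading

# at most 3 workers may hold the semaphore at once; the 4th blocks
max_connections = 3
pool = threading.BoundedSemaphore(max_connections)

active = []        # workers currently "connected"
peak = [0]         # highest number of simultaneous workers observed
state_lock = threading.Lock()

def use_connection(i):
    with pool:                      # acquire: counter decreases, may block
        with state_lock:
            active.append(i)
            peak[0] = max(peak[0], len(active))
        # ... talk to the hypothetical server here ...
        with state_lock:
            active.remove(i)

threads = [threading.Thread(target=use_connection, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# BoundedSemaphore also catches one release too many:
caught = False
s = threading.BoundedSemaphore(1)
try:
    s.release()                     # counter would exceed its initial value
except ValueError:
    caught = True
```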
Thread synchronization

Locks can be used for synchronization between threads, and the threading module also contains classes designed specifically for this purpose.
Events

An event is a simple synchronization object. An event represents an internal flag; threads can wait for the flag to be set, or set or clear the flag themselves.

event = threading.Event()

#: a client thread waits for the flag to be set
event.wait()

#: the server thread sets or clears the flag
event.set()
event.clear()

Once the flag is set, the wait method returns immediately (it does not block). When the flag is cleared, wait blocks until the flag is set again. Any number of threads may wait for the same event.
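A minimal runnable sketch of this handshake (the waiter function is an illustrative assumption):

```python
import threading

event = threading.Event()
log = []

def waiter():
    event.wait()          # blocks until another thread sets the flag
    log.append("woke")

t = threading.Thread(target=waiter)
t.start()

event.set()               # wake the waiting thread
t.join()

print(log)  # ['woke']
```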
Conditions

A condition is a more advanced version of the event object. A condition represents a state change in the program, and a thread can wait for a given condition, or signal that the condition has occurred.
The following is a simple producer/consumer example. First, create a condition object:

condition = threading.Condition()

The producer thread must acquire the condition before notifying the consumer thread that a new resource is available:

#: producer thread
... produce the resource item
condition.acquire()
... add the resource item to the resource
condition.notify() #: signal that a resource is available
condition.release()

The consumer must acquire the condition (and its associated lock) before trying to fetch the resource item from the resource:

#: consumer thread
condition.acquire()
while True:
    ... fetch the resource item from the resource
    if item:
        break
    condition.wait() #: sleep until a new resource is available
condition.release()
... process the resource

The wait method releases the lock, blocks the current thread until another thread calls notify or notifyAll on the same condition object, and then re-acquires the lock. If multiple threads are waiting, notify wakes only one of them, while notifyAll wakes them all.
To avoid blocking forever in wait, you can pass a timeout parameter, a floating-point number of seconds. If a timeout is given, wait returns after that time even if notify has not been called. When you use a timeout, you must inspect the resource to find out what actually happened.
Note that the condition object is associated with a lock, and you must acquire that lock before accessing the condition; likewise, you must release the lock when you are done with the condition. In production code, use try-finally or the with statement.
You can associate a condition with an existing lock by passing the lock object to the condition's constructor. This lets several conditions share one resource:

lock = threading.RLock()

condition_1 = threading.Condition(lock)
condition_2 = threading.Condition(lock)
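Putting the producer/consumer pieces together as one runnable sketch, using a deque as the hypothetical shared resource and the with statement to manage the condition's lock:

```python
import threading
import collections

condition = threading.Condition()
queue = collections.deque()   # the shared "resource"
consumed = []

def producer():
    for item in range(5):
        with condition:           # with acquires and releases the condition's lock
            queue.append(item)
            condition.notify()    # signal that a resource is available

def consumer():
    for _ in range(5):
        with condition:
            while not queue:
                condition.wait()  # releases the lock while sleeping
            consumed.append(queue.popleft())

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()

print(consumed)  # [0, 1, 2, 3, 4]
```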

Mutex lock synchronization
Let's take an example:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time, threading

# assume this is your bank balance:
balance = 0
muxlock = threading.Lock()

def change_it(n):
    # deposit first, then withdraw; the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    # once the loop count grows, the final number is no longer 0
    for i in range(100000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t3 = threading.Thread(target=run_thread, args=(9,))
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
print balance

Result:

[/data/web/test_python]$ python multhread_threading.py
0
[/data/web/test_python]$ python multhread_threading.py
61
[/data/web/test_python]$ python multhread_threading.py
0
[/data/web/test_python]$ python multhread_threading.py
24

The above example illustrates the most common problem in multithreaded programming: data sharing. When multiple threads modify shared data, synchronization is required.
Thread synchronization ensures that multiple threads access contended resources safely. The simplest synchronization mechanism is the mutex lock, which gives the resource a state: locked or unlocked. When a thread wants to change the shared data, it locks the resource first; the resource is then "locked", and no other thread can change it. Once the thread releases the resource, its state becomes "unlocked" again, and another thread can lock it. The mutex lock guarantees that only one thread writes at a time, ensuring data correctness under multithreading.

The threading module defines the Lock class to facilitate Lock handling:

# Create a lock
mutex = threading.Lock()

# Acquire the lock
mutex.acquire()

# Release the lock
mutex.release()

The acquire method takes an optional non-blocking flag (and, in Python 3.2+, also a timeout in seconds). Its return value tells you whether the lock was actually obtained, so you can do other work instead of waiting.
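A minimal sketch of using the return value, with the optional argument as the non-blocking flag:

```python
import threading

mutex = threading.Lock()

held = mutex.acquire()          # returns True: the lock was obtained
# a plain Lock is not re-entrant, so a second non-blocking attempt fails
second = mutex.acquire(False)   # returns False instead of blocking
mutex.release()

print(held, second)  # True False
```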
The code for using mutex to implement the above example is as follows:

balance = 0
muxlock = threading.Lock()

def change_it(n):
    # acquire the lock to ensure only one thread operates on this number
    muxlock.acquire()
    global balance
    balance = balance + n
    balance = balance - n
    # release the lock so other blocked threads can continue
    muxlock.release()

def run_thread(n):
    for i in range(10000):
        change_it(n)

With the lock in place, the result is correct:

[/data/web/test_python]$ python multhread_threading.py
0
[/data/web/test_python]$ python multhread_threading.py
0
[/data/web/test_python]$ python multhread_threading.py
0
[/data/web/test_python]$ python multhread_threading.py
0
