Introduction & Motivation
Consider this scenario: we have 10,000 pieces of data to process. It takes 1 second to process each piece but only 0.1 seconds to read it, and the pieces do not interfere with one another. How do we finish in the shortest time?
Before multithreaded (MT) programming appeared, a computer program was a single execution sequence that ran sequentially on the host's central processing unit (CPU). A program executed this way whether or not the task itself required sequential execution, and even when the program consisted of multiple subtasks that were independent of each other (that is, the result of one subtask did not affect the results of the others).
For the problem above, a single execution sequence would need 10000 * 0.1 + 10000 * 1 = 11,000 seconds. That is obviously too long.
Could we read data while performing calculations, or work on several pieces of data at the same time? If so, we could greatly improve the efficiency of the task. That is the point of multithreaded programming.
Multithreading is the ideal solution for tasks that are essentially asynchronous: multiple concurrent transactions whose relative order is indeterminate, random, or unpredictable. Such a task can be divided into multiple execution flows, each with its own goal to complete, and the partial results are then merged into the final result.
Threads and processes
What is a process
A process (sometimes called a heavyweight process) is a single execution of a program. Each process has its own address space, memory, data stack, and other auxiliary data that records its running state. The operating system manages all the processes running on it and distributes CPU time fairly among them. A process can also create other processes through fork and spawn operations. However, because each process has its own memory space, data stack, and so on, processes can only exchange information through interprocess communication (IPC); they cannot share information directly.
What is a thread
Threads (sometimes referred to as lightweight processes) are somewhat similar to processes, but all threads run in the same process and share the same running environment. They can be imagined as "mini-processes" running in parallel in the main process or "main thread".
Thread state
A thread has a beginning, an execution sequence, and an end. It has its own instruction pointer that records where it currently is. A thread's execution may be preempted (interrupted) or temporarily suspended (put to sleep) to let other threads run; this is called yielding. All the threads in a process share the same data space, so threads can share data and communicate with each other far more easily than processes can.
Of course, such sharing is not entirely risk-free. If multiple threads access the same piece of data, inconsistent results can arise from differences in the order of access. This is called a race condition.
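Such a race can be avoided with a lock. Below is a minimal sketch (Python 3 syntax; the thread count and iteration count are arbitrary illustration values): four threads increment a shared counter, and a Lock makes each read-modify-write step atomic so no update is lost.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        # Without the lock, two threads could both read the same old value
        # of counter and write back the same new value, losing an update.
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: with the lock, no increments are lost
```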
Threads generally execute concurrently, but on a single-CPU system true parallelism is impossible: each thread is scheduled to run for only a short while before giving the CPU up to other threads. Since some functions block until they complete, such "greedy" functions can skew the CPU's time allocation unless the code is written specifically with multithreading in mind, and the running time allotted to each thread may then be unfair.
Python, thread, and global interpreter locks
Global Interpreter Lock (GIL)
The first thing to make clear is that the GIL is not a feature of the Python language; it is a concept introduced by one implementation of the Python interpreter, CPython. Just as C++ is a language (syntax) standard that different compilers can compile into executable code, the same Python code can be executed by different runtimes such as CPython, PyPy, and Psyco (and Jython, which has no GIL).
So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid being misleading, here is the official explanation:
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
Although Python fully supports multithreaded programming, the C-language implementation of the interpreter is not thread-safe in fully parallel execution. In fact, the interpreter is protected by a global interpreter lock, which ensures that only one Python thread executes at any time.
In a multithreaded environment, the Python virtual machine executes as follows:
Acquire the GIL
Switch to a thread to execute
Run
Put the thread into a sleep state
Release the GIL
Repeat the steps above
For any I/O-oriented program (one that calls built-in operating-system C code), the GIL is released before the I/O call, allowing other threads to run while that thread waits for the I/O. A thread that performs little I/O will hold the processor (and the GIL) for its entire time slice. In other words, I/O-bound Python programs take better advantage of a multithreaded environment than CPU-bound programs do.
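A rough way to see this in practice (Python 3 syntax; the workload sizes are arbitrary and exact timings vary by machine): two sleeping threads overlap almost completely because the GIL is released during the sleep, while two CPU-bound threads are serialized by the GIL.

```python
import threading
import time

def cpu_bound(n=2_000_000):
    # pure Python arithmetic: the GIL is held while this runs
    total = 0
    for i in range(n):
        total += i
    return total

def io_bound():
    # stands in for a blocking I/O call; the GIL is released while sleeping
    time.sleep(0.5)

def timed(target, count=2):
    threads = [threading.Thread(target=target) for _ in range(count)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

io_time = timed(io_bound)    # close to 0.5 s: the two sleeps overlap
cpu_time = timed(cpu_bound)  # the two loops mostly run one after the other
print(io_time, cpu_time)
```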
Exit thread
When a thread finishes its computation, it exits. A thread can call an exit function such as thread.exit(), or use one of Python's standard ways to end a process, such as sys.exit() or raising a SystemExit exception. You cannot, however, directly "kill" a thread.
Using Threads in Python
Python supports multithreaded programming on most platforms, including Win32, Linux, Solaris, MacOS, and the *BSDs. On POSIX-compliant systems, Python uses POSIX threads, known as pthreads.
By default, threads are enabled as long as the interpreter can run:

>>> import thread

If there is no error, threads are available.
Python's Threading Module
Python provides several modules for multithreaded programming, including thread, threading, and Queue. The thread and threading modules let programmers create and manage threads. The thread module offers basic threading and lock support, while threading provides higher-level, more powerful thread management. The Queue module lets users create a queue data structure that can be used to share information between multiple threads.
Core tip: avoid using the thread module
For the following considerations, we do not recommend that you use the thread module.
First, the higher-level threading module is more advanced, has more complete thread support, and using attributes from the thread module may conflict with it. Second, the low-level thread module has very few synchronization primitives (actually only one), while the threading module has many.
Second, the thread module gives you no control over when your process exits: when the main thread ends, all other threads are forced to end, without warning and without proper cleanup. As noted above, the threading module at least ensures that important child threads have exited before the process exits.
Thread module
In addition to creating threads, the thread module also provides a basic synchronization data structure: the lock object (also called a primitive lock, simple lock, mutual-exclusion lock, mutex, or binary semaphore).
Thread module functions
start_new_thread(function, args, kwargs=None): spawns a new thread that invokes the function with the given args and optional kwargs
allocate_lock(): allocates a lock object of type LockType
exit(): causes the thread to exit
acquire(wait=None): attempts to acquire the lock object
locked(): returns True if the lock has been acquired, False otherwise
release(): releases the lock
Here is an example of using thread:
import thread
from time import sleep, time

def loop(num):
    print('start loop at:', time())
    sleep(num)
    print('loop done at:', time())

def loop1(num):
    print('start loop 1 at:', time())
    sleep(num)
    print('loop 1 done at:', time())

def main():
    print('starting at:', time())
    thread.start_new_thread(loop, (4,))
    thread.start_new_thread(loop1, (5,))
    sleep(6)
    print('all done at:', time())

if __name__ == '__main__':
    main()

Output:

('starting at:', 1489387024.886667)
('start loop at:', 1489387024.88705)
('start loop 1 at:', 1489387024.887277)
('loop done at:', 1489387028.888182)
('loop 1 done at:', 1489387029.888904)
('all done at:', 1489387030.889918)
Start_new_thread () requirements must have the first two parameters. So, even if the function that we want to run does not have arguments, we have to pass an empty tuple.
Why the sleep(6)? Because if we did not make the main thread pause, it would run straight on to the next statement, print "all done", and then kill the two threads running loop() and loop1() as it exits.
Is there a better way to replace this unreliable sleep()-based synchronization? The answer is to use locks: with locks we can exit as soon as the two threads have finished.
# -*- coding: utf-8 -*-
import thread
from time import sleep, time

loops = [4, 2]

def loop(nloop, nsec, lock):
    print('start loop %s at: %s' % (nloop, time()))
    sleep(nsec)
    print('loop %s done at: %s' % (nloop, time()))
    # Each thread is given a pre-acquired lock; after sleep() it releases
    # the corresponding lock to tell the main thread that it has finished.
    lock.release()

def main():
    print('starting at:', time())
    locks = []
    nloops = range(len(loops))
    for i in nloops:
        # create one lock per thread with thread.allocate_lock()
        lock = thread.allocate_lock()
        # call each lock's acquire() method, i.e. "lock the locks"
        lock.acquire()
        locks.append(lock)
    for i in nloops:
        # create the threads; each calls loop() with its own loop number,
        # sleep time, and lock
        thread.start_new_thread(loop, (i, loops[i], locks[i]))
    for i in nloops:
        # each thread releases its own lock when it finishes; the main
        # thread just spins here (pausing itself) until both locks are
        # released, then continues
        while locks[i].locked():
            pass
    print('all done at:', time())

if __name__ == '__main__':
    main()
Why don't we create a thread in the loop that creates the lock? For the following reasons:
We want to synchronize the threads so that "all horses are out of the fence at the same time."
Getting a lock takes some time, and if your thread exits "too fast", it can cause the thread to end up without getting the lock.
Threading Module
The threading module provides not only the Thread class, but also a variety of very useful synchronization mechanisms.
Here are all the objects in the threading module:
Thread: an object that represents a single thread of execution
Lock: the primitive lock object (the same lock as in the thread module)
RLock: a re-entrant lock object, which lets a single thread (re)acquire a lock it already holds (recursive locking)
Condition: a condition-variable object, which makes a thread wait until another thread signals that a certain "condition" is satisfied, for example a change of state or of a value
Event: a generalized condition variable; any number of threads can wait for an event to occur, and all of them are activated when it does
Semaphore: a counter-based primitive providing a "waiting room" for threads, which block when the counter reaches zero
BoundedSemaphore: like Semaphore, except that the counter is not allowed to exceed its initial value
Timer: like Thread, except that it waits a given amount of time before it starts to run
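As a small sketch of the last two semaphore types (Python 3 syntax; the counts are arbitrary illustration values): at most N threads can hold a semaphore at once, and a BoundedSemaphore refuses to be released past its initial value.

```python
import threading

sem = threading.BoundedSemaphore(2)  # at most 2 holders at a time

sem.acquire()
sem.acquire()
# Both slots are taken; a non-blocking acquire fails instead of waiting.
got_third = sem.acquire(blocking=False)
print(got_third)  # False

sem.release()
sem.release()
try:
    sem.release()  # one release too many
except ValueError:
    print('BoundedSemaphore will not exceed its initial value')
```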
Daemon Threads
Another reason to avoid using the thread module is that it does not support daemon threads. When the main thread exits, all child threads are forced to quit, regardless of whether they are still working. Sometimes we do not want that behavior, and this is where the concept of a daemon thread comes in.
The threading module supports daemon threads, which work like this: a daemon is typically a server that waits for client requests and sits idle when there are none. If you mark a thread as a daemon, you are saying that the thread is unimportant and that the process may exit without waiting for it.
If your main thread should be able to exit without waiting for certain child threads to complete, set those threads' daemon flag. That is, before the thread is started (before thread.start() is called), call thread.setDaemon(True) to mark it as "unimportant".
If you want to wait for a child thread to finish before exiting, do nothing, or explicitly call thread.setDaemon(False) to ensure its daemon flag is False. You can call thread.isDaemon() to inspect the flag's value. A new child thread inherits its parent thread's daemon flag. The Python interpreter as a whole exits only after all non-daemon threads have exited, that is, when no non-daemon thread remains in the process.
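A short sketch of the daemon flag (Python 3 syntax, where setDaemon(True) is written t.daemon = True; the sleeping loop is just a stand-in for a server):

```python
import threading
import time

def serve_forever():
    while True:
        time.sleep(0.1)  # pretend to wait for client requests

t = threading.Thread(target=serve_forever)
t.daemon = True  # same as t.setDaemon(True); must be set before start()
t.start()

print(t.daemon, t.is_alive())
# The interpreter may now exit even though serve_forever() never returns,
# because a daemon thread does not keep the process alive.
```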
Thread class
The thread class provides the following methods:
run(): the method that represents the thread's activity.
start(): starts the thread's activity.
join(timeout=None): blocks the calling thread until the thread whose join() method was called terminates, either normally or through an unhandled exception, or until the optional timeout expires.
is_alive(): returns whether the thread is active.
name: gets/sets the thread's name.
daemon: gets/sets the thread's daemon flag; be sure to set it before start() is called.
With the Thread class, you can create threads in a variety of ways. Here we introduce three fairly similar ones:
Create an instance of Thread and pass it a function
Create an instance of Thread and pass it a callable class instance
Derive a subclass from Thread and create an instance of the subclass
Below is an example of creating threads in three different ways:
# -*- coding: utf-8 -*-
# create an instance of Thread, passing it a function
import threading
from time import sleep, time

loops = [4, 2]

def loop(nloop, nsec):
    print('start loop %s at: %s' % (nloop, time()))
    sleep(nsec)
    print('loop %s done at: %s' % (nloop, time()))

def main():
    print('starting at:', time())
    threads = []
    nloops = range(len(loops))
    for i in nloops:
        t = threading.Thread(target=loop, args=(i, loops[i]))
        threads.append(t)
    for i in nloops:
        threads[i].start()   # start all the threads
    for i in nloops:
        # join() waits until the thread ends, or, if a timeout is given,
        # until the timeout expires. Using join() is clearer than spinning
        # in an infinite loop waiting for a lock to be released (such a
        # lock is also known as a "spinlock").
        threads[i].join()    # wait for all the threads to finish
    print('all done at:', time())

if __name__ == '__main__':
    main()
Another way, similar to passing in a function, is to pass in an instance of a callable class when creating the thread; the instance is invoked when the thread starts. This is a more object-oriented approach to multithreaded programming, and it is more flexible than a plain function or a handful of functions, because a class instance can exploit the full power of classes and carry more state.
# -*- coding: utf-8 -*-
# create an instance of Thread, passing it a callable class instance
from threading import Thread
from time import sleep, time

loops = [4, 2]

class ThreadFunc(object):
    def __init__(self, func, args, name=''):
        self.name = name
        self.func = func
        self.args = args

    def __call__(self):
        # when the new thread starts, the Thread object invokes our
        # ThreadFunc instance through this special __call__() method
        self.func(*self.args)

def loop(nloop, nsec):
    print('start loop %s at: %s' % (nloop, time()))
    sleep(nsec)
    print('loop %s done at: %s' % (nloop, time()))

def main():
    print('starting at:', time())
    threads = []
    nloops = range(len(loops))
    for i in nloops:
        t = Thread(target=ThreadFunc(loop, (i, loops[i]), loop.__name__))
        threads.append(t)
    for i in nloops:
        threads[i].start()   # start all the threads
    for i in nloops:
        # join() waits until the thread ends, or, if a timeout is given,
        # until the timeout expires. Using join() is clearer than spinning
        # in a loop waiting for a lock to be released ("spinlock").
        threads[i].join()    # wait for all the threads to finish
    print('all done at:', time())

if __name__ == '__main__':
    main()
The last example shows how to subclass the Thread class, which is very similar to creating a callable class as in the previous example. Creating threads with a subclass makes the code easier to read.
# -*- coding: utf-8 -*-
# create an instance of a subclass of Thread
from threading import Thread
from time import sleep, time

loops = [4, 2]

class MyThread(Thread):
    def __init__(self, func, args, name=''):
        super(MyThread, self).__init__()
        self.name = name
        self.func = func
        self.args = args

    def getResult(self):
        return self.res

    def run(self):
        # run() is what the Thread machinery invokes when the thread starts
        print('starting', self.name, 'at:', time())
        self.res = self.func(*self.args)
        print(self.name, 'finished at:', time())

def loop(nloop, nsec):
    print('start loop %s at: %s' % (nloop, time()))
    sleep(nsec)
    print('loop %s done at: %s' % (nloop, time()))

def main():
    print('starting at:', time())
    threads = []
    nloops = range(len(loops))
    for i in nloops:
        t = MyThread(loop, (i, loops[i]), loop.__name__)
        threads.append(t)
    for i in nloops:
        threads[i].start()   # start all the threads
    for i in nloops:
        # join() waits until the thread ends, or, if a timeout is given,
        # until the timeout expires. Using join() is clearer than spinning
        # in a loop waiting for a lock to be released ("spinlock").
        threads[i].join()    # wait for all the threads to finish
    print('all done at:', time())

if __name__ == '__main__':
    main()
In addition to various synchronization objects and thread objects, the threading module also provides some functions.
active_count(): the number of currently active Thread objects
current_thread(): returns the current Thread object
enumerate(): returns a list of the currently active threads
settrace(func): sets a trace function for all threads
setprofile(func): sets a profile function for all threads
Lock & RLock
The primitive lock (Lock) is a synchronization primitive whose state is either locked or unlocked. Its two methods, acquire() and release(), acquire and release the lock.
RLock, a re-entrant lock, is a synchronization primitive similar to the Lock object, but the same thread may acquire it multiple times.
Lock does not support recursive locking: even within the same thread, you must wait for the lock to be released before acquiring it again. RLock is generally recommended instead; it keeps track of an "owning thread" and a "recursion level", and repeated acquisitions by the same thread merely increment a counter. Each release() decrements the counter, and the lock is actually released only when the counter reaches zero, so acquire() and release() calls must come in pairs.
from time import sleep
from threading import current_thread, Thread, RLock

lock = RLock()

def show(i):
    with lock:
        print(current_thread().name, i)
        sleep(0.1)

def test():
    with lock:
        for i in range(3):
            show(i)   # re-acquiring the RLock in the same thread is fine

for i in range(2):
    Thread(target=test).start()
Event
Events are used to communicate between threads. One thread sends a signal, and one or more threads wait.
Event coordinates multiple threads through an internal flag. wait() blocks the calling thread until the flag is True. set() sets the flag to True, clear() resets it to False, and isSet() checks the flag's state.
from threading import Event, Thread

def test_event():
    e = Event()
    def test():
        for i in range(5):
            print('start wait')
            e.wait()
            # without clear(), the flag stays True and subsequent
            # wait() calls no longer block
            e.clear()
            print(i)
    Thread(target=test).start()
    return e

e = test_event()
Condition
A condition variable, like a lock, is a synchronization primitive; it is used when a thread needs to wait for a particular state change or for an event to occur.
It can be pictured as having, in addition to the lock's lock pool, a wait pool: threads in the wait pool block until another thread calls notify()/notifyAll(); once notified, a thread moves into the lock pool and waits to acquire the lock.
Construction Method:
Condition([lock/rlock])
Condition has the following methods:
acquire([timeout])/release(): call the corresponding methods of the associated lock.
wait([timeout]): causes the thread to enter the condition's wait pool to wait for a notification, and releases the lock. The calling thread must hold the lock first, or an exception is thrown.
notify(): wakes one thread from the wait pool; the notified thread automatically calls acquire() to try to obtain the lock (entering the lock pool), while the other threads keep waiting. This method does not release the lock. The calling thread must hold the lock first, or an exception is thrown.
notifyAll(): wakes all threads in the wait pool; they all move into the lock pool and try to obtain the lock. This method does not release the lock. The calling thread must hold the lock first, or an exception is thrown.
from time import sleep
from threading import Condition, current_thread, Thread

con = Condition()

def tc1():
    with con:
        for i in range(5):
            print(current_thread().name, i)
            sleep(0.3)
            if i == 3:
                con.wait()   # releases the lock and waits to be notified

def tc2():
    with con:
        for i in range(5):
            print(current_thread().name, i)
            sleep(0.1)
        con.notify()

Thread(target=tc1).start()
Thread(target=tc2).start()

Output:

Thread-1 0
Thread-1 1
Thread-1 2
Thread-1 3    # wait() yields the lock
Thread-2 0
Thread-2 1
Thread-2 2
Thread-2 3
Thread-2 4
Thread-1 4    # notified, reacquires the lock and continues
Only the thread that acquires the lock can call Wait () and notify (), so it must be called before the lock is released.
When wait() releases the lock, other threads can run and in turn enter the wait state. notifyAll() wakes all waiting threads, which then compete for the lock and complete their remaining execution.
Producer-consumer issues and Queue modules
Now let's introduce the Queue module with a classic example: the producer and the consumer.
In the producer-consumer scenario, producers produce goods and put them into a data structure such as a queue. The time it takes to produce a good cannot be known in advance, and neither can the time it takes a consumer to consume one.
Commonly used attributes of the Queue module:
Queue(size): creates a Queue object with capacity size.
qsize(): returns the size of the queue (the value is approximate, because other threads may modify the queue while it is being computed)
empty(): returns True if the queue is empty, False otherwise
full(): returns True if the queue is full, False otherwise
put(item, block=0): puts item into the queue; if block is given (non-zero), the call blocks until the queue has free space
get(block=0): takes an object from the queue; if block is given (non-zero), the call blocks until the queue contains an object
The Queue module can be used to communicate between threads, allowing data to be shared between threads.
Now, let's create a queue where the producer (thread) puts the newly produced goods into use by the consumer (thread).
# -*- coding: utf-8 -*-
from Queue import Queue   # the module is named "queue" in Python 3
from random import randint
from time import sleep, time
from threading import Thread

class MyThread(Thread):
    def __init__(self, func, args, name=''):
        super(MyThread, self).__init__()
        self.name = name
        self.func = func
        self.args = args

    def getResult(self):
        return self.res

    def run(self):
        print('starting', self.name, 'at:', time())
        self.res = self.func(*self.args)
        print(self.name, 'finished at:', time())

# writeQ() and readQ() put an object into the queue and consume one object
# from the queue, respectively. The string 'xxx' stands in for a real object.
def writeQ(queue):
    print('producing object for Q...')
    queue.put('xxx', 1)
    print('size now', queue.qsize())

def readQ(queue):
    queue.get(1)
    print('consumed object from Q... size now', queue.qsize())

def writer(queue, loops):
    # writer() does only one thing: put one object into the queue at a
    # time, wait a moment, then repeat
    for i in range(loops):
        writeQ(queue)
        sleep(1)

def reader(queue, loops):
    # reader() does only one thing: take one object out of the queue at a
    # time, wait a moment, then repeat
    for i in range(loops):
        readQ(queue)
        sleep(randint(2, 5))

# set how many threads to run
funcs = [writer, reader]
nfuncs = range(len(funcs))

def main():
    nloops = randint(2, 5)
    q = Queue(32)
    threads = []
    for i in nfuncs:
        t = MyThread(funcs[i], (q, nloops), funcs[i].__name__)
        threads.append(t)
    for i in nfuncs:
        threads[i].start()
    for i in nfuncs:
        threads[i].join()
        print(threads[i].getResult())
    print('all done')

if __name__ == '__main__':
    main()
FAQ
Processes and threads. What is the difference between a thread and a process?
A process (sometimes called a heavyweight process) is a single execution of a program. Each process has its own address space, memory, data stack, and other auxiliary data that records its running trajectory.
Threads (sometimes referred to as lightweight processes) are somewhat similar to processes, but all threads run in the same process and share the same running environment. They can be imagined as "mini-processes" running in parallel in the main process or "main thread".
This article is a good explanation of the thread and process differences, recommended reading: Http://www.ruanyifeng.com/blo ...
Python threads. In Python, which kind of multithreaded program performs better: I/O-intensive or computationally intensive?
Because of the GIL: for I/O-oriented programs (those that call built-in operating-system C code), the GIL is released before the I/O call, allowing other threads to run while that thread waits for the I/O. A thread that performs little I/O holds the processor (and the GIL) for its entire time slice. In other words, I/O-bound Python programs make better use of a multithreaded environment than CPU-bound programs do.
Threads. How does a multi-CPU system differ from an ordinary one, and how do multithreaded programs behave on such a system?
Python threads are the C language's pthreads and are dispatched by the operating system's scheduling algorithm (for example, CFS on Linux). To let every thread get a fair share of CPU time, Python counts the bytecodes currently being executed and forces the GIL to be released once a certain threshold is reached. This also triggers the operating system's thread scheduling (although whether a context switch actually happens is decided by the operating system itself).
Pseudo code
while True:
    acquire GIL
    for i in 1000 bytecode instructions:
        do something
    release GIL   /* give the operating system a chance to do thread scheduling */
This model has no problem when there is only one CPU core: any thread that is woken up can successfully acquire the GIL (because thread scheduling happens only when the GIL is released).
But when the CPU has multiple cores, problems arise. As the pseudocode shows, there is almost no gap between "release GIL" and "acquire GIL", so by the time a thread on another core has been woken up, the main thread has usually re-acquired the GIL. The woken thread can only waste CPU time watching the other thread run along happily with the GIL, until it reaches the switching time, returns to the waiting-to-be-scheduled state, is woken again, waits again, and so on in a vicious cycle.
In short, Python multithreading on a multicore CPU is only beneficial for I/O-intensive work; as soon as at least one CPU-intensive thread is present, the GIL can drastically reduce multithreading efficiency.
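For reference, the switching threshold described above is visible from Python code: Python 2 counted bytecodes (sys.getcheckinterval(), default 100), while Python 3.2+ uses a time slice instead (sys.getswitchinterval(), default 5 ms). A quick look, assuming Python 3:

```python
import sys

interval = sys.getswitchinterval()
print(interval)  # default is 0.005 seconds (5 ms)

# The interval is tunable, e.g. rarer forced switches for CPU-bound code:
sys.setswitchinterval(0.01)
new_interval = sys.getswitchinterval()
print(new_interval)

sys.setswitchinterval(interval)  # restore the previous value
```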
Thread pool (exercise). Modify the producer-consumer code so that there is no longer exactly one producer and one consumer; instead, allow an arbitrary number of consumer threads (a thread pool), each of which may process or consume any number of products at any time.
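One possible sketch of this exercise (Python 3 syntax; the consumer count, item count, and sentinel-based shutdown are illustration choices, not the only design): a single producer feeds a Queue, several consumer threads drain it, and one sentinel per consumer shuts the pool down.

```python
from queue import Queue   # the module is named "Queue" in Python 2
from threading import Thread

NUM_CONSUMERS = 3
SENTINEL = None  # a special item telling a consumer to exit

def producer(q, n_items):
    for i in range(n_items):
        q.put(i)  # the "goods" are just integers here
    for _ in range(NUM_CONSUMERS):
        q.put(SENTINEL)  # one shutdown signal per consumer

def consumer(q, results):
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        results.append(item)  # list.append is atomic in CPython

q = Queue(32)
results = []
consumers = [Thread(target=consumer, args=(q, results)) for _ in range(NUM_CONSUMERS)]
for t in consumers:
    t.start()
producer(q, 20)
for t in consumers:
    t.join()

print(sorted(results))  # every item consumed exactly once, by some consumer
```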