Python thread Learning Record

Source: Internet
Author: User
Tags semaphore
Introduction & motivation

Consider this scenario: we have 10000 pieces of data to process. Processing each piece takes 1 second, but reading it takes only 0.1 seconds, and the pieces do not interfere with one another. How can we finish the work in the shortest time?

Before the emergence of multi-threaded (MT) programming, a running computer program consisted of a single execution sequence that ran step by step on the host's central processing unit (CPU). The program executed this way whether the task itself required sequential execution or the whole program was composed of multiple subtasks. This was true even when the subtasks were independent of each other (that is, the result of one subtask did not affect the results of the others).

If a single execution sequence is used to solve the problem above, it takes about 10000 * 0.1 + 10000 * 1 = 11000 seconds. That is obviously too long.

Can we fetch data while the computation is running? Or process several pieces of data at the same time? If so, the task can be completed much more efficiently. This is the purpose of multi-threaded programming.

Multithreading is well suited to tasks that are asynchronous in nature and require multiple concurrent activities, where the order in which the activities run may be uncertain, random, or unpredictable. Such a task can be divided into multiple execution streams, each with its own goal, and the results are then merged to obtain the final result.
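The arithmetic for the scenario can be sketched as a small estimator. This is an idealized model, assuming the items are fully independent and that workers run truly in parallel with no scheduling overhead (which CPython threads do not guarantee for CPU-bound work). The function name estimate_seconds and the worker count are hypothetical.

```python
import math

def estimate_seconds(items, read_s, process_s, workers=1):
    """Idealized wall-clock estimate: each worker reads and then processes
    its share of the items, one item after another."""
    per_item = read_s + process_s
    items_per_worker = math.ceil(items / workers)
    return items_per_worker * per_item

# One execution sequence versus ten independent workers (idealized).
print(estimate_seconds(10000, 0.1, 1.0))
print(estimate_seconds(10000, 0.1, 1.0, workers=10))
```

The second call only shows the upper bound on the possible gain; real speed-ups depend on how much of the work is I/O.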

What are threads and processes?

A process (sometimes called a heavyweight process) is one execution of a program. Each process has its own address space, memory, data stack, and other auxiliary data that tracks its execution. The operating system manages all the processes running on it and divides time among them fairly. A process can also fork or spawn new processes to perform other tasks. However, because each process has its own memory space and data stack, processes cannot share information directly and must use inter-process communication (IPC) instead.

What is a thread?

A thread (sometimes called a lightweight process) is similar to a process. The difference is that all threads run within the same process and share the same runtime environment. They can be thought of as "mini-processes" running in parallel within the main process, or "main thread".

Thread status

A thread has three parts: a beginning, an execution sequence, and an end. It has its own instruction pointer that keeps track of where it is currently running. A running thread may be preempted (interrupted) or temporarily put on hold (also called sleeping) so that other threads can run; this is called yielding. All threads in a process share the same data space, so sharing data and communicating is easier between threads than between processes.

Of course, such sharing is not without danger. If multiple threads access the same piece of data at the same time, the result may be inconsistent depending on the order in which the accesses happen. This is called a race condition.
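A minimal sketch of how a race on shared data is usually avoided, written for Python 3. The Counter class and the thread and iteration counts are hypothetical; the point is only that the read-modify-write step happens while a lock is held.

```python
import threading

class Counter:
    """A shared counter whose updates are protected by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:          # only one thread at a time may update value
            self.value += 1

def add_many(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=add_many, args=(counter, 10000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Without the lock, two threads could read the same old value and one of the increments would be lost.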

Threads are generally executed concurrently. On a single-CPU system, however, true parallelism is impossible: each thread is scheduled to run for only a short time, then gives up the CPU so that other threads can run. Some functions block until they complete, so unless they have been specifically modified for multithreading, such "greedy" functions skew the allocation of CPU time. As a result, the running time given to each thread may be unequal and unfair.

Python, threads, and the global interpreter lock (GIL)

The first thing to note is that the GIL is not a feature of the Python language, but a concept introduced by one implementation of the Python interpreter, CPython. It is like C++: the language is a set of (syntax) standards, but different compilers can be used to compile it into executable code. In the same way, the same piece of Python code can be run by different Python implementations such as CPython, PyPy, and Jython, and Jython, for example, has no GIL.

So what is the GIL in the CPython implementation? GIL stands for Global Interpreter Lock. To avoid any misunderstanding, let's look at the official explanation:

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

Although Python fully supports multi-threaded programming, the C implementation of the interpreter is not thread-safe under fully parallel execution. In fact, the interpreter is protected by a global interpreter lock, which ensures that only one Python thread is executing at any time.

In a multi-threaded environment, the Python virtual machine is executed as follows:

  1. Set GIL

  2. Switch to a thread for execution

  3. Run until either:

    • A specified number of bytecode instructions has been executed, or

    • The thread voluntarily gives up control (for example by calling time.sleep(0))

  4. Put the thread back to sleep

  5. Unlock GIL

  6. Repeat the preceding steps again

For all I/O-oriented code (which calls into built-in operating system C code), the GIL is released before the I/O call so that other threads can run while this thread waits for I/O. A thread that does little I/O will hold the processor (and the GIL) for its entire time slice. In other words, I/O-bound Python programs benefit much more from a multi-threaded environment than CPU-bound programs.
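The effect can be sketched in Python 3 with time.sleep() standing in for a blocking I/O call, since sleeping also releases the GIL. The fake_io function and the delays are hypothetical; the elapsed time should be close to the longest single wait rather than the sum of the waits.

```python
import threading
import time

def fake_io(seconds):
    """Stand-in for a blocking I/O call; the GIL is released while sleeping."""
    time.sleep(seconds)

delays = [0.2, 0.2, 0.2]

start = time.monotonic()
threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print('sum of waits: %.1f s, elapsed: %.2f s' % (sum(delays), elapsed))
```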

Exit thread

When a thread finishes its work, it exits. A thread can also exit by calling an exit function such as thread.exit(), or by using one of the standard Python ways to exit a process, such as calling sys.exit() or raising a SystemExit exception. However, you cannot directly "kill" a thread.
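A small Python 3 sketch of the sys.exit() case, using the threading module. The worker function and the events list are hypothetical. When sys.exit() is called inside a thread started this way, the SystemExit exception ends only that thread and the main thread carries on.

```python
import sys
import threading

events = []

def worker():
    events.append('worker started')
    sys.exit()                        # raises SystemExit in this thread only
    events.append('never reached')    # not executed

t = threading.Thread(target=worker)
t.start()
t.join()
events.append('main thread continues')
print(events)
```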

Using threads in Python

Python supports multi-threaded programming on Win32, Linux, Solaris, MacOS, *BSD, and other Unix-like systems. Python uses POSIX-compatible threads, namely pthreads.

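By default, thread support is available. To check, try importing the thread module in the interactive interpreter:

    >>> import thread

If no error is reported, threads are available.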

Python's threading modules

Python provides several modules for multi-threaded programming, including thread, threading, and Queue. The thread and threading modules allow programmers to create and manage threads. The thread module supports basic threads and locks, while threading provides higher-level and more powerful thread management. The Queue module lets you create a queue data structure that can be used to share data between multiple threads.

Core tip: avoid using the thread module

We do not recommend the thread module, for the following reasons.

  1. The higher-level threading module is more advanced and offers more complete thread support, and attributes of the thread module may conflict with those of threading. In addition, the low-level thread module has very few synchronization primitives (in fact only one), while the threading module has many.

  2. You have no control over when your process ends. When the main thread ends, all other threads are forced to end too, without warning and without proper cleanup. The threading module, at least, ensures that important child threads finish before the process exits.

Thread module

In addition to spawning threads, the thread module also provides a basic synchronization data structure, the lock object (also known as a primitive lock, simple lock, mutual exclusion lock, mutex, or binary semaphore).

Thread module functions

  • start_new_thread(function, args, kwargs=None): spawns a new thread and calls function in it with the given args and optional kwargs.

  • allocate_lock(): allocates a lock object of type LockType.

  • exit(): tells the thread to exit.

  • acquire(wait=None): (lock object method) attempts to acquire the lock.

  • locked(): (lock object method) returns True if the lock is held, otherwise False.

  • release(): (lock object method) releases the lock.

The following is an example of using thread:

    # Python 2 (in Python 3 the thread module is named _thread)
    import thread
    from time import sleep, time

    def loop(num):
        print('start loop at:', time())
        sleep(num)
        print('loop done at:', time())

    def loop1(num):
        print('start loop 1 at:', time())
        sleep(num)
        print('loop 1 done at:', time())

    def main():
        print('starting at:', time())
        thread.start_new_thread(loop, (4,))
        thread.start_new_thread(loop1, (5,))
        sleep(6)
        print('all DONE at:', time())

    if __name__ == '__main__':
        main()

Output:

    ('starting at:', 1489387024.886667)
    ('start loop at:', 1489387024.88705)
    ('start loop 1 at:', 1489387024.887277)
    ('loop done at:', 1489387028.888182)
    ('loop 1 done at:', 1489387029.888904)
    ('all DONE at:', 1489387030.889918)

start_new_thread() requires its first two arguments, so even if the function we want to run takes no arguments, an empty tuple must be passed.
Why do we need sleep(6)? Because if we do not hold the main thread back, it runs on to the next statement, prints "all DONE", and exits, which shuts down the two threads running loop() and loop1().

Is there a better way to synchronize than the unreliable sleep()? The answer is to use locks. With locks, the main thread can exit as soon as both threads have finished.

    # -*- coding: utf-8 -*-
    import thread
    from time import sleep, time

    loops = [4, 2]

    def loop(nloop, nsec, lock):
        print('start loop %s at: %s' % (nloop, time()))
        sleep(nsec)
        print('loop %s done at: %s' % (nloop, time()))
        # Each thread is handed a lock that has already been acquired.
        # When sleep() is over, release it to tell the main thread
        # that this thread has finished.
        lock.release()

    def main():
        print('starting at:', time())
        locks = []
        nloops = range(len(loops))

        for i in nloops:
            # thread.allocate_lock() creates a lock for the list
            lock = thread.allocate_lock()
            # acquire each lock before any thread starts
            lock.acquire()
            locks.append(lock)

        for i in nloops:
            # start one thread per loop, passing its loop number,
            # sleep time, and lock to loop()
            thread.start_new_thread(loop, (i, loops[i], locks[i]))

        for i in nloops:
            # each thread releases its own lock when it ends;
            # spin here until both locks have been released
            while locks[i].locked():
                pass

        print('all DONE at:', time())

    if __name__ == '__main__':
        main()

Why don't we start the threads in the same loop that creates the locks? There are two reasons:

  1. We want the threads synchronized, so that "all the horses leave the gate at the same time".

  2. Acquiring a lock takes some time. If a thread finishes "too fast", it might be done before the remaining locks have even been acquired.

Threading module

The threading module provides not only the Thread class, but also a variety of very useful synchronization mechanisms.

The threading module contains the following objects:

  1. Thread: an object that represents a single thread of execution.

  2. Lock: primitive lock object (the same as the lock object in the thread module).

  3. RLock: re-entrant lock object, which allows a single thread to acquire a lock it already holds again (recursive locking).

  4. Condition: condition variable object, which makes a thread wait until another thread satisfies a certain "condition", such as a change of state or of some value.

  5. Event: a general form of condition variable. Any number of threads can wait for an event; when the event occurs, all waiting threads are awakened.

  6. Semaphore: provides a "waiting room"-like structure for threads waiting on a lock.

  7. BoundedSemaphore: Similar to Semaphore, but it cannot exceed the initial value.

  8. Timer: Similar to Thread, but it only starts running after a period of time.
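The list above has no semaphore example, so here is a minimal Python 3 sketch of the "waiting room" idea. The limit of two slots, the worker function, and the stats dictionary are all hypothetical; the sketch only records how many workers were inside the guarded section at once.

```python
import threading
import time

MAX_CONCURRENT = 2
slots = threading.BoundedSemaphore(MAX_CONCURRENT)
state_lock = threading.Lock()
stats = {'active': 0, 'peak': 0}

def worker():
    with slots:                       # blocks while all slots are taken
        with state_lock:
            stats['active'] += 1
            stats['peak'] = max(stats['peak'], stats['active'])
        time.sleep(0.05)              # simulated work
        with state_lock:
            stats['active'] -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('peak concurrency:', stats['peak'])
```

With six workers and two slots, the peak never exceeds two, while the other workers wait their turn.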

Daemon thread

Another reason to avoid the thread module is that it does not support daemon threads. When the main thread exits, all child threads are killed, whether or not they are still working. Sometimes this behavior is not what we want, and this is where daemon threads come in.
The threading module supports daemon threads. A daemon thread is typically a server that waits for client requests; if no client sends a request, it simply waits. Marking a thread as a daemon means that the thread is not important: when the process exits, it does not need to wait for that thread to finish.
If your main thread should be able to exit without waiting for the child threads to complete, set the daemon flag of those threads. That is, before starting the thread (before calling thread.start()), call setDaemon() to set the daemon flag (thread.setDaemon(True)), marking the thread as "unimportant".
If you want to wait for the child threads to finish before exiting, do nothing, or explicitly call thread.setDaemon(False) to make sure the daemon flag is False. You can call thread.isDaemon() to check the value of the flag. A new child thread inherits the daemon flag of its parent. The Python program as a whole ends once all non-daemon threads have exited, that is, when no non-daemon threads remain in the process.
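A minimal Python 3 sketch of the difference, assuming the daemon attribute is used instead of setDaemon(). The background and important functions are hypothetical. The program only waits for the non-daemon thread; the daemon thread is still running when the main thread reaches the end.

```python
import threading
import time

def background():
    """Runs forever; as a daemon it does not keep the process alive."""
    while True:
        time.sleep(0.05)

def important():
    time.sleep(0.1)

d = threading.Thread(target=background)
d.daemon = True                 # must be set before start()
w = threading.Thread(target=important)

d.start()
w.start()
w.join()                        # wait only for the important thread

print('daemon flag:', d.daemon, 'daemon alive:', d.is_alive())
print('worker alive:', w.is_alive())
```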

Thread class

The Thread class provides the following methods and attributes:

  • run(): method representing the thread's activity.

  • start(): starts the thread's activity.

  • join([timeout]): waits until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates, either normally or through an unhandled exception, or until the optional timeout occurs.

  • is_alive(): returns whether the thread is alive.

  • name: attribute that sets or returns the thread's name.

  • daemon: attribute that returns or sets the thread's daemon flag; it must be set before start() is called.
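A short Python 3 sketch of join() with and without a timeout. The slow function and the chosen durations are hypothetical. Because join() always returns None, is_alive() is checked afterwards to see whether the wait timed out.

```python
import threading
import time

def slow():
    time.sleep(0.5)

t = threading.Thread(target=slow)
t.start()

t.join(0.05)                    # give up waiting after 0.05 seconds
still_running = t.is_alive()    # True here means the join timed out

t.join()                        # no timeout: block until the thread ends
finished = not t.is_alive()

print(still_running, finished)
```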

With the Thread class, there are several ways to create a thread. Here we introduce three similar ones.

  • Create a Thread instance and pass it a function

  • Create a Thread instance and pass it a callable class object

  • Derive a subclass from Thread and create an instance of that subclass

The following examples show the three approaches in turn, starting with passing a function:

    # -*- coding: utf-8 -*-
    # Create a Thread instance and pass it a function
    import threading
    from time import sleep, time

    loops = [4, 2]

    def loop(nloop, nsec):
        print('start loop %s at: %s' % (nloop, time()))
        sleep(nsec)
        print('loop %s done at: %s' % (nloop, time()))

    def main():
        print('starting at:', time())
        threads = []
        nloops = range(len(loops))

        for i in nloops:
            t = threading.Thread(target=loop, args=(i, loops[i]))
            threads.append(t)

        for i in nloops:
            # start all threads
            threads[i].start()

        for i in nloops:
            # wait for all threads to finish: join() waits until the thread
            # ends or, if a timeout is given, until the timeout expires.
            # Using join() is clearer than an infinite loop that waits for
            # a lock to be released (such a lock is called a "spin lock").
            threads[i].join()

        print('all DONE at:', time())

    if __name__ == '__main__':
        main()

Another approach, similar to passing a function, is to create the thread with an instance of a callable class that is executed when the thread starts. This is a more object-oriented way of doing multi-threaded programming. Compared with one or more functions, it is more flexible, because a class instance can use the full power of classes to hold more information.

    # -*- coding: utf-8 -*-
    # Create a Thread instance and pass it a callable class object
    from threading import Thread
    from time import sleep, time

    loops = [4, 2]

    class ThreadFunc(object):
        def __init__(self, func, args, name=""):
            self.name = name
            self.func = func
            self.args = args

        def __call__(self):
            # When the new thread starts, the Thread object calls our
            # ThreadFunc object, which invokes this special method.
            self.func(*self.args)

    def loop(nloop, nsec):
        print('start loop %s at: %s' % (nloop, time()))
        sleep(nsec)
        print('loop %s done at: %s' % (nloop, time()))

    def main():
        print('starting at:', time())
        threads = []
        nloops = range(len(loops))

        for i in nloops:
            t = Thread(target=ThreadFunc(loop, (i, loops[i]), loop.__name__))
            threads.append(t)

        for i in nloops:
            # start all threads
            threads[i].start()

        for i in nloops:
            # wait for all threads to finish
            threads[i].join()

        print('all DONE at:', time())

    if __name__ == '__main__':
        main()

The last example shows how to subclass Thread, which is very similar to creating a callable class in the previous example. Creating the thread through a subclass makes the code clearer.

    # -*- coding: utf-8 -*-
    # Derive a subclass from Thread and create an instance of it
    from threading import Thread
    from time import sleep, time

    loops = [4, 2]

    class MyThread(Thread):
        def __init__(self, func, args, name=""):
            super(MyThread, self).__init__()
            self.name = name
            self.func = func
            self.args = args

        def getResult(self):
            return self.res

        def run(self):
            # run() is called in the new thread once start() is invoked
            print 'starting', self.name, 'at:', time()
            self.res = self.func(*self.args)
            print self.name, 'finished at:', time()

    def loop(nloop, nsec):
        print('start loop %s at: %s' % (nloop, time()))
        sleep(nsec)
        print('loop %s done at: %s' % (nloop, time()))

    def main():
        print('starting at:', time())
        threads = []
        nloops = range(len(loops))

        for i in nloops:
            t = MyThread(loop, (i, loops[i]), loop.__name__)
            threads.append(t)

        for i in nloops:
            # start all threads
            threads[i].start()

        for i in nloops:
            # wait for all threads to finish
            threads[i].join()

        print('all DONE at:', time())

    if __name__ == '__main__':
        main()

In addition to various synchronization objects and thread objects, the threading module also provides some functions.

  • active_count(): returns the number of currently active Thread objects.

  • current_thread(): returns the current Thread object.

  • enumerate(): returns a list of the currently active threads.

  • settrace(func): sets a trace function for all threads.

  • setprofile(func): sets a profile function for all threads.
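The first three functions can be tried out with a minimal Python 3 sketch. The idle function, the stop event, and the worker-N thread names are hypothetical; the workers simply park until the main thread lets them go, so that they show up as active.

```python
import threading

stop = threading.Event()

def idle():
    stop.wait()                 # park until the main thread sets the event

workers = [threading.Thread(target=idle, name='worker-%d' % i)
           for i in range(3)]
for t in workers:
    t.start()

# Inspect the threads while the workers are still alive.
names = sorted(t.name for t in threading.enumerate()
               if t.name.startswith('worker-'))
count_while_running = threading.active_count()
print(threading.current_thread().name, count_while_running, names)

stop.set()
for t in workers:
    t.join()
```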

Lock & RLock

A primitive lock is a synchronization primitive that is either locked or unlocked. Its two methods, acquire() and release(), lock and unlock it.
RLock, the re-entrant lock, is a synchronization primitive similar to Lock, except that the same thread can acquire it multiple times.

Lock does not support recursive locking: even within the same thread, the lock must be released before it can be acquired again. When re-entrant locking is needed, RLock is usually recommended instead. RLock keeps track of the "owning thread" and the "recursion level"; repeated lock requests from the same thread only increment a counter. Each call to release() decrements the counter, and the lock is released when the counter reaches 0, so acquire() and release() must appear in pairs.

    from time import sleep
    from threading import current_thread, RLock, Thread

    lock = RLock()

    def show(i):
        with lock:  # the same thread acquires the lock a second time
            print current_thread().name, i
            sleep(0.1)

    def test():
        with lock:
            for i in range(3):
                show(i)

    for i in range(2):
        Thread(target=test).start()

Event

Events are used for communication between threads: one thread signals an event, and one or more other threads wait for it.
An Event coordinates threads through an internal flag. wait() blocks the calling thread until the flag is True. set() sets the flag to True and clear() resets it to False. isSet() returns the current state of the flag.

    from threading import Event, Thread

    def test_event():
        e = Event()

        def test():
            for i in range(5):
                print 'start wait'
                e.wait()
                # if clear() is not called, the flag stays True
                # and wait() no longer blocks
                e.clear()
                print i

        Thread(target=test).start()
        return e

    e = test_event()
    # each call to e.set() lets the waiting thread run one more iteration

Condition

A condition variable, like Lock, is a synchronization primitive. It is used when a thread needs to wait for a particular state change or event.

You can think of it this way: in addition to the lock pool that comes with its Lock, a Condition also contains a wait pool. Threads in the wait pool are in the waiting/blocked state until another thread calls notify() or notifyAll(); once notified, a thread moves to the lock pool and waits for the lock.

Constructor:
Condition([lock/rlock])

Condition has the following methods:

  • acquire([timeout]) / release(): call the corresponding method of the associated lock.

  • wait([timeout]): puts the calling thread into the Condition's wait pool to wait for a notification, and releases the lock. The thread must hold the lock before calling this method; otherwise an exception is raised.

  • notify(): picks one thread from the wait pool and notifies it; the notified thread automatically calls acquire() to try to get the lock (it enters the lock pool), while the other threads keep waiting in the wait pool. Calling this method does not release the lock. The thread must hold the lock before calling this method; otherwise an exception is raised.

  • notifyAll(): notifies all threads in the wait pool; they all enter the lock pool and try to get the lock. Calling this method does not release the lock. The thread must hold the lock before calling this method; otherwise an exception is raised.

    from threading import Condition, current_thread, Thread
    from time import sleep

    con = Condition()

    def tc1():
        with con:
            for i in range(5):
                print current_thread().name, i
                sleep(0.3)
                if i == 3:
                    con.wait()

    def tc2():
        with con:
            for i in range(5):
                print current_thread().name, i
                sleep(0.1)
                con.notify()

    Thread(target=tc1).start()
    Thread(target=tc2).start()

Output:

    Thread-1 0
    Thread-1 1
    Thread-1 2
    Thread-1 3    # gives up the lock
    Thread-2 0
    Thread-2 1
    Thread-2 2
    Thread-2 3
    Thread-2 4
    Thread-1 4    # re-acquires the lock and continues

Only a thread that holds the lock can call wait() and notify(), so they must be called before the lock is released.
Once wait() has released the lock, other threads can enter the wait state as well. notifyAll() wakes all waiting threads so that each can acquire the lock in turn and finish its work.
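A common way to use these rules is to wait in a loop that re-checks the state being waited for. Here is a minimal Python 3 sketch of that pattern; the items and received lists and the two functions are hypothetical.

```python
import threading

cond = threading.Condition()
items = []
received = []

def consumer():
    with cond:
        while not items:        # re-check the state after every wake-up
            cond.wait()         # releases the lock while waiting
        received.append(items.pop())

def producer():
    with cond:
        items.append('data')
        cond.notify()           # the lock is released when the block ends

c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()

print(received)
```

The loop also covers the case where the producer runs first: the consumer then finds the item already present and never waits.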

Producer-consumer issues and Queue module

We now use the classic producer-consumer example to introduce the Queue module.

In the producer-consumer scenario, a producer produces goods and places them in a data structure such as a queue. The time it takes to produce each item cannot be determined in advance, and the time at which the consumer consumes the producer's goods is also uncertain.

Common attributes of the Queue module:

  • Queue(size): creates a Queue object of the given size.

  • qsize(): returns the size of the queue (the value is approximate, because the queue may be modified by other threads while it is being returned).

  • empty(): returns True if the queue is empty, otherwise False.

  • full(): returns True if the queue is full, otherwise False.

  • put(item, block=True): puts item in the queue. If block is true (the default), the call blocks until there is space in the queue.

  • get(block=True): takes an object from the queue. If block is true (the default), the call blocks until an object is available.

The Queue module can be used for communication between threads, letting them share data.

Now we create a queue in which the producer (thread) places newly produced goods for the consumer (thread) to use.

    # -*- coding: utf-8 -*-
    from Queue import Queue
    from random import randint
    from time import sleep, time
    from threading import Thread

    class MyThread(Thread):
        def __init__(self, func, args, name=""):
            super(MyThread, self).__init__()
            self.name = name
            self.func = func
            self.args = args

        def getResult(self):
            return self.res

        def run(self):
            # run() is called in the new thread once start() is invoked
            print 'starting', self.name, 'at:', time()
            self.res = self.func(*self.args)
            print self.name, 'finished at:', time()

    # writeQ() and readQ() put one object into the queue and consume one
    # object from it, respectively. The string 'xxx' stands for an object.
    def writeQ(queue):
        print 'producing object for Q...'
        queue.put('xxx', 1)
        print 'size now', queue.qsize()

    def readQ(queue):
        queue.get(1)
        print 'consumed object from Q... size now', queue.qsize()

    def writer(queue, loops):
        # put one object into the queue, wait a while, and repeat
        for i in range(loops):
            writeQ(queue)
            sleep(1)

    def reader(queue, loops):
        # take one object from the queue, wait a while, and repeat
        for i in range(loops):
            readQ(queue)
            sleep(randint(2, 5))

    # the functions to run, one thread each
    funcs = [writer, reader]
    nfuncs = range(len(funcs))

    def main():
        nloops = randint(10, 20)
        q = Queue(32)
        threads = []

        for i in nfuncs:
            t = MyThread(funcs[i], (q, nloops), funcs[i].__name__)
            threads.append(t)

        for i in nfuncs:
            threads[i].start()

        for i in nfuncs:
            threads[i].join()
            print threads[i].getResult()

        print 'all DONE'

    if __name__ == '__main__':
        main()

FAQ

Process and thread. What is the difference between a thread and a process?

A process (sometimes called a heavyweight process) is one execution of a program. Each process has its own address space, memory, data stack, and other auxiliary data that tracks its execution.
A thread (sometimes called a lightweight process) is similar to a process. The difference is that all threads run within the same process and share the same runtime environment. They can be thought of as "mini-processes" running in parallel within the main process, or "main thread".

The following article explains the difference between threads and processes well: http://www.ruanyifeng.com/blo...

Python threads. In Python, which kind of multi-threaded program performs better, I/O-bound or CPU-bound?

Because of the GIL, for all I/O-oriented code (which calls into built-in operating system C code) the GIL is released before the I/O call so that other threads can run while this thread waits for I/O. A thread that does little I/O will hold the processor (and the GIL) for its entire time slice. In other words, I/O-bound Python programs benefit much more from a multi-threaded environment than CPU-bound programs.

Threads. How does a multi-CPU system differ from an ordinary system? How does a multi-threaded program behave on such a system?

Python threads are native C pthreads and are scheduled by the operating system's scheduling algorithm (for example, CFS on Linux). To give each thread a fair share of CPU time, Python counts the bytecode instructions executed and forces the GIL to be released once a certain threshold is reached. This also triggers a round of operating system thread scheduling (whether a context switch actually happens is, of course, up to the operating system).
Pseudocode

    while True:
        acquire GIL
        for i in 1000:
            do something
        release GIL
        /* Give Operating System a chance to do thread scheduling */

This scheme causes no problems when there is only one CPU core: whenever a thread is woken up, it can acquire the GIL successfully (because a thread switch is only triggered once the GIL has been released).
With multiple CPU cores, however, a problem arises. As the pseudocode shows, there is almost no gap between releasing the GIL and acquiring it again. So by the time a thread on another core has been woken up, the main thread has in most cases already re-acquired the GIL. The woken thread can then only burn CPU time while watching the other thread run happily with the GIL. When its time slice runs out, it goes back to waiting to be scheduled, is woken up again, and waits again, in a vicious circle.
To summarize briefly: on multi-core CPUs, Python's multithreading only helps for I/O-bound work. As soon as at least one CPU-bound thread exists, the efficiency of multithreading drops sharply because of the GIL.
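The bytecode-counting scheme described above is how older CPython versions worked. To our knowledge, CPython 3.2 and later replaced the instruction count with a time-based switch interval, which can be inspected with the minimal Python 3 sketch below. The 5 ms default mentioned in the comment is what the CPython documentation states; treat it as an assumption for your own build.

```python
import sys

# Seconds a thread may hold the GIL before it is asked to release it
# (documented default: 0.005, i.e. 5 ms).
interval = sys.getswitchinterval()
print('switch interval: %.3f s' % interval)
```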

Thread pool. Modify the producer-consumer code so that, instead of one producer and one consumer, there can be any number of consumer threads (a thread pool), each of which can process or consume any number of products at any time.
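One possible answer, as a minimal Python 3 sketch rather than the definitive solution. It assumes the Python 3 module name queue (Queue in Python 2), a hypothetical pool of three consumers, and a None sentinel that tells each consumer to stop.

```python
import queue
import threading

NUM_CONSUMERS = 3
STOP = None                      # sentinel telling a consumer to exit

def producer(q, n_items):
    for i in range(n_items):
        q.put(i)                 # blocks while the queue is full
    for _ in range(NUM_CONSUMERS):
        q.put(STOP)              # one sentinel per consumer

def consumer(q, consumed, lock):
    while True:
        item = q.get()           # blocks until an item is available
        if item is STOP:
            break
        with lock:
            consumed.append(item)

q = queue.Queue(32)
consumed = []
lock = threading.Lock()

consumers = [threading.Thread(target=consumer, args=(q, consumed, lock))
             for _ in range(NUM_CONSUMERS)]
for t in consumers:
    t.start()

p = threading.Thread(target=producer, args=(q, 20))
p.start()
p.join()
for t in consumers:
    t.join()

print(sorted(consumed))
```

Each consumer takes whatever item is next, so the split of work between consumers varies from run to run, but every item is consumed exactly once.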
