First, what is a thread:
In a traditional operating system, each process has its own address space and, by default, a single thread of control.
A thread, as the name suggests, is like an assembly line at work. An assembly line must belong to a workshop, and a workshop at work is a process.
The workshop gathers resources together, so it is a unit of resources, and every workshop has at least one assembly line.
An assembly line needs power to run, and the power supply corresponds to the CPU.
A process therefore only groups resources together (it is a resource unit, or a collection of resources), while the thread is the unit that actually executes on the CPU.
Multithreading (that is, multiple threads of control) means that one process contains several threads, all sharing the process's address space. This is like a workshop with several assembly lines that share the workshop's resources.
For example, the Beijing Metro and the Shanghai Metro are different processes, while Line 13 of the Beijing Metro is a thread. All Beijing Metro lines share the resources of the Beijing Metro; for instance, any passenger can be carried by any line.
Second, the thread creation overhead is small:
Suppose our software is a factory with several assembly lines. The lines need a power supply to run, and there is only one power source (a single-core CPU).
A workshop is a process, and a workshop has at least one assembly line (a process has at least one thread).
Creating a process means building a workshop: space has to be allocated, and at least one assembly line has to be built inside that space.
Creating a thread only means building one more assembly line inside an existing workshop. No new space has to be allocated, so the creation cost is small.
Are processes competitors and threads collaborators?
Workshops compete directly for power. Different processes compete with one another because they are programs written by different programmers: Xunlei (Thunder) grabs network bandwidth from other processes, and 360 kills other processes as if they were viruses.
Assembly lines in the same workshop cooperate. Threads in the same process are started from within one program, so they work together: the threads inside Xunlei cooperate and do not attack each other.
Third, the differences between threads and processes:
1. Threads share the address space of the process that created them; each process has its own address space.
2. Threads have direct access to the data segment of their process; a process has its own copy of the data segment of its parent process.
3. Threads can communicate directly with other threads of the same process; processes must use interprocess communication to talk to sibling processes.
4. New threads are easy to create; a new process requires duplicating the parent process.
5. Threads can exercise considerable control over other threads of the same process; a process can only exercise control over its child processes.
6. Changes to the main thread (cancellation, priority changes, and so on) may affect the behavior of the other threads of the process; changes to the parent process do not affect child processes.
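Point 3 can be sketched in a few lines. This is only an illustration, not code from the original article: the names producer, consumer, and run_demo, and the use of None as an end marker, are assumptions made for the example. The two threads exchange data through an ordinary queue.Queue object that both can see, with no interprocess communication involved.

```python
import queue
import threading

def producer(q, count):
    # Hypothetical helper: put `count` integers on the shared queue,
    # then a None sentinel to signal that nothing more is coming.
    for i in range(count):
        q.put(i)
    q.put(None)

def consumer(q, received):
    # Hypothetical helper: read items until the sentinel arrives and
    # store them in a list that the caller can also see.
    while True:
        item = q.get()
        if item is None:
            break
        received.append(item)

def run_demo(count=5):
    q = queue.Queue()   # an ordinary object in the process, visible to both threads
    received = []
    threads = [
        threading.Thread(target=producer, args=(q, count)),
        threading.Thread(target=consumer, args=(q, received)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

if __name__ == '__main__':
    print(run_demo())
```

Two separate processes could not share the queue and the list this way; they would need something like a pipe or a multiprocessing.Queue instead.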
Fourth, why use multithreading:
Multithreading means starting multiple threads within one process. Put simply: if several tasks need to share one address space, they must run as multiple threads inside a single process. In more detail, there are four points:
1. Multiple threads share the address space of one process.
2. Threads are more lightweight than processes and are easier to create and destroy. In many operating systems, creating a thread is 10 to 100 times faster than creating a process, which is useful when a large number of threads must be created and changed dynamically and quickly.
3. If all the threads are CPU-bound, there is no performance gain. But if there is a lot of computation and also a lot of I/O, multiple threads let these activities overlap, which speeds up the program.
4. On a multi-CPU system, you can start multiple threads to make full use of the cores, at much lower cost than starting processes. (This point does not apply to Python threads, because of the GIL introduced later in this article.)
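The overlap described in point 3 can be observed with a small timing sketch. It rests on assumptions: fake_io, run_serial, and run_threaded are hypothetical names, and time.sleep stands in for a real blocking I/O call. The exact timings depend on the machine.

```python
import threading
import time

def fake_io(delay):
    # time.sleep stands in for a blocking I/O call such as a network request.
    time.sleep(delay)

def run_serial(n, delay):
    # Run the n waits one after another and return the elapsed time.
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start

def run_threaded(n, delay):
    # Run the n waits in n threads so that they overlap.
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    print('serial:   %.2fs' % run_serial(4, 0.2))
    print('threaded: %.2fs' % run_threaded(4, 0.2))
```

The serial version waits for each task in turn, so its time grows with the number of tasks. The threaded version should take roughly as long as a single wait, because all the waits happen at the same time.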
Fifth, an example of multithreading in use:
Start a word processor process. It has to do more than one thing at a time: listen for keyboard input, process the text, and automatically save the text to disk. All three tasks operate on the same data, so separate processes are not suitable. The only option is to run three threads concurrently within one process. With a single thread, the program could only do one thing at a time: while it waits for keyboard input it cannot process text or auto-save, and while it auto-saves it cannot accept input or process text.
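A rough sketch of that word processor, under heavy assumptions: the Document class, the three function names, and the sleep intervals are all hypothetical, keystrokes are simulated from a string, and "saving to disk" is just copying into an attribute. What matters is that all three threads work on the same Document object.

```python
import threading
import time

class Document:
    """Hypothetical shared state that all three threads work on."""
    def __init__(self):
        self.chars = []                 # raw keystrokes
        self.word_count = 0             # result of "processing" the text
        self.saved = ''                 # last auto-saved snapshot (stands in for the disk)
        self.lock = threading.Lock()    # protects the fields above
        self.done = threading.Event()   # set when typing has finished

def keyboard_input(doc, text):
    # Simulate the user typing one character at a time.
    for ch in text:
        with doc.lock:
            doc.chars.append(ch)
        time.sleep(0.01)
    doc.done.set()

def process_text(doc):
    # Keep the word count up to date while typing continues.
    while True:
        finished = doc.done.is_set()    # check first, so the last pass sees all text
        with doc.lock:
            doc.word_count = len(''.join(doc.chars).split())
        if finished:
            break
        time.sleep(0.005)

def auto_save(doc):
    # Periodically copy the current text to the "disk".
    while True:
        finished = doc.done.is_set()
        with doc.lock:
            doc.saved = ''.join(doc.chars)
        if finished:
            break
        time.sleep(0.02)

def run_demo(text):
    doc = Document()
    threads = [
        threading.Thread(target=keyboard_input, args=(doc, text)),
        threading.Thread(target=process_text, args=(doc,)),
        threading.Thread(target=auto_save, args=(doc,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return doc

if __name__ == '__main__':
    doc = run_demo('hello from threads')
    print(doc.word_count, repr(doc.saved))
```

Because the three threads live in one process, none of them needs a copy of the text; each one reads or updates the same list while the others keep running.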
Sixth, introduction to the threading module:
The multiprocessing module deliberately mirrors the interface of the threading module, so the two are very similar in use, and the interface is not described in detail again here.
Seventh, two ways to start a thread:
# Way one: pass a target function to Thread
from threading import Thread
import time

def sayhi(name):
    time.sleep(2)
    print('%s say hello' % name)

if __name__ == '__main__':
    t = Thread(target=sayhi, args=('egon',))
    t.start()
    print('main thread')
# Way two: subclass Thread and override run()
from threading import Thread
import time

class Sayhi(Thread):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def run(self):
        time.sleep(2)
        print('%s say hello' % self.name)

if __name__ == '__main__':
    t = Sayhi('egon')
    t.start()
    print('main thread')
Eighth, the difference between starting multiple threads and starting multiple child processes under one process:
Which one starts faster:
from threading import Thread
from multiprocessing import Process
import os

def work():
    print('hello')

if __name__ == '__main__':
    # Start a thread under the main process
    t = Thread(target=work)
    t.start()
    print('main thread/main process')
    # Typical output:
    #   hello
    #   main thread/main process

    # Start a child process under the main process
    t = Process(target=work)
    t.start()
    print('main thread/main process')
    # Typical output:
    #   main thread/main process
    #   hello
Compare the PIDs:
from threading import Thread
from multiprocessing import Process
import os

def work():
    print('hello', os.getpid())

if __name__ == '__main__':
    # Part 1: start multiple threads under the main process; each has the same PID as the main process
    t1 = Thread(target=work)
    t2 = Thread(target=work)
    t1.start()
    t2.start()
    print('main thread/main process PID', os.getpid())

    # Part 2: start multiple processes; each process has a different PID
    p1 = Process(target=work)
    p2 = Process(target=work)
    p1.start()
    p2.start()
    print('main thread/main process PID', os.getpid())
Do threads in the same process share the process's data?
from threading import Thread
from multiprocessing import Process
import os

def work():
    global n
    n = 0

if __name__ == '__main__':
    # n = 100
    # p = Process(target=work)
    # p.start()
    # p.join()
    # print('main', n)  # The child process p has certainly changed its own global n to 0, but only its own copy; n in the parent process is still 100

    n = 1
    t = Thread(target=work)
    t.start()
    t.join()
    print('main', n)  # Prints 0, because threads in the same process share the process's data
Ninth, daemon threads
Whether it is a process or a thread, the same rule applies: a daemon xxx is destroyed once the main xxx has finished running.
It should be emphasized that "finished running" is not the same as "terminated".
1. For the main process, finished running means that the main process's code has finished executing.
2. For the main thread, finished running means that all non-daemon threads in the process have finished; only then does the main thread count as finished.
Detailed explanation:
1. The main process counts as finished once its code has finished running (daemon processes are reclaimed at this point). The main process then waits for all non-daemon child processes to finish and reclaims their resources (otherwise zombie processes would be left behind) before it ends.
2. The main thread counts as finished only after all other non-daemon threads have finished (daemon threads are reclaimed at this point). Because the end of the main thread means the end of the process, and the resources of the process are reclaimed as a whole, the process must make sure that all non-daemon threads have finished before it ends.
from threading import Thread
import time

def sayhi(name):
    time.sleep(2)
    print('%s say hello' % name)

if __name__ == '__main__':
    t = Thread(target=sayhi, args=('egon',))
    t.daemon = True  # must be set before t.start(); replaces t.setDaemon(True), which is deprecated since Python 3.10
    t.start()
    print('main thread')
    print(t.is_alive())
    # Output:
    #   main thread
    #   True
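The rule that the daemon flag must be set before start() can be checked directly. The sketch below is an illustration with hypothetical function names (background, try_set_daemon_after_start, daemon_via_constructor). It relies on two documented behaviors of the threading module: assigning to daemon on a started thread raises RuntimeError, and the Thread constructor accepts a daemon keyword argument.

```python
import threading
import time

def background():
    # Hypothetical task that just waits briefly.
    time.sleep(0.2)

def try_set_daemon_after_start():
    t = threading.Thread(target=background)
    t.start()
    try:
        t.daemon = True          # too late: the thread has already been started
    except RuntimeError as exc:
        result = 'RuntimeError: %s' % exc
    else:
        result = 'no error'
    t.join()
    return result

def daemon_via_constructor():
    # The daemon flag can also be passed directly to the constructor.
    t = threading.Thread(target=background, daemon=True)
    t.start()
    flag = t.daemon
    t.join()
    return flag

if __name__ == '__main__':
    print(try_set_daemon_after_start())
    print(daemon_via_constructor())
```

Passing daemon=True to the constructor avoids the ordering problem altogether, since the flag is fixed before the thread can be started.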
Tenth, introduction to the GIL
The GIL is, in essence, a mutex. Like every mutex, it turns concurrent execution into serial execution, so that shared data can only be modified by one task at a time, which keeps the data safe.
One thing is certain: to protect different data, you should use different locks.
To understand the GIL, first be clear about one point: each time you run a Python program, a separate process is created. For example, python test.py, python aaa.py, and python bbb.py produce three different Python processes.
# 1 All data is shared, and code, being a kind of data, is shared by all threads as well (all the code in test.py and all the code of the CPython interpreter). For example, if test.py defines a function work, every thread in the process can access the code of work, so we can start three threads whose target all point to that code. Being able to access the code means being able to execute it.
# 2 Every thread's task has to be passed as an argument to the interpreter's code in order to be executed. In other words, before a thread can run its own task, it first needs access to the interpreter's code.
In summary:
If multiple threads all have target=work, the execution flow is:
The threads first access the interpreter's code, that is, obtain permission to execute, and then hand the target code to the interpreter's code to be executed.
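The case of several threads with target=work can be made concrete with a short sketch. The results list, the barrier, and the run_demo helper are assumptions added for the illustration; the barrier only keeps all threads alive at the same moment so that their thread identifiers can be compared.

```python
import os
import threading

def work(results, barrier):
    # Every thread runs this same function object.
    barrier.wait()   # hold all threads here so they are alive at the same time
    results.append((os.getpid(), threading.get_ident()))

def run_demo(n=3):
    results = []
    barrier = threading.Barrier(n)
    threads = [threading.Thread(target=work, args=(results, barrier))
               for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pids = {pid for pid, _ in results}
    idents = {ident for _, ident in results}
    return pids, idents

if __name__ == '__main__':
    pids, idents = run_demo(3)
    print('distinct PIDs:', len(pids))
    print('distinct thread identifiers:', len(idents))
```

All three threads report the same process ID, because they run inside one interpreter process and execute the same work code, while each has its own thread identifier.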
The interpreter's code is shared by all threads, so the garbage collection thread may also access the interpreter's code and run. This leads to a problem: for the same data 100, thread 1 may be executing x=100 while garbage collection is reclaiming 100. There is no clever way around this; the solution is locking, namely the GIL, which ensures that the Python interpreter executes the code of only one task at a time.
Eleventh, the GIL and Lock:
The GIL protects data at the interpreter level; to protect your own data, you have to add your own lock.
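A minimal sketch of protecting user data with your own lock follows. The counter dictionary and the names add_many and run_demo are hypothetical. The statement counter['value'] += 1 is a read, an add, and a write, and the GIL does not turn those steps into one atomic operation, so the example wraps them in a threading.Lock.

```python
import threading

def add_many(counter, lock, times):
    # Hypothetical worker: increment the shared counter `times` times.
    for _ in range(times):
        with lock:                   # only one thread at a time may update the counter
            counter['value'] += 1

def run_demo(n_threads=4, times=10000):
    counter = {'value': 0}           # user-level shared data
    lock = threading.Lock()          # user-level lock that protects it
    threads = [threading.Thread(target=add_many, args=(counter, lock, times))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter['value']

if __name__ == '__main__':
    print(run_demo())
```

With the lock in place, the final value always equals n_threads * times. Without it, an update can be lost whenever a thread switch lands between the read and the write.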
Twelfth, the GIL and multithreading
Because of the GIL, only one thread in a given process is executing at any moment.
At this point some students immediately object: processes can take advantage of multiple cores but are expensive, while Python threads are cheap but cannot take advantage of multiple cores. So is Python useless, and is PHP the most awesome language?
Don't worry, I'm not finished yet.
To answer this question, we need to agree on a few points:
1. Is the CPU used for computation, or for I/O?
2. Multiple CPUs mean that multiple cores can compute in parallel, so multiple cores improve computing performance.
3. Once a CPU hits blocking I/O it still has to wait, so more cores do nothing for I/O operations.
A worker corresponds to a CPU. Computation corresponds to the worker working, and blocking I/O corresponds to supplying the raw materials the worker needs. If the raw materials run out while the worker is working, the worker has to stop until more materials arrive.
If most of your factory's work is preparing raw materials (I/O-bound), then having more workers does not help much. You might as well have one person and let that worker do other jobs while waiting for materials.
Conversely, if your factory has all the raw materials on hand, then the more workers there are, the higher the efficiency.
Conclusion:
For computation, the more CPUs the better; for I/O, more CPUs are no use.
Of course, for any running program, performance will improve somewhat as CPUs are added (however small the gain, there is always some), because a program is almost never pure computation or pure I/O. So we can only judge in relative terms whether a program is compute-bound or I/O-bound, and from there analyze whether Python multithreading is useful.
# Analysis:
We have four tasks to process, and they must be handled concurrently. The options are:
Scheme one: start four processes
Scheme two: start four threads in one process
# Single-core case, analysis results:
If the four tasks are compute-bound, there are no extra cores for parallel computation, so scheme one only adds the cost of creating processes; scheme two wins.
If the four tasks are I/O-bound, scheme one pays a high process creation cost, and switching between processes is slower than switching between threads; scheme two wins.
# Multi-core case, analysis results:
If the four tasks are compute-bound, multiple cores mean parallel computation, but in Python only one thread in a process executes at any moment, so threads cannot use the extra cores; scheme one wins.
If the four tasks are I/O-bound, no number of cores can solve the I/O problem; scheme two wins.
# Conclusion: computers today are basically all multi-core. For compute-bound tasks, Python multithreading brings little performance improvement and may even be slower than running serially (which avoids a lot of switching). For I/O-bound tasks, however, it still improves efficiency significantly.
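The compute-bound half of that conclusion can be tried with a timing sketch. The names count_down, run_serial, and run_threaded and the loop size are hypothetical, and the numbers depend on the machine and on the Python build. No expected output is given here because the result has to be measured.

```python
import threading
import time

def count_down(n):
    # Pure computation with no I/O, so the thread never waits on anything.
    while n > 0:
        n -= 1

def run_serial(tasks, n):
    # Run the tasks one after another in the main thread.
    start = time.perf_counter()
    for _ in range(tasks):
        count_down(n)
    return time.perf_counter() - start

def run_threaded(tasks, n):
    # Run the same tasks in separate threads of one process.
    start = time.perf_counter()
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    print('serial:   %.2fs' % run_serial(4, 2_000_000))
    print('threaded: %.2fs' % run_threaded(4, 2_000_000))
```

On a CPython build with the GIL, the threaded time is usually about the same as the serial time or somewhat worse, since only one thread runs Python code at a time and the switching adds overhead. Replacing count_down with a task that mostly waits on I/O should show the opposite result.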
Python Day36 python multithreading