Python basics: processes and threads

Source: Internet
Author: User


I. Concepts of processes and threads

First, let's introduce the concept of "multitasking": multitasking means the user can run multiple applications at the same time, each of which is called a task. Linux and Windows are both multitasking operating systems, which makes them far more capable than single-task systems.

For example, you might be using a browser to surf the web while listening to music and catching up on work in Word. That is multitasking, with at least three tasks running at once. Many other tasks also run quietly in the background at the same time, even though they are not shown on the desktop.

But are these tasks really running at the same time? As we all know, running a task requires a CPU, so does running multiple tasks simultaneously require multiple CPUs? If 100 tasks need to run at once, must we buy a 100-core CPU? Obviously not!

Multi-core CPUs are common nowadays, but even a single-core CPU can execute multiple tasks. Since a CPU executes code sequentially, how does a single-core CPU run multiple tasks?

The answer is that the operating system runs the tasks in turn: task 1 runs for 0.01 seconds, then task 2 runs for 0.01 seconds, then task 3 runs for 0.01 seconds, then back to task 1, and so on, repeatedly. On the surface each task is executed alternately, but because the CPU is so fast, it feels to us as if all the tasks are running simultaneously.

Summary: a single CPU core can run only one task at a time. True parallel execution of multiple tasks is possible only on multi-core CPUs. However, because the number of tasks far exceeds the number of CPU cores, the operating system automatically schedules the many tasks onto each core in turn.

To the operating system, a task is a process (Process). Opening a browser starts a browser process; opening one Notepad starts a Notepad process; opening two Notepads starts two Notepad processes; opening Word starts a Word process.

Some processes do more than one thing at a time. Word, for example, can handle typing, spell checking, and printing simultaneously. For a process to do several things at once, it needs to run multiple "subtasks" at the same time; these subtasks within a process are called threads (Thread).

Since every process does at least one thing, a process has at least one thread. A complex process like Word can of course have multiple threads running at the same time. Multithreading works the same way as multiprocessing: the operating system switches rapidly among the threads so that each thread runs briefly in turn, which makes them appear to run simultaneously. Truly simultaneous execution of multiple threads, of course, requires a multi-core CPU.

Summary:

  • A process is the dynamic execution of a program over a dataset. A process generally consists of three parts: the program, the dataset, and the process control block.
  • A thread, also called a lightweight process, is the basic unit of CPU execution and the smallest unit of program execution; it consists of a thread ID, a program counter, a register set, and a stack. Threads were introduced to reduce the overhead of concurrent program execution and improve the concurrency of the operating system. A thread owns no system resources of its own.

II. Relationship between processes and threads

A process is the running activity of a program over a dataset in a computer; it is the basic unit by which the system allocates and schedules resources, and the foundation of the operating system's structure. In short, a process is the running activity of a program with certain independent functions, and an independent unit of resource allocation and scheduling.
A thread is an entity within a process and the basic unit of CPU scheduling and dispatching; it is a basic unit, smaller than a process, that can run independently.

Summary:

  • A thread can belong to only one process, while a process can have multiple threads, but at least one.

  • Resources are allocated to processes; all threads of the same process share all of that process's resources.
  • The CPU is allocated to threads; that is, what actually runs on the CPU is a thread.
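A minimal sketch of this sharing (the worker function and counts are illustrative): two threads update one module-level variable, and because they share it, a threading.Lock guards the increment.

```python
import threading

# All threads in a process share its memory: both workers below update
# the same module-level variable, protected by a Lock.
counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000: both threads saw and updated the same variable
```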

III. Parallelism and concurrency

Parallel processing is a computing method in which a computer system executes two or more tasks at the same time. It can work on different aspects of the same program simultaneously. The main purpose of parallel processing is to save time when solving large, complex problems.

Concurrent processing means that several programs are in progress (somewhere between started and finished) within the same period of time, all running on one CPU, but at any single point in time only one program is actually running on the CPU.

The key to concurrency is that you have the ability to process multiple tasks, not necessarily at the same time. The key to parallelism is that you have the ability to process multiple tasks at the same time. Therefore, parallelism is a subset of concurrency.

IV. Synchronous and asynchronous

In computing, synchronous means that when a process issues a request that takes some time to return, the process waits and does not continue executing until it receives the response.

Asynchronous means the process does not wait; it continues with other work regardless of the state of the request. When the result is ready, the system notifies the process so it can handle it, which improves efficiency. For example, a phone call is synchronous communication, while texting is asynchronous.

For example:

Because the CPU and memory are far faster than peripherals, IO programming suffers from a serious speed mismatch. For example, when writing data to a disk, the CPU may need only 0.01 seconds to produce the data, but the disk may take 10 seconds to write it. There are two solutions: the CPU waits until the disk finishes (synchronous IO), or the CPU moves on to other tasks and handles the result later when it is notified (asynchronous IO).
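A sketch of the two approaches, using a short wait as a stand-in for the slow device; the function names slow_write_sync and slow_write_async are hypothetical:

```python
import threading

# Synchronous: the caller blocks until the slow operation returns.
def slow_write_sync():
    threading.Event().wait(0.1)  # stand-in for a slow disk write
    return "written"

print(slow_write_sync())         # the CPU idles here until the write returns

# Asynchronous: the caller continues immediately; a callback handles the
# result when the operation completes (sketched with a worker thread).
def slow_write_async(on_done):
    def work():
        threading.Event().wait(0.1)
        on_done("written")
    threading.Thread(target=work).start()

slow_write_async(lambda result: print("notified:", result))
print("free to do other work meanwhile")
```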

V. The threading module

Threads are execution units supported directly by the operating system, so high-level languages usually have built-in multithreading support, and Python is no exception. Moreover, Python's threads are real operating-system threads (POSIX threads on POSIX systems), not simulated threads.

The Python standard library provides two modules: _thread and threading. _thread is a low-level module; threading is a higher-level module that wraps _thread. In most cases we only need the higher-level threading module.

1. Creating a thread directly with the Thread class

To start a thread, pass a function in to create a Thread instance, then call start() to begin execution:

import time, threading

# code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)


# Running result:
# thread MainThread is running...
# thread LoopThread is running...
# thread LoopThread >>> 1
# thread LoopThread >>> 2
# thread LoopThread >>> 3
# thread LoopThread >>> 4
# thread LoopThread >>> 5
# thread LoopThread ended.
# thread MainThread ended.
Instance 1

By default, every process starts one thread, called the main thread, and the main thread can in turn start new threads. The threading module has a current_thread() function, which always returns the Thread instance for the current thread. The main thread's instance is named MainThread; a child thread's name is specified at creation time, and here we named the child thread LoopThread. The name is only used for display when printing; it has no other meaning at all. If a thread is not named, Python automatically names it Thread-1, Thread-2, and so on.
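A minimal sketch of the naming behavior described above (the exact auto-generated name depends on how many threads have already been created):

```python
import threading

# current_thread() always returns the Thread object of the calling thread.
print(threading.current_thread().name)   # MainThread

def show():
    print(threading.current_thread().name)

t = threading.Thread(target=show)        # no name given
print(t.name)                            # auto-named, e.g. Thread-1
t.start()
t.join()
```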

import threading
import time

def countNum(n):  # define the function to be run by a thread
    print("running on number: %s" % n)
    time.sleep(3)

if __name__ == '__main__':

    t1 = threading.Thread(target=countNum, args=(23,))  # create a Thread instance
    t2 = threading.Thread(target=countNum, args=(34,))

    t1.start()  # start the threads
    t2.start()

    print("ending!")


# Running result: all three lines print immediately, then the program
# waits about 3 seconds for the threads to finish:
# running on number: 23
# running on number: 34
# ending!
Instance 2

This instance has three threads: the main thread and the t1 and t2 child threads.

 

2. Creating a thread by subclassing the Thread class

# create a thread by inheriting from Thread

import threading
import time

class MyThread(threading.Thread):

    def __init__(self, num):
        threading.Thread.__init__(self)  # call the parent class __init__
        self.num = num

    def run(self):  # the run method
        print("running on number: %s" % self.num)
        time.sleep(3)

t1 = MyThread(56)
t2 = MyThread(78)

t1.start()
t2.start()
print("ending")

3. Thread class instance methods

join and daemon

import threading
from time import ctime, sleep

def Music(name):
    print("Begin listening to {name}. {time}".format(name=name, time=ctime()))
    sleep(3)
    print("end listening {time}".format(time=ctime()))

def Blog(title):
    print("Begin recording the {title}. {time}".format(title=title, time=ctime()))
    sleep(5)
    print('end recording {time}'.format(time=ctime()))

threads = []

t1 = threading.Thread(target=Music, args=('fill me',))
t2 = threading.Thread(target=Blog, args=('',))

threads.append(t1)
threads.append(t2)

if __name__ == '__main__':

    # t2.setDaemon(True)

    for t in threads:
        # t.setDaemon(True)  # note: must be set before start()
        t.start()
        # t.join()

    # t1.join()
    # t2.join()  # consider the results with join at each of the three positions

    print("all over %s" % ctime())
Join and setDaemon

Other methods:

# Thread instance methods:
# isAlive(): returns whether the thread is active (is_alive() in Python 3).
# getName(): returns the thread's name.
# setName(): sets the thread's name.

# Some functions provided by the threading module:
# threading.currentThread(): returns the current Thread object.
# threading.enumerate(): returns a list of the running threads, i.e. threads
#   after start() and before termination.
# threading.activeCount(): returns the number of running threads, equal to
#   len(threading.enumerate()).

VI. GIL

Definition: "In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)"

The threads in Python are native operating-system threads. The Python virtual machine uses a Global Interpreter Lock (GIL) to serialize the threads' use of the interpreter. To support multithreading, a basic requirement is mutual exclusion between different threads accessing shared resources, and that is why the GIL was introduced.
GIL: once a thread obtains access to the interpreter, all other threads must wait for it to release that access, even when their next instructions would not affect each other.
The GIL must be acquired before calling any Python C API.
Disadvantage of the GIL: a multiprocessor degrades to a single processor. Advantage: it avoids a large number of lock/unlock operations.
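As a side note on how CPython arbitrates the GIL: in Python 3 a running thread is asked to release it on a timed switch interval, which sys.getswitchinterval exposes (a Python 3-only sketch; the default is 0.005 seconds):

```python
import sys

# In CPython 3, a running thread is asked to release the GIL after a
# configurable "switch interval" so other threads get a chance to run.
print(sys.getswitchinterval())   # 0.005 by default

sys.setswitchinterval(0.005)     # the interval can also be tuned
```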

1. Early design of the GIL

Python supports multithreading, and the simplest way to ensure data integrity and state synchronization between multiple threads is to lock. Hence the GIL, a single super lock. As more and more library developers accepted this design, they began to depend heavily on it (that is, Python's internal objects are thread-safe by default, with no need for extra memory locks or synchronization when implementing features). Gradually this implementation was found to be painful and inefficient. But when people tried to split up or remove the GIL, they discovered that a large amount of library code already depended heavily on it, making removal extremely difficult. How difficult? By analogy: a comparatively small project like MySQL took nearly five years, from 5.5 through 5.6 to 5.7 and beyond, just to split the single big Buffer Pool Mutex into various smaller locks, and the work continues. If it is that hard for MySQL, a product backed by a company with a dedicated development team, how much harder is it for Python, whose core development and code contributions rely so heavily on a community?

2. Effects of the GIL

No matter how many threads you start or how many CPUs you have, Python allows only one thread to execute at any given moment within a process.
Therefore, Python cannot use multiple CPU cores to speed up multithreaded code.
As a result, for compute-intensive tasks Python multithreading is actually less efficient than serial execution (which has no switching overhead), while for IO-intensive tasks multithreading improves efficiency significantly.
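A small sketch of why IO-bound work benefits: time.sleep stands in for waiting on a device, and CPython releases the GIL while a thread sleeps, so the two waits overlap.

```python
import threading
import time

# An IO-bound task sketch: sleep stands in for waiting on disk or network.
def io_task():
    time.sleep(0.5)

# Serial: two waits of 0.5 s take about 1 second in total.
start = time.time()
io_task()
io_task()
serial = time.time() - start

# Threaded: the GIL is released during the sleep, so the waits overlap
# and the total is about 0.5 seconds.
start = time.time()
ts = [threading.Thread(target=io_task) for _ in range(2)]
for t in ts:
    t.start()
for t in ts:
    t.join()
threaded = time.time() - start

print("serial: %.2fs, threaded: %.2fs" % (serial, threaded))
```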

A compute-intensive example:

# coding: utf8
from threading import Thread
import time

def counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    l = []
    start_time = time.time()
    for i in range(2):
        t = Thread(target=counter)
        t.start()
        l.append(t)

    for t in l:
        t.join()
    # counter()
    # counter()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()


'''
py2.7:
serial:     9.17599987984s
concurrent: 9.26799988747s
py3.6:
serial:     9.540389776229858s
concurrent: 9.568442683084116s
'''
Compute-intensive: multithreaded concurrency shows no significant advantage over serial execution

3. Solution

The multiprocessing library was introduced largely to compensate for the inefficiency of the thread library under the GIL. It completely replicates the interface provided by threading, which makes migration easy; the only difference is that it uses multiple processes instead of multiple threads. Each process has its own independent GIL, so there is no GIL contention between processes.

# coding: utf8
from multiprocessing import Process
import time

def counter():
    i = 0
    for _ in range(100000000):
        i = i + 1
    return True

def main():
    l = []
    start_time = time.time()

    for _ in range(2):
        t = Process(target=counter)
        t.start()
        l.append(t)

    for t in l:
        t.join()
    # counter()
    # counter()
    end_time = time.time()
    print("Total time: {}".format(end_time - start_time))

if __name__ == '__main__':
    main()


'''
py2.7:
serial:   8.92299985886s
parallel: 8.19099998474s

py3.6:
serial:   9.963459014892578s
parallel: 5.1366541385650635s
'''
Multiprocessing: multi-process concurrency can improve efficiency

Of course, multiprocessing is not a panacea: it makes data communication and synchronization between the parts of the program harder to implement. Take the counter as an example. If we want multiple threads to accumulate into the same variable, a thread simply declares a global variable and wraps the update in a threading.Lock context, and three lines settle it. Since processes cannot see each other's data, with multiprocessing we can only declare a Queue in the main process and use put and get, or use shared memory. This extra implementation cost makes writing an already painful multithreaded program even more painful.

Conclusion: because of the GIL, only multithreading in IO-bound scenarios improves performance. For programs that demand high parallel computing performance, consider implementing the core parts as C modules, or simply in another language. The GIL will continue to exist for a long time to come, but it will keep being improved.
To be continued...
