Overview
We all know that Windows is a multitasking-enabled operating system.
What does "multitasking" mean? Simply put, the operating system can run multiple tasks at the same time. For example, you can surf the Internet in a browser, listen to MP3s, and work in Word all at once; that is multitasking, with at least three tasks running simultaneously. Many other tasks also run quietly in the background, even though they show nothing on the desktop.
Multicore CPUs are common now, but even the single-core CPUs of the past could multitask. Since a CPU executes code sequentially, how does a single-core CPU run multiple tasks?
The answer is that the operating system lets the tasks take turns: task 1 runs for 0.01 seconds, then it switches to task 2, which runs for 0.01 seconds, then to task 3 for 0.01 seconds, and so on, over and over. Each task only executes in turn, but because the CPU is so fast, it feels as if all the tasks are running at the same time.
True parallel multitasking is only possible on multicore CPUs, but because the number of tasks is far greater than the number of CPU cores, the operating system still schedules many tasks onto each core in turn.
To the operating system, a task is a process. Opening a browser starts a browser process, opening Notepad starts a Notepad process, opening two Notepads starts two Notepad processes, and opening Word starts a Word process.
Some processes do more than one thing at a time. Word, for example, can handle typing, spell checking, and printing simultaneously. For a process to do several things at once it must run several "subtasks" at the same time; we call these "subtasks" within a process threads.
Since every process has at least one thing to do, a process has at least one thread. A complex process such as Word can of course have multiple threads that execute simultaneously. Multithreading works the same way as multiprocessing: the operating system switches rapidly among the threads, letting each one run briefly in turn so that they appear to execute concurrently. Truly executing multiple threads at the same time, of course, requires a multicore CPU.
All the Python programs we wrote earlier were single-task processes with only one thread. What if we want to perform multiple tasks at the same time? There are two main solutions:
- One is to start multiple processes. Although each process has only one thread, multiple processes together can carry out multiple tasks.
- The other is to start one process and create multiple threads inside it, so that the multiple threads together carry out multiple tasks.
Of course there is a third way: start multiple processes and create multiple threads in each of them, so that even more tasks execute at once. This model is more complex and rarely used in practice.
To summarize, there are three ways to implement multitasking:
- multi-process mode;
- multi-threaded mode;
- multi-process + multi-threaded mode.
Tasks that execute at the same time are usually not unrelated; they need to communicate and coordinate with each other. Sometimes task 1 must pause and wait for task 2 to finish before it can continue, and sometimes task 3 and task 4 cannot run at the same time. The complexity of multi-process and multi-threaded programs is therefore much higher than that of the single-process, single-threaded programs we wrote earlier.
Python supports both multiprocessing and multithreading.
Process
A process is a program in execution, i.e. a running task. The CPU is responsible for executing it.
Computers today multitask all the time, for example opening QQ, listening to music, and downloading files in the background at the same time. How is this achieved? Through multiple processes. The operating system schedules CPU time so that each process performs one task (function), and the CPU switches quickly among them to give the appearance of simultaneity (in the single-core CPU case).
Programs and processes
Program: a collection of code. Process: a running instance of a program. Note that running the same program twice produces two mutually isolated processes.
Concurrency and parallelism
Parallelism: tasks truly run at the same time; this is only possible with multiple CPUs (or cores). Concurrency: pseudo-parallelism; tasks merely appear to run at the same time. A single CPU plus multiprogramming is enough to achieve concurrency. (Parallelism is a special case of concurrency.)
Synchronous vs. asynchronous
Synchronous means that when a process makes a request, and the request takes some time to return, the process waits until the result comes back before continuing. Asynchronous means the process does not wait but carries on with what follows, regardless of the state of other processes; the system notifies it when the result is ready, which improves execution efficiency. Example: a phone call is synchronous, a text message is asynchronous.
Creation of processes
Processes are created in four main ways:

1. System initialization. (View processes with the ps command on Linux or Task Manager on Windows. Foreground processes interact with the user; background processes run independently of the user and are woken up only when needed. Such background processes are called daemons, e.g. e-mail, web, news, and printing services.)
2. A running process spawning a child process (for example nginx spawning workers, os.fork(), subprocess.Popen, etc.).
3. An interactive user request creating a new process (for example the user double-clicking QQ).
4. The start of a batch job (only in large batch-processing systems).
In all four cases the new process is actually created by an already existing process executing a system call.
- On Unix/Linux the system call is fork, which is very special. A normal function call is called once and returns once, but fork() is called once and returns twice: the operating system automatically copies the current process (the parent) into a duplicate (the child), and then returns in the parent and the child separately. The child's call returns 0, while the parent's call returns the PID of the child.
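As a rough illustration (my own sketch, not part of the original notes; Unix/Linux only), the one-call-two-returns behavior of fork() looks like this:

```python
import os

# Minimal fork() sketch (Unix/Linux only): one call, two returns.
pid = os.fork()
if pid == 0:
    # This branch runs in the child process; fork() returned 0 here.
    print('child : my pid is %s, my parent is %s' % (os.getpid(), os.getppid()))
else:
    # This branch runs in the parent; fork() returned the child's PID.
    print('parent: my pid is %s, my child is %s' % (os.getpid(), pid))
```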
- On Windows the call is CreateProcess, which both creates the process and is responsible for loading the correct program into the new process.
Attention:
- After creation, the parent process and the child process have their own distinct address spaces (multiprogramming requires memory isolation between processes at the physical level), and a modification made by either process in its own address space does not affect the other.
- On Unix/Linux, the initial address space of the child is a copy of the parent's, and parent and child may share a read-only memory region. On Windows, the address spaces of parent and child are different from the very beginning.
Processes do, however, share the terminal and share one file system.
Status of the process
A process is mainly in one of three states: running, blocked, or ready.
Thread
In a traditional operating system each process has an address space and, by default, one thread of control. Multithreading (multiple threads of control) means there are several threads of control inside one process, all sharing that process's address space. The process merely gathers resources together (it is only a resource unit, or resource collection), while the thread is the unit of execution scheduled on the CPU.
Why use Multithreading
Multithreading means opening multiple threads inside one process. Simply put, if several tasks need to share one address space, they must be run as multiple threads within one process.

1. Multiple threads share the address space of their process.
2. Threads are more lightweight than processes: they are easier to create and destroy, and in many operating systems creating a thread is 10-100 times faster than creating a process.
3. For CPU-bound applications multithreading does not improve performance, but for I/O-bound applications it speeds things up significantly (I/O-bound work gains no advantage from multiple cores anyway).
4. On a multi-CPU system, multiple threads can be opened to make the most of the cores (with far less overhead than opening processes).

Note for readers of other languages: threads in Python are special. In other languages, 4 threads inside 1 process can run simultaneously on 4 CPUs, whereas in Python only one thread per process can run at a time, no matter how many CPUs you have.
The difference between a thread and a process
1. A thread shares the address space of the process that created it; a process has its own address space.
2. A thread can directly access the data of its process; a process gets a copy of its parent process's memory space.
3. A thread can communicate directly with other threads of the same process; processes must use inter-process communication (the IPC mechanism).
4. Threads are cheap to create; creating a process involves copying the parent's memory space.
5. A thread can directly control other threads in the same process; a process can only control its own child processes.
6. Changes to the main thread may affect the other threads; changes to the parent process do not affect its child processes.
Multiprocessing Module
Multithreading in Python cannot take advantage of multiple cores, so if you want to make full use of the resources of a multicore CPU (see os.cpu_count()), in most cases you need multiple processes. Python provides the multiprocessing module for opening child processes and running our custom tasks (such as functions) in them; its programming interface is similar to that of the multithreading module threading.
The multiprocessing module has many features: spawning child processes, communicating and sharing data between them, performing different forms of synchronization, and components such as Process, Queue, Pipe, and Lock.
One thing to emphasize again: unlike threads, processes do not share any state; when a process modifies data, the change is limited to that process.
Process class and use
Note: on Windows, Process() calls must be placed under if __name__ == '__main__':
Using the Process class to create processes:
Process([group [, target [, name [, args [, kwargs]]]]]) instantiates an object that represents a task to run in a child process (not yet started). Two points to emphasize: 1. use keyword arguments to specify the parameters; 2. args specifies the positional arguments to pass to the target function; it is a tuple and must contain a comma.
Parameters:
- group is unused and should always be None
- target is the callable object, i.e. the task the child process will execute
- args is the tuple of positional arguments for the callable, e.g. args=('egon',)
- kwargs is the dictionary of keyword arguments for the callable, e.g. kwargs={'name': 'egon', 'age': 18}
- name is the name of the child process
Methods of the Process class
p.start()         # starts the process and invokes p.run() in the child process; this is not the same as calling run() directly, because start() also initializes other parameters
p.run()           # the method that runs when the process starts; it simply calls the function specified by target, and it is the method we must override when subclassing
p.terminate()     # forcibly terminates process p without any cleanup; if p created child processes they become zombie processes, so use this with special care; if p also holds a lock, the lock will never be released, causing deadlock
p.is_alive()      # returns True if p is still running
p.join([timeout]) # the main process waits for p to terminate (emphasis: the main process waits while p keeps running); timeout is optional; note that join() can only wait for processes opened with start(), not ones launched by calling run() directly
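A small sketch of my own (not from the original notes) showing start(), is_alive() and join() together:

```python
import time
from multiprocessing import Process

def task():
    time.sleep(1)

if __name__ == '__main__':
    p = Process(target=task)
    p.start()                 # spawns the child, which in turn calls p.run()
    print(p.is_alive())       # True: the child is still sleeping
    p.join(timeout=5)         # wait (at most 5 seconds) for the child to finish
    print(p.is_alive())       # False: the child has terminated
```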
Other attributes of the Process class
p.daemon    # defaults to False; if set to True, p becomes a daemon process running in the background, which terminates as soon as its parent process terminates; when True, p is also not allowed to create its own child processes; must be set before p.start()
p.name      # the name of the process
p.pid       # the pid of the process
p.exitcode  # None while the process is running; if it is -N, the process was terminated by signal N
p.authkey   # the process's authentication key, by default a random 32-character string generated by os.urandom(); its purpose is to secure the underlying inter-process communication over network connections, which only succeeds when both sides hold the same authentication key
Special emphasis: when p.daemon = True is set, the child process is reclaimed as soon as the main process finishes executing, regardless of whether the child has completed its task.
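A minimal sketch of my own illustrating this: the daemonic child is killed as soon as the main process ends, so its last print never appears.

```python
import time
from multiprocessing import Process

def task():
    time.sleep(3)
    print('this line is never reached when daemon=True')

if __name__ == '__main__':
    p = Process(target=task)
    p.daemon = True     # must be set before p.start()
    p.start()
    print('main process ends here; the daemonic child is terminated with it')
```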
Basic use
There are two ways to create a process with the Process class:

1. Instantiate the Process class directly.
2. Inherit from the Process class, customize the functionality you need, and then instantiate your subclass.
# --------------------------- Method 1 ---------------------------
import random
import time
from multiprocessing import Process

def hello(name):
    print('Welcome to my Home')
    time.sleep(random.randint(1, 3))
    print('Bye Bye')

if __name__ == '__main__':
    p = Process(target=hello, args=('daxin',))   # create the child process
    p.start()                                    # start the child process
    print('main process end')

# --------------------------- Method 2 ---------------------------
import random
import time
from multiprocessing import Process

class MyProcess(Process):
    def __init__(self, name):
        super(MyProcess, self).__init__()   # the parent class's constructor must be called
        self.name = name

    def run(self):   # the method must be named run, because start() executes run()
        print('Welcome to {0} Home'.format(self.name))
        time.sleep(random.randint(1, 3))
        print('Bye Bye')

if __name__ == '__main__':
    p = MyProcess('daxin')
    p.start()
    print('End of main process')
Rewriting the socket server with multiprocessing
In the previous section we wrote a socket server with the socket module. Here we use multiprocessing to rewrite the server side so that it can accept requests concurrently.
Socket server side
Socket client side
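The folded server/client code is not reproduced in these notes; below is a minimal sketch of my own of what the multiprocessing-based server side might look like (the address, port, and function names are assumptions, not the original code):

```python
import socket
from multiprocessing import Process

def talk(conn):
    # Echo loop handled by one child process per connection.
    while True:
        msg = conn.recv(1024)
        if not msg:
            break
        conn.send(msg.upper())
    conn.close()

if __name__ == '__main__':
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 8080))   # assumed address/port
    server.listen(5)
    while True:
        conn, addr = server.accept()
        # One new process per accepted connection -> concurrent clients.
        Process(target=talk, args=(conn,)).start()
```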
If the server receives tens of thousands of requests, should it really create 10,000 processes to handle them separately? That is not feasible. We can use a process pool to solve this problem; process pools are described in detail in a later section.
Process synchronization lock
Processes do not share data, but they do share the same file system, so they can access the same file or the same print terminal. The problem is that unsynchronized competition garbles the results, and the way to control this is with locks.
Competing for a shared resource causes ordering problems (see the sketch below).
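The folded example is not included in these notes; a minimal sketch of my own showing how unsynchronized processes interleave their output on the shared terminal:

```python
import time
from multiprocessing import Process

def work(n):
    print('process %s is starting' % n)
    time.sleep(0.5)
    print('process %s is done' % n)   # without a lock, lines from different processes interleave unpredictably

if __name__ == '__main__':
    for i in range(5):
        Process(target=work, args=(i,)).start()
```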
The purpose of the lock: when program 1 is using the shared resource, it acquires the lock and locks the resource; after it is done, it releases the lock, another program acquires it, and the cycle repeats.
The multiprocessing module provides a Lock object to implement a process synchronization lock.
from multiprocessing import Lock
lock = Lock()   # the constructor takes no arguments
# acquire and release the lock with the lock object's acquire() and release() methods
Use a process synchronization lock to simulate a ticket-grabbing application:
- Create a ticket file in JSON and set the number of tickets remaining
- 100 concurrent processes compete for the tickets
- Simulate network latency with the random and time modules
import random
import time
import json
from multiprocessing import Process, Lock

def gettickles(filename, str, lock):
    lock.acquire()                                  # lock the section that modifies the shared data
    with open(filename, encoding='utf-8') as f:
        dic = json.loads(f.read())
    if dic['count'] > 0:
        dic['count'] -= 1
        time.sleep(random.random())                 # simulate network latency
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(json.dumps(dic))
        print('\033[33m{0} grabbed a ticket successfully\033[0m'.format(str))
    else:
        print('\033[35m{0} failed to grab a ticket\033[0m'.format(str))
    lock.release()                                  # release the lock after the modification is complete

if __name__ == '__main__':
    lock = Lock()                                   # create the lock object
    p_l = []
    for i in range(100):
        p = Process(target=gettickles, args=('a.txt', 'user%s' % i, lock))
        p_l.append(p)
        p.start()
The lock guarantees that when multiple processes modify the same piece of data, only one of them can modify it at a time, i.e. the modification becomes serial. Yes, this is slower, but speed is sacrificed to guarantee data safety.
Process Pool
When using Python for system administration, especially when operating on many file directories at once or controlling multiple remote hosts, parallel operation saves a lot of time. Multiprocessing is one way to achieve this concurrency, but there are some issues to note:
- The tasks that need to run concurrently are usually far more numerous than the CPU cores
- Processes cannot be opened without limit; usually you open about as many processes as there are cores
- Opening too many processes hurts efficiency (each open process consumes system resources, and processes beyond the number of cores cannot run in parallel anyway)
For example, when the number of tasks is small, you can directly create multiple processes dynamically with multiprocessing.Process; a dozen or so is fine. But if there are hundreds or thousands, limiting the number of processes by hand becomes far too cumbersome, and that is where a process pool comes in.
We can control the number of processes by maintaining a process pool, just as the httpd process model lets you specify a minimum and maximum number of worker processes.
PS: for high-level applications such as remote procedure calls, a process pool should be used. Pool provides a specified number of processes for the user to call on. When a new request is submitted to the pool, a new process is created to serve it if the pool is not yet full; if the number of processes in the pool has already reached the specified maximum, the request waits until some process in the pool finishes, and that existing pool process is then reused.
Class used to create a process pool: if you specify processes=3, the pool creates three processes at the start and then uses those same three processes to execute all tasks, never opening any others.
from multiprocessing import Pool
pool = Pool(processes=None, initializer=None, initargs=())
Parameters:
- processes: the maximum number of worker processes in the pool
- initializer: a callable that each worker process runs when it starts
- initargs: the arguments to pass to initializer
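A small illustration of my own (not in the original notes) of initializer/initargs; the function and tag names are arbitrary:

```python
import os
from multiprocessing import Pool

def init(tag):
    # Runs once in every worker process as it starts up.
    print('worker %s initialized with tag %r' % (os.getpid(), tag))

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=2, initializer=init, initargs=('demo',))
    print(pool.apply(square, (3,)))   # 9, computed by one of the initialized workers
    pool.close()
    pool.join()
```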
Common methods
p.apply(func[, args[, kwargs]])        # have a pool process execute func with the given arguments; note that apply blocks, so tasks execute serially
p.apply_async(func[, args[, kwargs]])  # same as apply but non-blocking, i.e. asynchronous — this is the commonly used one
p.close()                              # close the pool so that no further tasks can be submitted; tasks already submitted will still complete
p.join()                               # wait for all worker processes to exit; may only be called after close() or terminate()
Attention:
apply_async() returns an AsyncResult object, which provides the following methods:
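The folded code is not reproduced here; the main AsyncResult methods in the standard library are get(), ready(), successful() and wait(). A minimal usage sketch of my own:

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(2)
    res = pool.apply_async(square, (10,))   # returns an AsyncResult immediately
    res.wait()                  # block until the task finishes (an optional timeout may be given)
    print(res.ready())          # True once the task has completed
    print(res.successful())     # True if it completed without raising an exception
    print(res.get())            # 100; get() re-raises the exception if the task failed
    pool.close()
    pool.join()
```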
Rewrite the socket server using a process pool:
import os
import socket
import multiprocessing

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8100))
server.listen(5)

def talk(conn):
    print('My process id is: %s' % os.getpid())
    while True:
        msg = conn.recv(1024)
        if not msg:
            break
        data = msg.decode('utf-8')
        msg = data.upper()
        conn.send(msg.encode('utf-8'))

if __name__ == '__main__':
    pool = multiprocessing.Pool(1)
    while True:
        conn, addr = server.accept()
        print(addr)
        pool.apply_async(talk, args=(conn,))
    pool.close()
    pool.join()
Here the pool size is set to 1, so with two concurrent connections the second one hangs; only after the first client disconnects does it get served. Note that the process id printed is always the same.
Callback function
Scenario that calls for a callback: as soon as any task in the process pool finishes, the main process should be told immediately: "I'm done, you can process my result." The main process then calls a function to process that result, and that function is the callback. We can put the time-consuming (blocking) tasks into the process pool and specify a callback (executed by the main process), so that the main process skips the I/O step and receives the task's result directly.
apply_async(func, args=(), kwds={}, callback=None)   # the result of func is handed to the specified callback function for processing
A small crawler example:
from multiprocessing import Pool
import requests
import os

def geturl(url):
    print('My process id: %s' % os.getpid())
    print('I handle url: %s' % url)
    response = requests.get(url)        # request the page
    return response.text                # return the page source

def urlparser(htmlcode):
    print('My process id is: %s' % os.getpid())
    datalength = len(htmlcode)          # compute the length of the source
    print('The parsed HTML size is: %s' % datalength)

if __name__ == '__main__':
    pool = Pool()
    url = [
        'http://www.baidu.com',
        'http://www.sina.com',
        'http://www.qq.com',
        'http://www.163.com',
    ]
    res_l = []
    for i in url:
        # res is the AsyncResult of geturl; its return value has already been handed to urlparser,
        # so there is normally no need to fetch it again
        res = pool.apply_async(geturl, args=(i,), callback=urlparser)
        res_l.append(res)
    pool.close()
    pool.join()
    for res in res_l:
        print(res.get())                # what we get here is the page source
Inter-process communication
Processes are isolated from each other. To implement inter-process communication (IPC), the multiprocessing module provides two mechanisms, queues and pipes, both of which pass data by message passing. There is also a way of sharing data directly, but it is deprecated; using queues for inter-process communication is recommended.
Looking ahead, concurrent programming based on message passing is the trend: even with threading, the recommended approach is to design a program as a collection of independent threads that exchange data through message queues. This greatly reduces the need for locks and other synchronization mechanisms, and it also scales out to distributed systems.
Queue
Under the hood, a queue is implemented with pipes and locks.
Creating a queue:
Queue([maxsize]): creates a shared process queue, a multi-process-safe queue that can be used to pass data between multiple processes.
# maxsize: the maximum number of items the queue can hold; omit it and the queue size is unlimited
Basic use:
from multiprocessing import Queue
q = Queue(3)
q.put('a')        # put data into the queue
print(q.get())    # take data out of the queue
Note: the queue is FIFO, i.e. first in, first out.
Methods of the queue
q.put() inserts data into the queue.
q.put(obj, block=True, timeout=None)
# block/timeout: if block is True (the default) and timeout is a positive number, the call blocks for at most timeout seconds waiting for free space in the queue; if it times out, a queue.Full exception is raised. If block is False and the queue is full, queue.Full is raised immediately.
PS: q.put_nowait() is equivalent to q.put(block=False).
q.get() takes data out of the queue.
q.get(block=True, timeout=None)
# block/timeout: if block is True (the default) and timeout is a positive number, and no element becomes available within the waiting time, a queue.Empty exception is raised. If block is False there are two cases: if the queue has a value available it is returned immediately, otherwise queue.Empty is raised immediately.
PS: q.get_nowait() is equivalent to q.get(block=False).
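A tiny sketch of my own showing the non-blocking variants raising queue.Full and queue.Empty:

```python
import queue
from multiprocessing import Queue

q = Queue(1)
q.put_nowait('a')          # fills the single slot
try:
    q.put_nowait('b')      # queue is full -> raises immediately
except queue.Full:
    print('queue is full')

print(q.get())             # 'a'
try:
    q.get_nowait()         # queue is now empty -> raises immediately
except queue.Empty:
    print('queue is empty')
```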
Other methods (not particularly precise; they can be ignored)
Producer-consumer model
Using the producer-consumer pattern in concurrent programming solves most concurrency problems. The pattern improves the overall throughput of a program by balancing the working speed of the producing and consuming threads.
Why use the producer-consumer model
In the world of threads, the producer is the thread that produces data, and the consumer is the thread that consumes it. In multithreaded development, if the producer is fast and the consumer is slow, the producer must wait for the consumer to catch up before producing more data. Likewise, if the consumer's processing power exceeds the producer's, the consumer must wait for the producer. The producer-consumer model was introduced to solve this problem.
What is the producer-consumer model
The producer-consumer model uses a container to remove the tight coupling between producers and consumers. Producers and consumers do not talk to each other directly; they communicate through a blocking queue. A producer does not wait for the consumer to finish; it simply drops the data into the blocking queue, and the consumer does not ask the producer for data but takes it directly from the blocking queue. The blocking queue acts as a buffer that balances the processing capacity of producers and consumers.
Implementing the producer-consumer model with a queue:
- The producer is only responsible for producing cakes; finished cakes are put into the queue
- The consumer is only responsible for consuming cakes; it takes a cake out of the queue each time
Basic producer-consumer model
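The folded code is not included in these notes; a minimal Queue-based sketch of my own (the names and counts are arbitrary) of the basic model:

```python
import time
from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        cake = 'cake %s' % i
        q.put(cake)                    # finished cakes go into the queue
        print('produced %s' % cake)
        time.sleep(0.1)

def consumer(q):
    while True:
        cake = q.get()                 # take a cake from the queue
        print('consumed %s' % cake)
        time.sleep(0.2)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    c = Process(target=consumer, args=(q,))
    p.start()
    c.start()
    p.join()
    # note: the consumer never exits here -- exactly the problem discussed below
```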
The example above looks fine: the producer finishes producing and the consumer finishes consuming, so our main program should exit. But it does not, because the consumer is still blocked waiting on q.get(). The fix is to send a "production finished / everything eaten" signal, and exit once that signal is received.
- Put a sentinel value into the queue as the end signal
- Use a JoinableQueue object plus the daemon attribute to reclaim the consumer process
Using an end signal
Using the JoinableQueue object
JoinableQueue + daemon
In the JoinableQueue approach (see the sketch after this list):
- The join and task_done methods of the JoinableQueue object handle the confirmation/notification.
- When the producer has finished producing, it must also receive confirmation from the consumer that everything has been consumed; once the producer gets that confirmation, the main process can exit.
- If the main process exits while the consumer process has not yet been reclaimed, set the consumer's daemon attribute to True so that it is reclaimed along with the main process.
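A hedged sketch of my own (names and counts are arbitrary) of the JoinableQueue + daemon approach described above:

```python
import time
from multiprocessing import Process, JoinableQueue

def producer(q):
    for i in range(5):
        q.put('cake %s' % i)
    q.join()           # block until every item has been task_done()-ed by the consumer

def consumer(q):
    while True:
        cake = q.get()
        time.sleep(0.1)
        print('consumed %s' % cake)
        q.task_done()  # tell the producer this item has been fully processed

if __name__ == '__main__':
    q = JoinableQueue()
    p = Process(target=producer, args=(q,))
    c = Process(target=consumer, args=(q,))
    c.daemon = True    # the consumer dies together with the main process
    p.start()
    c.start()
    p.join()           # when the producer returns, everything has been consumed
    print('main process ends; the daemonic consumer is reclaimed with it')
```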
Sharing data
Data is independent between processes, and they can communicate through queues or pipes, both of which are based on message passing. Although each process's data is independent, data can also be shared through a Manager; in fact, Manager can do far more than that.
m = Manager()   # takes no arguments; use the Manager object to create shared data types
Using Manager to create data shared between processes:
import os
from multiprocessing import Manager, Process

def worker(d, l):
    d[os.getpid()] = os.getpid()   # modify the shared dict
    l.append(os.getpid())          # modify the shared list

if __name__ == '__main__':
    m = Manager()
    d = m.dict()                   # create a shared dictionary
    l = m.list()                   # create a shared list
    p_l = []
    for i in range(10):            # the worker count here is arbitrary
        p = Process(target=worker, args=(d, l))
        p_l.append(p)
        p.start()
    for p in p_l:
        p.join()
    print(d)
    print(l)
Threading Module
The Python standard library provides two modules for multithreading: thread and threading. The thread module handles and controls threads in a low-level, primitive way, while the threading module wraps thread to provide a more convenient, higher-level API.
PS: multiprocessing deliberately mimics the interface of the threading module, so the two are very similar to use and many usages are identical; this section may therefore look familiar.
Thread class and use
Thread is one of the most important classes in the threading module and can be used to create threads.
There are two ways to create a thread:
- Inherit from the Thread class and override its run method;
- Create a threading.Thread object directly, passing a callable object as a parameter to its constructor (__init__).
# ----------------------- instantiating an object --------------------------
import threading

def work(name):
    print('hello,{0}'.format(name))

if __name__ == '__main__':
    t = threading.Thread(target=work, args=('daxin',))
    t.start()
    print('main process')

# ----------------------- creating your own class --------------------------
import threading

class Work(threading.Thread):
    def __init__(self, name):
        super(Work, self).__init__()
        self.name = name

    def run(self):
        print('hello,{0}'.format(self.name))

if __name__ == '__main__':
    t = Work(name='daxin')
    t.start()
    print('main process')
PS: when this runs, you can see that "hello,daxin" is printed before "main process", which again shows that creating a thread consumes far fewer resources than creating a process: threads are created and start executing very quickly. If you print os.getpid() both in the target function and in the main program, you will find that the process id is the same, which shows that the threads are opened inside the same process.
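A quick check (my own snippet, not from the original notes) confirming that the worker thread and the main program report the same process id:

```python
import os
import threading

def work():
    print('worker thread pid:', os.getpid())

if __name__ == '__main__':
    t = threading.Thread(target=work)
    t.start()
    t.join()
    print('main thread pid  :', os.getpid())   # same pid: both run inside one process
```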
Python Learning notes-day13-processes and threads