How to use multiple processes for parallel processing in Python, with worked examples

Source: Internet
Author: User
This article summarizes how to use multiple processes in Python to achieve parallel processing. It has some reference value, and interested readers may find it useful.

Processes and threads are important concepts in computer software. They are different but closely related, so we first distinguish the two:

1. Definition

A process is one execution of a program, with some independent function, over a data set. It is the independent unit of resource allocation and scheduling in the system.
A thread is an entity within a process and the basic unit of CPU scheduling and dispatch. It is a unit smaller than a process that can run independently. A thread owns essentially no system resources of its own, only the few that are essential for running (such as a program counter, a set of registers, and a stack). However, it shares all of the resources owned by the process with the other threads of the same process.

2. Relationship

One thread can create and destroy another thread, and multiple threads in the same process can execute concurrently.

Compared with a process, a thread is closer to the notion of an execution body: it can share data with other threads in the process, but it has its own stack space and its own execution sequence.
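The point about shared data and private stacks can be illustrated with a minimal sketch. The names shared_log, worker, and run_workers are hypothetical and chosen only for this illustration: both threads append to the same list, while each thread's total variable lives on its own stack.

```python
import threading

shared_log = []               # one list, visible to every thread in the process
log_lock = threading.Lock()   # protects appends to the shared list


def worker(label, count):
    # 'total' is a local variable on this thread's own stack;
    # other threads cannot see or change it.
    total = 0
    for i in range(count):
        total += i
    with log_lock:
        shared_log.append((label, total))


def run_workers():
    shared_log.clear()
    threads = [
        threading.Thread(target=worker, args=("a", 10)),
        threading.Thread(target=worker, args=("b", 100)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)


if __name__ == "__main__":
    print(run_workers())
```

Each thread computes its own total independently, yet both results end up in the single shared list.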

3. Differences

The main difference between processes and threads is that they are different ways of managing operating system resources. A process has its own address space; in protected mode, a crashed process does not affect other processes. A thread is just a separate execution path within a process. Each thread has its own stack and local variables, but threads do not have separate address spaces, so a thread that dies takes the whole process down with it. A multi-process program is therefore more robust than a multithreaded one, but switching between processes costs more resources and is less efficient. For concurrent operations that must run simultaneously and share variables, threads are the natural choice, because processes do not share variables directly.

1) In short, a program has at least one process, and a process has at least one thread.

2) Threads are a finer-grained unit than processes, which gives multithreaded programs high concurrency.

3) In addition, a process has its own memory during execution, while multiple threads share memory, which greatly improves the running efficiency of the program.

4) Threads also differ from processes in how they execute. Each thread has its own entry point, sequential execution sequence, and exit point. However, a thread cannot run on its own; it must exist within an application, which provides execution control over its threads.

5) From a logical point of view, multithreading means that an application has multiple parts that can execute concurrently. However, the operating system does not treat the threads as separate applications for scheduling, management, and resource allocation. This is an important difference between processes and threads.

4. Pros and cons

Both threads and processes have advantages and disadvantages: threads have low execution overhead but make resources harder to manage and protect, while processes are the opposite. In addition, threads suit SMP machines, while processes can be migrated across machines.

This article focuses on using multiple processes in Python.

Unix/Linux operating systems provide a fork() system call, which is rather special. An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system copies the current process (the parent process) into a new one (the child process), and the call then returns in both the parent and the child.

In the child process fork() always returns 0, and in the parent process it returns the ID of the child. The reason is that a parent can fork many children, so the parent has to record the ID of each child, whereas a child only needs to call getppid() to get the ID of its parent.

Python's os module wraps common system calls, including fork, which makes it easy to create child processes in a Python program:


    import os

    print('Process (%s) start...' % os.getpid())
    # Only works on Unix/Linux/Mac:
    pid = os.fork()
    if pid == 0:
        print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
    else:
        print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

The output is as follows:

    Process (876) start...
    I (876) just created a child process (877).
    I am child process (877) and my parent is 876.

Because Windows has no fork call, the code above cannot run on Windows.

With fork, a process can copy itself into a child process to handle a new task when one arrives. The common Apache server works this way: the parent process listens on a port and, whenever a new HTTP request arrives, forks a child process to handle it.
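The fork-per-request pattern can be sketched without any networking. The following is only an illustration under stated assumptions: handle_request and serve are hypothetical names, integers stand in for HTTP requests, and the child reports success or failure through its exit code. Like the fork example above, it runs only on Unix/Linux/Mac.

```python
import os


def handle_request(request_id):
    # Hypothetical stand-in for real request handling:
    # pretend that even-numbered requests succeed.
    return request_id % 2 == 0


def serve(request_ids):
    """Fork one child per request and return {request_id: exit_code}."""
    children = {}
    for request_id in request_ids:
        pid = os.fork()
        if pid == 0:
            # Child: handle the request, then exit immediately so that it
            # never runs the rest of the parent's loop.
            ok = handle_request(request_id)
            os._exit(0 if ok else 1)
        # Parent: record which child handles which request.
        children[pid] = request_id
    results = {}
    for pid, request_id in children.items():
        _, status = os.waitpid(pid, 0)      # wait for this child to finish
        results[request_id] = os.WEXITSTATUS(status)
    return results


if __name__ == "__main__":
    print(serve([1, 2, 3, 4]))
```

The parent keeps the PID returned by each fork() so that it can later wait for that specific child with os.waitpid().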

Multiprocessing

If you plan to write a multi-process service program, Unix/Linux is undoubtedly the right choice. Since Windows has no fork call, is it impossible to write multi-process programs in Python on Windows?

Because Python is cross-platform, it naturally provides cross-platform multi-process support as well. The multiprocessing module is the cross-platform multi-process module.

The multiprocessing module provides a Process class to represent a process object. The following example starts a child process and waits for it to end:


    from multiprocessing import Process
    import os

    # Code to be executed by the child process
    def run_proc(name):
        print('Run child process %s (%s)...' % (name, os.getpid()))

    if __name__ == '__main__':
        print('Parent process %s.' % os.getpid())
        p = Process(target=run_proc, args=('test',))
        print('Child process will start.')
        p.start()
        p.join()
        print('Child process end.')

To create a child process, you only need to pass in the function to execute and its arguments, create a Process instance, and start it with the start() method. This is even simpler than fork().

The join() method waits for the child process to end before continuing, and is typically used for synchronization between processes.

Pool

If you want to start a large number of child processes, you can create the child processes in batches using the process pool:


    from multiprocessing import Pool
    import os, time, random

    def long_time_task(name):
        print('Run task %s (%s)...' % (name, os.getpid()))
        start = time.time()
        time.sleep(random.random() * 3)
        end = time.time()
        print('Task %s runs %0.2f seconds.' % (name, (end - start)))

    if __name__ == '__main__':
        print('Parent process %s.' % os.getpid())
        p = Pool(4)
        for i in range(5):
            # args must be a tuple, hence the trailing comma
            p.apply_async(long_time_task, args=(i,))
        print('Waiting for all subprocesses done...')
        p.close()
        p.join()
        print('All subprocesses done.')

The output is as follows:

    Parent process 669.
    Waiting for all subprocesses done...
    Run task 0 (671)...
    Run task 1 (672)...
    Run task 2 (673)...
    Run task 3 (674)...
    Task 2 runs 0.14 seconds.
    Run task 4 (673)...
    Task 1 runs 0.27 seconds.
    Task 3 runs 0.86 seconds.
    Task 0 runs 1.41 seconds.
    Task 4 runs 1.91 seconds.
    All subprocesses done.

Code interpretation:

Calling join() on the Pool object waits for all child processes to finish. You must call close() before calling join(), and no new tasks can be added after close() has been called.

Note in the output that tasks 0, 1, 2, and 3 start immediately, while task 4 waits for an earlier task to finish before it runs. This is because the pool was created with a size of 4, so at most 4 processes run at the same time. This is a deliberate limit of the Pool design, not a limitation of the operating system. If you change it to:


    p = Pool(5)

then 5 processes can run at the same time.

Since the default size of a Pool is the number of CPU cores, if you happen to have an 8-core CPU and rely on the default, you have to submit at least 9 tasks to see the waiting effect above.

Subprocesses

Often the child process is not the program itself but an external program. After creating the child process, we also need to control its input and output.

The subprocess module lets us start a child process easily and then control its input and output.

The following example shows how to run the command nslookup www.python.org from Python code, with the same effect as running it directly on the command line:


    import subprocess

    print('$ nslookup www.python.org')
    r = subprocess.call(['nslookup', 'www.python.org'])
    print('Exit code:', r)

The result:

    $ nslookup www.python.org
    Server:        192.168.19.4
    Address:    192.168.19.4#53

    Non-authoritative answer:
    www.python.org    canonical name = python.map.fastly.net.
    Name:    python.map.fastly.net
    Address: 199.27.79.223

    Exit code: 0

If the child process also needs input, it can be supplied with the communicate() method:


    import subprocess

    print('$ nslookup')
    p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
    print(output.decode('utf-8'))
    print('Exit code:', p.returncode)

The code above is equivalent to running nslookup on the command line and then typing:

    set q=mx
    python.org
    exit
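Since nslookup may not be installed everywhere, the same idea of feeding input to a child and capturing its output can be tried with a child that is guaranteed to exist: the Python interpreter itself. The sketch below is an assumption-laden illustration, not part of the original example: CHILD_CODE and run_child are hypothetical names, and the child simply echoes each input line in upper case. It uses subprocess.run, which wraps Popen and communicate() in a single call.

```python
import subprocess
import sys

# Hypothetical child program: reads lines from stdin and prints them upper-cased.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper())\n"
)


def run_child(text):
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=text,              # sent to the child's stdin
        capture_output=True,     # collect stdout and stderr
        text=True,               # work with str instead of bytes
        check=True,              # raise if the child exits with a non-zero code
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    print(run_child("set q=mx\npython.org\nexit\n"))
```

Because text=True is set, no manual encode or decode step is needed.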

Inter-process communication

Processes certainly need to communicate with each other, and the operating system provides many mechanisms for inter-process communication. Python's multiprocessing module wraps these underlying mechanisms and provides Queue, Pipe, and several other ways to exchange data.

Taking Queue as an example, we create two child processes in the parent process, one that writes data to the Queue and one that reads data from it:


    from multiprocessing import Process, Queue
    import os, time, random

    # Code executed by the writer process:
    def write(q):
        print('Process to write: %s' % os.getpid())
        for value in ['A', 'B', 'C']:
            print('Put %s to queue...' % value)
            q.put(value)
            time.sleep(random.random())

    # Code executed by the reader process:
    def read(q):
        print('Process to read: %s' % os.getpid())
        while True:
            value = q.get(True)
            print('Get %s from queue.' % value)

    if __name__ == '__main__':
        # The parent process creates the Queue and passes it to each child:
        q = Queue()
        pw = Process(target=write, args=(q,))
        pr = Process(target=read, args=(q,))
        # Start child process pw, which writes:
        pw.start()
        # Start child process pr, which reads:
        pr.start()
        # Wait for pw to end:
        pw.join()
        # pr runs an infinite loop, so we cannot wait for it to end and must terminate it:
        pr.terminate()

The output is as follows:

    Process to write: 50563
    Put A to queue...
    Process to read: 50564
    Get A from queue.
    Put B to queue...
    Get B from queue.
    Put C to queue...
    Get C from queue.
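The other mechanism mentioned, Pipe, can be sketched in the same style. This is a minimal illustration rather than a recommended design; child_main and round_trip are hypothetical names. Pipe() returns two connected endpoints; the parent keeps one and hands the other to the child, and each side uses send() and recv().

```python
from multiprocessing import Pipe, Process


def child_main(conn):
    # Receive one message, reply with its upper-case form, then close.
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def round_trip(message):
    parent_conn, child_conn = Pipe()   # two connected endpoints
    p = Process(target=child_main, args=(child_conn,))
    p.start()
    parent_conn.send(message)
    reply = parent_conn.recv()
    p.join()
    return reply


if __name__ == "__main__":
    print(round_trip("hello"))
```

A Pipe connects exactly two endpoints, whereas a Queue can be shared by several writers and readers.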

Under Unix/Linux, the multiprocessing module wraps the fork() call, so we do not need to care about the details of fork(). Because Windows has no fork call, multiprocessing has to "emulate" the effect of fork: all Python objects the child needs from the parent must be serialized with pickle and passed to the child process. So if a multiprocessing call fails on Windows, first check whether pickling failed.

Summary

Under Unix/linux, you can use the fork () call to implement multiple processes.

For cross-platform multi-process programs, use the multiprocessing module.

Inter-process communication is implemented with Queue, Pipe, and so on.

Multithreading

Multitasking can be done by multiple processes or by multiple threads within one process. A process is made up of threads, and a process has at least one thread.

Because threads are execution units supported directly by the operating system, high-level languages usually have built-in multithreading support. Python is no exception, and Python threads are real POSIX threads, not emulated ones.

Python's standard library provides two modules: _thread and threading. _thread is the low-level module, and threading is the high-level module that wraps _thread. In most cases we only need the high-level threading module.

To start a thread, pass a function in to create a Thread instance, then call start() to begin execution:


    import time, threading

    # Code executed by the new thread:
    def loop():
        print('thread %s is running...' % threading.current_thread().name)
        n = 0
        while n < 5:
            n = n + 1
            print('thread %s >>> %s' % (threading.current_thread().name, n))
            time.sleep(1)
        print('thread %s ended.' % threading.current_thread().name)

    print('thread %s is running...' % threading.current_thread().name)
    t = threading.Thread(target=loop, name='LoopThread')
    t.start()
    t.join()
    print('thread %s ended.' % threading.current_thread().name)

The output is as follows:

    thread MainThread is running...
    thread LoopThread is running...
    thread LoopThread >>> 1
    thread LoopThread >>> 2
    thread LoopThread >>> 3
    thread LoopThread >>> 4
    thread LoopThread >>> 5
    thread LoopThread ended.
    thread MainThread ended.

Every process starts one thread by default, which we call the main thread, and the main thread can start new threads. Python's threading module has a current_thread() function that always returns the instance of the current thread. The main thread instance is named MainThread; a child thread's name is specified when it is created, and here we name the child thread LoopThread. The name is only used for display when printing and has no other meaning. If you do not supply a name, Python automatically names the threads Thread-1, Thread-2, and so on.

Lock

The biggest difference between multithreading and multiple processes is this: with multiple processes, each process has its own copy of every variable, and the copies do not affect each other; with multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. The biggest danger of sharing data between threads is therefore that several threads change a variable at the same time and corrupt its contents.

Let's see how several threads operating on one variable at the same time can corrupt it:


    import time, threading

    # Suppose this is your bank balance:
    balance = 0

    def change_it(n):
        # Deposit first, then withdraw; the result should be 0:
        global balance
        balance = balance + n
        balance = balance - n

    def run_thread(n):
        for i in range(100000):
            change_it(n)

    t1 = threading.Thread(target=run_thread, args=(5,))
    t2 = threading.Thread(target=run_thread, args=(8,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(balance)

We define a shared variable balance with an initial value of 0 and start two threads, each of which deposits and then withdraws. In theory the result should be 0, but because thread scheduling is decided by the operating system, when t1 and t2 run alternately and the loop count is large enough, the final balance is not necessarily 0.

The reason is that one statement in a high-level language becomes several instructions when the CPU executes it. Even a simple calculation such as:

    balance = balance + n

takes two steps:

    1. Calculate balance + n and store the result in a temporary variable;

    2. Assign the value of the temporary variable to balance.

That is, it can be viewed as:

    x = balance + n
    balance = x

The data goes wrong because modifying balance takes more than one statement, and a thread may be interrupted while executing those statements, so that several threads corrupt the contents of the same object.

When two threads deposit and withdraw at the same time, the balance may end up wrong. You certainly do not want your bank balance to turn negative for no reason, so we must make sure that while one thread is modifying balance, no other thread can change it.

To make sure the balance calculation is correct, we put a lock around change_it(). When a thread starts executing change_it(), we say it has acquired the lock; other threads then cannot execute change_it() at the same time and must wait until the lock is released and they acquire it themselves. Since there is only one lock, at most one thread holds it at any moment no matter how many threads there are, so conflicting modifications cannot occur. A lock is created with threading.Lock():


    balance = 0
    lock = threading.Lock()

    def run_thread(n):
        for i in range(100000):
            # Acquire the lock first:
            lock.acquire()
            try:
                # Now it is safe to change it:
                change_it(n)
            finally:
                # Always release the lock when done:
                lock.release()

When several threads execute lock.acquire() at the same time, only one succeeds in acquiring the lock and continues; the others keep waiting until they acquire it.

The thread that holds the lock must release it when finished; otherwise the threads waiting for the lock wait forever and become dead threads. That is why we use try...finally to make sure the lock is always released.
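The acquire/try/finally pattern has a shorter spelling: a threading.Lock can be used in a with statement, which acquires the lock on entry and releases it on exit, even if an exception is raised. The sketch below repeats the bank balance example in that form; run_demo is a hypothetical helper added only so the result can be checked.

```python
import threading

balance = 0
lock = threading.Lock()


def change_it(n):
    # Deposit first, then withdraw; the net effect should be zero.
    global balance
    balance = balance + n
    balance = balance - n


def run_thread(n):
    for _ in range(100000):
        # 'with lock' acquires the lock and always releases it afterwards.
        with lock:
            change_it(n)


def run_demo():
    t1 = threading.Thread(target=run_thread, args=(5,))
    t2 = threading.Thread(target=run_thread, args=(8,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return balance


if __name__ == "__main__":
    print(run_demo())
```

With the lock in place, each pair of deposit and withdrawal runs without interruption from the other thread, so the balance ends at 0.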

The advantage of a lock is that it guarantees a critical section of code is executed from start to finish by only one thread at a time. There are also disadvantages. First, it prevents concurrent execution: code protected by a lock effectively runs in single-threaded mode, which greatly reduces efficiency. Second, because there can be several locks, different threads may hold different locks and each try to acquire the lock held by the other, which can cause a deadlock: all the threads involved hang, neither executing nor ending, and can only be terminated forcibly by the operating system.
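One common way to avoid the deadlock described here is to make every thread acquire the locks in the same order. The sketch below is an illustration of that technique under stated assumptions: the Account class, transfer, and run_demo are hypothetical names, and the ordering rule (lock the account with the alphabetically smaller name first) is an arbitrary choice.

```python
import threading


class Account:
    # Hypothetical account with its own lock.
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Always lock the account with the smaller name first. Two opposite
    # transfers then request the locks in the same order, so neither can
    # hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_demo():
    a = Account("a", 1000)
    b = Account("b", 1000)

    def many(src, dst):
        for _ in range(1000):
            transfer(src, dst, 1)

    t1 = threading.Thread(target=many, args=(a, b))
    t2 = threading.Thread(target=many, args=(b, a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_demo())
```

If transfer() instead locked src first and dst second, the two threads could each take one lock and wait forever for the other.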

Multi-core CPUs

If you happen to have a multi-core CPU, you are probably thinking that multiple cores should be able to execute multiple threads at the same time.

What happens if we write an infinite loop?

Open Activity Monitor on Mac OS X, or Task Manager on Windows, to watch the CPU usage of a process.

We can see that one infinite-loop thread uses 100% of one CPU core. With two infinite-loop threads on a multi-core CPU, usage reaches 200%, that is, two cores. To saturate all N cores of an N-core CPU, you must start N infinite-loop threads.

Let's try writing an infinite loop in Python:


    import threading, multiprocessing

    def loop():
        x = 0
        while True:
            x = x ^ 1

    for i in range(multiprocessing.cpu_count()):
        t = threading.Thread(target=loop)
        t.start()

After starting N threads, where N is the number of CPU cores, CPU usage on a 4-core CPU is only about 102%, that is, only one core is used.

But the same infinite loop rewritten in C, C++, or Java saturates every core: 400% on 4 cores, 800% on 8 cores. Why not in Python?

Although Python threads are real threads, the interpreter has a GIL (Global Interpreter Lock) when it executes code. Before any Python thread runs, it must first acquire the GIL; the interpreter then releases the GIL automatically at regular intervals (every 100 bytecode instructions in older CPython versions, after a time slice of 5 ms by default since CPython 3.2) to give other threads a chance to run. The GIL effectively locks the executing code of all threads, so in Python multiple threads can only run alternately. Even 100 threads on a 100-core CPU use only one core.
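On CPython 3.2 and later, the moment at which a running thread is asked to give up the GIL is governed by a time interval rather than a bytecode count. As a small sketch, the current value can be read with sys.getswitchinterval(); the variable name interval is chosen only for this illustration.

```python
import sys

# Time in seconds after which the interpreter asks the running
# thread to release the GIL so that another thread can run.
interval = sys.getswitchinterval()
print("GIL switch interval: %.3f s" % interval)
```

The value can be changed with sys.setswitchinterval(), but that only alters how often threads take turns; it does not let them run Python code on several cores at once.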

The GIL is a historical legacy of the Python interpreter's design. The interpreter we usually use is the official implementation, CPython; to really take advantage of multiple cores with threads you would need an interpreter without a GIL.

So in Python you can use multithreading, but do not expect it to make effective use of multiple cores. If you must use multiple cores through multithreading, that can only be done with C extensions, at the cost of Python's simplicity and ease of use.

There is no need to worry too much, though: Python cannot use multiple threads to run tasks on multiple cores, but it can do so with multiple processes. Each Python process has its own independent GIL, and they do not affect each other.

Multithreaded programming has a complex model and is prone to conflicts; shared state must be protected with locks, and you must also guard against deadlocks.

The Python interpreter was designed with the GIL, so multiple threads cannot take advantage of multiple cores.

ThreadLocal

In a multithreaded environment, each thread has its own data. It is better for a thread to use its own local variables than global variables, because a local variable is visible only to its own thread and does not affect other threads, while every change to a global variable must be protected by a lock. But local variables have a problem too: they are cumbersome to pass along from one function call to the next. A ThreadLocal object avoids this:


    import threading

    # Create a global ThreadLocal object:
    local_school = threading.local()

    def process_student():
        # Get the student associated with the current thread:
        std = local_school.student
        print('Hello, %s (in %s)' % (std, threading.current_thread().name))

    def process_thread(name):
        # Bind the ThreadLocal student:
        local_school.student = name
        process_student()

    t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
    t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
    t1.start()
    t2.start()
    t1.join()
    t2.join()

The global variable local_school is a ThreadLocal object. Every thread can read and write its student attribute without affecting the others. You can think of local_school as a global variable, while each attribute such as local_school.student is local to a thread: it can be read and written freely without interference, and there is no lock to manage, because ThreadLocal handles that internally.

You can think of the global variable local_school as a dict: besides local_school.student, you can bind other variables as well, such as local_school.teacher.

ThreadLocal is most commonly used to bind a database connection, an HTTP request, user identity information, and so on to each thread, so that every handler function called by that thread can access these resources easily.

Although a ThreadLocal variable is global, each thread can read and write only its own independent copy, without interfering with other threads. ThreadLocal solves the problem of passing arguments between the functions of one thread.
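The isolation can be checked directly. In the following sketch the names request_context, handler, serve, and run_demo are hypothetical: each thread stores a user in the thread-local object, a handler reads it back without receiving any argument, and the main thread, which never set the attribute, does not see it at all.

```python
import threading

request_context = threading.local()   # hypothetical per-thread context
seen = {}                              # results collected from the threads


def handler():
    # No argument is passed in; the value comes from thread-local storage.
    return "user=%s" % request_context.user


def serve(user):
    request_context.user = user        # visible only to the current thread
    seen[user] = handler()


def run_demo():
    threads = [threading.Thread(target=serve, args=(u,)) for u in ("alice", "bob")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The calling thread never set .user, so the attribute does not exist here.
    return dict(seen), hasattr(request_context, "user")


if __name__ == "__main__":
    print(run_demo())
```

Each thread reads back exactly the value it stored, even though all of them use the same global object.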

Process vs. thread

We have introduced multiple processes and multiple threads, the two most common ways to achieve multitasking. Now let's discuss the pros and cons of the two approaches.

First, to achieve multitasking we usually design a Master-Worker pattern: the Master assigns tasks and the Workers execute them. In a multitasking environment there is therefore usually one Master and several Workers.

If Master-Worker is implemented with multiple processes, the main process is the Master and the other processes are the Workers.

If Master-Worker is implemented with multiple threads, the main thread is the Master and the other threads are the Workers.
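A minimal sketch of the multithreaded Master-Worker pattern, assuming the tasks are simply numbers to be squared. The function names master and worker and the use of None as an end-of-work marker are choices made for this illustration only.

```python
import queue
import threading


def worker(tasks, results):
    # Worker: keep taking tasks until the end-of-work marker arrives.
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put((n, n * n))


def master(numbers, worker_count=3):
    # Master: start the workers, hand out the tasks, collect the results.
    tasks = queue.Queue()
    results = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(worker_count)
    ]
    for w in workers:
        w.start()
    for n in numbers:
        tasks.put(n)
    for _ in workers:
        tasks.put(None)        # one end-of-work marker per worker
    for w in workers:
        w.join()
    return dict(results.get() for _ in numbers)


if __name__ == "__main__":
    print(master([1, 2, 3, 4]))
```

queue.Queue is thread-safe, so the Master and the Workers need no explicit lock to exchange tasks and results.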

The greatest advantage of the multi-process model is stability: a crashed child process does not affect the main process or the other child processes. (Of course, if the main process dies, all processes die, but the Master process only assigns tasks, so the probability of it crashing is low.) The well-known Apache server first adopted the multi-process model.

The disadvantage of the multi-process model is the high cost of creating processes. Under Unix/Linux a fork call is acceptable, but under Windows creating a process is very expensive. In addition, the number of processes an operating system can run at the same time is limited: under memory and CPU constraints, with thousands of processes running simultaneously, even scheduling becomes a problem for the operating system.

The multithreaded model is usually a little faster than the multi-process model, but not by much. Its fatal disadvantage is that any one thread crashing can bring down the whole process, because all threads share the process's memory. On Windows, when the code running in a thread goes wrong, you often see a message like "The program has performed an illegal operation and will be shut down"; usually the problem lies in one thread, but the operating system forcibly ends the whole process.

Under Windows, multithreading is more efficient than multiple processes, so Microsoft's IIS server uses the multithreaded model by default. Because of the stability problems of multithreading, IIS is less stable than Apache. To mitigate this, IIS and Apache now both offer a hybrid multi-process plus multi-thread model, which makes the problem ever more complicated.

Thread switching

Whether you use multiple processes or multiple threads, once their number grows large, efficiency inevitably drops. Why?

Here is an analogy. Suppose you are unlucky enough to be preparing for an exam and have homework in 5 subjects every evening: Chinese, mathematics, English, physics, and chemistry, each taking 1 hour.

If you spend 1 hour on Chinese homework, finish it, then spend 1 hour on math homework, and so on until everything is done, it takes 5 hours in total. This is called the single-task model, or the batch model.

Suppose you switch to a multitasking model: you do 1 minute of Chinese, switch to math for 1 minute, then switch to English, and so on. As long as you switch fast enough, this is just how a single-core CPU performs multitasking; to a kindergarten child, you appear to be writing 5 pieces of homework at the same time.

However, switching has a cost. To switch from Chinese to math, you first clear the Chinese books and pens off the desk (this is called saving the context), then open the math textbook and find the compass and ruler (this is called preparing the new environment) before you can start on the math homework. The operating system does the same when it switches processes or threads: it must save the current execution context (CPU register state, memory pages, and so on) and then prepare the execution environment of the new task (restore its previous register state, switch memory pages, and so on) before execution can start. Each switch is fast, but it still takes time. With thousands of tasks running at once, the operating system may spend most of its time switching tasks and have little left for running them. The most common symptoms are a thrashing hard disk, windows that do not respond to clicks, and a system that appears frozen.

So once multitasking passes a certain limit, it consumes all the system's resources, efficiency drops sharply, and no task gets done properly.

Compute-intensive vs. IO-intensive

The second consideration for multitasking is the type of task. Tasks can be divided into compute-intensive and IO-intensive.

Compute-intensive tasks perform a large amount of computation and consume CPU resources, for example calculating pi or decoding HD video; they depend entirely on the CPU's computing power. Such tasks can be multitasked, but the more tasks there are, the more time is spent switching between them and the less efficiently the CPU executes them. To use the CPU most efficiently, the number of compute-intensive tasks running at the same time should equal the number of CPU cores.

Because compute-intensive tasks mainly consume CPU resources, the efficiency of the code is critical. Scripting languages like Python run slowly and are poorly suited to compute-intensive tasks, which are best written in C.

The second type of task is IO-intensive. Tasks involving network or disk IO are IO-intensive: they consume little CPU and spend most of their time waiting for IO operations to complete (because IO is far slower than the CPU and memory). For IO-intensive tasks, the more tasks there are, the higher the CPU efficiency, up to a limit. Most common tasks, such as web applications, are IO-intensive.

An IO-intensive task spends 99% of its time on IO and very little on the CPU, so replacing a slow scripting language like Python with fast C yields almost no improvement in running efficiency. For IO-intensive tasks the most suitable language is the one with the highest development efficiency (the least code): scripting languages are the first choice, and C is the worst.
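The effect of running many IO-intensive tasks at once can be sketched with threads. In the following illustration, fake_io and run_all are hypothetical names, and time.sleep() stands in for a real network or disk wait. Eight tasks that each wait 0.2 seconds would take about 1.6 seconds one after another, but far less when eight threads wait at the same time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    # time.sleep stands in for waiting on the network or the disk;
    # the thread uses almost no CPU while it waits.
    time.sleep(delay)
    return task_id


def run_all(task_ids, max_workers):
    start = time.time()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_io, task_ids))
    return results, time.time() - start


if __name__ == "__main__":
    results, elapsed = run_all(range(8), max_workers=8)
    print(results, "%.2f seconds" % elapsed)
```

Setting max_workers to 1 makes the same tasks run one after another, which makes the difference easy to measure.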

Asynchronous IO

Given the huge speed gap between CPU and IO, a task spends most of its execution time waiting for IO. A single-process, single-threaded model prevents other tasks from running in parallel, so we need a multi-process or multithreaded model to run multiple tasks concurrently.

Modern operating systems have greatly improved IO operations; the most important feature is support for asynchronous IO. By making full use of the asynchronous IO support provided by the operating system, a single-process, single-threaded model can perform multitasking. This new model is called the event-driven model. Nginx is a web server that supports asynchronous IO; on a single-core CPU it supports multitasking efficiently with a single process, and on multi-core CPUs it can run multiple processes (as many as there are CPU cores) to make full use of the cores. Because the total number of processes in the system is small, operating system scheduling is very efficient. Implementing multitasking with the asynchronous IO programming model is a major trend.

In Python, the single-process asynchronous programming model is called a coroutine. With coroutine support, efficient event-driven multitasking programs can be written. We will discuss how to write coroutines later.
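As a brief preview, here is a minimal sketch of coroutines using the standard asyncio module. The names fetch and main are hypothetical, and asyncio.sleep() stands in for a non-blocking IO wait. All three coroutines wait concurrently on a single thread.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep stands in for a non-blocking IO wait; while this
    # coroutine waits, the event loop runs the others.
    await asyncio.sleep(delay)
    return "%s done" % name


async def main():
    # gather() runs the coroutines concurrently and returns their
    # results in the order the coroutines were passed in.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.3),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The whole run takes roughly as long as the longest wait, not the sum of all three, even though only one thread is involved.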

Distributed processes

Between threads and processes, processes should be preferred: they are more stable, and processes can be distributed across multiple machines, whereas threads can at most be distributed across the CPUs of one machine.

Python's multiprocessing module not only supports multiple processes; its managers submodule also supports distributing processes across multiple machines. A service process can act as a scheduler, distributing tasks to other processes over the network. Because the managers module is well encapsulated, it is easy to write distributed multi-process programs without knowing the details of network communication.

For example, suppose we already have a multi-process program that communicates through a Queue and runs on one machine. Now, because the process that handles tasks has a heavy workload, we want to put the process that sends tasks and the process that handles tasks on two machines. How can this be done with distributed processes?

The existing Queue can still be used: by exposing the Queue over the network through the managers module, processes on other machines can access it.

Let's look at the service process first. It starts the Queue, registers the Queue on the network, and then writes tasks to the Queue:


# task_master.py

import queue
import random
from multiprocessing.managers import BaseManager

# Number of tasks to send (the count was lost in the source; 10 is an assumed value):
TASK_COUNT = 10

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()

# Module-level functions are used instead of lambdas so that the registration
# can be pickled on platforms that do not fork the manager's server process:
def get_task_queue():
    return task_queue

def get_result_queue():
    return result_queue

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

if __name__ == '__main__':
    # Register both queues on the network; the callable parameter ties each name to a queue object:
    QueueManager.register('get_task_queue', callable=get_task_queue)
    QueueManager.register('get_result_queue', callable=get_result_queue)
    # Bind port 5000 and set the authentication key to 'abc':
    manager = QueueManager(address=('', 5000), authkey=b'abc')
    # Start the queue server:
    manager.start()
    # Get the queue objects that are accessed over the network:
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    # Put in a few tasks:
    for i in range(TASK_COUNT):
        n = random.randint(0, 10000)
        print('Put task %d...' % n)
        task.put(n)
    # Read results from the result queue:
    print('Try get results...')
    for i in range(TASK_COUNT):
        r = result.get(timeout=10)
        print('Result: %s' % r)
    # Shut down:
    manager.shutdown()
    print('master exit.')

When we write a multi-process program on a single machine, the created queue can be used directly. In a distributed multi-process environment, however, tasks must not be added to the original task_queue directly, because that bypasses the QueueManager wrapper; they must be added through the queue interface obtained from manager.get_task_queue().
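The difference between the original queue and the interface obtained through the manager can be seen in a small single-process sketch. It is an illustration only: serving the manager from a background thread through get_server(), binding to port 0 so the operating system picks a free port, and the function name demo are choices made for this example, not part of the article's program.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()


class QueueManager(BaseManager):
    pass


# A lambda is fine here because no separate server process is started.
QueueManager.register('get_task_queue', callable=lambda: task_queue)


def demo():
    # Serve the queue from a background thread; port 0 means "any free port".
    serving = QueueManager(address=('127.0.0.1', 0), authkey=b'abc')
    server = serving.get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Connect the way a client would and fetch the queue through the manager.
    client = QueueManager(address=server.address, authkey=b'abc')
    client.connect()
    proxy = client.get_task_queue()
    proxy.put(42)  # the call travels over the socket to the served queue

    # The proxy is not a queue.Queue; it forwards method calls to the server.
    is_plain_queue = isinstance(proxy, queue.Queue)
    return is_plain_queue, task_queue.get(timeout=1)


if __name__ == '__main__':
    print(demo())
```

What get_task_queue() returns is a proxy object rather than the queue itself, which is why every access from another process has to go through it.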

Then, start the task process on another machine (it can also be started on this machine):


import queue
import time
from multiprocessing.managers import BaseManager

# Number of tasks to fetch (the count was lost in the source; 10 is an assumed value):
TASK_COUNT = 10

# Create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# This QueueManager only fetches the queues from the network, so only the names are given when registering:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# Connect to the server, i.e. the machine running task_master.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# The port and authentication key must match the settings in task_master.py exactly:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# Connect over the network:
m.connect()
# Get the queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# Fetch tasks from the task queue and write the results to the result queue:
for i in range(TASK_COUNT):
    try:
        n = task.get(timeout=1)
        print('Run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n * n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('Task queue is empty.')
# Processing finished:
print('worker exit.')

The task process connects to the service process over the network, so the IP address of the service process must be specified. http://www.jb51.net/article/65112.htm

Summary

Python's distributed process interface is simple and well encapsulated, and it suits environments where heavy tasks need to be distributed across multiple machines.

Note that the queue is used to pass tasks and receive results, so the amount of data describing each task should be as small as possible. For example, when sending a task to process a log file, do not send the log file itself, which may be hundreds of megabytes; send the full path where the log file is stored, and let the worker process read the file from a shared disk.
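A minimal sketch of this advice, assuming a directory that both machines can read. The names make_task, handle_task and demo, the task layout, and the temporary directory standing in for the shared disk are all hypothetical.

```python
import os
import pickle
import tempfile


def make_task(log_path):
    """Describe the work with a small record instead of the file contents."""
    return {'action': 'count_lines', 'path': log_path}


def handle_task(task):
    """Worker side: open the file from the shared location and process it."""
    with open(task['path'], 'rb') as f:
        return sum(1 for _ in f)


def demo():
    # A temporary directory stands in for the shared disk in this sketch.
    with tempfile.TemporaryDirectory() as shared_dir:
        log_path = os.path.join(shared_dir, 'app.log')
        with open(log_path, 'wb') as f:
            f.write(b'line\n' * 100000)
        task = make_task(log_path)
        # Size of what would actually travel through the queue.
        payload_size = len(pickle.dumps(task))
        file_size = os.path.getsize(log_path)
        return payload_size, file_size, handle_task(task)


if __name__ == '__main__':
    print(demo())
```

Only the small pickled record would pass through the queue; the file itself never crosses the manager connection.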
