32. Process Pool and Callback Functions


Multiprocessing is one way to achieve concurrency. As we have learned, an operating system cannot spawn processes without limit: starting a process consumes resources. So today we will look at the process pool, which can be used to manage the processes we start.

 

1. Sharing Data

(This data-sharing method is not recommended; it is shown only for understanding. The queue covered earlier should be used for inter-process communication.)

Data is independent between processes; processes can communicate through queues or pipes, both of which are based on message passing.
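
As a quick reminder, a minimal sketch of queue-based message passing between processes (the worker function here is made up for illustration):

from multiprocessing import Process, Queue

def worker(q):
    q.put('hello from the child')  # message passing: put a message into the queue

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # blocks until the child has put a message
    p.join()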

Although data is independent between processes, it can be shared through a Manager. In fact, a Manager can do far more than that.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array.

from multiprocessing import Manager, Process, Lock

def work(dic, mutex):
    # mutex.acquire()
    # dic['count'] -= 1
    # mutex.release()
    # Operating on the shared data without the lock is guaranteed to corrupt it
    with mutex:
        dic['count'] -= 1

if __name__ == '__main__':
    mutex = Lock()
    m = Manager()
    # m.list([1, 2, 3])
    share_dic = m.dict({'count': 100})
    p_l = []
    for i in range(100):
        p = Process(target=work, args=(share_dic, mutex))
        p_l.append(p)
        p.start()
    for i in p_l:
        i.join()
    print(share_dic)  # {'count': 0}

 

2. Process pool (key)

When using Python for system administration, especially when operating on many file directories at once or remotely controlling many hosts, parallel operation saves a lot of time. Multiprocessing is one way to achieve this concurrency, but the following points must be noted:

1. The number of tasks that need to run concurrently is generally far greater than the number of CPU cores.

2. An operating system cannot start processes without limit; usually you start about as many processes as there are cores.

3. Starting too many processes actually reduces efficiency (starting a process consumes system resources, and processes beyond the number of cores cannot run in parallel).

When the number of objects to operate on is small, you can use Process from multiprocessing directly to spawn processes dynamically. A dozen processes are fine, but with hundreds or thousands, manually limiting the number of processes becomes far too cumbersome. This is where the process pool comes in.

1. Creating a process pool:

Pool([numprocess[, initializer[, initargs]]]): creates a process pool. If numprocess is 3, the pool creates three processes up front and then uses those same three processes to execute all the tasks from start to finish, without starting any others.

2. Parameter introduction:

1. numprocess: the number of processes to create. If omitted, it defaults to the value of os.cpu_count() (you can check the number of cores via the os module).
2. initializer: a callable to be executed when each worker process starts; defaults to None.
3. initargs: the argument tuple to pass to initializer.
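
For instance, a minimal sketch of initializer and initargs (the init and square functions are made up for illustration):

from multiprocessing import Pool
import os

def init(msg):
    # runs once in each worker process as it starts
    print('%s starting: %s' % (os.getpid(), msg))

def square(n):
    return n ** 2

if __name__ == '__main__':
    p = Pool(3, initializer=init, initargs=('hello',))
    print(p.map(square, range(5)))  # [0, 1, 4, 9, 16]
    p.close()
    p.join()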

3. Main Methods:

p.apply(func[, args[, kwargs]]): executes func(*args, **kwargs) in one of the pool's worker processes and returns the result. Note that this does not run func in every worker process. To run func concurrently with different arguments, you must call p.apply() from different threads or use p.apply_async().

p.apply_async(func[, args[, kwargs[, callback]]]): executes func(*args, **kwargs) in a pool worker process and returns the result asynchronously. The return value of this method is an instance of the AsyncResult class. callback is a callable that takes one argument; when the result of func becomes available, it is passed to callback. callback must not perform any blocking operation, otherwise it holds up the results of other asynchronous operations.

p.close(): closes the pool, preventing any further tasks from being submitted. Tasks already pending are completed before the worker processes terminate.

p.join(): waits for all worker processes to exit. This method may only be called after close() or terminate().

4. Other methods (for reference)

The methods apply_async() and map_async() return AsyncResult instances (obj below). Such an instance has the following methods:

obj.get(): returns the result, waiting for it to arrive if necessary. timeout is optional; if the result does not arrive within the given time, an exception is raised. If the remote operation raised an exception, it is re-raised when this method is called.
obj.ready(): returns True if the call has completed.
obj.successful(): returns True if the call completed without raising an exception; raises an exception if called before the result is ready.
obj.wait([timeout]): waits for the result to become available.

p.terminate(): terminates all worker processes immediately, without performing any cleanup or finishing pending work. This is called automatically when p is garbage-collected.
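
A minimal sketch exercising these AsyncResult methods (the work function and values are made up for illustration):

from multiprocessing import Pool
import time

def work(n):
    time.sleep(0.5)
    return n * 2

if __name__ == '__main__':
    p = Pool(2)
    obj = p.map_async(work, [1, 2, 3])
    print(obj.ready())       # False: the tasks are still running
    obj.wait()               # block until the results are available
    print(obj.ready())       # True
    print(obj.successful())  # True: no exception was raised
    print(obj.get())         # [2, 4, 6]
    p.close()
    p.join()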

5. Applications

1) apply, synchronous execution: blocking

from multiprocessing import Pool
import os, time

def work(n):
    print('%s run' % os.getpid())
    time.sleep(3)
    return n ** 2

if __name__ == '__main__':
    p = Pool(3)  # the pool creates three processes up front, and those three
                 # execute all the tasks
    res_l = []
    for i in range(10):
        res = p.apply(work, args=(i,))  # synchronous: blocks until this task
                                        # completes, then returns res
        res_l.append(res)
    print(res_l)

2) apply_async, asynchronous execution: non-blocking

from multiprocessing import Pool
import os, time

def work(n):
    print('%s run' % os.getpid())
    time.sleep(3)
    return n ** 2

if __name__ == '__main__':
    p = Pool(3)  # the pool creates three processes up front, and those three
                 # execute all the tasks
    res_l = []
    for i in range(10):
        res = p.apply_async(work, args=(i,))  # submits asynchronously; does not
                                              # block waiting for the result
        res_l.append(res)

    # apply_async usage: when tasks are submitted asynchronously, the main
    # process must join, waiting until all the tasks in the pool have finished,
    # before collecting the results with get(); otherwise the main process ends
    # while the pool may not have had a chance to run.
    p.close()  # no further tasks may be submitted to the pool
    p.join()
    for res in res_l:
        print(res.get())  # get() retrieves the apply_async result; apply has
                          # no get() method, since it runs synchronously and
                          # returns the result directly

3) Details: apply_async and apply

# 1: using the process pool (non-blocking, apply_async)
# coding: utf-8
from multiprocessing import Pool
import time

def func(msg):
    print("msg:", msg)
    time.sleep(1)
    return msg

if __name__ == "__main__":
    pool = Pool(processes=3)
    res_l = []
    for i in range(10):
        msg = "hello %d" % i
        # the pool keeps at most `processes` tasks running at once; as soon as
        # one finishes, a waiting task is scheduled onto the freed worker
        res = pool.apply_async(func, (msg,))
        res_l.append(res)
    print("==============================>")
    # Without the join (or get) below, the whole program would end and the
    # tasks in the pool would be cut off together with the main process.
    pool.close()  # close the pool so no further tasks can be submitted
    pool.join()   # join must be called after close, otherwise an error occurs;
                  # it waits for all worker processes to finish
    print(res_l)  # prints a list of objects like
                  # <multiprocessing.pool.ApplyResult object at 0x10357c4e0>,
                  # not the final results; since this runs after join, the
                  # results are ready, and all that remains is calling get()
                  # on each object
    for i in res_l:
        print(i.get())  # get() retrieves the apply_async result; apply has no
                        # get() method, because it runs synchronously and
                        # returns the result directly

# 2: using the process pool (blocking, apply)
# coding: utf-8
from multiprocessing import Pool
import time

def func(msg):
    print("msg:", msg)
    time.sleep(0.1)
    return msg

if __name__ == "__main__":
    pool = Pool(processes=3)
    res_l = []
    for i in range(10):
        msg = "hello %d" % i
        res = pool.apply(func, (msg,))  # synchronous: one task runs and returns
                                        # its result before the next is started
        res_l.append(res)
    print("==============================>")
    pool.close()
    pool.join()   # join must be called after close; it waits for all worker
                  # processes to finish
    print(res_l)  # here this is already the list of final results
    for i in res_l:
        print(i)  # apply is synchronous, so the results are plain values and
                  # there is no get() method

4) Improving the earlier socket link loop with a process pool

Server:

# The default number of processes in the Pool is the number of CPU cores,
# assumed here to be 4 (check with os.cpu_count()).
# If six clients connect, two of them wait.
# Printing the pid inside each task shows only four distinct pids:
# all the clients share four processes.
from socket import *
from multiprocessing import Pool
import os

server = socket(AF_INET, SOCK_STREAM)
server.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
server.bind(('127.0.0.1', 8080))
server.listen(5)

def talk(conn, client_addr):
    print('process pid: %s' % os.getpid())
    while True:
        try:
            msg = conn.recv(1024)
            if not msg:
                break
            conn.send(msg.upper())
        except Exception:
            break

if __name__ == '__main__':
    p = Pool()
    while True:
        conn, client_addr = server.accept()
        p.apply_async(talk, args=(conn, client_addr))
        # p.apply(talk, args=(conn, client_addr))  # synchronous: only one
        # client could be served at a time

Client:

from socket import *

client = socket(AF_INET, SOCK_STREAM)
client.connect(('127.0.0.1', 8080))

while True:
    msg = input('>>: ').strip()
    if not msg:
        continue
    client.send(msg.encode('utf-8'))
    msg = client.recv(1024)
    print(msg.decode('utf-8'))

When multiple clients connect concurrently, the server shows only as many distinct pids at any one time as there are pool processes (four under the assumption above). When one client disconnects, a waiting client comes in and is handled by one of those same processes.

 

3. Callback Functions

Scenario that calls for a callback: as soon as any task in the process pool completes, the main process is notified immediately, "I am finished, you can process my result." The main process then calls a function to process that result; that function is the callback.

We can put the time-consuming (blocking) tasks in the process pool and attach a callback function, which the main process is responsible for executing. This way the main process skips the I/O step and receives each task's result directly when it runs the callback.

# pip3 install requests   (install the requests module from the command line)
from multiprocessing import Pool
import requests
import os
import time

def get_page(url):
    print('<%s> is getting [%s]' % (os.getpid(), url))
    response = requests.get(url)
    time.sleep(2)
    print('<%s> is done [%s]' % (os.getpid(), url))
    return {'url': url, 'text': response.text}

def parse_page(res):
    print('<%s> parse [%s]' % (os.getpid(), res['url']))
    with open('db.txt', 'a') as f:
        parse_res = 'url: %s size: %s\n' % (res['url'], len(res['text']))
        f.write(parse_res)

if __name__ == '__main__':
    p = Pool(4)
    urls = [
        'https://www.baidu.com',
        'http://www.openstack.org',
        'https://www.python.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/',
    ]
    for url in urls:
        p.apply_async(get_page, args=(url,), callback=parse_page)
    p.close()
    p.join()
    print('main', os.getpid())

If the main process simply waits for all tasks in the pool to finish and then processes the results in one batch, no callback is needed.
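
A minimal sketch of that callback-free pattern (the work function here is made up for illustration):

from multiprocessing import Pool
import time

def work(n):
    time.sleep(1)
    return n ** 2

if __name__ == '__main__':
    p = Pool(4)
    res_l = [p.apply_async(work, args=(i,)) for i in range(10)]
    p.close()
    p.join()  # wait until every task in the pool has finished
    # then process all the results in one batch in the main process
    print([res.get() for res in res_l])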

Crawler case:

from multiprocessing import Pool
import time, random
import requests
import re

def get_page(url, pattern):
    response = requests.get(url)
    if response.status_code == 200:
        return (response.text, pattern)

def parse_page(info):
    page_content, pattern = info
    res = re.findall(pattern, page_content)
    for item in res:
        dic = {
            'index': item[0],
            'title': item[1],
            'actor': item[2].strip()[3:],
            'time': item[3][5:],
            'score': item[4] + item[5],
        }
        print(dic)

if __name__ == '__main__':
    pattern1 = re.compile(r'<dd>.*?board-index.*?>(\d+)<.*?title="(.*?)".*?star.*?>(.*?)<.*?releasetime.*?>(.*?)<.*?integer.*?>(.*?)<.*?fraction.*?>(.*?)<', re.S)
    url_dic = {
        'http://maoyan.com/board/7': pattern1,
    }
    p = Pool()
    res_l = []
    for url, pattern in url_dic.items():
        res = p.apply_async(get_page, args=(url, pattern), callback=parse_page)
        res_l.append(res)
    for i in res_l:
        i.get()
    # res = requests.get('http://maoyan.com/board/7')
    # print(re.findall(pattern, res.text))

 
