Python Full Stack Development Foundation "22nd": Process Pools and Callback Functions

Source: Internet
Author: User
Tags: mutex

First, Data Sharing

1. Inter-process communication should avoid sharing data where possible.

2. Data between processes is independent; processes can communicate using queues or pipes, both of which are based on message passing.

Although inter-process data is independent, you can use a Manager to achieve data sharing; in fact, a Manager can do much more than that.

tasklist | findstr pycharm    # findstr filters; | is the pipe: the output of tasklist is put into the pipe, and the findstr pycharm after the pipe receives it

3. Inter-process communication (IPC) is implemented in two ways: pipes and queues.

# Data sharing
from multiprocessing import Manager, Process, Lock

def work(dic, mutex):
    # mutex.acquire()
    # dic['count'] -= 1
    # mutex.release()
    # can also be locked with a context manager:
    with mutex:
        dic['count'] -= 1

if __name__ == '__main__':
    mutex = Lock()
    m = Manager()  # the Manager implements sharing; because the dict is shared, you have to add a lock
    share_dic = m.dict({'count': 100})
    p_l = []
    for i in range(100):
        p = Process(target=work, args=(share_dic, mutex))
        p_l.append(p)  # add it to the list first
        p.start()
    for i in p_l:
        i.join()
    print(share_dic)  # sharing means there will be competition

 

Second, Process Pools

When using Python for system administration, especially when operating on many file directories at once or controlling many hosts remotely, parallel execution can save a great deal of time. Multiprocessing is one means of achieving concurrency; the points to note are:

    1. The number of tasks that need to run concurrently is typically far greater than the number of CPU cores.
    2. The operating system cannot open processes without limit; as a rule, open as many processes as there are cores.
    3. Opening too many processes reduces efficiency (each open process consumes system resources, and processes beyond the number of cores cannot actually run in parallel).

For example, when the number of tasks is small, you can create processes dynamically with multiprocessing.Process directly; a dozen or so is fine. But with hundreds or thousands of tasks, limiting the number of processes by hand becomes far too cumbersome. This is where a process pool comes in.

So what is a process pool? A process pool is a mechanism that controls the number of processes in the form of a resource pool.

For high-level applications such as remote procedure calls, a process pool should be used. The pool provides a specified number of processes for the user to invoke. When a new request is submitted, if the pool is not yet full, a new process is created to execute the request; if the number of processes in the pool has already reached the specified maximum, the request waits until a process in the pool finishes, so the pool's processes are reused.

Structure of the process pool:

The Pool class creates the process pool: if you specify numprocess as 3, the pool creates three processes from the start and uses those same three processes to execute all tasks; it never opens additional processes.

1. Create a process pool

Pool([numprocess [, initializer [, initargs]]]): creates a process pool

2. Parameter Introduction

numprocess: the number of processes to create; if omitted, defaults to the CPU count (use os.cpu_count() to check)
initializer: the callable object to execute at the start of each worker process; defaults to None
initargs: the tuple of arguments to pass to initializer

3. Method Introduction

p.apply(func[, args[, kwargs]]): executes func(*args, **kwargs) in a pool worker process and returns the result. Note that this does not run func in every pool worker; to execute func concurrently with different arguments, call p.apply() from different threads, or use p.apply_async().

p.apply_async(func[, args[, kwargs[, callback]]]): executes func(*args, **kwargs) in a pool worker process. The return value is an instance of the AsyncResult class. callback is a callable that receives a single argument: when the result of func becomes available, it is immediately passed to callback. callback must not perform any blocking operations, or it will block the receipt of results from other asynchronous operations.

p.close(): closes the process pool, forbidding further task submission (note that it must be called before join()).

p.join(): waits for all worker processes to exit. This method can only be called after close() or terminate().

Application 1:

  

# apply: synchronous process pool (blocking, serial)
from multiprocessing import Pool
import os, time

def task(n):
    print('[%s] is running' % os.getpid())
    time.sleep(2)
    print('[%s] is done' % os.getpid())
    return n ** 2

if __name__ == '__main__':
    # print(os.cpu_count())  # check the number of CPUs
    p = Pool(4)  # at most four processes
    for i in range(1, 7):  # submit six tasks
        res = p.apply(task, args=(i,))  # synchronous: wait for one task to finish before running the next
        print('End of this task: %s' % res)
    p.close()  # forbid adding more tasks to the pool
    p.join()   # wait for the pool
    print('master')
# apply_async: asynchronous process pool (non-blocking)
# ----------------
# So why do we use a process pool? Because the pool controls the number of
# processes: we open only as many as we need. Implementing concurrency
# without a pool would create a huge number of processes, and with that
# many processes the machine becomes very slow; so we cap the count, open
# only a few, and memory is not exhausted.
from multiprocessing import Pool
import os, time

def walk(n):
    print('task[%s] running...' % os.getpid())
    time.sleep(3)
    return n ** 2

if __name__ == '__main__':
    p = Pool(4)
    res_obj_l = []
    for i in range(10):
        res = p.apply_async(walk, args=(i,))
        # print(res)  # this prints AsyncResult objects
        res_obj_l.append(res)  # we now hold a list of result objects; use the .get() method to fetch the values
    p.close()  # forbid adding more tasks to the pool
    p.join()
    # print(res_obj_l)
    print([obj.get() for obj in res_obj_l])  # this yields the actual results

  

The difference between synchronous and asynchronous

Synchronous means that when a process issues a request, if the request takes some time to return information, the process waits until the return message is received before continuing.

Asynchronous means the process does not have to wait; it simply continues with the work that follows, regardless of the state of other processes. When a message returns, the system notifies the process. This can improve execution efficiency.

Serial and parallel differences

Example: if several cars can drive side by side, that is "parallel"; if only one car can pass at a time, that is "serial". Obviously, parallel is much faster than serial. (In parallel, tasks run side by side; in serial, each task waits for the previous one to finish.)

Application 2:

Maintaining a fixed number of processes with a process pool (an improvement on the earlier client and server):

# Server
from socket import *
from multiprocessing import Pool

s = socket(AF_INET, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # port reuse
s.bind(('127.0.0.1', 8081))
s.listen(5)
print('Start running...')

def talk(coon, addr):
    while True:  # communication loop
        try:
            cmd = coon.recv(1024)
            print(cmd.decode('utf-8'))
            if not cmd:
                break
            coon.send(cmd.upper())
            print('Send %s' % cmd.upper().decode('utf-8'))
        except Exception:
            break
    coon.close()

if __name__ == '__main__':
    p = Pool(4)
    while True:  # connection loop
        coon, addr = s.accept()
        print(coon, addr)
        p.apply_async(talk, args=(coon, addr))
    s.close()  # because of the loop, there is no need for p.join()
# Client
from socket import *

c = socket(AF_INET, SOCK_STREAM)
c.connect(('127.0.0.1', 8081))
while True:
    cmd = input('> ').strip()
    if not cmd:
        continue
    c.send(cmd.encode('utf-8'))
    data = c.recv(1024)
    print('Accept %s' % data.decode('utf-8'))
c.close()

Third, Callback Functions

When does the callback function come into play? (Callback functions are most commonly used in crawlers.) Downloading a page is time-consuming; processing the downloaded data is not. When a download finishes, it automatically notifies the main process to call the parsing function to parse it; that is the power of the callback function.

Scenario that calls for a callback function: as soon as any task in the process pool finishes, notify the main process immediately: "I'm done; you can process my result now." The main process then calls a function to process that result, and that function is the callback function.

We can put the time-consuming (blocking) tasks into the process pool and specify a callback function (which the main process is responsible for executing). This way the main process skips the I/O: when it executes the callback, the task's result is already in hand.

# Callback function (small page-download example)
from multiprocessing import Pool
import requests
import os
import time

def get_page(url):
    print('<%s> is getting [%s]' % (os.getpid(), url))
    response = requests.get(url)  # fetch the page
    time.sleep(2)
    print('<%s> is done [%s]' % (os.getpid(), url))
    return {'url': url, 'text': response.text}

def parse_page(res):
    '''parsing function'''
    print('<%s> parse [%s]' % (os.getpid(), res['url']))
    with open('db.txt', 'a') as f:
        parse_res = 'url: %s size: %s\n' % (res['url'], len(res['text']))
        f.write(parse_res)

if __name__ == '__main__':
    p = Pool(4)
    urls = [
        'https://www.baidu.com',
        'http://www.openstack.org',
        'https://www.python.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]
    for url in urls:
        obj = p.apply_async(get_page, args=(url,), callback=parse_page)
    p.close()
    p.join()
    print('master', os.getpid())  # no .get() method needed

If the main process simply waits for all the tasks in the process pool to finish and then processes the results all at once, no callback function is needed.

# Page-download example (no callback function needed)
from multiprocessing import Pool
import requests
import os

def get_page(url):
    print('<%s> get [%s]' % (os.getpid(), url))
    response = requests.get(url)  # fetch the response
    return {'url': url, 'text': response.text}

if __name__ == '__main__':
    p = Pool(4)
    urls = [
        'https://www.baidu.com',
        'http://www.openstack.org',
        'https://www.python.org',
        'https://help.github.com/',
        'http://www.sina.com.cn/'
    ]
    obj_l = []
    for url in urls:
        obj = p.apply_async(get_page, args=(url,))
        obj_l.append(obj)
    p.close()
    p.join()
    print([obj.get() for obj in obj_l])

  
