17. Week 7, network programming: the concept of coroutines, and crawling web pages concurrently with the gevent module

Source: Internet
Author: User


A coroutine is also known as a micro-thread or fiber. What is a coroutine, in one sentence? A coroutine is a lightweight thread that lives in user space.

A coroutine has its own register context and stack. When the scheduler switches away from it, the register context and stack are saved elsewhere; when it switches back, the previously saved register context and stack are restored. As a result, a coroutine retains the state of its last invocation (that is, a specific combination of all its local state). Each time it is re-entered, it resumes in the state of the previous call; in other words, at the position in the logical flow where it last left off.
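
The way a coroutine keeps its state between entries can be seen with a plain Python generator. This is only a minimal sketch; the name running_total is made up for illustration.

```python
def running_total():
    """Coroutine that keeps a running sum across resumptions."""
    total = 0                  # local state, preserved while suspended
    while True:
        value = yield total    # pause here; resume when send() is called
        total += value


acc = running_total()
next(acc)                      # run to the first yield (priming)
first = acc.send(10)           # resumes after the yield, total becomes 10
second = acc.send(5)           # resumes again with total still 10, now 15
print(first, second)           # prints: 10 15
```

Each send() re-enters the function exactly where it paused, with the local variable total intact.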

Benefits of coroutines:

    • No overhead for thread context switching
    • No locking or synchronization overhead for atomic operations (note: an atomic operation needs no synchronization. An atomic operation is one that the thread-scheduling mechanism cannot interrupt: once it starts, it runs to completion without any context switch to another thread. It may consist of one step or several, but the order of the steps cannot be disturbed and it cannot be cut short with only part of it executed. Being treated as an indivisible whole is the core of atomicity.)
    • Easy switching of control flow and a simplified programming model
    • High concurrency, high scalability, and low cost: a single CPU can support tens of thousands of coroutines without trouble, which makes them well suited to highly concurrent workloads.
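
To give a feel for the "tens of thousands of coroutines" claim, here is a rough sketch that runs 20,000 generator-based coroutines in one thread with a round-robin scheduler. The names worker and run_all are hypothetical, and the scheduler is a toy, not how any particular library does it.

```python
from collections import deque


def worker(worker_id, steps):
    """A tiny coroutine: does `steps` units of work, yielding after each."""
    for _ in range(steps):
        yield                  # hand control back to the scheduler
    return worker_id


def run_all(coroutines):
    """Round-robin scheduler running every coroutine in the calling thread."""
    ready = deque(coroutines)
    finished = 0
    switches = 0
    while ready:
        coro = ready.popleft()
        try:
            next(coro)         # resume until its next yield
            switches += 1
            ready.append(coro) # not done yet, queue it again
        except StopIteration:
            finished += 1      # the coroutine ran to completion
    return finished, switches


finished, switches = run_all(worker(i, 3) for i in range(20000))
print(finished, switches)      # prints: 20000 60000
```

Because only one coroutine runs at any moment, the counters are updated without any lock.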

Disadvantages of coroutines:

    • Cannot take advantage of multiple cores: a coroutine is by nature single-threaded, so it cannot use several cores of a CPU at once. To run on multiple CPUs, coroutines must be combined with processes. In practice, most applications we write day to day do not need this, unless they are CPU-intensive.
    • A blocking operation (such as I/O) blocks the entire program
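
The second disadvantage can be observed directly. The sketch below uses the standard library's asyncio rather than gevent (a substitution made here so that it runs without third-party packages): three coroutines that call a blocking sleep run one after another, while three that wait cooperatively overlap. The function names are made up for the demonstration.

```python
import asyncio
import time


async def blocking_job():
    time.sleep(0.1)            # blocks the whole thread; nothing else can run


async def cooperative_job():
    await asyncio.sleep(0.1)   # yields to the event loop while waiting


async def timed(job, count=3):
    """Run `count` copies of `job` concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(job() for _ in range(count)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_job))
cooperative_elapsed = asyncio.run(timed(cooperative_job))
print("blocking:    %.2fs" % blocking_elapsed)     # roughly 0.3s
print("cooperative: %.2fs" % cooperative_elapsed)  # roughly 0.1s
```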

Criteria that define a coroutine

    1. Concurrency is implemented within a single thread
    2. Shared data can be modified without locks
    3. The user program itself keeps the context stacks of the multiple control flows
    4. When a coroutine encounters an I/O operation, it automatically switches to another coroutine
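
The four criteria above can be sketched with a toy scheduler. Everything here is an assumption made for illustration: "I/O" is simulated by a coroutine yielding the number of ticks it wants to wait, time is a virtual counter so nothing really sleeps, and the names task and run are hypothetical.

```python
import heapq


def task(name, wait, log):
    log.append("%s starts" % name)   # shared list, modified without a lock
    yield wait                       # "I/O": ask to be resumed after `wait` ticks
    log.append("%s resumes" % name)


def run(tasks):
    """Single-threaded loop: resume whichever coroutine's wait ends first."""
    clock = 0                        # virtual time
    waiting = []                     # heap of (wake_time, order, coroutine)
    for order, coro in enumerate(tasks):
        try:
            delay = next(coro)       # run until the coroutine hits its "I/O"
            heapq.heappush(waiting, (clock + delay, order, coro))
        except StopIteration:
            pass                     # finished without ever waiting
    while waiting:
        clock, order, coro = heapq.heappop(waiting)
        try:
            delay = next(coro)       # the wait is over, switch back in
            heapq.heappush(waiting, (clock + delay, order, coro))
        except StopIteration:
            pass                     # coroutine finished
    return clock


log = []
end = run([task("foo", 3, log), task("bar", 6, log), task("func3", 1, log)])
print(log)
print(end)                           # prints: 6
```

All three coroutines start immediately, and they resume in the order func3, foo, bar, because the one with the shortest wait becomes ready first.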

Example:

1. Implementing coroutines with yield:

    import time

    def consumer(name):
        print("--->starting eating baozi...")
        while True:
            new_baozi = yield
            print("[%s] is eating baozi %s" % (name, new_baozi))
            # time.sleep(1)

    def producer():
        r = con.__next__()
        r = con2.__next__()
        n = 0
        while n < 5:
            n += 1
            con.send(n)
            con2.send(n)
            print("\033[32;1m[producer]\033[0m is making baozi %s" % n)

    if __name__ == '__main__':
        con = consumer("c1")
        con2 = consumer("c2")
        p = producer()

Output:

    --->starting eating baozi...
    --->starting eating baozi...
    [c1] is eating baozi 1
    [c2] is eating baozi 1
    [producer] is making baozi 1
    [c1] is eating baozi 2
    [c2] is eating baozi 2
    [producer] is making baozi 2
    [c1] is eating baozi 3
    [c2] is eating baozi 3
    [producer] is making baozi 3
    [c1] is eating baozi 4
    [c2] is eating baozi 4
    [producer] is making baozi 4
    [c1] is eating baozi 5
    [c2] is eating baozi 5
    [producer] is making baozi 5

2. Using a third-party module: greenlet (coroutine switches are specified manually)

greenlet is a coroutine module implemented in C. Compared with the yield that comes with Python, it lets you switch freely between arbitrary functions without first declaring those functions as generators.
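
To see what "declaring the function as a generator" costs in practice, consider the yield-based side of the comparison. In this sketch (names read_chunk and job are hypothetical), a helper that needs to pause must itself be a generator, and its caller must delegate to it with yield from. No greenlet code is shown here.

```python
def read_chunk(log):
    """Helper that needs to pause; with generators it must itself be one."""
    log.append("helper: before pause")
    yield                          # pause point inside the nested call
    log.append("helper: after pause")


def job(log):
    log.append("job: start")
    # A plain call read_chunk(log) would only create a generator object;
    # the caller has to delegate with `yield from` for the pause to propagate.
    yield from read_chunk(log)
    log.append("job: end")


log = []
coro = job(log)
next(coro)                         # runs until the pause inside read_chunk
log.append("main: doing other work")
for _ in coro:                     # resume until job finishes
    pass
print(log)
```

Every function on the call path between the scheduler and the pause point has to be written this way, which is the restriction the paragraph above says greenlet removes.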

greenlet is a Python library for concurrent processing. Python also has a well-known implementation, Stackless Python, which is used for concurrency and is built around a micro-thread called a tasklet. The biggest difference between greenlet and Stackless is that greenlet requires you to handle the switching yourself: you must specify which greenlet to run now and which to run next. In other words, coroutines are switched manually.

A "greenlet" is a small, independent pseudo-thread. You can think of it as a stack of frames: the bottom of the stack is the function that was called initially, and the top of the stack is the position where the current greenlet is paused. You use greenlet to create a number of these stacks and then jump between them. Jumps are always explicit: a greenlet must choose another greenlet to jump to, which suspends the former and resumes the latter where it previously left off. A jump between greenlets is called a switch.

When you create a greenlet, it gets a stack that starts out empty. When you first switch to it, it runs the specified function, which may call other functions, switch out of the greenlet, and so on. When the outermost function finishes executing, the greenlet's stack is empty again and the greenlet dies. A greenlet also dies from an uncaught exception.

Example:

    from greenlet import greenlet

    def test1():
        print(12)       # 2. print
        gr2.switch()    # 3. switch to test2
        print(34)       # 6. print
        gr2.switch()    # 7. switch to test2

    def test2():
        print(56)       # 4. print
        gr1.switch()    # 5. switch to test1
        print(78)       # 8. print; execution ends here

    gr1 = greenlet(test1)   # create a coroutine
    gr2 = greenlet(test2)   # create a coroutine
    gr1.switch()            # 1. start by switching to gr1

Output:

    12
    56
    34
    78

Note: the comments number the steps in the order they execute, from 1 to 8.

In the example above, the coroutines cannot switch automatically: greenlet only switches where you tell it to, although it is much simpler to use than generators. How can I/O be monitored and coroutines switched automatically? This is where the gevent module comes in.

3. Using a third-party module: gevent (automatic monitoring and automatic switching of coroutines)

gevent is a third-party library that makes concurrent or asynchronous programming easy. The main construct used in gevent is the greenlet, a lightweight coroutine provided to Python as a C extension module. Greenlets all run inside the operating system process of the main program, but they are scheduled cooperatively.

gevent builds its coroutines on greenlet. The basic idea is this: when a greenlet encounters an I/O operation, such as a network access, gevent automatically switches to another greenlet, and switches back at an appropriate time once the I/O operation has completed. Because I/O is very time-consuming and often leaves a program waiting, having gevent switch coroutines for us means that some greenlet is running whenever one is ready, instead of everything waiting on I/O. Since the switching happens automatically during I/O operations, gevent needs to modify some of the standard library modules that ship with Python; this is done through monkey patching at startup.
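
The mechanism behind a monkey patch is simply replacing an attribute of a module at run time, so that code written against the standard library picks up the replacement without being changed. The sketch below shows only that mechanism. It is not gevent's implementation: recording_sleep is a hypothetical stand-in, whereas gevent installs its own cooperative versions of the blocking functions.

```python
import time

calls = []                         # records every intercepted sleep request
original_sleep = time.sleep        # keep a reference so we can restore it


def recording_sleep(seconds):
    """Hypothetical stand-in: notes the request instead of blocking."""
    calls.append(seconds)


def legacy_code():
    """Code written against the standard library, unaware of any patch."""
    time.sleep(5)
    return "done"


time.sleep = recording_sleep       # the patch: swap the module attribute
try:
    result = legacy_code()         # returns immediately, nothing blocks
finally:
    time.sleep = original_sleep    # always undo the patch

print(result, calls)               # prints: done [5]
```

This also suggests why the patch is applied at startup: code that copied a reference to the original function before the patch (as original_sleep does here) keeps calling the unpatched version.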

Features include:
1. A fast event loop based on libev
2. Lightweight execution units based on greenlet
3. An API that reuses concepts from the Python standard library
4. Cooperative sockets with SSL support
5. DNS queries performed through c-ares or a thread pool
6. The ability to use blocking socket code from the standard library and from third-party libraries

Because gevent is a third-party library, it is an additional open source package that must be installed separately.

Example:

a. Verify that gevent switches automatically and runs whichever coroutine is ready first. Note: you can adjust the gevent.sleep() durations to test this. Conclusion: the shorter a function's wait, the earlier its final print statement runs.

    import gevent

    def foo():
        print('Running foo')
        gevent.sleep(3)
        print('Explicit context switch to foo again')

    def bar():
        print('Running bar')
        gevent.sleep(6)
        print('Implicit context switch back to bar')

    def func3():
        print("Running func3")
        gevent.sleep(1)
        print("Switch back to func3")

    gevent.joinall([
        gevent.spawn(foo),
        gevent.spawn(bar),
        gevent.spawn(func3),
    ])

Output:

    Running foo
    Running bar
    Running func3
    Switch back to func3
    Explicit context switch to foo again
    Implicit context switch back to bar

b. The performance difference between synchronous and asynchronous execution:

The key step in the program is wrapping the task function in a greenlet with gevent.spawn. The initialized greenlets are stored in the list threads, which is passed to gevent.joinall. That call blocks the current flow of execution and runs all the given greenlets; execution continues only after every greenlet has finished.

    import gevent

    def task(pid):
        gevent.sleep(1)
        print('Task %s done' % pid)

    def synchronous():
        for i in range(1, 6):   # range runs from 1 to 5
            task(i)

    def asynchronous():
        threads = [gevent.spawn(task, i) for i in range(5)]
        gevent.joinall(threads)

    print('Synchronous:')
    synchronous()    # ordinary serial calls: one line is printed every second
    print("")
    print('Asynchronous:')
    asynchronous()   # concurrent: after a single wait, all lines are printed together

Output:

    Synchronous:
    Task 1 done
    Task 2 done
    Task 3 done
    Task 4 done
    Task 5 done

    Asynchronous:
    Task 0 done
    Task 1 done
    Task 2 done
    Task 3 done
    Task 4 done
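
For comparison, the same spawn-then-wait pattern can be sketched with the standard library's asyncio. This is a substitute used here so the snippet runs without gevent, not gevent's API: asyncio.create_task plays a role comparable to gevent.spawn, and asyncio.gather one comparable to gevent.joinall. The delays and the done list are assumptions chosen to make the completion order visible.

```python
import asyncio


async def task(pid, delay, done):
    await asyncio.sleep(delay)      # cooperative wait, like gevent.sleep
    done.append(pid)                # record completion order


async def main():
    done = []
    # create_task schedules each coroutine, comparable to gevent.spawn
    jobs = [asyncio.create_task(task(pid, delay, done))
            for pid, delay in [(0, 0.06), (1, 0.02), (2, 0.04)]]
    # gather waits for all of them, comparable to gevent.joinall
    await asyncio.gather(*jobs)
    return done


order = asyncio.run(main())
print(order)                        # prints: [1, 2, 0]
```

The tasks finish in order of their waits rather than in the order they were created, which is the same behaviour the gevent examples above demonstrate.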

c. Crawling web pages concurrently with gevent. Coroutines are switched automatically whenever an I/O block is encountered. For example:

    from gevent import monkey; monkey.patch_all()
    import gevent, time
    from urllib.request import urlopen

    def f(url):
        print('GET: %s' % url)
        resp = urlopen(url)
        data = resp.read()
        # print(data)   # print the content of the crawled page
        print('%d bytes received from %s.' % (len(data), url))

    time_start = time.time()
    urls = [
        'http://www.cnblogs.com/alex3714/articles/5248247.html',
        'http://www.cnblogs.com/chen170615/p/8797609.html',
        'http://www.cnblogs.com/chen170615/p/8761768.html',
    ]
    for i in urls:
        f(i)
    print("Synchronous execution time:", time.time() - time_start)
    print("")

    async_time_start = time.time()
    gevent.joinall([
        gevent.spawn(f, 'http://www.cnblogs.com/alex3714/articles/5248247.html'),
        gevent.spawn(f, 'http://www.cnblogs.com/chen170615/p/8797609.html'),
        gevent.spawn(f, 'http://www.cnblogs.com/chen170615/p/8761768.html'),
    ])
    print("Asynchronous execution time:", time.time() - async_time_start)

Output:

    GET: http://www.cnblogs.com/alex3714/articles/5248247.html
    92147 bytes received from http://www.cnblogs.com/alex3714/articles/5248247.html.
    GET: http://www.cnblogs.com/chen170615/p/8797609.html
    10930 bytes received from http://www.cnblogs.com/chen170615/p/8797609.html.
    GET: http://www.cnblogs.com/chen170615/p/8761768.html
    11853 bytes received from http://www.cnblogs.com/chen170615/p/8761768.html.
    Synchronous execution time: 20.319132089614868

    GET: http://www.cnblogs.com/alex3714/articles/5248247.html
    GET: http://www.cnblogs.com/chen170615/p/8797609.html
    GET: http://www.cnblogs.com/chen170615/p/8761768.html
    11853 bytes received from http://www.cnblogs.com/chen170615/p/8761768.html.
    10930 bytes received from http://www.cnblogs.com/chen170615/p/8797609.html.
    92147 bytes received from http://www.cnblogs.com/alex3714/articles/5248247.html.
    Asynchronous execution time: 0.28768205642700195

As the example above shows, gevent's asynchronous concurrent execution outperforms synchronous serial execution here, and the advantage is greatest when the work spends its time waiting on I/O. (Run it a few times to see the contrast.)

d. Using gevent to handle multiple sockets concurrently in a single thread

Example:

Server side:

Coroutine server file: gevent_socket_server.py

    import gevent
    from gevent import socket, monkey
    monkey.patch_all()

    def server(port):
        gevent_server = socket.socket()
        gevent_server.bind(('0.0.0.0', port))
        gevent_server.listen()
        while True:
            cli, addr = gevent_server.accept()
            gevent.spawn(handle_request, cli)   # one greenlet per connection

    def handle_request(conn):
        try:
            while True:
                data = conn.recv(1024)
                print("recv:", data)
                if not data:
                    # Empty data means the client closed the connection:
                    # shut down our sending side and leave the loop.
                    conn.shutdown(socket.SHUT_WR)
                    break
                conn.send(data)   # echo the data back to the client
        except Exception as ex:
            print(ex)
        finally:
            conn.close()

    if __name__ == "__main__":
        server(8001)

There are two types of clients, as follows:

Client 1: gevent_socket_client.py (ordinary manual-input mode)

    import socket

    HOST = "localhost"
    PORT = 8001
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    while True:
        msg = bytes(input(">>:").strip(), encoding="utf-8")
        if not msg:
            continue   # nothing to send; an empty message would leave recv waiting
        s.sendall(msg)
        data = s.recv(1024)
        print('Received', repr(data))
    s.close()

Client 2: gevent_socket_cli.py (concurrent execution using threads)

    import socket, threading

    HOST = "localhost"
    PORT = 8001

    def sock_conn():
        s = socket.socket()
        s.connect((HOST, PORT))
        count = 0
        while True:
            s.sendall(("hello %s" % count).encode("utf-8"))
            data = s.recv(1024)
            print("[%s] recv from server:" % threading.get_ident(), data.decode())
            count += 1
        s.close()

    # The thread count is missing from the source; NUM_CLIENTS = 10 is an assumed placeholder.
    # When testing, keep this value small, otherwise the machine may freeze.
    NUM_CLIENTS = 10
    for i in range(NUM_CLIENTS):
        t = threading.Thread(target=sock_conn)
        t.start()
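
For readers who want to try the idea of "many sockets, one thread" without installing anything, here is a self-contained sketch using the standard library's asyncio in place of gevent (a swapped-in technique, not the code above). The server and five clients all run in the same thread. The loopback address, the use of port 0 to let the operating system pick a free port, and the function names are assumptions made for the demonstration.

```python
import asyncio


async def handle_client(reader, writer):
    """Echo every chunk back until the client closes its side."""
    while True:
        data = await reader.read(1024)
        if not data:               # empty bytes means the peer closed
            break
        writer.write(data)
        await writer.drain()
    writer.close()


async def client(port, client_id):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    message = ("hello %s" % client_id).encode("utf-8")
    writer.write(message)
    await writer.drain()
    reply = await reader.readexactly(len(message))
    writer.close()
    await writer.wait_closed()
    return reply.decode("utf-8")


async def main():
    # Port 0 asks the OS for any free port (an assumption made for the demo).
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Five client connections are served concurrently in this one thread.
        return await asyncio.gather(*(client(port, i) for i in range(5)))


replies = asyncio.run(main())
print(replies)
```

Each await on a socket is a point where the event loop may switch to another connection, which corresponds to the automatic switch on I/O that gevent performs behind ordinary-looking blocking calls.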
