Who said Python coroutines are useless? Step forward, I won't shoot! They're a seriously smooth ride!

Source: Internet
Author: User
Tags: switches

Overview: This article introduces the concept of coroutines, then shows how to use them in Python 2.x and 3.x, and finally compares coroutines with multithreading and introduces asynchronous crawler modules.

Coroutines

Concept

A coroutine, also known as a micro-thread or fiber, works like this: while function A is executing, it can be interrupted at any point to execute function B, after which control returns and function A continues (switching can happen freely). This is not an ordinary function call (there is no call statement), and although the whole process looks like multithreading, only one thread is ever executing.


Python 2.x coroutines

Python 2.x implementations:

    • yield
    • gevent

Python 2.x does not have many modules that support coroutines; gevent is the most commonly used one, so here is a brief introduction to its usage.
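As a sketch of the yield style listed above (written here in Python 3 syntax; under Python 2 you would call c.next() instead of next(c), and the function names are my own illustration), a producer drives a generator-based consumer with send, and control hops between the two inside a single thread:

```python
def consumer():
    result = ""
    while True:
        n = yield result          # pause here; receive a value from the producer
        if n is None:
            return
        result = "ok %d" % n      # prepare a reply for the next send

def producer(c):
    replies = []
    next(c)                       # prime the generator up to its first yield
    for i in range(3):
        replies.append(c.send(i)) # switch into consumer, collect its reply
    c.close()
    return replies

print(producer(consumer()))       # -> ['ok 0', 'ok 1', 'ok 2']
```

Each send transfers control to the consumer and each yield transfers it back, which is exactly the "free switching" described above.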

Gevent

Gevent is a third-party library that implements coroutines through greenlets. The basic idea:

When a greenlet encounters an IO operation, such as a network access, it automatically switches to another greenlet, and switches back to continue execution at an appropriate time after the IO operation completes. Because IO operations are very time-consuming and often leave the program waiting, having gevent switch coroutines for us automatically guarantees that some greenlet is always running instead of waiting on IO.
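The original code listing is missing here. Below is a minimal sketch of what it likely looked like, assuming gevent is installed; gevent.sleep stands in for the urllib2 network request of the original so the sketch runs without network access, and the names are my own reconstruction:

```python
import gevent

events = []

def get_body(i):
    events.append("Start %d" % i)
    gevent.sleep(0.1)   # simulated IO; gevent switches to another greenlet here
    events.append("End %d" % i)

# spawn three greenlets and wait for all of them to finish
gevent.joinall([gevent.spawn(get_body, i) for i in range(3)])
print(events)
```

With a real network request, the "End" lines come back in whatever order the responses happen to arrive, which is why they are unordered in the result below.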

Output:

Start 0

Start 1

Start 2

End 2

End 0

End 1

Note: From the results you can see the execution order: each call to get_body first prints "Start"; when execution reaches the urllib2 call and IO blocks, gevent automatically switches to the next greenlet (which continues into get_body and prints its own "Start"), and "End" is printed only after urllib2 returns its result. In other words, the program does not wait for the urllib2 request to come back; it skips ahead, then returns for the result once the IO completes. It is worth noting that only one thread is executing throughout, so this is not the same concept as multithreading.

Gevent usage notes

    • monkey can turn some blocking modules into non-blocking ones; the mechanism is automatic switching on IO operations. You can also switch manually with gevent.sleep(0) (in the crawler code this achieves the same context-switch effect).
    • gevent.spawn starts a greenlet; its arguments are the function followed by that function's arguments.
    • gevent.joinall waits for all of the given greenlets to finish.
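To illustrate the manual switch mentioned above, here is a small sketch (assuming gevent is installed; the names are my own): each greenlet yields control with gevent.sleep(0), so the two interleave within a single thread.

```python
import gevent

order = []

def worker(name):
    for step in range(2):
        order.append((name, step))
        gevent.sleep(0)   # manual switch: hand control to any other runnable greenlet

gevent.joinall([gevent.spawn(worker, "a"),
                gevent.spawn(worker, "b")])
print(order)
```

Without the gevent.sleep(0) call, each worker would run to completion before the other started, since nothing would trigger a switch.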

Python 3.x coroutines

In order to test coroutine usage under Python 3.x, I installed a Python 3.6 environment under virtualenv.

Python 3.x implementations:

    • asyncio + yield from (Python 3.4)
    • asyncio + async/await (Python 3.5)
    • gevent

Python 3.4 introduced the asyncio module, which supports coroutines very well.

asyncio

asyncio is a standard library introduced in Python 3.4 with built-in support for asynchronous IO. In the Python 3.4 style, asynchronous operations are driven with yield from inside a coroutine.

Note: From the running results you can see the same effect as with gevent: switching happens on IO operations (so all the "test_1" lines are printed before any "test_2" line). One thing I am still unclear about is why the "test_1" output is not executed in order; you can compare this with the gevent output (hopefully an expert can explain it).

Usage

Example (for Python 3.5 and later):
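The example listing is missing here; below is a minimal sketch of the async/await style, assuming Python 3.5+ syntax (asyncio.run itself requires 3.7+; the names are my own):

```python
import asyncio

events = []

async def get_body(n):
    events.append("Start %d" % n)
    await asyncio.sleep(0.1)      # simulated IO; the event loop switches away here
    events.append("End %d" % n)

async def main():
    # run three coroutines concurrently within one thread
    await asyncio.gather(*(get_body(i) for i in range(3)))

asyncio.run(main())
print(events)
```

As with gevent, all three "Start" entries appear before any "End" entry, because each coroutine hands control back to the event loop at its await.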

Gevent

Same usage as under Python 2.x.

Coroutines vs. multithreading

If the introduction above has already made the difference between coroutines and multithreading clear, then I don't think a benchmark is necessary. As threads become more and more numerous, the main overhead of multithreading is spent on thread switching, whereas a coroutine switches within a single thread, so its overhead is much smaller; that is perhaps the fundamental performance difference between the two. (Personal view.)

Asynchronous crawlers

Most people who care about coroutines probably use them for crawlers (coroutines handle IO blocking well), but I found that the common urllib and requests modules cannot be used together with asyncio, probably because those modules are themselves synchronous (or I could not find a way). So if we need an asynchronous crawler, how do we use coroutines? Or, how do we write an asynchronous crawler? Some options:

    • grequests (an asynchronous version of the requests module)
    • a crawler module + gevent (this is the one I recommend)
    • aiohttp (there does not seem to be much documentation on it, and I have not mastered it yet)
    • asyncio's built-in fetching facilities (also fairly hard to use)

Coroutine pool

Purpose: control the number of concurrently running coroutines.
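In gevent this is provided by gevent.pool.Pool. A minimal sketch, assuming gevent is installed (the crawl function and the pool size of 2 are my own illustration):

```python
import gevent
from gevent.pool import Pool

done = []

def crawl(i):
    gevent.sleep(0.05)   # stands in for a network request
    done.append(i)

pool = Pool(2)           # at most 2 greenlets run at the same time
for i in range(5):
    pool.spawn(crawl, i) # blocks when the pool is full until a slot frees up
pool.join()              # wait for everything to finish
print(sorted(done))
```

This gives the IO-switching benefit of coroutines while capping how many requests are in flight at once.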

Get It!

