Non-blocking programming with the Python Twisted framework: a worked example
Let's take a look at a piece of code:
```python
# ~*~ Twisted - A Python tale ~*~

from time import sleep

# Hello, I'm a developer and I mainly setup Wordpress.
def install_wordpress(customer):
    # Our hosting company Threads Ltd. is bad. I start installation and...
    print "Start installation for", customer
    # ...then wait till the installation finishes successfully. It is
    # boring and I'm spending most of my time waiting while consuming
    # resources (memory and some CPU cycles). It's because the process
    # is *blocking*.
    sleep(3)
    print "All done for", customer

# I do this all day long for our customers
def developer_day(customers):
    for customer in customers:
        install_wordpress(customer)

developer_day(["Bill", "Elon", "Steve", "Mark"])
```
Run the command and the result is as follows:
```
$ ./deferreds.py 1
------ Running example 1 ------
Start installation for Bill
All done for Bill
Start installation...
* Elapsed time: 12.03 seconds
```
This code executes sequentially. Each installation takes three seconds, so serving four customers takes 12 seconds. That is not satisfactory, so let's look at a second example, this time using threads:
```python
import threading

# The company grew. We now have many customers and I can't handle the
# workload. We are now 5 developers doing exactly the same thing.
def developers_day(customers):
    # But we now have to synchronize... a.k.a. bureaucracy
    lock = threading.Lock()
    #
    def dev_day(id):
        print "Goodmorning from developer", id
        # Yuck - I hate locks...
        lock.acquire()
        while customers:
            customer = customers.pop(0)
            lock.release()
            # My Python is less readable
            install_wordpress(customer)
            lock.acquire()
        lock.release()
        print "Bye from developer", id
    # We go to work in the morning
    devs = [threading.Thread(target=dev_day, args=(i,)) for i in range(5)]
    [dev.start() for dev in devs]
    # We leave for the evening
    [dev.join() for dev in devs]

# We now get more done in the same time but our dev process got more
# complex. As we grew we spend more time managing queues than doing dev
# work. We even had occasional deadlocks when processes got extremely
# complex. The fact is that we are still mostly pressing buttons and
# waiting but now we also spend some time in meetings.
developers_day(["Customer %d" % i for i in xrange(15)])
```
Run:
```
$ ./deferreds.py 2
------ Running example 2 ------
Goodmorning from developer 0Goodmorning from developer
1
Start installation forGoodmorning from developer 2
Goodmorning from developer 3
Customer 0
...
from developerCustomer 13
3Bye from developer 2
* Elapsed time: 9.02 seconds
```
This code executes in parallel, using five worker threads. The 15 customers take 3 seconds each, 45 seconds of work in total, but with 5 threads running in parallel it finishes in about 9 seconds. The code, however, has become complicated: a large part of it manages concurrency rather than expressing the algorithm or business logic. The output is also jumbled and hard to read. Even simple multithreaded code is hard to get right, so let's turn to Twisted:
```python
# For years we thought this was all there was... We kept hiring more
# developers, more managers and buying servers. We were trying harder
# optimising processes and fire-fighting while getting mediocre
# performance in return. Till luckily one day our hosting
# company decided to increase their fees and we decided to
# switch to Twisted Ltd.!
from twisted.internet import reactor
from twisted.internet import defer
from twisted.internet import task

# Twisted has a slightly different approach
def schedule_install(customer):
    # They are calling us back when a Wordpress installation completes.
    # They connected the caller recognition system with our CRM and
    # we know exactly what a call is about and what has to be done next.
    #
    # We now design processes of what has to happen on certain events.
    def schedule_install_wordpress():
        def on_done():
            print "Callback: Finished installation for", customer
        print "Scheduling: Installation for", customer
        return task.deferLater(reactor, 3, on_done)
    #
    def all_done(_):
        print "All done for", customer
    #
    # For each customer, we schedule these processes on the CRM and that
    # is all our chief-Twisted developer has to do
    d = schedule_install_wordpress()
    d.addCallback(all_done)
    #
    return d

# Yes, we don't need many developers anymore or any synchronization.
# ~~ Super-powered Twisted developer ~~
def twisted_developer_day(customers):
    print "Goodmorning from Twisted developer"
    #
    # Here's what has to be done today
    work = [schedule_install(customer) for customer in customers]
    # Turn off the lights when done
    join = defer.DeferredList(work)
    join.addCallback(lambda _: reactor.stop())
    #
    print "Bye from Twisted developer!"

# Even his day is particularly short!
twisted_developer_day(["Customer %d" % i for i in xrange(15)])

# Reactor, our secretary uses the CRM and follows-up on events!
reactor.run()
```
Running result:
```
------ Running example 3 ------
Goodmorning from Twisted developer
Scheduling: Installation for Customer 0
...
Scheduling: Installation for Customer 14
Bye from Twisted developer!
Callback: Finished installation for Customer 0
All done for Customer 0
Callback: Finished installation for Customer 1
All done for Customer 1
...
All done for Customer 14
* Elapsed time: 3.18 seconds
```
This time the code runs perfectly, the output is readable, and no threads were used. We processed 15 customers in parallel: 45 seconds' worth of work completed in about 3 seconds. The trick is that we replaced every blocking sleep() call with its Twisted counterpart, task.deferLater(), plus a callback function. Because the waiting now happens elsewhere, we can effortlessly serve 15 customers at the same time.
So where does that waiting "happen elsewhere"? Arithmetic still happens on the CPU, but compared with disk and network operations today's CPUs are extremely fast, so most of the time is actually spent moving data to the CPU, or from the CPU to memory, disk, or another machine. Non-blocking operations let us reclaim that waiting time: task.deferLater(), for example, takes a callback function that is activated only once the data has been transferred.
Another important thing to notice is the "Goodmorning from Twisted developer" and "Bye from Twisted developer!" messages in the output. Both are printed at the very start of execution. If the code reaches that point so early, when does our application actually run? The answer is that a Twisted application (Scrapy included) runs inside reactor.run(). Before calling that method, every Deferred chain the application may need must already be set up; reactor.run() then monitors events and fires the callbacks.
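The ordering above can be sketched with a toy event loop in plain Python (no Twisted; `defer_later`, `run`, and the `pending` list are made-up illustrative names, not Twisted APIs): setup only registers callbacks, and nothing fires until the loop runs.

```python
# A toy "reactor" showing why "Bye" prints before any callback:
# registration happens first, firing happens only inside run().
pending = []

def defer_later(delay, callback):
    # registered, NOT called yet - analogous to task.deferLater()
    pending.append((delay, callback))

def run(log):
    # fire callbacks in time order, like the reactor's event loop
    for _, callback in sorted(pending, key=lambda item: item[0]):
        log.append(callback())

log = []
defer_later(3, lambda: "install done")
defer_later(1, lambda: "ping")
log.append("setup finished")  # reached before any callback fires
run(log)
print(log)  # -> ['setup finished', 'ping', 'install done']
```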
Note the one fundamental reactor rule: you may perform any operation in a callback, as long as it is fast and does not block.
Now the code no longer has to manage multithreading, but these callback functions still look messy. We can rewrite it as follows:
```python
# Twisted gave us utilities that make our code way more readable!
@defer.inlineCallbacks
def inline_install(customer):
    print "Scheduling: Installation for", customer
    yield task.deferLater(reactor, 3, lambda: None)
    print "Callback: Finished installation for", customer
    print "All done for", customer

def twisted_developer_day(customers):
    ...  # same as previously but using inline_install()
         # instead of schedule_install()

twisted_developer_day(["Customer %d" % i for i in xrange(15)])
reactor.run()
```
The running result is the same as in the previous example, but the code is more concise and clear. The inlineCallbacks decorator uses Python's generator machinery to pause and resume the inline_install() function. inline_install() becomes a Deferred object and runs concurrently for each customer. Each time it reaches a yield, that inline_install() instance is suspended until the yielded Deferred fires, and then resumes where it left off.
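The pause/resume trick can be shown with a bare Python generator (a sketch of the idea only; `install` and `drive` are illustrative names, not Twisted APIs): the function yields where it would otherwise wait, and a driver resumes it with the result, which is roughly what inlineCallbacks does with each yielded Deferred.

```python
# Sketch of the inlineCallbacks mechanism: yield suspends the function,
# and the driver resumes it once a result is available.
def install(customer):
    result = yield "waiting 3 seconds"   # like: yield task.deferLater(...)
    # execution resumes here once the driver sends the result back in
    yield "all done for %s (%s)" % (customer, result)

def drive(gen):
    events = [next(gen)]           # run up to the first yield, then pause
    events.append(gen.send("ok"))  # resume with the "deferred" result
    return events

print(drive(install("Bill")))
# -> ['waiting 3 seconds', 'all done for Bill (ok)']
```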
The only remaining problem: what if we have not 15 but, say, 10,000 customers? This code would start 10,000 simultaneous sequences of execution (HTTP requests, database writes, and so on). That might be fine, or it might cause all kinds of failures. In applications with massive concurrency, such as Scrapy, we often need to limit the amount of concurrency to an acceptable level. In the following example we do this with task.Cooperator(); Scrapy uses the same mechanism in its Item Pipeline to enforce the CONCURRENT_ITEMS setting:
```python
@defer.inlineCallbacks
def inline_install(customer):
    ...  # same as above

# The new "problem" is that we have to manage all this concurrency to
# avoid causing problems to others, but this is a nice problem to have.
def twisted_developer_day(customers):
    print "Goodmorning from Twisted developer"
    work = (inline_install(customer) for customer in customers)
    #
    # We use the Cooperator mechanism to make the secretary not
    # service more than 5 customers simultaneously.
    coop = task.Cooperator()
    join = defer.DeferredList([coop.coiterate(work) for i in xrange(5)])
    #
    join.addCallback(lambda _: reactor.stop())
    print "Bye from Twisted developer!"

twisted_developer_day(["Customer %d" % i for i in xrange(15)])
reactor.run()

# We are now more lean than ever, our customers happy, our hosting
# bills ridiculously low and our performance stellar.
# ~*~ THE END ~*~
```
Running result:
```
$ ./deferreds.py 5
------ Running example 5 ------
Goodmorning from Twisted developer
Bye from Twisted developer!
Scheduling: Installation for Customer 0
...
Callback: Finished installation for Customer 4
All done for Customer 4
Scheduling: Installation for Customer 5
...
Callback: Finished installation for Customer 14
All done for Customer 14
* Elapsed time: 9.19 seconds
```
The output shows the program running with five processing slots: the next customer is handled only when a slot frees up. Each customer takes 3 seconds, so the 15 customers are served in three batches of five, about 9 seconds in total. The final performance matches the threaded version, but this time there is only one thread, and the code is more concise and far easier to get right.
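The slot behaviour can be mimicked in plain Python (a sketch; `slot` and the round-robin driver are illustrative stand-ins, not the Cooperator API): five generators pull from one shared iterator, so each slot claims the next pending customer the moment it finishes the previous one.

```python
# Five "slots" pulling from ONE shared iterator of customers, stepped
# round-robin the way the reactor interleaves the five coiterate() calls.
def slot(slot_id, source, log):
    for customer in source:       # each next() claims one pending customer
        log.append((slot_id, customer))
        yield                     # hand control back after each task

customers = iter(["Customer %d" % i for i in range(15)])
log = []
active = [slot(i, customers, log) for i in range(5)]
while active:
    for s in list(active):        # round-robin driver
        try:
            next(s)
        except StopIteration:     # the shared iterator is exhausted
            active.remove(s)

print(log[:6])
# slot 0 takes Customer 0, then Customer 5 on its next turn, and so on
```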
PS: deferToThread makes synchronous functions non-blocking

Twisted's defer.Deferred (from twisted.internet import defer) creates a Deferred object whose callbacks run in the reactor thread.

Note: deferToThread is implemented with threads, so heavy use of it is not recommended.

*** Converting a synchronous function to asynchronous (returning a Deferred) ***

Twisted's deferToThread (from twisted.internet.threads import deferToThread) also returns a Deferred object, but the wrapped function runs in another thread. It is mainly used for database and file I/O operations.
```python
# Code snippet
def dataReceived(self, data):
    now = int(time.time())
    for ftype, data in self.fpcodec.feed(data):
        if ftype == 'oob':
            self.msg('oob:', repr(data))
        elif ftype == 0x81:
            # Heartbeat response to server requests (the anti-fatigue
            # device sends it to the GPS host, which forwards it to the server)
            self.msg('fp.PONG:', repr(data))
        else:
            self.msg('todo:', (ftype, data))
    d = deferToThread(self.redis.zadd, "beier:fpstat:fps", now, self.devid)
    d.addCallback(self._doResult, extra)
```
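What deferToThread does can be sketched with the standard threading module (an illustration of the idea, not Twisted's actual implementation; `defer_to_thread` is a made-up name): run the blocking function in a worker thread and invoke the callback with its result, so the caller never blocks.

```python
import threading

# Minimal sketch of the deferToThread idea: the blocking call runs in a
# worker thread, and the callback fires with its result when it is done.
def defer_to_thread(func, callback, *args):
    def worker():
        callback(func(*args))     # deliver the result via the callback
    t = threading.Thread(target=worker)
    t.start()
    return t

results = []
t = defer_to_thread(lambda x: x * 2, results.append, 21)
t.join()  # demo only - Twisted's reactor would not block like this
print(results)  # -> [42]
```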
The complete example below is for your reference.
```python
# -*- coding: utf-8 -*-
from twisted.internet import defer, reactor
from twisted.internet.threads import deferToThread
import functools
import time

# This is a synchronous, blocking function
def mySleep(timeout):
    time.sleep(timeout)
    # A return value here would become the callback's result,
    # as if we had added e.g. "return 3"

def say(result):
    print "The time-consuming operation is over; the result returned to me is", result

# Use functools.partial to wrap the function and bind its arguments
cb = functools.partial(mySleep, 3)
d = deferToThread(cb)
d.addCallback(say)
print "You have not finished yet, but I am already running, haha"
reactor.run()
```