When fetching a large amount of data, the only real option besides multithreading is to go asynchronous. This article does the asynchronous fetching with the Twisted framework.
Async Batching with Twisted: A Walkthrough
Example 1: Just a DeferredList
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet.defer import DeferredList

    def listCallback(results):
        print(results)

    def finish(ign):
        reactor.stop()

    def test():
        # getPage returns a Deferred that fires with the page body
        # (note: getPage is deprecated in newer Twisted releases).
        d1 = getPage('http://www.111cn.net')
        d2 = getPage('http://yahoo.com')
        dl = DeferredList([d1, d2])
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
This is one of the simplest examples you'll ever see of a deferred list in action. Get two deferreds (the getPage function returns a deferred) and use them to create a deferred list. Add callbacks to the list, garnish with a lemon.
Example 2: Simple Result Manipulation
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet.defer import DeferredList

    def listCallback(results):
        # Each entry is a (success, result) tuple, one per deferred in the list.
        for isSuccess, content in results:
            print("Successful? %s" % isSuccess)
            print("Content Length: %s" % len(content))

    def finish(ign):
        reactor.stop()

    def test():
        d1 = getPage('http://www.111cn.net')
        d2 = getPage('http://yahoo.com')
        dl = DeferredList([d1, d2])
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
We make things a little more interesting in this example by doing some processing on the results. For this to make sense, just remember that a callback gets passed the result when the deferred action completes. If we look up the API documentation for DeferredList, we see that it returns a list of (success, result) tuples, where success is a Boolean and result is the result of a deferred that was put in the list (remember, we've got two layers of deferreds here!).
Example 3: Page Callbacks Too
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet.defer import DeferredList

    def pageCallback(result):
        # Runs once per page, as soon as that page's deferred fires.
        return len(result)

    def listCallback(result):
        print(result)

    def finish(ign):
        reactor.stop()

    def test():
        d1 = getPage('http://www.111cn.net')
        d1.addCallback(pageCallback)
        d2 = getPage('http://yahoo.com')
        d2.addCallback(pageCallback)
        dl = DeferredList([d1, d2])
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
Here, we mix things up a little bit. Instead of doing the processing on all the results at once (in the deferred list callback), we're processing them when the page callbacks fire. Our processing here is just a simple example of getting the length of the getPage deferred result: the HTML content of the page at the given URL.
Example 4: Results with More Structure
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet.defer import DeferredList

    def pageCallback(result):
        # Keep only the bits we care about: the size and the first 10 characters.
        data = {
            'length': len(result),
            'content': result[:10],
        }
        return data

    def listCallback(result):
        for isSuccess, data in result:
            if isSuccess:
                print("Call to server succeeded with data %s" % str(data))

    def finish(ign):
        reactor.stop()

    def test():
        d1 = getPage('http://www.111cn.net')
        d1.addCallback(pageCallback)
        d2 = getPage('http://yahoo.com')
        d2.addCallback(pageCallback)
        dl = DeferredList([d1, d2])
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
As a follow-up to the last example, here we put the data in which we are interested into a dictionary. We don't end up pulling any of the data out of the dictionary; we just stringify it and print it to stdout.
Example 5: Passing Values to Callbacks
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet.defer import DeferredList

    def pageCallback(result, url):
        data = {
            'length': len(result),
            'content': result[:10],
            'url': url,
        }
        return data

    def getPageData(url):
        d = getPage(url)
        # Extra arguments to addCallback are passed on after the result.
        d.addCallback(pageCallback, url)
        return d

    def listCallback(result):
        for isSuccess, data in result:
            if isSuccess:
                print("Call to %s succeeded with data %s" % (data['url'], str(data)))

    def finish(ign):
        reactor.stop()

    def test():
        d1 = getPageData('http://www.111cn.net')
        d2 = getPageData('http://yahoo.com')
        dl = DeferredList([d1, d2])
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
After all this playing, we start asking ourselves more serious questions, like: "I want to decide which values show up in my callbacks" or "Some information that is available here isn't available there. How do I get it there?" This is how :-) Just pass the parameters you want to your callback. They'll be tacked on after the result (as you can see from the function signatures).
In this example, we needed to create our own deferred-returning function, one that wraps the getPage function so that we can also pass the URL on to the callback.
Example 6: Adding Some Error Checking
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet.defer import DeferredList

    urls = [
        'http://yahoo.com',
        'http://www.111cn.net',
        'http://www.111cn.net/MicrosoftRules.html',
        'http://bogusdomain.com',
    ]

    def pageCallback(result, url):
        data = {
            'length': len(result),
            'content': result[:10],
            'url': url,
        }
        return data

    def pageErrback(error, url):
        # Turn the failure into a plain dict so the list callback can report it.
        return {
            'msg': error.getErrorMessage(),
            'err': error,
            'url': url,
        }

    def getPageData(url):
        d = getPage(url, timeout=5)
        d.addCallback(pageCallback, url)
        d.addErrback(pageErrback, url)
        return d

    def listCallback(result):
        for ignore, data in result:
            if 'err' in data:
                print("Call to %s failed with data %s" % (data['url'], str(data)))
            else:
                print("Call to %s succeeded with data %s" % (data['url'], str(data)))

    def finish(ign):
        reactor.stop()

    def test():
        deferreds = []
        for url in urls:
            d = getPageData(url)
            deferreds.append(d)
        dl = DeferredList(deferreds, consumeErrors=1)
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
As we get closer to building real applications, we start getting concerned about things like catching and anticipating errors. We haven't added any errbacks to the deferred list, but we have added one to each page deferred, right after its callback. We've added more URLs and put them in a list to ease the pains of duplicate code. As you can see, two of the URLs should return errors: one a 404, and the other should be a domain not resolving (we'll see this as a timeout).
Example 7: Batching with DeferredSemaphore
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet import defer

    maxRun = 1

    urls = [
        'http://twistedmatrix.com',
        'http://twistedsoftwarefoundation.org',
        'http://yahoo.com',
        'http://www.111cn.net',
    ]

    def listCallback(results):
        for isSuccess, result in results:
            print(len(result))

    def finish(ign):
        reactor.stop()

    def test():
        deferreds = []
        # At most maxRun calls to getPage are in flight at any one time.
        sem = defer.DeferredSemaphore(maxRun)
        for url in urls:
            d = sem.run(getPage, url)
            deferreds.append(d)
        dl = defer.DeferredList(deferreds)
        dl.addCallback(listCallback)
        dl.addCallback(finish)

    test()
    reactor.run()
These last two examples are for more advanced use cases. As soon as the reactor starts, deferreds that are ready start "firing": their "jobs" start running. What if we've got a long list of deferreds? Well, they all start processing. As you can imagine, this is an easy way to run an accidental DoS against a friendly service. Not cool.
For situations like this, what we want is a way to run only so many deferreds at a time. This is a great use for the deferred semaphore. When I repeated runs of the example above, the content lengths of the four pages returned after about 2.5 seconds. With the example rewritten to use just the deferred list (no deferred semaphore), the content lengths were returned after about 1.2 seconds. The extra time is due to the fact that I (for the sake of the example) forced only one deferred to run at a time, obviously not what you're going to want to do for a highly concurrent task ;-)
Note that without changing the code and only setting maxRun to 4, the timings for getting the content lengths are about the same, averaging 1.3 seconds for me (there's a little more overhead involved when using the deferred semaphore).
One last subtle note (in anticipation of the next example): the for loop creates all the deferreds at once; the deferred semaphore simply limits how many get run at a time.
Example 8: Throttling with Cooperator
    from twisted.internet import reactor
    from twisted.web.client import getPage
    from twisted.internet import defer, task

    maxRun = 2

    urls = [
        'http://twistedmatrix.com',
        'http://twistedsoftwarefoundation.org',
        'http://yahoo.com',
        'http://www.111cn.net',
    ]

    def pageCallback(result):
        print(len(result))
        return result

    def doWork():
        # Generator: a deferred is only created when the next item is requested.
        for url in urls:
            d = getPage(url)
            d.addCallback(pageCallback)
            yield d

    def finish(ign):
        reactor.stop()

    def test():
        deferreds = []
        coop = task.Cooperator()
        work = doWork()
        # maxRun consumers share the same generator.
        for i in range(maxRun):
            d = coop.coiterate(work)
            deferreds.append(d)
        dl = defer.DeferredList(deferreds)
        dl.addCallback(finish)

    test()
    reactor.run()
I have not yet studied the Twisted framework at this depth, but I am recording these examples here to digest later.