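It has been a long time since I last updated this blog. Today I am sharing a multithreaded web page downloader that I implemented yesterday.
This implementation grew out of a real need: I use it to submit game data to a server over HTTP. I am also posting it in the hope that readers will pick it apart and hunt for bugs, so that it works better.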
keywords:python,http,multi-threads,thread,threading,httplib,urllib,urllib2,Queue,http pool,httppool
Enough talk, here is the source code:
    # -*- coding: utf-8 -*-
    import httplib
    import random
    import sys
    import thread
    import time
    import urllib
    from Queue import Queue, Empty, Full

    HEADERS = {
        'Content-type': 'application/x-www-form-urlencoded',
        'Accept-Language': 'zh-cn',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 6.0;Windows NT 5.0)',
        'Accept': 'text/plain',
    }

    UNEXPECTED_ERROR = -1

    POST = 'POST'
    GET = 'GET'


    def base_log(msg):
        print msg


    def base_fail_op(task, status, log):
        log('fail op. task = %s, status = %d' % (str(task), status))


    def get_remote_data(tasks, results, fail_op=base_fail_op, log=base_log):
        # Worker loop: each thread blocks on the task queue and never exits.
        while True:
            task = tasks.get()
            try:
                tid = task['id']
                hpt = task['conn_args']  # hpt <= host:port, timeout
            except KeyError, e:
                log(str(e))
                continue
            log('thread_%s doing task %d' % (thread.get_ident(), tid))

            # Optional task fields fall back to defaults.
            params = urllib.urlencode(task.get('params', {}))
            method = task.get('method', GET)
            url = task.get('url', '/')

            # Work on a copy so that per-task headers never leak into the
            # shared HEADERS dict, which every worker thread reads.
            headers = dict(HEADERS)
            headers.update(task.get('headers', {}))

            conn = httplib.HTTPConnection(**hpt)
            try:
                if method == POST:
                    headers['Content-Length'] = len(params)
                    conn.request(method, url, params, headers)
                else:
                    # GET carries its parameters in the URL and is sent
                    # without the custom headers.
                    conn.request(method, url + params)
                response = conn.getresponse()
                status = response.status
                data = response.read()
            except Exception, e:
                log('request failed. method = %s, url = %s, params = %s headers = %s' % (
                    method, url, params, headers))
                log(str(e))
                fail_op(task, UNEXPECTED_ERROR, log)
                continue
            finally:
                # Always release the socket, whether the request worked or not.
                conn.close()

            if status != httplib.OK:
                fail_op(task, status, log)
                continue

            results.put((tid, data), True)


    class HttpPool(object):
        def __init__(self, threads_count, fail_op, log):
            self._tasks = Queue()
            self._results = Queue()

            for i in xrange(threads_count):
                thread.start_new_thread(get_remote_data,
                                        (self._tasks, self._results, fail_op, log))

        def add_task(self, tid, host, url, params, headers=None, method=GET,
                     timeout=None):
            conn_args = {'host': host}
            if timeout is not None:
                conn_args['timeout'] = timeout
            task = {
                'id': tid,
                'conn_args': conn_args,
                'headers': headers or {},
                'url': url,
                'params': params,
                'method': method,
            }
            try:
                self._tasks.put_nowait(task)
            except Full:
                return False
            return True

        def get_results(self):
            # Drain whatever results are available right now, without blocking.
            results = []
            while True:
                try:
                    res = self._results.get_nowait()
                except Empty:
                    break
                results.append(res)
            return results


    def test_google(task_count, threads_count):
        hp = HttpPool(threads_count, base_fail_op, base_log)
        for i in xrange(task_count):
            if hp.add_task(i,
                           'www.google.cn',
                           '/search?',
                           {'q': 'lai'},
                           # method='POST'
                           ):
                print 'task added.'

        # Poll for results until the script is interrupted (Ctrl+C).
        while True:
            results = hp.get_results()
            if not results:
                time.sleep(1.0 * random.random())
            for tid, data in results:
                print tid, len(data)
                # print unicode(data, 'gb18030')


    if __name__ == '__main__':
        task_count, threads_count = int(sys.argv[1]), int(sys.argv[2])
        test_google(task_count, threads_count)
If you would like to try running it, save the source code as xxxx.py and run python xxxx.py 10 4, where 10 means sending 10 search requests to google.cn and 4 means that 4 threads carry out those tasks.
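One thing readers may notice is that the worker threads loop forever, so the script has to be interrupted by hand. The following is only a rough sketch of one possible alternative, not the code from this post: it uses the higher-level threading module, stops each worker with a sentinel object, and reports failures as results so the caller can tell when every task is done. The names worker, run_pool, fake_fetch and _STOP are made up for illustration, and fake_fetch stands in for the real HTTP request so the sketch runs without a network.

```python
import threading

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2, as used in this post

_STOP = object()  # hypothetical sentinel that tells a worker to exit


def worker(tasks, results, fetch):
    """Pull tasks until the sentinel arrives; an error never kills the thread."""
    while True:
        task = tasks.get()
        if task is _STOP:
            break
        tid, params = task
        try:
            results.put((tid, True, fetch(params)))
        except Exception as exc:
            # Report the failure as a result so the caller can count it.
            results.put((tid, False, exc))


def run_pool(task_count, threads_count, fetch):
    """Run task_count calls of fetch on threads_count threads, then return."""
    tasks, results = Queue(), Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results, fetch))
               for _ in range(threads_count)]
    for t in threads:
        t.start()
    for tid in range(task_count):
        tasks.put((tid, {'q': 'lai'}))
    for _ in threads:
        tasks.put(_STOP)  # one sentinel per worker
    for t in threads:
        t.join()
    # Every worker has exited, so exactly task_count results are queued.
    collected = [results.get() for _ in range(task_count)]
    return sorted(collected, key=lambda r: r[0])


def fake_fetch(params):
    # Stand-in for the real HTTP request; purely illustrative.
    return 'q=%s' % params['q']


if __name__ == '__main__':
    for tid, ok, value in run_pool(10, 4, fake_fetch):
        print('%d %s %s' % (tid, ok, value))
```

Because each task produces exactly one result, whether it succeeds or fails, the caller knows how many results to wait for and the program can exit on its own. Swapping fake_fetch for a function that performs the actual request would be the next step.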