Original post address: http://hi.baidu.com/yss1983/item/933fbe45a09c43e01381da06
Problem description: with no timeout parameter set, the program behaved badly in a poor network environment: the read() method often got no response and the program hung inside read(). It took me the better part of a day to find the cause; adding a timeout to urlopen fixes it. Once a timeout is set, a read that times out throws a socket.timeout exception, so to make the program robust you also need exception handling around urlopen, plus a retry on failure. With that, the program works well:

import urllib2

url = 'http://www.facebook.com/'
fails = 0
while True:
    try:
        if fails >= 20:
            break
        req = urllib2.Request(url)
        response = urllib2.urlopen(req, None, 3)
        page = response.read()
    except:
        fails += 1
        print 'Network connection problem, trying again:', fails
    else:
        break
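The bare except above retries on any failure, not only timeouts. Below is a minimal sketch of the same idea as a reusable helper that retries only when the request times out; the names fetch_with_retry and max_retries are hypothetical, not from the original post:

import socket
import urllib2

def fetch_with_retry(url, timeout=3, max_retries=20):
    # hypothetical helper: retry the download only when it times out
    for attempt in range(1, max_retries + 1):
        try:
            response = urllib2.urlopen(urllib2.Request(url), None, timeout)
            return response.read()
        except socket.timeout:
            # read() timed out after the connection was established
            print 'Network timeout, trying again:', attempt
        except urllib2.URLError, e:
            # a connect-stage timeout arrives wrapped in URLError
            if isinstance(e.reason, socket.timeout):
                print 'Network timeout, trying again:', attempt
            else:
                raise
    return None

page = fetch_with_retry('http://www.facebook.com/')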
Solution:
Sometimes when we crawl data from the network, the other side is slow, the server times out, and so on, and the read() call after urllib2.urlopen() (which downloads the content) hangs. The methods to solve this problem are as follows:

1. Set the optional timeout parameter of urlopen
import urllib2

# http://classweb.loxa.com.tw/dino123/air/P1000772.jpg
r = urllib2.Request("http://classweb.loxa.com.tw/dino123/air/P1000775.jpg")
try:
    # the numbered prints trace how far execution gets before a timeout
    print 111111111111111111
    f = urllib2.urlopen(r, data=None, timeout=3)
    print 2222222222222222
    result = f.read()
    print 333333333333333333
except Exception, e:
    print "444444444444444444---------" + str(e)
print "55555555555555"
2. Set the global socket timeout:

import socket
socket.setdefaulttimeout(10.0)
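A short sketch of how the global default applies, reusing the python.org URL from method 3 below as a stand-in. Once set, the default affects every new socket in the process, including the ones urllib2 opens internally, unless a per-call timeout overrides it:

import socket
import urllib2

socket.setdefaulttimeout(10.0)

# no timeout argument needed here: the module-wide default now applies
# to the sockets urllib2 creates internally
f = urllib2.urlopen("http://www.python.org")
data = f.read()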
Or use httplib2 or timeout_urllib2:
http://code.google.com/p/httplib2/wiki/examples
http://code.google.com/p/timeout-urllib2/source/browse/trunk/timeout_urllib2.py
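For httplib2, a minimal sketch of a per-connection timeout, based on its Http(timeout=...) constructor; the URL is just a placeholder:

import httplib2

# each Http object carries its own timeout for the sockets it opens
h = httplib2.Http(timeout=10)
resp, content = h.request("http://www.python.org")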
3. Use a Timer

from urllib2 import urlopen
from threading import Timer

url = "http://www.python.org"

def handler(fh):
    fh.close()

fh = urlopen(url)
t = Timer(20.0, handler, [fh])
t.start()
data = fh.read()  # if the file is binary, you need to switch to a binary reading method
t.cancel()
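The idea behind method 3: the Timer closes the file handle from another thread after 20 seconds, which forces a blocked read() to fail, and t.cancel() disarms the timer when the read finishes in time. Since the forced close surfaces as an exception inside read(), a slightly more defensive sketch of the same approach wraps it:

from urllib2 import urlopen
from threading import Timer

url = "http://www.python.org"

def handler(fh):
    # runs in the timer thread: closing the handle aborts a blocked read()
    fh.close()

fh = urlopen(url)
t = Timer(20.0, handler, [fh])
t.start()
try:
    data = fh.read()
except Exception, e:
    # the handle was closed by the timer before read() finished
    print "download aborted after timeout:", str(e)
finally:
    t.cancel()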