Without a timeout parameter set, requests in a poor network environment often hang: read() never returns and the program freezes inside read(). It took most of a day to track the problem down before finding that adding a timeout to urlopen fixes it. Once a timeout is set, a read that exceeds it raises a socket.timeout exception, so to make the program robust, urlopen also needs exception handling plus a retry on failure; with that, the program is complete.
import urllib2

url = 'http://www.facebook.com/'
fails = 0
while True:
    try:
        if fails >= 20:  # retry cap; an assumed value, the original limit was lost
            break
        req = urllib2.Request(url)
        response = urllib2.urlopen(req, None, 3)  # 3-second timeout
        page = response.read()
    except:
        fails += 1
        print 'Network connection problem, trying to request again:', fails
    else:
        break
Solution:
Sometimes when we crawl network data, a slow peer or a server timeout leaves urllib2.urlopen() and the subsequent read() operation (which downloads the content) hanging. The methods to solve this problem are as follows:
1. Set the optional timeout parameter of urlopen
import urllib2

# http://classweb.loxa.com.tw/dino123/air/P1000772.jpg
r = urllib2.Request("http://classweb.loxa.com.tw/dino123/air/P1000775.jpg")
try:
    print 111111111111111111          # trace marker: before opening
    f = urllib2.urlopen(r, data=None, timeout=3)
    print 2222222222222222            # trace marker: opened, before read
    result = f.read()
    print 333333333333333333          # trace marker: read finished
except Exception, e:
    print "444444444444444444---------" + str(e)
print "55555555555555"                # trace marker: execution continues
2. Set the global socket timeout:
import socket
socket.setdefaulttimeout(10.0)
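Every socket created afterwards without an explicit timeout (including those urllib2 opens internally) then uses this 10-second default, so no per-call argument is needed. A minimal sketch of the effect (placeholder URL):

import socket
import urllib2

socket.setdefaulttimeout(10.0)  # applies to all sockets created from here on

try:
    f = urllib2.urlopen("http://www.python.org")  # no explicit timeout passed
    page = f.read()
except (socket.timeout, urllib2.URLError), e:
    print "request timed out:", str(e)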
Or use httplib2 or timeout_urllib2:
http://code.google.com/p/httplib2/wiki/examples
http://code.google.com/p/timeout-urllib2/source/browse/trunk/timeout_urllib2.py
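For comparison, httplib2 takes the timeout in its constructor, so it applies to every request made through that Http object. A brief sketch (placeholder URL):

import httplib2

h = httplib2.Http(timeout=10)  # timeout applies to every request on this object
resp, content = h.request("http://www.python.org")
print resp.status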
3. Use a Timer
A timer thread closes the response object after 20 seconds; closing it from the other thread forces a blocked read() to abort.

from urllib2 import urlopen
from threading import Timer

url = "http://www.python.org"

def handler(fh):
    fh.close()  # closing the handle makes a blocked read() fail

fh = urlopen(url)
t = Timer(20.0, handler, [fh])
t.start()
data = fh.read()  # for a binary file, switch to a binary read method
t.cancel()        # read finished in time, so cancel the timer