Python socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host. Solution:
A few days ago I was using Python to read web pages. Because I hit one website with a large number of urlopen calls, the site identified the traffic as an attack and sometimes stopped allowing downloads. As a result, request.read() stayed stuck after urlopen, and eventually errno 10054 was thrown.
This error is Connection reset by peer: the remote host reset the connection. The cause may be that the socket timeout is too long, that request = urllib.request.urlopen(url) is never followed by request.close(), or simply that the website takes a few seconds to decide this behavior is an attack and then drops the connection.
The specific solution is as follows:
    import socket
    import time
    import urllib.request

    timeout = 20
    socket.setdefaulttimeout(timeout)  # set a timeout for the whole socket layer;
                                       # any socket used later in this file inherits it

    sleep_download_time = 10
    time.sleep(sleep_download_time)    # pause between requests; adjust the value here

    request = urllib.request.urlopen(url)  # url is the address of the page to read
    content = request.read()               # the exception is usually raised here
    request.close()                        # remember to close the connection
Because the read() after urlopen actually calls functions at the socket layer, setting the default socket timeout makes the connection give up on its own after the timeout instead of leaving you waiting forever at read().
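If you would rather not change the process-wide default, urlopen also accepts a per-call timeout argument. Here is a minimal sketch of that variant (the fetch helper, the 20-second value, and the messages are my own illustration, not from the original code):

    import socket
    import urllib.request
    import urllib.error

    def fetch(url, timeout=20):
        # fetch a page with a per-call timeout instead of the global socket default
        try:
            # the timeout covers both connecting and blocking reads on the socket
            with urllib.request.urlopen(url, timeout=timeout) as request:
                return request.read()
        except socket.timeout:
            print('-----socket timeout:', url)
        except urllib.error.URLError as e:
            # a timeout during connect may also surface as URLError(reason=socket.timeout)
            print('-----urlError:', url, e.reason)
        return None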
Of course, you can also wrap the outer layer in a few try/except clauses, for example:
    try:
        time.sleep(self.sleep_download_time)
        request = urllib.request.urlopen(url)
        content = request.read()
        request.close()
    except UnicodeDecodeError as e:
        print('-----UnicodeDecodeError url:', url)
    except urllib.error.URLError as e:
        print('-----urlError url:', url)
    except socket.timeout as e:
        print('-----socket timeout:', url)
Generally this works fine; I say this after testing it on downloads of thousands of web pages. However, when downloading tens of thousands of files, my tests show the exception still seems to pop up. Perhaps time.sleep() is too short, or the network is momentarily interrupted. I also tested urllib.request.urlretrieve() and found that downloading data continuously for a long time eventually fails anyway.
A simple solution is as follows: first, refer to my earlier article, "Python checkpoint simple implementation", and add a checkpoint. Then wrap the code section that throws the exception in a while True loop. See the following pseudocode:
    def download_auto(downloadlist, fun, sleep_time=15):
        while True:
            try:  # wrap an outer layer of try
                # fun is your download function, passed in as a function pointer
                value = fun(downloadlist, sleep_time)
                # exit only after a normal, complete run
                if value == util.success:
                    break
            except Exception:  # when 10054, IOError, or some other error occurs
                # sleep 5 more seconds and run the download again; because of the
                # checkpoint, execution resumes from where the exception was thrown,
                # so an unstable network connection does not abort the whole program
                sleep_time += 5
                print('enlarge sleep time:', sleep_time)
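The checkpoint mentioned above can be as simple as a small text file listing the URLs that are already finished, so a retry (or a fresh run) skips completed work. This is only a rough sketch of that idea under my own naming, not the code from the checkpoint article:

    import os

    CHECKPOINT_FILE = 'downloaded.txt'  # hypothetical name for the progress record

    def load_checkpoint():
        # return the set of URLs that earlier runs already finished
        if not os.path.exists(CHECKPOINT_FILE):
            return set()
        with open(CHECKPOINT_FILE, 'r', encoding='utf-8') as f:
            return set(line.strip() for line in f if line.strip())

    def mark_done(url):
        # append a finished URL so the next attempt starts after it
        with open(CHECKPOINT_FILE, 'a', encoding='utf-8') as f:
            f.write(url + '\n')

Inside your download function (the fun passed to download_auto), skip anything returned by load_checkpoint() and call mark_done(url) after each page succeeds.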
However, if the corresponding web page simply cannot be found, there is one more thing to handle:
    # print download progress information
    def reporthook(blocks_read, block_size, total_size):
        if not blocks_read:
            print('Connection opened')
        if total_size < 0:
            # total size is unknown
            print('read %d blocks' % blocks_read)
        else:
            # if the page is not found, total_size may be 0 and no percentage can be computed
            print('downloading: %d MB, totalsize: %d MB' %
                  (blocks_read * block_size / 1048576.0, total_size / 1048576.0))

    def download(path, url):
        # url = 'http://downloads.sourceforge.net/sourceforge/alliancep2p/alliance-v1.0.6.jar'
        # filename = url.rsplit("/")[-1]
        try:
            # download function provided by Python
            urllib.request.urlretrieve(url, path, reporthook)
        except IOError as e:  # a page that cannot be found may raise an IOError
            print("download", url, "\nerror:", e)
        print("done: %s\ncopy: %s" % (url, path))
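As a usage sketch, the download() function above can be called like this (the URL and path are placeholders of my own, not from the original post):

    if __name__ == '__main__':
        url = 'http://example.com/files/sample.jar'   # placeholder URL
        path = 'sample.jar'                           # local file to write
        download(path, url)   # progress is printed by reporthook; a missing
                              # page is caught and logged as an IOError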
If you still have problems... please share other solutions in the comments.