Example of a multi-threaded HTTP download tool implemented in Python
This document describes how to use Python to write a multi-threaded HTTP download tool and package it as an .exe executable file.
Environment: Windows/Linux + Python 2.7.x
Single thread
A single-threaded version is introduced before the multi-threaded one. The single-threaded approach is:
1. Parse the URL;
2. Connect to the web server;
3. Construct an HTTP request packet;
4. Download the file.
The following code illustrates each step.
Parse the URL
The user enters a URL, which is then parsed. If the parsed path is empty, it defaults to '/'. If the port number is empty, it defaults to 80. The name of the downloaded file can be changed if the user wishes (enter 'y' to change the file name; any other input keeps it).
Several parsing functions are listed below:
import urllib

# Parse host and path
def analyHostAndPath(totalUrl):
    protocol, s1 = urllib.splittype(totalUrl)
    host, path = urllib.splithost(s1)
    if path == '':
        path = '/'
    return host, path

# Parse port
def analysisPort(host):
    host, port = urllib.splitport(host)
    if port is None:
        return 80
    # splitport returns the port as a string; socket.connect needs an int
    return int(port)

# Parse filename
def analysisFilename(path):
    filename = path.split('/')[-1]
    if '.' not in filename:
        return None
    return filename
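The article targets Python 2.7, where urllib.splittype and its siblings are available; in Python 3 those helpers are deprecated. As a rough sketch for Python 3 readers, the same defaults (path '/', port 80, no filename when the last path segment has no dot) can be expressed with urllib.parse.urlsplit. The function name parse_url and its return shape are assumptions made for this illustration, not part of the original tool.

```python
from urllib.parse import urlsplit


def parse_url(total_url):
    """Return (host, port, request_path, filename) for an http URL."""
    parts = urlsplit(total_url)
    host = parts.hostname
    port = parts.port or 80            # default HTTP port when none is given
    path = parts.path or '/'           # default path when the URL has none
    filename = path.split('/')[-1]
    if '.' not in filename:
        filename = None                # same rule as analysisFilename
    # Keep the query string in the path that is sent to the server.
    request_path = path + ('?' + parts.query if parts.query else '')
    return host, port, request_path, filename
```

For instance, parse_url('http://example.com') falls back to port 80 and path '/', and reports no filename.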
Connect to the web server
Use the socket module to connect to the web server, based on the host and port obtained by parsing the URL. The code is as follows:
import socket
from analysisUrl import port, host

ip = socket.gethostbyname(host)
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ip, port))
print "success connected webServer!!"
Construct an HTTP request packet
Construct an HTTP request packet based on the path, host, and port obtained by parsing the URL.
from analysisUrl import path, host, port

packet = 'GET ' + path + ' HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'
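The request text can also be assembled by a small helper, which makes it easier to add headers later. The sketch below is written for Python 3 and returns bytes ready for socket.sendall. The name build_request, the optional byte_range parameter, and the "Connection: close" header are assumptions added for illustration; the original code sends only the request line and the Host header.

```python
def build_request(host, path, byte_range=None):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        'GET %s HTTP/1.1' % path,
        'Host: %s' % host,
        'Connection: close',   # ask the server to close the socket when done
    ]
    if byte_range is not None:
        start, end = byte_range
        # An empty end ('') produces an open-ended range such as "bytes=100-".
        lines.append('Range: bytes=%s-%s' % (start, end))
    # Header lines end with CRLF; a blank line terminates the header block.
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('ascii')
```

Calling build_request('example.com', '/') yields the same request line and Host header as the string concatenation in the article, plus the extra Connection header.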
Download files
Send the constructed HTTP request packet to the server and capture the "Content-Length" value from the response header.
def getLength(self):
    s.send(packet)
    print "send success!"
    buf = s.recv(1024)
    print buf
    p = re.compile(r'Content-Length: (\d*)')
    length = int(p.findall(buf)[0])
    return length, buf
Download the file and calculate the download time.
def download(self):
    length, buf = self.getLength()
    # Skip the response header; the body starts after the blank line.
    packetIndex = buf.index('\r\n\r\n')
    buf = buf[packetIndex+4:]
    file = open(self.filename, 'wb')
    file.write(buf)
    sum = len(buf)
    while sum < length:
        buf = s.recv(1024)
        if not buf:
            # The server closed the connection before all bytes arrived.
            break
        file.write(buf)
        sum = sum + len(buf)
    file.close()
    print "Success!!"

if __name__ == "__main__":
    start = time.time()
    down = downloader()
    down.download()
    end = time.time()
    print "The time spent on this program is %f s" % (end - start)
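The header handling in getLength and download can be isolated into one function: find the blank line that ends the header, read Content-Length, and keep whatever body bytes arrived in the same recv call. The sketch below is for Python 3 (where recv returns bytes). The name split_response is hypothetical, and like the original code it assumes the whole header block is contained in the data passed in, which a single 1024-byte recv does not guarantee.

```python
import re


def split_response(raw):
    """Split raw response bytes into (content_length, body_so_far)."""
    header_end = raw.index(b'\r\n\r\n')          # raises ValueError if absent
    headers = raw[:header_end].decode('iso-8859-1')
    body = raw[header_end + 4:]
    # Header names are case-insensitive, so match Content-Length loosely.
    match = re.search(r'^Content-Length:\s*(\d+)', headers,
                      re.IGNORECASE | re.MULTILINE)
    if match is None:
        raise ValueError('no Content-Length header in response')
    return int(match.group(1)), body
```

The returned body is the part that download writes to the file before entering its receive loop.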
Multithreading
Read the "Content-Length" field from the response header and split the download into segments according to the number of threads, using a lock when writing to the file. Unlike the single-threaded version, all the code is combined into one file, and more of Python's built-in modules are used.
Get "Content-Length":
def getLength(self):
    opener = urllib2.build_opener()
    req = opener.open(self.url)
    meta = req.info()
    length = int(meta.getheaders("Content-Length")[0])
    return length
Compute the byte range for each thread from the obtained length and the number of threads:
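def get_range(self):
    ranges = []
    length = self.getLength()
    offset = int(int(length) / self.threadNum)
    for i in range(self.threadNum):
        if i == (self.threadNum - 1):
            # Open-ended range: the last thread reads to the end of the file.
            ranges.append((i*offset, ''))
        else:
            # HTTP Range ends are inclusive, so stop one byte before the next start.
            ranges.append((i*offset, (i+1)*offset - 1))
    return ranges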
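Because the end of an HTTP byte range is inclusive, it is easy to request one byte twice at each boundary. The following standalone sketch (Python 3) computes non-overlapping inclusive ranges and lets the last range absorb the remainder. The name split_ranges is hypothetical, and the sketch assumes length is at least thread_num.

```python
def split_ranges(length, thread_num):
    """Return thread_num inclusive (start, end) ranges covering length bytes."""
    chunk = length // thread_num
    ranges = []
    for i in range(thread_num):
        start = i * chunk
        # Range ends are inclusive; the last range absorbs the remainder.
        end = length - 1 if i == thread_num - 1 else start + chunk - 1
        ranges.append((start, end))
    return ranges


print(split_ranges(10, 3))   # → [(0, 2), (3, 5), (6, 9)]
```

Unlike get_range, this version gives the last range an explicit end instead of an empty string; both forms are valid in a Range header.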
Implement the multi-threaded download. When writing content to the file, hold a lock, using "with lock" instead of lock.acquire() ... lock.release(). Use file.seek() to set the file offset so that each block is written at the correct position.
def downloadThread(self, start, end):
    req = urllib2.Request(self.url)
    req.headers['Range'] = 'bytes=%s-%s' % (start, end)
    f = urllib2.urlopen(req)
    offset = start
    buffer = 1024
    while 1:
        block = f.read(buffer)
        if not block:
            break
        with lock:
            self.file.seek(offset)
            self.file.write(block)
            offset = offset + len(block)

def download(self):
    filename = self.getFilename()
    self.file = open(filename, 'wb')
    thread_list = []
    n = 1
    for ran in self.get_range():
        start, end = ran
        print 'starting:%d thread ' % n
        n += 1
        thread = threading.Thread(target=self.downloadThread, args=(start, end))
        thread.start()
        thread_list.append(thread)
    for i in thread_list:
        i.join()
    print 'Download %s Success!' % (self.file)
    self.file.close()
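The lock-plus-seek pattern can be tried without a network connection. In the Python 3 sketch below, an in-memory bytes object stands in for the ranged HTTP responses, and each thread copies its own inclusive range into a shared file. The name write_segments and the block_size parameter are assumptions for illustration only.

```python
import threading


def write_segments(data, ranges, out_path, block_size=4):
    """Copy data to out_path using one thread per inclusive (start, end) range."""
    lock = threading.Lock()
    with open(out_path, 'wb') as out:
        def worker(start, end):
            offset = start
            while offset <= end:
                # Stand-in for f.read(buffer) on a ranged HTTP response.
                block = data[offset:min(offset + block_size, end + 1)]
                if not block:
                    break
                with lock:
                    # seek and write must not be interleaved with other threads.
                    out.seek(offset)
                    out.write(block)
                offset += len(block)

        threads = [threading.Thread(target=worker, args=r) for r in ranges]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
```

The file position is shared by all threads writing through the same file object, which is why seek and write are performed together while holding the lock.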
Convert .py files to .exe executable files
After writing a tool, how can people who have not installed Python use it? The answer is to convert the .py file to an .exe file.
The py2exe module is used for this. Since this is its first appearance here, a brief introduction:
py2exe is a tool that converts a Python script into an executable file (*.exe) that runs independently on Windows. This way, the program can be run on Windows without installing Python.
Next, in the same directory as multiThreadDownload.py, create a file named mysetup.py containing:
from distutils.core import setup
import py2exe

setup(console=["multiThreadDownload.py"])
Run python mysetup.py py2exe.
This generates a dist folder containing multiThreadDownload.exe. Double-click the file to run it.