Example of a multi-threaded HTTP download tool implemented in Python

Last Update:2017-02-15 Source: Internet

Author: User


This article describes how to write a multi-threaded HTTP download tool in Python and package it as an .exe executable file.

Environment: Windows/Linux + Python 2.7.x

Single thread

Before moving on to multithreading, let's look at the single-threaded version. The approach is:

1. Parse the URL;

2. Connect to the web server;

3. Construct an HTTP request packet;

4. Download the file.

The code for each step is described below.

Parse the URL

The user enters a URL, which is then parsed. If the parsed path is empty, it defaults to '/'. If the port number is empty, it defaults to 80. The name of the downloaded file can be changed if the user wishes (enter 'y' to change the file name; any other input keeps it unchanged).

Several parsing functions are listed below:

    import urllib

    # Parse host and path
    def analyHostAndPath(totalUrl):
        protocol, s1 = urllib.splittype(totalUrl)
        host, path = urllib.splithost(s1)
        if path == '':
            path = '/'
        return host, path

    # Parse port
    def analysisPort(host):
        host, port = urllib.splitport(host)
        if port is None:
            return 80
        return int(port)  # splitport returns the port as a string

    # Parse filename
    def analysisFilename(path):
        filename = path.split('/')[-1]
        if '.' not in filename:
            return None
        return filename
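For readers on Python 3, where the Python 2 urllib helpers are not available under the same names, the same defaults can be sketched with urllib.parse.urlsplit. This is only an illustration, not the article's code: the function name parse_url is hypothetical, and unlike splithost, urlsplit separates any query string from the path.

```python
from urllib.parse import urlsplit

def parse_url(total_url):
    """Return (host, port, path, filename) for an http:// URL (hypothetical helper)."""
    parts = urlsplit(total_url)
    host = parts.hostname
    port = parts.port or 80      # default HTTP port when none is given
    path = parts.path or "/"     # default path when none is given
    name = path.split("/")[-1]
    # Same rule as analysisFilename: no dot means no usable file name
    filename = name if "." in name else None
    return host, port, path, filename

print(parse_url("http://example.com:8080/files/demo.zip"))
print(parse_url("http://example.com"))
```

Returning the port as an integer lets the result be passed directly to socket.connect().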

Connect to the web server

Use the socket module to connect to the web server based on the host and port obtained by parsing the URL. The code is as follows:

    import socket
    from analysisUrl import port, host

    ip = socket.gethostbyname(host)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((ip, port))
    print "success connected webServer!!"

Construct an HTTP request packet

Construct an HTTP request packet based on the path, host, and port obtained by parsing the URL.

    from analysisUrl import path, host, port

    packet = 'GET ' + path + ' HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'
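The request built above is a request line followed by headers, each terminated by CRLF, with an empty line marking the end. A minimal Python 3 sketch of the same construction follows; the function name build_request and the extra "Connection: close" header are assumptions added for illustration and are not part of the original tool.

```python
def build_request(host, path, port=80):
    """Build a minimal HTTP/1.1 GET request as bytes (hypothetical helper)."""
    # Include the port in the Host header only when it is not the default
    host_header = host if port == 80 else "%s:%d" % (host, port)
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host_header,
        "Connection: close",   # assumption: ask the server to close after responding
        "",                    # the two empty strings produce the final blank line
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

print(build_request("example.com", "/index.html"))
```

In Python 3 a socket sends bytes rather than str, which is why the request is encoded before it is returned.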

Download files

Send the constructed HTTP request packet to the server and capture the "Content-Length" value from the response header.

    import re

    # s and packet are the socket and request packet created in the previous steps
    def getLength(self):
        s.send(packet)
        print "send success!"
        buf = s.recv(1024)
        print buf
        p = re.compile(r'Content-Length: (\d+)')
        length = int(p.findall(buf)[0])
        return length, buf
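Header names in HTTP are case-insensitive, and a server may omit Content-Length entirely, so a slightly more defensive version of the extraction can be useful. The following Python 3 sketch works on the raw bytes returned by recv(); the function name parse_content_length is hypothetical, and it assumes the whole header block arrived in the buffer being parsed.

```python
import re

# Match a Content-Length header line regardless of letter case
CONTENT_LENGTH = re.compile(br"^Content-Length:\s*(\d+)\s*$", re.IGNORECASE | re.MULTILINE)

def parse_content_length(raw):
    """Return the Content-Length as an int, or None if the header is absent."""
    header = raw.split(b"\r\n\r\n", 1)[0]   # look only at the header block, not the body
    match = CONTENT_LENGTH.search(header)
    return int(match.group(1)) if match else None

response = b"HTTP/1.1 200 OK\r\ncontent-length: 1234\r\n\r\nbody bytes"
print(parse_content_length(response))
```

Returning None instead of raising lets the caller decide how to handle responses without a declared length.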

Download the file and measure the download time.

    import time

    def download(self):
        file = open(self.filename, 'wb')
        length, buf = self.getLength()
        # Skip the response header; the body starts after the blank line
        packetIndex = buf.index('\r\n\r\n')
        buf = buf[packetIndex+4:]
        file.write(buf)
        sum = len(buf)
        # Keep receiving until the whole body has been written
        while sum < length:
            buf = s.recv(1024)
            if not buf:  # connection closed early
                break
            file.write(buf)
            sum = sum + len(buf)
        file.close()
        print "Success!!"

    if __name__ == "__main__":
        start = time.time()
        down = downloader()
        down.download()
        end = time.time()
        print "The time spent on this program is %f s" % (end - start)
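The receive loop can be exercised without a network connection by substituting an object that replays prepared chunks. The Python 3 sketch below illustrates that idea only: FakeSocket and receive_body are hypothetical names, and the code assumes the complete header is contained in the first chunk, as the article's code does.

```python
class FakeSocket:
    """Stand-in for a connected socket that replays prepared chunks (for illustration)."""
    def __init__(self, chunks):
        self._chunks = list(chunks)

    def recv(self, bufsize):
        # An empty bytes object signals that the peer closed the connection
        return self._chunks.pop(0) if self._chunks else b""

def receive_body(sock, first_chunk, length, bufsize=1024):
    """Collect up to `length` body bytes, starting from the first recv() result."""
    header_end = first_chunk.index(b"\r\n\r\n") + 4
    body = bytearray(first_chunk[header_end:])
    while len(body) < length:
        block = sock.recv(bufsize)
        if not block:        # stop instead of looping forever on an early close
            break
        body.extend(block)
    return bytes(body)

first = b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\n01234"
data = receive_body(FakeSocket([b"567", b"89"]), first, 10)
print(data)
```

Checking the length before calling recv() avoids blocking when the entire body already arrived with the header.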

Multithreading

Read the "Content-Length" field from the response header and, based on the number of threads, split the file into parts that are downloaded concurrently, using a lock to protect file writes. Unlike the single-threaded version, all the code is integrated into one file, and more of Python's built-in modules are used.

Get "Content-Length":

    def getLength(self):
        opener = urllib2.build_opener()
        req = opener.open(self.url)
        meta = req.info()
        length = int(meta.getheaders("Content-Length")[0])
        return length

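Divide the download into byte ranges based on the obtained length and the number of threads:

    def get_range(self):
        ranges = []
        length = self.getLength()
        offset = int(int(length) / self.threadNum)
        for i in range(self.threadNum):
            if i == (self.threadNum - 1):
                # Last thread: open-ended range up to the end of the file
                ranges.append((i*offset, ''))
            else:
                # HTTP byte ranges are inclusive, so end one byte before the next start
                ranges.append((i*offset, (i+1)*offset - 1))
        return ranges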
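To make the arithmetic concrete, here is a small standalone Python 3 sketch of the same split. The function name split_ranges is hypothetical, and it returns an explicit last byte for the final range instead of an empty string, which is a design choice of this sketch rather than the article's approach.

```python
def split_ranges(length, thread_num):
    """Split [0, length) into thread_num inclusive (start, end) byte ranges."""
    part = length // thread_num
    ranges = []
    for i in range(thread_num):
        start = i * part
        # The last range absorbs the remainder left over by integer division
        end = length - 1 if i == thread_num - 1 else start + part - 1
        ranges.append((start, end))
    return ranges

for start, end in split_ranges(10, 3):
    print("Range: bytes=%d-%d" % (start, end))
```

For a 10-byte file and 3 threads this yields the ranges 0-2, 3-5 and 6-9, which cover every byte exactly once.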

Implement the multi-threaded download. When writing content to the file, protect the write with a lock, using "with lock" instead of lock.acquire() ... lock.release(). Use file.seek() to set the file offset so that each block is written at the correct position.

    # Assumes a lock shared by all threads, e.g. lock = threading.Lock()
    def downloadThread(self, start, end):
        req = urllib2.Request(self.url)
        req.headers['Range'] = 'bytes=%s-%s' % (start, end)
        f = urllib2.urlopen(req)
        offset = start
        buffer = 1024
        while 1:
            block = f.read(buffer)
            if not block:
                break
            # Seek and write under the lock so threads cannot interleave
            with lock:
                self.file.seek(offset)
                self.file.write(block)
                offset = offset + len(block)

    def download(self):
        filename = self.getFilename()
        self.file = open(filename, 'wb')
        thread_list = []
        n = 1
        for ran in self.get_range():
            start, end = ran
            print 'starting:%d thread ' % n
            n += 1
            thread = threading.Thread(target=self.downloadThread, args=(start, end))
            thread.start()
            thread_list.append(thread)
        # Wait for every part to finish before closing the file
        for i in thread_list:
            i.join()
        self.file.close()
        print 'Download %s Success!' % filename
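The locked seek-and-write pattern can be tried without a server by having each thread copy its byte range from an in-memory buffer into a shared file. The following Python 3 sketch is an illustration under that assumption; write_part, source and the chosen ranges are hypothetical stand-ins for the HTTP reads in the article.

```python
import os
import tempfile
import threading

def write_part(out, lock, data, start, end, block_size=4):
    """Copy data[start..end] (inclusive) into the shared file in small blocks."""
    offset = start
    while offset <= end:
        block = data[offset:min(offset + block_size, end + 1)]
        with lock:               # seek and write must happen as one step
            out.seek(offset)
            out.write(block)
        offset += len(block)

source = bytes(range(256)) * 4   # 1024 bytes standing in for the remote file
ranges = [(0, 340), (341, 681), (682, 1023)]
lock = threading.Lock()

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    threads = [threading.Thread(target=write_part, args=(out, lock, source, s, e))
               for s, e in ranges]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

with open(path, "rb") as check:
    result = check.read()
os.remove(path)
print(result == source)
```

Because all threads share one file object and therefore one file position, a thread that seeks without holding the lock could have its position moved by another thread before it writes.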


Convert (*.py) files to (*.exe) executable files

Once the tool is written, how can people who have not installed Python use it? The answer is to convert the .py file into an .exe file.

This uses Python's py2exe module. Since this is its first appearance here, a brief introduction:

py2exe is a tool that converts a Python script into an executable file (*.exe) that runs independently on Windows, so the program can be run on Windows without installing Python.

Next, in the same directory as multiThreadDownload.py, create a file named mysetup.py containing:

    from distutils.core import setup
    import py2exe

    setup(console=["multiThreadDownload.py"])

Run python mysetup.py py2exe.

This generates a dist folder containing multiThreadDownload.exe. Double-click the file to run it.

Demo: HttpFileDownload_jb51.rar

That is all for this article. I hope it is helpful for your learning.

