Example of a multi-threaded HTTP download tool implemented in Python

Source: Internet
Author: User

This article describes how to write a multi-threaded HTTP download tool in Python and how to package it as an .exe executable file.

Environment: Windows/Linux + Python 2.7.x

Single thread

Before moving on to multithreading, we start with a single-threaded version. The single-threaded approach has four steps:

1. Parse the URL;

2. Connect to the web server;

3. Construct an HTTP request packet;

4. Download the file.

Each step is illustrated with code below.

Parse the URL

The user enters a URL, which is then parsed. If the parsed path is empty, it defaults to '/'; if the port number is missing, it defaults to 80. The name of the downloaded file can be changed if the user wishes (enter 'y' to change the file name; any other input keeps it unchanged).

Several parsing functions are listed below:

    import urllib

    # Parse host and path
    def analyHostAndPath(totalUrl):
        protocol, s1 = urllib.splittype(totalUrl)
        host, path = urllib.splithost(s1)
        if path == '':
            path = '/'
        return host, path

    # Parse port
    def analysisPort(host):
        host, port = urllib.splitport(host)
        if port is None:
            return 80
        # splitport returns the port as a string; connect() needs an int
        return int(port)

    # Parse filename
    def analysisFilename(path):
        filename = path.split('/')[-1]
        if '.' not in filename:
            return None
        return filename
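The helpers above use urllib functions from Python 2. For readers on Python 3, a comparable result can be obtained with urllib.parse.urlsplit. The following is only a minimal sketch: the function name parse_url and the example URLs are assumptions made for illustration and are not part of the original tool.

```python
from urllib.parse import urlsplit


def parse_url(total_url):
    """Return (host, port, request_path, filename) for an HTTP URL."""
    parts = urlsplit(total_url)
    host = parts.hostname
    # Fall back to the default HTTP port when the URL does not name one.
    port = parts.port if parts.port is not None else 80
    # An empty path means the site root.
    path = parts.path or '/'
    # Keep the query string, since the server needs it in the request line.
    request_path = path + ('?' + parts.query if parts.query else '')
    # Treat the last path segment as a filename only if it contains a dot,
    # mirroring analysisFilename above.
    last = parts.path.split('/')[-1]
    filename = last if '.' in last else None
    return host, port, request_path, filename


print(parse_url('http://example.com:8080/files/a.zip'))
print(parse_url('http://example.com'))
```

Returning the port as an integer means the result can be passed directly to a socket connect call.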

Connect to the web server

Use the socket module to connect to the web server, using the host and port obtained by parsing the URL. The code is as follows:

    import socket
    from analysisUrl import port, host

    ip = socket.gethostbyname(host)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((ip, port))
    print "success connected webServer!!"

Construct an HTTP request packet

Construct an HTTP request packet from the path, host, and port obtained by parsing the URL.

    from analysisUrl import path, host, port

    packet = 'GET ' + path + ' HTTP/1.1\r\nHost: ' + host + '\r\n\r\n'
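HTTP/1.1 connections are persistent by default, so a client that reads until the socket closes may wait a long time. One common way around this is to send a "Connection: close" header. The sketch below, written for Python 3, shows one possible way to build such a request; build_request is a hypothetical helper, not a function from the original tool.

```python
def build_request(host, path='/', port=80):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    # The Host header carries the port only when it is not the default.
    host_header = host if port == 80 else '%s:%d' % (host, port)
    lines = [
        'GET %s HTTP/1.1' % path,
        'Host: %s' % host_header,
        # Ask the server to close the connection after the response, so a
        # read loop can also stop when recv() returns no data.
        'Connection: close',
        '',  # the two empty entries produce the blank line
        '',  # that terminates the header section
    ]
    return '\r\n'.join(lines).encode('ascii')


print(build_request('example.com', '/files/a.zip'))
```

The request is returned as bytes because Python 3 sockets send bytes rather than text.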

Download the file

Send the constructed HTTP request packet to the server and capture the "Content-Length" value from the response header.

    import re

    # Method of the downloader class; s and packet come from the steps above
    def getLength(self):
        s.send(packet)
        print "send success!"
        buf = s.recv(1024)
        print buf
        # Require at least one digit so int() never receives an empty string
        p = re.compile(r'Content-Length: (\d+)')
        length = int(p.findall(buf)[0])
        return length, buf
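The getLength method above assumes that the complete response header arrives in the first 1024 bytes and that the server spells the header name exactly as "Content-Length". A more defensive variant could keep reading until the blank line that ends the header, and match the name case-insensitively. The following Python 3 sketch illustrates that idea; read_headers and content_length are hypothetical names, and recv stands for any callable that behaves like socket.recv.

```python
import re


def read_headers(recv, bufsize=1024):
    """Read from recv() until the blank line that ends the HTTP headers.

    Returns (header_bytes, leftover_body_bytes).
    """
    data = b''
    while b'\r\n\r\n' not in data:
        chunk = recv(bufsize)
        if not chunk:
            raise IOError('connection closed before headers were complete')
        data += chunk
    header, _, body = data.partition(b'\r\n\r\n')
    return header, body


def content_length(header):
    """Extract Content-Length from raw header bytes, or None if absent."""
    # Header names are case-insensitive, so match accordingly.
    match = re.search(br'^content-length:\s*(\d+)\s*$', header,
                      re.IGNORECASE | re.MULTILINE)
    return int(match.group(1)) if match else None


# Simulate a socket that delivers the response in small pieces.
response = (b'HTTP/1.1 200 OK\r\ncontent-length: 5\r\n'
            b'Content-Type: text/plain\r\n\r\nhello')
chunks = iter([response[i:i + 7] for i in range(0, len(response), 7)])
header, body = read_headers(lambda n: next(chunks, b''))
print(content_length(header), body)
```

Any body bytes that arrive together with the header are returned separately, so they can be written to the file before the main download loop starts.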

Download the file and calculate the download time.

    import time

    def download(self):
        file = open(self.filename, 'wb')
        length, buf = self.getLength()
        # Skip the response header; the body starts after the blank line
        packetIndex = buf.index('\r\n\r\n')
        buf = buf[packetIndex+4:]
        file.write(buf)
        sum = len(buf)
        # Keep reading until the whole body has arrived
        while sum < length:
            buf = s.recv(1024)
            if not buf:
                # Connection closed early; stop instead of looping forever
                break
            file.write(buf)
            sum = sum + len(buf)
        file.close()
        print "Success!!"

    if __name__ == "__main__":
        start = time.time()
        down = downloader()
        down.download()
        end = time.time()
        print "The time spent on this program is %f s" % (end - start)
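The receive loop is the part of the single-threaded version most likely to hang or write too much data, so it can be useful to try it without a network. The Python 3 sketch below isolates the loop in a hypothetical read_body helper and drives it with io.BytesIO objects standing in for the socket and the output file; none of these names come from the original tool.

```python
import io


def read_body(recv, out, length, initial=b'', bufsize=1024):
    """Write up to `length` body bytes to `out`; return the count written."""
    # Bytes that arrived together with the headers belong to the body too.
    out.write(initial[:length])
    received = min(len(initial), length)
    while received < length:
        # Never ask for more than is still missing.
        chunk = recv(min(bufsize, length - received))
        if not chunk:
            # The peer closed the connection early; stop instead of spinning.
            break
        out.write(chunk)
        received += len(chunk)
    return received


# Stand-ins for a socket and a file, so the loop can be tried offline.
source = io.BytesIO(b'0123456789')
target = io.BytesIO()
print(read_body(source.read, target, length=10))
```

Comparing the returned count with the expected length tells the caller whether the download was complete.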

Multithreading

Read the "Content-Length" field from the response header, then split the download into segments according to the number of threads, holding a lock while each segment is written. Unlike the single-threaded version, all of the code lives in one file, and it relies more heavily on Python's built-in modules.

Get "Content-Length":

    import urllib2

    def getLength(self):
        opener = urllib2.build_opener()
        req = opener.open(self.url)
        meta = req.info()
        length = int(meta.getheaders("Content-Length")[0])
        return length

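Compute the byte range for each thread from the obtained length and the number of threads:

    def get_range(self):
        ranges = []
        length = self.getLength()
        offset = int(int(length) / self.threadNum)
        for i in range(self.threadNum):
            if i == (self.threadNum - 1):
                # Last thread: open-ended range that reads to the end
                ranges.append((i*offset, ''))
            else:
                # The end of an HTTP Range is inclusive, so stop one byte
                # before the next thread's start
                ranges.append((i*offset, (i+1)*offset - 1))
        return ranges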
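Because both ends of an HTTP byte range are inclusive, the arithmetic is easy to get off by one. As a worked example, the Python 3 sketch below splits a length into explicit (start, end) pairs and prints the matching Range values. The function split_ranges is a hypothetical standalone variant of get_range, written under the assumption that the total length is already known.

```python
def split_ranges(length, thread_num):
    """Split `length` bytes into inclusive (start, end) ranges."""
    if length <= 0 or thread_num <= 0:
        raise ValueError('length and thread_num must be positive')
    # Never use more parts than there are bytes.
    parts = min(thread_num, length)
    size = length // parts
    ranges = []
    for i in range(parts):
        start = i * size
        # The last part absorbs the remainder left by integer division.
        end = length - 1 if i == parts - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


for start, end in split_ranges(10, 3):
    print('Range: bytes=%d-%d' % (start, end))
```

For a 10-byte file and 3 threads this yields the ranges 0-2, 3-5, and 6-9, which cover every byte exactly once.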

Implement the multi-threaded download. When writing content to the file, hold a lock; "with lock" is used in place of an explicit lock.acquire() ... lock.release() pair. file.seek() sets the file offset before each write so that every block lands at the correct position in the file.

    import threading
    import urllib2

    # Shared by all download threads; guards seek() + write() on the file
    lock = threading.Lock()

    # The functions below are methods of the download class (not shown)
    def downloadThread(self, start, end):
        req = urllib2.Request(self.url)
        req.headers['Range'] = 'bytes=%s-%s' % (start, end)
        f = urllib2.urlopen(req)
        offset = start
        buffer = 1024
        while 1:
            block = f.read(buffer)
            if not block:
                break
            with lock:
                self.file.seek(offset)
                self.file.write(block)
                offset = offset + len(block)

    def download(self):
        filename = self.getFilename()
        self.file = open(filename, 'wb')
        thread_list = []
        n = 1
        for ran in self.get_range():
            start, end = ran
            print 'starting:%d thread ' % n
            n += 1
            thread = threading.Thread(target=self.downloadThread, args=(start, end))
            thread.start()
            thread_list.append(thread)
        # Wait for every segment to finish before closing the file
        for i in thread_list:
            i.join()
        print 'Download %s Success!' % filename
        self.file.close()
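The lock matters because all threads share one file object, and therefore one file position. The Python 3 sketch below demonstrates the seek-then-write pattern without any network access: an in-memory byte string stands in for the remote file, and the names write_segment and assemble are hypothetical.

```python
import os
import tempfile
import threading

lock = threading.Lock()


def write_segment(out, source, start, end, bufsize=4):
    """Copy source[start..end] (inclusive) into `out` at the same offsets."""
    offset = start
    while offset <= end:
        block = source[offset:min(offset + bufsize, end + 1)]
        if not block:
            break  # range extends past the end of the source
        # seek() and write() must happen as one step, otherwise another
        # thread could move the shared file position in between.
        with lock:
            out.seek(offset)
            out.write(block)
        offset += len(block)


def assemble(source, ranges, path):
    """Write `source` to `path` using one thread per (start, end) range."""
    with open(path, 'wb') as out:
        threads = [threading.Thread(target=write_segment,
                                    args=(out, source, start, end))
                   for start, end in ranges]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


data = bytes(range(256)) * 4
path = os.path.join(tempfile.mkdtemp(), 'out.bin')
assemble(data, [(0, 299), (300, 699), (700, 1023)], path)
with open(path, 'rb') as f:
    print(f.read() == data)
```

Each thread tracks its own offset and only takes the lock for the short seek-and-write step, so slow reads in one thread do not block the others.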

Convert (*.py) files to (*.exe) executable files

Once the tool is written, how can people who have not installed Python use it? The answer is to convert the .py file into an .exe file.

This is the first time Python's py2exe module is used here, so a brief introduction follows:

py2exe is a tool that converts a Python script into an executable file (*.exe) that runs independently on Windows. The resulting program can be run on Windows without installing Python.

Next, in the same directory as multiThreadDownload.py, create a file named mysetup.py with the following content:

    from distutils.core import setup
    import py2exe

    setup(console=["multiThreadDownload.py"])

Run python mysetup.py py2exe.

This generates a dist folder containing multiThreadDownload.exe. Double-click the file to run it.

Demo: HttpFileDownload_jb51.rar

That is all for this article. I hope it helps with your learning.
