One, basic principles
Before discussing resumable downloads (breakpoint continuation), we need to cover the HTTP header fields involved.
① Content-Length
Content-Length indicates the size of the entity body in an HTTP message. Unless chunked encoding is used, a message that carries an entity body must include the Content-Length header. The header is used to detect message truncation caused by server crashes and to delimit multiple messages that share a persistent connection.
Detecting truncation
Earlier versions of HTTP delimited the end of a message by closing the connection. Without Content-Length, however, the client cannot tell whether the connection was closed normally at the end of the message or closed because the server crashed in the middle of the transfer. The client needs Content-Length to detect a truncated message.
Message truncation is a particularly serious problem for caching proxy servers. If a cache receives a truncated message but does not recognize the truncation, it may store the incomplete content and serve it many times. Caching proxy servers therefore usually do not cache HTTP bodies that lack an explicit Content-Length header, which reduces the risk of caching a truncated message.
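The truncation check described above can be sketched in a few lines. This is only an illustration: the function name `is_truncated` and the convention that header names are already lower-cased are assumptions of this sketch, not part of any library.

```python
def is_truncated(headers, body):
    """Compare the received body with the declared Content-Length.

    headers: dict of response headers with lower-case names (assumption).
    body: the bytes that were actually received.
    Returns None when no Content-Length was sent, because truncation
    cannot be detected in that case.
    """
    declared = headers.get("content-length")
    if declared is None:
        return None
    return len(body) < int(declared)


print(is_truncated({"content-length": "10"}, b"0123456789"))  # False
print(is_truncated({"content-length": "10"}, b"01234"))       # True
print(is_truncated({}, b"01234"))                             # None
```

A cache could refuse to store any response for which this check returns True or None.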
Content-Length and persistent connections
The Content-Length header is essential for persistent connections. When a response is sent over a persistent connection, another HTTP response may follow it immediately. The Content-Length header tells the client where one message ends and the next begins. Because the connection is persistent, the client cannot rely on the connection closing to determine the end of the message.
There is one case in which a persistent connection can be used without the Content-Length header: chunked encoding. With chunked encoding, the data is sent as a series of chunks, and each chunk carries its own size. Even if the server does not know the size of the entire entity when it generates the headers (usually because the entity is generated dynamically), it can still use chunked encoding to send a number of chunks of known size.
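To make the role of Content-Length on a persistent connection concrete, here is a minimal sketch that splits two back-to-back responses out of one byte stream. The function name `split_responses` is hypothetical, and the sketch assumes that every response carries Content-Length and that the whole stream is already in memory.

```python
def split_responses(stream):
    """Split raw bytes holding several HTTP responses into
    (status_line, body) pairs, using Content-Length as the delimiter."""
    responses = []
    pos = 0
    while pos < len(stream):
        # the header block ends at the first empty line
        head_end = stream.index(b"\r\n\r\n", pos)
        lines = stream[pos:head_end].decode("iso-8859-1").split("\r\n")
        headers = {}
        for line in lines[1:]:  # lines[0] is the status line
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers["content-length"])
        body_start = head_end + 4
        responses.append((lines[0], stream[body_start:body_start + length]))
        # the next message starts right after this body
        pos = body_start + length
    return responses


raw = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
       b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye")
for status, body in split_responses(raw):
    print(status, body)
```

Without the length, nothing in the stream marks where "hello" stops and the second status line begins.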
② Transfer-Encoding
The transfer encoding used in practice with HTTP/1.1 is chunked. It is useful, for example, when the server generates the body dynamically: the client does not want to wait until the server has generated the whole body, because the delay in between would be very large. The chunked format looks like this:
HTTP/1.1 200 OK
Transfer-Encoding: chunked

2
AB
A
0123456789
0
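The format above can be decoded with a short loop: read a hexadecimal size line, take that many bytes, and stop at the chunk of size 0. The following is a minimal sketch with a hypothetical function name `decode_chunked`; it ignores trailers and drops chunk extensions.

```python
def decode_chunked(data):
    """Decode a chunked-encoded body (bytes) into the original bytes."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # the size is hexadecimal; anything after ';' is a chunk extension
        size_field = data[pos:line_end].split(b";")[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # the last chunk has size 0
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return body


encoded = b"2\r\nAB\r\nA\r\n0123456789\r\n0\r\n\r\n"
print(decode_chunked(encoded))  # b'AB0123456789'
```

Note that the size "A" is hexadecimal, so the second chunk is 10 bytes long.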
③ Content-Encoding
Three values are common: gzip, deflate, and compress. The header indicates which algorithm was used to encode the entity. Content-Encoding is often used together with Transfer-Encoding.
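As a rough illustration of how a client might undo Content-Encoding, the sketch below uses the standard `gzip` and `zlib` modules. The function name `decode_body` is hypothetical. The sketch assumes that "deflate" means zlib-wrapped data (some servers send raw deflate instead), and it does not handle compress, for which the standard library has no decoder.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo the Content-Encoding applied to a response body."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    if content_encoding == "deflate":
        # assumption: zlib-wrapped deflate data
        return zlib.decompress(body)
    if content_encoding in (None, "", "identity"):
        return body
    raise ValueError("unsupported Content-Encoding: %s" % content_encoding)


original = b"0123456789" * 100
compressed = gzip.compress(original)
print(len(original), len(compressed))
print(decode_body(compressed, "gzip") == original)  # True
```

When a body is both compressed and chunked, the chunked framing is removed first and the result is then decompressed.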
④ Content-Range
A response header. It specifies where a partial body belongs within the entire entity, and it also gives the length of the entire entity. When the server returns a partial response to the client, it must describe the range that the response covers and the length of the entire entity. General format:
Content-Range: bytes start-end/total
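A client usually needs the three numbers from this header. The sketch below parses them with a regular expression; the function name `parse_content_range` is hypothetical, and the sketch does not handle the `bytes */total` form that servers send when a range cannot be satisfied.

```python
import re

CONTENT_RANGE_RE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (start, end, total) from a Content-Range value.

    total is None when the server reports the length as '*' (unknown).
    """
    match = CONTENT_RANGE_RE.match(value.strip())
    if match is None:
        raise ValueError("unexpected Content-Range: %r" % value)
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return start, end, total


print(parse_content_range("bytes 500-999/1234"))  # (500, 999, 1234)
```

Both positions are inclusive, so this example covers 500 bytes.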
⑤ Range
A request header. It specifies the position of the first byte and the position of the last byte to fetch. General format:
Range: bytes=start-end
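Building the header value is simple string formatting. The helper below is a sketch with a hypothetical name, `build_range_header`; it also covers the open-ended form `bytes=start-`, which asks for everything from `start` to the end of the entity.

```python
def build_range_header(start, end=None):
    """Return a headers dict asking for bytes start..end (both inclusive).

    With end=None the request is open-ended: from start to the end.
    """
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid range: %r-%r" % (start, end))
    if end is None:
        return {"Range": "bytes={0}-".format(start)}
    return {"Range": "bytes={0}-{1}".format(start, end)}


print(build_range_header(0, 499))  # {'Range': 'bytes=0-499'}
print(build_range_header(500))     # {'Range': 'bytes=500-'}
```

To resume a download, `start` would be the number of bytes already saved locally.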
Two, single-threaded implementation
① Checking whether resuming is supported
Use HEAD to request part of the entity and check whether the response headers contain Content-Range
Use HEAD to request part of the entity and check whether the returned status code is 206
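The two checks above can be sketched as a small pure function that looks at the status code and headers of a response to a ranged request. The name `supports_resume` is hypothetical; the function takes plain values so that it can be tried without any network access.

```python
def supports_resume(status_code, headers):
    """Decide from the response to a ranged request whether the
    server honours Range, using the two checks listed above."""
    names = {name.lower() for name in headers}
    return status_code == 206 or "content-range" in names


# a server that honours Range answers 206 with a Content-Range header
print(supports_resume(206, {"Content-Range": "bytes 0-0/1234"}))  # True
# a server that ignores Range answers 200 with the whole entity
print(supports_resume(200, {"Content-Length": "1234"}))           # False
```

With requests, one possible way to feed it is `r = requests.head(url, headers={"Range": "bytes=0-0"})` followed by `supports_resume(r.status_code, r.headers)`. Some servers may treat HEAD differently from GET, so a ranged GET may be a more reliable probe.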
② Implementation steps
Use the HEAD method to get the file size
Get the size of the local file
Set the Range request header
Use requests.Response.iter_content with stream mode enabled
Write to the file each time a certain amount of data has been downloaded
The code is as follows:
#!/usr/bin/env python3
# coding: utf-8
"""
Copyright (c) 2015-2016 cain
author: cain <singforcain@gmail.com>
"""
import os
import sys
import time
import logging
import argparse

import requests


class FileDownload(object):
    def __init__(self, url, file_name):
        """
        :param url: download address of the file
        :param file_name: name to save the file under
        :return:
        """
        self.url = url
        self.file_name = file_name
        self.start_time = time.time()
        self.file_size = self.getsize()
        self.offset = self.getoffset()
        self.downloaded_size = self.offset
        self.headers = self.setheaders()
        self.tmpfile = b""
        self.info()

    def info(self):
        logging.info("downloaded [%s] bytes" % self.offset)

    def setheaders(self):
        """
        Set the Range header based on the size of the part already downloaded
        :return:
        """
        start = self.offset
        end = self.file_size - 1
        range_value = "bytes={0}-{1}".format(start, end)
        return {"Range": range_value}

    def getoffset(self):
        if os.path.exists(self.file_name):
            if self.file_size == os.path.getsize(self.file_name):
                # the file is already complete, nothing to do
                sys.exit()
            else:
                return os.path.getsize(self.file_name)
        else:
            return 0

    def getsize(self):
        """
        :return: the size of the file, obtained with the HEAD method
        """
        response = requests.head(self.url)
        return int(response.headers["content-length"])

    def flush(self, f):
        """
        Write the buffered data to the file and log the progress
        """
        f.write(self.tmpfile)
        self.downloaded_size += len(self.tmpfile)
        logging.info("Downloaded---[%.2f%%] [%s/%s] bytes" % (
            float(self.downloaded_size) / self.file_size * 100,
            self.downloaded_size, self.file_size))
        self.tmpfile = b""

    def download(self):
        """
        The core of the resumable download
        :return:
        """
        # "ab" appends after the bytes that are already on disk
        with open(self.file_name, "ab") as f:
            try:
                r = requests.get(self.url, stream=True, headers=self.headers)
                for chunk in r.iter_content(chunk_size=1024):
                    if not chunk:
                        break
                    self.tmpfile += chunk
                    # write to disk once 50 KB has been buffered
                    if len(self.tmpfile) >= 1024 * 50:
                        self.flush(f)
            except KeyboardInterrupt:
                logging.warning("interrupted by user")
                logging.info("ending the thread, please do not exit")
            finally:
                # write whatever is still in the buffer
                self.flush(f)
                consume = int(time.time() - self.start_time)
                logging.info("It consumes %d seconds" % consume)
                logging.info("End at %s" % time.strftime(
                    "%Y-%m-%d %H:%M:%S", time.localtime(time.time())))


def init():
    """
    Configure logging
    :return:
    """
    logging.basicConfig(format='[%(asctime)s]\t[%(levelname)s]\t%(message)s',
                        level=logging.DEBUG,
                        datefmt="%Y/%m/%d %I:%M:%S %p")


def run(url, name):
    if not name:
        name = url.split("/")[-1]
    file = FileDownload(url, name)
    file.download()


if __name__ == '__main__':
    init()
    parser = argparse.ArgumentParser()
    parser.add_argument("url", help="the file's url")
    parser.add_argument("--name", help="the file's name you want to rename")
    args = parser.parse_args()
    run(args.url, args.name)
Three, multithreaded implementation (without resuming)
The code is as follows:
#!/usr/bin/env python3
# coding: utf-8
"""
Copyright (c) 2015-2016 cain
author: cain <singforcain@gmail.com>
"""
import sys
import time
import math
import queue
import logging
import argparse
import threading

import requests

mutex = threading.Lock()


class FileDownload(object):
    def __init__(self, url, filename, threadnum, bulk_size, chunk_size):
        self.url = url
        self.filename = filename
        self.threadnum = threadnum
        self.bulk_size = bulk_size
        self.chunk_size = chunk_size
        self.file_size = self.getsize()
        self.buildemptyfile()
        # unbounded, so that setqueue() cannot block before the workers start
        self.queue = queue.Queue()
        self.setqueue()

    def getsize(self):
        """
        :return: the size of the file, obtained with the HEAD method
        """
        response = requests.head(self.url)
        return int(response.headers["content-length"])

    def buildemptyfile(self):
        """
        Create an empty file of the final size
        :return:
        """
        try:
            logging.info("Building empty file ...")
            with open(self.filename, "wb") as f:
                if self.file_size > 0:
                    # writing one byte at the last offset sets the file length
                    f.seek(self.file_size - 1)
                    f.write(b"\x00")
        except Exception as err:
            logging.error("Building empty file error ...")
            logging.error(err)
            sys.exit()

    def setqueue(self):
        """
        Fill the queue according to the file size and the size of each task
        :return:
        """
        logging.info("Setting the queue ...")
        tasknums = int(math.ceil(float(self.file_size) / self.bulk_size))  # round up
        for i in range(tasknums):
            start = self.bulk_size * i
            # the last task must not run past the end of the file
            end = min(self.bulk_size * (i + 1), self.file_size) - 1
            self.queue.put((start, end))

    def download(self):
        while True:
            logging.info("Downloading data in %s" % threading.current_thread().name)
            try:
                # get_nowait() avoids blocking forever if another thread
                # has just taken the last task
                start, end = self.queue.get_nowait()
            except queue.Empty:
                logging.info("%s is over ..." % threading.current_thread().name)
                break
            tmpfile = b""
            ranges = "bytes={0}-{1}".format(start, end)
            headers = {"Range": ranges}
            logging.info(headers)
            r = requests.get(self.url, stream=True, headers=headers)
            for chunk in r.iter_content(chunk_size=self.chunk_size):
                if not chunk:
                    break
                tmpfile += chunk
            # only one thread writes to the file at a time
            with mutex:
                with open(self.filename, "r+b") as f:
                    f.seek(start)
                    f.write(tmpfile)
                logging.info("Writing [%d] bytes data into the file ..." % len(tmpfile))

    def run(self):
        threads = list()
        for i in range(self.threadnum):
            threads.append(threading.Thread(target=self.download))
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()


def loginit():
    """
    Configure logging
    :return:
    """
    logging.basicConfig(format='[%(asctime)s]\t[%(levelname)s]\t%(message)s',
                        level=logging.DEBUG,
                        datefmt="%Y/%m/%d %I:%M:%S %p")


def start(url, filename, threadnum):
    """
    The core of the download
    :param url:
    :param filename:
    :param threadnum:
    :return:
    """
    # fall back to the last segment of the URL when no name is given
    filename = filename if filename else url.split("/")[-1]
    # default to 5 threads when none is given (the original check also had
    # an upper bound whose value is not recoverable, so none is applied)
    threadnum = threadnum if threadnum else 5
    bulk_size = 2 * 1024 * 1024  # 2 MB per task
    chunk_size = 50 * 1024
    print(url, filename, threadnum, bulk_size, chunk_size)
    download = FileDownload(url, filename, threadnum, bulk_size, chunk_size)
    download.run()


if __name__ == '__main__':
    loginit()
    logging.info("APP is starting ...")
    start_time = time.time()
    parser = argparse.ArgumentParser()
    parser.add_argument("url", help="the file's url")
    parser.add_argument("--filename", help="the file's name you want to rename")
    parser.add_argument("--threadnum", help="the threads you want to choose", type=int)
    args = parser.parse_args()
    start(args.url, args.filename, args.threadnum)
    logging.info("APP is ending ...")
    logging.info("It consumes [%d] seconds" % (time.time() - start_time))
Four, multithreaded resumable download
This combines the two approaches above. Usually a configuration file is used to save the download state, and when the download is started again, the state is restored from that file.
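One possible shape for such a state file is sketched below. It is only an illustration under stated assumptions: the state is a JSON list of the byte ranges that have already been written, and the names `load_done`, `save_done`, and `pending_ranges` are hypothetical. On restart, only the ranges that are not yet recorded would be put back on the task queue.

```python
import json
import os
import tempfile


def load_done(state_path):
    """Return the set of (start, end) ranges recorded as finished."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path) as f:
        return {tuple(r) for r in json.load(f)}


def save_done(state_path, done):
    """Save the finished ranges; write a temporary file first and then
    rename it, so a crash does not leave a half-written state file."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, state_path)


def pending_ranges(file_size, bulk_size, done):
    """Split the file into tasks and keep those not finished yet."""
    ranges = []
    for start in range(0, file_size, bulk_size):
        end = min(start + bulk_size, file_size) - 1
        if (start, end) not in done:
            ranges.append((start, end))
    return ranges


state_path = os.path.join(tempfile.mkdtemp(), "demo.state.json")
done = load_done(state_path)
done.add((0, 3))  # pretend the first task has been written to disk
save_done(state_path, done)
print(pending_ranges(10, 4, load_done(state_path)))  # [(4, 7), (8, 9)]
```

In a multithreaded downloader, the state would be updated under the same lock that protects the file write, so that a range is recorded only after its data is on disk.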