We have all used Thunder and other download tools, whose main features are concurrent download and resumable download (breakpoint continuation). I will not cover those tools here; they are more complex and I do not understand them anyway. This article only covers simple breakpoint continuation in the narrow sense and multi-threaded download in the narrow sense. As before, this is for studying the principle and is basically of no use in real life; in my measurements, the multi-threaded download was even slower than the single-threaded one ... how embarrassing.
There are three main topics: how to download concurrently over HTTP, how to do concurrent development in Golang with goroutines, and how to resume from a breakpoint.
Concurrent download of HTTP
To download concurrently, split the content into blocks and download the blocks in parallel. This requires the server to support fetching data in blocks. Tools such as Thunder and eDonkey have their own protocols for this, for example thunder://. Here we only study the principle, so we only look at what the HTTP protocol offers for concurrency.
| HTTP header | Example value | Meaning |
| --- | --- | --- |
| Content-Length | 14247 | Size of the HTTP response body. For a download the body is the file, so this can also be treated as the file size. The unit is bytes. |
| Content-Disposition | inline; filename="bryce.jpg" | An extension of the MIME protocol that tells the MIME user agent how to display an attached file. When the browser receives this header, it triggers a file download. It also carries the file name. |
| Accept-Ranges | bytes | Allows clients to fetch the file by byte ranges. |
| Range | bytes=0-511 | Fetches one block of data; this value means bytes 0 through 511, 512 bytes in total. |
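To make the inclusive arithmetic of the Range header concrete, here is a minimal sketch that splits a body of a given length into n blocks. The function name splitRanges and the [2]int representation are assumptions made for illustration; the article's own code stores its ranges in get.DownloadRange and derives the block size from the number of concurrent downloads.

```go
package main

import "fmt"

// splitRanges divides a body of length bytes into at most n inclusive
// [start, end] ranges, matching the inclusive semantics of the Range header.
// Hypothetical helper, not taken from the article's source.
func splitRanges(length, n int) [][2]int {
	if length <= 0 || n <= 0 {
		return nil
	}
	if n > length {
		n = length // never create empty blocks
	}
	size := length / n
	ranges := make([][2]int, 0, n)
	for i := 0; i < n; i++ {
		start := i * size
		end := start + size - 1
		if i == n-1 {
			end = length - 1 // the last block absorbs the remainder
		}
		ranges = append(ranges, [2]int{start, end})
	}
	return ranges
}

func main() {
	for _, r := range splitRanges(14247, 4) {
		fmt.Printf("Range: bytes=%d-%d (%d bytes)\n", r[0], r[1], r[1]-r[0]+1)
	}
}
```

Because both ends are inclusive, a block covering bytes 0 through 511 holds 512 bytes, and the last block of an n-way split ends at length-1.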
Before downloading a file, you want to know some information about it, such as the file name, the file size, whether concurrent download is supported, and the file type, all of which can be obtained from the response headers. To get this information without downloading the file itself, use the HEAD method provided by HTTP. A response to a HEAD request contains only the headers and no body.
// Ask only for the headers; the response carries no body.
req, err := http.NewRequest("HEAD", get.Url, nil)
resp, err := get.GetClient.Do(req)
Next, get parameters such as the file type and the file name. Everything in HTTP, from the URL to the headers to the body, can be thought of as a string, and indeed is one, but do not parse these strings by hand or you will regret it. Golang has the net/url package for parsing URLs and the mime package for MIME. Both are part of the standard library, and other languages surely have equivalents.
get.ContentLength = int(resp.ContentLength)
get.MediaType, get.MediaParams, _ = mime.ParseMediaType(get.Header.Get("Content-Disposition"))
log.Printf("Get %s MediaType:%s, Filename:%s, Length %d.\n", get.Url, get.MediaType, get.MediaParams["filename"], get.ContentLength)
Output
2015/07/02 09:56:47 Get http://7b1h1l.com1.z0.glb.clouddn.com/bryce.jpg MediaType:inline, Filename:bryce.jpg, Length 14247.
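The parsing step can be tried in isolation. The following is a small sketch that feeds the Content-Disposition value from the table above to mime.ParseMediaType; the wrapper name parseDisposition is an assumption for illustration and does not appear in the article's source.

```go
package main

import (
	"fmt"
	"mime"
)

// parseDisposition returns the disposition type and the filename parameter
// of a Content-Disposition header value. Hypothetical helper for illustration.
func parseDisposition(v string) (string, string, error) {
	disp, params, err := mime.ParseMediaType(v)
	if err != nil {
		return "", "", err
	}
	// params is a map; a missing "filename" key simply yields "".
	return disp, params["filename"], nil
}

func main() {
	disp, name, err := parseDisposition(`inline; filename="bryce.jpg"`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("MediaType:%s, Filename:%s\n", disp, name)
}
```

Unlike the snippet above, this version checks the returned error, which matters when the server sends no Content-Disposition header at all and the value is an empty string.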
If the response headers also include Accept-Ranges, the server supports fetching by range:
if get.Header.Get("Accept-Ranges") != "" {
	log.Printf("Server %s support Range by %s.\n", get.Header.Get("Server"), get.Header.Get("Accept-Ranges"))
} else {
	log.Printf("Server %s doesn't support Range.\n", get.Header.Get("Server"))
}
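The HEAD request and the Accept-Ranges check can be put together into one self-contained sketch. The names probe and newTestServer are assumptions for illustration, and the local httptest server only stands in for a real file server; http.ServeContent is used because it answers HEAD and Range requests on its own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// probe sends a HEAD request and reports the body size and whether the
// server advertises byte-range support. Hypothetical helper.
func probe(client *http.Client, url string) (size int64, ranged bool, err error) {
	req, err := http.NewRequest(http.MethodHead, url, nil)
	if err != nil {
		return 0, false, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, false, err
	}
	defer resp.Body.Close()
	// "Accept-Ranges: none" means ranges are NOT supported, so compare
	// against "bytes" instead of only checking for a non-empty value.
	return resp.ContentLength, resp.Header.Get("Accept-Ranges") == "bytes", nil
}

// newTestServer is a local stand-in for a real file server.
func newTestServer(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "bryce.jpg", time.Time{}, strings.NewReader(body))
	}))
}

func main() {
	srv := newTestServer(strings.Repeat("x", 14247))
	defer srv.Close()

	size, ranged, err := probe(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("Length %d, range support: %v\n", size, ranged)
}
```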
For block download, create n temporary files. My naming rule is to append the block's range as a suffix, such as bryce.jpg.0-512, which saves me from keeping a separate configuration file (mainly because I am too lazy to write one). Each downloaded block is stored in its temporary file, and the final file is written once all blocks have been downloaded. To download a block, add a Range header to the request.
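range_i := fmt.Sprintf("%d-%d", get.DownloadRange[i][0], get.DownloadRange[i][1])
log.Printf("Download #%d bytes %s.\n", i, range_i)
defer get.TempFiles[i].Close()
req, err := http.NewRequest("GET", get.Url, nil)
req.Header.Set("Range", "bytes="+range_i)
resp, err := get.GetClient.Do(req)
// Check the error before touching resp; resp is nil when the request fails.
if err != nil {
	log.Printf("Download #%d error %v.\n", i, err)
	return
}
defer resp.Body.Close()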
Finally, the downloaded data is saved to the temporary file. io.Copy streams the response body to disk through a small buffer as it arrives, so a block is not held in memory in full before it is written.
cnt, err := io.Copy(get.TempFiles[i], resp.Body)
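A self-contained sketch of a single block download, assuming a local httptest server in place of the real one. The names fetchRange and demoRange are hypothetical. Compared with the snippets above, the sketch also checks for status 206 Partial Content, because a server that ignores Range replies with 200 and the whole body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// fetchRange copies the inclusive byte range [start, end] of url into w and
// returns the number of bytes written. Hypothetical helper.
func fetchRange(client *http.Client, url string, start, end int64, w io.Writer) (int64, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return 0, fmt.Errorf("range not honored: %s", resp.Status)
	}
	return io.Copy(w, resp.Body)
}

// demoRange serves body from a local test server and fetches one range of it.
func demoRange(body string, start, end int64) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "data.bin", time.Time{}, strings.NewReader(body))
	}))
	defer srv.Close()

	var sb strings.Builder
	if _, err := fetchRange(srv.Client(), srv.URL, start, end, &sb); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	got, err := demoRange("0123456789", 2, 5)
	fmt.Println(got, err)
}
```

In the article's design the io.Writer would be the block's temporary file rather than a strings.Builder.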
Multithreaded development
For concurrent download we start N goroutines, and the main goroutine needs to block until all N downloads have completed. At first I wanted to write this myself with a channel, but that was not especially ... In the end I used the sync package, whose WaitGroup solves exactly this problem.
The main goroutine starts the N goroutines. The Add method can be understood as adding a task, incrementing the task counter by one; the Wait method blocks until all tasks have completed.
for i := range get.DownloadRange {
	get.WG.Add(1)
	go get.Download(i)
}
get.WG.Wait()
Each download goroutine calls Done when it finishes, which signals that the task is complete and decrements the task counter by one.
defer get.WG.Done()
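The Add, Done, and Wait calls fit together as in the following runnable sketch. The function downloadAll and its fetch callback are assumptions for illustration; in the article, the work is done by get.Download and the WaitGroup lives in get.WG.

```go
package main

import (
	"fmt"
	"sync"
)

// downloadAll runs fetch once per block, each in its own goroutine, and
// blocks until every goroutine has called Done. Hypothetical helper.
func downloadAll(blocks int, fetch func(i int) int) []int {
	results := make([]int, blocks)
	var wg sync.WaitGroup
	for i := 0; i < blocks; i++ {
		wg.Add(1) // one more task in flight
		go func(i int) {
			defer wg.Done()       // task finished, counter minus one
			results[i] = fetch(i) // each goroutine writes only its own slot
		}(i)
	}
	wg.Wait() // returns once the counter is back to zero
	return results
}

func main() {
	sizes := downloadAll(4, func(i int) int { return (i + 1) * 100 })
	fmt.Println(sizes) // prints [100 200 300 400]
}
```

Add is called in the launching goroutine before the go statement, so Wait cannot observe a zero counter while tasks are still being started.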
Breakpoint Continuation
This is the simplest part. If a download is interrupted, the temporary files hold less data than they should. This is where the range suffix in the temporary file names from the first step comes in handy: the suffix gives the size the block should be, and we then check the block's actual size. The file's attributes, including its size, are available as an os.FileInfo obtained from the file. The actual size is added as an offset to the start of the block's range, so the content that has already been downloaded is skipped.
for i := 0; i < len(get.DownloadRange); i++ {
	range_i := fmt.Sprintf("%d-%d", get.DownloadRange[i][0], get.DownloadRange[i][1])
	// Open write-only in append mode; a file opened with O_RDONLY cannot be written to.
	temp_file, err := os.OpenFile(get.FilePath+"."+range_i, os.O_WRONLY|os.O_APPEND, 0)
	if err != nil {
		// No temporary file yet: start this block from scratch.
		temp_file, _ = os.Create(get.FilePath + "." + range_i)
	} else {
		// Skip the bytes that are already on disk.
		fi, err := temp_file.Stat()
		if err == nil {
			get.DownloadRange[i][0] += int(fi.Size())
		}
	}
	get.TempFiles = append(get.TempFiles, temp_file)
}
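The offset calculation can be exercised on its own with a temporary directory. This is a sketch under assumptions: openBlock and demoResume are hypothetical names, and the done flag is an addition that the article's loop does not have. It covers the case where a block is already complete, in which the adjusted start would pass the end of the range.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// openBlock opens (or creates) the temporary file for the inclusive range
// [start, end] in append mode. It returns the offset at which the download
// should resume and whether the block is already complete.
func openBlock(path string, start, end int64) (*os.File, int64, bool, error) {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_APPEND|os.O_CREATE, 0o644)
	if err != nil {
		return nil, 0, false, err
	}
	fi, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, 0, false, err
	}
	resumeAt := start + fi.Size() // skip what is already on disk
	return f, resumeAt, resumeAt > end, nil
}

// demoResume simulates an interrupted download that already saved `saved`
// bytes of the block 0-511, then asks where to resume.
func demoResume(saved int) (int64, bool, error) {
	dir, err := os.MkdirTemp("", "resume")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "bryce.jpg.0-511")
	if err := os.WriteFile(path, make([]byte, saved), 0o644); err != nil {
		return 0, false, err
	}
	f, resumeAt, done, err := openBlock(path, 0, 511)
	if err != nil {
		return 0, false, err
	}
	defer f.Close()
	return resumeAt, done, nil
}

func main() {
	resumeAt, done, err := demoResume(100)
	fmt.Println(resumeAt, done, err)
}
```

With 100 bytes already saved, the next request would ask for bytes=100-511; with all 512 bytes saved, done is true and no request is needed.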
That is roughly the basic principle. As mentioned earlier, this project cannot be used in practice, for the following reasons:
- Downloads should run through a pool of workers; with the current design, the concurrency becomes too high when downloading large files. A worker that has finished its block could also go on to download unfinished blocks, which involves dynamic allocation and dynamic splitting of tasks;
- The block size for concurrent download matters a lot. A memory block is typically 4096 bytes, and downloading in multiples of that size is more memory-friendly. At the moment my block size is simply computed from the number of concurrent downloads, which is rather crude;
- As mentioned earlier, this is multi-threaded download in the narrow sense: the server must support Range, otherwise data cannot be fetched concurrently. A real product would have at the very least several download servers of its own, so that those servers could implement a protocol for block-wise data access to support concurrent download; that is how it is done in the market;
- There should be a download progress display, similar to the progress bar and download speed that the wget command shows in the console. I do not really know how to write this. From what I read, it can reportedly be done with something like fmt.Println("abc\rcde"), where the \r carriage return moves the cursor back to the beginning of the line. I have not tried it; feel free to try.
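The first point in the list, a bounded pool of workers, could look roughly like the sketch below. This is not part of the article's project; runPool and its fetch callback are hypothetical, and the sketch only shows how a channel lets idle workers pick up the next pending block. It does not split blocks dynamically.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool processes block indexes 0..blocks-1 with at most `workers`
// goroutines. Idle workers pull the next pending block from the channel,
// so a fast worker naturally handles more blocks than a slow one.
func runPool(blocks, workers int, fetch func(block int)) {
	if workers < 1 {
		workers = 1 // avoid blocking forever on the channel send below
	}
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs { // loop ends when jobs is closed
				fetch(b)
			}
		}()
	}
	for b := 0; b < blocks; b++ {
		jobs <- b
	}
	close(jobs)
	wg.Wait()
}

func main() {
	fetched := make([]bool, 100)
	runPool(len(fetched), 4, func(block int) {
		fetched[block] = true // each block index is handled exactly once
	})
	count := 0
	for _, ok := range fetched {
		if ok {
			count++
		}
	}
	fmt.Println("blocks downloaded:", count)
}
```

With a pool like this, the number of blocks can follow from a fixed block size (for example a multiple of 4096 bytes) instead of from the number of concurrent downloads.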
Please refer to the complete source code in this article.
Reference documents
- HTTP protocol file download principle and multi-threaded breakpoint continuation - Zhuhuiby
- Package sync - The Go Programming Language
- How do I update command line output? - StackOverflow
Original link: Golang implementation of multi-threaded concurrent download. Please credit the source when reposting!