Golang implementation of multi-threaded concurrent downloads


We have all used Thunder and other download tools, which support concurrent downloading and resuming interrupted downloads from a breakpoint. I won't cover them here: they are quite complex, and frankly I don't fully understand them. This article only covers simple resuming and multi-threaded downloading in the narrow sense. As before, the goal is to study the principle, and the result is of little practical use. In my measurements, the multi-threaded download was even slower than the single-threaded one... It's embarrassing.

There are three main parts:

- how to download over HTTP concurrently;
- how to write concurrent code in Golang;
- how to resume from a breakpoint.

Concurrent HTTP downloads

To download concurrently, split the content into blocks and download the blocks in parallel. This requires the server to support fetching parts of the data (range requests). Thunder and eDonkey have their own protocols, such as thunder:// links. Since we are only studying the principle, we look at how the HTTP protocol supports concurrent downloads.

  HTTP header           Example value                  Meaning
  Content-Length        14247                          Size of the HTTP response body in bytes. When the body is a file, this is the file size.
  Content-Disposition   inline; filename="Bryce.jpg"   An extension of the MIME protocol that tells the user agent how to present the attached file. With inline the file is displayed; with attachment the browser starts a download. The header can also carry the file name.
  Accept-Ranges         bytes                          The server lets clients request parts of the file in units of bytes.
  Range                 bytes=0-511                    Request header that fetches one block of data: bytes 0 through 511 inclusive, 512 bytes in total.
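The Range arithmetic in the table is easy to get off by one, because both ends are inclusive. Below is a minimal sketch, not taken from the original source code, that splits a file length into N blocks and formats each block as a Range value. The helper names splitRanges and rangeHeader are hypothetical. Giving the remainder to the last block is just one reasonable choice.

```go
package main

import "fmt"

// splitRanges divides a resource of `length` bytes into at most n contiguous
// blocks. Each block is an inclusive [start, end] pair, matching the HTTP
// Range syntax where "bytes=0-511" covers 512 bytes.
// (Hypothetical helper, not from the original article.)
func splitRanges(length, n int) [][2]int {
	if length <= 0 || n <= 0 {
		return nil
	}
	if n > length {
		n = length // never produce empty blocks
	}
	size := length / n
	ranges := make([][2]int, 0, n)
	for i := 0; i < n; i++ {
		start := i * size
		end := start + size - 1
		if i == n-1 {
			end = length - 1 // the last block absorbs the remainder
		}
		ranges = append(ranges, [2]int{start, end})
	}
	return ranges
}

// rangeHeader formats one block as a Range request header value.
func rangeHeader(r [2]int) string {
	return fmt.Sprintf("bytes=%d-%d", r[0], r[1])
}

func main() {
	// 14247 bytes (the Bryce.jpg example) split into 4 blocks.
	for _, r := range splitRanges(14247, 4) {
		fmt.Println(rangeHeader(r), "->", r[1]-r[0]+1, "bytes")
	}
}
```

For the 14247-byte Bryce.jpg this yields the blocks 0-3560, 3561-7121, 7122-10682 and 10683-14246. The last block is three bytes longer because it absorbs the remainder.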

Before downloading a file, you want to know about it: its file name, its size, whether concurrent downloading is supported, and its file type. All of these can be obtained from the response headers. To get them without downloading the file itself, use the HTTP HEAD method, which returns only the response headers and no body.

req, err := http.NewRequest("HEAD", get.Url, nil)
if err != nil {
	log.Fatal(err)
}
resp, err := get.GetClient.Do(req)
if err != nil {
	log.Fatal(err)
}

Next, extract the file type, the file name and the other parameters. Everything in HTTP, from the URL to the headers to the body, can be thought of as a string, and it really is one. Still, don't process the raw strings yourself when parsing them, or you will only make life unpleasant for yourself. Golang handles URL parsing with the net/url package and MIME parsing with the mime package. Both are in the standard library, and other languages surely offer the same.

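// resp.ContentLength is -1 when the server does not report a length.
get.ContentLength = int(resp.ContentLength)
// Content-Disposition uses the same "value; key=value" syntax as a media type,
// so mime.ParseMediaType splits it into "inline" and {"filename": ...}.
get.MediaType, get.MediaParams, _ = mime.ParseMediaType(get.Header.Get("Content-Disposition"))
log.Printf("Get %s MediaType:%s, Filename:%s, Length %d.\n", get.Url, get.MediaType, get.MediaParams["filename"], get.ContentLength)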
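The Content-Disposition header is optional, so a downloader needs a fallback for the file name. The minimal sketch below uses the net/url and mime packages mentioned above. fileName is a hypothetical helper. Falling back to the last path segment of the URL is my assumption, not something the original code does. The example.com URLs are placeholders.

```go
package main

import (
	"fmt"
	"mime"
	"net/url"
	"path"
)

// fileName prefers the filename parameter of Content-Disposition and falls
// back to the last segment of the URL path when the header is missing or
// unparsable. (Hypothetical helper for this sketch.)
func fileName(disposition, rawURL string) string {
	if _, params, err := mime.ParseMediaType(disposition); err == nil {
		if name := params["filename"]; name != "" {
			return name
		}
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	name := path.Base(u.Path)
	if name == "." || name == "/" {
		return "" // the URL has no usable file name
	}
	return name
}

func main() {
	fmt.Println(fileName(`inline; filename="Bryce.jpg"`, "http://example.com/x/y.jpg")) // Bryce.jpg
	fmt.Println(fileName("", "http://example.com/images/bryce.jpg?v=2"))               // bryce.jpg
}
```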

Output

2015/07/02 09:56:47 Get http://7b1h1l.com1.z0.glb.clouddn.com/bryce.jpg MediaType:inline, Filename:bryce.jpg, Length 14247.

If the response headers also include Accept-Ranges with a unit such as bytes (rather than none), the server supports range requests:

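// "Accept-Ranges: none" is non-empty but means ranges are NOT supported,
// so compare against the unit instead of checking for an empty string.
if get.Header.Get("Accept-Ranges") == "bytes" {
	log.Printf("Server %s support Range by %s.\n", get.Header.Get("Server"), get.Header.Get("Accept-Ranges"))
} else {
	log.Printf("Server %s doesn't support Range.\n", get.Header.Get("Server"))
}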

To download in blocks, create N temporary files. My naming rule is to append the block's byte range as a suffix, such as bryce.jpg.0-512, so there is no need to save a separate configuration file (mainly because I'm lazy). Each block is stored in its own temporary file, and the final file is assembled once all blocks have finished. Downloading a block just means adding a Range header to the request.

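range_i := fmt.Sprintf("%d-%d", get.DownloadRange[i][0], get.DownloadRange[i][1])
log.Printf("Download #%d bytes %s.\n", i, range_i)
defer get.TempFiles[i].Close()

req, err := http.NewRequest("GET", get.Url, nil)
if err != nil {
	log.Println(err)
	return
}
req.Header.Set("Range", "bytes="+range_i)
resp, err := get.GetClient.Do(req)
if err != nil {
	log.Println(err)
	return
}
// Defer Close only after the error check: resp is nil when Do fails.
defer resp.Body.Close()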

Finally, write the downloaded data to the temporary file. io.Copy streams the response body to disk through a small buffer as the data arrives, so a whole block is never held in memory.

cnt, err := io.Copy(get.TempFiles[i], resp.Body)

Multithreaded development

A concurrent download starts N goroutines, and the main goroutine must block until all N downloads have finished. My first idea was to write this myself with channels, but that would not be very elegant... The sync package provides WaitGroup, which solves exactly this problem.

The main goroutine starts the N goroutines. The Add method can be thought of as adding a task: it increments the task counter by one. The Wait method blocks until all tasks have completed, that is, until the counter drops back to zero.

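for i := range get.DownloadRange {
	get.WG.Add(1) // one task per block, registered before the goroutine starts
	go get.Download(i)
}
get.WG.Wait() // block until every Download has called Done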

Each download goroutine calls Done when it finishes. This reports that its task is complete and decrements the task counter by one.

defer get.WG.Done()

Resuming from a breakpoint

This part is the simplest. If a download was paused, the temporary file of a block holds less data than it should. The range suffix we added to the temporary file names in the first step now comes in handy. It tells us how large each block should be, so we can compare that with the block's actual size on disk. The file's Stat method returns an os.FileInfo with the file's attributes, including its size. We add that size to the start of the block's range as an offset, so the download skips the content that is already there.

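for i := 0; i < len(get.DownloadRange); i++ {
	// The file name keeps the block's original range, computed before any offset.
	range_i := fmt.Sprintf("%d-%d", get.DownloadRange[i][0], get.DownloadRange[i][1])
	// Write-only + append (O_RDONLY would make later writes fail);
	// O_CREATE creates the file on the first run.
	temp_file, err := os.OpenFile(get.FilePath+"."+range_i, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
	if err != nil {
		log.Fatal(err)
	}
	if fi, err := temp_file.Stat(); err == nil {
		// Skip the bytes already downloaded in an earlier run.
		get.DownloadRange[i][0] += int(fi.Size())
	}
	get.TempFiles = append(get.TempFiles, temp_file)
}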

That is roughly all there is to the principle. As mentioned earlier, this project cannot be used for practical purposes, for the following reasons:

    1. Concurrent downloading needs a pool of workers. The current design starts one goroutine per block, so concurrency becomes too high when downloading large files. A worker that has finished its block could also go on to download unfinished blocks, which involves allocating and splitting tasks dynamically;
    2. The block size for concurrent downloads also deserves care. 4096 bytes is a common unit of memory, and downloading in multiples of it saves memory. At present my block size is simply derived from the number of concurrent downloads, which is rather crude;
    3. As mentioned earlier, this is multi-threaded downloading in the narrow sense. It requires the server to support Range; otherwise the data cannot be fetched concurrently. In practice, download tools usually run at least a few download servers of their own, which implement their own block-access protocol to support concurrent downloading. That is how products on the market do it;
    4. A download progress bar, like the one wget shows in the console with the progress and download speed, is not something I know how to write. From what I found, it can reportedly be done with the \r carriage return, as in fmt.Println("abc\rcde"), because \r moves the cursor back to the beginning of the line. I have not tried it; feel free to experiment.

Please refer to the complete source code for this article.


Original link: Golang implementation of multi-threaded concurrent download. Please credit the source when reposting!
