Multi-threaded downloading via HTTP protocol

Source: Internet
Author: User
1. Basic principle: each thread downloads a different part of the file, and the parts are finally merged into the complete data.

2. The benefits of multithreaded downloading
The download is faster. Why? It is easy to understand. Previously I downloaded with a single thread, which means that on the server there was one thread serving my download.
Of course I am not the only user. While I download, the server is running many such threads at the same time, each serving a different client. A single CPU cannot truly execute all of them simultaneously.
Instead, the CPU divides its time fairly into slices and runs the threads in turn: a few milliseconds for thread A, a few milliseconds for thread B, and so on.
With multithreaded downloading, my application can (in theory) occupy any number of threads on the server at the same time.
If it uses 50 threads, the application receives roughly 50 times as much of the server's CPU attention as before.
However, the speed will always be limited by the bandwidth of the local network.

3. The length of data that each thread is responsible for can be computed by dividing the total length of the data by the number of threads taking part in the download. The case where the length is not evenly divisible must also be considered.
Whatever the number of threads (5, for instance), the calculation formula is:
int block = totalLength % threadCount == 0 ? totalLength / threadCount : totalLength / threadCount + 1;
(If not divisible, add one. For example, a length of 10 with 3 threads gives 10 / 3 + 1 = 4.)
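
The non-divisible case is easy to check with a few numbers. The following is a minimal sketch; the names `BlockSizeDemo`, `blockSize`, `totalLength`, and `threadCount` are chosen for illustration and do not come from the original code.

```java
class BlockSizeDemo {
    /** Bytes per thread, rounded up so that a trailing partial block is still covered. */
    static long blockSize(long totalLength, int threadCount) {
        return totalLength % threadCount == 0
                ? totalLength / threadCount
                : totalLength / threadCount + 1;
    }

    public static void main(String[] args) {
        System.out.println(blockSize(10, 3)); // not divisible: prints 4
        System.out.println(blockSize(10, 5)); // divisible: prints 2
    }
}
```

Rounding up means the last thread may be assigned fewer bytes than the others, but no byte of the file is left unassigned.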

4. This is similar to a paginated database query. Each thread needs to know where to start downloading and where to stop.
First, each thread is given an ID, starting from zero: 0, 1, 2, 3, ...
Start position: the thread ID multiplied by the length of data that each thread is responsible for.
End position: the position just before the start position of the next thread.
For example:
int startPosition = threadId * block;
int endPosition = (threadId + 1) * block - 1;
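
To see how the positions line up, here is a small sketch that prints the range of every thread for a 10-byte file split into blocks of 4. The class and method names are illustrative. Clamping the end with `Math.min` is an optional addition, not part of the original formula; without it the last range simply runs past the end of the file.

```java
class RangeDemo {
    static long start(int threadId, long block) {
        return threadId * block;
    }

    static long end(int threadId, long block) {
        return (threadId + 1) * block - 1; // inclusive end position
    }

    public static void main(String[] args) {
        long total = 10;   // total length in bytes (assumed for the example)
        long block = 4;    // bytes per thread, rounded up from 10 / 3
        int threads = 3;
        for (int id = 0; id < threads; id++) {
            long s = start(id, block);
            long e = Math.min(end(id, block), total - 1); // optional clamp to the last byte
            System.out.println("thread " + id + ": " + s + "-" + e);
        }
        // prints:
        // thread 0: 0-3
        // thread 1: 4-7
        // thread 2: 8-9
    }
}
```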

5. The Range header of the HTTP protocol specifies where in the file the download starts and where it ends. The unit is one byte.
Range: bytes=2097152-4194304 means: download from the 2 MB position of the file up to the 4 MB position (both positions are inclusive).
If Range asks for bytes up to position 5104389 but the file itself is only 4104389 bytes long, the download simply stops at the end of the file.
Therefore, no additional invalid data is downloaded.
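
With java.net.HttpURLConnection, the header can be set through setRequestProperty. The sketch below is illustrative only: `rangeValue` and `openRange` are hypothetical helper names, and `openRange` is not called in `main`, so running the example makes no network request.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class RangeRequestDemo {
    /** Builds the Range header value; both positions are inclusive byte offsets. */
    static String rangeValue(long start, long end) {
        return "bytes=" + start + "-" + end;
    }

    /** Prepares a GET request for one byte range. The request is sent only when the response is read. */
    static HttpURLConnection openRange(URL url, long start, long end) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Range", rangeValue(start, end));
        // A server that honors the range answers with status 206 (Partial Content).
        return conn;
    }

    public static void main(String[] args) {
        System.out.println(rangeValue(2097152, 4194304)); // prints bytes=2097152-4194304
    }
}
```

A server that does not support range requests may ignore the header and return the whole file with status 200, so checking the response code before writing is a sensible precaution.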

6. Another challenge is how to write the data to the local file in the right order. Because the threads run concurrently, they write to the local target file at the same time.
The order in which the threads write does not match the order of the data in the file being downloaded. If the data were written sequentially through an ordinary OutputStream, the resulting local file would be corrupted.
So we use the following class:
java.io.RandomAccessFile
This class implements both the DataOutput and DataInput interfaces, so it can both write and read.
It maintains a file pointer, so reading and writing can begin at any position in the file.
Instances of this class therefore support both reading from and writing to a random access file.

For example:
Java code
File file = new File("1.txt");
RandomAccessFile accessFile = new RandomAccessFile(file, "rwd");
accessFile.setLength(1024);
After this code runs, we have not yet written any data to the target file "1.txt". However, if you check its size at this point, it is already 1 KB. This is the size we set ourselves.
The operation is similar to storing a large byte array in the file. The array stretches the file to the specified size and waits to be filled.
The advantage is that we can access any part of the file by its offset.
For example, suppose the file size is 500 bytes.
My application may first need to write data from byte 300 to byte 350.
The second time, it writes data from byte 50 to byte 100.
In short, I do not have to write the file in one pass or in order.
RandomAccessFile supports exactly this kind of operation.
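
The 500-byte scenario can be sketched as follows. This is an illustration under assumptions: the helper names `fill` and `run` are made up, a temporary file stands in for the download target, and the letters 'A' and 'B' stand in for downloaded data.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

class OutOfOrderWriteDemo {
    /** Writes len copies of the character c, starting at byte offset pos. */
    static void fill(RandomAccessFile raf, long pos, int len, char c) throws IOException {
        byte[] data = new byte[len];
        Arrays.fill(data, (byte) c);
        raf.seek(pos);
        raf.write(data);
    }

    /** Pre-sizes the file to 500 bytes, writes two regions out of order, and returns the file contents. */
    static byte[] run(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(500);
            fill(raf, 300, 50, 'B'); // first write: bytes 300..349
            fill(raf, 50, 50, 'A');  // second write: bytes 50..99
            byte[] all = new byte[(int) raf.length()];
            raf.seek(0);
            raf.readFully(all);
            return all;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-demo", ".bin");
        tmp.deleteOnExit();
        byte[] all = run(tmp);
        System.out.println(all.length + " " + (char) all[50] + " " + (char) all[300]); // prints 500 A B
    }
}
```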

API
void setLength(long newLength)
Sets the length of this file. (Sets the expected size of the file.)
void seek(long pos)
Sets the file-pointer offset, measured from the beginning of this file, at which the next read or write occurs.
For example, if 1028 is passed in, the next write starts at byte offset 1028 of the file.
void write(byte[] b, int off, int len)
Writes len bytes from the specified byte array, starting at offset off, to this file.
void write(byte[] b)
Writes b.length bytes from the specified byte array to this file, starting at the current file pointer.
void writeUTF(String str)
Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.
String readLine()
Reads the next line of text from this file.
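
A short sketch of how seek, the two write variants, and readLine combine. The class name and the sample strings are invented for the example, and a temporary file is used so that nothing in the working directory is touched.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class RafApiDemo {
    static String run(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            byte[] src = "abcdef".getBytes(StandardCharsets.US_ASCII);
            raf.seek(2);
            raf.write(src, 1, 3); // writes "bcd" (3 bytes starting at src[1]) to offsets 2..4
            raf.seek(0);
            raf.write("xy".getBytes(StandardCharsets.US_ASCII)); // fills offsets 0..1
            raf.seek(0);
            return raf.readLine(); // reads up to a line terminator or the end of the file
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-api", ".txt");
        tmp.deleteOnExit();
        System.out.println(run(tmp)); // prints xybcd
    }
}
```

Note that readLine turns each byte directly into a character, so it is only reliable for single-byte text such as ASCII.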

Experimental code:


Java code
public static void main(String[] args) throws Exception {
    File file = new File("1.txt");
    RandomAccessFile accessFile = new RandomAccessFile(file, "rwd");
    /* Set the file size to 3 bytes */
    accessFile.setLength(3);
    /* Write '2' to the second position */
    accessFile.seek(1);
    accessFile.write("2".getBytes());
    /* Write '1' to the first position */
    accessFile.seek(0);
    accessFile.write("1".getBytes());
    /* Write '3' to the third position */
    accessFile.seek(2);
    accessFile.write("3".getBytes());
    accessFile.close();
    // Expected file contents: 123
}


The experiment succeeded. Although we wrote the strings in the order "2", "1", "3", the file offsets we set mean that the data finally saved in the file is: 123
Another question: the file size is 3 bytes, and after the three writes it is already full. What happens if we continue to write data to it?

/* Write data at the fourth byte position, beyond the file size */
accessFile.seek(3);
accessFile.write("400".getBytes());

In the code above, both the file-pointer offset passed to seek and the data being written go beyond the 3-byte size initially set for the file.
My guess was that "accessFile.seek(3)" at least would throw an ArrayIndexOutOfBoundsException, indicating that the index is out of bounds,
while executing "accessFile.write("400".getBytes())" on its own should succeed, because the requirement is reasonable and there should be a mechanism to support it.
The result of the experiment is that both statements succeed. It seems that the large byte array implied by the file can grow automatically.
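
The outcome of this experiment can be reproduced in a few lines. This is a sketch with an invented class name and a temporary file in place of "1.txt"; it reports the file length after writing past the original 3 bytes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class GrowDemo {
    static long run(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(3);
            raf.write("123".getBytes(StandardCharsets.US_ASCII)); // fills offsets 0..2
            raf.seek(3);                                          // one past the last byte: no exception
            raf.write("400".getBytes(StandardCharsets.US_ASCII)); // the file grows to hold the data
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-grow", ".bin");
        tmp.deleteOnExit();
        System.out.println(run(tmp)); // prints 6
    }
}
```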

However, note that you should make sure every position within the file size you set holds valid data; at the very least, no position should be left empty.
For example:
/* Write '3' to the third position */
accessFile.seek(2);
accessFile.write("3".getBytes());

accessFile.seek(5);
accessFile.write("400".getBytes());
Combined with the previous code, the final result is:
123??400
The two skipped positions show up as garbled characters (written here as ??). This is to be expected, since nothing was ever written there.

Likewise, suppose we set the file length to 100:
accessFile.setLength(100);
but in fact only write values to the first five positions. Then the file will end up with 95 garbled characters.
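
To look at what the skipped positions actually contain, the raw bytes can be read back. The sketch below uses an invented class name and a temporary file. The RandomAccessFile documentation leaves the contents of an extended region undefined; on common platforms the gap is usually filled with zero bytes, which a text editor displays as garbled characters, but that is an observation rather than a guarantee.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GapDemo {
    static byte[] run(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(3);
            raf.write("123".getBytes(StandardCharsets.US_ASCII)); // offsets 0..2
            raf.seek(5);                                          // skip offsets 3 and 4
            raf.write("400".getBytes(StandardCharsets.US_ASCII)); // offsets 5..7
            byte[] all = new byte[(int) raf.length()];
            raf.seek(0);
            raf.readFully(all);
            return all;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("raf-gap", ".bin");
        tmp.deleteOnExit();
        // Prints the eight byte values; entries 3 and 4 are the bytes that were never written.
        System.out.println(Arrays.toString(run(tmp)));
    }
}
```

In a downloader this is why every thread has to finish its block: any range that is never written leaves such filler bytes in the final file.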

7. That should be enough preparation. Now for the code.



Java code
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Multithreaded file download
 */
public class Multhreaddownload {
    /* Download URL */
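
The listing above breaks off after its first lines. As a stand-in, the following is a minimal sketch of the writing side only, assembled from the steps described in sections 3 to 6; it is not the author's code. All names are hypothetical, and a ByteArrayInputStream takes the place of the input stream that each thread would obtain from its own HttpURLConnection with a Range header, so the example runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class SegmentWriterDemo {
    /** Copies everything from in into file, starting at byte offset start. */
    static void writeSegment(File file, long start, InputStream in) throws IOException {
        // Each thread opens its own RandomAccessFile, so each has its own file pointer.
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(start);
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                raf.write(buf, 0, n);
            }
        }
    }

    /** Splits data into blocks, lets one thread per block write its part, and returns the file contents. */
    static byte[] run(byte[] data, int threadCount, File target) throws IOException, InterruptedException {
        int block = data.length % threadCount == 0
                ? data.length / threadCount
                : data.length / threadCount + 1;
        try (RandomAccessFile raf = new RandomAccessFile(target, "rw")) {
            raf.setLength(data.length); // reserve the full size up front
        }
        Thread[] workers = new Thread[threadCount];
        for (int id = 0; id < threadCount; id++) {
            final int start = Math.min(id * block, data.length);
            final int end = Math.min((id + 1) * block, data.length); // exclusive
            // Stand-in for the response body of a ranged HTTP request.
            final InputStream body = new ByteArrayInputStream(data, start, end - start);
            workers[id] = new Thread(() -> {
                try {
                    writeSegment(target, start, body);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[id].start();
        }
        for (Thread t : workers) {
            t.join(); // wait until every block has been written
        }
        return Files.readAllBytes(target.toPath());
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("segments", ".bin");
        tmp.deleteOnExit();
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(run(data, 3, tmp), StandardCharsets.US_ASCII)); // prints 0123456789
    }
}
```

Giving each thread its own RandomAccessFile instance avoids sharing a single file pointer between threads, which would otherwise require synchronizing every seek and write pair.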
