ref:http://blog.csdn.net/zhuhuiby/article/details/6725951
I recently studied how file downloading works and decided it was worth writing down. At first I only wanted to learn about it, but I later found it necessary to write a more reusable module, which I think is a habit most developers share.
With HTTP, when you request a file from a server, you can send a request similar to the following:
GET /path/filename HTTP/1.0
Host: www.server.com:80
Accept: */*
User-Agent: GeneralDownloadApplication
Connection: close
Each line ends with a carriage return and line feed (CRLF, "\r\n"), and one extra CRLF is appended to the end of the entire request.
GET on the first line is one of the methods defined by HTTP; method names are case-sensitive. HTTP also defines OPTIONS, HEAD, POST, PUT, DELETE, TRACE, CONNECT, and so on. GET and HEAD are generally considered "safe" methods, and any general-purpose server that implements HTTP is expected to support both. For downloading files, GET is enough. GET is followed by a space and then the absolute path of the file to download, starting from the web server's root. The path is followed by another space and then the protocol name and version.
Apart from the first line, the remaining lines are HTTP header fields. The Host field gives the host name and port number; the port may be omitted, in which case the default, 80, is used. The */* in the Accept field means that any type of data is accepted. User-Agent identifies the client; it is optional but strongly recommended, because servers rely on it for statistics, tracking, and client identification. The close value in the Connection field requests a non-persistent connection.
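To make the request format concrete, here is a minimal Java sketch that assembles the same request as a string. The class name HttpRequestBuilder and its method are hypothetical names of mine, not part of any library or of the download program discussed later, and the sketch assumes plain HTTP/1.0 with exactly the four header fields shown above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class HttpRequestBuilder {
    private static final String CRLF = "\r\n";

    // Builds a GET request; the port is left out of Host when it is the default 80.
    static String buildGetRequest(String host, int port, String path, String userAgent) {
        String hostValue = (port == 80) ? host : host + ":" + port;
        return "GET " + path + " HTTP/1.0" + CRLF
                + "Host: " + hostValue + CRLF
                + "Accept: */*" + CRLF
                + "User-Agent: " + userAgent + CRLF
                + "Connection: close" + CRLF
                + CRLF; // the extra CRLF that ends the whole request
    }

    public static void main(String[] args) {
        String request = buildGetRequest(
                "www.server.com", 80, "/path/filename", "GeneralDownloadApplication");
        // These are the bytes that would be written to the socket.
        byte[] wire = request.getBytes(StandardCharsets.US_ASCII);
        System.out.print(request);
        System.out.println("Request size in bytes: " + wire.length);
    }
}
```

Keeping the CRLF in one constant makes it harder to forget the final blank line, which is the most common mistake when writing requests by hand.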
More details about HTTP can be found in RFC 2616 (HTTP/1.1). Since I only wanted to implement file download over HTTP, I read only the relevant parts rather than the whole document.
If the server receives the request and no error occurs, it returns data similar to the following:
HTTP/1.0 200 OK
Content-Length: 13057672
Content-Type: application/octet-stream
Last-Modified: Wed, Oct 00:56:34 GMT
Accept-Ranges: bytes
ETag: "2f38a6cac7cec51:160c"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Wed, Nov 01:57:54 GMT
Connection: close
Most of this is self-explanatory, so I will only cover the parts we care about.
The first line starts with the protocol name and version. After a space comes a three-digit number, the HTTP response status code; 200 means success, and OK is a short text description of the status code. There are five classes of status codes:
1xx: informational;
2xx: success;
3xx: redirection;
4xx: client error;
5xx: server error.
As for status codes, you are probably familiar with 404: if you request a file that does not exist on the server, you get this error, and the browser usually shows something like "HTTP 404 - File not found". The Content-Length field is the more important one here. It gives the length of the data the server returns, not counting the HTTP header. Because our request has no Range field (discussed later), we are asking for the entire file, so Content-Length is the size of the whole file. The remaining fields carry information about the file and the server.
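The status line is simple enough to parse by splitting on spaces. The following is a rough Java sketch of that idea; StatusLine is a hypothetical class name of my own, and the category names are just labels for the five classes listed above.

```java
// Hypothetical value class for illustration only.
final class StatusLine {
    final String version; // e.g. "HTTP/1.0"
    final int code;       // e.g. 200
    final String reason;  // e.g. "OK" (may be empty)

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // Splits "HTTP/1.0 200 OK" into version, status code, and reason text.
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = (parts.length == 3) ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }

    // Maps the first digit of the status code to its class.
    String category() {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine status = StatusLine.parse("HTTP/1.0 200 OK");
        System.out.println(status.code + " is a " + status.category() + " status");
    }
}
```

The limit of 3 in split keeps a multi-word reason such as "Not Found" in one piece.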
The response header also ends with a blank line: the CRLF that ends the last header line is followed by one more CRLF, giving "\r\n\r\n". The file content follows immediately after "\r\n\r\n", so we can search for "\r\n\r\n", start at the first byte after it, keep reading the stream, and write what we read to the file.
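Finding where the body starts can be sketched in Java as a plain byte scan. HeaderSplitter and bodyOffset are hypothetical names, and the sketch assumes the response bytes read so far sit in one array; a real downloader would call it again as more data arrives until the terminator shows up.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class HeaderSplitter {
    // Returns the index of the first body byte within data[0..length),
    // or -1 if the "\r\n\r\n" terminator has not been received yet.
    static int bodyOffset(byte[] data, int length) {
        for (int i = 0; i + 3 < length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n'
                    && data[i + 2] == '\r' && data[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] response = "HTTP/1.0 200 OK\r\nContent-Length: 3\r\n\r\nabc"
                .getBytes(StandardCharsets.US_ASCII);
        int offset = bodyOffset(response, response.length);
        String header = new String(response, 0, offset, StandardCharsets.US_ASCII);
        String body = new String(response, offset, response.length - offset,
                StandardCharsets.US_ASCII);
        System.out.print(header);
        System.out.println("Body: " + body);
    }
}
```

Scanning bytes rather than converting to a string first matters here, because the body is binary file data and must not be decoded as text.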
That is the whole process of downloading a file over HTTP. What it cannot do yet is resume an interrupted download (breakpoint continuation), but adding that is actually very simple: just add a Range field to the request.
If a file has 1000 bytes, its byte positions run from 0 to 999, so:
Range: bytes=500- requests bytes 500-999 of the file, 500 bytes in total.
Range: bytes=500-599 requests bytes 500-599 of the file, 100 bytes in total.
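The two forms above can be produced with a couple of one-line helpers. This is only a sketch: RangeHeader and its methods are hypothetical names, and both ends of a range are inclusive, which is why the length is end - start + 1.

```java
// Hypothetical helper for illustration only.
final class RangeHeader {
    // Open-ended form: from 'start' to the end of the file, e.g. "bytes=500-".
    static String from(long start) {
        if (start < 0) {
            throw new IllegalArgumentException("start must not be negative");
        }
        return "bytes=" + start + "-";
    }

    // Closed form with both ends inclusive, e.g. "bytes=500-599".
    static String between(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range " + start + "-" + end);
        }
        return "bytes=" + start + "-" + end;
    }

    // Number of bytes covered by the inclusive range start..end.
    static long length(long start, long end) {
        return end - start + 1;
    }

    public static void main(String[] args) {
        System.out.println("Range: " + from(500));
        System.out.println("Range: " + between(500, 599)
                + " covers " + length(500, 599) + " bytes");
    }
}
```

With java.net.URLConnection, the value would be set before connecting, for example con.setRequestProperty("Range", RangeHeader.from(500)).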
Range can be written in several other ways, but these two are the most common and are enough for resuming downloads. If the HTTP request contains a Range field, a server that supports range requests returns 206 (Partial Content) and includes a corresponding Content-Range field in the HTTP header, in a format similar to the following:
Content-Range: bytes 500-999/1000
The Content-Range field gives the range of the file that the server returned and the total length of the file. In this case Content-Length is no longer the size of the whole file but the number of bytes in the returned range; this must be kept in mind.
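Reading the Content-Range value back is a small parsing job. The Java sketch below handles only the "bytes first-last/total" form shown above; ContentRange is a hypothetical class name, and forms where the total is given as "*" are deliberately not covered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for illustration only.
final class ContentRange {
    private static final Pattern FORMAT =
            Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+)");

    final long first; // first byte position returned (inclusive)
    final long last;  // last byte position returned (inclusive)
    final long total; // total length of the file

    private ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    // Parses a value such as "bytes 500-999/1000".
    static ContentRange parse(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + value);
        }
        return new ContentRange(Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)));
    }

    // Bytes in the returned range; this should equal Content-Length in a 206 response.
    long length() {
        return last - first + 1;
    }

    public static void main(String[] args) {
        ContentRange range = ContentRange.parse("bytes 500-999/1000");
        System.out.println(range.length() + " of " + range.total + " bytes returned");
    }
}
```

Comparing length() with the Content-Length of the response is a cheap sanity check before writing the data to disk.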
At this point it seemed that nothing was missing, and I thought so too, but that was not the case. If the URL of the file is a direct one like http://www.server.com/filename.exe, there is no problem. However, on many software download sites the download links go through a redirecting program. For example, PChome's HTTP download address for ACDSee is:
http://download.pchome.net/php/tdownload2.php?sid=5547&url=/multimedia/viewer/acdc31sr1b051007.exe&svr=1&typ=0
This address does not point directly at the file; a program redirects it. If you request such a URL, the server returns 302 (Moved Temporarily), meaning a redirect is required, and the HTTP header contains a Location field whose value is the destination URL. You then need to close the current connection and send a new request to the server named in the redirect.
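Working out the next URL after a redirect can be sketched in Java with java.net.URI. RedirectResolver is a hypothetical helper and the example.com addresses are made up for the demonstration. The sketch accepts a relative Location value as well as an absolute one, and it treats 301, 303, and 307 as redirects in addition to the 302 discussed above, which is an assumption of mine.

```java
import java.net.URI;

// Hypothetical helper for illustration only.
final class RedirectResolver {
    // Status codes treated as "follow the Location header" in this sketch.
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302
                || statusCode == 303 || statusCode == 307;
    }

    // Resolves the Location value against the URL that was originally requested.
    // An absolute Location replaces the URL; a relative one keeps the host.
    static URI resolve(URI requested, String location) {
        return requested.resolve(location.trim());
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://download.example.com/php/get.php?id=1");
        if (isRedirect(302)) {
            URI next = resolve(requested, "http://mirror.example.com/files/a.exe");
            System.out.println("Reconnect to " + next.getHost()
                    + " and request " + next.getPath());
        }
    }
}
```

A real downloader should also cap the number of redirects it follows, so that two servers redirecting to each other cannot keep it looping forever.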
Well, that is basically how it works. In fact, with a sniffer and some careful analysis it is easy to work out. NetAnts also helped me a bit; its file download log is very useful for developers.
Annegu wrote a simple HTTP multithreaded download program to discuss multithreaded concurrent downloading and resuming from breakpoints.
The program downloads data from a target address with multiple threads, each responsible for one part, and it supports resuming from breakpoints and connection timeouts.
The download method is download(), which takes two parameters: the URL of the page to download and its encoding. This method has three main steps. The first sets up the information needed for resuming from breakpoints, the second is the main multithreaded download, and the last merges the data.
1. Multithreaded download:

    public String download(String urlStr, String charset) {
        this.charset = charset;
        long contentLength = 0;
        CountDownLatch latch = new CountDownLatch(threadNum); // ①
        long[] startPos = new long[threadNum];
        long endPos = 0;

        try {
            // Get the name (and format) of the file to download from the URL
            this.fileName = urlStr.substring(urlStr.lastIndexOf("/") + 1,
                    urlStr.lastIndexOf("?") > 0 ? urlStr.lastIndexOf("?") : urlStr.length());
            if ("".equalsIgnoreCase(this.fileName)) {
                this.fileName = UUID.randomUUID().toString();
            }

            this.url = new URL(urlStr);
            URLConnection con = url.openConnection();
            setHeader(con);
            // Get the content length
            contentLength = con.getContentLength();
            // Divide the content into threadNum segments; this is the length of each segment
            this.threadLength = contentLength / threadNum;

            // Step 1: analyze the temporary files that have already been downloaded and set
            // the breakpoints; if this is a new download task, create the destination file.
            // Explained in section 4.
            startPos = setThreadBreakpoint(fileDir, fileName, contentLength, startPos);

            // Step 2: download the file with multiple threads
            ExecutorService exec = Executors.newCachedThreadPool();
            for (int i = 0; i < threadNum; i++) {
                // Create a child thread to download the data. The starting position of
                // each segment is (threadLength * i + length already downloaded)
                startPos[i] += threadLength * i;

                /* Set the end position for the child thread. For every thread except the
                   last it is (threadLength * (i + 1) - 1); for the last thread it is the
                   length of the content being downloaded. */
                if (i == threadNum - 1) {
                    endPos = contentLength;
                } else {
                    endPos = threadLength * (i + 1) - 1;
                }