Unveiling resumable transfer (resumable download)

1. Introduction

This article walks through resumable transfer during an HTTP download, step by step. The main steps are: DNS lookup, TCP three-way handshake, sending the HTTP request, TCP data transfer, pausing, resuming the download, TCP three-way handshake again, sending the HTTP request again, data transfer, and so on, until the download completes with the HTTP response message and the TCP connection is closed.

2. Principles

2.1 Q & A

Q: What is resumable transfer, and how does it work?

A: Resumable transfer means that after a transfer (usually a download or an upload) is interrupted, for example by a dropped connection or a shutdown, it can continue next time from where it stopped. Without resumable transfer, the next download or upload has to start again from the beginning. Resumable download is built on two HTTP headers, Range and Content-Range. Range is a request header that gives the position of the first byte and, optionally, the last byte wanted, for example Range: bytes=200-300 or Range: bytes=200-. Content-Range is the matching response header. As a simple example, suppose the file is 10 bytes long and the download is interrupted after 3 bytes. When the download resumes, the file pointer is moved to offset 3 and the transfer continues from there until the whole file has been downloaded.
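The 10-byte example can be worked through in a few lines of PHP. This is only a sketch: planResume is a hypothetical helper name, not part of the class shown later, and it assumes the client knows the total size and how many bytes it has already saved.

```php
<?php
// Hypothetical helper (not from the article's class): given the total size and
// the number of bytes already saved, describe the request that resumes the download.
function planResume(int $total, int $received): array
{
    return [
        // Offsets are zero-based, so after N bytes the next byte is offset N.
        'header'    => sprintf('Range: bytes=%d-', $received),
        'remaining' => $total - $received,
    ];
}

$plan = planResume(10, 3);
echo $plan['header'], "\n";     // Range: bytes=3-
echo $plan['remaining'], "\n";  // 7
```

The open-ended form (bytes=3-) is used because the client only needs "everything from here on" and does not have to compute the last offset itself.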

2.2 Simple HTTP file download

  Request to download the entire file:
GET /test.rar HTTP/1.1
Connection: close
Host: 192.168.95.11
Range: bytes=0-800 // a request for the whole object normally uses bytes=0- or omits this header
  Normal response:
HTTP/1.1 200 OK // a server that honors the Range header replies 206 Partial Content instead
Content-Length: 801
Content-Type: application/octet-stream
Content-Range: bytes 0-800/801 // 801: total file size; this header normally accompanies a 206 response

2.3 Important headers

Response headers:

Content-Type: tells the browser the MIME type of the response body. This is a very important response header. There are many MIME types, and a program cannot realistically list them all, so a download script can simply send Content-Type: application/octet-stream (a byte stream) for every file.

Content-Disposition: an extension from the MIME protocol that tells the user agent how to present an attached file. When the browser (Internet Explorer, for example) receives this header, it opens the file download dialog with the file name box pre-filled with the name given in the header. This is the header that triggers the download prompt: Content-Disposition: attachment; filename=name

Content-Length: "Content-Length: 321" tells the browser that the body is 321 bytes long. In my tests the browser could still handle the download when this header was not set.

Pragma, Cache-Control: set these two headers to public to tell the browser that the response may be cached. I usually set Cache-Control: public.

Content-Range: indicates which range of the file the server is returning, together with the total length of the file. In this case Content-Length is no longer the size of the whole file but the number of bytes in the returned range; pay attention to this point. General format: Content-Range: bytes 500-999/1000
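To make the relationship between Content-Range and Content-Length concrete, here is a small sketch that splits a Content-Range value and derives the length the body should have. parseContentRange is a hypothetical name used only for illustration, and the sketch deliberately handles only the "bytes start-end/total" form described above.

```php
<?php
// Hypothetical helper: split a Content-Range value such as "bytes 500-999/1000"
// into its parts. Returns null when the value does not have that shape
// (for example the "bytes */1000" form, or an unknown total written as "*").
function parseContentRange(string $value): ?array
{
    if (!preg_match('/^bytes (\d+)-(\d+)\/(\d+)$/', trim($value), $m)) {
        return null;
    }
    $start = (int) $m[1];
    $end   = (int) $m[2];
    $total = (int) $m[3];
    if ($start > $end || $end >= $total) {
        return null; // inconsistent range
    }
    return [
        'start'  => $start,
        'end'    => $end,
        'total'  => $total,
        'length' => $end - $start + 1, // both ends are inclusive
    ];
}

$r = parseContentRange('bytes 500-999/1000');
echo $r['length'], "\n"; // 500
```

Because both offsets are inclusive, the range 500-999 covers 500 bytes, which is the value Content-Length carries in a partial response.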

Request header:

Range: requests one or more sub-ranges of the object.

For example:
The first 500 bytes: bytes=0-499
The second 500 bytes: bytes=500-999
The last 500 bytes: bytes=-500
Everything from byte 500 onward: bytes=500- [resumable downloads generally use this form]
The first and the last byte: bytes=0-0,-1
Several ranges at once: bytes=500-600,601-999
The server is free to ignore this request header. If an unconditional GET carries a Range header and the server honors it, the response has status code 206 (Partial Content) instead of 200 (OK). [206 means the server has fulfilled a partial GET request, which is what a resumed download is.]
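The range forms listed above can be resolved against a file size as follows. This is an illustrative sketch, not the parser used by the class in the next section: resolveRanges is a hypothetical name, and returning null for a malformed or unsatisfiable header is an assumption made for the example (a real server might answer 416 or fall back to a full 200 response).

```php
<?php
// Hypothetical helper: resolve a Range header value against a file size.
// Returns a list of [start, end] pairs (inclusive offsets), or null when the
// header is malformed or no range can be satisfied.
function resolveRanges(string $header, int $fileSize): ?array
{
    if (!preg_match('/^bytes=(.+)$/', trim($header), $m)) {
        return null;
    }
    $result = [];
    foreach (explode(',', $m[1]) as $spec) {
        if (!preg_match('/^\s*(\d*)-(\d*)\s*$/', $spec, $p) || ($p[1] === '' && $p[2] === '')) {
            return null; // malformed spec
        }
        if ($p[1] === '') {
            // Suffix form "-500": the last 500 bytes.
            $start = max(0, $fileSize - (int) $p[2]);
            $end   = $fileSize - 1;
            if ((int) $p[2] === 0) {
                continue; // "-0" asks for nothing
            }
        } else {
            $start = (int) $p[1];
            if ($p[2] !== '' && (int) $p[2] < $start) {
                return null; // end before start is malformed
            }
            // Open-ended form "500-" runs to the last byte; an end past the
            // file is clamped to the last byte.
            $end = ($p[2] === '') ? $fileSize - 1 : min((int) $p[2], $fileSize - 1);
        }
        if ($start <= $end) {
            $result[] = [$start, $end];
        }
    }
    return $result === [] ? null : $result;
}

print_r(resolveRanges('bytes=0-0,-1', 1000)); // first and last byte
```

For a 1000-byte file, bytes=500- and bytes=-500 resolve to the same pair, [500, 999], which is why browsers can resume with the simpler open-ended form.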

3. A file download class that supports resuming

The class is commented throughout.

FileDownload.class.php

<?php
# File download (resume supported)
class FileDownload
{
    # download speed: KB sent per read
    private $_speed = 512;

    /**
     * @desc download a file
     *
     * @param $file   string path of the file to download
     * @param $name   string file name offered when saving; if empty, the original file name is used
     * @param $reload bool   whether to honor Range requests (resumable download)
     */
    public function download($file, $name = '', $reload = false)
    {
        if (file_exists($file)) # check that the file exists
        {
            if ($name == '') # no name given
            {
                $name = basename($file); # save under the original file name
            }
            $fHandle = fopen($file, 'rb'); # open read-only; the b flag keeps it portable (systems differ in line endings)
            $fileSize = filesize($file); # file size
            $ranges = $this->getRange($fileSize); # for a resumed download, read the requested range first
            header('Cache-Control: public');
            header('Content-Type: application/octet-stream'); # type of the response body (byte stream; browsers download it by default)
            header('Content-Disposition: attachment; filename="' . $name . '"'); # do not open the file; make the browser show the download dialog
            # resume only when it is enabled and the request carries a Range header
            # (null means this is the first request)
            if ($reload && $ranges !== null)
            {
                header('HTTP/1.1 206 Partial Content'); # 206 is the status code for a partial (resumed) response
                header('Accept-Ranges: bytes'); # the server supports Range requests; the unit is bytes
                # remaining length (both ends of the range are inclusive)
                $length = $ranges['end'] - $ranges['start'] + 1;
                header(sprintf('Content-Length: %u', $length));
                # range information
                header(sprintf('Content-Range: bytes %s-%s/%s', $ranges['start'], $ranges['end'], $fileSize));
                # move the file pointer to the breakpoint
                fseek($fHandle, $ranges['start']);
            }
            else
            {
                header('HTTP/1.1 200 OK');
                header('Content-Length: ' . $fileSize);
                $length = $fileSize;
            }
            while ($length > 0 && !feof($fHandle))
            {
                $chunk = fread($fHandle, min($length, (int) round($this->_speed * 1024)));
                if ($chunk === false || $chunk === '')
                {
                    break; # read error or nothing left
                }
                $length -= strlen($chunk);
                echo $chunk;
                if (ob_get_level() > 0)
                {
                    ob_flush(); # release data from the PHP output buffer
                }
                flush();
                // sleep(2); // for testing: slows the download down
            }
            fclose($fHandle);
        }
        else
        {
            # no such file
            header('HTTP/1.1 404 Not Found');
            return false;
        }
    }

    /**
     * @desc get the range information from the request header
     *
     * @param $fileSize int file size
     *
     * @return array|null the range information, or null
     */
    public function getRange($fileSize)
    {
        if (isset($_SERVER['HTTP_RANGE']) && !empty($_SERVER['HTTP_RANGE']))
        {
            # request header, e.g. Range: bytes=41078-\r\n
            $range = $_SERVER['HTTP_RANGE'];
            $range = preg_replace('/[\s,].*/', '', $range); # keep only the first range
            $range = explode('-', substr($range, 6)); # strip "bytes=" and split "41078-" into an array
            # a resume request has the form 4444-, so the array has exactly two elements
            if (count($range) != 2)
            {
                return null;
            }
            $range = array_combine(array('start', 'end'), $range);
            if ($range['start'] === '' && $range['end'] !== '')
            {
                # suffix form -500: the last 500 bytes
                $range['start'] = max(0, $fileSize - (int) $range['end']);
                $range['end'] = $fileSize - 1;
            }
            else
            {
                $range['start'] = (int) $range['start'];
                # open-ended form 500-: up to the last byte of the file
                $range['end'] = ($range['end'] === '') ? $fileSize - 1 : min((int) $range['end'], $fileSize - 1);
            }
            if ($range['start'] > $range['end'])
            {
                return null; # cannot be satisfied; fall back to a full download
            }
            return $range;
        }
        return null; # first request: no range information
    }

    /**
     * @desc set the file download speed
     *
     * @param $speed int download speed
     */
    public function setSpeed($speed)
    {
        if (is_numeric($speed) && $speed > 16 && $speed < 4096)
        {
            $this->_speed = $speed;
        }
    }
}
?>
4. Test and analysis

4.1. Prerequisites
  • Change the download speed on line 6 of the class file above (the $_speed property) to 10
  • Uncomment the sleep(2) call in the read loop of the class file to slow the download down
  • Download and test using Firefox
  • Use the Wireshark packet capture tool to capture and analyze packets
  • test.php file:
<?php
include 'FileDownload.class.php';
$a = new FileDownload();
# resume not supported
# $b = $a->download('./aa.txt', 'bb.txt');
# resume supported
# $b = $a->download('./aa.txt', 'bb.txt', true);
?>

Start test:

4.2. Test with resumable download enabled

Procedure:

1. Enable the packet capture tool for monitoring

2. Enter the test URL in Firefox and press Enter to start the download.

3. Confirm the download

4. Pause twice along the way, resuming each time; the download completes successfully.

Download successful!

Packet capture analysis:

1. Press Enter. The first step is the DNS lookup. I will not cover it here; see http://www.cnblogs.com/phpstudy2015-6/p/6810130.html#_label18

2. Once the browser has the IP address for the domain name, it opens a TCP connection to port 80 of the server. See packet capture figure 1 below: rows 1 to 3 are the TCP connection setup, the TCP three-way handshake. For details, see my earlier article: http://www.cnblogs.com/phpstudy2015-6/p/6810130.html#_label2

Packet Capture-1

3. Once the TCP connection is established, the browser sends the HTTP request, which is row 4 in packet capture figure 1: the HTTP GET request. This first request does not carry a Range header.

HTTP request figure

4. TCP data transfer starts after the HTTP request. In packet capture figure 1, the rows after the request show data being sent in order at the TCP layer. (The Web host 192.168.95.11 sends two segments in a row to the browser at 192.168.95.10, and the browser then sends one acknowledgment back, telling the Web host that the data arrived complete and correct and that it can keep sending.)

5. Now pause the download. See packet capture figure 2 below. Pausing the download drops the browser's connection to the Web server. Because the disconnection is abrupt, the Web host does not yet know the browser is gone and keeps sending data (rows 73-76); it receives no acknowledgment from the browser and eventually stops sending.

No HTTP response row appears in the capture for this request. The capture lists the HTTP response only once the whole message has arrived (Wireshark reassembles it from its TCP segments and shows it at the last one), and here the browser dropped the connection before the data was complete, so the response never finished.

Packet Capture-2

6. Continue the download. See packet capture figure 3.

Clicking "Continue download" sends a new HTTP request to the server.

Rows 77-79 are the TCP connection (three-way handshake).

Row 80 is the HTTP request.

Look at the HTTP request below. This time it carries the Range request header, which is the key mechanism here. When the download is paused, the browser remembers how many bytes it has received; when the download resumes, it adds this header while building the HTTP request. This is a precondition for resumable download.

The browser sends the Range header to the Web server, and our code has to handle it: extract the byte offset, move the file pointer to that position, then read the file from there and continue sending. [This is the key logic of resumable transfer.]
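The seek-then-read step can be tried without a network at all. The sketch below uses in-memory streams in place of the real file and the browser's partial download; sendFromOffset is a hypothetical name and the 8192-byte chunk size is an arbitrary choice for the example.

```php
<?php
// Hypothetical helper: copy $source into $dest starting at byte offset $start,
// which is what the server does after reading "Range: bytes=<start>-".
// Both arguments are stream resources; returns the number of bytes copied.
function sendFromOffset($source, $dest, int $start): int
{
    fseek($source, $start);             // move the file pointer to the breakpoint
    $sent = 0;
    while (!feof($source)) {
        $chunk = fread($source, 8192);  // read in chunks rather than all at once
        if ($chunk === false || $chunk === '') {
            break;
        }
        $sent += fwrite($dest, $chunk);
    }
    return $sent;
}

// Simulate a 10-byte file whose first 3 bytes were already downloaded.
$file = fopen('php://memory', 'w+b');
fwrite($file, '0123456789');

$partial = fopen('php://memory', 'w+b');
fwrite($partial, '012');                // what the browser saved before the pause

$copied = sendFromOffset($file, $partial, 3); // resume from offset 3
rewind($partial);
$result = stream_get_contents($partial);
echo $copied, ' ', $result, "\n";       // 7 0123456789
```

The saved bytes plus the resumed bytes reproduce the original content only because the offset the server seeks to equals the number of bytes the client already holds.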

Packet Capture-3

7. Pause again, then continue, and compare the captures. The download was paused twice; the two red cross lines at the far right of packet capture figure 1 mark the pauses.

8. The download completes. With the last segment received, the capture shows the Web server's HTTP response to the browser.

The response appears in row 350.

See the HTTP response figure below. The response status code is 206.

The red lines mark the response headers set by our code.

 Packet Capture-4

Http Response Diagram

9. The TCP four-way handshake closes the connection between the client and the server. See packet capture figure 4 above.

Rows 352-354 are the TCP disconnection. Why do only three packets appear for a four-way handshake?

For details about TCP disconnection, refer to my previous article.

First, the browser sends a FIN (it wants to disconnect) together with an ACK (acknowledging the sequence number), with seq = 361.

The second and third steps are combined: the Web server receives the browser's packet and replies with a FIN (it also wants to disconnect) and an ACK, with seq = 174554 and ack = 362. [The browser's seq of 361, plus 1, becomes ack = 362 and is sent back to the browser to say "received". The same packet carries seq = 174554 and tells the browser that the server also wants to close the connection.]

Fourth, the browser replies to the Web server with ack = 174555. [The server's seq of 174554, plus 1, becomes ack = 174555 and is sent to the Web host to say "received".]

TCP teardown is always described as a four-way handshake. I think that describes four logical steps; in the packet capture, the second and third steps are merged into a single packet.
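The acknowledgment numbers in the teardown follow one rule: a FIN occupies one sequence number, so the peer acknowledges it with seq + 1. A tiny worked example using the numbers from the capture (ackForFin is a hypothetical name for illustration):

```php
<?php
// A FIN occupies one sequence number, so the peer acknowledges it with seq + 1.
function ackForFin(int $finSeq): int
{
    return $finSeq + 1;
}

// Sequence numbers taken from the capture described above.
$browserFinSeq = 361;
$serverFinSeq  = 174554;

printf("server acknowledges the browser's FIN with ack=%d\n", ackForFin($browserFinSeq)); // 362
printf("browser acknowledges the server's FIN with ack=%d\n", ackForFin($serverFinSeq));  // 174555
```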

4.3. Test with resumable download disabled

Procedure:

1. Enable the packet capture tool for monitoring

2. Enter the test URL in Firefox and press Enter to start the download.

3. Pause the download

4. Continue the download. The download fails! Why? The analysis follows.

Packet Capture analysis:

1. The TCP connection and the HTTP GET request show nothing abnormal.

2. The capture shows nothing abnormal before the disconnection.

3. Capture analysis after continuing the download

The TCP connection is normal.

The HTTP request itself looks normal, but it does not match how our program behaves. The request carries the Range header, asking only for the bytes after the breakpoint. With resume disabled, our program ignores Range and sends the whole file from the beginning on every request, so the data the server sends does not line up with what the browser already has, and the download fails.
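The mismatch can be expressed as a check a download client might run on the reply to a Range request. This is one plausible client policy written for illustration, not a description of what Firefox does internally; resumeAction and its return values are hypothetical.

```php
<?php
// Hypothetical client-side check: after sending "Range: bytes=<offset>-",
// decide what to do with the partial file based on the server's reply.
function resumeAction(int $offset, int $status, ?string $contentRange): string
{
    if ($status === 206 && $contentRange !== null
        && preg_match('/^bytes (\d+)-\d+\//', $contentRange, $m)
        && (int) $m[1] === $offset) {
        return 'append';   // body continues exactly where the partial file ends
    }
    if ($status === 200) {
        return 'restart';  // server ignored Range and is sending the whole file
    }
    return 'fail';         // anything else cannot be combined with the saved bytes
}

echo resumeAction(3, 206, 'bytes 3-9/10'), "\n"; // append
echo resumeAction(3, 200, null), "\n";           // restart
```

With resume disabled, the class above answers 200 with the complete file, so appending that body to the bytes already saved would corrupt the result; a client has to either start over or report a failure.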

5. Summary

Having studied the OSI network model, the TCP/IP model, TCP transmission, the HTTP protocol, DNS lookup, and the details of visiting a URL over HTTP, and finally this application of HTTP, resumable transfer, I have learned a great deal. The above is my understanding of resumable transfer together with the relevant tests. If something is wrong, I hope you will point it out so that I can correct it.

 

(The above are my own views. If you find any shortcomings or errors, please point them out.)

Author: The leaf with the wind http://www.cnblogs.com/phpstudy2015-6/

Address: http://www.cnblogs.com/phpstudy2015-6/p/6821478.html

Disclaimer: This blog post is original and represents only the views or conclusions I reached at a particular time in my work and study. When reprinting, please give a clear link to the original article on the article page.

 
