Principle of resumable Data Transfer

Source: Internet
Author: User
Resumable data transfer is the core of large file data transmission. This article describes how to implement resumable data transfer for large files based on multi-thread and Socket technologies.

Basic Implementation ideas

The basic idea of multi-threaded resumable data transfer is that the sending end (also called the client) divides the file to be transmitted into multiple blocks of roughly equal size and uses multiple threads to send these blocks to the target server at the same time. The service program on the server listens for data transmission requests. Each time a new request arrives, it creates a new thread that pairs with the corresponding sending thread on the client, receives the data, and records the transfer progress.

Figure 1 shows the point-to-point resumable transfer process for the nth block of a file. On the transmission initiator (the client), a large file is divided in advance into N equal-sized blocks, and N transmission threads are created, each of which connects to the target server. After accepting each connection request, the server notifies the client that it may transmit. On receiving that notification, the client first asks the server for the data transmission information block (which records which block is being sent and the starting position within that block), and the server sends this information back. The client then transmits the data that the information block specifies, and the server updates the information block as the data arrives.
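The division into N blocks described above can be sketched as follows. This is a minimal illustration, not the article's own code: the class name BlockSplitter, the method split, and the convention that each block is a {start, end} pair with an exclusive end are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name): splits a file of known length into N contiguous blocks. */
class BlockSplitter {

    /** Returns blockCount pairs {start, end}; start is inclusive and end is exclusive. */
    static List<long[]> split(long fileLength, int blockCount) {
        if (fileLength < 0 || blockCount <= 0) {
            throw new IllegalArgumentException("invalid length or block count");
        }
        List<long[]> blocks = new ArrayList<>();
        long blockSize = fileLength / blockCount;
        for (int i = 0; i < blockCount; i++) {
            long start = i * blockSize;
            // The last block absorbs the remainder when the length is not divisible by blockCount.
            long end = (i == blockCount - 1) ? fileLength : start + blockSize;
            blocks.add(new long[] {start, end});
        }
        return blocks;
    }

    public static void main(String[] args) {
        // One line per block; each block would be handed to its own transmission thread.
        for (long[] block : split(106786028L, 5)) {
            System.out.println(block[0] + " - " + block[1]);
        }
    }
}
```

Each pair would be handed to one transmission thread, and the progress record only needs the current start of every block to resume later.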

(1) Principle of resumable Data Transfer
In fact, the principle of resumable download is very simple: the HTTP request differs slightly from that of an ordinary download.
For example, when a browser requests a file on the server, the request is as follows.
Assume that the server domain name is www.sjtu.edu.cn and the file name is down.zip.
GET /down.zip HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
Accept-Language: zh-cn
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Connection: Keep-Alive


After receiving the request, the server searches for the requested file, extracts the file information, and then returns it to the browser. The returned information is as follows:


200
Content-Length = 106786028
Accept-Ranges = bytes
Date = Mon, 30 Apr 2001 12:56:11 GMT
ETag = W/"02ca57e173c11:95b"
Content-Type = application/octet-stream
Server = Microsoft-IIS/5.0
Last-Modified = Mon, 30 Apr 2001 12:56:11 GMT



Resumable download means continuing a download from the point the previous attempt reached. The request sent to the Web server therefore has to carry one more piece of information: where to start.
The following request is sent to the Web server by a home-made "browser" and asks for the file starting from byte 2000070.
GET /down.zip HTTP/1.0
User-Agent: NetFox
RANGE: bytes=2000070-
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2


Look closely and you will find one additional line: RANGE: bytes=2000070-
This line means that the file down.zip should be transmitted starting from byte 2000070; the bytes before that position are not needed.
After receiving the request, the server returns the following information:
206
Content-Length = 106786028
Content-Range = bytes 2000070-106786027/106786028
Date = Mon, 30 Apr 2001 12:55:20 GMT
ETag = W/"02ca57e173c11:95b"
Content-Type = application/octet-stream
Server = Microsoft-IIS/5.0
Last-Modified = Mon, 30 Apr 2001 12:55:20 GMT


Compared with the earlier server response, one line has been added:
Content-Range = bytes 2000070-106786027/106786028
The status code has also changed from 200 to 206.
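A client usually needs the three numbers in the Content-Range line to confirm that the server resumed at the requested offset. The following is a small sketch of parsing that value; the class ContentRange and its members are hypothetical names for this example, and only the simple "bytes first-last/total" form shown above is handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative value class (hypothetical name) for a "bytes first-last/total" Content-Range value. */
class ContentRange {
    private static final Pattern FORMAT = Pattern.compile("bytes\\s+(\\d+)-(\\d+)/(\\d+)");

    final long first; // offset of the first byte in the response body
    final long last;  // offset of the last byte (inclusive)
    final long total; // size of the complete file

    ContentRange(long first, long last, long total) {
        this.first = first;
        this.last = last;
        this.total = total;
    }

    /** Parses a header value such as "bytes 2000070-106786027/106786028". */
    static ContentRange parse(String value) {
        Matcher m = FORMAT.matcher(value.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unsupported Content-Range: " + value);
        }
        return new ContentRange(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)));
    }

    /** Number of bytes the range covers. */
    long length() {
        return last - first + 1;
    }

    public static void main(String[] args) {
        ContentRange r = parse("bytes 2000070-106786027/106786028");
        System.out.println(r.first + " " + r.last + " " + r.total + " " + r.length());
    }
}
```

If first does not equal the offset that was requested, the safest reaction is to discard the partial file and start again from byte 0.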


With the above principles, you can program resumable data transfer.


(2) Key Points for implementing resumable data transfer in Java


(1) How to submit RANGE: bytes=2000070-
It is certainly possible to do this with a raw Socket, but that is too laborious. In fact, the Java net package already provides this function. The code is as follows:
URL url = new URL("http://www.sjtu.edu.cn/down.zip");
HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
// Set User-Agent
httpConnection.setRequestProperty("User-Agent", "NetFox");
// Set the start position of the resumed download (the trailing hyphen means "to the end of the file")
httpConnection.setRequestProperty("RANGE", "bytes=2000070-");
// Obtain the input stream
InputStream input = httpConnection.getInputStream();


The input stream now delivers the bytes of down.zip starting from byte 2000070.
As you can see, resumable download is quite easy to implement in Java.
The next task is to save the obtained stream to a file.

(2) How to save the file.
The RandomAccessFile class in the IO package is used.
The operation is quite simple. Assume that the file is saved starting from position 2000070. The code is as follows:
RandomAccessFile oSavedFile = new RandomAccessFile("down.zip", "rw");
long nPos = 2000070;
// Move the file pointer to position nPos
oSavedFile.seek(nPos);
byte[] b = new byte[1024];
int nRead;
// Read the byte stream from the input stream and write it to the file
while ((nRead = input.read(b, 0, 1024)) > 0)
{
    oSavedFile.write(b, 0, nRead);
}

It's easy.
The next step is to integrate these pieces into a complete program, including thread control and related details.
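Before building the full program, the two ideas above (decide where to resume, then write from that offset) can be tried in isolation. The sketch below is an assumption-laden illustration: ResumeWriter, resumeOffset, and writeAt are hypothetical names, and the resume point is simply taken to be the length of the partially saved file, which only holds for a single sequential download.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Illustrative helper (hypothetical name) that appends a resumed stream to a partial file. */
class ResumeWriter {

    /** Where to resume: the number of bytes already saved, or 0 if nothing was saved yet. */
    static long resumeOffset(File partial) {
        return partial.exists() ? partial.length() : 0L;
    }

    /** Copies the stream into the target file starting at offset; returns the bytes written. */
    static long writeAt(InputStream in, File target, long offset) throws IOException {
        long written = 0;
        try (RandomAccessFile out = new RandomAccessFile(target, "rw")) {
            out.seek(offset);
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer, 0, buffer.length)) > 0) {
                out.write(buffer, 0, n);
                written += n;
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("resume-demo", ".bin");
        file.deleteOnExit();
        // Pretend that an earlier attempt saved the first 6 bytes.
        Files.write(file.toPath(), "hello ".getBytes(StandardCharsets.US_ASCII));
        long offset = resumeOffset(file);
        // In a real download this stream would come from a request with RANGE: bytes=<offset>-
        writeAt(new ByteArrayInputStream("world".getBytes(StandardCharsets.US_ASCII)), file, offset);
        System.out.println(new String(Files.readAllBytes(file.toPath()), StandardCharsets.US_ASCII));
    }
}
```

With several threads writing different parts of the same file, the file length no longer marks the resume point, which is why the complete program records each thread's position separately.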


(3) Implementation of the resumable download kernel
It mainly uses six classes, including one test class.
SiteFileFetch.java fetches the entire file and controls the internal threads (the FileSplitterFetch class).
FileSplitterFetch.java fetches one part of the file.
FileAccessI.java stores the file.
SiteInfoBean.java holds the information about the file to be fetched, such as the directory and name under which it is saved and its URL.
Utility.java is a tool class that holds a few simple methods.
TestMethod.java is the test class.


The following is the source program:
/*
** SiteFileFetch.java
*/
package NetFox;

import java.io.*;
import java.net.*;

public class SiteFileFetch extends Thread {

    SiteInfoBean siteInfoBean = null;      // file information bean
    long[] nStartPos;                      // start positions
    long[] nEndPos;                        // end positions
    FileSplitterFetch[] fileSplitterFetch; // sub-thread objects
    long nFileLength;                      // file length
    boolean bFirst = true;                 // whether this is the first attempt to fetch the file
    boolean bStop = false;                 // stop flag
    File tmpFile;                          // temporary file that holds the download information
    DataOutputStream output;               // output stream to the temporary file

    public SiteFileFetch(SiteInfoBean bean) throws IOException {
        siteInfoBean = bean;
        // tmpFile = File.createTempFile("zhong", "1111", new File(bean.getSFilePath()));
        tmpFile = new File(bean.getSFilePath() + File.separator + bean.getSFileName() + ".info");
        if (tmpFile.exists()) {
            // A previous run left progress information behind: resume from it
            bFirst = false;
            read_nPos();
        } else {
            nStartPos = new long[bean.getNSplitter()];
            nEndPos = new long[bean.getNSplitter()];
        }
    }

    public void run() {
        // Obtain the file length
        // Split the file
        // Create the FileSplitterFetch instances
        // Start the FileSplitterFetch threads
        // Wait for the sub-threads to finish
        try {
            if (bFirst) {
                nFileLength = getFileSize();
                if (nFileLength == -1) {
                    System.err.println("File Length is not known!");
                    return; // nothing can be split without a length
                } else if (nFileLength == -2) {
                    System.err.println("File is not accessible!");
                    return;
                } else {
                    // The split arithmetic was lost in extraction; this reconstruction
                    // divides the file evenly among the threads
                    for (int i = 0; i < nStartPos.length; i++) {
                        nStartPos[i] = i * (nFileLength / nStartPos.length);
                    }
                    for (int i = 0; i < nEndPos.length - 1; i++) {
                        nEndPos[i] = nStartPos[i + 1];
                    }
                    nEndPos[nEndPos.length - 1] = nFileLength;
                }
            }

            // Start the sub-threads (the loop header is reconstructed from context)
            fileSplitterFetch = new FileSplitterFetch[nStartPos.length];
            for (int i = 0; i < nStartPos.length; i++) {
                fileSplitterFetch[i] = new FileSplitterFetch(siteInfoBean.getSSiteURL(),
                        siteInfoBean.getSFilePath() + File.separator + siteInfoBean.getSFileName(),
                        nStartPos[i], nEndPos[i], i);
                Utility.log("Thread " + i + ", nStartPos = " + nStartPos[i] + ", nEndPos = " + nEndPos[i]);
                fileSplitterFetch[i].start();
            }
            // fileSplitterFetch[nPos.length-1] = new FileSplitterFetch(siteInfoBean.getSSiteURL(),
            //         siteInfoBean.getSFilePath() + File.separator + siteInfoBean.getSFileName(), nPos[nPos.length-1], nFileLength, nPos.length-1);
            // Utility.log("Thread " + (nPos.length-1) + ", nStartPos = " + nPos[nPos.length-1] + ", nEndPos = " + nFileLength);
            // fileSplitterFetch[nPos.length-1].start();

            // Wait until the sub-threads end
            // int count = 0;
            // Whether to end the while loop
            boolean breakWhile = false;

            while (!bStop) {
                write_nPos();
                Utility.sleep(500);
                breakWhile = true;

                // Reconstructed from context: keep waiting while any sub-thread is still downloading
                for (int i = 0; i < nStartPos.length; i++) {
                    if (!fileSplitterFetch[i].bDownOver) {
                        breakWhile = false;
                        break;
                    }
                }
                if (breakWhile)
                    break;

                // count++;
                // if (count > 4)
                //     siteStop();
            }

            System.err.println("The file download is complete!");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


    // Obtain the file length
    public long getFileSize() {
        long nFileLength = -1;
        try {
            URL url = new URL(siteInfoBean.getSSiteURL());
            HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
            httpConnection.setRequestProperty("User-Agent", "NetFox");

            int responseCode = httpConnection.getResponseCode();
            if (responseCode >= 400) {
                processErrorCode(responseCode);
                return -2; // -2 means the file cannot be accessed
            }

            String sHeader;

            // Walk the response headers until Content-Length is found
            for (int i = 1; ; i++) {
                // DataInputStream in = new DataInputStream(httpConnection.getInputStream());
                // Utility.log(in.readLine());
                sHeader = httpConnection.getHeaderFieldKey(i);
                if (sHeader != null) {
                    if (sHeader.equalsIgnoreCase("Content-Length")) {
                        // Parsed as long so that files larger than 2 GB do not overflow
                        nFileLength = Long.parseLong(httpConnection.getHeaderField(sHeader));
                        break;
                    }
                } else
                    break;
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }

        Utility.log(String.valueOf(nFileLength));

        return nFileLength;
    }


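    // Save the download information (file pointer positions)
    private void write_nPos() {
        try {
            output = new DataOutputStream(new FileOutputStream(tmpFile));
            output.writeInt(nStartPos.length);
            // NOTE: from this loop up to the read loop in FileSplitterFetch.run(), the source text
            // was lost in extraction; the code is reconstructed from how it is used elsewhere.
            for (int i = 0; i < nStartPos.length; i++) {
                output.writeLong(fileSplitterFetch[i].nStartPos);
                output.writeLong(fileSplitterFetch[i].nEndPos);
            }
            output.close();
        } catch (Exception e) { e.printStackTrace(); }
    }

    // Read the saved download information (file pointer positions)
    private void read_nPos() {
        try {
            DataInputStream input = new DataInputStream(new FileInputStream(tmpFile));
            int nCount = input.readInt();
            nStartPos = new long[nCount];
            nEndPos = new long[nCount];
            for (int i = 0; i < nStartPos.length; i++) {
                nStartPos[i] = input.readLong();
                nEndPos[i] = input.readLong();
            }
            input.close();
        } catch (Exception e) { e.printStackTrace(); }
    }

    private void processErrorCode(int nErrorCode) {
        System.err.println("Error Code: " + nErrorCode);
    }

    // Stop the file download
    public void siteStop() {
        bStop = true;
        for (int i = 0; i < nStartPos.length; i++)
            fileSplitterFetch[i].splitterStop();
    }
}

/*
** FileSplitterFetch.java
*/
package NetFox;

import java.io.*;
import java.net.*;

public class FileSplitterFetch extends Thread {

    String sURL;                    // file URL
    long nStartPos;                 // start position of this part
    long nEndPos;                   // end position of this part
    int nThreadID;                  // thread ID
    boolean bDownOver = false;      // whether this part has finished downloading
    boolean bStop = false;          // stop flag
    FileAccessI fileAccessI = null; // file access object

    public FileSplitterFetch(String sURL, String sName, long nStart, long nEnd, int id) throws IOException {
        this.sURL = sURL;
        this.nStartPos = nStart;
        this.nEndPos = nEnd;
        nThreadID = id;
        fileAccessI = new FileAccessI(sName, nStartPos);
    }

    public void run() {
        while (nStartPos < nEndPos && !bStop) {
            try {
                URL url = new URL(sURL);
                HttpURLConnection httpConnection = (HttpURLConnection) url.openConnection();
                httpConnection.setRequestProperty("User-Agent", "NetFox");
                String sProperty = "bytes=" + nStartPos + "-";
                httpConnection.setRequestProperty("RANGE", sProperty);
                Utility.log(sProperty);
                InputStream input = httpConnection.getInputStream();
                byte[] b = new byte[1024];
                int nRead;
                while ((nRead = input.read(b, 0, 1024)) > 0 && nStartPos < nEndPos && !bStop) {
                    nStartPos += fileAccessI.write(b, 0, nRead);
                    // if (nThreadID == 1)
                    //     Utility.log("nStartPos = " + nStartPos + ", nEndPos = " + nEndPos);
                }

                Utility.log("Thread " + nThreadID + " is over!");
                bDownOver = true;
                // nPos = fileAccessI.write(b, 0, nRead);
            } catch (Exception e) { e.printStackTrace(); }
        }
    }

    // Print the response headers
    public void logResponseHead(HttpURLConnection con) {
        for (int i = 1; ; i++) {
            String header = con.getHeaderFieldKey(i);
            if (header != null)
                // responseHeaders.put(header, httpConnection.getHeaderField(header));
                Utility.log(header + ": " + con.getHeaderField(header));
            else
                break;
        }
    }

    public void splitterStop() {
        bStop = true;
    }
}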


/*
** FileAccessI.java
*/
package NetFox;

import java.io.*;

public class FileAccessI implements Serializable {

    RandomAccessFile oSavedFile;
    long nPos;

    public FileAccessI() throws IOException {
        this("", 0);
    }

    public FileAccessI(String sName, long nPos) throws IOException {
        oSavedFile = new RandomAccessFile(sName, "rw");
        this.nPos = nPos;
        // Position the file pointer where this part of the file begins
        oSavedFile.seek(nPos);
    }

    // Write nLen bytes of b starting at nStart; returns the number written, or -1 on failure
    public synchronized int write(byte[] b, int nStart, int nLen) {
        int n = -1;
        try {
            oSavedFile.write(b, nStart, nLen);
            n = nLen;
        } catch (IOException e) {
            e.printStackTrace();
        }

        return n;
    }
}


/*
** SiteInfoBean.java
*/
package NetFox;

public class SiteInfoBean {

    private String sSiteURL;  // site's URL
    private String sFilePath; // saved file's path
    private String sFileName; // saved file's name
    private int nSplitter;    // number of parts the download is split into

    public SiteInfoBean() {
        // Default value of nSplitter is 5
        this("", "", "", 5);
    }

    public SiteInfoBean(String sURL, String sPath, String sName, int nSpiltter) {
        sSiteURL = sURL;
        sFilePath = sPath;
        sFileName = sName;
        this.nSplitter = nSpiltter;
    }

    public String getSSiteURL() {
        return sSiteURL;
    }

    public void setSSiteURL(String value) {
        sSiteURL = value;
    }

    public String getSFilePath() {
        return sFilePath;
    }

    public void setSFilePath(String value) {
        sFilePath = value;
    }

    public String getSFileName() {
        return sFileName;
    }

    public void setSFileName(String value) {
        sFileName = value;
    }

    public int getNSplitter() {
        return nSplitter;
    }

    public void setNSplitter(int nCount) {
        nSplitter = nCount;
    }
}


/*
** Utility.java
*/
package NetFox;

public class Utility {

    public Utility() {
    }

    // Pause the current thread; despite the parameter name, the value is in milliseconds
    public static void sleep(int nSecond) {
        try {
            Thread.sleep(nSecond);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void log(String sMsg) {
        System.err.println(sMsg);
    }

    public static void log(int sMsg) {
        System.err.println(sMsg);
    }
}


/*
** TestMethod.java
*/
package NetFox;

public class TestMethod {

    public TestMethod() { // xx/weblogic60b2_win.exe
        try {
            SiteInfoBean bean = new SiteInfoBean("http://localhost/xx/weblogic60b2_win.exe", "L:\\temp", "weblogic60b2_win.exe", 5);
            // SiteInfoBean bean = new SiteInfoBean("http://localhost:8080/down.zip", "L:\\temp", "weblogic60b2_win.exe", 5);
            SiteFileFetch fileFetch = new SiteFileFetch(bean);
            fileFetch.start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        new TestMethod();
    }
}




First, and most important, resumable data transfer requires support from the server.
A traditional FTP server does not support resumable data transfer because it does not support the REST command; the traditional FTP commands (that is, the server-side commands) do not include REST.
Second, the client needs to know how to use commands such as REST to resume a transfer.
The detailed process of a resumable transfer with an FTP server is as follows.
First, the client uses the REST command to tell the FTP server that the transfer should start from a certain point in the file, and then uses the STOR or RETR command to start transferring the file. The general command sequence is as follows:
TYPE I
200 Type set to I.
PASV
227 Entering Passive Mode (98,250)
REST 187392
350 Restarting at 187392. Send STORE or RETRIEVE to initiate transfer.
RETR /pub/audio/pci/maestro-3/win2k/1056.zip
150 Opening BINARY mode data connection for /pub/audio/pci/maestro-3/win2k/1056.zip (936098 bytes).
First, the TYPE command tells the FTP server to transmit the file in BINARY mode.
Then, the PASV command tells the FTP server to transmit the file in passive mode.
Then, the REST 187392 command tells the FTP server to start transmitting from byte 187392 of the file.
Finally, the RETR command transmits the file.
From the above we can see that this FTP server supports the REST command. Some FTP servers (especially old ones) do not, and in that case it is useless for the FTP client to support resumable data transfer.
FTP servers that support resuming: Serv-U FTP and a range of newer FTP servers.
FTP servers that do not support resuming: IIS 4 and earlier. If IIS 5 is available, you can test it: log on to the FTP server and enter the command REST 1000. If the server recognizes the command, resuming is supported.
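The command sequence above can be written down as a small helper that a client would send line by line over the control connection. This is only a sketch: FtpResumeCommands, forDownload, and restAccepted are hypothetical names, no network code is included, and the acceptance check relies solely on the 350 reply code shown in the transcript.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (hypothetical name) that builds the control commands for a resumed FTP download. */
class FtpResumeCommands {

    /** Commands to send, in order, to resume downloading remotePath from the given byte offset. */
    static List<String> forDownload(String remotePath, long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        return Arrays.asList(
                "TYPE I",              // binary mode
                "PASV",                // passive mode
                "REST " + offset,      // restart point
                "RETR " + remotePath); // start the transfer
    }

    /** True when the reply to REST starts with code 350, as in the transcript above. */
    static boolean restAccepted(String replyLine) {
        return replyLine != null && replyLine.startsWith("350");
    }

    public static void main(String[] args) {
        for (String command : forDownload("/pub/audio/pci/maestro-3/win2k/1056.zip", 187392L)) {
            System.out.println(command);
        }
        System.out.println(restAccepted("350 Restarting at 187392. Send STORE or RETRIEVE to initiate transfer."));
    }
}
```

If restAccepted returns false, the client has to fall back to transferring the file from the beginning.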
The above covers resuming with an FTP server. Resuming over HTTP works as follows.
Early HTTP servers did not support resuming; HTTP/1.1 does, in the following way.
The HTTP request header usually looks like this:
GET http://xxx.xxx.xxx.xxx/index.html HTTP/1.1
Host: www.163.net
Accept: */*
The above is the main content of the HTTP request header, which is the information that browsers and other clients send to the HTTP server.
In this request header, the first line is the "Request Line", and "GET" is the "Request Method" (GET is the method usually used for HTML pages and CGI requests). What follows is the URL (for example, http://bbs.netbuddy.org/index.html), and HTTP/1.1 is the protocol version.
The Host line (for example, Host: bbs.netbuddy.org) gives the name of the HTTP server and is a new concept in HTTP/1.1. In the past, each virtual host needed its own IP address; with the Host header, several host names can share one IP address. (This strays from the topic, so it is not discussed further.)
For resumable data transfer, the browser or other client must send the following line in the request header:
Range: bytes=1140736-
Such a request tells the HTTP server that the file must be transmitted starting from byte 1140736.
Finally, after reading the description above you may wonder how multi-point transfer is implemented. Several threads connect to the server, and each uses the resume command to fetch a different part of the file. During the transfer, each thread is checked: when the part fetched by an earlier thread (for example, the first "ant") reaches the starting point of the next thread (for example, the second "ant"), the earlier thread is stopped. At the end the parts are merged to obtain the complete file.
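The stopping rule for each thread can be sketched as a pair of small checks. The names SegmentGuard, bytesToKeep, and shouldStop are hypothetical, and the sketch assumes that a segment's end is exclusive and equals the starting point of the next thread.

```java
/** Illustrative helper (hypothetical name) for the per-thread stopping rule. */
class SegmentGuard {

    /**
     * Returns how many of the bytes just read still belong to this thread's segment.
     * position is the next offset to write; segmentEnd is exclusive (the next thread's start).
     */
    static int bytesToKeep(long position, long segmentEnd, int bytesRead) {
        if (bytesRead <= 0 || position >= segmentEnd) {
            return 0;
        }
        long remaining = segmentEnd - position;
        return (int) Math.min(remaining, bytesRead);
    }

    /** True once the thread has reached the start of the next segment and should stop. */
    static boolean shouldStop(long position, long segmentEnd) {
        return position >= segmentEnd;
    }

    public static void main(String[] args) {
        // A 1024-byte read that starts 100 bytes before the end of the segment
        System.out.println(bytesToKeep(900, 1000, 1024));
        System.out.println(shouldStop(1000, 1000));
    }
}
```

Because every thread writes into its own region of a single file, the final "merge" can amount to nothing more than closing the file once all threads have stopped.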

----------------------------------------------
C++ implementation method:

3. Specific implementation

In the implementation, I used MFC multithreading and Windows Sockets, on both the client and the server. Because data transmission is usually symmetric (each side may both send and receive), it is easy to combine the client and the server into one program. The following describes the implementation of the client and the server separately.

3.1 Key Data Structure

The file information data structure is used to transmit the attribute information of the nth block of the file between the server and the client. The detailed definition is as follows:

struct fileinfo
{
    int fileno;              // file number
    int type;                // message type
    long len;                // length of the file (block): when the client sends data to the server, the file length;
                             // when the server sends upload progress to the client, the length already uploaded
    long seek;               // start position: where in the original file the transfer begins
    char name[MAX_PATH_LEN]; // file name
};

The sending progress record structure records the progress of a file transfer. It is defined as follows:

struct SentInfo
{
    long totle;    // length of data that has been sent successfully
    int block;     // block ID
    long filelen;  // total file length
    int threadno;  // ID of the thread responsible for transmitting this data block
    CString name;  // file name
};

The CClient class encapsulates a client-side file-sending instance. It records the file's attribute information, the size already sent, the sending thread handle, the sending thread status, and the sending statistics during the transfer. It is defined as follows:

class CClient : public CObject
{
protected:

// Attributes
public:
    CClient(CString ip);
    ~CClient();
    SentInfo doinfo;
    long m_index;            // block index
    BOOL sendOk[BLOCK];      // whether each block has finished sending
    CString SendFileName;
    CString DestIp;          // destination IP address
    THREADSTATUS SendStatus;
    int GetBlockIndex();     // obtain the index of the next file block to transfer, for example 0, 1, 2...
    CCriticalSection m_gCS;
