Network Programming: Resumable Download and Multi-Thread Download

Source: Internet
Author: User
Tags: file transfer protocol

Overview

In today's Internet era, download software is among the most frequently used kinds of software. Download technology has developed steadily over the past few years. The original download function was a plain "download" process: reading a file continuously from the web server. Its biggest problem was that, because networks are unstable, once the connection dropped the download was interrupted and had to start all over again.

Then the concept of "resumable download" emerged. As the name suggests, if a download is interrupted, then after the connection is re-established the part already downloaded is skipped and only the remaining part is downloaded.

Regardless of whether multi-thread download technology was invented by Mr. Hong erong, it is indisputable that the technology has gained unprecedented attention. After the popular "Network Ant" software was launched, many download programs followed suit and adopted "multi-thread download" technology, and the number of download threads supported even became a factor by which people judge download software. The foundation of "multi-thread download" is that the web server supports remote random reads, that is, "resumable download". The file can then be divided into several parts during the download, and a download thread is created for each part.

Even if you do not write dedicated download software, you sometimes need to add a download function to your own software, for example to support automatic online upgrades or to download new data automatically for data updates. This is a very useful and practical feature. The topic of this article is how to write a download module that supports "resumable download" and "multi-threading". Of course, the download process is complex and cannot be fully covered in one article, so parts that are not directly related to downloading, such as exception handling and network error handling, are mostly omitted; please bear this in mind. The development environment I use is C++ Builder 5.0. If you use another development environment or programming language, please adapt the code as needed.

HTTP protocol Overview

Downloading a file is an interaction between a computer and a web server. The technical name for the language in which they interact is a protocol. There are many protocols for transferring files; the most common are HTTP (Hypertext Transfer Protocol) and FTP (File Transfer Protocol). I use HTTP.

The three basic HTTP commands relevant here are GET, POST, and HEAD. GET requests a specific object from the web server, such as an HTML page or a file, and the web server sends the object back as the response over a socket connection. HEAD asks the server to return only a basic description of the object, for example its type, size, and last update time, without the object itself. POST sends data to the web server; generally the data is passed to a separate application, which processes it and returns dynamically generated results to the browser. Downloading is implemented with the GET command.
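To make the commands concrete, here is a minimal sketch of what a GET or HEAD request looks like on the wire. The function name BuildRequest and the host and path values in the usage note are illustrative assumptions, not part of the article's code, which lets WinINet produce the request text.

```cpp
#include <cassert>
#include <string>

// Build the text of a minimal HTTP/1.1 request that has no body.
// `method` is "GET" or "HEAD"; `path` must start with '/'.
std::string BuildRequest(const std::string& method,
                         const std::string& host,
                         const std::string& path)
{
    assert(!path.empty() && path[0] == '/');
    std::string request = method + " " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";   // required by HTTP/1.1
    request += "Connection: close\r\n";    // one request per connection
    request += "\r\n";                     // a blank line ends the headers
    return request;
}
```

For example, BuildRequest("HEAD", "example.com", "/files/a.zip") asks only for the headers that describe the file, while the same call with "GET" asks for the file itself.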

Basic download process

A download program can be written directly against the socket functions, but that requires the developer to understand and be familiar with TCP/IP. To simplify the development of Internet client software, Windows provides the WinINet API, which encapsulates the common network protocols and greatly lowers the threshold for developing Internet software. The WinINet API functions used here are shown in Figure 1; they are called basically in order from top to bottom. For the exact function prototypes, see MSDN.

When using these functions, you must carefully distinguish the handles they use. The handles all have the same type, HINTERNET, but they serve different purposes, which is very confusing. By the order in which they are created and by their call relationships, the handles fall into three levels; each lower-level handle is obtained from the handle one level above it.

InternetOpen is the first function to be called. It returns the top-level HINTERNET handle. I usually name it hSession, the session handle.

InternetConnect uses the hSession handle. The HTTP connection handle it returns is named hConnect.

HttpOpenRequest uses the hConnect handle. The handle it returns is the HTTP request handle, named hRequest.

HttpSendRequest, HttpQueryInfo, InternetSetFilePointer, and InternetReadFile all use the handle returned by HttpOpenRequest, that is, hRequest.

When these handles are no longer needed, close them with InternetCloseHandle to release the resources they occupy.

First, create a thread module named THttpGetThread that is created in the suspended state. I want the thread to be destroyed automatically when it finishes, so I set this in the constructor:

FreeOnTerminate = true; // delete automatically when finished

Add the following member variables:

char Buffer[HTTPGET_BUFFER_MAX + 4]; // data buffer
AnsiString FURL; // URL of the object to download
AnsiString FOutFileName; // path and name of the saved file
HINTERNET FhSession; // session handle
HINTERNET FhConnect; // HTTP connection handle
HINTERNET FhRequest; // HTTP request handle
bool FSuccess; // whether the download succeeded
int iFileHandle; // handle of the output file

1. Establish a connection

By function, the download process can be divided into four parts: establishing a connection, reading and analyzing the information of the file to be downloaded, downloading the file, and releasing the resources occupied. The connection function is as follows. ParseURL extracts the host name and the web path of the file from the download URL, and DoOnStatusText outputs the current status:

// Initialize the download environment
void THttpGetThread::StartHttpGet(void)
{
    AnsiString HostName, FileName;
    ParseURL(HostName, FileName);
    try
    {
        // 1. Establish a session
        FhSession = InternetOpen("http-get-demo",
                                 INTERNET_OPEN_TYPE_PRECONFIG,
                                 NULL, NULL,
                                 0); // synchronous mode
        if (FhSession == NULL) throw(Exception("error: InternetOpen"));
        DoOnStatusText("ok: InternetOpen");
        // 2. Establish a connection
        FhConnect = InternetConnect(FhSession,
                                    HostName.c_str(),
                                    INTERNET_DEFAULT_HTTP_PORT,
                                    NULL, NULL,
                                    INTERNET_SERVICE_HTTP, 0, 0);
        if (FhConnect == NULL) throw(Exception("error: InternetConnect"));
        DoOnStatusText("ok: InternetConnect");
        // 3. Initialize the download request
        // The accept-types argument must be a NULL-terminated array of strings
        const char *FAcceptTypes[] = {"*/*", NULL};
        FhRequest = HttpOpenRequest(FhConnect,
                                    "GET", // get data from the server
                                    FileName.c_str(), // name of the file to read
                                    "HTTP/1.1", // protocol used
                                    NULL,
                                    FAcceptTypes,
                                    INTERNET_FLAG_RELOAD,
                                    0);
        if (FhRequest == NULL) throw(Exception("error: HttpOpenRequest"));
        DoOnStatusText("ok: HttpOpenRequest");
        // 4. Send the download request
        if (!HttpSendRequest(FhRequest, NULL, 0, NULL, 0))
            throw(Exception("error: HttpSendRequest"));
        FConnected = true; // assumed: the flag that EndHttpGet checks
        DoOnStatusText("ok: HttpSendRequest");
    } catch (Exception &exception)
    {
        EndHttpGet(); // close the connection and release resources
        DoOnStatusText(exception.Message);
    }
}
// Extract the host name and the path of the file from the URL
// (assumes the URL contains a path, i.e. a "/" after the host name)
void THttpGetThread::ParseURL(AnsiString &HostName, AnsiString &FileName)
{
    AnsiString URL = FURL;
    int i = URL.Pos("http://");
    if (i > 0)
    {
        URL.Delete(1, 7); // strip the "http://" prefix
    }
    i = URL.Pos("/");
    HostName = URL.SubString(1, i - 1);
    FileName = URL.SubString(i, URL.Length());
}

As you can see, the program calls InternetOpen, InternetConnect, and HttpOpenRequest in the order shown in Figure 1 to obtain the three related handles, and then sends the download request to the web server with HttpSendRequest.

The first parameter of InternetOpen is the name of the calling application, which is sent as the user agent; its value does not matter here. If the last parameter is set to INTERNET_FLAG_ASYNC, an asynchronous connection is established, which is very practical. To limit the complexity of this article I did not use it, but readers with more demanding download requirements are strongly advised to use the asynchronous mode.

HttpOpenRequest opens a request handle. The command is "GET", which indicates a file download, and the protocol used is "HTTP/1.1".

Note that the FAcceptTypes parameter of HttpOpenRequest indicates the file types that are accepted. Setting it to "*/*" accepts all file types; you can change its value as needed.
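For readers not using C++ Builder, the URL-splitting step can be sketched with the standard library alone. This is an illustration under assumptions, not the article's ParseURL: the names UrlParts and SplitUrl are hypothetical, only the "http://" prefix is recognized, and a URL without a path is given the path "/".

```cpp
#include <cassert>
#include <string>

struct UrlParts {
    std::string host;  // e.g. "example.com"
    std::string path;  // e.g. "/files/a.zip", or "/" if the URL has no path
};

// Split "http://host/path" (the prefix is optional) into host and path.
UrlParts SplitUrl(std::string url)
{
    assert(!url.empty());
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) == 0)
        url.erase(0, scheme.size());          // drop the scheme prefix

    UrlParts parts;
    const std::string::size_type slash = url.find('/');
    if (slash == std::string::npos) {
        parts.host = url;                     // no path given
        parts.path = "/";
    } else {
        parts.host = url.substr(0, slash);
        parts.path = url.substr(slash);       // keeps the leading '/'
    }
    return parts;
}
```

The host part is what a connect call needs, and the path part is what goes into the request line.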

2. Read and analyze the information of the file to be downloaded

After sending the request, you can use HttpQueryInfo to obtain information about the file, about the server, and about the operations the server supports. For a download program, the most common use is to pass HTTP_QUERY_CONTENT_LENGTH to get the file size, that is, the number of bytes in the file. The module is as follows:

// Obtain the size of the file to be downloaded
int __fastcall THttpGetThread::GetWebFileSize(void)
{
    int FileSize = 0; // declared outside the try block so it can be returned; 0 on failure
    try
    {
        DWORD BufLen = HTTPGET_BUFFER_MAX;
        DWORD dwIndex = 0;
        bool RetQueryInfo = HttpQueryInfo(FhRequest,
                                          HTTP_QUERY_CONTENT_LENGTH,
                                          Buffer, &BufLen,
                                          &dwIndex);
        if (RetQueryInfo == false) throw(Exception("error: HttpQueryInfo"));
        DoOnStatusText("ok: HttpQueryInfo");
        FileSize = StrToInt(Buffer); // file size
        DoOnGetFileSize(FileSize);
    } catch (Exception &exception)
    {
        DoOnStatusText(exception.Message);
    }
    return FileSize;
}

DoOnGetFileSize in the module is the event that reports the file size. Once the file size is known, a multi-thread download program can split the file into suitable chunks and determine the starting point and size of each chunk.

3. File Download Module

Before starting the download, you should also decide how to save the downloaded data. There are many ways to do this. I use the file functions provided by C++ Builder to open a file handle directly. You can of course use the Windows API instead, and for small files you can consider buffering the whole file in memory.

// Open the output file to save the downloaded data
DWORD THttpGetThread::OpenOutFile(void)
{
    try
    {
        if (FileExists(FOutFileName))
            DeleteFile(FOutFileName);
        iFileHandle = FileCreate(FOutFileName);
        if (iFileHandle == -1) throw(Exception("error: FileCreate"));
        DoOnStatusText("ok: CreateFile");
    } catch (Exception &exception)
    {
        DoOnStatusText(exception.Message);
    }
    return 0;
}
// Execute the download process
void THttpGetThread::DoHttpGet(void)
{
    DWORD dwCount = OpenOutFile();
    try
    {
        // Fire the start-of-download event
        DoOnStatusText("StartGet: InternetReadFile");
        // Read data
        DWORD dwRequest; // number of bytes requested
        DWORD dwRead; // number of bytes actually read
        dwRequest = HTTPGET_BUFFER_MAX;
        while (true)
        {
            Application->ProcessMessages();
            bool ReadReturn = InternetReadFile(FhRequest,
                                               (LPVOID)Buffer,
                                               dwRequest,
                                               &dwRead);
            // A failed read must not be reported as a successful download
            if (!ReadReturn) throw(Exception("error: InternetReadFile"));
            if (dwRead == 0) break; // end of data
            // Save data
            Buffer[dwRead] = '\0';
            FileWrite(iFileHandle, Buffer, dwRead);
            dwCount = dwCount + dwRead;
            // Fire the progress event
            DoOnProgress(dwCount);
        }
        FSuccess = true;
    } catch (Exception &exception)
    {
        FSuccess = false;
        DoOnStatusText(exception.Message);
    }
    FileClose(iFileHandle);
    DoOnStatusText("End: InternetReadFile");
}

The download process is not complicated: as with reading a local file, it is a simple loop. Such convenient programming is, of course, thanks to Microsoft's encapsulation of the network protocols.
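The shape of that read loop does not depend on WinINet. The sketch below shows the same pattern with standard C++ streams standing in for the network source and the output file; the name CopyInBlocks and the progress callback are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Copies `in` to `out` in blocks of at most `blockSize` bytes and calls
// `onProgress(total)` after every block. Returns the number of bytes copied.
template <typename Progress>
std::size_t CopyInBlocks(std::istream& in, std::ostream& out,
                         std::size_t blockSize, Progress onProgress)
{
    assert(blockSize > 0);
    std::vector<char> buffer(blockSize);
    std::size_t total = 0;
    for (;;) {
        in.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // may be short at end of data
        if (got <= 0) break;                      // nothing more to read
        out.write(&buffer[0], got);
        total += static_cast<std::size_t>(got);
        onProgress(total);
    }
    return total;
}
```

With a std::istringstream as the source and a std::ostringstream as the destination, the function can be exercised without any network access, which makes the loop easy to test before the real read call is plugged in.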

4. Release occupied Resources

This step is very simple: call InternetCloseHandle on the handles in the reverse order of their creation.

void THttpGetThread::EndHttpGet(void)
{
    if (FConnected)
    {
        DoOnStatusText("Closing: InternetConnect");
        try
        {
            // Close in the reverse order of creation
            InternetCloseHandle(FhRequest);
            InternetCloseHandle(FhConnect);
            InternetCloseHandle(FhSession);
        } catch (...) {}
        FhSession = NULL;
        FhConnect = NULL;
        FhRequest = NULL;
        FConnected = false;
        DoOnStatusText("Closed: InternetConnect");
    }
}

I think it is good programming practice to set a handle variable to NULL after the handle is released. In this example there is a further reason: if the download fails, the handle variables are reused when the download is retried.
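In C++ the same discipline, releasing in reverse order and never releasing twice, can be delegated to destructors. The sketch below is an illustration only: ScopedHandle is a hypothetical stand-in that records release order in a log rather than wrapping a real HINTERNET.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a resource handle; `log` records the order of releases.
class ScopedHandle {
public:
    ScopedHandle(const std::string& name, std::vector<std::string>& log)
        : name_(name), log_(&log), open_(true) {}
    ~ScopedHandle() { Close(); }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    // Safe to call more than once: the handle is marked closed after release.
    void Close() {
        if (!open_) return;
        log_->push_back(name_);
        open_ = false;
    }
    bool IsOpen() const { return open_; }

private:
    std::string name_;
    std::vector<std::string>* log_;
    bool open_;
};

// Opens three nested handles and returns the order in which they were closed.
std::vector<std::string> OpenAndRelease()
{
    std::vector<std::string> log;
    {
        ScopedHandle session("session", log);
        ScopedHandle connect("connect", log);
        ScopedHandle request("request", log);
        assert(request.IsOpen());
    }   // destructors run in reverse order of construction
    return log;
}
```

Because local objects are destroyed in reverse order of construction, the request is released first and the session last, with no explicit cleanup code on the error paths.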

5. Calling the function modules

The calls to these modules can be arranged in the Execute method of the thread object, as shown below:

void __fastcall THttpGetThread::Execute()
{
    FRepeatCount = 5;
    for (int i = 0; i < FRepeatCount; i++)
    {
        StartHttpGet();
        GetWebFileSize();
        DoHttpGet();
        EndHttpGet();
        if (FSuccess) break;
    }
    // Fire the download-completion event
    if (FSuccess) DoOnComplete();
    else DoOnError();
}

Here a loop is executed: if an error occurs, the download is automatically retried. In real code the number of retries can be made a parameter.
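A parameterized version of the retry loop might look like the following sketch. The helper name RetryUntilSuccess is an assumption, and the attempt is passed in as a callable so that the loop stays independent of the download code.

```cpp
#include <cassert>

// Calls `attempt()` until it returns true or `maxAttempts` calls have been
// made. Returns the number of calls made, or 0 if every attempt failed.
template <typename Attempt>
int RetryUntilSuccess(int maxAttempts, Attempt attempt)
{
    assert(maxAttempts > 0);
    for (int i = 1; i <= maxAttempts; ++i) {
        if (attempt()) return i;   // stop at the first success
    }
    return 0;
}
```

A real downloader would typically also pause between attempts; that is left out here to keep the sketch short.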

Resumable download

Implementing resumable download on top of the basic download code is not very complicated. There are two main problems:

1. Check the local download information and determine the number of bytes already downloaded. This means modifying the function that opens the output file. We could create an auxiliary file to store the download information, such as the number of bytes downloaded. Here a simpler approach is used: first check whether the output file exists; if it does, take its size as the part already downloaded. Because I did not find a Windows API that returns the size of a file directly from its name, I wrote the GetFileSize function for this. Note that code identical to the previous version is omitted.

DWORD THttpGetThread::OpenOutFile(void)
{
    ......
    if (FileExists(FOutFileName))
    {
        DWORD dwCount = GetFileSize(FOutFileName);
        if (dwCount > 0)
        {
            iFileHandle = FileOpen(FOutFileName, fmOpenWrite);
            if (iFileHandle == -1) throw(Exception("error: FileOpen"));
            FileSeek(iFileHandle, 0, 2); // move the file pointer to the end
            DoOnStatusText("ok: OpenFile");
            return dwCount;
        }
        DeleteFile(FOutFileName);
    }
    ......
}
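The article's GetFileSize helper is not shown. As a rough portable equivalent, the number of bytes already on disk can be read with a standard file stream; the name ResumeOffset and the choice to treat a missing file as size 0 are assumptions of this sketch.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns the size of `path` in bytes, or 0 if the file cannot be opened.
// 0 therefore also means "nothing downloaded yet, start from the beginning".
unsigned long ResumeOffset(const std::string& path)
{
    assert(!path.empty());
    // std::ios::ate positions the stream at the end right after opening
    std::ifstream file(path.c_str(), std::ios::binary | std::ios::ate);
    if (!file) return 0;
    const std::streamoff size = file.tellg();   // position at end == size
    return size > 0 ? static_cast<unsigned long>(size) : 0;
}
```

The returned value is both the offset at which to resume on the server and the position at which new data is appended locally.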

2. Before starting to download the file (that is, before calling InternetReadFile), adjust the file pointer on the web server. This requires the web server to support random reads of the file. Some servers restrict this, so the program must check whether it is supported. The changes to the DoHttpGet module are as follows; code identical to the previous version is again omitted:

void THttpGetThread::DoHttpGet(void)
{
    DWORD dwCount = OpenOutFile();
    if (dwCount > 0) // adjust the file pointer
    {
        dwStart = dwStart + dwCount;
        if (!SetFilePointer()) // the server does not support this operation
        {
            // Clear the output file
            FileSeek(iFileHandle, 0, 0); // move the file pointer to the start
        }
    }
    ......
}
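The article moves the remote read position through a file-pointer call. At the HTTP level, the standard way to ask for part of a file is the Range request header, and a server that honours it answers with status 206 (Partial Content). The sketch below shows that alternative technique; it is not the article's SetFilePointer, and the function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Header asking for `count` bytes starting at `start`;
// count == 0 means "from `start` to the end of the file".
std::string RangeHeader(unsigned long start, unsigned long count)
{
    std::ostringstream header;
    header << "Range: bytes=" << start << "-";
    if (count > 0) header << (start + count - 1);   // the last byte is inclusive
    return header.str();
}

// A server that honours the range answers 206; 200 means it ignored the
// header and is sending the whole file from the beginning.
bool ServerHonouredRange(int statusCode)
{
    assert(statusCode >= 100 && statusCode <= 599);
    return statusCode == 206;
}
```

Checking the status code gives the "does the server support this" test that the text calls for: on 200 the client has to discard the partial file and start again from byte 0.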

Multi-thread download

To implement multi-thread download, the main problems are creating and managing the download threads and, after the download completes, merging all parts of the file accurately. The download thread itself also needs some modifications.

1. Modify the download thread

To adapt to multi-threading, I added the following member variables to the download thread:

int FIndex; // index in the thread array
DWORD dwStart; // download start position
DWORD dwTotal; // number of bytes to download
DWORD FGetBytes; // total number of bytes downloaded

Add the following properties:

__property AnsiString URL = {read = FURL, write = FURL};
__property AnsiString OutFileName = {read = FOutFileName, write = FOutFileName};
__property bool Successed = {read = FSuccess};
__property int Index = {read = FIndex, write = FIndex};
__property DWORD StartPostion = {read = dwStart, write = dwStart};
__property DWORD GetBytes = {read = dwTotal, write = dwTotal};
__property TOnHttpCompelete OnComplete = {read = FOnComplete, write = FOnComplete};

In addition, add the following processing to DoHttpGet:

void THttpGetThread::DoHttpGet(void)
{
    ......
    try
    {
        ......
        while (true)
        {
            Application->ProcessMessages();
            // Adjust the number of bytes requested so that dwRequest + dwCount <= dwTotal
            if (dwTotal > 0) // dwTotal == 0 means download to the end of the file
            {
                if (dwRequest + dwCount > dwTotal)
                    dwRequest = dwTotal - dwCount;
            }
            ......
            if (dwTotal > 0) // dwTotal == 0 means download to the end of the file
            {
                if (dwCount >= dwTotal) break;
            }
        }
    }
    ......
    if (dwCount == dwTotal) FSuccess = true;
}
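The clamping rule inside the loop can be isolated as a small pure function, which makes its edge cases easy to check. NextRequestSize is a hypothetical name; as in the code above, a total of 0 is taken to mean "read to the end of the file".

```cpp
#include <cassert>

// Number of bytes to request next. `total == 0` means "no limit".
unsigned long NextRequestSize(unsigned long bufferSize,
                              unsigned long received,
                              unsigned long total)
{
    assert(bufferSize > 0);
    if (total == 0) return bufferSize;            // read until end of file
    if (received >= total) return 0;              // this chunk is complete
    const unsigned long remaining = total - received;
    return remaining < bufferSize ? remaining : bufferSize;
}
```

With a 4096-byte buffer and a 10000-byte chunk, the requests are 4096, 4096, and 1808 bytes, so a thread never reads into the chunk that belongs to the next thread.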

2. Create a multi-threaded download component

I first created a TComponent-based component module named THttpGetEx and added the following member variables:

// Internal variables
THttpGetThread **HttpThreads; // the created threads
AnsiString *OutTmpFiles; // temporary files for each part of the result file
bool *FSuccesss; // download result of each thread
// The following are property variables
int FHttpThreadCount; // number of threads used
AnsiString FURL;
AnsiString FOutFileName;

The purpose of each variable is given in its comment. FSuccesss plays a special role, which is explained in detail below. Because a thread cannot be run again once it has finished, and the component may download different files one after another, the download threads have to be created dynamically and destroyed as soon as they are done. The thread-creation module is as follows. GetSystemTemp returns the system's temporary folder, and OnThreadComplete is the event fired when a thread finishes downloading; its code is described below:

// Allocate resources
void THttpGetEx::AssignResource(void)
{
    FSuccesss = new bool[FHttpThreadCount];
    for (int i = 0; i < FHttpThreadCount; i++)
        FSuccesss[i] = false;
    OutTmpFiles = new AnsiString[FHttpThreadCount];
    AnsiString ShortName = ExtractFileName(FOutFileName);
    AnsiString Path = GetSystemTemp();
    for (int i = 0; i < FHttpThreadCount; i++)
        OutTmpFiles[i] = Path + ShortName + "-" + IntToStr(i) + ".hpt";
    HttpThreads = new THttpGetThread *[FHttpThreadCount];
}
// Create a download thread
THttpGetThread *THttpGetEx::CreateHttpThread(void)
{
    THttpGetThread *HttpThread = new THttpGetThread(this);
    HttpThread->URL = FURL;
    ...... // initialize events
    HttpThread->OnComplete = OnThreadComplete; // thread download-completion event
    return HttpThread;
}
// Create the array of download threads
void THttpGetEx::CreateHttpThreads(void)
{
    AssignResource();
    // Obtain the file size to determine where each thread starts downloading
    THttpGetThread *HttpThread = CreateHttpThread();
    HttpThreads[FHttpThreadCount - 1] = HttpThread;
    int FileSize = HttpThread->GetWebFileSize();
    // Divide the file into FHttpThreadCount blocks
    int AvgSize = FileSize / FHttpThreadCount;
    int *Starts = new int[FHttpThreadCount];
    int *Bytes = new int[FHttpThreadCount];
    for (int i = 0; i < FHttpThreadCount; i++)
    {
        Starts[i] = i * AvgSize;
        Bytes[i] = AvgSize;
    }
    // The last block also takes the remainder of the division
    Bytes[FHttpThreadCount - 1] = AvgSize + (FileSize - AvgSize * FHttpThreadCount);
    // Check whether the server supports resumable transfer
    HttpThread->StartPostion = Starts[FHttpThreadCount - 1];
    HttpThread->GetBytes = Bytes[FHttpThreadCount - 1];
    bool CanMulti = HttpThread->SetFilePointer();
    if (CanMulti == false) // not supported: download directly with one thread
    {
        FHttpThreadCount = 1;
        HttpThreads[0] = HttpThread; // the single thread must be at index 0
        HttpThread->StartPostion = 0;
        HttpThread->GetBytes = FileSize;
        HttpThread->Index = 0;
        HttpThread->OutFileName = OutTmpFiles[0];
    } else
    {
        HttpThread->OutFileName = OutTmpFiles[FHttpThreadCount - 1];
        HttpThread->Index = FHttpThreadCount - 1;
        // Resumable transfer is supported: create the remaining threads
        for (int i = 0; i < FHttpThreadCount - 1; i++)
        {
            HttpThread = CreateHttpThread();
            HttpThread->StartPostion = Starts[i];
            HttpThread->GetBytes = Bytes[i];
            HttpThread->OutFileName = OutTmpFiles[i];
            HttpThread->Index = i;
            HttpThreads[i] = HttpThread;
        }
    }
    // Delete the temporary arrays (allocated with new[], so use delete[])
    delete[] Starts;
    delete[] Bytes;
}
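The chunk arithmetic used in CreateHttpThreads can be checked in isolation. The sketch below reproduces the same rule (equal blocks, with the last block absorbing the remainder) using hypothetical names Chunk and PlanChunks and standard containers in place of raw arrays.

```cpp
#include <cassert>
#include <vector>

struct Chunk {
    unsigned long start;  // offset of the first byte of the chunk
    unsigned long bytes;  // number of bytes in the chunk
};

// Split `fileSize` bytes into `parts` consecutive chunks of equal size;
// the last chunk also receives the remainder of the division.
std::vector<Chunk> PlanChunks(unsigned long fileSize, unsigned int parts)
{
    assert(parts > 0);
    const unsigned long avg = fileSize / parts;
    std::vector<Chunk> chunks(parts);
    for (unsigned int i = 0; i < parts; ++i) {
        chunks[i].start = i * avg;
        chunks[i].bytes = avg;
    }
    chunks[parts - 1].bytes = fileSize - avg * (parts - 1);
    return chunks;
}
```

For a 10-byte file and 3 threads this gives chunks starting at 0, 3, and 6 with sizes 3, 3, and 4, so the sizes always add up to the file size.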

The function that downloads the file is as follows:

void __fastcall THttpGetEx::DownloadFile(void)
{
    CreateHttpThreads();
    THttpGetThread *HttpThread;
    for (int i = 0; i < FHttpThreadCount; i++)
    {
        HttpThread = HttpThreads[i];
        HttpThread->Resume();
    }
}

When a thread finishes downloading, the OnThreadComplete event is fired; the handler determines whether all download threads have finished and, if so, merges all parts of the file. Note that there is a thread-synchronization problem here: without synchronization, several threads firing this event at the same time would conflict with each other and the results would be chaotic. There are many synchronization methods; mine is to create a mutex object.

const char *MutexToThread = "http-get-thread-mutex";
void __fastcall THttpGetEx::OnThreadComplete(TObject *Sender, int Index)
{
    // Create the named mutex, or open it if another thread already created it
    HANDLE hMutex = CreateMutex(NULL, FALSE, MutexToThread);
    // Wait until this thread owns the mutex
    WaitForSingleObject(hMutex, INFINITE);
    // When a thread ends, check whether all threads have finished
    FSuccesss[Index] = true;
    bool S = true;
    for (int i = 0; i < FHttpThreadCount; i++)
    {
        S = S && FSuccesss[i];
    }
    ReleaseMutex(hMutex);
    CloseHandle(hMutex);
    if (S) // the download is complete: merge all parts of the file
    {
        // 1. Copy the first part
        CopyFile(OutTmpFiles[0].c_str(), FOutFileName.c_str(), FALSE);
        // 2. Append the other parts
        int hD = FileOpen(FOutFileName, fmOpenWrite);
        if (hD == -1)
        {
            DoOnError();
            return;
        }
        FileSeek(hD, 0, 2); // move the file pointer to the end
        const int BufSize = 1024 * 4;
        char Buf[BufSize + 4];
        int Reads;
        for (int i = 1; i < FHttpThreadCount; i++)
        {
            int hS = FileOpen(OutTmpFiles[i], fmOpenRead);
            // Copy data
            Reads = FileRead(hS, (void *)Buf, BufSize);
            while (Reads > 0)
            {
                FileWrite(hD, (void *)Buf, Reads);
                Reads = FileRead(hS, (void *)Buf, BufSize);
            }
            FileClose(hS);
        }
        FileClose(hD);
    }
}
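The same "mark my part done, then see whether everything is done" step can be sketched with the standard library's std::mutex instead of a Windows mutex. CompletionTracker is a hypothetical class for illustration; the point of the design is that exactly one caller is told to perform the merge.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

class CompletionTracker {
public:
    explicit CompletionTracker(std::size_t parts)
        : done_(parts, false), remaining_(parts) { assert(parts > 0); }

    // Marks part `index` as finished. Returns true for exactly one caller:
    // the one whose call completes the last outstanding part.
    bool MarkDone(std::size_t index)
    {
        std::lock_guard<std::mutex> lock(mutex_);   // released on return
        if (index >= done_.size() || done_[index]) return false;
        done_[index] = true;
        --remaining_;
        return remaining_ == 0;
    }

private:
    std::mutex mutex_;
    std::vector<bool> done_;
    std::size_t remaining_;
};
```

Each download thread would call MarkDone with its own index when it finishes, and only the thread that receives true goes on to merge the temporary files, so the merge cannot run twice.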

Conclusion

At this point, the key parts of multi-thread download are complete. In real applications, however, there are many more factors to consider, such as network speed and disconnections, as well as further details that cannot be covered one by one for reasons of space. If readers can use this article as a reference to write a download program they are satisfied with, I will be very pleased. I also hope that readers and I can learn from each other and make progress together.

 
