Implement multi‑thread resumable data transfer through FTP in VC-random File

Source: Internet
Author: User
Tags ftp commands ftp site printable characters ftp protocol
Http://www.lihuasoft.net/article/show.php? Id = 2887

I will not talk about the benefits of FTP download here. Many projects will use FTP download as an important function. The wininet class provided by Microsoft can use the following functions:

Internetopen;
Internetconnect;
Getcurrentdirectory;
Setcurrentdirectory;
Ftpgetfile;

It is easy to download FTP.ArticleThere are also many. However, to implement multi-threaded FTP download, using these functions is far-fetched. Using socket based on the FTP protocol will be quite flexible. Next I will explain the entire development process step by step: the development environment BCB (component mode), please make slight changes in the VC environment. After reading this article, BCB developers can not only have a certain understanding of the development principles of flash get and other software, but also have a great guiding role in developing components, please read it patiently. Very easy !!

First, we will introduce some FTP protocols:

 

 

Figure 1 FTP service

To transfer files between FTP and Server FTP, there must be two connections: The Command Channel and the data connection. From the name, we can see that the Command Channel transmits commands, the data channel is used to transfer files. Data transfer between the server and the server is not explained here.

The main Commands used are user, pass, type, size, rest, CWD, PWD, RETR, PASV, port, and quit;

User: The parameter indicates the user's telnet string. User tags are required to access the server. This command is usually the first command issued after the connection is controlled. Some hosts also require passwords and accounts. The server can receive new user commands at any time to change access control and (or) account information. This allows you to re-start the logon process, so the transmission parameters remain unchanged. The ongoing file transmission is completed under the previous access control parameters.
Pass: the parameter is a telnet string that marks the user password. This command follows the USER command, which is an indispensable step for access control on some sites. Therefore, passwords are important and cannot be displayed. The server cannot hide the passwords. Therefore, this task must be completed by the user's FTP process.

Type: The parameter indicates the type. For some types, the second parameter is required. The first parameter is defined by a single Telnet character. The second parameter is a decimal integer that specifies the byte size. The parameters are separated by <SP>. The format is as follows:

Figure 2 type parameters

The default type is ASCII non-printable characters. If the parameter is not changed and only the first parameter is changed later, the default value is used.

Size: return the size of the specified file from the FTP server.

Rest: The parameter field indicates the point where the server wants to start again. This command does not transfer files, but skips the data after the specified point. This command should be followed by other FTP commands that require file transmission.

CWD: This command allows users to work in different directories or datasets without changing their login or account information. The transmission parameters remain unchanged. A parameter is generally a directory name or a collection of system-related files.

PWD: change the current working directory.

RETR: Start to transfer the specified file. (Transfer from the offset specified by the rest parameter)

PASV: This Command requires the server DTP to listen on the specified data port and enter the passive receiving Request status. The parameter is the host and port address.

Port: The data connection port to be used. Generally, no command response is required. If you use this command, you need to send a 32-bit IP address and a 16-bit TCP port number. The above information is transmitted by commas (,) in decimal format.

Quit: log out.
The specific usage of each parameter is as follows:

User Sandy \ r \ N // Log On with the username Sandy
Pass Sandy \ r \ N // The password is Sandy
Type I \ r \ n
Size sandy.txt \ r \ N // If the sandy.txt file exists, the size of the file is returned.
Rest 100 \ r \ N // specify the file transfer offset again
CWD infor/\ r \ N // obtain the current working directory
PWD temp/\ r \ N // change the current working directory
RETR \ r \ N // start File Transfer
PASV \ r \ N // enters Passive Mode
Port H1, H2, H3, H4, P1, P2 \ r \ N // enter the active mode, where H1, H2, H3, and H4 are IP addresses. P1 and P2 are hexadecimal port numbers.

The following describes the usage sequence of each function and some precautions:

The prerequisite for using these commands is that the client and the server have established a connection. For example, the FTP server address is 192.168.1.81 and the port is 21. Use the Winsock API function to establish a socket connection, and then use user and pass to log on to the FTP server. You need to download the file. To ensure that the file must be in the current working directory, you can use the command CWD and PWD. View and change the current working directory. Use the size command to obtain the file size. To download multiple threads, the server must support this function. Generally, we use the REST command at the beginning to determine whether the FTP site supports multi-thread download. The port and PASV commands are used to establish data connections. Their main difference is: port requires you to specify an IP address and port to establish a connection with the server. The PASV command server returns H1, H2, H3, H4, P1, and P2 data for client connection. After the data connection is established, you can use rest and retr for multi-thread and resumable download.

The above describes some basic FTP download knowledge. The following describes how to save resumable upload files.

There are at least 10 methods to store resumable upload files, but each method has advantages and disadvantages, the following describes a file storage method that I often use at work: for example, to download a 364544-byte file, the file name is namelock. avi. Because resumable upload is required, the file size, the size of the downloaded file, and the tasks of each thread must be saved during the download process.

There are two methods:

1. Two files can be generated: The content file and the configuration file.

2. Only one file is required: load the configuration file data to the end of the Content File.

Both of them are good methods. I am using the previous one because of my limited level (access to critical resources is always unable to achieve mutual consistency, and there is always a problem .). The suffix here is a symbolic thing. Taking our company as an example, we have our own MPEG encoding and decoding technology. For example, an MP3 song of 5 MB can be converted to about KB by encoding. fun file (the first three words of funinhand ). We can use our own decoding player to download and play while decoding. The sound quality is comparable to that of MP3. It truly realizes the Streaming Media Technology on mobile phones. Trusted by high-tech companies at home and abroad. (Sorry, it seems like advertising .) Another attempt is as follows:

The suffix used in the content file is the first three letters of my girlfriend's English name (namelock). Nam. The configuration file uses the first three letters of my English name (sandy). San. So writeProgramIt can also be romantic, because my girlfriend has added a few yuan to my monthly pocket money, haha (you can also follow suit ). To put it bluntly, these two files are strictly temporary files. When the files are downloaded, The namelock. Avi. Nam content file should be renamed as namelock. Avi. The namelock. Avi. SAN configuration file should also be deleted in a timely manner.

Multi-thread FTP download technology: I have introduced the file storage skills, mainly for the multi-thread service. Now there is a namelock. AVI file to be downloaded. The file size is 364544 bytes. Use eight download threads. Step 1: divide the namelock. AVI file into eight sub-modules. Here, we should note that what I said is to divide it into eight character modules, instead of storing the file content in eight different buffers respectively. Instead, eight different file offsets are generated. In many cases, programmers tend to read files into the memory at a time to be lazy. The consequence is unimaginable. This is an ideal method.
Bool dealfile (string filename) // you can specify a function.
{
File * file;
DWORD filesize, Pos;
Int readlen;

// Max_buffer_len is defined in the header file, which ensures that data is not lost and the memory does not escape.
Char * buffer = new char [max_buffer_len];
File = fopen (filename. c_str (), "R + B ");
If (file = NULL) return false;
Fseek (file, 0, 2 );
Filesize = ftell (File); // get the file size
Fseek (file, 0, 0 );
Do {
Readlen = fread (buffer, sizeof (char), max_buffer_len, file );
If (readlen> 0)
{
Pos + = readlen;
// Process the Read File
}
} While (Pos <filesize); // read objects cyclically
Delete [] buffer;
Fclose (File); // release resources
Return true;

}

When downloading files from eight threads, you must read and write the content files and configuration files. Otherwise, the file access may fail. I defined a global variable filelocked. If filelocked = true, the file is being accessed by a thread. So sleep (10) is used for sleep. When a thread enters the file reading and writing, it must set filelocked = true; When the file is accessed, it must set filelocked = false; in this way, the access to the file by each thread can be well controlled. (API provides many good solutions for access to critical resources ).

When the 8 download threads download files at the same time, the complete Part of the download is random. So how can we correctly write random file data into the file according to the offset? In this way, when downloading the file namelock. Avi, first check whether the file namelock. Avi. SAN configuration file exists. If yes, it indicates that some of the files have been downloaded last time and can be resumed. If the file is not found, a file of the same size as the file is generated. All data in the file is 0. (You can use the memset function (buffer, 10000, ''0'') and a configuration file. Then, use the fseek function to correctly overwrite the original 0 data. Next, we will introduce the format of the configuration file.

The configuration file contains the absolute path of the file stored locally, the file size, the number of threads, and the size of the downloaded file, tasks of each thread (separate the start position and end position of the original file with '-' in the middle); for example:

D: \ mm \ namelock. Avi // save the file here
364544 // File Size
5 // five threads are being downloaded
0 // 0 bytes have been downloaded
0-72908 // download task of thread 1
72908-145816 // download task of thread 2
145816-218724 // download task of thread 3
218724-291632 // download task of thread 4
291632-364544 // download task of thread 5

The preceding figure shows the task allocation of each thread when the download starts.

 

D: \ mm \ namelock. Avi
364544
5
113868
72908-72908
113868-145816
145816-218724
218724-291632
291632-364544

The preceding figure shows the task allocation of each thread at a specific time point.

The task allocation of each thread is implemented in this way. When downloading starts, the file is evenly divided into several parts for download. For example, the task starting from the first thread starts from the 0 position of the file to the 72908 position. Thread 1 needs to adjust the task after each piece of data is downloaded. For example, if 20800 bytes of data is downloaded for the first time, the task of thread 1 will be changed to 20800-72908. Until the task is 72908-72908, it indicates that thread 1 has completed the current download task. In this case, thread 1 analyzes the tasks of each thread and finds the thread with the busiest task, for example, thread 3: 14816-218724. Then thread 1 automatically adjusts the task and downloads the 50% task again. Until all threads complete the task. Note: To avoid repeated downloading of part of the data, you must add the number of bytes to receive the buffer when adjusting the task, as shown in the preceding column. When the load is balanced between thread 1 and thread 3, the thread is downloading data. If the remaining data is smaller than the buffer size, the data downloaded from thread 1 and thread 3 will be repeated.

A problem is found during Task adjustment and analysis. That is, reading file data too frequently. So I used a data structure. The configuration file is always opened during the file download process, which improves the speed a lot. Close the file after the file is downloaded. The data structure is as follows:
typedef struct fromtoimpl {
DWORD from; // task start position
DWORD to; // task end position
}m_fromto;
typedef struct infroimpl {
string fileload; // file storage location
DWORD filesize; // file size
int threadcnt; // Number of download threads
DWORD alreadydownloadcnt; // the size of the downloaded file
fromtoimpl * fromtoimpl; // task description of each thread
}m_inforimpl;

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.