FTP Multi-Threaded download

Source: Internet
Author: User
Tags ftp site ftp protocol

The benefits of FTP downloads I'm not going to say much here, but many projects will be implemented as an important feature of FTP downloads.   The WinInet classes offered by Microsoft can take advantage of the following functions: InternetOpen; internetconnect; getcurrentdirectory; setcurrentdirectory; ftpgetfile; Easy to achieve FTP download, online on this aspect of the article is also a lot. However, in order to implement the FTP multithread download, the use of these functions is a bit far-fetched. Using the socket based on the FTP protocol to develop will become very flexible. Below I will gradually explain the whole development process: development environment BCB (component mode), VC environment, please make a little change. After reading this article for BCB developers, not only can FlashGet and other software development principles have a certain understanding, especially in the development of components also has a great role in guiding, please read it patiently.   Very simple.. First, introduce some FTP protocols:

Figure One FTP service schematic

User FTP and server FTP to transfer files, need to have two connections: command channel and data connection, from the name can be seen that the command channel is the transmission command, the data channel is used to transfer files. The data transfer between server and server is not explained here much.   The main commands used are: user,pass,type,size,rest,cwd,pwd,retr,pasv,port,quit; User: parameter is a telnet string that marks the user. User tags are required to access the server, which is usually the first command to be issued after the connection is controlled, and some hosts will also require passwords and accounts. The server can receive new user commands at any time to change access control and/or account information. This can restart the login process, so the transmission parameters are unchanged and the file transfer in progress is completed under the previous access control parameters. Pass: The parameter is a telnet string that marks the user's password. This command immediately follows the user command, and at some sites it is an integral step in completing access control.   So the password is an important thing, so it cannot be displayed, the server side has no way to hide the password, so this task has to be completed by the user FTP process. Type: parameter specifies the presentation type. Some types require a second argument, the first parameter is defined by a single Telnet character, the second argument is a decimal integer that specifies the byte size, and the arguments are delimited. The following are the formats:

Figure two type parameter schematic diagram

The default representation type is ASCII nonprinting characters, and if the parameter is unchanged and only the first argument is changed later, the default value is used.   Size: Parameter returns the specified file sizes from the FTP server.   REST: The parameter field represents the point at which the server wants to restart, not the file, but the data after the specified point, which should be followed by an FTP command that requires file transfer. CWD: This command allows a user to work in a different directory or dataset without changing its login or account information. The transfer parameters are unchanged.   parameter is typically a directory name or a system-related collection of files.   PWD: Change the current working directory. RETR: Starts routing the specified file.   (starting at the offset specified by the rest parameter) PASV: This command requires the server DTP to listen on the specified data port and enter the status of the passive receive request, which is the host and port address. PORT: The parameter is the data connection port to use, and typically does not require a command response. If you use this command, you send a 32-bit IP address and a 16-bit TCP port number.   The information above is set in 8-bit order, comma-separated decimal. QUIT: Exit login.

Examples of the specific uses of each parameter are as follows:

User sandy/r/n//username is Sandy Login pass sandy/r/n//password is Sandy TYPE i/r/n size sandy.txt/r/n//If Sandy.txt file exists, then return the file size REST 100/r/n//re-assign File transfer offset CWD infor//r/n//Get current working directory PWD temp///change Current working directory/r/n//start transferring file retr/r/n//Enter passive mode PORT h 1,h2,h3,h4,p1,p2/r/n//Enter active mode, H1,H2,H3,H4 for 4 parts of IP address. P1,P2 is a 16-in port number.

  Here's how to use the various functions and what you should note: The prerequisite for using these commands is that the client and server side establish a connection. For example, FTP server address: 192.168.1.81, Port: 21. Then use the Winsock API function to establish the socket connection, and then use User,pass to login to the FTP server. You need to download the file, make sure the file must be in the current working directory, and use the command CWD and PWD. View and change the current working directory. Get the size of the file using the size command. We want multithreaded downloads, so we require the server to support this feature. Generally we will start with the rest command to determine whether the FTP site supports multithreaded downloads. Port and PASV Two commands are used to establish data connections. Their main difference is that port requires you to specify an IP address and port to establish a connection with the server. The PASV command server returns H1,H2,H3,H4,P1,P2-style data for client connections.   After the data connection is established, it is possible to use REST,RETR for multithreading and breakpoint renewal file download.   The above explains a little ftp download the basic knowledge, the following is mainly about the continuation of the breakpoint file preservation techniques. If you want to talk about the continuation of the file save way to say at least 10, but various methods have pros and cons, the following mainly describes a kind of file I often use in the work of the way: for example, to download a 364544-byte file, the file name is: Namelock.avi.   Because you want the breakpoint to be transmitted, you must save the file size, the size of the downloaded file, and the tasks for each thread during the download process.   There are two ways: one, you can produce two files: content files and configuration files.   Second, just one file: Load the data of the configuration file to the end of the content file. Both of these are good ways. I use the previous one because my level is limited, (access to critical resources is always not bache, old problems.) )。 The suffix name here wants to be in the heart of it, and the suffix is a symbolic thing. Take our company, has its own MPEG coding, decoding technology, such as the original 5m MP3 song, through the encoding can be converted to 500K or so of the. Fun file (Funinhand the first three words). And then use our own decoding player to download side of the decoding side play, the sound quality and MP3 is comparable. The real realization of the mobile phone streaming media technology. By the domestic and foreign high-tech large companies trust. (Sorry, it's kind of like advertising.) Another attempt to speak of these is as follows: The suffix name used in the content file is the first three letters of my girlfriend's English name (Namelock). Nam. The configuration file uses the first three letters of my own English name (Sandy). San. So write program can also be very romantic, because of this, girlfriend gave my monthly life allowance increased a few yuan, haha (we can also follow). Back to the point, these two documents strictly speaking is a temporary file, when the file download finished, Namelock.avi.nam content file should be renamed: Namelock.avi. Namelock.avi.san configuration files should also be deleted in a timely manner.

FTP Multithreading Download Technology Section: I introduced the file preservation techniques, mainly for multithreaded services. Now there is a Namelock.avi file to download. The file size is: 364544 bytes. To use 8 download threads. Step one: Divide the Namelock.avi file into 8 sub modules. The place to note here is what I call the 8-word module, not the contents of the file stored in 8 separate buffers. Instead, 8 different file offsets are generated. Most of the time programmers to lazy often easy to read the file into memory, the consequences of this is unimaginable. A more ideal approach is this. BOOL Dealfile (string fileName)//casually write a function description {FILE *file; DWORD fileSize, POS; int Readlen; Max_buffer_len is defined in the header file to ensure that the data is not lost, and that memory escapes char *buffer = new Char[max_buffer_len];   File = fopen (Filename.c_str (), "r+b"); if (file = = NULL) return false; Fseek (file,0,2); FileSize = ftell (file); Obtain the size of the file fseek (file,0,0); do{Readlen = fread (buffer,sizeof (char), max_buffer_len,file); if (Readlen > 0) {pos + = Readlen;//processing of Read files}}while (PO s < fileSize); Loop read file delete[] buffer; fclose (file); Release the resource return true; }

8 threads When downloading files, read and write content files and configuration files. So if not handled well, it is likely to cause access to the file failure, I defined a global variable filelocked if the filelocked=true description file is being accessed by a thread. So sleep waiting is used (10). When a thread enters a read-write file, it must set filelocked = true; The file must be filelocked = False when it is accessed, so that each thread's access to the file is well controlled.   (Access to critical resources has an API that provides a number of good solutions, please refer to). When 8 download threads download the file simultaneously, the complete part of the download is random. So how to put the random file data in accordance with the correct offset to write to the file. This is how I realized that when I wanted to download the file Namelock.avi, I first looked for the file Namelock.avi.san configuration file. If it exists, the last time you have downloaded some of the file, you can continue to pass the breakpoint. If the file is not found, then a file of the same size as the file is generated, and all the data in the file is 0 (you can use the function memset (buffer,10000, ' ' 0 ')) and a configuration file. Then use the Fseek function to correct the data to cover the original 0, and then introduce the format of the configuration file.

Very simple, the contents of the configuration file mainly include: the absolute path of the file to be saved locally, the size of the file, the number of threads, the size of the file already downloaded, the task of each thread (at the beginning and end of the original file, the middle use '-'); D:/mm/namelock.avi// File saved here 364544//File size 5//5 threads in download 0//downloaded 0 bytes 0-72908//thread 1 Download task 72908-145816/thread 2 download Task 145816-218724 -291632//Thread 4 Download task 291632-364544//thread 5 download task

These are the task assignments for each thread at the start of the download. D:/mm/namelock.avi 364544 5 113868 72908-72908 113868-145816 145816-218724 218724-291632 291632-364544

These are the task assignments for each thread at a particular point in time. This is achieved for each thread task assignment. At the start of the download, the file is divided into chunks for downloading. The first thread starts with the task of downloading from the file's 0-bit to the 72,908-bit location. Thread 1 after each download piece of data to adjust the task, such as the first download of 20800 bytes of data, then thread 1 task will be changed to: 20800-72908. This goes on until the task is 72908-72908 to indicate that thread 1 completes the current download task. At this point, thread 1 analyzes the tasks of each thread and finds one of the most busy threads: Thread 3:14816-218724. Then thread 1 will automatically adjust the task, take 50% of the task to download again. Cycle until each thread completes the task. But here's one thing to note: In order to avoid downloading part of the data repeatedly, when you adjust the task, the starting file will be shifted to the number of bytes that receive the buffer, as shown in the example above.   Thread 1 and thread 3 while the load is being balanced, the thread is downloading the data, and if the remaining data is smaller than the size of the buffer being received, part of the download data for thread 1 and thread 3 will be duplicated. When you adjust tasks and analyze tasks, you will find a problem. is to read the file data too frequently. So I used a data structure. The configuration file is always open during the download of the file, so the speed is much higher. Closes the file after the file has been downloaded. Data structure is as follows: typedef struct fromtoimpl{DWORD from;//task start position DWORD to;//task end position}m_fromto; typedef struct infroimpl{String fileload//File save location DWORD FileSize;//file size int threadcnt;//Download thread number DWORD alreadydownloadcnt ; The file size that has been downloaded Fromtoimpl *fromtoimpl; Task description}m_inforimpl for each thread;

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.