Design and Implementation of Multi-Threaded FTP Download

National Defense University of Science and Technology Microsoft Club
 

Introduction to the FTP protocol

The File Transfer Protocol (FTP) is a protocol for transferring files between computers. The term FTP is also commonly used for the file transfer service itself.

FTP is a widely used communication protocol on the Internet: a set of rules that lets Internet users copy files from one host to another. Like other Internet services, FTP uses the client/server model. Its use is simple: start an FTP client program, establish a connection with the remote host, and send commands to it. The remote host executes each command it receives and returns a response. FTP has one fundamental restriction: a user who is not authorized by an FTP host cannot access that host. In other words, a user who has not registered on the host and has no user name or password cannot transfer files to or from it. Anonymous FTP removes this restriction. FTP is independent of the operating system: programs on any operating system can exchange data with each other as long as they comply with the protocol.

In the TCP/IP network architecture, FTP is an application-layer protocol. It uses TCP at the transport layer to carry its commands and data.

FTP application

The FTP application uses a client/server (C/S) architecture. The client program provides a command-line or graphical interface, translates user commands into FTP commands, and sends them to the server program. The server program executes each FTP command and reports success or failure to the client in the form of an FTP response. Both sides follow the FTP protocol to carry out the file transfer service.

Common client programs include graphical tools such as CuteFTP and FlashGet, and the command-line ftp command (Windows). Commonly used server programs include Serv-U FTP Server and vsftpd.

Common FTP commands

Common FTP commands are shown in Table 1:

Command   Description
-------   -----------
USER      User name
PASS      Password
REST      Specify the file offset at which the transfer starts
RETR      Download (retrieve) a file
STOR      Upload a file
LIST      Get the list of files in the current directory
TYPE      Set the data format for transmission (binary or character format)
PORT      Ask the server to actively connect to a port the client is listening on
PASV      Ask the server to listen on a port and accept the client's connection request
CWD       Change the current directory on the server
QUIT      Exit

FTP Response

The FTP protocol specifies that an FTP response starts with a three-digit code followed by the response text. The first digit of the code has the following meaning:

• 1: positive preliminary reply; the requested action is starting (for example, data transmission is about to begin), and another reply will follow.

• 2: positive completion reply; the requested action completed successfully.

• 3: positive intermediate reply; the command was accepted and the server is waiting for a further command.

• 4: the operation failed, but it can be retried later.

• 5: the operation failed and should not be retried.

For example, suppose the client sends the command "SIZE test.txt" to an FTP server with which it has established a connection. If test.txt exists in the current directory, the server may return "213 122220": the code 213 indicates that the operation completed successfully, and the size of test.txt is 122220 bytes.
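As an illustration of the reply format, here is a minimal sketch in C that extracts the three-digit code and, for a 213 reply to SIZE, the file size. The function names ftp_reply_code and ftp_parse_size are hypothetical and are not part of netkit-ftp.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: parse the three-digit code at the start of a reply.
 * Returns the code (100-599), or -1 if the line is malformed. */
int ftp_reply_code(const char *line)
{
    assert(line != NULL);
    if (!isdigit((unsigned char)line[0]) ||
        !isdigit((unsigned char)line[1]) ||
        !isdigit((unsigned char)line[2]))
        return -1;
    int code = (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
    return (code >= 100 && code <= 599) ? code : -1;
}

/* Hypothetical helper: extract the size from a "213 <size>" reply to SIZE.
 * Returns the size in bytes, or -1 if this is not a well-formed 213 reply. */
long ftp_parse_size(const char *line)
{
    if (ftp_reply_code(line) != 213 || line[3] != ' ')
        return -1;
    char *end;
    long size = strtol(line + 4, &end, 10);
    if (end == line + 4 || size < 0)
        return -1;          /* no digits after the code, or a negative value */
    return size;
}
```

Dividing the code by 100 gives the reply class from the list above, so a caller can test for success with ftp_reply_code(line) / 100 == 2.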

FTP file download process

Two connections exist between the FTP server and the client. The control connection carries FTP commands (commands are always initiated by the client) and stays open for the whole session. The data connection carries data: it is established whenever a file or a directory listing is to be transferred, and it is closed as soon as the transfer is complete.

There are two ways to establish a data connection: PORT and PASV. In PORT mode, the client listens on a port and the server initiates the data connection. In PASV mode, the server listens on a port and the client initiates the data connection. The client sends the PORT or PASV command to the server to select the connection mode.
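In PASV mode the server answers with a 227 reply that carries its address and data port as six decimal numbers, (h1,h2,h3,h4,p1,p2), where the port is p1 * 256 + p2. The following C sketch shows one way to decode such a reply; the function name parse_pasv_reply is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode "227 ... (h1,h2,h3,h4,p1,p2)".
 * Writes the dotted address into host and the port into *port.
 * Returns 0 on success, -1 if the reply cannot be parsed. */
int parse_pasv_reply(const char *reply, char *host, size_t hostlen, int *port)
{
    assert(reply != NULL && host != NULL && port != NULL);
    unsigned h1, h2, h3, h4, p1, p2;

    if (strncmp(reply, "227", 3) != 0)
        return -1;
    const char *p = strchr(reply, '(');
    if (p == NULL)
        return -1;
    if (sscanf(p, "(%u,%u,%u,%u,%u,%u)", &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 || p1 > 255 || p2 > 255)
        return -1;

    int n = snprintf(host, hostlen, "%u.%u.%u.%u", h1, h2, h3, h4);
    if (n < 0 || (size_t)n >= hostlen)
        return -1;                      /* host buffer too small */
    *port = (int)(p1 * 256 + p2);       /* high byte first, then low byte */
    return 0;
}
```

The client then opens a TCP connection to the decoded address and port to form the data connection.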

Before downloading a file, the client must first log in to the server. The login states are shown in Figure 4. Boxes indicate states and arrows indicate state transitions. The text on an arrow is the command sent by the client to the server or the information returned by the server. B indicates the start state, W the wait state, E an error, and S a successful login.

After logging in successfully, the client sends the TYPE command to set the data transmission format (character or binary) and then sends PORT or PASV to select the data connection mode. The data connection is set up according to the information in the server's reply. If all of these commands receive successful replies, the client asks the server to prepare the data: first it sends the REST command (if necessary) to specify the offset at which the download starts, and then it sends the RETR command to specify the file to download.
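The command sequence described above can be sketched in C as follows. This is only an illustration under stated assumptions: the function name build_download_commands and the constants MAX_CMDS and CMD_LEN are hypothetical, binary mode ("TYPE I") and passive mode are chosen for the example, and a real client sends the commands one at a time and checks each reply before sending the next.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CMDS 4      /* TYPE, PASV, optional REST, RETR */
#define CMD_LEN  512    /* hypothetical maximum command length */

/* Hypothetical helper: fill cmds with the commands for one download,
 * in the order they are sent. REST is included only when offset > 0.
 * Returns the number of commands, or -1 if the file name is too long. */
int build_download_commands(char cmds[MAX_CMDS][CMD_LEN],
                            const char *remote, long offset)
{
    assert(cmds != NULL && remote != NULL && offset >= 0);
    /* "RETR " + name + "\r\n" + NUL must fit in CMD_LEN bytes */
    if (strlen(remote) > CMD_LEN - 8)
        return -1;

    int n = 0;
    snprintf(cmds[n++], CMD_LEN, "TYPE I\r\n");   /* binary transfer */
    snprintf(cmds[n++], CMD_LEN, "PASV\r\n");     /* passive data connection */
    if (offset > 0)
        snprintf(cmds[n++], CMD_LEN, "REST %ld\r\n", offset);
    snprintf(cmds[n++], CMD_LEN, "RETR %s\r\n", remote);
    return n;
}
```

Each command ends with a carriage return and line feed, which is how FTP terminates command lines on the control connection.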

If all of these commands receive successful replies, the server has the data ready. The client then establishes the data connection with the server, starts receiving data, and saves the received data in a local file until the data connection is closed, at which point the download is complete.

FTP does not provide a command for multi-threaded download, but multi-threaded download can still be implemented. Because the REST command specifies an offset, the client can create several threads that download different parts of the file at the same time and then reassemble the parts. The fourth section describes this in detail.

The main task is to add a multi-threaded download command to the provided FTP client source code (netkit-ftp). It is therefore necessary to understand the structure of the source code before implementing the multi-threaded download function.

The source code is written in standard C and can be downloaded from ftp://ftp.uk.linux.org/pub/linux/Networking/netkit. Its comments are detailed and clear, and the code is readable. The program handles interface logic and business logic separately.

The source code consists of 11 files: main.c, glob.c, glob.h, cmds.c, cmds.h, cmdtab.c, ftp.c, ftp_var.h, domacro.c, pathnames.c, and ruserpass.c.

glob.c, glob.h, ftp_var.h, cmdtab.c, domacro.c, pathnames.c, and ruserpass.c mainly define macros, variables, and global functions. cmdtab.c defines all the commands supported by the program and the functions that implement them. main.c implements the main loop of the program: it waits for the user to enter a command and passes the command to the command-line parameter processing module implemented in cmds.c and cmds.h. That module checks the validity of the parameters and, once it has valid parameters, passes the user's request to the command processing module implemented in ftp.c, which finally carries out the request (most requests require a connection to the FTP server).

The idea of multi-threaded download is to create multiple threads, connect to the server at the same time, download files from different locations, and then merge the received data into the same file.

Problems to be Solved

The problems to be solved and corresponding solutions are as follows:

• How do threads share data? The biggest advantage of threads is that they can easily share the data of the process that created them. Each thread can therefore share data through global variables.

• How do threads achieve mutual exclusion? Mutually exclusive access to shared variables can be achieved with a thread lock: lock before the access and unlock after it.

• How is the file size obtained? By sending the SIZE command to the server.

• How is the file distributed evenly among the threads? By first dividing the file into parts of equal size.

• How does a thread download data from a specified position? By sending the REST command to the server to specify the offset at which the download starts.

• How are the different parts merged into the same file? Each thread opens the same local file and moves its file pointer so that it saves its own part at the right position.
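The last two points of the list can be sketched in C as follows. This is a minimal illustration, not the article's implementation: the bytes "received" come from an in-memory buffer instead of an FTP data connection, and the names write_in_parts, write_part, struct part, and NPARTS are hypothetical. Each thread opens the shared local file itself, seeks to its own offset, and writes its part; a mutex guards the shared progress counter.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NPARTS 4                    /* hypothetical number of threads */

struct part {
    const char *path;               /* local file shared by all threads */
    const char *data;               /* stand-in for data from the server */
    long start, end;                /* this thread writes bytes [start, end) */
    int ok;                         /* set to 1 when the part is complete */
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long bytes_done;             /* shared counter, guarded by lock */

static void *write_part(void *arg)
{
    struct part *p = arg;
    int fd = open(p->path, O_WRONLY);   /* own descriptor, own file pointer */
    if (fd < 0)
        return NULL;
    long pos = p->start;
    if (lseek(fd, (off_t)pos, SEEK_SET) != (off_t)-1) {
        while (pos < p->end) {
            ssize_t n = write(fd, p->data + pos, (size_t)(p->end - pos));
            if (n <= 0)
                break;
            pos += n;
        }
    }
    close(fd);
    p->ok = (pos == p->end);
    pthread_mutex_lock(&lock);          /* lock before touching shared data */
    bytes_done += pos - p->start;
    pthread_mutex_unlock(&lock);        /* unlock afterwards */
    return NULL;
}

/* Returns the number of bytes written, or -1 on failure. */
long write_in_parts(const char *path, const char *data, long len)
{
    assert(path != NULL && data != NULL && len >= 0);
    struct part parts[NPARTS];
    pthread_t tid[NPARTS];
    int created = 0, ok = 1;
    long chunk = (len + NPARTS - 1) / NPARTS;   /* equal parts, rounded up */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    bytes_done = 0;
    for (int i = 0; i < NPARTS; i++) {
        long s = i * chunk, e = s + chunk;
        if (s > len) s = len;
        if (e > len) e = len;
        parts[i] = (struct part){ path, data, s, e, 0 };
        if (pthread_create(&tid[i], NULL, write_part, &parts[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        ok = ok && parts[i].ok;
    }
    return (ok && created == NPARTS) ? bytes_done : -1;
}
```

Because every thread has its own file descriptor, moving one thread's file pointer does not disturb the others; only the progress counter needs the lock.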

Added global variables and functions

The global variables added to the original program are mainly:

loginuser: records the login user name

password: records the password of the login user

downloadfile: records the name of the file to download

localfile: records the name of the local file to save to

filesize: records the file size

maxthreadnum (macro): defines the maximum number of threads allowed

blocksize (macro): defines the basic unit of data downloaded by a thread, that is, the block size of the file

A thread lock: gives the threads mutually exclusive access to shared variables

threaddata: an array of the custom thread_data structure that records the data of each thread

The thread_data structure is defined as follows:

struct thread_data {
    pthread_t id;     /* the ID of the thread */
    int number;       /* the number of the thread */
    int hasbegin;     /* indicates whether the thread has started */
    int fd;           /* the file descriptor */
    long start;       /* the start offset of its part */
    long end;         /* the end offset of its part */
    int errornum;     /* records the thread's error number */
};

The added functions are mainly:

void mtget(int argc, char *argv[]): mainly checks the validity of the parameters.

mtrecvfile(const int threadnum, char *local, char *remote): performs the multi-threaded download.

long threadfunction(char *argv): the function run by each thread; it downloads one part of the file.

int connectftp(char *host, int iport): connects to the server.

int sendcommand(int sock, char *str): sends a command to the server.

Steps

• Add the global variables mentioned above to ftp_var.h.

• In cmdtab.c, add the variable const char mtreceivehelp[] = "receive file by several threads"; and add a row to the initializer of the global variable cmdtab: { "mtget", mtreceivehelp, 1, 1, 1, mtget, NULL, NULL }

• Define the mtget function in cmds.c to check the validity of the parameters.

• Define the mtrecvfile function in ftp.c to implement the multi-threaded download. Its main tasks are to initialize the variables that record the state of each thread, check that the local file can be created, obtain the size of the file to download, allocate the work among the threads (the allocated ranges are recorded in the threaddata variable), and create the threads that carry out the download.

• Add the threadfunction function to ftp.c; it contains the code that performs the download.

Task allocation algorithm. If the file size is 18 KB, the number of threads is 5, and blocksize is 2048, the allocation works as follows: first the file is divided into nine basic units of blocksize bytes, and then the nine units are distributed evenly among the five threads, so each thread handles two basic units (4 KB), except the last thread, which handles the one remaining unit.

The download process for the thread numbered i is:

1. Wait in a loop until threaddata[i-1].start is set, and then continue. This ensures that the threads start in order.

2. Check whether threaddata[i].start is 1. If it is 1, exit, because another thread is already handling this thread's task; otherwise continue.

3. Read the global variables to get the FTP server's IP address and port number and the login user name and password, and log in to the server.

4. Send the TYPE command to set the data transmission format to binary, select passive mode and obtain the server's data port, and send REST to specify the offset at which this thread starts downloading.

5. Open the local file and move the file pointer to the corresponding offset.

6. Establish the data connection with the server and start receiving data.

7. After this thread has downloaded its own data, go through the other threads, find a part whose download has not started, and try to download it.

Note: after each request is sent to the server, check the reply to confirm that the server is in the expected state; otherwise, try again. A mutex must be held when accessing sensitive shared data; otherwise the same part may be downloaded twice.
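The task allocation described above can be sketched in C as follows. This is one reading of the description, not the article's code: the file is split into blocks of blocksize bytes, each thread receives the same number of whole blocks (rounded up), and the last range is clipped to the file size. The names allocate_ranges and struct range are hypothetical.

```c
#include <assert.h>

struct range {
    long start;     /* offset of the first byte of the part */
    long end;       /* offset one past the last byte of the part */
};

/* Hypothetical helper: fill out[0..nthreads-1] with byte ranges.
 * Returns the number of threads that received a non-empty range. */
int allocate_ranges(long filesize, long blocksize, int nthreads,
                    struct range *out)
{
    assert(filesize >= 0 && blocksize > 0 && nthreads > 0 && out != 0);
    long blocks = (filesize + blocksize - 1) / blocksize;     /* round up */
    long per_thread = (blocks + nthreads - 1) / nthreads;     /* round up */
    int used = 0;

    for (int i = 0; i < nthreads; i++) {
        long s = (long)i * per_thread * blocksize;
        long e = s + per_thread * blocksize;
        if (s > filesize) s = filesize;     /* clip to the end of the file */
        if (e > filesize) e = filesize;
        out[i].start = s;
        out[i].end = e;
        if (e > s)
            used++;
    }
    return used;
}
```

For the 18 KB example with five threads and a block size of 2048, the first four threads receive 4096 bytes each and the last thread receives the remaining 2048 bytes, starting at offset 16384. The start value of each range is the offset a thread would pass to REST.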

Compilation and debugging

To simplify compilation, the Makefile was rewritten. The -g compilation option was added so that the program can be debugged with GDB, and the link option -lpthread was added for the multi-threading library. Its content is as follows:

EXEC = ftp
CC = cc
INCLUDES = -I
CFLAGS = -g
LIBS = -lpthread
OBJS = cmds.o cmdtab.o domacro.o ftp.o glob.o main.o ruserpass.o

all: $(EXEC)

%.o: %.c
	$(CC) $(CFLAGS) -c $<

# libraries are listed after the objects that use them
$(EXEC): $(OBJS)
	$(CC) $(CFLAGS) -o $@ $(OBJS) $(LIBS) $(LDLIBS)

romfs:
	$(ROMFSINST) /bin/$(EXEC)

clean:
	-rm -f $(EXEC) *.gdb *.elf *.o

GDB, a command-line debugger, was chosen as the debugging tool. GDB provides four main functions:

1. Start your program and run it under the conditions you specify.

2. Stop the program at specified breakpoints (a breakpoint can have a conditional expression).

3. When the program is stopped, examine what is happening in it.

4. Dynamically change the execution environment of your program.

Further Improvement

Given the multi-threaded design described above, it is easy to add resumable download (resuming from a breakpoint). During a download, each thread has a variable that records its current state. When the connection to the server is lost or the download is terminated by the user, the state of each thread is saved to a file. When the download is started again, the saved breakpoints are loaded from the file and each thread resumes from the point where it stopped, until the download is complete.
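A minimal sketch of saving and restoring such breakpoints in C is shown below. The state file format (a count followed by one "start done end" line per thread) and the names save_state, load_state, and struct segment are assumptions made for illustration; the article does not specify them.

```c
#include <assert.h>
#include <stdio.h>

struct segment {
    long start;     /* first offset assigned to the thread */
    long done;      /* next offset still to be downloaded */
    long end;       /* offset one past the end of the part */
};

/* Hypothetical helper: write n segments to a text file.
 * Returns 0 on success, -1 on failure. */
int save_state(const char *path, const struct segment *seg, int n)
{
    assert(path != NULL && seg != NULL && n >= 0);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%d\n", n);
    for (int i = 0; i < n; i++)
        fprintf(f, "%ld %ld %ld\n", seg[i].start, seg[i].done, seg[i].end);
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read at most max segments back.
 * Returns the number of segments loaded, or -1 on failure. */
int load_state(const char *path, struct segment *seg, int max)
{
    assert(path != NULL && seg != NULL && max >= 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int n;
    if (fscanf(f, "%d", &n) != 1 || n < 0 || n > max) {
        fclose(f);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (fscanf(f, "%ld %ld %ld",
                   &seg[i].start, &seg[i].done, &seg[i].end) != 3) {
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return n;
}
```

On restart, each thread would send REST with the done offset of its segment instead of the original start offset, and skip the segment entirely when done equals end.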

