Design and implementation of FTP multi-threaded download


                                  
National University of Defense Technology Microsoft Club


Introduction to the FTP protocol

The File Transfer Protocol (FTP) is a protocol for transferring files between computers. The service built on it is usually called the FTP service.

FTP is a widely used communication protocol on the Internet. It is a set of rules for file transfer that lets Internet users copy files from one host to another, which is a great convenience.

Like other Internet services, FTP follows the client/server model, and using it is simple. The user starts an FTP client, which connects to the remote host and sends it commands. The remote host executes each command it receives and sends back a response.

FTP has one basic restriction: a user who has not been authorized by an FTP host cannot access it, that is, cannot log in to the host remotely. Without a user name and password registered on that host, the user cannot transfer files to or from it. Anonymous FTP removes this restriction.

The FTP protocol is independent of the operating system. Programs on any operating system can exchange files with each other as long as they conform to the FTP protocol.

In the TCP/IP network architecture, FTP belongs to the application layer. In the protocol stack it runs on top of TCP.

FTP Applications

FTP applications use the client/server (C/S) architecture. The client program provides a command-line or graphical interface. It translates the user's commands into FTP commands and sends them to the server program. The server program carries out each FTP command and reports success or failure to the client as an FTP reply. Because both sides follow the FTP protocol, together they provide the file transfer service.

Common client programs include graphical tools such as CuteFTP and FlashGet, and the command-line ftp command that ships with Windows. Widely used server programs include Serv-U FTP Server and vsftpd.

Common FTP Commands

Table 1 lists the common FTP commands:

Command   Description
USER      User name for login
PASS      Password for login
REST      Set the file offset at which the next transfer starts
RETR      Retrieve (download) a file
STOR      Store (upload) a file
LIST      List the files in the current directory
TYPE      Set the data representation (binary image or ASCII text)
PORT      Ask the server to connect to a port on which the client is listening (active mode)
PASV      Ask the server to listen on a port and accept a connection from the client (passive mode)
CWD       Change the server's current directory
QUIT      Log out and close the connection

FTP Replies

The FTP protocol specifies that an FTP reply begins with a three-digit code followed by a text message. The first digit of the code has the following meaning:

• 1: Positive preliminary reply. The requested action is being started, for example a data transfer is about to begin.

• 2: Positive completion reply. The operation completed successfully.

• 3: Positive intermediate reply. The command was accepted and the server is waiting for a further command.

• 4: Transient negative reply. The operation failed, but it may succeed if tried again later.

• 5: Permanent negative reply. The operation failed and should not be retried.

For example, suppose you send the command "SIZE test.txt" to a connected FTP server and test.txt exists in the current directory. The server may return "213 122220". This means the operation completed successfully and test.txt is 122220 bytes long.
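Because every reply starts with a fixed three-digit code, a client can classify replies with a small parser. The sketch below assumes single-line replies. The names ftp_reply_code, ftp_reply_class and ftp_parse_size are hypothetical helpers, not functions from netkit-ftp. Note that SIZE is not defined in RFC 959 itself; it was standardized later in RFC 3659.

```c
#include <ctype.h>
#include <stdlib.h>

/* Reply classes given by the first digit of the reply code (RFC 959). */
enum reply_class {
    REPLY_INVALID = 0,
    REPLY_PRELIMINARY = 1,   /* 1yz: action started, another reply follows */
    REPLY_COMPLETION = 2,    /* 2yz: completed successfully */
    REPLY_INTERMEDIATE = 3,  /* 3yz: waiting for a further command */
    REPLY_TRANSIENT = 4,     /* 4yz: failed, may be retried later */
    REPLY_PERMANENT = 5      /* 5yz: failed, do not retry */
};

/* Return the three-digit code at the start of a reply line, or -1. */
int ftp_reply_code(const char *line)
{
    if (!isdigit((unsigned char)line[0]) || !isdigit((unsigned char)line[1]) ||
        !isdigit((unsigned char)line[2]))
        return -1;
    return (line[0] - '0') * 100 + (line[1] - '0') * 10 + (line[2] - '0');
}

/* Map a reply code to its class using the first digit. */
enum reply_class ftp_reply_class(int code)
{
    if (code < 100 || code > 599)
        return REPLY_INVALID;
    return (enum reply_class)(code / 100);
}

/* Parse the reply to "SIZE <file>", which has the form "213 <bytes>".
 * Returns the size in bytes, or -1 for any other reply. */
long ftp_parse_size(const char *line)
{
    char *end;
    long size;

    if (ftp_reply_code(line) != 213 || line[3] != ' ')
        return -1;
    size = strtol(line + 4, &end, 10);
    return end == line + 4 ? -1 : size;
}
```

For the reply in the example, ftp_parse_size("213 122220") yields 122220. A "550" reply (file not found) is classified as REPLY_PERMANENT.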

FTP File Download Flow

There are two connections between the FTP client and the server.

The control connection carries FTP commands, which are always initiated by the client. It stays open for the whole session.

The data connection carries data to the client. A new data connection is established each time a file or a directory listing is transferred, and it is closed as soon as the transfer finishes.

A data connection can be established in two ways, PORT and PASV. In PORT mode the client listens on a port and the server initiates the data connection. In PASV mode the server listens on a port and the client initiates the data connection. The client selects the mode by sending the PORT or PASV command.
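In PASV mode, the client must extract the server's data address from the 227 reply before it can connect. The sketch below assumes the common reply form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)". In that form the data port is p1*256 + p2. The name ftp_parse_pasv is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)". Writes the
 * dotted IPv4 address into ip and returns the data port p1*256 + p2,
 * or -1 if the reply is not in this form. */
int ftp_parse_pasv(const char *reply, char *ip, size_t iplen)
{
    unsigned h1, h2, h3, h4, p1, p2;
    const char *p = strchr(reply, '(');

    if (strncmp(reply, "227", 3) != 0 || p == NULL)
        return -1;
    if (sscanf(p + 1, "%u,%u,%u,%u,%u,%u", &h1, &h2, &h3, &h4, &p1, &p2) != 6)
        return -1;
    if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 || p1 > 255 || p2 > 255)
        return -1;
    snprintf(ip, iplen, "%u.%u.%u.%u", h1, h2, h3, h4);
    return (int)(p1 * 256 + p2);
}
```

Consider the reply "(192,168,1,10,19,137)". It yields the address 192.168.1.10 and the port 19*256 + 137 = 5001. The client then opens a TCP connection to that address to receive the data.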

Before downloading a file, the client must first log in. The login state diagram is shown in Figure 4. Boxes represent states and arrows represent state transitions. The label on each arrow is either a command the client sends to the server or the reply the server returns. B denotes the start state, W a waiting state, E an error and S a successful login.
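The login part of the diagram can be written as a small state machine. The following is a minimal sketch under these assumptions:

- The reply codes are supplied by the caller rather than read from a server.
- The ACCT step (reply 332) is treated as an error.
- All failures (4yz and 5yz) are folded into E.

The enum and function names are hypothetical.

```c
#include <stddef.h>

/* States from the login diagram: B start, W waiting, S success, E error.
 * W is split into "waiting for the USER reply" and "for the PASS reply". */
enum login_state { LOGIN_B, LOGIN_W_USER, LOGIN_W_PASS, LOGIN_S, LOGIN_E };

/* One transition, driven by the reply code the server sent. */
enum login_state login_step(enum login_state st, int code)
{
    int d = code / 100;

    switch (st) {
    case LOGIN_W_USER:
        if (d == 2)
            return LOGIN_S;        /* 230: logged in, no password needed */
        if (d == 3)
            return LOGIN_W_PASS;   /* 331: client now sends PASS */
        return LOGIN_E;            /* 1yz, 4yz, 5yz */
    case LOGIN_W_PASS:
        if (d == 2)
            return LOGIN_S;        /* 230 or 202 */
        return LOGIN_E;            /* 530 etc.; 332 (ACCT) not supported */
    default:
        return st;                 /* B, S and E do not react to replies */
    }
}

/* Drive the machine over the replies received after USER (and PASS).
 * Returns LOGIN_S, LOGIN_E, or a W state if the replies ran out. */
enum login_state login_run(const int *replies, size_t n)
{
    enum login_state st = LOGIN_W_USER;   /* B -> W: USER has been sent */
    size_t i;

    for (i = 0; i < n && (st == LOGIN_W_USER || st == LOGIN_W_PASS); i++)
        st = login_step(st, replies[i]);
    return st;
}
```

The usual exchange is USER answered by 331, then PASS answered by 230, which ends in S. A 530 reply at either step ends in E.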

After a successful login, the client sends TYPE to set the data representation (ASCII text or binary). It then sends PORT or PASV to choose how the data connection is set up. With PORT, the client waits for a success reply; with PASV, it uses the address in the reply to connect to the server. If these commands succeed, the client sends the requests that tell the server to prepare the data. First it sends REST (if needed) to specify the offset at which the download starts. Then it sends RETR to specify the file to download.

If these commands are answered successfully, the server has the data ready. The client then establishes the data connection with the server, receives the data and saves it to a local file. This continues until the server closes the data connection, at which point the download is complete.
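The download sequence can be tabulated as a list of control-connection commands, each paired with the reply class that lets the client continue. The sketch below assumes passive mode and binary type. The names struct ftp_step, download_steps and step_accepts are hypothetical. In a real client each command is sent and its reply checked before the next one goes out.

Per RFC 959, the expected replies are:

- TYPE succeeds with 200.
- PASV succeeds with 227.
- REST answers 350.
- RETR answers 125 or 150 before the data flows, and 226 once the transfer is complete.

```c
#include <stdio.h>
#include <string.h>

#define CMD_MAX 512

/* A control-connection command and the reply class that lets the
 * client go on to the next step. */
struct ftp_step {
    char cmd[CMD_MAX];   /* command line, CRLF-terminated */
    int expect;          /* expected first digit of the reply */
};

/* Fill steps[] (room for 4 entries) with the commands that download
 * `remote` from byte `offset` in binary, passive mode. Returns the
 * number of steps, or -1 if the file name is too long. */
int download_steps(struct ftp_step *steps, const char *remote, long offset)
{
    int n = 0;

    if (strlen(remote) + sizeof "RETR \r\n" > CMD_MAX)
        return -1;
    snprintf(steps[n].cmd, CMD_MAX, "TYPE I\r\n");        /* binary data */
    steps[n++].expect = 2;                                /* 200 */
    snprintf(steps[n].cmd, CMD_MAX, "PASV\r\n");          /* server listens */
    steps[n++].expect = 2;                                /* 227 + address */
    if (offset > 0) {
        snprintf(steps[n].cmd, CMD_MAX, "REST %ld\r\n", offset);
        steps[n++].expect = 3;                            /* 350: send RETR */
    }
    snprintf(steps[n].cmd, CMD_MAX, "RETR %s\r\n", remote);
    steps[n++].expect = 1;                                /* 125/150 */
    return n;
}

/* Does this reply code allow the client to continue? */
int step_accepts(const struct ftp_step *s, int code)
{
    return code / 100 == s->expect;
}
```

With an offset of 0, the REST step is skipped and the download starts at the beginning of the file.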

The FTP protocol itself has no multithreaded download command, but it makes multithreaded downloading possible. Because REST can specify an offset, we can create several threads that download different parts of the file at the same time and then reassemble the parts. Part 4 describes this in detail.
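One way to split the file is to cut it into fixed-size blocks and give each thread the same number of whole blocks. The following is a minimal sketch with these assumptions:

- BLOCKSIZE is 2048 bytes.
- Ranges are half-open, [start, end).
- struct range and partition are hypothetical names.

With an 18 KB (18432-byte) file and 5 threads, the file splits into 9 blocks. Each thread gets two blocks, except the last, which gets only the final block.

```c
#define BLOCKSIZE 2048          /* assumed block size in bytes */

/* A thread's share of the file as a half-open byte range [start, end). */
struct range {
    long start;
    long end;
};

/* Divide filesize bytes among threadnum (>= 1) threads in whole blocks:
 * every thread gets ceil(nblocks / threadnum) blocks, the last non-empty
 * range is clipped to the end of the file, and surplus threads get empty
 * ranges. Returns the number of non-empty ranges. */
int partition(long filesize, int threadnum, struct range *r)
{
    long nblocks = (filesize + BLOCKSIZE - 1) / BLOCKSIZE;
    long per = (nblocks + threadnum - 1) / threadnum;   /* blocks per thread */
    int i, used = 0;

    for (i = 0; i < threadnum; i++) {
        long s = i * per * BLOCKSIZE;
        long e = s + per * BLOCKSIZE;

        r[i].start = s < filesize ? s : filesize;
        r[i].end = e < filesize ? e : filesize;
        if (r[i].end > r[i].start)
            used++;
    }
    return used;
}
```

Each range's start is the offset a thread would pass to REST. A thread stops after receiving end - start bytes.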



The main task is to add a multithreaded download command to the provided FTP client source code (NETKIT-FTP). Before implementing it, you therefore need to understand the structure of the source code.

The source code is written in standard C and can be downloaded from ftp://ftp.uk.linux.org/pub/linux/Networking/netkit. It is well commented, clearly layered and easy to read, and it separates the interface logic from the business logic.

The source code consists of 11 files: main.c, glob.c, glob.h, cmds.c, cmds.h, cmdtab.c, ftp.c, ftp_var.h, domacro.c, pathnames.h and ruserpass.c.

Of these, glob.c, glob.h, ftp_var.h, cmdtab.c, domacro.c, pathnames.h and ruserpass.c mainly define macros, global variables and global functions. In particular, cmdtab.c defines every command the program supports together with its handler function.

The files cooperate as follows:

1. main.c implements the main loop. It waits until the user enters a command and then passes the command to the argument-processing module implemented in cmds.c and cmds.h.
2. The argument-processing module checks the arguments for validity. It then passes a request with valid arguments to the user-command processing module in ftp.c.
3. ftp.c finally carries out the request. Most requests need a connection to the FTP server.
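The dispatch path described above (main loop, then the command table, then a handler) can be sketched as follows. This is a simplified, hypothetical model, not the netkit-ftp code. The struct cmd layout (name, help text, three flag bytes, three handler slots) and the names getcmd and dispatch are assumptions. The table holds one illustrative row for a multithreaded download command, mtget, with a stub handler. The real client may also accept unambiguous abbreviations, which this exact-match lookup does not.

```c
#include <stddef.h>
#include <string.h>

/* Simplified command-table entry: name, help, three flags, three handler
 * slots (argc/argv, no arguments, one string argument). */
struct cmd {
    const char *c_name;
    const char *c_help;
    char c_bell;                          /* ring the bell when done */
    char c_conn;                          /* requires a server connection */
    char c_proxy;                         /* allowed through a proxy */
    void (*c_handler_v)(int, char **);
    void (*c_handler_0)(void);
    void (*c_handler_1)(const char *);
};

static int last_argc;                     /* set by the stub handler */

/* Stub standing in for a real handler defined in cmds.c. */
static void mtget(int argc, char *argv[])
{
    (void)argv;
    last_argc = argc;
}

static const char mtreceivehelp[] = "receive file by several threads";

static struct cmd cmdtab[] = {
    { "mtget", mtreceivehelp, 1, 1, 1, mtget, NULL, NULL },
    { NULL, NULL, 0, 0, 0, NULL, NULL, NULL }   /* end marker */
};

/* Exact-match lookup in cmdtab. */
static struct cmd *getcmd(const char *name)
{
    struct cmd *c;

    for (c = cmdtab; c->c_name != NULL; c++)
        if (strcmp(c->c_name, name) == 0)
            return c;
    return NULL;
}

/* What the main loop does with a parsed line: find the entry, check that
 * a connection exists if the command needs one, and call its handler.
 * Returns -1 if the command is unknown or cannot run now. */
int dispatch(int argc, char *argv[], int connected)
{
    struct cmd *c = getcmd(argv[0]);

    if (c == NULL || (c->c_conn && !connected))
        return -1;
    if (c->c_handler_v != NULL)
        c->c_handler_v(argc, argv);
    else if (c->c_handler_0 != NULL)
        c->c_handler_0();
    else if (c->c_handler_1 != NULL)
        c->c_handler_1(argc > 1 ? argv[1] : NULL);
    return 0;
}
```

With this structure, adding a command means writing its handler and appending one row to the table. Neither the main loop nor the lookup changes.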



The idea of multithreaded download is to create several threads, each of which connects to the server and downloads a different part of the file. The received data is then merged into one file.

Issues to be addressed

The issues to be addressed and their solutions are:

• How do the threads share data? A major advantage of threads is that they share the address space of the process that created them, so they can share data through global variables.

• How do the threads achieve mutual exclusion? Access to shared variables can be serialized with a mutex (thread lock): lock before each access and unlock afterwards.

• How is the file size obtained? Send the SIZE command to the server.

• How is the file divided among the threads? First divide the file into blocks of equal size, then distribute the blocks among the threads.

• How does each thread download from its assigned position? Send the REST command to the server to specify the offset at which the download starts.

• How are the pieces combined into one file? Each thread opens the same local file and moves the file pointer to its own offset before writing, so every piece is saved at the right place.
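The last point can be checked in isolation. Each thread opens the local file itself, so each has its own file pointer; it seeks to its offset and writes its piece. The following is a minimal sketch with these assumptions:

- Fixed strings stand in for data received from the server.
- The local file already exists.
- struct piece, write_piece and write_pieces are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAXPIECES 16

/* One piece of the download: where it belongs and its bytes. */
struct piece {
    const char *path;   /* local file shared by all threads */
    off_t offset;       /* position of this piece in the file */
    const char *data;   /* bytes this thread received */
    int ok;             /* set to 1 by the thread on success */
};

/* Each thread opens the file itself, so it gets a private file pointer;
 * moving it with lseek() does not affect the other threads. */
static void *write_piece(void *arg)
{
    struct piece *p = arg;
    size_t len = strlen(p->data);
    int fd = open(p->path, O_WRONLY);

    p->ok = 0;
    if (fd < 0)
        return NULL;
    if (lseek(fd, p->offset, SEEK_SET) == p->offset &&
        write(fd, p->data, len) == (ssize_t)len)
        p->ok = 1;
    close(fd);
    return NULL;
}

/* Write n pieces concurrently, one thread each; returns 0 on success. */
int write_pieces(struct piece *pieces, int n)
{
    pthread_t tid[MAXPIECES];
    int i, created = 0, result = 0;

    if (n < 0 || n > MAXPIECES)
        return -1;
    for (i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, write_piece, &pieces[i]) != 0) {
            result = -1;
            break;
        }
        created++;
    }
    for (i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        if (!pieces[i].ok)
            result = -1;
    }
    return result;
}
```

Suppose the pieces "CCCC" at offset 8, "AAAA" at 0 and "BBBB" at 4 are written in any order. The file still ends up as "AAAABBBBCCCC". Sharing one descriptor with lseek() followed by write() would not be safe here, because another thread could move the shared file pointer between the two calls.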

Added global variables and functions

The main global variables added to the original program are:

Loginuser: the user name used to log in

Password: the password of the logged-in user

DownloadFile: the name of the file to download

LocalFile: the name under which the file is saved locally

FileSize: the size of the file

Macro MAXTHREADNUM: the maximum number of threads allowed

Macro BLOCKSIZE: the basic unit of data a thread downloads; the file is divided into blocks of BLOCKSIZE bytes

lock: a mutex that gives the threads mutually exclusive access to shared variables

threaddata: an array of the custom structure thread_data that records the state of each thread

The structure thread_data is defined as:

#include <pthread.h>

struct thread_data {
    pthread_t id;     /* ID of the thread */
    int number;       /* number (index) of the thread */
    int hasbegin;     /* whether the download of this part has started */
    int fd;           /* descriptor of the local file */
    long start;       /* offset at which this thread's part begins */
    long end;         /* offset at which this thread's part ends */
    int errornum;     /* error number recorded by the thread */
};

The added functions are:

void mtget(int argc, char *argv[]): checks the validity of the arguments.

mtrecvfile(const int threadnum, char *local, char *remote): performs the multithreaded download.

long threadfunction(char *argv): the function each thread runs; it downloads one part of the file.

int connectftp(char *host, int iport): connects to the server.

int SendCommand(int sock, char *cmdstr): sends a command to the server.
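The following is a minimal sketch of what connectftp and a SendCommand-style helper might look like. It makes these assumptions:

- The helper is named send_command (a hypothetical variant of SendCommand).
- It appends CRLF, returns the reply code, and copies the last reply line into a caller-supplied buffer.
- connectftp resolves the host with getaddrinfo.

Multi-line replies of the form "xyz-..." up to "xyz ..." are read through to their last line.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect to host:iport over TCP; returns the socket or -1. */
int connectftp(const char *host, int iport)
{
    struct addrinfo hints, *res, *ai;
    char service[16];
    int sock = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* FTP runs over TCP */
    snprintf(service, sizeof service, "%d", iport);
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        sock = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (sock < 0)
            continue;
        if (connect(sock, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(sock);
        sock = -1;
    }
    freeaddrinfo(res);
    return sock;
}

/* Read one reply line (up to and including '\n') into buf. */
static int read_line(int sock, char *buf, size_t len)
{
    size_t n = 0;
    char c;

    while (n + 1 < len) {
        if (read(sock, &c, 1) != 1)
            return -1;
        buf[n++] = c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return (int)n;
}

/* Send cmdstr followed by CRLF and return the reply code, or -1. The
 * last line of the reply is copied into reply (if not NULL). */
int send_command(int sock, const char *cmdstr, char *reply, size_t len)
{
    char out[512], line[512], last[16];
    int n, code;

    n = snprintf(out, sizeof out, "%s\r\n", cmdstr);
    if (n < 0 || (size_t)n >= sizeof out || write(sock, out, (size_t)n) != n)
        return -1;
    if (read_line(sock, line, sizeof line) < 4 || sscanf(line, "%3d", &code) != 1 ||
        code < 100 || code > 599)
        return -1;
    if (line[3] == '-') {               /* multi-line reply */
        snprintf(last, sizeof last, "%d ", code);   /* e.g. "211 " */
        do {
            if (read_line(sock, line, sizeof line) < 0)
                return -1;
        } while (strncmp(line, last, 4) != 0);
    }
    if (reply != NULL)
        snprintf(reply, len, "%s", line);
    return code;
}
```

A caller would typically check send_command(sock, "SIZE test.txt", buf, sizeof buf) == 213 before parsing the size from buf. Any other code means the server is not in the expected state.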

Implementation Steps

• Add the global variables described above to ftp_var.h.

• In cmdtab.c, add the variable const char mtreceivehelp[] = "receive file by several threads"; and add one row to the initializer of the global array cmdtab: {"mtget", mtreceivehelp, 1, 1, 1, mtget, NULL, NULL}

• Define the function mtget in cmds.c to check the validity of the arguments.

• Define the function mtrecvfile in ftp.c to perform the multithreaded download. Its main tasks are:
  - initialize the variables that record each thread's state;
  - check that the local file can be created;
  - get the size of the file to download;
  - divide the work among the threads, recording each thread's interval in threaddata;
  - create the threads that carry out the download.

• Add the function threadfunction to ftp.c; it performs the actual download.

Task assignment algorithm. Suppose the file size is 18 KB, there are 5 threads, and BLOCKSIZE = 2048. The file is divided into 9 basic units of BLOCKSIZE bytes, which are distributed among the 5 threads. Each thread downloads two units (4 KB), except the last thread, which downloads the one remaining unit (2 KB).

Download flow of thread i:

1. Loop until threaddata[i-1].hasbegin is set to 1, then continue. This ensures that the threads start in order.
2. Check whether threaddata[i].hasbegin is already 1. If so, exit, because another thread has taken over this thread's task. Otherwise set threaddata[i].hasbegin to 1.
3. Read the FTP server's IP address and port number, and the login user name and password, from the global variables, and log in to the server.
4. Send TYPE to select binary mode.
5. Send PASV to select passive mode and obtain the server's data port.
6. Send REST to specify the offset at which this thread's part begins, then send RETR.
7. Open the local file and move the file pointer to the same offset.
8. Set up the data connection with the server and start receiving data.
9. After finishing its own part, go through the other threads' entries looking for a part whose download has not started, and try to download that part as well.

Notes. Check the reply to every request sent to the server to determine whether the server is in the expected state, and resend the request if necessary. Access to sensitive shared data must be protected by the mutex; otherwise the same part may be downloaded twice.
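The "look for a part that has not been started" step is where the mutex matters. The following is a minimal sketch assuming a hasbegin flag per part and a global mutex named lock, as described above. Several things are left out: the in-order start (waiting on threaddata[i-1].hasbegin), the FTP exchange and error handling. A counter stands in for the actual download. MAXTHREADNUM = 10 is an assumed value, and struct part, claim_part, worker and run_workers are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

#define MAXTHREADNUM 10          /* assumed limit */

/* The fields of thread_data this sketch needs. */
struct part {
    int hasbegin;                /* 1 once some thread has started the part */
    long start, end;             /* byte range (not used in the sketch) */
};

static struct part parts[MAXTHREADNUM];
static int nparts;
static int fetched[MAXTHREADNUM];   /* times each part was "downloaded" */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim part i if no thread has started it. The test and the update
 * happen under one lock, so two threads cannot both see hasbegin == 0
 * and fetch the same part twice. Returns 1 if the caller owns it. */
int claim_part(int i)
{
    int mine = 0;

    pthread_mutex_lock(&lock);
    if (!parts[i].hasbegin) {
        parts[i].hasbegin = 1;
        mine = 1;
    }
    pthread_mutex_unlock(&lock);
    return mine;
}

/* Thread body: own part first, then any part nobody has started. Only
 * the owner of a part touches its fetched[] slot, so no lock is needed. */
static void *worker(void *arg)
{
    int self = *(int *)arg, i;

    if (claim_part(self))
        fetched[self]++;         /* REST parts[self].start, RETR, receive */
    for (i = 0; i < nparts; i++)
        if (claim_part(i))
            fetched[i]++;        /* take over an unstarted part */
    return NULL;
}

/* Start n workers, wait for them, and return 0 if every part was
 * fetched exactly once. */
int run_workers(int n)
{
    pthread_t tid[MAXTHREADNUM];
    int ids[MAXTHREADNUM], i, created = 0, result = 0;

    if (n < 1 || n > MAXTHREADNUM)
        return -1;
    nparts = n;
    for (i = 0; i < n; i++) {
        parts[i].hasbegin = 0;
        fetched[i] = 0;
    }
    for (i = 0; i < n; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, worker, &ids[i]) != 0)
            break;
        created++;
    }
    for (i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    for (i = 0; i < n; i++)
        if (fetched[i] != 1)
            result = -1;
    return result;
}
```

Suppose the check and the update of hasbegin were done without the lock. Two threads could then both find a part unstarted and download it twice, which is exactly the failure the notes above warn about.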

Compiling and debugging

To simplify compilation, rewrite the Makefile. Add the -g compile option so that the target code can be debugged with GDB, and add the -lpthread link option needed for multithreading:

EXEC = ftp
CC = cc
INCLUDES = -I.
CFLAGS = -g
LIBS = -lpthread
OBJS = cmds.o cmdtab.o domacro.o ftp.o glob.o main.o ruserpass.o

all: $(EXEC)

%.o: %.c
	$(CC) $(CFLAGS) $(INCLUDES) -c $<

# Libraries come after the object files so the linker can resolve them
$(EXEC): $(OBJS)
	$(CC) $(CFLAGS) -o $@ $(OBJS) $(LDLIBS) $(LIBS)

romfs:
	$(ROMFSINST) /bin/$(EXEC)

clean:
	-rm -f $(EXEC) *.gdb *.elf *.o

We chose the command-line debugger GDB as the debugging tool. GDB mainly helps you do four things:

1. Start your program and run it in whatever way you specify.

2. Make the program being debugged stop at specified breakpoints (a breakpoint can carry a conditional expression).

3. Examine what is happening in your program when it has stopped.

4. Change the program's execution environment dynamically.

Further improvements

With the multithreaded design above, resumable download (resuming from a breakpoint) is easy to add. During the download, each thread has variables that record its current state.

If the program detects that the connection to the server has been lost, or the user terminates the download, it saves every thread's state to a file. When the download is started again, the program first loads the saved state from the file. Each thread then resumes from where it stopped until the download is complete.
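The following is a minimal sketch of the state file, with these assumptions:

- The file is plain text: a thread count, then one "start end done" line per thread.
- struct progress, save_state, load_state and resume_offset are hypothetical names.

On restart, each thread would send REST with start + done. A thread whose offset already equals end has nothing left to do.

```c
#include <stdio.h>

/* State of one thread as saved for a later resume. */
struct progress {
    long start;   /* first byte of the thread's part */
    long end;     /* one past the last byte */
    long done;    /* bytes already written to the local file */
};

/* Write the thread count, then one "start end done" line per thread. */
int save_state(const char *path, const struct progress *p, int n)
{
    FILE *f = fopen(path, "w");
    int i;

    if (f == NULL)
        return -1;
    fprintf(f, "%d\n", n);
    for (i = 0; i < n; i++)
        fprintf(f, "%ld %ld %ld\n", p[i].start, p[i].end, p[i].done);
    return fclose(f) == 0 ? 0 : -1;
}

/* Read a file written by save_state; returns the thread count or -1. */
int load_state(const char *path, struct progress *p, int max)
{
    FILE *f = fopen(path, "r");
    int i, n;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &n) != 1 || n < 0 || n > max) {
        fclose(f);
        return -1;
    }
    for (i = 0; i < n; i++) {
        if (fscanf(f, "%ld %ld %ld", &p[i].start, &p[i].end, &p[i].done) != 3) {
            fclose(f);
            return -1;
        }
    }
    fclose(f);
    return n;
}

/* Offset a resumed thread passes to REST; equals end when it is finished. */
long resume_offset(const struct progress *p)
{
    return p->start + p->done;
}
```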

