Fetching XX data files from an FTP host.
"Ten million" here is only a notional figure: it stands for a data volume of ten million items or more.
This article does not cover distributed collection and storage. It deals with processing the data on a single machine. If the amount of data is very large, distributed processing is worth considering; if I gain experience with that, I will share it in due course.
1, the FTP tool to use.
2, the key part of FTP at the ten-million scale: listing the directory into a file. As long as this piece is done properly, performance is basically not a big problem.
You can send the FTP command "NLST" through the Apache client and write the result to a file.
# Command used to list the FTP directory. The value configured in the environment variable takes precedence; if none is configured, the listing defaults to NLST.
# ds_list_cmd = NLST
[Java]
// Send the listing command and write the reply to a local file.
// client.createFile appears to be the author's own helper (not shown here).
public File sendCommandAndListToFile(String command, String localPathName) throws IOException
{
    try {
        return client.createFile(command, localPathName);
    } catch (IOException e) {
        log.error(e);
        throw new IOException("the command " + command + " is incorrect");
    }
}
Of course there are other ways to do this, and they are worth studying.
Above the 100,000 level, do not list the directory the following way; using it at that scale is asking for trouble, because the whole listing is loaded into memory at once.
FTPFile[] dirList = client.listFiles();
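To make the write-to-file idea concrete, here is a minimal sketch using only the JDK. It assumes the NLST reply is available as a `Reader`; in a real run that reader would wrap the FTP data connection, which is not shown. The class `ListingSpooler` and its method are hypothetical names, not part of the original program.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: copies a name listing to a local file one line at a time. */
final class ListingSpooler {

    /**
     * Streams every non-empty line from the listing into the target file.
     * Only one line is held in memory at a time.
     * Returns the number of names written.
     */
    static long spool(Reader listing, Path target) throws IOException {
        long count = 0;
        try (BufferedReader in = new BufferedReader(listing);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String name;
            while ((name = in.readLine()) != null) {
                if (name.isEmpty()) {
                    continue; // skip blank lines in the reply
                }
                out.write(name);
                out.newLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the NLST reply (assumption: one name per line).
        Reader fakeReply = new StringReader("a.dat\r\nb.dat\r\nc.dat\r\n");
        Path target = Files.createTempFile("listing", ".txt");
        System.out.println(spool(fakeReply, target) + " names written to " + target);
    }
}
```

Memory use stays flat regardless of how many names the server returns, which is the point of avoiding an in-memory array of entries.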
3, read the names of the files to download from the listing file in batches. Either load one batch into memory and process it, or read one file name and download one file at a time. Do not load all the data into memory; with this much data, things will go wrong.
Why split the work into batches?
Because the data volume is large: with 10 million records, the directory listing file alone is over 1 GB.
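The batching described above can be sketched as follows. This is an illustration under assumptions, not the author's code: the listing file is assumed to hold one file name per line, and `ListingBatcher`, `forEachBatch`, and the batch size are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: walks a listing file in fixed-size batches. */
final class ListingBatcher {

    /**
     * Reads file names from the listing file and hands them to the handler
     * in batches of at most batchSize names. At most one batch is held in
     * memory by this method. Returns the number of batches delivered.
     */
    static int forEachBatch(Path listing, int batchSize, Consumer<List<String>> handler)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader in = Files.newBufferedReader(listing, StandardCharsets.UTF_8)) {
            String name;
            while ((name = in.readLine()) != null) {
                if (name.isEmpty()) {
                    continue; // ignore blank lines
                }
                batch.add(name);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize); // fresh list for the next batch
                    batches++;
                }
            }
        }
        if (!batch.isEmpty()) { // deliver the final, possibly shorter batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

A caller would pass a handler that downloads each name in the batch, for example `ListingBatcher.forEachBatch(listingPath, 1000, names -> names.forEach(downloader))`, where `downloader` is whatever download routine the program uses.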
4, the core code of the file download: resuming interrupted transfers. Compare the size of the file on the FTP server with the size of the local file, then use the restart (resume) function that FTP provides.
Files must be downloaded in binary mode.
client.enterLocalPassiveMode(); // set to passive mode
ftpClient.binary(); // be sure to use binary mode (the download method sets this with setFileType)
[Java]
/**
 * Download the required file, with support for resuming an interrupted transfer.
 * The FTP file is deleted after download so that it is not fetched again
 * (the deletion is not shown in this method).
 * @param pathName remote file
 * @param localPath local file
 * @return true if the download succeeded
 * @throws IOException
 */
public boolean downLoad(String pathName, String localPath) throws IOException {
    boolean flag = false;
    File file = new File(localPath + ".tmp"); // temporary file
    FileOutputStream out = null;
    try {
        client.enterLocalPassiveMode(); // set to passive mode
        client.setFileType(FTP.BINARY_FILE_TYPE); // set to binary transfer
        // lff is the author's local-file helper (not shown here).
        // If the local file exists and is no longer than the FTP file, resume the transfer.
        if (lff.getIsFileExists(file)) {
            long size = this.getSize(pathName);
            long localFileSize = lff.getSize(file);
            if (localFileSize > size) {
                return false;
            }
            out = new FileOutputStream(file, true); // append to the partial file
            client.setRestartOffset(localFileSize);
            flag = client.retrieveFile(new String(pathName.getBytes(), client.getControlEncoding()), out);
            out.flush();
        } else {
            out = new FileOutputStream(file);
            flag = client.retrieveFile(new String(pathName.getBytes(), client.getControlEncoding()), out);
            out.flush();
        }
    } catch (IOException e) {
        log.error(e);
        log.error("File download error!");
        throw e;
    } finally {
        if (null != out)
            out.close();
        if (flag)
            lff.rename(file, localPath); // drop the .tmp suffix once the download is complete
    }
    return flag;
}
/**
 * Get the length of a file on the FTP server
 * @param fileNamePath remote file
 * @return the file size, or 0 if the file is not found
 * @throws IOException
 */
public long getSize(String fileNamePath) throws IOException {
    FTPFile[] ftp = client.listFiles(new String(fileNamePath.getBytes(), client.getControlEncoding()));
    return ftp.length == 0 ? 0 : ftp[0].getSize();
}
Check whether the local file has already been downloaded and, if so, how much of it has been downloaded.
/**
 * Get the size of a local file
 * @param file local file
 * @return the file size, or 0 if the file does not exist
 */
public long getSize(File file) {
    long size = 0;
    if (getIsFileExists(file)) {
        size = file.length();
    }
    return size;
}
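The size comparison and append-from-offset logic of step 4 can be tried out without an FTP server. The sketch below is only an illustration: a byte array stands in for the remote file, and `ResumeSketch` with its methods is hypothetical. With a real FTP client, the offset would be handed to the client's restart call before retrieving the file, as in the download method above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-alone model of the resume decision in step 4. */
final class ResumeSketch {

    /**
     * Returns the offset to restart from, or -1 when the local copy is
     * larger than the remote file and therefore cannot be a partial copy.
     */
    static long restartOffset(long remoteSize, long localSize) {
        if (localSize > remoteSize) {
            return -1;
        }
        return localSize;
    }

    /**
     * Appends the missing tail of the "remote" bytes to the partial file.
     * Returns true when the local file ends up as long as the remote one.
     */
    static boolean resume(byte[] remote, File partial) throws IOException {
        long localSize = partial.exists() ? partial.length() : 0;
        long offset = restartOffset(remote.length, localSize);
        if (offset < 0) {
            return false;
        }
        // Open in append mode so the bytes already on disk are kept.
        try (OutputStream out = new FileOutputStream(partial, true)) {
            out.write(remote, (int) offset, remote.length - (int) offset);
        }
        return partial.length() == remote.length;
    }

    public static void main(String[] args) throws IOException {
        File partial = File.createTempFile("resume", ".tmp");
        byte[] remote = "0123456789".getBytes();
        System.out.println("complete: " + resume(remote, partial));
    }
}
```

Comparing sizes only shows that the local file is not longer than the remote one; it does not prove the bytes already on disk are correct, so a checksum is worth considering where the server offers one.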
5, because the program runs more than 100 threads, monitor them while it is running so that dead threads are detected and restarted promptly.
t.setUncaughtExceptionHandler(new ThreadException(exList));
Principle: attach an UncaughtExceptionHandler to each thread. When a thread dies, the handler adds that thread's information to a list. The main thread scans the list at intervals and, if it finds entries, creates a new thread to run the task again.
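The principle above can be sketched roughly as follows. This is not the author's `ThreadException` class, which is not shown; `WorkerWatchdog` is a hypothetical stand-in that records the failed task in a thread-safe queue instead of a plain list.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical watchdog: records tasks whose threads died and restarts them. */
final class WorkerWatchdog {

    // Tasks whose thread ended with an uncaught exception.
    private final Queue<Runnable> deadTasks = new ConcurrentLinkedQueue<>();

    /** Starts the task on a new thread; if the thread dies, the task is recorded. */
    Thread start(Runnable task, String name) {
        Thread t = new Thread(task, name);
        t.setUncaughtExceptionHandler((thread, error) -> deadTasks.add(task));
        t.start();
        return t;
    }

    /**
     * Meant to be called by the main thread at intervals: starts a new thread
     * for every recorded task and returns how many were restarted.
     */
    int restartDead() {
        int restarted = 0;
        Runnable task;
        while ((task = deadTasks.poll()) != null) {
            start(task, "restarted-worker");
            restarted++;
        }
        return restarted;
    }

    public static void main(String[] args) throws InterruptedException {
        WorkerWatchdog dog = new WorkerWatchdog();
        dog.start(() -> { throw new IllegalStateException("simulated crash"); }, "worker-1").join();
        System.out.println("restarted: " + dog.restartDead());
    }
}
```

A task that fails every time would be restarted forever under this scheme, so a real monitor would probably want a retry limit or a delay between restarts.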
6, if the program stays resident in memory, do not forget to close unused FTP connections in a finally block.
7, a large-volume data collection program must take one more thing into account: handling a full disk.
When the disk is full, the Java virtual machine on Linux and AIX machines with an English locale generally reports
There is not enough space in the file system
In a Chinese locale it generally reports "Disk space full"
You can check for this with the following code
[Java]
// Linux, AIX: There is not enough space in the file system
// Windows: There is not enough space in the file system
// "Disk space full" stands for the Chinese-locale message; match the text your JVM actually reports.
if (e.toString().contains("enough space") || e.toString().contains("Disk space full"))
{
    log.error("channel " + channel_name + ": there is not enough space on the disk");
    Runtime.getRuntime().exit(0); // stop the program
}
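Matching on message text depends on the operating system and locale, so a complementary approach is to check free space before writing. The sketch below is an assumption-laden illustration, not the author's method: `DiskSpaceGuard` is a hypothetical name, and the extra pattern "no space left on device" is an addition here (it is the usual Linux wording), not something the original program checks.

```java
import java.io.File;
import java.io.IOException;
import java.util.Locale;

/** Hypothetical helper for detecting a full or nearly full disk. */
final class DiskSpaceGuard {

    /** True when the partition holding dir has at least requiredBytes usable. */
    static boolean hasRoom(File dir, long requiredBytes) {
        return dir.getUsableSpace() >= requiredBytes;
    }

    /** Case-insensitive check of an exception's text for common disk-full wording. */
    static boolean looksLikeDiskFull(Throwable e) {
        String text = String.valueOf(e).toLowerCase(Locale.ROOT);
        return text.contains("not enough space")
                || text.contains("no space left on device");
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("room for 1 MiB: " + hasRoom(tmp, 1024L * 1024L));
        System.out.println(looksLikeDiskFull(
                new IOException("There is not enough space in the file system")));
    }
}
```

The free-space check is only advisory, since other processes can fill the disk between the check and the write, so the exception handling is still needed.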