Processing Large Volumes of Data in Java

Source: Internet
Author: User

The task: fetch tens of millions of data files from an FTP host.

"Tens of millions" is only a rough scale here; it stands for any data volume of that order or larger.

This article does not cover distributed collection or storage. Everything here runs on a single machine. If the data volume is very large, distributed processing is worth considering; if I gain experience with that, I will share it as well.

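1. The FTP tool used by the application. The method names in the code below match the FTPClient class from Apache Commons Net.

2. The key part of handling tens of millions of files over FTP is writing the directory listing to a file. Once this part is done well, performance is generally not a big problem.

You can send the FTP command "NLST" through the Apache client and write the result to a file.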

# The directory-listing command is read from the environment configuration first; if it is not configured, the default listing command NLST is used.

# ds_list_cmd = NLST

[Java]

public File sendCommandAndListToFile(String command, String localPathName) throws IOException {
    try {
        // createFile appears to be a helper on the author's own client wrapper:
        // it sends the command and writes the listing to a local file.
        return client.createFile(command, localPathName);
    } catch (IOException e) {
        log.error(e);
        throw new IOException("The command " + command + " is incorrect", e);
    }
}
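The fallback to NLST described in the configuration comment could be resolved roughly as follows. This is a minimal sketch: the class name `ListCommandConfig` is hypothetical, and it assumes the `ds_list_cmd` setting arrives as an environment variable.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks the FTP listing command from the environment. */
class ListCommandConfig {

    static final String DEFAULT_LIST_COMMAND = "NLST";

    /** Returns the configured ds_list_cmd value, or NLST when it is missing or blank. */
    static String resolve(Map<String, String> env) {
        String value = env.get("ds_list_cmd");
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_LIST_COMMAND;
        }
        return value.trim().toUpperCase(Locale.ROOT);
    }
}
```

A caller would typically pass the real environment: `ListCommandConfig.resolve(System.getenv())`.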

There are of course other ways to do this, which are worth looking into.

For directories with more than 100,000 entries, do not use the approach below; it loads the whole listing into memory as objects and is asking for trouble:

FTPFile[] dirList = client.listFiles();

3. Read the names of the files to download from the listing file in batches. Either load one batch into memory at a time, or read one file name and download one file. Do not load all the data into memory; with this many entries, something will go wrong.

Why split the work into batches?

Because the data volume is large: with 10 million records, the listing file alone is more than 1 GB.
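The batching described in step 3 can be sketched as follows. This is a minimal illustration using only the JDK, not the original program's code: the class name `ListingBatchReader`, the batch size, and the callback are assumptions, and the listing file is assumed to be UTF-8 with one file name per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: streams a directory-listing file in fixed-size batches. */
class ListingBatchReader {

    /**
     * Reads file names (one per line) from listingFile and passes them to
     * handler in batches of at most batchSize. Returns the number of names read.
     */
    static long readInBatches(Path listingFile, int batchSize,
                              Consumer<List<String>> handler) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader =
                 Files.newBufferedReader(listingFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String name = line.trim();
                if (name.isEmpty()) {
                    continue; // skip blank lines in the listing
                }
                batch.add(name);
                total++;
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        }
        if (!batch.isEmpty()) {
            handler.accept(batch); // final, possibly smaller batch
        }
        return total;
    }
}
```

Only one batch is held in memory at a time, so memory use depends on the batch size rather than on the size of the listing file.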

4. The core code for downloading files: resuming interrupted transfers. Compare the size of the file on the FTP server with the size of the local file, then use the restart (resume) function that FTP provides.

Files must be downloaded in binary mode.

client.enterLocalPassiveMode(); // set passive mode

ftpClient.binary(); // be sure to use binary mode

[Java]

/**
 * Downloads the requested file, with support for resuming an interrupted transfer.
 * Data is written to a temporary ".tmp" file, which is renamed to the final name
 * only after the transfer succeeds. (The original comment also mentions deleting
 * the remote file after download to avoid repeats; that step is not in this code.)
 *
 * @param pathName  remote file
 * @param localPath local file
 * @return true if the download succeeded
 * @throws IOException if the transfer fails
 */
public boolean downLoad(String pathName, String localPath) throws IOException {
    boolean flag = false;
    File file = new File(localPath + ".tmp"); // temporary file
    FileOutputStream out = null;
    try {
        client.enterLocalPassiveMode();           // set passive mode
        client.setFileType(FTP.BINARY_FILE_TYPE); // set binary transfer
        String remoteName = new String(pathName.getBytes(), client.getControlEncoding());
        if (lff.getIsFileExists(file)) {
            // A partial local file exists: resume if it is not larger than the FTP file.
            long size = this.getSize(pathName);
            long localFileSize = lff.getSize(file);
            if (localFileSize > size) {
                return false;
            }
            out = new FileOutputStream(file, true); // append to the partial file
            client.setRestartOffset(localFileSize);
            flag = client.retrieveFile(remoteName, out);
            out.flush();
        } else {
            out = new FileOutputStream(file);
            flag = client.retrieveFile(remoteName, out);
            out.flush();
        }
    } catch (IOException e) {
        log.error(e);
        log.error("File download error!");
        throw e;
    } finally {
        if (null != out) {
            out.close();
        }
        if (flag) {
            lff.rename(file, localPath); // promote the temporary file
        }
    }
    return flag;
}

/**
 * Gets the length of a file on the FTP server.
 *
 * @param fileNamePath path of the file (looked up on the FTP server)
 * @return the file size in bytes, or 0 if the file is not listed
 * @throws IOException
 */
public long getSize(String fileNamePath) throws IOException {
    FTPFile[] ftp = client.listFiles(new String(fileNamePath.getBytes(), client.getControlEncoding()));
    return ftp.length == 0 ? 0 : ftp[0].getSize();
}

To check whether a local file has already been downloaded (fully or partly), get the size of the local file:

/**
 * Gets the size of a local file.
 *
 * @param file local file
 * @return the file size in bytes, or 0 if the file does not exist
 */
public long getSize(File file) {
    long size = 0;
    if (getIsFileExists(file)) {
        size = file.length();
    }
    return size;
}
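The size comparison behind the resume logic in step 4 can be isolated into a small pure function, which makes it easy to test without an FTP server. This is a sketch under assumptions: the class `ResumePlanner` and its `Action` values are hypothetical names, and the `ALREADY_COMPLETE` case is an addition of this sketch (the code above treats equal sizes like a resume).

```java
/** Hypothetical helper that isolates the size comparison used for resuming. */
class ResumePlanner {

    enum Action { START_FRESH, RESUME, ALREADY_COMPLETE, LOCAL_LARGER }

    /** Decides what to do given the size of the local temp file and the remote file. */
    static Action plan(boolean localExists, long localSize, long remoteSize) {
        if (!localExists || localSize == 0) {
            return Action.START_FRESH;      // nothing usable on disk yet
        }
        if (localSize > remoteSize) {
            return Action.LOCAL_LARGER;     // local file cannot be a prefix of the remote one
        }
        if (localSize == remoteSize) {
            return Action.ALREADY_COMPLETE; // all bytes are already present
        }
        return Action.RESUME;               // continue after the bytes already on disk
    }

    /** Byte offset to request from the server; non-zero only when resuming. */
    static long restartOffset(Action action, long localSize) {
        return action == Action.RESUME ? localSize : 0L;
    }
}
```

For `RESUME`, the offset returned by `restartOffset` is the value that would be handed to the client's restart-offset call before retrieving the file in append mode.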

5. The program runs more than 100 threads, so online monitoring needs some handling: it must detect threads that have died and restart them promptly.

t.setUncaughtExceptionHandler(new ThreadException(exList));

Principle: add an UncaughtExceptionHandler to each thread. When a thread dies, the handler adds that thread's information to a list. The main thread scans the list at regular intervals and, if it finds entries, creates a new thread to take over the work.
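The principle above can be sketched with JDK classes only. The names here (`WorkerSupervisor`, `restartDead`, the task factory) are hypothetical, and a concurrent queue stands in for the article's list so that worker threads and the main thread can share it safely.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

/** Hypothetical supervisor: records dead workers and restarts them on request. */
class WorkerSupervisor {

    /** Names of workers that died with an uncaught exception. */
    private final Queue<String> deadWorkers = new ConcurrentLinkedQueue<>();

    /** Starts a named worker whose handler records the name if the thread dies. */
    Thread start(String name, Runnable task) {
        Thread t = new Thread(task, name);
        t.setUncaughtExceptionHandler((thread, error) -> deadWorkers.add(thread.getName()));
        t.start();
        return t;
    }

    /**
     * Meant to be called periodically by the main thread: restarts every worker
     * recorded as dead and returns how many were restarted.
     */
    int restartDead(Function<String, Runnable> taskFactory) {
        int restarted = 0;
        String name;
        while ((name = deadWorkers.poll()) != null) {
            start(name, taskFactory.apply(name));
            restarted++;
        }
        return restarted;
    }
}
```

The main thread could call `restartDead` from a loop with a sleep, or from a scheduled task, depending on how quickly dead workers need to be replaced.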

6. If the program is a long-running resident process, do not forget to close unused FTP connections in a finally block.
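A minimal sketch of the cleanup in step 6. It assumes a hypothetical `Session` interface standing in for the real FTP client; the method names `isConnected`, `logout`, and `disconnect` are chosen for illustration and should be mapped to whatever the actual client offers.

```java
import java.io.IOException;

/** Hypothetical guard that always releases the FTP session, even on failure. */
class FtpSessionGuard {

    /** Minimal stand-in for an FTP client; the method names are assumptions. */
    interface Session {
        boolean isConnected();
        void logout() throws IOException;
        void disconnect() throws IOException;
    }

    /** Work to perform while the session is open. */
    interface Work {
        void run(Session session) throws IOException;
    }

    static void withSession(Session session, Work work) throws IOException {
        try {
            work.run(session);
        } finally {
            if (session.isConnected()) {
                try {
                    session.logout();
                } catch (IOException ignored) {
                    // logout is best effort; the connection is still dropped below
                } finally {
                    session.disconnect();
                }
            }
        }
    }
}
```

Putting the cleanup in one place means every download path releases its connection, which matters in a process that runs for days.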

7. A large-scale data collection program must take one more thing into account: handling a full disk.

When the disk is full, the Java virtual machine on Linux/AIX machines in an English environment generally reports:

There is not enough space in the file system

(Many Linux systems instead report "No space left on device", so it may be worth checking for that text as well.)

In a Chinese environment it generally reports the localized equivalent of "Disk space full".

You can use the following code to check for it:

[Java]

// Linux/AIX: There is not enough space in the file system
// Windows: There is not enough space on the disk
if (e.toString().contains("enough space") || e.toString().contains("Disk space full")) {
    log.error("channel " + channel_name + ": there is not enough space on the disk");
    Runtime.getRuntime().exit(0);
}
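The message check above can be wrapped in a helper, together with a proactive free-space check before each batch. This is a sketch, not the original program's code: the class `DiskSpaceCheck` and its method names are hypothetical, and the list of message fragments is an assumption that should be extended for the platforms and locales actually in use.

```java
import java.io.File;
import java.util.Locale;

/** Hypothetical helper for detecting and anticipating a full disk. */
class DiskSpaceCheck {

    /** Returns true if the exception text looks like a "disk full" error. */
    static boolean isDiskFull(Throwable e) {
        String text = String.valueOf(e).toLowerCase(Locale.ROOT);
        return text.contains("enough space")            // AIX and Windows wording
            || text.contains("no space left on device") // common Linux wording
            || text.contains("disk space full");        // placeholder for a localized message
    }

    /** Returns true if the partition holding dir has at least minFreeBytes available. */
    static boolean hasRoom(File dir, long minFreeBytes) {
        return dir.getUsableSpace() >= minFreeBytes;
    }
}
```

Checking `hasRoom` before starting a batch lets the program stop cleanly instead of failing halfway through a file.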
