Reading and writing large text files (2 GB or more) in Java

Source: Internet
Author: User

The following program splits a text file containing fileLines lines evenly into smaller text files (splitNum + 1 of them, because splitNum holds the number of blocks minus one). The line separator is '\n' on Linux and '\r\n' on Windows:

    package kddcup2012.task2.FileSystem;

    import java.io.BufferedInputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class FileSplit {
        public static void main(String[] args) throws IOException {
            long timer = System.currentTimeMillis();
            int bufferSize = 20 * 1024 * 1024; // the read buffer is 20 MB
            // Create the buffered text input stream
            File file = new File("/media/data/Graduation design/kdd cup/data/userid_profile.txt");
            FileInputStream fileInputStream = new FileInputStream(file);
            BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
            InputStreamReader inputStreamReader = new InputStreamReader(bufferedInputStream);
            BufferedReader input = new BufferedReader(inputStreamReader, bufferSize);
            int splitNum = 112 - 1; // number of blocks to split into, minus one
            int fileLines = 23669283; // number of lines in the input file
            long perSplitLines = fileLines / splitNum; // number of lines per block
            for (int i = 0; i <= splitNum; ++i) {
                // Create one output file per block
                FileWriter output = new FileWriter("/home/haoqiong/part" + i + ".txt");
                String line = null;
                // Read line by line and write line by line
                for (long lineCounter = 0;
                        lineCounter < perSplitLines && (line = input.readLine()) != null;
                        ++lineCounter) {
                    output.append(line + "\n"); // '\n' is the Linux separator; use "\r\n" on Windows
                }
                output.flush();
                output.close();
                output = null;
            }
            input.close();
            timer = System.currentTimeMillis() - timer;
            System.out.println("Processing time: " + timer);
        }
    }
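The block-size arithmetic used by the program can be checked in isolation. The sketch below is only an illustration: the class `SplitPlan` and its method names are hypothetical and do not appear in the original program. It reproduces the same integer division, so the first 111 files receive the quotient and the last file receives the remainder.

```java
class SplitPlan {
    /** Lines written to each full block when the file is cut into `parts` files. */
    static long linesPerBlock(long fileLines, int parts) {
        // Same formula as the program: divide by (parts - 1), i.e. splitNum.
        return fileLines / (parts - 1);
    }

    /** Lines left over for the final block (may be 0, giving an empty last file). */
    static long linesInLastBlock(long fileLines, int parts) {
        return fileLines % (parts - 1);
    }

    public static void main(String[] args) {
        long fileLines = 23669283L; // line count taken from the program
        int parts = 112;            // number of output files
        System.out.println(linesPerBlock(fileLines, parts));    // prints 213236
        System.out.println(linesInLastBlock(fileLines, parts)); // prints 87
    }
}
```

With these figures the split is therefore not perfectly even: 111 files hold 213236 lines each and the final file holds the remaining 87.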

The program above needs only about 30 MB of memory to process a large text file (the amount depends on the size of the read buffer), but it is not very fast: with no other program using the disk, splitting a 200 MB file into 112 parts takes about 20 seconds (test machine: Centrino2 P7450 CPU, 2 GB DDR3 memory, Ubuntu 11.10, hard disk with a maximum read/write speed of about 60 MB/s).

In addition, for files from a few hundred megabytes up to 2 GB, memory-mapped files are somewhat faster. However, a single mapping can cover at most Integer.MAX_VALUE bytes (just under 2 GB), because the mapped length cannot exceed the maximum value of Java's int type; larger files have to be mapped in several regions.
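One way to work within the per-mapping size limit is to map the file region by region. The following is a minimal sketch under stated assumptions, not a tuned implementation: the class `ChunkedMapping`, the method `countNewlines`, and the choice of counting '\n' bytes as the example workload are all illustrative. It assumes an encoding such as ASCII or UTF-8 in which the byte 0x0A only ever represents a newline.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class ChunkedMapping {
    /** Counts '\n' bytes by mapping the file one region at a time. */
    static long countNewlines(Path path, long regionSize) throws IOException {
        if (regionSize <= 0 || regionSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("regionSize must be in 1..Integer.MAX_VALUE");
        }
        long newlines = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long position = 0; position < fileSize; position += regionSize) {
                // The last region may be shorter than regionSize.
                long length = Math.min(regionSize, fileSize - position);
                MappedByteBuffer region =
                        channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                while (region.hasRemaining()) {
                    if (region.get() == '\n') {
                        newlines++;
                    }
                }
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        // 1 << 30 (1 GB) per region stays well below the per-mapping limit.
        System.out.println(countNewlines(Paths.get(args[0]), 1L << 30));
    }
}
```

Because `position` and `fileSize` are `long` values, the total file size is not bounded by `int`; only each individual region is.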

Reading a huge text file in Java without running out of memory while keeping performance acceptable
    package helloword.helloword;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ReadBig {
        public static String fff = "C:\\mq\\read\\from.xml";

        // Approach 1: memory-map the second half of the file and copy it out 3 MB at a time
        public static void main1(String[] args) throws Exception {
            final int BUFFER_SIZE = 0x300000; // buffer size is 3 MB
            File f = new File(fff);
            MappedByteBuffer inputBuffer = new RandomAccessFile(f, "r").getChannel()
                    .map(FileChannel.MapMode.READ_ONLY, f.length() / 2, f.length() / 2);
            byte[] dst = new byte[BUFFER_SIZE]; // read 3 MB of content at a time
            long start = System.currentTimeMillis();
            for (int offset = 0; offset < inputBuffer.capacity(); offset += BUFFER_SIZE) {
                // Only the last chunk can be shorter than BUFFER_SIZE
                int length = Math.min(BUFFER_SIZE, inputBuffer.capacity() - offset);
                for (int i = 0; i < length; i++)
                    dst[i] = inputBuffer.get(offset + i);
                // new String(dst, 0, length) gives the buffered text for further processing
                System.out.println(new String(dst, 0, length));
            }
            long end = System.currentTimeMillis();
            System.out.println("Reading half of the file took " + (end - start) + " milliseconds");
        }

        // Approach 2: read through a FileChannel into a 1 KB ByteBuffer
        public static void main2(String[] args) throws Exception {
            int bufSize = 1024;
            byte[] bs = new byte[bufSize];
            ByteBuffer byteBuf = ByteBuffer.allocate(1024);
            try (RandomAccessFile raf = new RandomAccessFile(fff, "r")) {
                FileChannel channel = raf.getChannel();
                while (channel.read(byteBuf) != -1) {
                    int size = byteBuf.position();
                    byteBuf.rewind();
                    byteBuf.get(bs, 0, size);
                    // Treat the bytes as text; printing stands in for real processing
                    System.out.print(new String(bs, 0, size));
                    byteBuf.clear();
                }
            }
        }

        // Approach 3: read line by line with a BufferedReader
        public static void main3(String[] args) throws Exception {
            try (BufferedReader br = new BufferedReader(new FileReader(fff))) {
                String line = null;
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }

        // Same channel-based approach as main2, applied to a different file
        public static void main(String[] args) throws Exception {
            int bufSize = 1024;
            byte[] bs = new byte[bufSize];
            ByteBuffer byteBuf = ByteBuffer.allocate(1024);
            try (RandomAccessFile raf = new RandomAccessFile("D:\\filename", "r")) {
                FileChannel channel = raf.getChannel();
                while (channel.read(byteBuf) != -1) {
                    int size = byteBuf.position();
                    byteBuf.rewind();
                    byteBuf.get(bs, 0, size);
                    // Treat the bytes as text; printing stands in for real processing
                    System.out.print(new String(bs, 0, size));
                    byteBuf.clear();
                }
            }
        }
    }
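One caveat with the channel-based readers above: turning each 1 KB byte chunk into a `String` on its own can split a multi-byte character (for example a UTF-8 encoded Chinese character) across two chunks and garble it. A possible remedy, sketched here under the assumption that the file is UTF-8, is to decode incrementally with `CharsetDecoder` and carry undecoded bytes over to the next read. The class `ChannelTextReader` and the method `decodeTo` are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelTextReader {
    /** Decodes a UTF-8 file chunk by chunk and appends the text to sink. */
    static void decodeTo(Path path, int bufSize, Appendable sink) throws IOException {
        if (bufSize < 4) {
            // A UTF-8 character can occupy up to 4 bytes and must fit in the buffer.
            throw new IllegalArgumentException("bufSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(bufSize);
        // UTF-8 yields at most one char per byte, so this buffer cannot overflow.
        CharBuffer chars = CharBuffer.allocate(bufSize);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            boolean endOfInput = false;
            while (!endOfInput) {
                endOfInput = channel.read(bytes) == -1;
                bytes.flip();
                drain(decoder.decode(bytes, chars, endOfInput), chars, sink);
                // Keep the bytes of an incomplete character for the next round.
                bytes.compact();
            }
            drain(decoder.flush(chars), chars, sink);
        }
    }

    /** Fails on malformed input, then moves the decoded chars into sink. */
    private static void drain(CoderResult result, CharBuffer chars, Appendable sink)
            throws IOException {
        if (result.isError()) {
            result.throwException();
        }
        chars.flip();
        sink.append(chars);
        chars.clear();
    }
}
```

Passing `System.out` as the sink prints the text as it is decoded, which keeps memory use bounded by the two buffers; a `StringBuilder` sink is only sensible for small files or tests.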

Java runs out of memory when reading very large files? How to read a few lines at a time over multiple reads (best answer)
    package helloword.helloword;

    import java.io.BufferedInputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.RandomAccessFile;

    public class TestPrint {
        public static void main(String[] args) throws IOException {
            String path = "The path of the file you want to read";
            // "rw" opens the file for reading and writing; use "r" if you only read
            RandomAccessFile br = new RandomAccessFile(path, "rw");
            String str = null;
            StringBuilder app = new StringBuilder(); // avoids the literal "null" prefix of null + str
            int i = 0;
            while ((str = br.readLine()) != null) {
                i++;
                app.append(str);
                if (i >= 100) { // suppose we read 100 lines at a time
                    i = 0;
                    // Process these 100 lines here, then continue reading
                    app.setLength(0);
                }
            }
            // Any lines left in app at this point (fewer than 100) still need processing
            br.close();
        }

        // The following code is recommended for reading text files larger than 2 GB line by line
        void largeFileIO(String inputFile, String outputFile) {
            try {
                BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File(inputFile)));
                BufferedReader in = new BufferedReader(new InputStreamReader(bis, "UTF-8"), 10 * 1024 * 1024); // 10 MB buffer
                FileWriter fw = new FileWriter(outputFile);
                String line;
                // readLine() returns null at end of file; ready() is not a reliable end-of-file test
                while ((line = in.readLine()) != null) {
                    fw.append(line + " ");
                }
                in.close();
                fw.flush();
                fw.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
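The "read a fixed number of lines, process them, then continue" idea from the answer can also be written with a `BufferedReader` and a list per batch. This is a minimal sketch rather than the answer's own method: the class `BatchLineReader`, the method `forEachBatch`, and the use of a `Consumer` callback are assumptions made for illustration, and the file is assumed to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchLineReader {
    /**
     * Reads the file line by line and hands the lines to handler in groups of
     * batchSize; the final group may be smaller. Returns the number of batches.
     */
    static long forEachBatch(Path path, int batchSize, Consumer<List<String>> handler)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    batches++;
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        }
        if (!batch.isEmpty()) { // do not lose the trailing partial batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

A call such as `BatchLineReader.forEachBatch(path, 100, lines -> process(lines))`, where `process` is whatever the application does with 100 lines, keeps at most one batch in memory at a time.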
The JDK itself supports reading and writing very large files. Articles on the Internet fall into two main categories: one uses the BufferedReader class to read and write large files; the other uses the RandomAccessFile class. After comparing the two, I ended up using the former to read large files. The relevant code follows and is quite simple:

    File file = new File(filepath);
    BufferedInputStream fis = new BufferedInputStream(new FileInputStream(file));
    BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "UTF-8"), 5 * 1024 * 1024); // read the text file with a 5 MB buffer
    String line = "";
    while ((line = reader.readLine()) != null) {
        // TODO: write your business logic
    }

Note that the only special step is in instantiating BufferedReader: pass an extra argument that sets the buffer size.
