In the following program, a text file with fileLines lines is split evenly into splitNum smaller text files. Note that the newline character is '\n' on Linux, while on Windows the line break is '\r\n':
package kddcup2012.task2.filesystem;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;

public class FileSplit {
    public static void main(String[] args) throws IOException {
        long timer = System.currentTimeMillis();
        int bufferSize = 20 * 1024 * 1024; // the read buffer is 20 MB
        // build a buffered text input stream
        File file = new File("/media/data/Graduation design/kdd cup/data/userid_profile.txt");
        FileInputStream fileInputStream = new FileInputStream(file);
        BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
        InputStreamReader inputStreamReader = new InputStreamReader(bufferedInputStream);
        BufferedReader input = new BufferedReader(inputStreamReader, bufferSize);
        int splitNum = 112 - 1;           // number of parts to split into, minus one
        int fileLines = 23669283;         // number of lines in the input file
        long perSplitLines = fileLines / splitNum; // number of lines per part
        for (int i = 0; i <= splitNum; ++i) { // split: create one output file per part
            FileWriter output = new FileWriter("/home/haoqiong/part" + i + ".txt");
            String line = null;
            // read line by line, write line by line
            for (long lineCounter = 0;
                 lineCounter < perSplitLines && (line = input.readLine()) != null;
                 ++lineCounter) {
                output.append(line + "\n");
            }
            output.flush();
            output.close();
        }
        input.close();
        timer = System.currentTimeMillis() - timer;
        System.out.println("Processing time: " + timer);
    }
}
The above program needs only about 30 MB of memory to handle a large text file (this depends on the configured read-buffer size), but it is not especially fast: with no other program using the disk, splitting a 200 MB file into 112 parts takes about 20 seconds (machine configuration: Centrino 2 P7450 CPU, 2 GB DDR3 memory, Ubuntu 11.10, HDD with a maximum read/write speed of about 60 MB/s).
In addition, for files from a few hundred MB up to 2 GB in size, memory-mapped I/O is somewhat faster. However, a memory mapping can only handle files below 2 GB, because the length of a single mapped region cannot exceed the maximum value of Java's int type.
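That 2 GB ceiling applies to a single MappedByteBuffer, not to the file itself: a larger file can still be processed by mapping it one window at a time. The sketch below is my own illustration of that idea (the class name, the 256 MB window size, and the byte-counting loop body are illustrative choices, not from the original text):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkedMap {
    // 256 MB per mapping: safely below Integer.MAX_VALUE (an arbitrary choice)
    static final long WINDOW = 256L * 1024 * 1024;

    // Walk the whole file through a sequence of read-only mappings,
    // so files larger than 2 GB can still be processed via MappedByteBuffer.
    public static long countBytes(String path) throws Exception {
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += WINDOW) {
                long len = Math.min(WINDOW, size - pos);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                total += buf.remaining(); // replace with real per-window processing
            }
        }
        return total;
    }
}
```

Each window is mapped, consumed, and then left for the garbage collector; only one window's worth of address space needs to be live at a time.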
Java: reading a huge text file without running out of memory and without sacrificing performance
package helloword.helloword;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ReadBig {
    public static String fff = "C:\\mq\\read\\from.xml";

    public static void main1(String[] args) throws Exception {
        final int BUFFER_SIZE = 0x300000; // buffer size is 3 MB
        File f = new File(fff);
        // map the second half of the file into memory
        MappedByteBuffer inputBuffer = new RandomAccessFile(f, "r").getChannel()
                .map(FileChannel.MapMode.READ_ONLY, f.length() / 2, f.length() / 2);
        byte[] dst = new byte[BUFFER_SIZE]; // read 3 MB of content at a time
        long start = System.currentTimeMillis();
        for (int offset = 0; offset < inputBuffer.capacity(); offset += BUFFER_SIZE) {
            if (inputBuffer.capacity() - offset >= BUFFER_SIZE) {
                for (int i = 0; i < BUFFER_SIZE; i++)
                    dst[i] = inputBuffer.get(offset + i);
            } else {
                for (int i = 0; i < inputBuffer.capacity() - offset; i++)
                    dst[i] = inputBuffer.get(offset + i);
            }
            int length = (inputBuffer.capacity() % BUFFER_SIZE == 0)
                    ? BUFFER_SIZE : inputBuffer.capacity() % BUFFER_SIZE;
            // new String(dst, 0, length) turns the buffered bytes into a string
            // that you can then operate on
            System.out.println(new String(dst, 0, length));
        }
        long end = System.currentTimeMillis();
        System.out.println("Reading half of the file took: " + (end - start) + " milliseconds");
    }

    public static void main2(String[] args) throws Exception {
        int bufSize = 1024;
        byte[] bs = new byte[bufSize];
        ByteBuffer byteBuf = ByteBuffer.allocate(1024);
        FileChannel channel = new RandomAccessFile(fff, "r").getChannel();
        while (channel.read(byteBuf) != -1) {
            int size = byteBuf.position();
            byteBuf.rewind();
            byteBuf.get(bs);
            // treat the bytes as a string; printing is just an example
            System.out.print(new String(bs, 0, size));
            byteBuf.clear();
        }
    }

    public static void main3(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(new FileReader(fff));
        String line = null;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }

    public static void main(String[] args) throws Exception {
        int bufSize = 1024;
        byte[] bs = new byte[bufSize];
        ByteBuffer byteBuf = ByteBuffer.allocate(1024);
        FileChannel channel = new RandomAccessFile("D:\\filename", "r").getChannel();
        while (channel.read(byteBuf) != -1) {
            int size = byteBuf.position();
            byteBuf.rewind();
            byteBuf.get(bs);
            // treat the bytes as a string; printing is just an example
            System.out.print(new String(bs, 0, size));
            byteBuf.clear();
        }
    }
}
Q: Java reads a large file and overflows memory. How can it read a few lines at a time, over multiple passes? Best answer:
package helloword.helloword;

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;

public class TestPrint {
    public static void main(String[] args) throws IOException {
        String path = "the path of the file you want to read";
        RandomAccessFile br = new RandomAccessFile(path, "rw"); // use "r" here if you only read
        String str = null, app = null;
        int i = 0;
        while ((str = br.readLine()) != null) {
            i++;
            app = app + str;
            if (i >= 100) { // suppose we read 100 lines at a time
                i = 0;
                // first operate on these 100 lines here, then continue reading
                app = null;
            }
        }
        br.close();
    }

    // the following code is recommended for text files larger than 2 GB
    void largeFileIO(String inputFile, String outputFile) {
        try {
            BufferedInputStream bis = new BufferedInputStream(
                    new FileInputStream(new File(inputFile)));
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(bis, "utf-8"), 10 * 1024 * 1024); // 10 MB buffer
            FileWriter fw = new FileWriter(outputFile);
            while (in.ready()) {
                String line = in.readLine();
                fw.append(line + "\n");
            }
            in.close();
            fw.flush();
            fw.close();
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
The JDK itself supports reading and writing oversized files. The articles online basically fall into two camps: one uses the BufferedReader class to read and write oversized files; the other uses the RandomAccessFile class. After comparing the two, I settled on the former for reading large files. The relevant code is below; it is actually very simple:
-------------------------------------------------------------------
File file = new File(filePath);
BufferedInputStream fis = new BufferedInputStream(new FileInputStream(file));
// read the text file with a 5 MB buffer
BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "utf-8"), 5 * 1024 * 1024);
String line = "";
while ((line = reader.readLine()) != null) {
    // TODO: write your business logic
}
---------------------------------------------------------------------
Note that when instantiating the BufferedReader, you pass the extra constructor parameter that sets the buffer size.
Java: reading and writing large text files (2 GB or more)
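For byte-level copies past the 2 GB mark, the JDK's channel-to-channel transfer avoids both the mapped-region limit and large heap buffers. The sketch below is my own illustration (the class name and loop structure are not from the original): it copies a file of any size with FileChannel.transferTo, looping because a single call may transfer fewer bytes than requested.

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;

public class BigCopy {
    // Copy src to dst; works for files larger than 2 GB because position
    // and count are longs and the kernel does the bulk transfer.
    public static long copy(String src, String dst) throws Exception {
        try (FileChannel in = new RandomAccessFile(src, "r").getChannel();
             FileChannel out = new RandomAccessFile(dst, "rw").getChannel()) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                // transferTo may move fewer than (size - pos) bytes per call
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }
}
```

Because the transfer happens inside transferTo, no intermediate byte[] the size of the file is ever allocated, so memory use stays flat regardless of file size.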