In a recent project I had to parse a TXT file of more than 100 MB and load it into a database. The FileInputStream and BufferedReader approach I had used before was clearly not up to the job: readLine() does read the file row by row, but on a file of about 140 MB with 680,000 rows it was not only slow, it also ran out of memory before all 680,000 rows had been read. So I turned to the classes and methods in NIO.
The classes used are: a byte buffer (java.nio.ByteBuffer); a channel for reading, writing, mapping, and manipulating files (java.nio.channels.FileChannel); a character set for decoding the text (java.nio.charset.Charset); and a class that supports reading and writing random access files (java.io.RandomAccessFile).
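To show how these four classes fit together, here is a minimal sketch that reads the first bytes of a file through a channel and decodes them. The class name NioClassesSketch, the helper readHead, and the choice of UTF-8 in main are assumptions made for illustration; they are not part of the project code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NioClassesSketch {
    // Hypothetical helper: reads at most maxBytes from the start of the file
    // and decodes them. A multi-byte character cut off at maxBytes is decoded
    // to the charset's replacement character.
    static String readHead(String path, int maxBytes, Charset charset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel fc = raf.getChannel()) {
            ByteBuffer buf = ByteBuffer.allocate(maxBytes);
            // One read may deliver fewer bytes than requested, so keep reading
            // until the buffer is full or the channel reports end of file (-1).
            while (buf.hasRemaining() && fc.read(buf) != -1) {
                // keep filling
            }
            buf.flip(); // switch the buffer from being filled to being read
            return charset.decode(buf).toString();
        }
    }

    // Usage: java NioClassesSketch.java <path-to-file>
    public static void main(String[] args) throws IOException {
        System.out.println(readHead(args[0], 1024, StandardCharsets.UTF_8));
    }
}
```

Only maxBytes bytes are ever held in memory, which is the property the rest of this post relies on.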
The idea is to use two buffers, one large and one small: the large buffer holds the bytes fetched by each read, and the small buffer holds one row of data (make sure it is big enough for the longest line in the text). While reading, check whether each byte is the line-break character 13 (carriage return). If it is, return the row collected so far; if not, keep reading until the end of the file.
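The splitting rule can be tried without any file I/O. The sketch below runs it over an in-memory byte array, with a wrapped ByteBuffer standing in for the large buffer. The names SplitRuleSketch, splitAtCr, and drain are hypothetical, and skipping a '\n' (byte 10) that directly follows the carriage return is an assumption about "\r\n" files rather than something stated above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SplitRuleSketch {
    // Splits data at carriage returns (byte 13), passing each row through a
    // small line buffer. lineCapacity must be at least the longest row length.
    static List<String> splitAtCr(byte[] data, int lineCapacity) {
        ByteBuffer big = ByteBuffer.wrap(data);              // stands in for the read buffer
        ByteBuffer line = ByteBuffer.allocate(lineCapacity); // holds one row at a time
        List<String> rows = new ArrayList<>();
        while (big.hasRemaining()) {
            byte b = big.get();
            if (b == 13) {
                if (big.hasRemaining() && big.get(big.position()) == 10) {
                    big.get(); // consume the '\n' of a "\r\n" pair
                }
                rows.add(drain(line));
            } else {
                line.put(b); // throws BufferOverflowException if the row is too long
            }
        }
        if (line.position() > 0) {
            rows.add(drain(line)); // last row without a trailing line break
        }
        return rows;
    }

    // Copies the bytes collected so far into a String and empties the buffer.
    private static String drain(ByteBuffer line) {
        line.flip();
        String s = new String(line.array(), 0, line.limit(), StandardCharsets.UTF_8);
        line.clear();
        return s;
    }

    public static void main(String[] args) {
        byte[] data = "a,1\r\nb,2\r\nc,3".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitAtCr(data, 16)); // prints [a,1, b,2, c,3]
    }
}
```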
Implementation method:
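// Imports needed: java.io.IOException, java.io.RandomAccessFile, java.nio.ByteBuffer, java.nio.channels.FileChannel, java.util.Arrays
// raf is an open java.io.RandomAccessFile for the file being read
FileChannel fc = raf.getChannel();
// Read buffer: the number of bytes fetched from the file on each read
ByteBuffer fbb = ByteBuffer.allocate(1024 * 5);
fc.read(fbb);
fbb.flip();
// Line buffer: bytes cached per row; size it according to your actual data
ByteBuffer bb = ByteBuffer.allocate(500);
// Set once the channel reports end of file
boolean eof = false;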
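// Check whether another row is available; its bytes are collected in bb
public boolean hasNext() throws IOException {
    if (eof) return false;
    if (fbb.position() == fbb.limit()) { // the read buffer is used up, so refill it
        if (readByte() == 0) return false;
    }
    while (true) {
        if (fbb.position() == fbb.limit()) {
            if (readByte() == 0) break; // end of file in the middle of a row
        }
        byte a = fbb.get();
        if (a == 13) { // carriage return: the row is complete
            if (fbb.position() == fbb.limit()) {
                if (readByte() == 0) return true;
            }
            // Skip the '\n' of a "\r\n" pair so the next row does not start with it
            if (fbb.get(fbb.position()) == 10) fbb.get();
            return true;
        } else {
            if (bb.position() < bb.limit()) {
                bb.put(a);
            } else {
                // bb is full: put the byte back and hand over the piece read so far,
                // instead of refilling fbb, which would discard unread data
                fbb.position(fbb.position() - 1);
                return true;
            }
        }
    }
    // Only report a final row if some bytes were actually collected
    return bb.position() > 0;
}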
// Refill fbb from the channel; returns the number of bytes now available, or 0 at end of file
private int readByte() throws IOException {
    // clear() makes the buffer ready for a new channel read: it sets the position to zero
    // and the limit to the capacity. A rewind() before it is redundant and has been dropped.
    fbb.clear();
    if (this.fc.read(fbb) == -1) {
        eof = true;
        return 0;
    } else {
        // flip() makes the buffer ready for relative get operations: it sets the limit
        // to the current position and then sets the position to zero.
        fbb.flip();
        // After flip() the position is always 0, so return the limit (bytes read) instead
        return fbb.limit();
    }
}
public byte[] next() {
    bb.flip();
    // Returning a byte array matters here: it lets the caller merge the pieces of a row
    // that was split. Otherwise, if the buffer limit is reached in the middle of a Chinese
    // character (which spans more than one byte), decoding the piece on its own shows garbage.
    byte[] tm = Arrays.copyOfRange(bb.array(), bb.position(), bb.limit());
    bb.clear();
    return tm;
}
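For reference, the fragments above can be assembled into one self-contained class and driven by a simple loop. This is only a sketch under stated assumptions: the class name LineChannelReader, its constructor parameters readSize and lineSize, the helper fill, and the use of UTF-8 in main are all hypothetical. It also assumes, as required earlier, that the line buffer can hold the longest row, and it throws an exception rather than returning a cut row.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper class; name and constructor are assumptions.
class LineChannelReader implements Closeable {
    private final RandomAccessFile raf;
    private final FileChannel fc;
    private final ByteBuffer fbb; // read buffer
    private final ByteBuffer bb;  // line buffer
    private boolean eof = false;

    LineChannelReader(String path, int readSize, int lineSize) throws IOException {
        raf = new RandomAccessFile(path, "r");
        fc = raf.getChannel();
        fbb = ByteBuffer.allocate(readSize);
        fbb.flip(); // start empty (position == limit == 0) so the first hasNext() refills
        bb = ByteBuffer.allocate(lineSize);
    }

    // Refills fbb; returns the number of bytes available, or 0 at end of file.
    private int fill() throws IOException {
        fbb.clear();
        int n = fc.read(fbb);
        fbb.flip();
        if (n <= 0) {
            eof = true;
            return 0;
        }
        return n;
    }

    // Collects the next row into bb; returns false when nothing is left.
    // Call next() after every hasNext() that returns true.
    boolean hasNext() throws IOException {
        while (!eof) {
            if (!fbb.hasRemaining() && fill() == 0) break;
            byte a = fbb.get();
            if (a == 13) { // carriage return ends the row
                if (!fbb.hasRemaining()) fill();
                // Skip the '\n' of a "\r\n" pair, if present
                if (fbb.hasRemaining() && fbb.get(fbb.position()) == 10) fbb.get();
                return true;
            }
            if (!bb.hasRemaining()) {
                throw new IOException("row longer than line buffer (" + bb.capacity() + " bytes)");
            }
            bb.put(a);
        }
        return bb.position() > 0; // last row without a trailing line break
    }

    // Returns the bytes of the row collected by the last hasNext() call.
    byte[] next() {
        bb.flip();
        byte[] row = Arrays.copyOfRange(bb.array(), bb.position(), bb.limit());
        bb.clear();
        return row;
    }

    @Override
    public void close() throws IOException {
        raf.close(); // closing the file also closes its channel
    }

    // Usage: java LineChannelReader.java <path-to-file>
    public static void main(String[] args) throws IOException {
        Charset cs = StandardCharsets.UTF_8; // assumption: replace with the file's real encoding
        long rows = 0;
        long chars = 0;
        try (LineChannelReader r = new LineChannelReader(args[0], 1024 * 5, 500)) {
            while (r.hasNext()) {
                String line = new String(r.next(), cs);
                chars += line.length(); // a real loader would insert the row into the database here
                rows++;
            }
        }
        System.out.println(rows + " rows, " + chars + " characters");
    }
}
```

Memory use stays bounded by the two buffer sizes regardless of file size, because each row is handed to the caller and then dropped before the next one is read.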