1. Overview
This tutorial shows how to read large files in Java efficiently. The article is part of the "Java - Back to Basics" series of tutorials on Baeldung (http://www.baeldung.com/).
2. Reading in Memory
The standard way to read the lines of a file is to read them into memory. Both Guava and Apache Commons IO provide a quick way to do that:
Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));
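For clarity, here is a self-contained sketch of the two calls above with their imports spelled out. Note that Files here is Guava's com.google.common.io.Files, not java.nio.file.Files; the class name and file path below are only illustrative:

import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;
import org.apache.commons.io.FileUtils;

public class ReadWholeFileExample {
    public static void main(String[] args) throws IOException {
        File file = new File("/path/to/file.txt"); // illustrative path

        // Guava: reads every line of the file into a List<String>
        List<String> guavaLines = Files.readLines(file, Charsets.UTF_8);

        // Apache Commons IO: same idea, the whole file ends up in memory
        List<String> commonsLines = FileUtils.readLines(file, "UTF-8");

        System.out.println(guavaLines.size() + " / " + commonsLines.size());
    }
}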
The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError once the file is large enough.
For example, reading a file of about 1 GB:
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}
At the start, only a small amount of memory is consumed (approximately 0 MB):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
However, after the whole file has been read into memory, we can see (about 2 GB of memory consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
This means the process consumes roughly 2.1 GB of memory, and the reason is simple: all the lines of the file are now held in memory.
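The memory figures quoted in this article come from log statements in the test. A minimal sketch of how such numbers can be obtained with Runtime; the SLF4J logger setup here is an assumption, not necessarily the article's exact code:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MemoryLogger {

    private static final Logger LOG = LoggerFactory.getLogger(MemoryLogger.class);

    // Logs the JVM's current total and free heap, in megabytes
    static void logMemory() {
        Runtime runtime = Runtime.getRuntime();
        long totalMb = runtime.totalMemory() / (1024 * 1024);
        long freeMb = runtime.freeMemory() / (1024 * 1024);
        LOG.info("Total Memory: {} Mb", totalMb);
        LOG.info("Free Memory: {} Mb", freeMb);
    }
}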
Keeping the full contents of a file in memory will quickly exhaust the available memory, no matter how large the actual available memory is.
Moreover, we usually don't need all the lines of a file in memory at once; we only need to iterate over each line, do the appropriate processing, and discard it once processed. So that's exactly what we're going to do: iterate through the lines rather than hold them all in memory.
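Before looking at the solutions below, note that this "process a line, then discard it" pattern is not tied to any particular library. A minimal sketch using the standard BufferedReader, purely as an illustration (the class name and path are made up, and this variant is not one of the approaches measured in this article):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LineByLineExample {
    public static void main(String[] args) throws IOException {
        String path = "/path/to/large/file.txt"; // illustrative path

        long count = 0;
        // try-with-resources closes the reader even if processing fails
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the current line here; nothing else is retained
                count++;
            }
        }
        System.out.println(count + " lines");
    }
}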
3. Streaming Through the File
Now let's look at a solution that streams through the file: we'll use java.util.Scanner to scan the contents of the file and read it line by line, one line at a time:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
This solution iterates over all the lines of the file, allowing each line to be processed without keeping a reference to it. In short, the lines are not stored in memory (about 150 MB of memory consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
4. Streaming With Apache Commons IO
The same can be achieved with the Commons IO library, using the custom LineIterator that the library provides:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
Because the entire file is not fully held in memory, this also leads to fairly conservative memory consumption (approximately 150 MB consumed):
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
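As a side note, in more recent Commons IO versions LineIterator implements Closeable, so, assuming a version where that holds, the same loop can be written with try-with-resources instead of closeQuietly. A sketch, not the article's original code, with an illustrative class name and path:

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class LineIteratorExample {
    public static void main(String[] args) throws IOException {
        File theFile = new File("/path/to/large/file.txt"); // illustrative path

        // Requires a Commons IO version in which LineIterator implements Closeable
        try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
            while (it.hasNext()) {
                String line = it.nextLine();
                // do something with line
            }
        }
    }
}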
5. Conclusion
This quick article shows how to process the lines of a large file without reading them all into memory and running out of it, which provides a useful approach for working with large files.
The implementation of all these examples and code snippets can be found in my GitHub project; it is an Eclipse-based project, so it should be easy to import and run as-is.