How to Read a Large File Efficiently in Java


1. Overview

This tutorial shows how to read large files efficiently in Java. This article is part of the "Java – Back to Basics" series on Baeldung (http://www.baeldung.com/).

2. Reading the File into Memory

The standard way to read the lines of a file is to read them all into memory; both Guava and Apache Commons IO provide a one-liner for this:

Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));

The problem with this approach is that all the lines of the file are kept in memory, so the program will quickly throw an OutOfMemoryError when the file is large enough.

For example, reading a file of about 1 GB:

@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

This approach starts out with a small memory footprint (approximately 0 MB consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb

However, once the whole file has been read into memory, we can see the result (about 2 GB of memory consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means that the process consumed roughly 2.1 GB of memory. The reason is simple: all the lines of the file are now stored in memory.

Keeping the entire contents of a file in memory will quickly exhaust the available memory, no matter how much memory is actually available.

Moreover, we usually don't need all the lines of a file in memory at once. Instead, we just need to iterate through the lines one by one, do the appropriate processing, and discard each line once it has been handled. So that is exactly what we will do: iterate through the lines rather than hold them all in memory.
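As a minimal sketch of this iterate-and-discard idea (not part of the original article, which uses Scanner and LineIterator below), a plain BufferedReader works as well. The example assumes Java 8+ and creates its own small sample file so that it can run standalone:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class LineByLineDemo {
    public static void main(String[] args) throws IOException {
        // Sample file for the demo; a real use case would point at an existing large file.
        Path path = Files.createTempFile("sample", ".txt");
        Files.write(path, Arrays.asList("first", "second", "third"), StandardCharsets.UTF_8);

        long count = 0;
        // try-with-resources closes the reader even if processing throws
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the line here; only one line is held in memory at a time
                count++;
            }
        }
        System.out.println("lines=" + count);
        Files.delete(path);
    }
}
```

Because readLine returns lines one at a time, the memory footprint stays flat regardless of file size.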

3. Streaming Through the File

Now let's look at a solution that uses java.util.Scanner to run through the contents of the file, reading one line at a time:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution iterates through all the lines in the file, allowing each line to be processed without keeping a reference to it. In short, the lines are not stored in memory (about 150 MB consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb

4. Streaming with Apache Commons IO

The same can be achieved with the Commons IO library, taking advantage of the custom LineIterator that the library provides:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is never fully in memory, this also results in quite conservative memory consumption (approximately 150 MB):

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb

5. Conclusion

This short article describes how to process a large file without reading it fully into memory and running out of it, which provides a useful option for handling large files.
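As a closing aside not covered in the original article, Java 8's Files.lines offers the same lazy, line-by-line behavior through the Stream API. The sketch below assumes Java 8+ and builds its own tiny sample file so it can run standalone:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamLinesDemo {
    public static void main(String[] args) throws IOException {
        // Sample file for the demo; a real use case would point at an existing large file.
        Path path = Files.createTempFile("sample", ".txt");
        Files.write(path, Arrays.asList("a", "bb", "ccc"), StandardCharsets.UTF_8);

        // Files.lines reads lazily; the stream must be closed to release the file handle,
        // which try-with-resources guarantees
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            long totalChars = lines.mapToLong(String::length).sum();
            System.out.println("total=" + totalChars);
        }
        Files.delete(path);
    }
}
```

Like the Scanner and LineIterator approaches above, only one line is materialized at a time, so memory usage stays flat.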

All of these examples and code snippets are implemented in my GitHub project. It is an Eclipse-based project, so it should be easy to import and run as-is.
