1. Overview
This tutorial shows how to read large files in Java efficiently. The article is part of the "Java - Back to Basics" series of tutorials on Baeldung (http://www.baeldung.com/).
2. Reading in Memory
The standard way to read the lines of a file is to read them into memory. Both Guava and Apache Commons IO provide a quick way to do that:
Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));
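For clarity, here is a self-contained sketch of the two calls above with their imports spelled out. Note that Files here is Guava's com.google.common.io.Files, not java.nio.file.Files; the class name and file path below are only illustrative:

import java.io.File;
import java.io.IOException;
import java.util.List;

import com.google.common.base.Charsets;
import com.google.common.io.Files;
import org.apache.commons.io.FileUtils;

public class ReadWholeFileExample {
    public static void main(String[] args) throws IOException {
        File file = new File("/path/to/file.txt"); // illustrative path

        // Guava: reads every line of the file into a List<String>
        List<String> guavaLines = Files.readLines(file, Charsets.UTF_8);

        // Apache Commons IO: same idea, the whole file ends up in memory
        List<String> commonsLines = FileUtils.readLines(file, "UTF-8");

        System.out.println(guavaLines.size() + " / " + commonsLines.size());
    }
}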
The problem with this approach is that all the lines of the file are kept in memory, which will quickly lead to an OutOfMemoryError once the file is large enough.
For example, reading a file of about 1 GB:
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}
At the start, only a small amount of memory is consumed (approximately 0 MB):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
However, after the whole file has been read into memory, we can see (about 2 GB of memory consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
This means the process consumes roughly 2.1 GB of memory, and the reason is simple: all the lines of the file are now held in memory.
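The memory figures quoted in this article come from log statements in the test. A minimal sketch of how such numbers can be obtained with Runtime; the SLF4J logger setup here is an assumption, not necessarily the article's exact code:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MemoryLogger {

    private static final Logger LOG = LoggerFactory.getLogger(MemoryLogger.class);

    // Logs the JVM's current total and free heap, in megabytes
    static void logMemory() {
        Runtime runtime = Runtime.getRuntime();
        long totalMb = runtime.totalMemory() / (1024 * 1024);
        long freeMb = runtime.freeMemory() / (1024 * 1024);
        LOG.info("Total Memory: {} Mb", totalMb);
        LOG.info("Free Memory: {} Mb", freeMb);
    }
}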
Keeping the full contents of a file in memory will quickly exhaust the available memory, no matter how large the actual available memory is.
Moreover, we usually don't need all the lines of a file in memory at once; we only need to iterate over each line, do the appropriate processing, and discard it once processed. So that's exactly what we're going to do: iterate through the lines rather than hold them all in memory.
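Before looking at the solutions below, note that this "process a line, then discard it" pattern is not tied to any particular library. A minimal sketch using the standard BufferedReader, purely as an illustration (the class name and path are made up, and this variant is not one of the approaches measured in this article):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class LineByLineExample {
    public static void main(String[] args) throws IOException {
        String path = "/path/to/large/file.txt"; // illustrative path

        long count = 0;
        // try-with-resources closes the reader even if processing fails
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // process the current line here; nothing else is retained
                count++;
            }
        }
        System.out.println(count + " lines");
    }
}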
3. Streaming Through the File
Now let's look at a solution that streams through the file: we'll use java.util.Scanner to scan the contents of the file and read it line by line, one line at a time:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
This solution iterates over all the lines of the file, allowing each line to be processed without keeping a reference to it. In short, the lines are not stored in memory (about 150 MB of memory consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
4. Streaming With Apache Commons IO
The same can be achieved with the Commons IO library, using the custom LineIterator that the library provides:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
Because the entire file is not fully held in memory, this also leads to fairly conservative memory consumption (approximately 150 MB consumed):
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
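As a side note, in more recent Commons IO versions LineIterator implements Closeable, so, assuming a version where that holds, the same loop can be written with try-with-resources instead of closeQuietly. A sketch, not the article's original code, with an illustrative class name and path:

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class LineIteratorExample {
    public static void main(String[] args) throws IOException {
        File theFile = new File("/path/to/large/file.txt"); // illustrative path

        // Requires a Commons IO version in which LineIterator implements Closeable
        try (LineIterator it = FileUtils.lineIterator(theFile, "UTF-8")) {
            while (it.hasNext()) {
                String line = it.nextLine();
                // do something with line
            }
        }
    }
}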
5. Conclusion
This quick article shows how to process the lines of a large file without reading them all into memory and running out of it, which provides a useful approach for working with large files.
The implementation of all these examples and code snippets can be found in my GitHub project; it is an Eclipse-based project, so it should be easy to import and run as-is.