How to Read a Large File Efficiently in Java
1. Overview
This tutorial demonstrates how to read a large file efficiently in Java. This article is part of the "Java Back to Basics" series of tutorials on Baeldung (http://www.baeldung.com).
2. Reading in Memory
The standard way to read the lines of a file is to read them all into memory. Both Guava and Apache Commons IO provide quick methods to do just that:
Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));
The problem with this approach is that all the lines of the file are kept in memory; if the file is large enough, the program will quickly throw an OutOfMemoryError.
For example, reading a file of about 1 GB:
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}
At the start, this approach occupies only a small amount of memory (about 0 MB consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
However, once the entire file has been read into memory, we can see (about 2 GB consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
This means the process consumed roughly 2.1 GB of memory (2666 MB total minus 490 MB free). The reason is simple: all the lines of the file are now stored in memory.
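As an aside, the memory figures in these logs can be reproduced with the standard Runtime API. The sketch below is my own addition (the class name is a stand-in, not the article's test class); it prints the same two numbers and their difference:

```java
public class MemoryLog {

    // Approximate used heap in megabytes, computed the same way as
    // "Total Memory" minus "Free Memory" in the log output above.
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Total Memory: " + rt.totalMemory() / (1024 * 1024) + " Mb");
        System.out.println("Free Memory: " + rt.freeMemory() / (1024 * 1024) + " Mb");
        System.out.println("Used (approx.): " + usedMb() + " Mb");
    }
}
```

For the log above, that works out to 2666 MB minus 490 MB, or about 2.1 GB in use.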
It should be clear that keeping the entire file content in memory will quickly exhaust the available memory, no matter how much memory is actually available.
Furthermore, we usually do not need all the lines of the file in memory at once; instead, we just need to iterate over each line, do some processing, and then discard it. So that is exactly what we will do: iterate through the lines rather than holding them all in memory.
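On Java 8 and later, this iterate-instead-of-load approach can also be expressed with java.nio.file.Files.lines, which returns a lazily populated Stream. This variant is my addition, not one of the article's original examples:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamLines {

    // Files.lines reads lazily: lines are fetched as the stream is consumed,
    // so the whole file is never held in memory at once. The try-with-resources
    // block is required so the underlying file handle is closed.
    static long countLines(Path path) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("large", ".txt");
        Files.write(path, Arrays.asList("line 1", "line 2", "line 3"));
        System.out.println(countLines(path)); // prints 3
        Files.delete(path);
    }
}
```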
3. Streaming Through the File
Now let's look at a solution: we will use a java.util.Scanner to run through the contents of the file and retrieve lines serially, one by one:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
This solution iterates over all the lines in the file, allowing each line to be processed without keeping a reference to it. In short, the lines are never all in memory at once (about 150 MB consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
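An equivalent sketch using BufferedReader with try-with-resources (my addition, not shown in the original article) avoids Scanner's exception-suppression caveat, since read errors surface directly as IOException:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferedRead {

    // Read one buffered line at a time; try-with-resources guarantees the
    // reader is closed even if processing throws.
    static List<String> upperCaseLines(Path path) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line.toUpperCase()); // process each line (here: upper-case it)
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, Arrays.asList("alpha", "beta"));
        System.out.println(upperCaseLines(path)); // prints [ALPHA, BETA]
        Files.delete(path);
    }
}
```

Note that this demo collects the lines into a list only to show the result; for a truly large file you would process and discard each line inside the loop instead.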
4. Streaming with Apache Commons IO
The same result can be achieved with the Commons IO library, using the custom LineIterator it provides:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
Since the entire file is never fully held in memory, this also results in very conservative memory consumption (approximately 150 MB consumed):
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
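FileUtils and LineIterator come from the commons-io artifact. A typical Maven coordinate looks like the following (the version shown is illustrative; check for the latest release):

```xml
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.11.0</version> <!-- illustrative; use the latest version -->
</dependency>
```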
5. Conclusion
This quick article shows how to process the lines of a large file without reading the whole file in at once and without exhausting the available memory, which proves quite useful when working with such large files.
The implementation of all these examples and code snippets can be found in my GitHub project. It is an Eclipse-based project, so it should be easy to import and run as-is.