Reading Large Files Efficiently with Java

1. Overview

This tutorial shows how to read large files efficiently in Java. The article is part of the "Java - Back to Basics" series of tutorials on Baeldung (http://www.baeldung.com).

2. Reading the file in memory

The standard way to read the lines of a file is to load them all into memory. Both Guava and Apache Commons IO provide a one-line method to do exactly that:

Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));

The problem with this approach is that every line of the file is kept in memory; as soon as the file is large enough, the program will quickly throw an OutOfMemoryError.

For example, reading a file of about 1 GB:
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
    String path = ...
    Files.readLines(new File(path), Charsets.UTF_8);
}

At first, this method consumes only a small amount of memory (roughly 0 MB consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
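(For reference, the Total Memory and Free Memory figures in these logs come from the standard Runtime API. The helper below is a sketch of how such logging might be implemented, assuming an SLF4J logger; it is not taken from the original test class.)

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical logging helper (an assumption, not from the original article):
// reports the JVM's current total and free heap via java.lang.Runtime.
private static final Logger logger = LoggerFactory.getLogger(CoreJavaIoUnitTest.class);

private void logMemory() {
    logger.info("Total Memory: {} Mb", Runtime.getRuntime().totalMemory() / (1024 * 1024));
    logger.info("Free Memory: {} Mb", Runtime.getRuntime().freeMemory() / (1024 * 1024));
}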

Once the whole file has been read into memory, however, the picture changes (roughly 2 GB consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb

This means the process consumes roughly 2.1 GB of memory. The reason is simple: all the lines of the file are now stored in memory.

Clearly, keeping the whole content of the file in memory will quickly exhaust the available heap, no matter how much memory is actually available.

More importantly, we usually do not need all the lines of the file in memory at once. Instead, we only need to traverse the lines one by one, process each line, and then discard it. That is exactly what the following approaches do: they iterate through the lines rather than holding all of them in memory.
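On Java 8 and later, the same line-by-line idea can also be expressed through the NIO streaming API. The snippet below is a minimal sketch of that pattern; it is not one of the approaches measured in this article:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Files.lines reads the file lazily, one line at a time;
// try-with-resources ensures the underlying file handle is closed.
try (Stream<String> lines = Files.lines(Paths.get(path), StandardCharsets.UTF_8)) {
    lines.forEach(line -> {
        // process the line, then let it become eligible for garbage collection
    });
}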

3. Streaming through the file

Now let's look at a solution: we will use a java.util.Scanner to run through the contents of the file and read lines one by one:

FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}

This solution iterates through all the lines of the file, allowing each line to be processed without keeping references to it. In short, the lines are not stored in memory (roughly 150 MB consumed):

[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO  org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
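On Java 7 and later, the same Scanner-based logic can be written more compactly with try-with-resources. This is a sketch equivalent in behavior to the code above:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

// try-with-resources closes both the stream and the Scanner automatically
try (FileInputStream inputStream = new FileInputStream(path);
     Scanner sc = new Scanner(inputStream, "UTF-8")) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // process the line
    }
    // Scanner suppresses IOExceptions, so surface them explicitly
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
}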
4. Streaming with Apache Commons IO

The same can be achieved with the Commons IO library, using the custom LineIterator it provides:

LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}

Since the entire file is not held in memory, this also leads to very conservative memory consumption (approximately 150 MB consumed):

[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO  o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
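If this iteration pattern recurs in a code base, it can be wrapped in a small helper. The method below is a hypothetical convenience; forEachLine and its Consumer callback are my own names, not part of Commons IO:

import java.io.File;
import java.io.IOException;
import java.util.function.Consumer;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

// Hypothetical helper wrapping the Commons IO iteration pattern;
// the caller supplies the per-line processing as a Consumer.
static void forEachLine(File file, Consumer<String> processLine) throws IOException {
    LineIterator it = FileUtils.lineIterator(file, "UTF-8");
    try {
        while (it.hasNext()) {
            processLine.accept(it.nextLine());
        }
    } finally {
        LineIterator.closeQuietly(it);
    }
}

It could then be invoked as, for example: forEachLine(theFile, line -> { /* do something with line */ });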
5. Conclusion

This quick article shows how to process the lines of a large file iteratively, without exhausting the available memory, which proves quite useful when working with large files.

The implementation of all these examples and code snippets can be found in my GitHub project; it is an Eclipse-based project, so it should be easy to import and run as it is.
