Handling large files in Java

Source: Internet
Author: User

Reprinted from: http://langgufu.iteye.com/blog/2107023

Java usually handles large files with buffered I/O classes such as BufferedReader and BufferedInputStream, but when the file is very large, a faster approach is MappedByteBuffer.

MappedByteBuffer is the file memory-mapping mechanism introduced with Java NIO, and its read and write performance is very high. The main point of NIO is its support for non-blocking (asynchronous-style) operations. One example: you register a socket channel (SocketChannel) with a selector (Selector), and the select() method, called from time to time, returns the selection keys (SelectionKey) whose channels are ready; each key carries the socket event information. This is the select model.
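The select model described above can be sketched on the loopback interface. This is only an illustration under stated assumptions: the class name SelectSketch and the method receiveOnce are hypothetical, and the accepted channel is the one registered for OP_READ.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original article.
final class SelectSketch {
    /** Sends message over a loopback connection and reads it back via a Selector. */
    static String receiveOnce(String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.US_ASCII);
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 lets the operating system pick a free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                // A channel must be non-blocking before it can be registered.
                peer.configureBlocking(false);
                peer.register(selector, SelectionKey.OP_READ);
                client.write(ByteBuffer.wrap(payload));

                ByteBuffer buf = ByteBuffer.allocate(payload.length);
                // select() returns the number of keys whose channels became ready.
                while (buf.hasRemaining() && selector.select(5000) > 0) {
                    for (SelectionKey key : selector.selectedKeys()) {
                        if (key.isReadable()) {
                            ((SocketChannel) key.channel()).read(buf);
                        }
                    }
                    selector.selectedKeys().clear();
                }
                buf.flip();
                return StandardCharsets.US_ASCII.decode(buf).toString();
            }
        }
    }
}
```

A real server would keep one selector loop running and register many channels with it; the single connection here only shows the register, select, and read steps.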
A SocketChannel reads and writes through a class called ByteBuffer (java.nio.ByteBuffer). The class itself is well designed and more convenient than operating on a byte[] directly. A ByteBuffer is either direct or non-direct. The most typical (and only) non-direct implementation is HeapByteBuffer, which operates on heap memory (a byte[]). But heap memory is limited: what if I want to send a 1 GB file? Allocating 1 GB on the heap is rarely practical. At that point you need a direct buffer, specifically MappedByteBuffer, that is, file mapping.
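The direct versus non-direct distinction can be observed from code. A minimal sketch, assuming a hypothetical helper class named BufferKinds; it uses only ByteBuffer.allocate, ByteBuffer.allocateDirect, isDirect and hasArray.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the original article.
final class BufferKinds {
    /** Describes where a buffer's storage lives. */
    static String describe(ByteBuffer buf) {
        // hasArray() is true only when an accessible byte[] backs the buffer.
        return (buf.isDirect() ? "direct" : "heap") + ", backing array: " + buf.hasArray();
    }
}
```

Calling describe(ByteBuffer.allocate(16)) gives "heap, backing array: true", while describe(ByteBuffer.allocateDirect(16)) gives "direct, backing array: false". A MappedByteBuffer returned by FileChannel.map also reports itself as direct.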
Let's pause and talk about operating-system memory management. An operating system generally distinguishes two kinds of memory: physical memory and virtual memory. Virtual memory is typically backed by a paging file, a special file on the hard disk. The operating system is responsible for reading and writing the contents of the paging file, a process called paging or swapping. MappedByteBuffer is similar: you can treat a whole file as a ByteBuffer. MappedByteBuffer is just a special ByteBuffer, namely a subclass of ByteBuffer. It maps the file directly into memory (memory here means virtual memory, not physical memory). In general you can map the entire file; a single mapping is limited to Integer.MAX_VALUE bytes (about 2 GB), so a larger file has to be mapped in segments, which only requires specifying which part of the file to map.
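Mapping a file in segments could look like the sketch below. The class name SegmentedMap, the method countNewlines, and the choice of counting newline bytes are assumptions made for illustration; the segment size is left to the caller.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original article.
final class SegmentedMap {
    /** Counts '\n' bytes by mapping the file one segment at a time. */
    static long countNewlines(Path file, long segmentSize) throws IOException {
        if (segmentSize <= 0 || segmentSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("segmentSize must be in 1..Integer.MAX_VALUE");
        }
        long count = 0;
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = fc.size();
            for (long pos = 0; pos < fileSize; pos += segmentSize) {
                // The last segment may be shorter than segmentSize.
                long len = Math.min(segmentSize, fileSize - pos);
                MappedByteBuffer seg = fc.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (seg.hasRemaining()) {
                    if (seg.get() == '\n') {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

Each segment is an independent mapping, so only the file size and the virtual address space limit the total amount of data that can be walked this way.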

Three mapping modes:
FileChannel provides a map method that maps a file into memory: MappedByteBuffer map(FileChannel.MapMode mode, long position, long size). It maps size bytes of the file, starting at position, into memory. The mode argument specifies how the mapped region may be accessed: READ_ONLY, READ_WRITE, or PRIVATE.
A. READ_ONLY (read-only): attempting to modify the resulting buffer causes a ReadOnlyBufferException to be thrown. (MapMode.READ_ONLY)
B. READ_WRITE (read/write): changes to the resulting buffer are eventually propagated to the file; they are not necessarily visible to other programs that have mapped the same file. (MapMode.READ_WRITE)
C. PRIVATE (private): changes to the resulting buffer are not propagated to the file and are not visible to other programs that have mapped the same file; instead, a private copy of the modified portion of the buffer is created. (MapMode.PRIVATE)
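The difference between the three modes can be checked with a small experiment. The sketch below is an assumption-laden illustration: the class MapModes and the method writeFirstByte are hypothetical names, and the file is assumed to be non-empty.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original article.
final class MapModes {
    /**
     * Maps the file in the given mode, tries to overwrite byte 0 with 'X',
     * and returns the first byte found in the file afterwards.
     */
    static char writeFirstByte(Path file, MapMode mode) throws IOException {
        // PRIVATE and READ_WRITE mappings need a channel opened for reading and writing.
        try (FileChannel fc = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = fc.map(mode, 0, fc.size());
            try {
                buf.put(0, (byte) 'X');
                if (mode == MapMode.READ_WRITE) {
                    buf.force(); // write the change through to the file
                }
            } catch (ReadOnlyBufferException e) {
                // READ_ONLY mappings reject every put.
            }
        }
        return (char) Files.readAllBytes(file)[0];
    }
}
```

For a file that starts with 'a', READ_ONLY and PRIVATE leave the file's first byte as 'a', while READ_WRITE changes it to 'X'.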

Three methods:

A. force(): when the buffer is in READ_WRITE mode, forces any changes made to the buffer's content to be written to the file
B. load(): loads the buffer's content into physical memory and returns a reference to the buffer
C. isLoaded(): returns true if the buffer's content is resident in physical memory, otherwise false (the result is only a hint)
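A short sketch of how the three methods might be called. The class name MappedBufferMethods and both method names are hypothetical, and overwriteStart assumes the file already holds at least as many bytes as are written.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original article.
final class MappedBufferMethods {
    /** Overwrites the start of an existing file through a mapping and flushes the change. */
    static void overwriteStart(Path file, byte[] data) throws IOException {
        try (FileChannel fc = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Assumption: the file is at least data.length bytes long.
            MappedByteBuffer buf = fc.map(MapMode.READ_WRITE, 0, data.length);
            buf.put(data);
            buf.force(); // push the modified region to the storage device
        }
    }

    /** Asks for the whole file to be paged in, then reports what isLoaded() says. */
    static boolean loadAndQuery(Path file) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = fc.map(MapMode.READ_ONLY, 0, fc.size());
            buf.load();            // best effort; returns the same buffer
            return buf.isLoaded(); // a hint: false does not prove the data is absent
        }
    }
}
```

Because isLoaded() is only a hint, the return value of loadAndQuery should be used for diagnostics rather than for program logic.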

Three features:

Calling the channel's map() method maps part or all of a file into memory. The mapped buffer is a direct buffer and inherits from ByteBuffer, but compared with an ordinary ByteBuffer it has these advantages:

A. Fast reads
B. Fast writes
C. Reads and writes at any position, at any time
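The third point, access at any position, can be illustrated with absolute reads and writes. In this sketch the file is treated as an array of 8-byte records; the class RecordFile and the record layout are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original article.
final class RecordFile {
    static final int RECORD_SIZE = Long.BYTES;

    /** Overwrites record number index in place; the file must already contain it. */
    static void writeRecord(Path file, int index, long value) throws IOException {
        try (FileChannel fc = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = fc.map(MapMode.READ_WRITE, 0, fc.size());
            // Absolute put: no seek is needed and other records are untouched.
            buf.putLong(index * RECORD_SIZE, value);
            buf.force();
        }
    }

    /** Reads record number index. */
    static long readRecord(Path file, int index) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = fc.map(MapMode.READ_ONLY, 0, fc.size());
            return buf.getLong(index * RECORD_SIZE);
        }
    }
}
```

Mapping the file on every call keeps the sketch short; code that touches many records would normally map once and reuse the buffer.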

Here's a look at the code:

    package study;

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MapMemeryBuffer {

        public static void main(String[] args) throws Exception {
            ByteBuffer byteBuf = ByteBuffer.allocate(1024 * 14 * 1024);
            byte[] bbb = new byte[14 * 1024 * 1024];
            FileInputStream fis = new FileInputStream("E://data/other/ultraedit_17.00.0.1035_sc.exe");
            FileOutputStream fos = new FileOutputStream("E://data/other/outfile.txt");
            FileChannel fc = fis.getChannel();
            long timeStar = System.currentTimeMillis(); // Get the current time
            fc.read(byteBuf); // 1. Read
            // MappedByteBuffer mbb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            System.out.println(fc.size() / 1024);
            long timeEnd = System.currentTimeMillis(); // Get the current time
            System.out.println("Read time: " + (timeEnd - timeStar) + "ms");
            timeStar = System.currentTimeMillis();
            fos.write(bbb); // 2. Write
            // mbb.flip();
            timeEnd = System.currentTimeMillis();
            System.out.println("Write time: " + (timeEnd - timeStar) + "ms");
            fos.flush();
            fos.close();
            fc.close();
            fis.close();
        }

    }

Run result:

    14235
    Read time: 24ms
    Write time: 21ms

Now comment out the statements marked 1 and 2, enable the commented-out statement below each of them instead, and run again:

    14235
    Read time: 2ms
    Write time: 0ms
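To time both approaches while they do the same work, every byte has to be touched in each case. The following is a sketch under stated assumptions, not the original author's benchmark: the class ReadComparison and its methods are hypothetical, the file is assumed to be smaller than 2 GB, and a byte sum stands in for real processing.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original article.
final class ReadComparison {
    /** Sums all bytes (as unsigned values) using a buffered stream. */
    static long sumWithBufferedStream(Path file) throws IOException {
        long sum = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            byte[] chunk = new byte[64 * 1024];
            int n;
            while ((n = in.read(chunk)) != -1) {
                for (int i = 0; i < n; i++) {
                    sum += chunk[i] & 0xFF;
                }
            }
        }
        return sum;
    }

    /** Sums all bytes (as unsigned values) through a read-only mapping. */
    static long sumWithMapping(Path file) throws IOException {
        long sum = 0;
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumption: the file fits in one mapping (under 2 GB).
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF;
            }
        }
        return sum;
    }

    /** Prints elapsed time and checksum for both approaches. */
    static void report(Path file) throws IOException {
        long t0 = System.nanoTime();
        long a = sumWithBufferedStream(file);
        long t1 = System.nanoTime();
        long b = sumWithMapping(file);
        long t2 = System.nanoTime();
        System.out.println("buffered stream: " + (t1 - t0) / 1_000_000 + " ms, sum " + a);
        System.out.println("mapped buffer:   " + (t2 - t1) / 1_000_000 + " ms, sum " + b);
    }
}
```

The numbers depend heavily on the operating system's file cache: whichever method runs second usually finds the file already cached, so it is worth running report several times and swapping the order before drawing conclusions.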

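In these run results the mapped version reports far lower times (bear in mind that map() only sets up the mapping, and the data is paged in when it is first accessed). MappedByteBuffer is fast, but it has some problems, such as memory footprint and closing the file. A file opened through MappedByteBuffer is released only when the buffer is garbage-collected, and when that happens is indeterminate. The Javadoc puts it this way: "A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected."
One workaround, which relies on the internal sun.misc.Cleaner class available in JDK 8 and earlier, is the following: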

    AccessController.doPrivileged(new PrivilegedAction<Object>() {
        public Object run() {
            try {
                // buffer is the MappedByteBuffer whose mapping should be released
                Method getCleanerMethod = buffer.getClass().getMethod("cleaner", new Class[0]);
                getCleanerMethod.setAccessible(true);
                sun.misc.Cleaner cleaner = (sun.misc.Cleaner) getCleanerMethod.invoke(buffer, new Object[0]);
                cleaner.clean();
            } catch (Exception e) {
                e.printStackTrace();
            }
            return null;
        }
    });
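On JDK 9 and later, sun.misc.Cleaner is no longer accessible, and a commonly used substitute is sun.misc.Unsafe.invokeCleaner(ByteBuffer). The sketch below reaches it through reflection so that it also compiles on older JDKs. The class name Unmapper and the method tryUnmap are hypothetical, and the approach still depends on an unsupported internal API, so treat it as an assumption rather than a guaranteed mechanism.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

// Hypothetical helper, not part of the original article.
final class Unmapper {
    /**
     * Tries to release a mapping eagerly. Returns false when the running JVM
     * does not expose sun.misc.Unsafe.invokeCleaner (for example JDK 8).
     * After a successful call the buffer must never be accessed again.
     */
    static boolean tryUnmap(MappedByteBuffer buffer) {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Method missing, access denied, or the buffer was rejected.
            return false;
        }
    }
}
```

Touching a buffer after it has been unmapped can crash the JVM, which is one reason the standard API has traditionally offered no unmap() method.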

About releasing MappedByteBuffer resources

JDK 1.4 added a new package: NIO (java.nio.*). The biggest feature of this library (I think) is its support for asynchronous sockets. Other languages and environments, including the most primitive socket implementation (BSD sockets), have had this for a long time: asynchronous callbacks for read/write events, dynamic selection of the events of interest through selectors, and so on.

As described earlier, an operating system distinguishes physical memory from virtual memory, and virtual memory is typically backed by a paging file, one or more special files on the hard disk whose contents the operating system reads and writes (paging or swapping). MappedByteBuffer is similar: you can treat an entire file (no matter how large) as a ByteBuffer. This is a good design, except for the headache discussed below.

    java.lang.Object
      java.nio.Buffer
        java.nio.ByteBuffer
          java.nio.MappedByteBuffer

MappedByteBuffer is a convenient class to use. Its content is a memory-mapped region of a file. Mapped byte buffers are created through the FileChannel.map method. A mapped byte buffer and the file mapping that it represents remain valid until the buffer itself is garbage-collected. The class extends ByteBuffer with operations specific to memory-mapped file regions. As noted above, a ByteBuffer is either direct or non-direct; the non-direct kind (HeapByteBuffer) operates on heap memory (a byte[]), which is limited, so for something like a 1 GB file you turn to a direct buffer, MappedByteBuffer, that is, file mapping.

The JDK API documentation says: all or part of a mapped byte buffer may become inaccessible at any time, for example if the mapped file is truncated. An attempt to access an inaccessible region of a mapped byte buffer will not change the buffer's content and will cause an unspecified exception to be thrown, either at the time of the access or at some later time. It is therefore strongly recommended that appropriate precautions be taken to avoid manipulation of a mapped file by this program, or by a concurrently running program, except to read or write the file's content.

A MappedByteBuffer can only be obtained by calling FileChannel's map(); there is no other way. Surprisingly, though, Sun provided map() but did not provide unmap(). That is where the problem arises.

Implementing a file copy with MappedByteBuffer is very easy, for example as follows.
    // Needs: java.io.File, java.io.FileInputStream, java.io.FileOutputStream,
    // java.io.IOException, java.nio.MappedByteBuffer, java.nio.channels.FileChannel

    // File copy
    public void copyFile(String filename, String srcpath, String destpath) throws IOException {
        File source = new File(srcpath + "/" + filename);
        File dest = new File(destpath + "/" + filename);
        FileChannel in = null, out = null;
        try {
            in = new FileInputStream(source).getChannel();
            out = new FileOutputStream(dest).getChannel();
            long size = in.size();
            MappedByteBuffer buf = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
            out.write(buf);
            in.close();
            out.close();
            source.delete(); // After the file copy is complete, delete the source file
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Guard against a channel that was never opened
            if (in != null) in.close();
            if (out != null) out.close();
        }
    }
However, if you want to delete the source file after copying it, the method above has a problem: source.delete() returns false, meaning the deletion failed. The main reason is that the variable buf still holds a handle to the source file, so the file cannot be deleted (this is the behavior on Windows, where a file with an active mapping cannot be removed). Since a MappedByteBuffer comes from FileChannel's map(), why is there no unmap()? Sun itself never explained clearly why. O'Reilly's "Java NIO" says it is for "security" reasons, but the author does not explain how unmap() would be unsafe. There is also a report in Sun's bug database: Bug ID 4724038, at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038. Sun, however, did not consider it a bug, only an RFE (Request for Enhancement). Fortunately, someone named Bellomi proposed a workaround; I have tested it, and it achieves the desired result. The implementation is as follows:
    // Needs: java.lang.reflect.Method, java.security.AccessController,
    // java.security.PrivilegedAction

    public static void clean(final Object buffer) throws Exception {
        AccessController.doPrivileged(new PrivilegedAction<Object>() {
            public Object run() {
                try {
                    Method getCleanerMethod = buffer.getClass().getMethod("cleaner", new Class[0]);
                    getCleanerMethod.setAccessible(true);
                    sun.misc.Cleaner cleaner = (sun.misc.Cleaner) getCleanerMethod.invoke(buffer, new Object[0]);
                    cleaner.clean();
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            }
        });
    }
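When the goal is only to copy a file and then delete the source, the mapping can be avoided altogether, so no handle is left behind. The sketch below swaps in a different technique, FileChannel.transferTo, in place of the mapped copy; the class MoveWithoutMapping and the method copyThenDelete are hypothetical names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original article.
final class MoveWithoutMapping {
    /** Copies source to dest with transferTo, then deletes source. */
    static void copyThenDelete(Path source, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so loop.
                long n = in.transferTo(pos, size - pos, out);
                if (n <= 0) {
                    throw new IOException("transfer stalled at byte " + pos);
                }
                pos += n;
            }
        }
        // Both channels are closed here and no mapping exists.
        Files.delete(source); // throws an exception if deletion fails
    }
}
```

If no custom copy logic is needed, Files.move(source, dest, StandardCopyOption.REPLACE_EXISTING) performs the same job in a single call.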

I don't know why Sun does not allow ByteBuffer to be subclassed. It is a very useful class, after all; if subclassing were allowed, I could operate not only on heap memory and files but extend it to any storage device.
