Using commons-compress to decompress a GBK-encoded WinZip file to UTF-8, and a fix for the error where data read through ZipArchiveInputStream is all empty

Source: Internet
Author: User

First, the right way:

The correct approach is to create a ZipFile first and then iterate over its entries. Each entry is either a file or a folder: when an entry is a folder, create the folder; otherwise create a file. ZipFile.getInputStream(entry) returns the input stream of the current entry (note that this is the stream of the file inside the archive, not the stream of the compressed archive itself). Then write that stream out with a writer. It really is that simple. The example below reads a GBK-encoded archive whose files are also GBK-encoded (that is, files written and packaged on Windows) and extracts them as UTF-8 for cross-platform use.

    import java.io.{File, FileOutputStream, InputStreamReader, OutputStreamWriter}
    import org.apache.commons.compress.archivers.zip.ZipFile

    def decompressZip(source: File, dest: String,
                      sourceCharacters: String = "GBK",
                      destCharacters: String = "UTF-8"): Unit = {
      if (source.exists) {
        // The encoding passed to ZipFile is used to decode the entry names.
        val zipFile = new ZipFile(source, sourceCharacters)
        try {
          val entries = zipFile.getEntries
          while (entries.hasMoreElements) {
            val entry = entries.nextElement()
            val entryFile = new File(dest, entry.getName)
            if (entry.isDirectory) entryFile.mkdirs()
            else {
              // checkFileParent is a helper that is not shown in this post;
              // presumably it creates the missing parent directories.
              checkFileParent(entryFile)
              // getInputStream(entry) yields the decompressed content of this entry.
              val reader = new InputStreamReader(zipFile.getInputStream(entry), sourceCharacters)
              val writer = new OutputStreamWriter(new FileOutputStream(entryFile), destCharacters)
              try {
                // Copy in a loop: a single read() may not return the whole entry.
                val buffer = new Array[Char](8192)
                var n = reader.read(buffer)
                while (n != -1) {
                  writer.write(buffer, 0, n)
                  n = reader.read(buffer)
                }
              } catch {
                case e: Throwable => e.printStackTrace()
              } finally {
                writer.close() // flushes, then closes the underlying FileOutputStream
                reader.close()
              }
            }
          }
        } finally zipFile.close()
      }
    }
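The decode-then-re-encode step in the function above can also be looked at in isolation. The following is a minimal sketch that uses only the JDK; the object name `Transcode` and the method `transcode` are made up for illustration and are not part of the original post or of commons-compress. It assumes the JDK in use ships the GBK charset, which is normally the case for a full JDK.

```scala
import java.io.{InputStream, InputStreamReader, OutputStream, OutputStreamWriter}
import java.nio.charset.Charset

object Transcode {
  // Hypothetical helper: reads text from `in` decoded as `from`
  // and writes it to `out` encoded as `to`.
  def transcode(in: InputStream, out: OutputStream,
                from: String = "GBK", to: String = "UTF-8"): Unit = {
    val reader = new InputStreamReader(in, Charset.forName(from))
    val writer = new OutputStreamWriter(out, Charset.forName(to))
    val buffer = new Array[Char](4096)
    var n = reader.read(buffer)
    while (n != -1) {
      writer.write(buffer, 0, n) // write only the chars that were actually read
      n = reader.read(buffer)
    }
    writer.flush() // push the encoded bytes through to `out`; the caller closes the streams
  }
}
```

For instance, passing a ByteArrayInputStream over "\u4e2d\u6587".getBytes("GBK") (4 bytes) and a ByteArrayOutputStream leaves the 6-byte UTF-8 encoding of the same two characters in the output. Copying in a loop and writing only the chars actually read avoids depending on the entry's byte size, which for GBK text is larger than the number of decoded characters.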

Error Demonstration:

For some reason, many online tutorials use ZipArchiveInputStream to extract archives. However, the commons-compress documentation says:

The ZipFile class is preferred when reading from files, as ZipArchiveInputStream is limited by not being able to read the central directory header before returning entries. In particular, ZipArchiveInputStream

    • may return entries that are not part of the central directory at all and shouldn't be considered part of the archive.
    • may return several entries with the same name.
    • will not return internal or external attributes.
    • may return incomplete extra field data.
    • may return unknown sizes and CRC values for entries until the next entry has been reached if the archive uses the data descriptor feature.

Using ZipFile has been the recommendation since version 1.3 of commons-compress.

I tried ZipArchiveInputStream myself and ran into a problem. Creating a ZipArchiveInputStream is cumbersome: you have to supply an InputStream, and this is how the API documentation describes the constructors:

Constructors

    ZipArchiveInputStream(InputStream inputStream)
        Create an instance using UTF-8 encoding
    ZipArchiveInputStream(InputStream inputStream, String encoding)
        Create an instance using the specified encoding
    ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields)
        Create an instance using the specified encoding
    ZipArchiveInputStream(InputStream inputStream, String encoding, boolean useUnicodeExtraFields, boolean allowStoredEntriesWithDataDescriptor)
        Create an instance using the specified encoding

    Parameters: inputStream - the stream to wrap

The constructor documentation does not say what the inputStream parameter should be, so I tried the approach found online:

    val zipFile = new ZipFile(source, sourceCharacters)
    var entries = zipFile.getEntries
    entries.foreach(entry =>
      if (entry != null) {
        try {
          val name = entry.getName
          val path = dest + name
          var content = new Array[Char](entry.getSize.toInt)
          zais = new ZipArchiveInputStream(zipFile.getInputStream(entry))
          val entryFile = new File(path)
          checkFileParent(entryFile)
          os = new FileOutputStream(entryFile)
          IOUtils.copy(zais, os)
          ....... .....

The data read this way is empty. Reading with zais.read into an Array[Byte] and converting it to a String gives a string of blank characters, and printing the Array[Byte] directly shows all zeros. After reading the documentation I think I understand the cause: ZipArchiveInputStream expects to read a zip file, but ZipFile.getInputStream returns the input stream of the already-extracted entry content, so the problem occurs. I tried both commons-compress 1.4, released in 2012, which Spark depends on, and the latest version, 1.14; the approach fails with both. So I suspect that the blog posts reposted since 2012 were forwarded without their authors actually using or testing the code. ZipFile and ZipArchiveInputStream still feel a little odd to me ...
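If ZipArchiveInputStream has to be used anyway (for example when the archive arrives as a stream and not as a file), it presumably needs to wrap the raw stream of the zip archive itself. The following is a sketch under that assumption; the object `StreamUnzip`, the method `unzipStreaming`, and its parameters are hypothetical names introduced for illustration. It copies entry bytes as they are, without transcoding, and it is subject to the ZipArchiveInputStream limitations quoted earlier.

```scala
import java.io.{BufferedInputStream, File, FileInputStream, FileOutputStream}
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream
import org.apache.commons.compress.utils.IOUtils

object StreamUnzip {
  // Hypothetical helper: extracts `archive` into `destDir`, copying bytes unchanged.
  // `nameEncoding` is the encoding used to decode the entry names.
  def unzipStreaming(archive: File, destDir: File, nameEncoding: String = "GBK"): Unit = {
    // Wrap the stream of the .zip file itself, not zipFile.getInputStream(entry).
    val zais = new ZipArchiveInputStream(
      new BufferedInputStream(new FileInputStream(archive)), nameEncoding)
    try {
      var entry = zais.getNextEntry()
      while (entry != null) {
        val target = new File(destDir, entry.getName)
        if (entry.isDirectory) target.mkdirs()
        else {
          Option(target.getParentFile).foreach(_.mkdirs())
          val os = new FileOutputStream(target)
          // Until the next getNextEntry() call, reads on zais return
          // only the data of the current entry.
          try IOUtils.copy(zais, os) finally os.close()
        }
        entry = zais.getNextEntry()
      }
    } finally zais.close()
  }
}
```

A call such as StreamUnzip.unzipStreaming(new File("in.zip"), new File("out")) would extract the archive with GBK-decoded entry names; both paths here are placeholders. When the archive is available as a file, ZipFile remains the approach the documentation prefers.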
