Fixing Chinese path and garbled-character errors when decompressing with ZipFile in Java
When decompressing a jar or zip file in Java, you can use the JDK's built-in JarFile and ZipFile APIs. On Windows, reading these two formats often fails with the following error:
Exception in thread "main" java.lang.IllegalArgumentException: MALFORMED
    at java.util.zip.ZipCoder.toString(ZipCoder.java:58)
    at java.util.zip.ZipFile.getZipEntry(ZipFile.java:531)
    at java.util.zip.ZipFile.access$900(ZipFile.java:56)
    at java.util.zip.ZipFile$1.nextElement(ZipFile.java:513)
    at java.util.zip.ZipFile$1.nextElement(ZipFile.java:483)
There are two possible causes:
1. The zip or jar file is damaged. To check, try decompressing it with a tool such as WinRAR.
If no error is reported, the compressed file itself is intact.
2. A character set problem: the zip or jar file contains Chinese file names or paths.
import java.io.File;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// demo.zip was compressed with WinRAR on Windows
File file = new File("C:/Users/aty/Desktop/demo.zip");
ZipFile zip = new ZipFile(file);
Enumeration<? extends ZipEntry> entries = zip.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    System.out.println(entry.getName());
}
zip.close();
On Windows, use WinRAR to create a demo.zip file whose entries have Chinese names, then run the code above: an exception is thrown. On Chinese-language Windows the default operating system character set is GBK, so the entry names are stored as GBK bytes, but ZipFile decodes entry names as UTF-8 by default. When the zip file contains Chinese names that are not UTF-8 encoded, ZipFile therefore reports an error.
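The mismatch can be reproduced without any archive at all. The following is a minimal sketch, not the JDK's internal code: it encodes a Chinese file name as GBK and then asks a strict UTF-8 decoder to read the bytes back. The class name EntryNameEncodingDemo and the helper isValidUtf8 are hypothetical names chosen for illustration, and the sketch assumes the GBK charset is available in the running JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EntryNameEncodingDemo {

    // Returns true only if the bytes are a well-formed UTF-8 sequence.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Two Chinese characters followed by ".txt", written as escapes so the
        // source file compiles regardless of its own encoding.
        String name = "\u4e2d\u6587.txt";

        byte[] gbkBytes = name.getBytes(Charset.forName("GBK"));
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);

        System.out.println("GBK bytes readable as UTF-8:   " + isValidUtf8(gbkBytes));
        System.out.println("UTF-8 bytes readable as UTF-8: " + isValidUtf8(utf8Bytes));
    }
}
```

The GBK bytes are rejected because the first byte looks like the start of a two-byte UTF-8 sequence but the second byte is not a valid continuation byte. This is the same kind of malformed input that the MALFORMED exception refers to.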
This problem is easy to solve in JDK 1.7 and later, which provides ZipFile constructors that let you specify the character set of the zip file:
public ZipFile(String name, Charset charset) throws IOException
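As a sketch of how the charset-aware constructor might be used, the example below lists entry names decoded as GBK. It uses the sibling overload ZipFile(File, Charset), which was added in the same JDK release. The class name GbkZipLister and the method listNames are hypothetical, and GBK is an assumption about how the archive was written; an archive made on a differently configured system may need another charset.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class GbkZipLister {

    // Collects all entry names, decoding them with the given charset
    // instead of the UTF-8 default.
    static List<String> listNames(File file, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        // try-with-resources closes the ZipFile even if iteration fails
        try (ZipFile zip = new ZipFile(file, charset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the article's example; pass another path as the first argument.
        File file = new File(args.length > 0 ? args[0] : "C:/Users/aty/Desktop/demo.zip");
        for (String name : listNames(file, Charset.forName("GBK"))) {
            System.out.println(name);
        }
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, keeps the behaviour the same when the program runs on a machine with a different locale.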
If you are using a JDK earlier than 1.7, you can use org.apache.tools.zip.ZipFile from ant.jar. It is similar to the ZipFile in the JDK and accepts an encoding name in its constructor.
Note that although ZipFile in JDK 1.7 lets you specify the character set, the JarFile class has no such constructor and still decodes entry names as UTF-8.
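One possible workaround, offered as a sketch rather than a recommendation from the JDK, relies on the fact that a jar is an ordinary zip archive: it can be opened with ZipFile and an explicit charset, and the manifest can be parsed by hand. The class name JarAsZipReader and the method readManifest are hypothetical. This approach gives up what JarFile adds on top of ZipFile, such as signature verification, so it only suits cases where the entries and manifest just need to be read.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.jar.JarFile;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class JarAsZipReader {

    // Opens the jar as a plain zip so that entry names are decoded with the
    // given charset, then parses META-INF/MANIFEST.MF manually.
    // Returns null when the archive has no manifest entry.
    static Manifest readManifest(File jar, Charset charset) throws IOException {
        try (ZipFile zip = new ZipFile(jar, charset)) {
            ZipEntry entry = zip.getEntry(JarFile.MANIFEST_NAME);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new Manifest(in);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: JarAsZipReader <path-to-jar>");
            return;
        }
        // GBK is an assumption about how the jar's entry names were encoded.
        Manifest manifest = readManifest(new File(args[0]), Charset.forName("GBK"));
        System.out.println(manifest == null
                ? "no manifest"
                : manifest.getMainAttributes().getValue("Manifest-Version"));
    }
}
```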