When a zip file created on Windows is extracted on Linux, Chinese paths and file names come out garbled, so I wrote a script to convert the file names inside the zip file. However, if the zip file was created on a Japanese, Korean, or Traditional Chinese Windows system, the names cannot be transcoded because the original encoding is unknown.
How can this be solved?
Reply content:
The OP's ID looks familiar... you have been asking questions at this level for so many years... it can't be easy...
<?php
$zip = new ZipArchive();
$res = $zip->open( '/path/to/your.zip' );

/* can not open ..? are you kidding me ..? */
if ( true !== $res )
    throw new Exception( 'Can Not Open Zip File / ' . $res );

/* default value of file encoding ... */
$encoding = 'EMPTY';

/* controller ... change this if mb_detect_encoding return wrong answer ... */
$controller = null;

/* get file list ... */
for ( $i = 0; $i < $zip->numFiles; ++ $i ) {
    /* get file encoding ... */
    $encoding = mb_detect_encoding( $zip->getNameIndex( $i ), $controller );
    /* we do not need english named files ... */
    if ( 'ASCII' !== $encoding ) break;
}

/* clean table ... */
$zip->close();

/* simply output ... */
echo $encoding;
The code goes like this... it guesses the originating system from the file names...
For Simplified Chinese Windows it returns EUC-CN... for Traditional Chinese I guess it should be EUC-TW or BIG5...
Linux and macOS use UTF-8... pure-English file names don't get garbled...
There will be encoding problems in the other direction too. For example, if you compress on a Mac and a file has a Chinese name, the name is garbled when the archive is extracted on Windows. So when compressing on a Mac, English-only file names are recommended.
Back in 2011 someone raised a similar problem: http://bbs.csdn.net/topics/370123319. The solution given there is: open http://www.pkware.com/documents/cases... and search for "Info-ZIP Unicode Path Extra Field".
Searching turns up this section:
4.6 Third Party Mappings
------------------------
4.6.1 Third party mappings commonly used are:
    0x07c8  Macintosh
    0x2605  ZipIt Macintosh
    0x2705  ZipIt Macintosh 1.3.5+
    0x2805  ZipIt Macintosh 1.3.5+
    0x334d  Info-ZIP Macintosh
    0x4341  Acorn/SparkFS
    0x4453  Windows NT security descriptor (binary ACL)
    0x4704  VM/CMS
    0x470f  MVS
    0x4b46  FWKCS MD5 (see below)
    0x4c41  OS/2 access control list (text ACL)
    0x4d49  Info-ZIP OpenVMS
    0x4f4c  Xceed original location extra field
    0x5356  AOS/VS (ACL)
    0x5455  extended timestamp
    0x554e  Xceed unicode extra field
    0x5855  Info-ZIP UNIX (original, also OS/2, NT, etc)
    0x6375  Info-ZIP Unicode Comment Extra Field
    0x6542  BeOS/BeBox
    0x7075  Info-ZIP Unicode Path Extra Field
    0x756e  ASi UNIX
    0x7855  Info-ZIP UNIX (new)
    0xa220  Microsoft Open Packaging Growth Hint
    0xfd4a  SMS/QDOS
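To make the 0x7075 entry above more concrete, here is a minimal sketch of how the Info-ZIP Unicode Path Extra Field could be read from the raw extra-field bytes of one entry. The function name `unicode_path_from_extra` is made up for illustration, and the sketch assumes the caller has already obtained the raw extra-field bytes and the raw stored file name (for example by reading the central directory record directly). The field layout used here is: 1 byte version, 4 bytes CRC-32 of the stored name, then the UTF-8 name.

```php
<?php
/**
 * Scan the extra-field bytes of one ZIP entry for an Info-ZIP Unicode Path
 * Extra Field (header ID 0x7075) and return the UTF-8 name it carries.
 *
 * $extra   raw extra-field bytes (assumed to be supplied by the caller)
 * $rawName file name bytes exactly as stored in the entry header
 *
 * Returns null when the field is missing, malformed, or its CRC does not
 * match $rawName (in which case the field is stale and should be ignored).
 */
function unicode_path_from_extra(string $extra, string $rawName): ?string
{
    $pos = 0;
    $len = strlen($extra);

    // Each extra block is: 2-byte header ID, 2-byte data size, then data.
    while ($pos + 4 <= $len) {
        $head = unpack('vid/vsize', substr($extra, $pos, 4));
        $data = substr($extra, $pos + 4, $head['size']);
        $pos += 4 + $head['size'];

        // Skip other blocks, and blocks that are truncated or too short.
        if ($head['id'] !== 0x7075
            || strlen($data) !== $head['size']
            || strlen($data) < 5) {
            continue;
        }

        // 1 byte version, 4 bytes little-endian CRC-32 of the stored name.
        $meta = unpack('Cversion/Vcrc', substr($data, 0, 5));
        if ($meta['version'] === 1 && $meta['crc'] === crc32($rawName)) {
            return substr($data, 5); // the rest is the UTF-8 file name
        }
    }

    return null;
}
```

When this field is present and valid, no guessing is needed, because the archive itself carries the UTF-8 name. Archives written by tools that do not emit the field still need the detection approach discussed in this thread. The CRC comparison assumes a 64-bit PHP build, where both values are non-negative integers.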
Hope this helps.
@Ven It is the file name encoding. I slightly changed the code from the post above. My system is Linux, so it re-encodes non-UTF-8 names as UTF-8.
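<?php
function detect_encoding($zipfile_name){
    $zip = new ZipArchive();
    $res = $zip->open($zipfile_name);
    if(true !== $res)
        throw new Exception('Can Not Open Zip File '.$res);
    $encoding = "UTF-8";
    /* candidate encodings, tried in this order */
    $controller = array("ASCII","UTF-8", "GB2312", "GBK", "BIG5");
    for($i = 0; $i < $zip->numFiles; ++ $i){
        $entry = $zip->getNameIndex($i);
        $encoding = mb_detect_encoding($entry, $controller);
        /* mb_detect_encoding returns false when no candidate matches */
        if(false !== $encoding && "UTF-8" !== $encoding)
            $entry = iconv($encoding, "UTF-8", $entry);
        echo $entry." ---> ".($encoding ?: "unknown").chr(10);
    }
    $zip->close();
}
detect_encoding($argv[1]);
?>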
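For the Japanese / Korean / Traditional Chinese case in the question, a single detected label can be misleading, because the same bytes are often valid in several legacy code pages. The following is only a sketch of one possible approach: decode the raw name under each candidate code page that accepts it and let a person (or a later heuristic) pick the readable result. The function name `candidate_decodings` and the candidate list are assumptions made for illustration.

```php
<?php
/**
 * Decode a raw file name under every candidate encoding that accepts it.
 * Returns an array mapping encoding label => UTF-8 string.
 */
function candidate_decodings(string $rawName, array $encodings): array
{
    $result = [];
    foreach ($encodings as $enc) {
        // Keep only encodings in which the byte sequence is well-formed.
        if (mb_check_encoding($rawName, $enc)) {
            $result[$enc] = mb_convert_encoding($rawName, 'UTF-8', $enc);
        }
    }
    return $result;
}

// "中文.txt" as stored by a Simplified Chinese Windows tool (GBK/CP936 bytes).
$raw = "\xD6\xD0\xCE\xC4.txt";

// CP936: Simplified Chinese, CP950: Traditional Chinese,
// CP932: Japanese, UHC (CP949): Korean.
foreach (candidate_decodings($raw, ['CP936', 'CP950', 'CP932', 'UHC']) as $enc => $name) {
    echo $enc, ' ---> ', $name, PHP_EOL;
}
```

More than one line may be printed for the same name; that ambiguity is exactly why the original encoding cannot be recovered reliably from the bytes alone. When reading names with ZipArchive, passing `ZipArchive::FL_ENC_RAW` as the second argument of `getNameIndex()` should return the stored bytes without libzip's own guessing, which is probably what you want to feed into a function like this.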
For the correct answer, see @Sunyanzi's reply.
For historical reasons on Windows, mb_detect_encoding() returns something like "CP936" for the file names in zip packages produced by some compression software. That confused me at first, and I thought the function could not detect the encoding correctly. In fact, CP936 is Microsoft's own code page, essentially equivalent to GBK.
For how the other "CP***" code pages map to encodings, see this article: Windows code page.
As for the ZIP format itself, it does not seem to record which encoding the file names use.
As for the garbled names you get when unzipping, that is a problem with the extraction software...
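As a small illustration of the CP936 / GBK point, the sketch below converts the same bytes to UTF-8 under both labels with mbstring. It assumes the mbstring extension is loaded and that it accepts both "CP936" and "GBK" as encoding names; the sample bytes are the GBK encoding of "中文".

```php
<?php
// GBK / CP936 bytes for the two characters "中文".
$raw = "\xD6\xD0\xCE\xC4";

// Convert the same bytes under both labels.
$viaCp936 = mb_convert_encoding($raw, 'UTF-8', 'CP936');
$viaGbk   = mb_convert_encoding($raw, 'UTF-8', 'GBK');

echo $viaCp936, PHP_EOL;
echo $viaGbk, PHP_EOL;
```

So a "CP936" result from detection can simply be passed on as the source encoding when converting to UTF-8.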