Encoding: How to use PHP to detect which system's encoding was used for the file names in a ZIP package

Source: Internet
Author: User
Tags ziparchive
When a ZIP file created on Windows is extracted on Linux, Chinese paths and file names come out garbled, so I wrote a script to convert the file names inside the ZIP file. However, if the ZIP file was created on a Japanese, Korean, or Traditional Chinese edition of Windows, the names cannot be transcoded because the original encoding is unknown.
How can I solve this?

Reply content:


The OP's ID looks familiar... you've been asking questions of this kind for years... can't have been easy...

<?php
/* open the zip file ... */
$zip = new ZipArchive;
$res = $zip->open( '/path/to/your.zip' );
/* can not open ..? are you kidding me ..? */
if ( true !== $res )
    throw new Exception( 'Can Not Open Zip File / ' . $res );
/* default value of file encoding ... */
$encoding = 'EMPTY';
/* controller ... change this if mb_detect_encoding returns the wrong answer ... */
$controller = null;
/* get file list ... */
for ( $i = 0; $i < $zip->numFiles; ++ $i ) {
    /* get file encoding ... FL_ENC_RAW returns the stored bytes instead of libzip's CP437 -> UTF-8 guess ... */
    $encoding = mb_detect_encoding( $zip->getNameIndex( $i, ZipArchive::FL_ENC_RAW ), $controller );
    /* we do not need english named files ... */
    if ( 'ASCII' !== $encoding ) break;
}
/* clean up ... */
$zip->close();
/* simply output ... */
echo $encoding;

That's roughly the code... it judges the system from the encoding of the file names...

On Simplified Chinese Windows it returns EUC-CN... for Traditional Chinese I'd guess EUC-TW or BIG5...

Linux and macOS use UTF-8... and files with pure English names never get garbled...
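Following the same idea, the three cases (pure ASCII, valid UTF-8, anything else) can be told apart without mbstring. A minimal sketch, assuming the name is passed in as the raw bytes stored in the archive; `classify_zip_name()` is a hypothetical helper, and it relies on PCRE's `u` modifier making `preg_match()` fail on invalid UTF-8:

```php
<?php
// Hypothetical helper: classify a raw ZIP entry name by what its bytes look like.
// Returns 'ASCII' (readable everywhere), 'UTF-8' (likely Linux/macOS or a UTF-8-aware tool)
// or 'LEGACY' (some non-UTF-8 code page, e.g. GBK from Simplified Chinese Windows).
function classify_zip_name(string $raw): string
{
    // No byte >= 0x80 means plain ASCII, which is why English-only names never get garbled.
    if (!preg_match('/[\x80-\xFF]/', $raw)) {
        return 'ASCII';
    }
    // With the /u modifier, preg_match() returns false when the subject is not valid UTF-8.
    if (preg_match('//u', $raw) === 1) {
        return 'UTF-8';
    }
    return 'LEGACY';
}
```

This is a heuristic: a short GBK or Big5 name can occasionally also be valid UTF-8, so 'UTF-8' means "probably UTF-8", and 'LEGACY' still leaves open which code page it is.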

There are encoding problems in the other direction too. For example, if files with Chinese names are compressed on a Mac, the names come out garbled when extracted on Windows. So when compressing on a Mac, English-only file names are recommended.

In 2011 someone raised a similar problem: http://bbs.csdn.net/topics/370123319. The solution given there is to look at http://www.pkware.com/documents/cases... and search for "info-zip Unicode Path Extra Field".

Searching for it turns up this section:

4.6 Third Party Mappings
------------------------

   4.6.1 Third party mappings commonly used are:

          0x07c8        Macintosh
          0x2605        ZipIt Macintosh
          0x2705        ZipIt Macintosh 1.3.5+
          0x2805        ZipIt Macintosh 1.3.5+
          0x334d        Info-ZIP Macintosh
          0x4341        Acorn/SparkFS
          0x4453        Windows NT security descriptor (binary ACL)
          0x4704        VM/CMS
          0x470f        MVS
          0x4b46        FWKCS MD5 (see below)
          0x4c41        OS/2 access control list (text ACL)
          0x4d49        Info-ZIP OpenVMS
          0x4f4c        Xceed original location extra field
          0x5356        AOS/VS (ACL)
          0x5455        extended timestamp
          0x554e        Xceed unicode extra field
          0x5855        Info-ZIP UNIX (original, also OS/2, NT, etc)
          0x6375        Info-ZIP Unicode Comment Extra Field
          0x6542        BeOS/BeBox
          0x7075        Info-ZIP Unicode Path Extra Field
          0x756e        ASi UNIX
          0x7855        Info-ZIP UNIX (new)
          0xa220        Microsoft Open Packaging Growth Hint
          0xfd4a        SMS/QDOS

Hope this helps.
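The 0x7075 entry in that table is the one that matters here: when an archiver writes it, it carries a UTF-8 copy of the entry name, so no guessing is needed. Per the PKWARE APPNOTE, its data is a 1-byte version (currently 1), a 4-byte CRC-32 of the entry's original file-name field, and then the UTF-8 name. A minimal sketch, assuming you have already read the raw extra-field bytes and raw name bytes of one entry with your own code; `unicode_path_from_extra()` is a hypothetical name:

```php
<?php
// Hypothetical helper: scan one entry's extra-field bytes for the
// Info-ZIP Unicode Path Extra Field (header ID 0x7075) and return its UTF-8 name.
// $extra   raw extra-field bytes of the entry
// $rawName raw file-name bytes of the same entry (used for the CRC check)
function unicode_path_from_extra(string $extra, string $rawName): ?string
{
    $pos = 0;
    $len = strlen($extra);
    // Each extra field is: 2-byte header ID, 2-byte data size, then the data (little-endian).
    while ($pos + 4 <= $len) {
        $hdr = unpack('vid/vsize', $extra, $pos);
        $dataStart = $pos + 4;
        if ($dataStart + $hdr['size'] > $len) {
            break; // truncated field: stop instead of reading past the end
        }
        if ($hdr['id'] === 0x7075 && $hdr['size'] >= 5) {
            $version = ord($extra[$dataStart]);
            $crc = unpack('V', $extra, $dataStart + 1)[1];
            // Only trust the UTF-8 name if it was written for this exact raw name.
            if ($version === 1 && ($crc & 0xFFFFFFFF) === (crc32($rawName) & 0xFFFFFFFF)) {
                return substr($extra, $dataStart + 5, $hdr['size'] - 5);
            }
        }
        $pos = $dataStart + $hdr['size'];
    }
    return null; // no usable Unicode Path field: fall back to guessing from the raw name
}
```

If the field is missing or its CRC does not match (the spec says it should then be ignored), the function returns null and you are back to guessing the encoding of the raw name.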

@Ven It is the file names that carry the encoding. I slightly modified the code above: my system is Linux, so it re-encodes non-UTF-8 names to UTF-8.

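<?php
function detect_encoding($zipfile_name){
    $zip = new ZipArchive();
    $res = $zip->open($zipfile_name);
    if(true !== $res)
        throw new Exception('Can Not Open Zip File '.$res);
    $controller = array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5");
    for($i = 0; $i < $zip->numFiles; ++ $i){
        // FL_ENC_RAW: get the name bytes as stored, without libzip's CP437 -> UTF-8 guess
        $entry = $zip->getNameIndex($i, ZipArchive::FL_ENC_RAW);
        // strict = true: only report an encoding the bytes are actually valid in
        $encoding = mb_detect_encoding($entry, $controller, true);
        if(false === $encoding)
            $encoding = "UNKNOWN";
        elseif("UTF-8" !== $encoding && "ASCII" !== $encoding)
            // convert with mbstring too, since the encoding name came from mb_detect_encoding
            $entry = mb_convert_encoding($entry, "UTF-8", $encoding);
        echo $entry." ---> ".$encoding."\n";
    }
    $zip->close();
}
detect_encoding($argv[1]);
?>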

For the correct answer, see @Sunyanzi's reply.
For historical reasons on Windows, when mb_detect_encoding() is used to check the file-name encoding of ZIP packages produced by some compression software, the result is something like "CP936". That confused me at first; I thought the function could not detect the encoding correctly. In fact, CP936 is Microsoft's own code page and is essentially equivalent to GBK.
For the mapping between the other "CP***" code pages and encodings, see the article "Windows code page".
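For the Japanese, Korean and Traditional Chinese cases in the question, the relevant Windows code pages are 932 (Japanese, a Shift_JIS variant), 936 (Simplified Chinese, GBK), 949 (Korean, Unified Hangul Code) and 950 (Traditional Chinese, a Big5 variant). Their byte ranges overlap heavily, so the same bytes are often valid in more than one of them and detection alone cannot always decide. A minimal sketch of one workaround: let the caller say which locale the archive most likely came from and try that code page first. The helper names are hypothetical; the encoding names are ones mbstring accepts, and `name_to_utf8()` needs ext-mbstring:

```php
<?php
// Windows "ANSI" code pages for CJK locales, mapped to mbstring encoding names.
const CJK_CODE_PAGES = [
    932 => 'CP932', // Japanese (Shift_JIS, Microsoft variant)
    936 => 'CP936', // Simplified Chinese (GBK)
    949 => 'UHC',   // Korean (Unified Hangul Code, superset of EUC-KR)
    950 => 'CP950', // Traditional Chinese (Big5, Microsoft variant)
];

// Hypothetical helper: build the order in which encodings are tried,
// putting the caller's expected code page right after ASCII and UTF-8.
function candidate_encodings(?int $preferredCodePage = null): array
{
    $list = array_values(CJK_CODE_PAGES);
    if ($preferredCodePage !== null && array_key_exists($preferredCodePage, CJK_CODE_PAGES)) {
        $first = CJK_CODE_PAGES[$preferredCodePage];
        $list = array_values(array_diff($list, [$first]));
        array_unshift($list, $first);
    }
    return array_merge(['ASCII', 'UTF-8'], $list);
}

// Hypothetical helper: convert a raw entry name to UTF-8 using the first
// candidate encoding in which the bytes are valid; null if none matches.
function name_to_utf8(string $raw, array $candidates): ?string
{
    foreach ($candidates as $enc) {
        if (mb_check_encoding($raw, $enc)) {
            return ($enc === 'ASCII' || $enc === 'UTF-8')
                ? $raw
                : mb_convert_encoding($raw, 'UTF-8', $enc);
        }
    }
    return null;
}
```

For an archive believed to come from Japanese Windows, for example, `name_to_utf8($raw, candidate_encodings(932))` tries ASCII, UTF-8 and then CP932 before the others. UTF-8 goes first because legacy-encoded names are rarely valid UTF-8 by accident, while UTF-8 names are often also valid GBK or Big5 byte sequences.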

The ZIP format itself doesn't seem to record the file-name encoding.
As for the garbled names you get when unzipping, that is a problem with the extraction software...
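For reference, the PKWARE APPNOTE does reserve one flag for this: bit 11 of the general purpose bit flag (the "language encoding flag") means the name and comment are UTF-8. Without it, the spec's default is IBM code page 437, but in practice many Windows archivers write the system's local code page instead, which is where the garbled names come from. The upper byte of the "version made by" field also records the host system (0 = MS-DOS/FAT, 3 = UNIX, 10 = Windows NTFS, 19 = OS X). A minimal sketch that reads both from one raw central directory record, assuming `$rec` starts at the record's 0x02014b50 signature; `parse_central_header()` is a hypothetical name, and locating the records in the file is left out:

```php
<?php
// Hypothetical helper: decode the fixed 46-byte part of a ZIP central directory
// file header plus the variable-length name and extra field that follow it.
function parse_central_header(string $rec): ?array
{
    if (strlen($rec) < 46) {
        return null;
    }
    $h = unpack(
        'Vsig/vmadeBy/vneeded/vflags/vmethod/vtime/vdate/Vcrc/Vcsize/Vusize/'
        . 'vnameLen/vextraLen/vcommentLen/vdisk/vintAttr/VextAttr/Voffset',
        $rec
    );
    if ($h['sig'] !== 0x02014b50 || strlen($rec) < 46 + $h['nameLen'] + $h['extraLen']) {
        return null;
    }
    return [
        // Upper byte of "version made by": host system (0 = FAT, 3 = UNIX, 10 = NTFS, 19 = OS X).
        'hostOs' => $h['madeBy'] >> 8,
        // General purpose bit 11: name and comment are UTF-8.
        'utf8'   => ($h['flags'] & 0x0800) !== 0,
        // Raw name bytes, exactly as stored.
        'name'   => substr($rec, 46, $h['nameLen']),
        // Raw extra-field bytes (may contain the 0x7075 Unicode Path field).
        'extra'  => substr($rec, 46 + $h['nameLen'], $h['extraLen']),
    ];
}
```

When 'utf8' is true the name can be used as-is; otherwise 'hostOs' at least hints whether the name is likely a Windows code page (0 or 10) or UTF-8 from a UNIX-like system (3 or 19).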
