Encoding: How to use PHP to detect which encoding system the file names in a ZIP package were created under

Source: Internet
Author: User
Tags ziparchive
This started when ZIP files created on Windows were extracted on Linux: the Chinese paths and file names came out garbled, so I wrote a script to convert the encoding of the file names inside the ZIP. However, if the ZIP was created on a Japanese, Korean, or Traditional Chinese Windows system, the names cannot be transcoded, because the script does not know the original encoding.
How can this be solved?

Reply content:

The OP's ID looks familiar... I've seen you asking questions at this level for years... it hasn't been easy for you.

 
    <?php
    /* Open the archive ... */
    $zip = new ZipArchive;
    $res = $zip->open('/path/to/your.zip');
    /* Can not open ... Are you kidding me ...? */
    if (true !== $res)
        throw new Exception('Can not Open Zip File / ' . $res);
    /* Default value of file encoding ... */
    $encoding = 'empty';
    /* Controller ... null means mb_detect_order() is used; change this to an
       explicit list (e.g. one containing EUC-CN) if mb_detect_encoding
       returns the wrong answer ... */
    $controller = null;
    /* Walk the file list */
    for ($i = 0; $i < $zip->numFiles; ++$i) {
        /* Get file encoding ... */
        $encoding = mb_detect_encoding($zip->getNameIndex($i), $controller);
        /* We do not need English (pure ASCII) file names ... */
        if ('ASCII' !== $encoding)
            break;
    }
    /* Clean up ... */
    $zip->close();
    /* Simply output ... */
    echo $encoding;

That's the code... it works out the system from the file names...

Simplified Chinese Windows will return EUC-CN... for Traditional Chinese, I'd guess EUC-TW or BIG-5...

Linux and macOS both use UTF-8... Just don't feed it archives whose file names are all plain English; those tell you nothing...
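
The rule of thumb above can be written down as a lookup. This is a minimal sketch: guess_origin() is a hypothetical helper, it assumes the canonical names mbstring reports (EUC-CN, EUC-TW, BIG-5, UTF-8, ASCII), and the Traditional Chinese entries only carry over the answer's own guess.

```php
<?php
// Hypothetical helper: map the encoding reported by mb_detect_encoding()
// to the origin system suggested in the answer. This is a heuristic;
// the ZIP file itself does not record which system created it.
function guess_origin($encoding): string
{
    switch ($encoding) {
        case 'EUC-CN':
            return 'Simplified Chinese Windows';
        case 'EUC-TW':
        case 'BIG-5':
            // Only a guess in the answer for Traditional Chinese Windows.
            return 'Traditional Chinese Windows (guess)';
        case 'UTF-8':
            return 'Linux or macOS';
        case 'ASCII':
            // Plain English names carry no encoding information.
            return 'unknown (ASCII-only names)';
        default:
            return 'unknown'; // includes false (detection failed)
    }
}

// "中文" stored as GB2312/EUC-CN bytes; strict mode rules out ASCII and UTF-8.
$enc = mb_detect_encoding("\xD6\xD0\xCE\xC4", ['ASCII', 'UTF-8', 'EUC-CN'], true);
echo guess_origin($enc), PHP_EOL;
```

Note that the candidate list passed to mb_detect_encoding() decides what can be reported at all: an encoding that is not in the list can never be returned.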

This should be an encoding problem. For example, if you compress files with Chinese names on a Mac and decompress them on Windows, all the names come out garbled. So when compressing on a Mac, try to use only English file names.
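
This advice can be checked mechanically before you build the archive. A minimal sketch, assuming the entry names are already collected in an array; non_ascii_names() is a hypothetical helper that uses mbstring's mb_check_encoding() to flag any name that is not plain ASCII.

```php
<?php
// Hypothetical helper: return the names that are not plain ASCII, i.e. the
// ones likely to come out garbled when unzipped on another system.
function non_ascii_names(array $names): array
{
    $bad = [];
    foreach ($names as $name) {
        if (!mb_check_encoding($name, 'ASCII')) {
            $bad[] = $name;
        }
    }
    return $bad;
}

print_r(non_ascii_names(['report.txt', '报告.txt']));
```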

In 2011, someone asked a similar question: http://bbs.csdn.net/topics/370123319. The solution given there says: check http://www.pkware.com/documents/cases ... and search for "Info-ZIP Unicode Path Extra Field".

Searching for it turns up this section:

    4.6 Third Party Mappings
    ------------------------
    4.6.1 Third party mappings commonly used are:

        0x07c8  Macintosh
        0x2605  ZipIt Macintosh
        0x2705  ZipIt Macintosh 1.3.5+
        0x2805  ZipIt Macintosh 1.3.5+
        0x334d  Info-ZIP Macintosh
        0x4341  Acorn/SparkFS
        0x4453  Windows NT security descriptor (binary ACL)
        0x4704  VM/CMS
        0x470f  MVS
        0x4b46  FWKCS MD5 (see below)
        0x4c41  OS/2 access control list (text ACL)
        0x4d49  Info-ZIP OpenVMS
        0x4f4c  Xceed original location extra field
        0x5356  AOS/VS (ACL)
        0x5455  extended timestamp
        0x554e  Xceed unicode extra field
        0x5855  Info-ZIP UNIX (original, also OS/2, NT, etc)
        0x6375  Info-ZIP Unicode Comment Extra Field
        0x6542  BeOS/BeBox
        0x7075  Info-ZIP Unicode Path Extra Field
        0x756e  ASi UNIX
        0x7855  Info-ZIP UNIX (new)
        0xa220  Microsoft Open Packaging Growth Hint
        0xfd4a  SMS/QDOS

Hope this helps.
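
The 0x7075 entry in that list is the one relevant to the question: some tools store a UTF-8 copy of the entry name in it. As I understand PKWARE's APPNOTE, the field data is a 1-byte version (currently 1), a 4-byte CRC-32 of the header's original file name, then the UTF-8 name. Below is a minimal sketch that parses a raw extra-field blob; read_unicode_path() is a hypothetical helper, and getting the raw extra-field bytes out of an actual archive is assumed and not shown.

```php
<?php
// Hypothetical helper: scan a raw ZIP extra-field blob for the Info-ZIP
// Unicode Path Extra Field (header ID 0x7075) and return its UTF-8 name.
// $rawName is the file name exactly as stored in the header (any encoding).
// Returns null if the field is absent, malformed, or stale (CRC mismatch).
function read_unicode_path(string $extra, string $rawName): ?string
{
    $pos = 0;
    $len = strlen($extra);
    // Each record: 2-byte header ID, 2-byte data size, then the data (little-endian).
    while ($pos + 4 <= $len) {
        $hdr = unpack('vid/vsize', $extra, $pos);
        $dataStart = $pos + 4;
        if ($dataStart + $hdr['size'] > $len) {
            return null; // truncated record
        }
        if ($hdr['id'] === 0x7075) {
            if ($hdr['size'] < 5) {
                return null; // too short to hold version + CRC
            }
            $data = substr($extra, $dataStart, $hdr['size']);
            $version = ord($data[0]);
            $nameCrc = unpack('V', $data, 1)[1];
            // Version 1 is the only one defined; a CRC mismatch means the
            // Unicode name was written for a different (older) file name.
            if ($version !== 1 || $nameCrc !== crc32($rawName)) {
                return null;
            }
            return substr($data, 5);
        }
        $pos = $dataStart + $hdr['size'];
    }
    return null;
}

// Sample blob: an extended-timestamp record (0x5455) followed by a
// Unicode Path record for a name stored as GBK bytes.
$raw   = "\xD6\xD0\xCE\xC4.txt";          // "中文.txt" in GBK
$utf8  = "\xE4\xB8\xAD\xE6\x96\x87.txt";  // "中文.txt" in UTF-8
$other = pack('vv', 0x5455, 5) . "\x01\x00\x00\x00\x00";
$field = pack('vv', 0x7075, 5 + strlen($utf8)) . chr(1) . pack('V', crc32($raw)) . $utf8;

echo read_unicode_path($other . $field, $raw), PHP_EOL;
```

When the field is present, the original encoding no longer has to be guessed. When it is absent (many Windows tools never write it), you are back to detection.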

@Ven Here is code for the file names, slightly modified from another reply. My system is Linux, so it re-encodes non-UTF-8 names to UTF-8.

 
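    <?php
    function detect_encoding($zipfile_name)
    {
        $zip = new ZipArchive;
        $res = $zip->open($zipfile_name);
        if (true !== $res)
            throw new Exception('Can not Open Zip File ' . $res);
        $encoding = "UTF-8";
        // Candidate encodings handed to mb_detect_encoding()
        $controller = array("ASCII", "UTF-8", "GB2312", "GBK", "BIG5");
        for ($i = 0; $i < $zip->numFiles; ++$i) {
            $entry = $zip->getNameIndex($i);
            $encoding = mb_detect_encoding($entry, $controller);
            // Re-encode anything that is not already UTF-8 (skip if detection failed)
            if (false !== $encoding && "UTF-8" !== $encoding)
                $entry = iconv($encoding, "UTF-8", $entry);
            echo $entry . "--->" . $encoding . PHP_EOL;
        }
        // Close only after the whole file list has been read
        $zip->close();
    }

    detect_encoding($argv[1]);
    ?>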

@sunyanzi's answer is the correct one; I'll add a bit more here.
For historical reasons on Windows, when you check the file-name encoding of ZIP packages produced by some compression tools with mb_detect_encoding(), you may get a result like "CP936". This stumped me at first, because I thought the function had failed to detect the encoding. In fact, CP936 is Microsoft's own code page and is essentially equivalent to GBK.
For how the other "CP***" code pages map to encodings, see the article "Windows code page".
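
So a name reported as CP936 can simply be converted as GBK. A minimal sketch, assuming PHP's mbstring extension; name_to_utf8() and its default candidate list are illustrative assumptions, not code from the answers. Strict mode is used so that encodings in which the bytes are invalid are ruled out.

```php
<?php
// Hypothetical helper: convert one ZIP entry name to UTF-8.
// The default candidate list is an assumption for Simplified Chinese archives.
// Returns [converted name, detected encoding or false].
function name_to_utf8(string $name, array $candidates = ['ASCII', 'UTF-8', 'CP936']): array
{
    // Strict mode: rule out encodings in which $name is not a valid byte sequence.
    $enc = mb_detect_encoding($name, $candidates, true);
    if ($enc === false) {
        return [$name, false]; // nothing matched; leave the name untouched
    }
    if ($enc === 'ASCII' || $enc === 'UTF-8') {
        return [$name, $enc]; // already valid UTF-8, nothing to convert
    }
    // CP936 is Microsoft's GBK; mbstring converts it directly.
    return [mb_convert_encoding($name, 'UTF-8', $enc), $enc];
}

[$utf8, $enc] = name_to_utf8("\xD6\xD0\xCE\xC4.txt"); // "中文.txt" stored as GBK
echo $enc, ': ', $utf8, PHP_EOL;
```

Adding BIG-5 to the list makes the result ambiguous: the same GBK bytes are often also valid Big5, so detection alone cannot reliably separate them. That is the core difficulty behind the original question.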

As far as I can tell, the ZIP format itself has no notion of a file-name encoding.
As for the garbled names you get after decompression, that is a problem with the decompression software...
