Preventing Garbled File Names on Mac OS X in ZIP Files Generated by 7-Zip

Source: Internet
Author: User

May 3, 2009 @ PM · filed under flow account

For a long time I have noticed that on Mac OS X, whether I use StuffIt Expander or unzip on the command line, extracting some ZIP files created on Windows produces garbled Chinese file names. It happened rarely enough that I never paid much attention to it. Today I finally could not bear it any longer. I searched Google with a few simple keywords and did not find anything that addressed this specifically, so I decided to spend some time studying it.

Most of my ZIP files on Windows are created with the free software 7-Zip. Although its strength lies in its own 7z format, for compatibility reasons I use it only to create ZIP files and to extract all the common archive formats. The ZIP format originally did not specify a file name encoding, and archives carried no information about which encoding was used, so a large number of ZIP files simply use the local encoding of whoever created them. The file names in the ZIP files I ran into, for example, were encoded in Simplified Chinese GBK. My Mac OS X locale is en_US.UTF-8, so after extraction the GBK-encoded file names naturally cannot be interpreted correctly.
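The mismatch described above can be reproduced without any archive at all. The following is a minimal sketch in Python; the sample file name is made up for illustration, and CP437 is used as the fallback only because it is the legacy default that many ZIP readers assume when no encoding is declared.

```python
# A file name as a Windows user in a Simplified Chinese locale would see it
# (hypothetical sample name).
original = "测试.txt"

# In a GBK locale, these are the raw bytes that end up in the archive.
stored = original.encode("gbk")

# A reader that assumes UTF-8 cannot decode them cleanly and has to
# substitute replacement characters.
as_utf8 = stored.decode("utf-8", errors="replace")

# A reader that assumes the legacy CP437 code page decodes every byte,
# but into the wrong characters.
as_cp437 = stored.decode("cp437")

# If the real encoding is known, the garbled name can be repaired by
# reversing the wrong decoding and applying the right one.
recovered = as_cp437.encode("cp437").decode("gbk")

print(repr(as_utf8))
print(repr(as_cp437))
print(recovered)
```

The repair on the last step works only when the reader knows, or correctly guesses, that the creator used GBK, which is exactly the information the archive does not carry.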

After reading the Wikipedia entry on the ZIP file format, I found that the latest version of the ZIP standard recommends UTF-8 for file name encoding. This is not surprising: for cross-platform use, UTF-8 is clearly the best choice. The remaining question is how to make 7-Zip generate ZIP files whose file names are encoded in UTF-8.
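To check whether a given archive declares UTF-8 file names, one can inspect each entry's general purpose flags. As I understand the ZIP specification, bit 11 (0x800) marks the file name as UTF-8; that detail comes from the specification rather than from this article. The sketch below uses Python's standard zipfile module, builds a small archive in memory so that it is self-contained, and uses a helper name (utf8_flags) that is purely illustrative.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is UTF-8


def utf8_flags(zip_bytes):
    """Map each entry's file name to whether its UTF-8 flag is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


# Build a tiny archive in memory. Python's zipfile stores non-ASCII
# names as UTF-8 and sets the flag; pure ASCII names need no flag.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("测试.txt", "hello")
    zf.writestr("plain.txt", "hello")

flags = utf8_flags(buf.getvalue())
for name, is_utf8 in flags.items():
    print(name, is_utf8)
```

Running the same helper on the bytes of an archive created by 7-Zip would show whether its entries carry the flag.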

In a test on Windows, I found an interesting phenomenon. When the Windows locale is Chinese (PRC) (Control Panel > Regional and Language Options > Advanced > Language for non-Unicode programs > Chinese (PRC)), file names in ZIP files created by 7-Zip are encoded in GBK; when the locale is English (United States), they are encoded in UTF-8! In addition, under the Chinese locale, ZIP files extract correctly whether their file names are encoded in GBK or UTF-8, whereas under the English locale only UTF-8 file names extract correctly and GBK file names come out garbled. This indicates that 7-Zip is able to handle UTF-8 encoded file names. But why does it use UTF-8 only under the English (United States) code page (that is, essentially plain ASCII)?

Searching further, I found in 7-Zip's version history that UTF-8 encoding of file names inside ZIP archives was introduced in version 4.58. In the default mode, 7-Zip uses the current code page when that code page can represent every character in the file name, and falls back to UTF-8 when it cannot. This explains the behavior above: the English (United States) code page cannot represent Chinese characters, so 7-Zip falls back to UTF-8. 7-Zip also provides two other modes: one forces file names to be encoded in UTF-8, and the other forces them to be encoded in the current locale's code page (that is, no conversion).
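The three modes can be modeled in a few lines of Python. This is only a sketch of the decision as the version history describes it, not 7-Zip's actual code; the function name, the mode names, and the use of Python codec names for code pages are all assumptions made for illustration.

```python
def pick_name_encoding(name, code_page, mode="default"):
    """Hypothetical model of how the file name encoding is chosen.

    mode is one of "default", "force_utf8" or "force_local"
    (illustrative names, not 7-Zip identifiers).
    """
    if mode == "force_utf8":
        return "utf-8"
    if mode == "force_local":
        return code_page
    # Default mode: keep the local code page if it can represent
    # every character in the name, otherwise fall back to UTF-8.
    try:
        name.encode(code_page)
        return code_page
    except UnicodeEncodeError:
        return "utf-8"


# Chinese locale: GBK can represent the name, so GBK is used.
print(pick_name_encoding("测试.txt", "gbk"))
# English locale: cp1252 cannot, so the name falls back to UTF-8.
print(pick_name_encoding("测试.txt", "cp1252"))
# Forcing UTF-8 gives a portable result regardless of locale.
print(pick_name_encoding("测试.txt", "gbk", mode="force_utf8"))
```

The first two calls mirror the test on Windows: the same file name ends up in GBK under the Chinese locale and in UTF-8 under the English one.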

With this in mind, the solution becomes simple: when creating a ZIP file, use the -mcu option to force file names in the archive to be encoded in UTF-8. In the graphical interface, the equivalent setting (cu=on) goes in the Parameters field of the "Add to Archive" dialog.
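For scripted use, the command line can be assembled as below. This is a sketch under a few assumptions: the executable is assumed to be named 7z and to be on the PATH, the archive and file names are made up, and the switch is written in its explicit form -mcu=on. The helper name is illustrative.

```python
import subprocess


def build_7z_zip_command(archive, paths, exe="7z"):
    """Build a 7-Zip command that adds files to a ZIP archive
    with UTF-8 encoded file names.

    a         add files to an archive
    -tzip     use the ZIP format
    -mcu=on   force UTF-8 for non-ASCII file names
    """
    return [exe, "a", "-tzip", "-mcu=on", archive, *paths]


def create_utf8_zip(archive, paths, exe="7z"):
    """Run the command; raises CalledProcessError if 7-Zip fails."""
    subprocess.run(build_7z_zip_command(archive, paths, exe), check=True)


cmd = build_7z_zip_command("photos.zip", ["测试.txt"])
print(" ".join(cmd))
```

Calling create_utf8_zip("photos.zip", ["测试.txt"]) on a machine with 7-Zip installed would run the command; the sketch only prints it so that it stays harmless where 7-Zip is missing.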

Note that the "Add to *.zip" item that 7-Zip adds to the Explorer context menu can only create ZIP files with the default parameters. To create a UTF-8 ZIP file whose file names will not be garbled on other platforms, you must use the "Add to archive..." menu item instead.
