How Go handles Chinese file names in zip archives

Source: Internet
Author: User

The Go standard library ships its own zip package, archive/zip.

By default, however, the package treats the file names stored inside an archive as UTF-8.
On Chinese-language Windows, tools that create and read ZIP archives encode those internal file names as GBK by default.
File names therefore have to be converted whenever GBK is involved.

The official golang.org/x/text sub-repository (formerly go.text) already supports many encodings. The following function converts UTF-8 to GBK:

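import "golang.org/x/text/encoding/simplifiedchinese"

// utf8ToGBK converts a UTF-8 string to GBK-compatible bytes.
// It uses the GB18030 encoder; GB18030 is a superset of GBK.
func utf8ToGBK(text string) (string, error) {
    // Twice the input length is enough room for the encoded output.
    dst := make([]byte, len(text)*2)
    tr := simplifiedchinese.GB18030.NewEncoder()
    nDst, _, err := tr.Transform(dst, []byte(text), true)
    if err != nil {
        return text, err
    }
    return string(dst[:nDst]), nil
}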

When generating a zip file, use utf8ToGBK to convert each file name:

import (
    "archive/zip"
    "log"
    "os"
)

func main() {
    file, err := os.Create("中文-测试.zip")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    wzip := zip.NewWriter(file)
    defer func() {
        if err := wzip.Close(); err != nil {
            log.Fatal(err)
        }
    }()

    // Files to add to the archive.
    var files = []struct{ Name, Body string }{
        {"11/1/readme.txt", "UTF8 字符串."},
        {"11/1/readme2.txt", "This archive contains some text files."},
        {"汉字/2/gopher.txt", "Gopher names:\nGeorge\nGeoffrey\nGonzo"},
        {"11/中文.txt", "中文Get animal handling licence.\nWrite more examples."},
        {"空目录/", ""}, // empty directory
    }
    for _, file := range files {
        // Convert the file name to GBK before storing it.
        name, err := utf8ToGBK(file.Name)
        if err != nil {
            log.Fatal(err)
        }
        f, err := wzip.Create(name)
        if err != nil {
            log.Fatal(err)
        }
        _, err = f.Write([]byte(file.Body))
        if err != nil {
            log.Fatal(err)
        }
    }
}

This makes it possible to generate zip archives whose Simplified Chinese file names display correctly on Windows.

2014 supplement:

In fact, newer versions of the ZIP specification support UTF-8 encoded file names:

File:    APPNOTE.TXT - .ZIP File Format Specification
Version: 6.3.3

4.4.4 general purpose bit flag: (2 bytes)

    Bit 11: Language encoding flag (EFS).  If this bit is set,
            the filename and comment fields for this file
            MUST be encoded using UTF-8. (see APPENDIX D)

Specifically, the flag is bit 11 of the Flags field in each file's header.
If the bit is 0, the name is treated as being in the local encoding (GBK on Simplified Chinese Windows); if it is 1, the name is UTF-8.
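As a rough illustration of that bit test, the sketch below masks bit 11 out of a Flags value. The names utf8Flag and hasUTF8Flag are hypothetical and introduced only for this example; the archive is built in memory so the program does not touch the disk.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// utf8Flag is bit 11 of the general purpose bit flag (the EFS bit).
const utf8Flag uint16 = 1 << 11

// hasUTF8Flag (hypothetical helper) reports whether a header's Flags value
// marks the file name and comment as UTF-8.
func hasUTF8Flag(flags uint16) bool {
	return flags&utf8Flag != 0
}

func main() {
	// Write a one-entry archive into memory with the UTF-8 bit set.
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	if _, err := w.CreateHeader(&zip.FileHeader{Name: "中文.txt", Flags: utf8Flag}); err != nil {
		log.Fatal(err)
	}
	if err := w.Close(); err != nil {
		log.Fatal(err)
	}

	// Read it back and inspect the flag of every entry.
	r, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		log.Fatal(err)
	}
	for _, f := range r.File {
		fmt.Printf("%s utf8=%v\n", f.Name, hasUTF8Flag(f.Flags))
	}
}
```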

This header information corresponds to the Flags member of the FileHeader struct in the archive/zip package:

type FileHeader struct {
    // Name is the name of the file.
    // It must be a relative path: it must not start with a drive
    // letter (e.g. C:) or leading slash, and only forward slashes
    // are allowed.
    Name string

    CreatorVersion     uint16
    ReaderVersion      uint16
    Flags              uint16
    Method             uint16
    ModifiedTime       uint16 // MS-DOS time
    ModifiedDate       uint16 // MS-DOS date
    CRC32              uint32
    CompressedSize     uint32 // deprecated; use CompressedSize64
    UncompressedSize   uint32 // deprecated; use UncompressedSize64
    CompressedSize64   uint64
    UncompressedSize64 uint64
    Extra              []byte
    ExternalAttrs      uint32 // Meaning depends on CreatorVersion
    Comment            string
}

If you want to generate UTF-8 encoded file names, you can set this field manually:

import (
    "archive/zip"
    "log"
    "os"
)

func main() {
    file, err := os.Create("中文-测试.zip")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    wzip := zip.NewWriter(file)
    defer func() {
        if err := wzip.Close(); err != nil {
            log.Fatal(err)
        }
    }()

    // Files to add to the archive.
    var files = []struct{ Name, Body string }{
        {"11/1/readme.txt", "UTF8 字符串."},
        {"11/1/readme2.txt", "This archive contains some text files."},
        {"汉字/2/gopher.txt", "Gopher names:\nGeorge\nGeoffrey\nGonzo"},
        {"11/中文.txt", "中文Get animal handling licence.\nWrite more examples."},
        {"空目录/", ""}, // empty directory
    }
    for _, file := range files {
        header := &zip.FileHeader{
            Name:   file.Name,
            Flags:  1 << 11, // mark the name as UTF-8 encoded
            Method: zip.Deflate,
        }
        f, err := wzip.CreateHeader(header)
        if err != nil {
            log.Fatal(err)
        }
        _, err = f.Write([]byte(file.Body))
        if err != nil {
            log.Fatal(err)
        }
    }
}

Arguably, Create in archive/zip should assume by default that the file name is UTF-8 encoded and set this flag itself, which would avoid the decoding problems caused by different local encodings on different machines.
CL 54360043 has been submitted for this change; it is unclear whether it will be accepted.

Unfortunately, the zip browser built into Windows 7 ignores this flag and always uses the local encoding.
