Go's standard library ships its own zip package, archive/zip.
That package, however, treats the file names inside an archive as UTF-8 by default.
For users of Chinese-language Windows, the file names inside a ZIP archive are written and read as GBK by default.
A conversion is therefore required whenever GBK file names are involved.
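To make the mismatch concrete, here is a minimal sketch that compares the UTF-8 bytes of "中文" with its GBK bytes, using only the standard library. The GBK byte values are hard-coded for illustration, and `looksLikeUTF8` is a hypothetical helper name, not part of archive/zip.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// gbkZhongwen holds the GBK encoding of "中文", hard-coded for illustration.
var gbkZhongwen = []byte{0xD6, 0xD0, 0xCE, 0xC4}

// looksLikeUTF8 is a hypothetical helper: it reports whether a raw entry
// name is valid UTF-8. GBK names containing Chinese text usually are not.
func looksLikeUTF8(name []byte) bool {
	return utf8.Valid(name)
}

func main() {
	utf8Name := []byte("中文") // Go source strings are UTF-8
	fmt.Printf("UTF-8: % x (valid UTF-8: %v)\n", utf8Name, looksLikeUTF8(utf8Name))
	fmt.Printf("GBK:   % x (valid UTF-8: %v)\n", gbkZhongwen, looksLikeUTF8(gbkZhongwen))
}
```

The same two characters take six bytes in UTF-8 and four in GBK, so a tool that decodes the name with the wrong encoding shows garbled text.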
The official Go sub-repository go.text (golang.org/x/text) already supports a variety of encodings. The following function converts UTF-8 to GBK:
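    import "golang.org/x/text/encoding/simplifiedchinese"

    // utf8ToGBK converts a UTF-8 string to GBK-compatible bytes.
    // It uses the GB18030 encoder; GB18030 extends GBK, so characters
    // covered by GBK generally encode to the same bytes.
    func utf8ToGBK(text string) (string, error) {
        dst := make([]byte, len(text)*2)
        tr := simplifiedchinese.GB18030.NewEncoder()
        nDst, _, err := tr.Transform(dst, []byte(text), true)
        if err != nil {
            return text, err
        }
        return string(dst[:nDst]), nil
    }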
When generating a zip file, convert each file name with utf8ToGBK:
    // Besides the x/text import above, this needs "archive/zip", "log", and "os".
    func main() {
        file, err := os.Create("中文-测试.zip")
        if err != nil {
            log.Fatal(err)
        }
        defer file.Close()

        wzip := zip.NewWriter(file)
        defer func() {
            if err := wzip.Close(); err != nil {
                log.Fatal(err)
            }
        }()

        // Files to compress.
        var files = []struct{ Name, Body string }{
            {"11/1/readme.txt", "UTF8 字符串."},
            {"11/1/readme2.txt", "This archive contains some text files."},
            {"汉字/2/gopher.txt", "Gopher names:\nGeorge\nGeoffrey\nGonzo"},
            {"11/中文.txt", "中文Get animal handling licence.\nWrite more examples."},
            {"空目录/", ""},
        }
        for _, file := range files {
            // Convert the file name to GBK.
            name, err := utf8ToGBK(file.Name)
            if err != nil {
                log.Fatal(err)
            }
            f, err := wzip.Create(name)
            if err != nil {
                log.Fatal(err)
            }
            _, err = f.Write([]byte(file.Body))
            if err != nil {
                log.Fatal(err)
            }
        }
    }
This makes it possible to generate zip archives whose Simplified Chinese file names display correctly on Windows.
2014 supplement:
In fact, newer versions of the zip specification support UTF-8 encoded file names.
    File: APPNOTE.TXT - .ZIP File Format Specification
    Version: 6.3.3

    4.4.4 general purpose bit flag: (2 bytes)

    Bit 11: Language encoding flag (EFS). If this bit is set,
            the filename and comment fields for this file
            MUST be encoded using UTF-8. (see APPENDIX D)
Specifically, this is bit 11 of the Flags field in each file's header.
If the bit is 0, the name uses the local encoding (GBK on Chinese-language Windows); if it is 1, the name is UTF-8 encoded.
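As a minimal sketch of that rule, the check comes down to a single bit mask. The names `utf8Flag` and `nameIsUTF8` are made up for this illustration and are not part of archive/zip.

```go
package main

import "fmt"

// utf8Flag is bit 11 of the general purpose bit flag (0x800).
const utf8Flag = 1 << 11

// nameIsUTF8 is a hypothetical helper that reports whether bit 11 is set,
// meaning the file name and comment are declared to be UTF-8.
func nameIsUTF8(flags uint16) bool {
	return flags&utf8Flag != 0
}

func main() {
	fmt.Println(nameIsUTF8(0x0000)) // bit clear: local encoding
	fmt.Println(nameIsUTF8(0x0800)) // bit set: UTF-8
	fmt.Println(nameIsUTF8(0x0808)) // bit 11 set alongside another flag
}
```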
In the archive/zip package, this header information corresponds to the Flags member of the zip.FileHeader struct:
    type FileHeader struct {
        // Name is the name of the file.
        // It must be a relative path: it must not start with a drive
        // letter (e.g. C:) or leading slash, and only forward slashes
        // are allowed.
        Name string

        CreatorVersion     uint16
        ReaderVersion      uint16
        Flags              uint16
        Method             uint16
        ModifiedTime       uint16 // MS-DOS time
        ModifiedDate       uint16 // MS-DOS date
        CRC32              uint32
        CompressedSize     uint32 // deprecated; use CompressedSize64
        UncompressedSize   uint32 // deprecated; use UncompressedSize64
        CompressedSize64   uint64
        UncompressedSize64 uint64
        Extra              []byte
        ExternalAttrs      uint32 // Meaning depends on CreatorVersion
        Comment            string
    }
If you want to generate a UTF8 encoded file name, you can specify the field manually:
    package main

    import (
        "archive/zip"
        "log"
        "os"
    )

    func main() {
        file, err := os.Create("中文-测试.zip")
        if err != nil {
            log.Fatal(err)
        }
        defer file.Close()

        wzip := zip.NewWriter(file)
        defer func() {
            if err := wzip.Close(); err != nil {
                log.Fatal(err)
            }
        }()

        // Files to compress.
        var files = []struct{ Name, Body string }{
            {"11/1/readme.txt", "UTF8 字符串."},
            {"11/1/readme2.txt", "This archive contains some text files."},
            {"汉字/2/gopher.txt", "Gopher names:\nGeorge\nGeoffrey\nGonzo"},
            {"11/中文.txt", "中文Get animal handling licence.\nWrite more examples."},
            {"空目录/", ""},
        }
        for _, file := range files {
            header := &zip.FileHeader{
                Name:   file.Name,
                Flags:  1 << 11, // mark the name as UTF-8 encoded
                Method: zip.Deflate,
            }
            f, err := wzip.CreateHeader(header)
            if err != nil {
                log.Fatal(err)
            }
            _, err = f.Write([]byte(file.Body))
            if err != nil {
                log.Fatal(err)
            }
        }
    }
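One way to check the result without opening the archive in another tool is an in-memory round trip. The sketch below is only an illustration: `roundTrip` is a hypothetical helper, and it writes to a bytes.Buffer instead of a file. It creates one entry with bit 11 set, reads the archive back with zip.NewReader, and returns the stored name and flags.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// roundTrip (hypothetical helper) writes a single entry with bit 11 set
// and returns the name and flags that a reader sees for that entry.
func roundTrip(name, body string) (string, uint16, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	f, err := w.CreateHeader(&zip.FileHeader{
		Name:   name,
		Flags:  1 << 11, // declare the name as UTF-8
		Method: zip.Deflate,
	})
	if err != nil {
		return "", 0, err
	}
	if _, err := f.Write([]byte(body)); err != nil {
		return "", 0, err
	}
	if err := w.Close(); err != nil {
		return "", 0, err
	}

	r, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return "", 0, err
	}
	if len(r.File) != 1 {
		return "", 0, fmt.Errorf("expected 1 entry, got %d", len(r.File))
	}
	fh := r.File[0].FileHeader
	return fh.Name, fh.Flags, nil
}

func main() {
	name, flags, err := roundTrip("11/中文.txt", "hello")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("name=%s utf8=%v\n", name, flags&(1<<11) != 0)
}
```

The returned flags may contain other bits as well, because the writer can set additional flags of its own, so the check masks bit 11 rather than comparing the whole value.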
Arguably, the zip package's Create should assume by default that the file name is UTF-8 encoded and set this flag, which would avoid the decoding problems caused by machines using different local encodings.
CL 54360043 has been submitted for this change; it is unclear whether it will be accepted.
Unfortunately, the zip browser built into Windows 7 always ignores this flag and always uses the local encoding.