Python ZIP File Compression

Source: Internet
Author: User
Tags: zipinfo, pkware

For simple needs, ZIP is a good choice of archive format, and Python's support for it is simple and easy to use.
1) Simple applications
If you only want to use Python to compress and decompress files, there is no need to read through the whole documentation. The simple usage below should be clear at a glance.
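import zipfile

# Create (or overwrite) filename.zip and add three files using deflate compression.
f = zipfile.ZipFile('filename.zip', 'w', zipfile.ZIP_DEFLATED)
f.write('file1.txt')
f.write('file2.doc')
f.write('file3.rar')
f.close()

# Reopen the archive for reading and extract everything into the current directory.
f = zipfile.ZipFile('filename.zip')
f.extractall()
f.close()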
Hopefully the example above is simple enough.
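The same round trip can also be written with `with` statements, which call close() automatically. The following is a minimal sketch rather than the article's own code; the temporary directory and the file contents are assumptions made so that the example runs without touching real files.

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory (an assumption for this sketch).
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "file1.txt")
archive = os.path.join(workdir, "filename.zip")
target = os.path.join(workdir, "extracted")

with open(source, "w") as fh:
    fh.write("hello zip\n")

# The with-statement closes the archive even if an error occurs.
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(source, arcname="file1.txt")

# Reopen for reading and extract into a separate directory.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(target)

with open(os.path.join(target, "file1.txt")) as fh:
    restored = fh.read()
```

Using a context manager avoids the most common mistake with this module, which is forgetting the final close().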
1.1 zipfile.ZipFile(filename[, mode[, compression[, allowZip64]]])
filename needs no explanation: it is the path of the ZIP file.
mode works as in normal file operations. 'r' opens an existing ZIP file for reading; 'w' truncates an existing ZIP file and opens it for writing, or creates a new one; 'a' opens an existing ZIP file and appends content to it.
compression is the compression method. The options include ZIP_STORED and ZIP_DEFLATED. ZIP_STORED is the default and means no compression; ZIP_DEFLATED means deflate compression. If you are not familiar with deflate, it is worth reading up on.
When allowZip64 is true, the ZIP64 extensions are enabled. They are generally needed when the ZIP file is larger than 2 GB. In the Python version this article describes, the default is False, because the zip and unzip commands on Unix systems did not support these extensions; since Python 3.4 the default is True.
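To illustrate the mode and compression arguments, here is a small sketch that works entirely in memory. The payload, the member names, and the helper archive_size are hypothetical and exist only for this illustration.

```python
import io
import zipfile

payload = b"abc" * 10000  # repetitive data, so deflate can shrink it

def archive_size(compression):
    """Return the size in bytes of an in-memory archive holding payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression) as zf:
        zf.writestr("data.bin", payload)
    return len(buffer.getvalue())

stored_size = archive_size(zipfile.ZIP_STORED)      # no compression
deflated_size = archive_size(zipfile.ZIP_DEFLATED)  # deflate compression

# Mode 'w' starts a new archive, 'a' appends to it, 'r' reads it back.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("first.txt", "one")
with zipfile.ZipFile(buffer, "a") as zf:
    zf.writestr("second.txt", "two")
with zipfile.ZipFile(buffer, "r") as zf:
    names = zf.namelist()
```

With ZIP_STORED the archive is slightly larger than the payload itself, because the headers are added on top of the unmodified data.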
1.2 ZipFile.close()
There is little to say here, except for one thing: the archive is not complete on disk until close() is called, because essential records are only written at that point.
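1.3 ZipFile.write(filename[, arcname[, compress_type]])
arcname is the name the file gets inside the archive. By default it is the same as filename (without a drive letter and with leading path separators removed).
compress_type exists because the ZIP format allows a different compression type for each file in the archive; if given, it overrides the value passed to the constructor for this one file.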
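A short sketch of arcname and compress_type follows. The file names notes.txt and blob.bin and their contents are made up for the illustration.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "notes.txt")  # hypothetical input files
blob_path = os.path.join(workdir, "blob.bin")
with open(text_path, "w") as fh:
    fh.write("text " * 1000)
with open(blob_path, "wb") as fh:
    fh.write(os.urandom(1024))

archive = os.path.join(workdir, "mixed.zip")
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    # Stored under a different name (and folder) inside the archive.
    zf.write(text_path, arcname="docs/notes.txt")
    # Override the archive-wide compression for this one member.
    zf.write(blob_path, arcname="blob.bin", compress_type=zipfile.ZIP_STORED)

with zipfile.ZipFile(archive) as zf:
    types = {info.filename: info.compress_type for info in zf.infolist()}
```

Storing already-compressed or random data without compression is a common use of the per-file compress_type, since deflate gains little on such input.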
1.4 ZipFile.extractall([path[, members[, pwd]]])
path is the directory to extract to; by default it is the current working directory.
members is the list of names of the files to extract; by default all files are extracted.
pwd is the password; it is required when the ZIP file is encrypted.
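The path and members arguments can be sketched as follows, with hypothetical member names. The pwd argument is not demonstrated, because the zipfile module can decrypt encrypted archives but cannot create them.

```python
import io
import os
import tempfile
import zipfile

# Build a small in-memory archive with two members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("keep.txt", "wanted")
    zf.writestr("skip.txt", "not wanted")

# Extract only one of the members into a fresh directory.
target = tempfile.mkdtemp()
with zipfile.ZipFile(buffer) as zf:
    zf.extractall(path=target, members=["keep.txt"])

extracted = sorted(os.listdir(target))
```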
This is enough for simple applications.
2) Advanced applications
2.1 zipfile.is_zipfile(filename)
Determines whether a file is a valid ZIP file.
2.2 ZipFile.namelist()
Returns the list of names of the files in the archive.
2.3 ZipFile.open(name[, mode[, pwd]])
Opens a file inside the archive as a file-like object.
2.4 ZipFile.infolist()
2.5 ZipFile.getinfo(name)
Both of these return ZipInfo objects: infolist() returns a list with one ZipInfo per file in the archive, while getinfo(name) returns the single ZipInfo for the named file.
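The calls in 2.1 to 2.5 can be tried together on an in-memory archive. This is only a sketch; the member names a.txt and b.txt are assumptions.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "alpha\n")
    zf.writestr("b.txt", "beta\n")

# is_zipfile accepts a file name or a file-like object.
looks_like_zip = zipfile.is_zipfile(buffer)
not_a_zip = zipfile.is_zipfile(io.BytesIO(b"plain text"))

with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    # open() gives a file-like object for one member.
    with zf.open("a.txt") as member:
        first_line = member.readline()
    infos = zf.infolist()          # list of ZipInfo objects
    one_info = zf.getinfo("b.txt")  # a single ZipInfo object
```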
ZipInfo class
2.6 ZipInfo.filename
2.7 ZipInfo.date_time
The returned value has the format (year, month, day, hour, minute, second).
2.8 ZipInfo.compress_type
2.9 ZipInfo.comment
2.10 ZipInfo.extra
2.11 ZipInfo.create_system
2.12 ZipInfo.extract_version
2.13 ZipInfo.reserved (always 0)
2.14 ZipInfo.flag_bits
2.15 ZipInfo.volume
2.16 ZipInfo.internal_attr
2.17 ZipInfo.external_attr
2.18 ZipInfo.header_offset
2.19 ZipInfo.CRC
2.20 ZipInfo.file_size
2.21 ZipInfo.compress_size
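A few of these attributes can be inspected as shown below. The sketch builds its own ZipInfo with a made-up name and timestamp, then reads the values back from the archive.

```python
import io
import zipfile
import zlib

data = b"hello " * 100
# Hypothetical member name and modification time for the illustration.
info = zipfile.ZipInfo("hello.txt", date_time=(2020, 1, 2, 3, 4, 6))
info.compress_type = zipfile.ZIP_DEFLATED

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(info, data)

with zipfile.ZipFile(buffer) as zf:
    stored = zf.getinfo("hello.txt")
    summary = {
        "filename": stored.filename,
        "date_time": stored.date_time,
        "file_size": stored.file_size,          # uncompressed size
        "compress_size": stored.compress_size,  # size inside the archive
        "crc_ok": stored.CRC == zlib.crc32(data),
    }
```

Note that the ZIP format stores times with a resolution of two seconds, so an odd number of seconds does not survive a round trip.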
2.22 ZipFile.testzip()
Checks the CRC of every file in the archive. If a check fails, the name of the first bad file is returned; otherwise None is returned.
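The behaviour of testzip() can be observed by damaging one byte of an archive on purpose. In this sketch the members are stored without compression so that the payload is easy to find in the raw bytes; the names and contents are hypothetical.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("good.txt", b"intact data")
    zf.writestr("bad.txt", b"soon to be damaged")

# An undamaged archive passes the check.
with zipfile.ZipFile(buffer) as zf:
    clean_result = zf.testzip()

# Flip one byte of the second member's stored data.
raw = bytearray(buffer.getvalue())
position = raw.find(b"soon to be damaged")
raw[position] ^= 0xFF

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
    damaged_result = zf.testzip()
```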
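2.23 ZipFile.setpassword(pwd)
Sets the default password used to extract encrypted files.
2.24 ZipFile.read(name[, pwd])
Returns the contents of the named file as bytes.
2.25 ZipFile.printdir()
Prints a table of contents for the archive.
2.26 ZipFile.writestr(zinfo_or_arcname, bytes)
Writes the given bytes to the archive as a file, named either by an archive name or by a ZipInfo object.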
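writestr(), read(), and printdir() fit together naturally when the data never needs to exist as a file on disk. A minimal sketch, with made-up member names:

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("greeting.txt", "hello")     # a str is encoded as UTF-8
    zf.writestr("raw.bin", b"\x00\x01\x02")  # bytes are written unchanged

listing = io.StringIO()
with zipfile.ZipFile(buffer) as zf:
    text = zf.read("greeting.txt").decode("utf-8")
    raw = zf.read("raw.bin")
    # printdir() writes to sys.stdout unless another file object is given.
    zf.printdir(file=listing)
```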
PyZipFile class
In addition to the methods and attributes above, zipfile.PyZipFile has one special method.
2.27 PyZipFile.writepy(pathname, basename)
Generally, only the compiled .pyc files (and .pyo files in older Python versions) are added to the archive; the .py source files are not.
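As a rough illustration of writepy(), the sketch below creates a tiny module in a temporary directory and packs it. The module name mymodule and its content are hypothetical.

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "mymodule.py")  # hypothetical module
with open(module_path, "w") as fh:
    fh.write("def answer():\n    return 42\n")

archive = os.path.join(workdir, "modules.zip")
with zipfile.PyZipFile(archive, "w") as zf:
    # Compiles mymodule.py if necessary and adds the bytecode file.
    zf.writepy(module_path)

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
```

The resulting archive contains the compiled module rather than its source, which matches the description above.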
----------------------------------------------------------------------
ZIP file format information
A ZIP file consists of three parts: the compressed source file data area + the compressed source file directory (the central directory) + the directory end mark (the end of central directory record).
1) Compressed source file data area
In this data area, each compressed source file or directory is one record. The record format is: [file header + file data + data descriptor]
A. File header structure
Field                                      Length
File header signature                      4 bytes (0x04034b50)
PKWARE version needed to extract           2 bytes
General purpose bit flag                   2 bytes
Compression method                         2 bytes
Last modified file time                    2 bytes
Last modified file date                    2 bytes
CRC-32 checksum                            4 bytes
Compressed size                            4 bytes
Uncompressed size                          4 bytes
File name length                           2 bytes
Extra field length                         2 bytes
File name                                  variable length
Extra field                                variable length
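The file header layout above maps directly onto a struct format string. The following sketch assumes a little-endian layout (as the ZIP format uses) and an archive written by zipfile into a seekable buffer, so that the size fields in the header are filled in; the constant name LOCAL_HEADER is an invention of this example.

```python
import io
import struct
import zipfile

# 4+2+2+2+2+2+4+4+4+2+2 = 30 bytes, in the order of the table above.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"hello")

raw = buffer.getvalue()
(signature, version_needed, flags, method, mod_time, mod_date,
 crc32, compressed_size, uncompressed_size,
 name_length, extra_length) = LOCAL_HEADER.unpack_from(raw, 0)

# The variable-length file name and extra field follow the fixed part.
name_start = LOCAL_HEADER.size
name = raw[name_start:name_start + name_length].decode("ascii")
data_start = name_start + name_length + extra_length
data = raw[data_start:data_start + compressed_size]
```

Because the member is stored without compression, the bytes after the header are the original file content.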

B. File data

C. Data descriptor
Field                                      Length
CRC-32 checksum                            4 bytes
Compressed size                            4 bytes
Uncompressed size                          4 bytes
The data descriptor exists only when bit 3 of the general purpose bit flag is set to 1, and it immediately follows the last byte of the compressed data. It is used only when the output ZIP file could not be seeked while it was being written, for example a ZIP file on a non-seekable device such as a tape drive. ZIP files written to ordinary disks generally do not contain a data descriptor.
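Whether a member uses a data descriptor can be checked from its flag bits, which the ZipInfo class exposes as flag_bits. A minimal sketch; the helper has_data_descriptor and the constant name are hypothetical.

```python
import io
import zipfile

DATA_DESCRIPTOR_FLAG = 0x0008  # bit 3 of the general purpose bit flag

def has_data_descriptor(flag_bits):
    """Return True if bit 3 is set in the given flag value."""
    return bool(flag_bits & DATA_DESCRIPTOR_FLAG)

# Read the flag bits of a member from a small in-memory archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.txt", b"hello")
with zipfile.ZipFile(buffer) as zf:
    flags = zf.getinfo("a.txt").flag_bits
    uses_descriptor = has_data_descriptor(flags)
```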

2) Compressed source file directory (central directory)
Each record in this area corresponds to one record in the compressed source file data area.
Field                                      Length
Directory file header signature            4 bytes (0x02014b50)
PKWARE version used for compression        2 bytes
PKWARE version needed to extract           2 bytes
General purpose bit flag                   2 bytes
Compression method                         2 bytes
Last modified file time                    2 bytes
Last modified file date                    2 bytes
CRC-32 checksum                            4 bytes
Compressed size                            4 bytes
Uncompressed size                          4 bytes
File name length                           2 bytes
Extra field length                         2 bytes
File comment length                        2 bytes
Disk number start                          2 bytes
Internal file attributes                   2 bytes
External file attributes                   4 bytes
Relative offset of local file header       4 bytes
File name                                  variable length
Extra field                                variable length
File comment                               variable length
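The directory records can be walked with struct in the same way as the file headers. This sketch finds the first record by searching for its signature, which is good enough for a tiny test archive; a robust reader would take the directory offset from the directory end record instead. The name CENTRAL_HEADER is an invention of this example.

```python
import io
import struct
import zipfile

# 46 fixed bytes, in the order of the table above (little-endian).
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = b"PK\x01\x02"  # 0x02014b50 as stored on disk

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world!")

raw = buffer.getvalue()
offset = raw.find(CENTRAL_SIGNATURE)
entries = []
while raw[offset:offset + 4] == CENTRAL_SIGNATURE:
    fields = CENTRAL_HEADER.unpack_from(raw, offset)
    name_length, extra_length, comment_length = fields[10:13]
    local_header_offset = fields[16]
    name_start = offset + CENTRAL_HEADER.size
    name = raw[name_start:name_start + name_length].decode("ascii")
    entries.append((name, local_header_offset))
    # Skip the variable-length parts to reach the next record.
    offset = name_start + name_length + extra_length + comment_length
```

Each entry pairs a file name with the position of its file header in the data area, which is how a reader jumps straight to one member.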

3) Directory end mark (end of central directory record)
Field                                      Length
Directory end signature                    4 bytes (0x06054b50)
Current disk number                        2 bytes
Directory start disk number                2 bytes
Number of directory records on this disk   2 bytes
Total number of directory records          2 bytes
Directory size                             4 bytes
Offset of the start of the directory       4 bytes
ZIP file comment length                    2 bytes
ZIP file comment                           variable length
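Readers normally start from this end record, because it sits at the end of the file and points back to the directory. A minimal sketch that locates it by searching backwards for the signature; the name EOCD and the archive comment are made up for the example.

```python
import io
import struct
import zipfile

# 4+2+2+2+2+4+4+2 = 22 fixed bytes (little-endian).
EOCD = struct.Struct("<I4H2IH")
EOCD_SIGNATURE = b"PK\x05\x06"  # 0x06054b50 as stored on disk

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.comment = b"demo archive"

raw = buffer.getvalue()
# Search from the end, since the record is the last structure in the file.
eocd_offset = raw.rfind(EOCD_SIGNATURE)
(signature, disk_number, directory_disk, records_on_disk,
 total_records, directory_size, directory_offset,
 comment_length) = EOCD.unpack_from(raw, eocd_offset)

comment_start = eocd_offset + EOCD.size
comment = raw[comment_start:comment_start + comment_length]
```

The directory_offset value is where the first directory record begins, so reading the directory and then the individual file headers follows from this one record.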
