1. Introduction
The LZ77 algorithm is a dictionary-based data compression algorithm, published by the two Israeli researchers Abraham Lempel and Jacob Ziv in their 1977 paper "A Universal Algorithm for Sequential Data Compression".
Statistics-based compression codes, such as Huffman coding, require prior knowledge, namely the character frequencies of the source, before they can compress anything. In most cases, however, this prior knowledge is hard to obtain in advance, so a more general compression scheme is needed. LZ77 was designed to meet this need: its core idea is to exploit the repeated structure in the data in order to compress it. Here is a simple example:
Take it by righteousness, keep it by righteousness: that is Zhou. Take it by deceit and force, keep it by deceit and force: that is Qin.
Fragments such as "take it by", "righteousness", "keep it by", and "deceit and force", along with the punctuation, are repeated, so they can be expressed simply by pointing to where they appeared earlier. To indicate positions, we define a relative position, as shown in the figure:
The message string after the relative position is "Take it by deceit and force, keep it by deceit and force: that is Qin." Wherever part of this string matches the text before the relative position, it is encoded as the start and end index of the matching text; wherever no match exists, the original characters are kept. The message string after the relative position can therefore be encoded as [(1-3), (deceit and force), (6), (7-9), (deceit and force), (12), (6), (Qin), (15-16)], as shown in the figure:
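The index-based encoding described above can be sketched in a few lines of Python. This is only an illustration of the toy scheme, not LZ77 itself: the function names `encode` and `decode`, the sample strings, and the greedy longest-match strategy are all assumptions made for this sketch. A matched fragment becomes a 1-based, inclusive `(start, end)` range into the reference text, and an unmatched character is kept as a literal.

```python
def encode(reference: str, message: str) -> list:
    """Toy encoder: replace fragments of `message` with index ranges into `reference`.

    Assumption: at each position we greedily take the longest match.
    """
    tokens = []
    pos = 0
    while pos < len(message):
        best_start, best_len = -1, 0
        # Try every start position in the reference and keep the longest match.
        for start in range(len(reference)):
            length = 0
            while (start + length < len(reference)
                   and pos + length < len(message)
                   and reference[start + length] == message[pos + length]):
                length += 1
            if length > best_len:
                best_start, best_len = start, length
        if best_len > 0:
            # 1-based inclusive range, mirroring the (1-3) style used in the text.
            tokens.append((best_start + 1, best_start + best_len))
            pos += best_len
        else:
            # No match anywhere in the reference: emit the character itself.
            tokens.append(message[pos])
            pos += 1
    return tokens


def decode(reference: str, tokens: list) -> str:
    """Rebuild the message from index ranges and literal characters."""
    parts = []
    for token in tokens:
        if isinstance(token, tuple):
            start, end = token
            parts.append(reference[start - 1:end])
        else:
            parts.append(token)
    return "".join(parts)


# Hypothetical sample strings, chosen only to keep the output short.
reference = "abcabd"
message = "abdxabc"
tokens = encode(reference, message)
print(tokens)  # [(4, 6), 'x', (1, 3)]
assert decode(reference, tokens) == message
```

Decoding needs only the reference text and the token list, which is why pointing back at earlier text loses no information.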
The example above shows how index values can stand in for repeated words to compress data. The core idea of the LZ77 algorithm is the same, although its actual compression process is somewhat more involved than this example.
2. Principle
This article mainly discusses how the LZ77 algorithm performs compression and decompression. For the mathematical proof that LZ77 is uniquely decodable and lossless (that is, decompression restores the original information without any loss), refer to the original paper [1].
Sliding window
To describe repeated structure, the LZ77 algorithm gives a more precise mathematical formulation. First, let $l(S)$ denote the length of a string $S$, and let $S(1,j)$, $1 \le j \le l(S)$, be a prefix of $S$. For a prefix $S(1,j)$ and a position $i \le j$, $L(i)$ is the largest value of $l$ (with $l \le l(S) - j$) that satisfies the following condition: