An Illustrated Explanation of LZ77 Compression Encoding and Its Python Implementation

Source: Internet
Author: User
LZ77 is a lossless compression algorithm published in 1977 by the Israeli computer scientists Abraham Lempel and Jacob Ziv. It is a typical dictionary-based compression algorithm, and many later compression techniques build on it. Given its importance in the field of data compression, this article explains its principles in detail with diagrams and source code.

Principles:

First, we will introduce several technical terms.

1. Lookahead buffer (the region waiting to be encoded):

The part of the input that has not been encoded yet.

2. Search buffer:

The part of the input that has already been encoded; the encoder searches it for matches.

3. Sliding window:

A window of a fixed size that consists of the search buffer (left) followed by the lookahead buffer (right).
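To make the three terms concrete, here is a minimal sketch of how a window could be cut out of an input string. The function name `split_window`, its parameters, and the default sizes (5 and 3, the values used in the implementation later in this article) are assumptions made for illustration.

```python
def split_window(data, split_index, search_size=5, lookahead_size=3):
    """Return (search_buffer, lookahead_buffer) around split_index.

    split_index is the position where the lookahead buffer starts.
    Everything before it has already been encoded.
    """
    # The search buffer cannot start before the beginning of the input.
    search_start = max(0, split_index - search_size)
    search_buffer = data[search_start:split_index]
    # Slicing past the end of a string is safe in Python, so no clamp is needed.
    lookahead_buffer = data[split_index:split_index + lookahead_size]
    return search_buffer, lookahead_buffer


print(split_window("AABCBBABC", 0))  # ('', 'AAB')
print(split_window("AABCBBABC", 6))  # ('ABCBB', 'ABC')
```

At the very start the search buffer is empty, which is why the first character of any input can never be matched.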

Next, we will introduce the specific encoding process:

To encode the lookahead buffer, the encoder searches the search buffer of the sliding window for a matching string. The distance from the start of the matched string to the start of the lookahead buffer is called the "offset", and the length of the matched string is called the "match length". The encoder keeps searching until it has found the longest match and then outputs (o, l), where o is the offset and l is the match length. The window then slides forward by l characters and encoding continues. If no match is found, the encoder outputs (0, 0, c), where c is the first character of the lookahead buffer, and the window slides forward by 1. For comparison, the classic formulation of the algorithm, which always emits a triple and shifts by length + 1, looks like this:

    while (lookAheadBuffer not empty) {
        get a pointer (position, match) to the longest match in the window for the lookAheadBuffer;
        output a (position, length, char());
        shift the window length+1 characters along;
    }
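The pseudocode above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the name `encode_triples` and its parameters are made up here, the buffer sizes default to the 5 and 3 used later in this article, and `str.rfind` is used so that the nearest match is preferred, which the pseudocode does not specify.

```python
def encode_triples(data, search_size=5, lookahead_size=3):
    """Encode data as a list of (offset, length, next_char) triples."""
    tokens = []
    pos = 0  # start of the lookahead buffer
    while pos < len(data):
        search = data[max(0, pos - search_size):pos]
        lookahead = data[pos:pos + lookahead_size]
        offset, length = 0, 0
        # Try ever longer prefixes of the lookahead buffer. The last
        # character is held back so that a literal always follows the match.
        for n in range(1, len(lookahead)):
            found = search.rfind(lookahead[:n])
            if found == -1:
                break
            offset, length = len(search) - found, n
        next_char = data[pos + length]
        tokens.append((offset, length, next_char))
        pos += length + 1  # shift the window length + 1 characters along
    return tokens


print(encode_triples("AABCBBABC"))
# [(0, 0, 'A'), (1, 1, 'B'), (0, 0, 'C'), (2, 1, 'B'), (5, 2, 'C')]
```

Because every triple carries a literal character, this variant needs no special case for "no match found"; the price is that a match can never cover the whole lookahead buffer.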

The main steps are as follows:

1. Set the encoding position to the start of the input stream.

2. Find the longest string in the search buffer that matches the beginning of the lookahead buffer.

3. If a match is found, output (offset, match length) and slide the window forward by the match length.

4. If no match is found, output (0, 0, first character of the lookahead buffer) and slide the window forward by one character.

5. If the lookahead buffer is not empty, go back to step 2.
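Step 2 is the heart of the algorithm. One way to picture it is a character-by-character comparison at every start position of the search buffer. The sketch below is a hypothetical helper, not the code used later in this article; the name `find_longest_match` is an assumption, and among equally long matches it keeps the leftmost one.

```python
def find_longest_match(search, lookahead):
    """Return (offset, length) of the longest match, or (0, 0) if none.

    The offset is counted backwards from the start of the lookahead
    buffer, so a match at search[0] has offset len(search).
    """
    best_offset, best_length = 0, 0
    for start in range(len(search)):
        length = 0
        # Extend the match while characters agree and both buffers last.
        while (length < len(lookahead)
               and start + length < len(search)
               and search[start + length] == lookahead[length]):
            length += 1
        if length > best_length:  # strict ">" keeps the leftmost match
            best_offset, best_length = len(search) - start, length
    return best_offset, best_length


print(find_longest_match("ABCBB", "ABC"))  # (5, 3)
print(find_longest_match("AABC", "BBA"))   # (2, 1)
print(find_longest_match("AAB", "C"))      # (0, 0)
```

Note that this sketch only matches inside the search buffer; it does not let a match run over into the lookahead buffer.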

That description is rather abstract, so let's walk through an example.

Example:

Suppose we want to encode the string "AABCBBABC".

At the beginning, the window slides into its starting position, with the search buffer still empty and the lookahead buffer at the start of the string.
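The movement of the window over "AABCBBABC" can also be followed with a short trace. The sketch below is an illustration only: the function `trace_encoding` is hypothetical, it assumes a search buffer of 5 and a lookahead buffer of 3 (the sizes used in the implementation below), and it uses `str.find`, so the leftmost occurrence in the search buffer determines the offset.

```python
def trace_encoding(data, search_size=5, lookahead_size=3):
    """Return one (search, lookahead, output) row per encoding step."""
    rows = []
    pos = 0  # boundary between search buffer and lookahead buffer
    while pos < len(data):
        search = data[max(0, pos - search_size):pos]
        lookahead = data[pos:pos + lookahead_size]
        offset, length = 0, 0
        for n in range(1, len(lookahead) + 1):
            found = search.find(lookahead[:n])  # leftmost occurrence
            if found == -1:
                break
            offset, length = len(search) - found, n
        if length == 0:
            output = "(0,0)" + data[pos]  # no match: emit the literal
            step = 1
        else:
            output = "(%d,%d)" % (offset, length)
            step = length
        rows.append((search, lookahead, output))
        pos += step
    return rows


for search, lookahead, output in trace_encoding("AABCBBABC"):
    print("%-5s | %-3s -> %s" % (search, lookahead, output))
# Outputs, in order: (0,0)A  (1,1)  (0,0)B  (0,0)C  (2,1)  (3,1)  (5,3)
```

Each printed row shows the search buffer on the left and the lookahead buffer on the right of the bar, followed by the token emitted for that step.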

Python code implementation:

class Lz77:
    def __init__(self, inputStr):
        self.inputStr = inputStr  # input stream
        self.searchSize = 5       # size of the search buffer (already encoded)
        self.aheadSize = 3        # size of the lookahead buffer (to be encoded)
        self.windSpiltIndex = 0   # index where the lookahead buffer starts
        self.move = 0             # distance the window slides next
        self.notFind = -1         # marker: no matching string found

    # Get the end index of the sliding window
    def getWinEndIndex(self):
        return self.windSpiltIndex + self.aheadSize

    # Get the start index of the sliding window
    def getWinStartIndex(self):
        return self.windSpiltIndex - self.searchSize

    # Determine whether the lookahead buffer is empty
    def isLookHeadEmpty(self):
        return self.windSpiltIndex + self.move > len(self.inputStr) - 1

    def encoding(self):
        step = 0
        print("Step   Position   Match   Output")
        while not self.isLookHeadEmpty():
            # 1. Slide the window
            self.winMove()
            # 2. Get the offset and length of the longest match
            (offset, matchLen) = self.findMaxMatch()
            # 3. Set the distance the window slides next
            self.setMoveSteps(matchLen)
            if matchLen == 0:
                # No match exists: output the next character as a literal
                nextChar = self.inputStr[self.windSpiltIndex]
                result = (step, self.windSpiltIndex, '-', '(0,0)' + nextChar)
            else:
                matchStart = self.windSpiltIndex - offset
                result = (step, self.windSpiltIndex,
                          self.inputStr[matchStart:matchStart + matchLen],
                          '(' + str(offset) + ',' + str(matchLen) + ')')
            # 4. Output the result
            self.output(result)
            step = step + 1  # step counter, used only for display

    # Slide the window (move the split point)
    def winMove(self):
        self.windSpiltIndex = self.windSpiltIndex + self.move

    # Find the longest match; return its offset relative to the split
    # point of the window and its length
    def findMaxMatch(self):
        matchLen = 0
        offset = 0
        minEdge = self.minEdge() + 1  # exclusive upper bound for i
        # Try ever longer prefixes of the lookahead buffer
        for i in range(self.windSpiltIndex + 1, minEdge):
            offsetTemp = self.searchBufferOffest(i)
            if offsetTemp == self.notFind:
                return (offset, matchLen)
            offset = offsetTemp      # offset of the current match
            matchLen = matchLen + 1  # each successful lookup adds 1
        return (offset, matchLen)

    # Check whether inputStr[windSpiltIndex:i] occurs in the search buffer.
    # If it does, return its offset from the split point; otherwise notFind.
    def searchBufferOffest(self, i):
        searchStart = self.getWinStartIndex()
        searchEnd = self.windSpiltIndex
        # Special cases at the start of the input
        if searchEnd < 1:
            return self.notFind
        if searchStart < 0:
            searchStart = 0
        searchStr = self.inputStr[searchStart:searchEnd]  # search buffer
        findIndex = searchStr.find(self.inputStr[self.windSpiltIndex:i])
        if findIndex == -1:
            return self.notFind
        return len(searchStr) - findIndex

    # Set the number of steps the window slides next
    def setMoveSteps(self, matchLen):
        if matchLen == 0:
            self.move = 1
        else:
            self.move = matchLen

    # Right edge of the lookahead buffer, clamped to the end of the input
    def minEdge(self):
        return min(len(self.inputStr), self.getWinEndIndex())

    def output(self, touple):
        print("%-7d%-11d%-8s%s" % touple)


if __name__ == "__main__":
    lz77 = Lz77("AABCBBABC")
    lz77.encoding()

I wrote this code quickly and did not think much about the details. Please note that it is not final code; it only serves to illustrate the principle and is for reference only. Its output corresponds to the example above. (Note that the code layout may be shifted because of the blog's fixed page style.)
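One way to sanity-check an encoder like this is to decode its output and compare the result with the input. The sketch below is not part of the original code. The function `decode` is hypothetical, and it assumes the tokens are given as Python tuples: `(offset, length)` for a match and `(0, 0, char)` for a literal, following the output convention described in this article.

```python
def decode(tokens):
    """Rebuild the original string from (offset, length) / (0, 0, char) tokens."""
    out = []
    for token in tokens:
        if len(token) == 3:
            # Literal token (0, 0, char): append the character itself.
            out.append(token[2])
        else:
            # Match token: copy `length` characters starting `offset`
            # positions back from the current end of the output.
            offset, length = token
            start = len(out) - offset
            out.extend(out[start:start + length])
    return "".join(out)


# Tokens for "AABCBBABC" with a search buffer of 5 and a lookahead of 3.
tokens = [(0, 0, "A"), (1, 1), (0, 0, "B"), (0, 0, "C"), (2, 1), (3, 1), (5, 3)]
print(decode(tokens))  # AABCBBABC
```

The slice copy is sufficient here because every match lies entirely inside the search buffer, so the offset is never smaller than the match length.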


