A detailed description of the LZ77 compression algorithm's encoding principles (with figures and simple code)

Source: Internet
Author: User

Preface

LZ77 is a lossless compression algorithm published in 1977 by the Israeli researchers Abraham Lempel and Jacob Ziv. It is a typical dictionary-based compression algorithm, and many later compression techniques build on it. Given its importance in the field of data compression, this article explains its principles in detail using figures and source code.

Principles

First, a few terms:

1. Lookahead buffer: the region of the input that is waiting to be encoded.
2. Search buffer: the region that has already been encoded and is searched for matches.
3. Sliding window: a window of a fixed size, made up of the search buffer (left) followed by the lookahead buffer (right).

The encoding process works as follows. To encode the contents of the lookahead buffer, the encoder searches the search buffer for a matching string. The distance from the start of the matching string to the start of the lookahead buffer is called the offset, and the length of the matching string is called the match length. The encoder keeps searching until it has found the longest match and then outputs (o, l), where o is the offset and l is the match length. The window then slides forward l characters and encoding continues. If no match is found, the encoder outputs (0, 0, c), where c is the first character of the lookahead buffer, and the window slides forward 1 character.

The algorithm is commonly written along these lines:

    while (lookAheadBuffer not empty) {
        get a pointer (position, length) to the longest match in the window for the lookAheadBuffer;
        output a (position, length, char) triple;
        shift the window length + 1 characters along;
    }

Note that this pseudocode attaches a next character to every token and shifts the window by length + 1. The steps and the example in this article use a simpler variant: a match is output as (offset, match length) and the window shifts by the match length, and only an unmatched character is output as (0, 0, c).

Main steps

1. Set the encoding position to the start of the input stream.
2. Find the longest string in the search buffer that matches the start of the lookahead buffer.
3. If a match is found, output (offset, match length) and slide the window forward by the match length.
4. If no match is found, output (0, 0, first character of the lookahead buffer) and slide the window forward by 1 character.
5. If the lookahead buffer is not empty, return to step 2.

This description is rather abstract, so let's walk through an example.

Example

We encode the string "AABCBBABC". The lookahead buffer holds 3 characters and the search buffer holds up to 5 characters, the sizes used in the code later in this article.

1. At the start, the lookahead buffer contains "AAB" and the search buffer is empty. Encoding begins with the first character, "A". Because the search buffer is empty, no match can be found. Output (0, 0, A) and shift the window 1 character to the right.
2. The search buffer is "A" and the lookahead buffer is "ABC". "A" is found in the search buffer. The lookahead buffer has not been exhausted, so the encoder tries "AB", but "AB" is not in the search buffer. Only "A" can be encoded. Output (1, 1): the match starts 1 character before the lookahead buffer and has length 1. Shift the window right by the match length, that is, 1 character.
3. The search buffer is "AA" and the lookahead buffer is "BCB". "B" is not found. Output (0, 0, B) and shift right 1 character.
4. The search buffer is "AAB" and the lookahead buffer is "CBB". "C" is not found. Output (0, 0, C) and shift right 1 character.
5. The search buffer is "AABC" and the lookahead buffer is "BBA". "B" is found but "BB" is not. Output (2, 1) and shift right 1 character.
6. The search buffer is "AABCB" and the lookahead buffer is "BAB". "B" is found but "BA" is not. Output (3, 1) and shift right 1 character. The offset is 3 because the search scans the buffer from the left and reports the first "B" it finds, even though a nearer "B" also exists.
7. The search buffer is "ABCBB" and the lookahead buffer is "ABC". "A" is found, so the encoder continues with "AB", which is also found, and then with "ABC", which is found as well. Going any further would exceed the lookahead buffer, so "ABC" is encoded. Output (5, 3), offset 5 and length 3, and shift right 3 characters.
8. The lookahead buffer is now empty, so encoding stops.
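The walk-through above can be checked with a few lines of Python. This is only a minimal sketch of the variant described in this article, not a reference implementation: the function name `lz77_encode` and the parameters `search_size` and `ahead_size` are hypothetical, and the defaults of 5 and 3 are the buffer sizes assumed in the example. It uses `str.find`, so when several matches exist it reports the leftmost one in the search buffer.

```python
def lz77_encode(data, search_size=5, ahead_size=3):
    """Encode data into (0, 0, char) and (offset, length) tokens."""
    tokens = []
    pos = 0  # index where the lookahead buffer starts
    while pos < len(data):
        # Search buffer: at most search_size characters before pos.
        search = data[max(0, pos - search_size):pos]
        best_offset, best_len = 0, 0
        # Never try a candidate longer than the lookahead buffer.
        max_len = min(ahead_size, len(data) - pos)
        for length in range(1, max_len + 1):
            found = search.find(data[pos:pos + length])
            if found == -1:
                break  # a longer candidate cannot match either
            best_offset, best_len = len(search) - found, length
        if best_len == 0:
            tokens.append((0, 0, data[pos]))  # no match: emit a literal
            pos += 1
        else:
            tokens.append((best_offset, best_len))
            pos += best_len
    return tokens


print(lz77_encode("AABCBBABC"))
# [(0, 0, 'A'), (1, 1), (0, 0, 'B'), (0, 0, 'C'), (2, 1), (3, 1), (5, 3)]
```

The printed tokens match the outputs of the steps in the example one for one.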
The final output is as follows:

    (0, 0, A) (1, 1) (0, 0, B) (0, 0, C) (2, 1) (3, 1) (5, 3)

Python code implementation

    class Lz77:
        def __init__(self, inputStr):
            self.inputStr = inputStr   # input stream
            self.searchSize = 5        # size of the search buffer (already encoded region)
            self.aheadSize = 3         # size of the lookahead buffer (region to be encoded)
            self.windSpiltIndex = 0    # index where the lookahead buffer starts
            self.move = 0              # how far the window slides on the next step
            self.notFind = -1          # marker: no matching string found

        # Get the end index of the sliding window
        def getWinEndIndex(self):
            return self.windSpiltIndex + self.aheadSize

        # Get the start index of the sliding window
        def getWinStartIndex(self):
            return self.windSpiltIndex - self.searchSize

        # Determine whether the lookahead buffer is empty
        def isLookHeadEmpty(self):
            return self.windSpiltIndex + self.move > len(self.inputStr) - 1

        def encoding(self):
            step = 0
            print("Step   Position   Match   Output")
            while not self.isLookHeadEmpty():
                # 1. Slide the window
                self.winMove()
                # 2. Get the offset and length of the longest match
                (offset, matchLen) = self.findMaxMatch()
                # 3. Set how far the window slides on the next step
                self.setMoveSteps(matchLen)
                if matchLen == 0:
                    # No match: output the next character to be encoded
                    nextChar = self.inputStr[self.windSpiltIndex]
                    result = (step, self.windSpiltIndex, '-', '(0,0)' + nextChar)
                else:
                    matchStart = self.windSpiltIndex - offset
                    result = (step, self.windSpiltIndex,
                              self.inputStr[matchStart:matchStart + matchLen],
                              '(' + str(offset) + ',' + str(matchLen) + ')')
                # 4. Output the result
                self.output(result)
                step = step + 1  # step counter, used only for display

        # Slide the window (move the boundary between the two buffers)
        def winMove(self):
            self.windSpiltIndex = self.windSpiltIndex + self.move

        # Find the longest match; return its offset relative to the
        # window boundary and its length
        def findMaxMatch(self):
            matchLen = 0
            offset = 0
            minEdge = self.minEdge() + 1  # exclusive upper bound for the loop
            # Grow the candidate one character at a time
            for i in range(self.windSpiltIndex + 1, minEdge):
                offsetTemp = self.searchBufferOffest(i)
                if offsetTemp == self.notFind:
                    return (offset, matchLen)
                offset = offsetTemp
                matchLen = matchLen + 1
            return (offset, matchLen)

        # Check whether inputStr[windSpiltIndex:i] occurs in the search buffer.
        # If it does, return its offset from the window boundary.
        def searchBufferOffest(self, i):
            searchStart = max(0, self.getWinStartIndex())
            searchEnd = self.windSpiltIndex
            # Special case at the very start: the search buffer is empty
            if searchEnd < 1:
                return self.notFind
            searchStr = self.inputStr[searchStart:searchEnd]  # search buffer contents
            findIndex = searchStr.find(self.inputStr[self.windSpiltIndex:i])
            if findIndex == -1:
                return self.notFind
            return len(searchStr) - findIndex

        # Set how far the window slides on the next step
        def setMoveSteps(self, matchLen):
            if matchLen == 0:
                self.move = 1
            else:
                self.move = matchLen

        # Right edge of the lookahead buffer, clipped to the end of the input
        # (so a match can never be longer than aheadSize)
        def minEdge(self):
            return min(len(self.inputStr), self.getWinEndIndex())

        def output(self, touple):
            print("%-7d%-11d%-8s%s" % touple)


    if __name__ == "__main__":
        lz77 = Lz77("AABCBBABC")
        lz77.encoding()
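The article covers only the encoder. One way to check that its output really describes the input is to decode the tokens again. The following is a hedged sketch under the token format used in this article, (0, 0, c) for a literal and (offset, length) for a match. The function name `lz77_decode` is hypothetical and is not part of the original code.

```python
def lz77_decode(tokens):
    """Rebuild the text from (0, 0, char) and (offset, length) tokens."""
    out = []
    for token in tokens:
        if len(token) == 3:
            # Literal token (0, 0, c): emit the character itself.
            out.append(token[2])
        else:
            offset, length = token
            start = len(out) - offset
            # Copy one character at a time from the text decoded so far.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


# Tokens taken from the worked example for "AABCBBABC".
tokens = [(0, 0, "A"), (1, 1), (0, 0, "B"), (0, 0, "C"), (2, 1), (3, 1), (5, 3)]
print(lz77_decode(tokens))  # AABCBBABC
```

Decoding needs no searching at all: each match token is just a copy from text that has already been reconstructed.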
