LZ77 Compression Algorithm Explained with Diagrams: Encoding Principle and Python Implementation

Source: Internet
Author: User


Introduction



The LZ77 algorithm is a lossless compression algorithm published by the Israeli researchers Jacob Ziv and Abraham Lempel in 1977. LZ77 is a typical dictionary-based compression algorithm, and many of today's compression techniques are based on it. Given its position in the field of data compression, this article explains its principles in detail using diagrams and source code.



How it works:

Let's start with a few technical terms.



1. Lookahead buffer (also called the "coding area" or "area to be encoded" in this article):

The region of the input waiting to be encoded.

2. Search buffer:

The region that has already been encoded; this is where matches are searched for.

3. Sliding window:

A window of fixed size, consisting of the search buffer (on the left) plus the lookahead buffer (on the right).
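To make the three regions concrete, here is a minimal sketch that splits the input at a given position into its search buffer and lookahead buffer. The helper name `window_parts` is hypothetical, and the default sizes (search buffer 5, lookahead buffer 3) are assumed to match the values used in the Python implementation at the end of this article.

```python
from typing import Tuple


def window_parts(data: str, split: int, search_size: int = 5, ahead_size: int = 3) -> Tuple[str, str]:
    """Hypothetical helper: return (search_buffer, lookahead_buffer) for a window split at `split`."""
    # Search buffer: up to `search_size` already-encoded characters left of the split point.
    search = data[max(0, split - search_size):split]
    # Lookahead buffer: up to `ahead_size` characters still waiting to be encoded.
    lookahead = data[split:split + ahead_size]
    return search, lookahead


if __name__ == "__main__":
    for split in (0, 1, 6):
        print(split, window_parts("AABCBBABC", split))
```

Near the start of the input the search buffer is simply shorter than its nominal size; at position 0 it is empty.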



Next, let's walk through the encoding process in detail:



To encode the lookahead buffer, the encoder searches the search buffer of the sliding window for a string that matches the beginning of the lookahead buffer. The distance from the start of the matching string to the start of the lookahead buffer is called the offset, and the length of the matching string is called the match length. The encoder keeps extending the match until it has found the longest matching string, then outputs (O, L), where O is the offset and L is the match length, and slides the window L positions before continuing. If no match is found, it outputs (0, 0, c), where c is the next character waiting to be encoded, and slides the window 1 position. The classic formulation of the algorithm, which always outputs a triple (offset, length, next character) and slides the window length + 1 positions, looks like this:


while (lookAheadBuffer not empty)
{
    get a pointer (position, length) to the longest match
        in the window for the lookAheadBuffer;
    output a (position, length, c), where c is the first character after the match;
    shift the window length+1 characters along;
}
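The pseudocode above can be turned into a small runnable Python sketch of the classic triple form. The function name `lz77_encode_triples`, the default window sizes, and the rule that a match is capped at `ahead_size - 1` characters (so the match plus its explicit next character fit in the lookahead buffer) are all assumptions for illustration. Like Python's `str.find`, it takes the leftmost match in the search buffer.

```python
from typing import List, Tuple


def lz77_encode_triples(data: str, search_size: int = 5, ahead_size: int = 3) -> List[Tuple[int, int, str]]:
    """Illustrative sketch (hypothetical name): classic LZ77 where every token is (offset, length, next_char)."""
    tokens: List[Tuple[int, int, str]] = []
    pos = 0
    while pos < len(data):  # while the lookahead buffer is not empty
        search_start = max(0, pos - search_size)
        best_off, best_len = 0, 0
        # Assumption: cap the match at ahead_size - 1 and always leave one character for next_char.
        max_len = min(ahead_size - 1, len(data) - pos - 1)
        for length in range(1, max_len + 1):
            # str.find(sub, start, end) only matches if sub lies entirely inside data[start:end],
            # so the match stays within the search buffer.
            idx = data.find(data[pos:pos + length], search_start, pos)
            if idx == -1:
                break
            best_off, best_len = pos - idx, length
        tokens.append((best_off, best_len, data[pos + best_len]))
        pos += best_len + 1  # shift the window length + 1 characters along
    return tokens


if __name__ == "__main__":
    print(lz77_encode_triples("AABCBBABC"))
```

For "AABCBBABC" with these sizes it produces (0,0,A) (1,1,B) (0,0,C) (2,1,B) (5,2,C). Because every token carries a literal character, this form needs no separate "no match" case.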


The main steps are:

1. Set the encoding position to the start of the input stream.
2. Find the longest matching string in the search buffer of the sliding window.
3. If a match is found, output (offset, match length) and slide the window forward by the match length.
4. If no match is found, output (0, 0, first character to be encoded) and slide the window forward by one position.
5. If the lookahead buffer is not empty, go back to step 2.
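The five steps can also be sketched as a compact Python function. This is an illustrative rewrite, not the author's code: the name `lz77_encode` and the token representation (a 2-tuple `(offset, length)` for a match, a 3-tuple `(0, 0, c)` for a literal) are assumptions. Like the author's implementation, it searches with `str.find`, so when a string occurs more than once in the search buffer the leftmost occurrence (largest offset) wins.

```python
from typing import List, Tuple, Union

# Assumed token format: (offset, length) for a match, (0, 0, char) for a literal.
Token = Union[Tuple[int, int], Tuple[int, int, str]]


def lz77_encode(data: str, search_size: int = 5, ahead_size: int = 3) -> List[Token]:
    tokens: List[Token] = []
    pos = 0  # step 1: start at the beginning of the input
    while pos < len(data):  # step 5: repeat while the lookahead buffer is not empty
        search_start = max(0, pos - search_size)
        offset, length = 0, 0
        # Step 2: grow the match one character at a time, staying inside the lookahead buffer.
        for n in range(1, min(ahead_size, len(data) - pos) + 1):
            idx = data.find(data[pos:pos + n], search_start, pos)  # leftmost match in the search buffer
            if idx == -1:
                break
            offset, length = pos - idx, n
        if length:
            tokens.append((offset, length))  # step 3: emit (offset, length), slide by length
            pos += length
        else:
            tokens.append((0, 0, data[pos]))  # step 4: emit a literal, slide by 1
            pos += 1
    return tokens


if __name__ == "__main__":
    print(lz77_encode("AABCBBABC"))
```

On "AABCBBABC" it returns [(0, 0, 'A'), (1, 1), (0, 0, 'B'), (0, 0, 'C'), (2, 1), (3, 1), (5, 3)], the same sequence the worked example below arrives at.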



That description is rather abstract, so let's walk through an example.

Example:

Let's encode the string "AABCBBABC".



At first, the window is positioned at the start of the input.






As the figure shows, the lookahead buffer holds the three characters "AAB", and the search buffer is still empty. When encoding the first character, no match can be found because the search buffer is empty, so the encoder outputs (0,0,A) and slides the window one position to the right.






Now the lookahead buffer holds "ABC". Encoding starts with "A", which is found in the search buffer. Since the match has not yet reached the end of the lookahead buffer, the encoder tries to extend it to "AB", but "AB" does not occur in the search buffer. Therefore only "A" can be encoded as a match.

The output is (1,1): the match starts one position before the lookahead buffer, and its length is 1. The window then slides right by the match length, i.e. 1 position.






Likewise, "B" is not found in the search buffer, so the output is (0,0,B) and the window moves 1 position to the right.






"C" is not found either, so the output is (0,0,C) and the window moves 1 position to the right.






"B" is found two positions back, but "BB" is not, so the output is (2,1) and the window moves 1 position to the right.






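"B" is found again (its leftmost occurrence in the search buffer is three positions back), but "BA" is not, so the output is (3,1) and the window moves 1 position to the right.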
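At this step "B" actually occurs twice in the search buffer "AABCB": three positions back and one position back. The snippet below lists both candidate offsets; picking the leftmost occurrence, as Python's `str.find` does (and as the author's code relies on), gives offset 3. Choosing the nearest occurrence (offset 1) would decode just as well; it is simply a different tie-breaking rule. The variable names are illustrative only.

```python
data = "AABCBBABC"
split = 5                         # start of the lookahead buffer at this step
search_start = max(0, split - 5)  # search buffer size 5, as in the example
target = data[split]              # "B"

# Every position in the search buffer where the target occurs, expressed as an offset.
offsets = [split - i for i in range(search_start, split) if data[i] == target]
print(offsets)  # [3, 1]

# str.find returns the lowest (leftmost) index, i.e. the largest offset.
print(split - data.find(target, search_start, split))  # 3
```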






Encoding starts with "A", which is found in the search buffer. Since the end of the lookahead buffer has not been reached, the encoder extends the match to "AB", which is also found, and then to "ABC", which is found as well. Extending further would go past the lookahead buffer (and the end of the input), so only "ABC" is encoded: the output is (5,3), i.e. offset 5 and length 3. The window then moves 3 positions to the right.
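The growing match in this step can be traced directly. This sketch (variable names are illustrative) extends the candidate one character at a time and records the offset at which each prefix of the lookahead buffer is found in the search buffer "ABCBB".

```python
data = "AABCBBABC"
split = 6                               # start of the lookahead buffer at this step
search = data[max(0, split - 5):split]  # search buffer: "ABCBB"
lookahead = data[split:split + 3]       # lookahead buffer: "ABC"

# Extend the candidate until it is no longer found or the lookahead buffer runs out.
steps = []
for n in range(1, len(lookahead) + 1):
    idx = search.find(lookahead[:n])
    if idx == -1:
        break
    steps.append((lookahead[:n], len(search) - idx))  # (matched string, offset)
print(steps)  # [('A', 5), ('AB', 5), ('ABC', 5)]
```

The loop ends because the three-character lookahead buffer is exhausted, not because a longer match failed, which is why the token is (5,3).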






At this point the lookahead buffer is empty, so encoding stops.

The resulting output is:

(0,0,A) (1,1) (0,0,B) (0,0,C) (2,1) (3,1) (5,3)
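As a check that this encoding is lossless, here is a minimal decoder sketch for the token sequence (0,0,A) (1,1) (0,0,B) (0,0,C) (2,1) (3,1) (5,3). The decoder is not part of the original walk-through; the name `lz77_decode` and the tuple representation (a 2-tuple `(offset, length)` for a match, a 3-tuple `(0, 0, c)` for a literal) are assumptions for illustration.

```python
from typing import List, Tuple, Union

# Assumed token format: (offset, length) for a match, (0, 0, char) for a literal.
Token = Union[Tuple[int, int], Tuple[int, int, str]]


def lz77_decode(tokens: List[Token]) -> str:
    out: List[str] = []
    for token in tokens:
        if len(token) == 3:
            # (0, 0, c): a literal character that had no match
            out.append(token[2])
        else:
            offset, length = token
            # Copy `length` characters starting `offset` positions back from the current end.
            # Copying one character at a time also handles matches that overlap the output.
            start = len(out) - offset
            for k in range(length):
                out.append(out[start + k])
    return "".join(out)


if __name__ == "__main__":
    tokens = [(0, 0, "A"), (1, 1), (0, 0, "B"), (0, 0, "C"), (2, 1), (3, 1), (5, 3)]
    print(lz77_decode(tokens))  # AABCBBABC
```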






Python implementation:


class Lz77:
    def __init__(self, inputStr):
        self.inputStr = inputStr  # input stream
        self.searchSize = 5       # search buffer (already-encoded area) size
        self.aheadSize = 3        # lookahead buffer (area to be encoded) size
        self.windSpiltIndex = 0   # index where the lookahead buffer starts (window split point)
        self.move = 0             # how far the window slides on the next step
        self.notFind = -1         # sentinel: no matching string found

    # Get the end index (exclusive) of the sliding window
    def getWinEndIndex(self):
        return self.windSpiltIndex + self.aheadSize

    # Get the start index of the sliding window (negative near the start of the input)
    def getWinStartIndex(self):
        return self.windSpiltIndex - self.searchSize

    # The lookahead buffer is empty once the next slide would pass the last character
    def isLookHeadEmpty(self):
        return self.windSpiltIndex + self.move > len(self.inputStr) - 1

    def encoding(self):
        step = 0
        print("Step Position Match Output")
        while not self.isLookHeadEmpty():
            # 1. Slide the window
            self.winMove()
            # 2. Get the offset and length of the longest matching string
            (offset, matchLen) = self.findMaxMatch()
            # 3. Set the distance the window must slide next
            self.setMoveSteps(matchLen)
            if matchLen == 0:
                # No match: output the next character to be encoded
                nextChar = self.inputStr[self.windSpiltIndex]
                result = (step, self.windSpiltIndex, '-', '(0,0)' + nextChar)
            else:
                start = self.windSpiltIndex - offset
                result = (step, self.windSpiltIndex, self.inputStr[start:start + matchLen],
                          '(' + str(offset) + ',' + str(matchLen) + ')')
            # 4. Output the result
            self.output(result)
            step = step + 1  # used only to number the steps

    # Slide the window (move the split point)
    def winMove(self):
        self.windSpiltIndex = self.windSpiltIndex + self.move

    # Find the longest match; return its offset from the split point and its length
    def findMaxMatch(self):
        matchLen = 0
        offset = 0
        minEdge = self.minEdge() + 1  # one past the right edge of the lookahead buffer, for range()
        # Try ever longer prefixes of the lookahead buffer until one is not found
        for i in range(self.windSpiltIndex + 1, minEdge):
            offsetTemp = self.searchBufferOffest(i)
            if offsetTemp == self.notFind:
                return (offset, matchLen)
            offset = offsetTemp      # offset value
            matchLen = matchLen + 1  # each successful match extends the length by 1
        return (offset, matchLen)

    # Check whether inputStr[windSpiltIndex:i] occurs in the search buffer; if it does, return the
    # offset (distance back from the split point to the start of its leftmost occurrence)
    def searchBufferOffest(self, i):
        searchStart = max(0, self.getWinStartIndex())
        searchEnd = self.windSpiltIndex
        # At the very start of the input the search buffer is empty
        if searchEnd < 1:
            return self.notFind
        searchStr = self.inputStr[searchStart:searchEnd]  # search buffer contents
        findIndex = searchStr.find(self.inputStr[self.windSpiltIndex:i])
        if findIndex == -1:
            return self.notFind
        return len(searchStr) - findIndex

    # Set how many positions the window slides next
    def setMoveSteps(self, matchLen):
        if matchLen == 0:
            self.move = 1
        else:
            self.move = matchLen

    # Right edge (exclusive) of the lookahead buffer, clipped to the end of the input
    # (the original returned getWinEndIndex() + 1 here, allowing matches one character too long)
    def minEdge(self):
        return min(len(self.inputStr), self.getWinEndIndex())

    def output(self, touple):
        print("%d %d %s %s" % touple)


if __name__ == "__main__":
    lz77 = Lz77("AABCBBABC")
    lz77.encoding()


This is only a quick sketch that does not handle every detail. It is not final code and is meant only to illustrate the principle, so treat it as a reference. It prints one line per step (step number, position, matched string, and output token), and the output column matches the result of the worked example above.










