In the previous article, Levenshtein Distance Algorithm Implementation, I explained the general minimum edit distance algorithm. It uses dynamic programming: the time complexity is O(m * n), where m and n are the lengths of the two strings, and the space complexity is also O(m * n). If int is used as the matrix element type, the matrix occupies sizeof(int) * m * n bytes. For two strings of 10,000 characters each, that is about 400 MB (with a 4-byte int), which is considerable.

Based on an implementation of a fast and efficient Levenshtein algorithm, I re-implemented the Levenshtein distance algorithm. The main idea is to replace the matrix with two column vectors, so that only the current state and the state from the previous step are kept at any time. The trade-off is that, once the algorithm finishes, the minimum edit distances between sub-sequences (prefixes) of the two strings, which the full matrix would hold, can no longer be obtained. The algorithm is implemented in Python. The code is as follows:
    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    __author__ = 'xanxus'

    s1, s2 = input('String 1: '), input('String 2: ')
    m, n = len(s1), len(s2)

    # v1 holds the previous row of the distance matrix, v2 the current row.
    v1 = list(range(n + 1))
    v2 = list(range(n + 1))

    for i in range(1, m + 1):
        v2[0] = i  # distance from s1[:i] to the empty string
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            min_value = v1[j] + 1                # deletion
            if min_value > v2[j - 1] + 1:        # insertion
                min_value = v2[j - 1] + 1
            if min_value > v1[j - 1] + cost:     # substitution
                min_value = v1[j - 1] + cost
            v2[j] = min_value
        # The current row becomes the previous row for the next iteration.
        for j in range(n + 1):
            v1[j] = v2[j]

    print(v2[n])
Because far less memory is allocated (two vectors of n + 1 elements instead of a full matrix), the algorithm can run faster in practice even though its time complexity remains O(m * n).
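The same two-vector idea can also be wrapped in a reusable function. The following is only a minimal sketch, assuming Python 3. The function name levenshtein and the variable names previous and current are hypothetical and do not come from the original code. It differs from the script in two ways that are my own assumptions rather than the author's method: the two rows are swapped by reference instead of being copied element by element, and the rows follow the shorter string, so the extra memory is proportional to min(m, n).

```python
def levenshtein(s1, s2):
    """Return the edit distance between s1 and s2 using only two rows."""
    # Assumption: the distance is symmetric, so the strings can be swapped
    # to make the rows follow the shorter one.
    if len(s1) < len(s2):
        s1, s2 = s2, s1

    # previous[j] is the distance from the empty string to s2[:j].
    previous = list(range(len(s2) + 1))
    current = [0] * (len(s2) + 1)

    for i, c1 in enumerate(s1, start=1):
        current[0] = i  # distance from s1[:i] to the empty string
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            current[j] = min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            )
        # Swap the references instead of copying every element.
        previous, current = current, previous

    # After the final swap, the last computed row is in `previous`.
    return previous[len(s2)]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Swapping the references avoids the inner copy loop on every row, which removes one pass of n + 1 assignments per character of the longer string. The result is the same as with the copying version.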