Implementation of the Optimized Levenshtein Distance Algorithm

Source: Internet
Author: User

In the previous article, Levenshtein Distance Algorithm Implementation, the author explained the general algorithm for computing the minimum edit distance. That algorithm uses dynamic programming. Its time complexity is O(m * n), where m and n are the lengths of the two strings, and its space complexity is also O(m * n). If int is used as the matrix element type, the matrix occupies sizeof(int) * m * n bytes. For two strings of 10,000 characters each and a 4-byte int, that is about 400 MB, which is considerable. Based on an implementation of a fast and efficient Levenshtein algorithm, the author re-implemented the Levenshtein distance algorithm. The main idea is to replace the matrix with two vectors, so that only the current state and the previous state are kept at each step. The trade-off is that after the algorithm ends, the minimum edit distances between sub-sequences of the two strings, which the full matrix retains, can no longer be obtained. The algorithm is implemented in Python. The code is as follows:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'xanxus'

s1, s2 = input('string 1: '), input('string 2: ')
m, n = len(s1), len(s2)

# v1 holds the previous row of the distance matrix, v2 the row being computed.
v1 = list(range(n + 1))
v2 = list(range(n + 1))

for i in range(1, m + 1):
    # Distance from the first i characters of s1 to the empty string.
    v2[0] = i
    for j in range(1, n + 1):
        cost = 0 if s1[i - 1] == s2[j - 1] else 1
        v2[j] = min(v1[j] + 1,         # deletion
                    v2[j - 1] + 1,     # insertion
                    v1[j - 1] + cost)  # substitution or match
    # The current row becomes the previous row for the next iteration.
    v1[:] = v2

print(v2[n])

Because memory use drops from a full matrix to two vectors of length n + 1, the algorithm can run more efficiently in practice even though its time complexity remains O(m * n).
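
To make the comparison concrete, here is a minimal sketch that wraps both approaches as functions and checks that they agree. The function names levenshtein_full_matrix and levenshtein_two_rows are illustrative and do not come from the original article, and the sketch assumes Python 3.

```python
def levenshtein_full_matrix(s1, s2):
    """Reference version: keeps the whole (m + 1) x (n + 1) matrix."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of the s1 prefix
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of the s2 prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]


def levenshtein_two_rows(s1, s2):
    """Two-vector version: keeps only the previous and current rows."""
    m, n = len(s1), len(s2)
    prev = list(range(n + 1))  # row 0 of the matrix
    curr = [0] * (n + 1)
    for i in range(1, m + 1):
        curr[0] = i  # first column of row i
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            curr[j] = min(prev[j] + 1,
                          curr[j - 1] + 1,
                          prev[j - 1] + cost)
        # Swap instead of copying; the old previous row is overwritten next time.
        prev, curr = curr, prev
    return prev[n]


if __name__ == "__main__":
    for a, b in [("kitten", "sitting"), ("flaw", "lawn"), ("", "abc")]:
        print(a, b, levenshtein_two_rows(a, b), levenshtein_full_matrix(a, b))
```

Swapping the two list references at the end of each row avoids the element-by-element copy, which is one possible design choice rather than a requirement of the method. Both functions return only the final distance; the full-matrix version is the one to use if the intermediate distances are needed afterwards.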
