The Poor Man's Semantic Processing Toolbox, Part 2: Semantic Edit Distance

/* Copyright notice: This article may be reprinted freely; when reprinting, please credit the original source and the author. */

Author: Zhang Junlin


The semantic edit distance was actually worked out last year together with the semantic Jaccard, and the main body of this article was also written last year. The reason it is only appearing now is that my backlog of articles has almost run dry; otherwise it might have been a long time before it came out. I usually keep a few finished articles in reserve so that I can post weekly and still have something on hand when I lack the energy to write. Over the past month my motivation to write has dropped sharply, so I have written nothing new and have had to fall back on that reserve. Clearly, persevering with something laborious that brings no obvious benefit is not easy, but this year I will try to post every Monday and treat it as a form of self-discipline.


Why do we call this the poor man's semantic processing toolbox? The reason was explained at the beginning of the article "The Poor Man's Semantic Processing Toolbox, Part 1: Semantic Jaccard", so we will not repeat it here and will go straight to the subject.


| Edit Distance


Edit distance was proposed by the Russian scientist Vladimir Levenshtein, so it is also called the Levenshtein distance. It is a very common metric for measuring how similar two strings are. It is defined as the minimum number of edits required to transform one string into another, where an "edit" is typically one of three operations: inserting a character, deleting a character, or replacing one character with another. Assuming each operation costs 1, the edit distance N is the smallest number of such operations that transforms the first string into the second.


For example, suppose we have two strings, edit and red. We can convert edit into red as follows:

Step 1: edit -> redit (insert r)

Step 2: redit -> redt (delete i)

Step 3: redt -> red (delete t)


That is three operations in total. Assuming every operation has the same cost of 1, and since no shorter sequence of edits turns edit into red, the edit distance between edit and red is 3.
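
To make the three operations concrete, the steps above can be replayed in a few lines of Python. This is only a minimal sketch: the function name `apply_edits` and the tuple format used for each edit are assumptions made for illustration and do not come from the original article.

```python
def apply_edits(source, edits):
    """Apply a list of single-character edits to `source`, in order.

    Each edit is a tuple (hypothetical format chosen for this sketch):
      ("insert", position, char)  -> insert char before position
      ("delete", position)        -> delete the character at position
      ("replace", position, char) -> overwrite the character at position
    Positions are 0-based and refer to the string as it stands at that step.
    """
    chars = list(source)
    for edit in edits:
        op, pos = edit[0], edit[1]
        if op == "insert":
            chars.insert(pos, edit[2])
        elif op == "delete":
            del chars[pos]
        elif op == "replace":
            chars[pos] = edit[2]
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return "".join(chars)


# The three steps from the example: edit -> redit -> redt -> red
steps = [("insert", 0, "r"), ("delete", 3), ("delete", 3)]
result = apply_edits("edit", steps)
cost = len(steps)  # every operation costs 1
```

With unit costs, the cost of an edit sequence is simply its length, so this particular sequence costs 3.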


Computing the edit distance is a classic dynamic programming problem. Given two strings A and B to compare, we define:


EditDistance(i, j): the edit distance between the length-i prefix of string A (that is, the substring from the 1st to the i-th character of A) and the length-j prefix of string B (that is, the substring from the 1st to the j-th character of B).

The edit distance can then be defined recursively as follows:
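
Under the unit-cost assumption used above, one standard way to write the Levenshtein recurrence is the following, where $a_i$ and $b_j$ denote the i-th character of A and the j-th character of B (notation assumed here for illustration):

$$
\mathrm{EditDistance}(i, j) =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
\min \begin{cases}
\mathrm{EditDistance}(i-1, j) + 1 \\
\mathrm{EditDistance}(i, j-1) + 1 \\
\mathrm{EditDistance}(i-1, j-1) + [a_i \neq b_j]
\end{cases} & \text{otherwise}
\end{cases}
$$

Here $[a_i \neq b_j]$ is 1 when the two characters differ and 0 when they match. The three terms inside the minimum correspond to deleting $a_i$, inserting $b_j$, and replacing $a_i$ with $b_j$ (or keeping it at no cost when the characters are equal).

A minimal Python sketch of this recurrence, filling the table bottom-up, might look like the following. The function name `edit_distance` and the variable names are illustrative assumptions, not taken from the original article.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b with unit costs."""
    m, n = len(a), len(b)
    # dist[i][j] holds EditDistance(i, j): the distance between the
    # length-i prefix of a and the length-j prefix of b.
    dist = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases: turning a prefix into the empty string (or back)
    # takes one deletion (or insertion) per character.
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Strings are 0-indexed, so the i-th character is a[i - 1].
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # delete a[i - 1]
                dist[i][j - 1] + 1,                 # insert b[j - 1]
                dist[i - 1][j - 1] + substitution,  # replace or keep
            )
    return dist[m][n]
```

For the earlier example, `edit_distance("edit", "red")` evaluates to 3. The table takes time and space proportional to len(A) × len(B).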
