String functions: can PHP's levenshtein function be explained simply?

Source: Internet
Author: User

levenshtein("Hello World","ello World");

The second argument only needs an 'H' added, which takes a single step, so of course 1 is returned.
The function seems simple enough, but:

levenshtein("Hello World","ello World",10,20,30);

3rd parameter: the cost of inserting a character. The default value is 1.
4th parameter: the cost of replacing a character. The default value is 1.
5th parameter: the cost of deleting a character. The default value is 1.
What do these mean?
In this example they are set to 10, 20, and 30 respectively,
and 30 is returned. I don't understand why!
What is meant by "cost"?
What do 10, 20, and 30 each mean?

levenshtein('aaa','aab',0,1,0);

In this example only one replacement is needed. Why is the returned number of steps 0?

Reply content:

@Yi Hongyu has already covered a lot, so I'd like to add to the answer from the point of view of the underlying implementation. For levenshtein('aaa','aab',0,1,0); why is 0 returned?

The algorithm PHP uses internally is the classic matrix method (slightly modified). s1 and s2 form the rows (i in [0,m]) and columns (j in [0,n]) of a matrix, and each position is compared in turn. If the two characters are equal, cost=0 (no operation is needed); otherwise cost=1 (this cost=1 is the default operation cost when the last three parameters are not passed).

The value of M[i,j] cannot simply equal the cost, because it has to carry forward the effect of the earlier operations (for example, if a character was inserted earlier, every later character shifts by one position). M[i,j] is therefore the smallest of the three values M[i-1,j]+1, M[i,j-1]+1, and M[i-1,j-1]+cost, which are the costs of reaching this cell by deletion, insertion, and replacement respectively; taking the minimum picks the cheapest operation. This continues until M[m,n] has been computed, and that value is the "edit distance" we want. (At the end I have posted a PHP implementation corresponding to PHP's underlying C code.)
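The matrix method described above can be sketched in PHP roughly as follows. This is only an illustration, not PHP's actual source: levenshtein_matrix is a hypothetical helper name, and it keeps the whole matrix in memory so the cells can be inspected, whereas the built-in function only needs two rows.

```php
<?php
// Hypothetical helper (not part of PHP): builds the full cost matrix for
// turning $s1 into $s2. Rows are indexed by $s1, columns by $s2.
function levenshtein_matrix(string $s1, string $s2, int $ins = 1, int $rep = 1, int $del = 1): array
{
    $m = strlen($s1);
    $n = strlen($s2);
    $M = [];

    // Border cells: building the first j characters of $s2 from nothing
    // takes j insertions; reducing the first i characters of $s1 to
    // nothing takes i deletions.
    for ($j = 0; $j <= $n; $j++) {
        $M[0][$j] = $j * $ins;
    }
    for ($i = 1; $i <= $m; $i++) {
        $M[$i][0] = $i * $del;
    }

    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            // Equal characters cost nothing; otherwise pay for a replacement.
            $cost = ($s1[$i - 1] === $s2[$j - 1]) ? 0 : $rep;
            $M[$i][$j] = min(
                $M[$i - 1][$j] + $del,      // delete a character of $s1
                $M[$i][$j - 1] + $ins,      // insert a character of $s2
                $M[$i - 1][$j - 1] + $cost  // keep or replace
            );
        }
    }
    return $M;
}

// Print the matrix for the default costs; the last cell is the distance.
foreach (levenshtein_matrix('aaa', 'aab') as $row) {
    echo implode(' ', $row), "\n";
}
```

With the default costs, the bottom-right cell for 'aaa' and 'aab' comes out as 1, matching what the built-in levenshtein returns.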

levenshtein('aaa','aab',1,1,1); cell [3,3] is the final result:

      a  a  a
   0  1  2  3
a  1  0  1  2
a  2  1  0  1
b  3  2  1  1

levenshtein('aaa','aab',0,1,0); cell [3,3] is again the final result. Because the insertion and deletion costs are both set to 0, the border cells (multiples of those costs) are all 0, every M[i,j-1] + insertion cost candidate is 0, and since 0 is the minimum it carries through to the last cell, so 0 is returned:

      a  a  a
   0  0  0  0
a  0  0  0  0
a  0  0  0  0
b  0  0  0  0

This shows that the three extra parameters passed to levenshtein are the costs of the insert, replace, and delete operations; that is, they take the place of the 1 that is added in the M[i,j] formula above. From the algorithm's point of view, the smallest sensible cost for any operation is 1. To get a "reasonable" return value, 0 should not be passed; if it is, the result is unreasonable, or at least has no practical reference value. Of course, this depends closely on the algorithm actually used. In this example the expected operation is a replacement (the last a of aaa becomes b), but PHP's algorithm takes the minimum of the three candidates at each cell and carries it forward, so the final result is 0.
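As a quick check of this reasoning, here is a small sketch that calls the built-in levenshtein on the same pair of strings with different costs. The arguments after the two strings are, in order, the insertion, replacement, and deletion costs.

```php
<?php
// Turning 'aaa' into 'aab' can be done with one replacement,
// or with one deletion plus one insertion.
var_dump(levenshtein('aaa', 'aab'));            // int(1): one replacement at the default cost
var_dump(levenshtein('aaa', 'aab', 0, 1, 0));   // int(0): delete + insert is free, so it wins
var_dump(levenshtein('aaa', 'aab', 1, 10, 1));  // int(2): delete + insert (1 + 1) beats replace (10)
var_dump(levenshtein('aaa', 'aab', 5, 1, 5));   // int(1): replace (1) beats delete + insert (5 + 5)
```

In each case the function returns whichever combination of operations has the lowest total cost, which is why zero-cost insertions and deletions make the replacement cost irrelevant.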

# PHP implementation corresponding to PHP's underlying C code
function levenshtein_php($s1, $s2, $cost_ins = 1, $cost_rep = 1, $cost_del = 1)
{
    $l1 = strlen($s1);
    $l2 = strlen($s2);
    // An empty string is reached purely by insertions or deletions.
    if ($l1 == 0) {
        return $l2 * $cost_ins;
    }
    if ($l2 == 0) {
        return $l1 * $cost_del;
    }
    // $p1 is the previous matrix row, $p2 the row being filled in.
    $p1 = array();
    $p2 = array();
    for ($i2 = 0; $i2 <= $l2; $i2++) {
        $p1[$i2] = $i2 * $cost_ins;
    }
    for ($i1 = 0; $i1 < $l1; $i1++) {
        $p2[0] = $p1[0] + $cost_del;
        for ($i2 = 0; $i2 < $l2; $i2++) {
            // Replacement (or no cost when the characters match).
            $c0 = $p1[$i2] + (($s1[$i1] == $s2[$i2]) ? 0 : $cost_rep);
            // Deletion.
            $c1 = $p1[$i2 + 1] + $cost_del;
            if ($c1 < $c0) {
                $c0 = $c1;
            }
            // Insertion.
            $c2 = $p2[$i2] + $cost_ins;
            if ($c2 < $c0) {
                $c0 = $c2;
            }
            $p2[$i2 + 1] = $c0;
        }
        // Swap the rows for the next iteration.
        $tmp = $p1;
        $p1 = $p2;
        $p2 = $tmp;
    }
    $c0 = $p1[$l2];
    return $c0;
}

echo levenshtein_php('aaa', 'aab', 1, 10, 1);

PS: there are in fact better algorithms than the matrix approach, but they are outside the scope of this question. My algorithms background is limited, so this is only an attempt at an analysis. Corrections are welcome!

In general, it measures how similar two strings are: the fewer steps it takes to turn one string into the other, the more similar they are.

$a = "levenshtein";
$b = "levenjdslkfjslkdjfklsjdfljsdlfjsldfjlsdjflsdjltein";
$c = "leveshetin";
$r = levenshtein($a, $b); //int(40)
$s = levenshtein($a, $c); //int(3)

Changing $a into $b takes 40 edits, nearly all of them characters added in the middle. Changing $a into $c takes, for example, 2 deletions and 1 insertion, so the result is 3.
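One common way to use this similarity measure is to pick the closest match from a list of known words. The following is a minimal sketch of that idea; closest_word and the candidate list are made up for illustration and are not part of PHP.

```php
<?php
// Hypothetical helper: returns the candidate with the smallest edit
// distance to $input, or null when there are no candidates.
function closest_word(string $input, array $candidates): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($candidates as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// The misspelling is 3 edits away from 'levenshtein' and much further
// from the other (made-up) candidates.
echo closest_word('leveshetin', ['levenshtein', 'hamming', 'jaccard']), "\n"; // levenshtein
```

Ties are resolved in favour of the earlier candidate here, which is an arbitrary choice for the sketch.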

The so-called cost is the weight given to a specific operation. For example, if you set the cost of deleting a character to 30, then one deletion returns 1*30. By setting these parameters you can make the function prefer other operations, even if more of them are needed, in order to avoid a specific one. As for the last example, my personal understanding is this: a replacement is really a combination of two steps, a "delete" and an "add". If you set the add and delete costs to 0, those two steps are free, so the function uses them instead of the replacement and returns 0. If either the add or the delete cost is non-zero, 1 is returned in this example. Of course, this is just my personal view; if something is wrong with it, please correct me.
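To make the weights concrete, here is a short sketch using the costs from the question (insertion 10, replacement 20, deletion 30). The function measures the cost of turning the first string into the second, so swapping the arguments changes which cost applies.

```php
<?php
// 'Hello World' -> 'ello World': the 'H' has to be deleted, so the deletion cost applies.
var_dump(levenshtein('Hello World', 'ello World', 10, 20, 30)); // int(30)

// 'ello World' -> 'Hello World': the 'H' has to be inserted, so the insertion cost applies.
var_dump(levenshtein('ello World', 'Hello World', 10, 20, 30)); // int(10)

// 'a' -> 'b': one replacement (20) is cheaper than delete + insert (30 + 10).
var_dump(levenshtein('a', 'b', 10, 20, 30)); // int(20)
```

This is why the question's example returns 30: one deletion, weighted at 30.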
