Function code and techniques for calculating string similarity in PHP

Source: Internet
Author: User

similar_text: calculates the similarity of two strings
int similar_text(string $first, string $second [, float &$percent])
$first Required. The first string to compare.
$second Required. The second string to compare.
$percent Optional. A variable, passed by reference, in which the similarity percentage is stored.

The similarity of the two strings is computed as described by Oliver [1993]. Note that the implementation does not use a stack as in Oliver's pseudocode; it makes recursive calls instead, which may or may not speed up the whole process. Also note that the complexity of the algorithm is O(N**3), where N is the length of the longest string.
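The recursive idea described above can be sketched in a few lines. This is a simplified reading of the description, not PHP's actual C implementation: the helper name sketch_similar_text is hypothetical, and the sketch assumes the algorithm takes the first longest common substring and then recurses on the pieces to its left and right.

```php
<?php
// Hypothetical helper (not part of PHP): counts matching bytes using
// plain recursion, following the description above.
function sketch_similar_text(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    if ($lenA === 0 || $lenB === 0) {
        return 0;
    }

    // Find the first longest common substring by brute force.
    $best = 0;
    $posA = 0;
    $posB = 0;
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $best) {
                $best = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }
    if ($best === 0) {
        return 0;
    }

    // Recurse on the parts to the left and to the right of that substring.
    return $best
        + sketch_similar_text((string) substr($a, 0, $posA), (string) substr($b, 0, $posB))
        + sketch_similar_text((string) substr($a, $posA + $best), (string) substr($b, $posB + $best));
}

echo sketch_similar_text("ABCDEFG", "AEG"), "\n"; // 3
```

The three nested loops per call, repeated across the recursion, are where the cubic worst case comes from.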

For example, to find the similarity between the strings ABCDEFG and AEG:

The code is as follows:

$first = "ABCDEFG";
$second = "AEG";
echo similar_text($first, $second);

The output is 3. To get the similarity as a percentage, use the third parameter, as follows:

$first = "ABCDEFG";
$second = "AEG";
similar_text($first, $second, $percent);
echo $percent; // 60
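How the percentage relates to the return value can be checked with a short sketch. It assumes the percentage is twice the match count, times 100, divided by the combined byte length of both strings; the variables $matches and $expected are introduced here only for illustration.

```php
<?php
$first  = "ABCDEFG";
$second = "AEG";

// The return value is the match count; $percent is filled in by reference.
$matches = similar_text($first, $second, $percent);

// Assumed formula: 2 * matches * 100 / (length of first + length of second).
$expected = $matches * 2 * 100 / (strlen($first) + strlen($second));

echo $matches, "\n";  // 3
echo $percent, "\n";  // 60
echo $expected, "\n"; // 60
```

Under this assumption a small return value is not by itself a sign of low similarity, because the percentage also depends on the lengths of both strings.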


Usage and implementation of the similar_text() function: similar_text() is primarily used to calculate the number of matching characters in two strings, and it can also calculate their similarity as a percentage. The levenshtein() function introduced next is faster than similar_text(). However, similar_text() can give more accurate results with fewer modifications needed. Consider levenshtein() when speed matters more than precision and the strings are of limited length.

Instructions for use

Here is the description of the levenshtein() function from the manual:

The levenshtein() function returns the Levenshtein distance between two strings.

The Levenshtein distance, also called edit distance, is the minimum number of edit operations required to transform one string into another. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character.

For example, to convert kitten to sitting:

sitten (k→s)
sittin (e→i)
sitting (→g)

Three edits are needed, so the distance is 3. The levenshtein() function gives the same weight to each operation (replace, insert, and delete). However, you can define the cost of each operation by setting the optional insert, replace, and delete parameters.
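The edit-distance idea with per-operation costs can be sketched with the usual dynamic-programming table, kept one row at a time. The helper name sketch_levenshtein and its parameter names are hypothetical and exist only for illustration; in real code you would call the built-in levenshtein().

```php
<?php
// Hypothetical helper for illustration; in practice call levenshtein().
function sketch_levenshtein(string $s1, string $s2, int $ins = 1, int $rep = 1, int $del = 1): int
{
    $len1 = strlen($s1);
    $len2 = strlen($s2);

    // $prev[$j] is the cost of turning the processed prefix of $s1
    // into the first $j bytes of $s2. An empty prefix needs $j inserts.
    $prev = [];
    for ($j = 0; $j <= $len2; $j++) {
        $prev[$j] = $j * $ins;
    }

    for ($i = 1; $i <= $len1; $i++) {
        // Turning $i bytes of $s1 into an empty string needs $i deletes.
        $curr = [$i * $del];
        for ($j = 1; $j <= $len2; $j++) {
            $same = $s1[$i - 1] === $s2[$j - 1];
            $curr[$j] = min(
                $prev[$j - 1] + ($same ? 0 : $rep), // keep or replace
                $prev[$j] + $del,                   // delete from $s1
                $curr[$j - 1] + $ins                // insert into $s1
            );
        }
        $prev = $curr;
    }

    return $prev[$len2];
}

echo sketch_levenshtein("kitten", "sitting"), "\n"; // 3
```

With all costs left at 1 this counts the three edits from the kitten example; changing a cost only changes how much each kind of step contributes.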

Syntax:

levenshtein(string1, string2, insert, replace, delete)

Parameter description

string1 Required. The first string to compare.
string2 Required. The second string to compare.
insert Optional. The cost of inserting a character. The default is 1.
replace Optional. The cost of replacing a character. The default is 1.
delete Optional. The cost of deleting a character. The default is 1.

Tips and comments

• In PHP versions before 8.0, levenshtein() returns -1 if either string is longer than 255 characters.
• The levenshtein() function is case-sensitive.
• The levenshtein() function is faster than similar_text(). However, similar_text() gives more accurate results with fewer modifications needed.
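How capitalization affects the result is easy to check directly. The sketch below uses only the built-in levenshtein() and strtolower(); lowercasing both inputs first is one possible way to get a case-insensitive comparison, suitable for ASCII text.

```php
<?php
// "Hello" and "hello" differ only in the case of the first letter,
// which counts as one replacement.
echo levenshtein("Hello", "hello"), "\n"; // 1

// Lowercasing both inputs first makes the comparison ignore case.
echo levenshtein(strtolower("Hello"), strtolower("hello")), "\n"; // 0
```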
Example

The code is as follows:

<?php
echo levenshtein("Hello World", "ello World");
echo "<br/>";
echo levenshtein("Hello World", "ello World", 10, 20, 30);
?>

Output: 1 30
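The second result is 30 because turning "Hello World" into "ello World" takes one deletion, and the deletion cost was set to 30. A small sketch, using only the built-in function, shows that swapping the arguments turns the same edit into an insertion:

```php
<?php
// Removing the leading "H" is one delete, charged at the delete cost (30).
echo levenshtein("Hello World", "ello World", 10, 20, 30), "\n"; // 30

// With the arguments swapped, the "H" must be inserted instead,
// so the insert cost (10) applies.
echo levenshtein("ello World", "Hello World", 10, 20, 30), "\n"; // 10
```

With unequal costs the distance is therefore no longer symmetric, so the order of the two strings matters.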

Supplementary notes:

PHP has a built-in function, similar_text(), for computing the similarity between strings; it can also report the similarity as a percentage. However, its results can seem very inaccurate for Chinese text. For example:

The code is as follows:

echo similar_text("Jilin Poultry company Fire has killed 112 people", "Jilin Bao Yuan Fung poultry Industry company Fire has caused 112 people killed");

These two news headlines are in fact the same story, yet similar_text() returns 42 for the original Chinese headlines. Read as 42%, that looks very unreliable, but keep in mind that the return value is the number of matching bytes rather than a percentage (the percentage is only written to the optional third parameter). Below is a piece of PHP code, collected elsewhere, that compares two strings by their longest common subsequence (LCS) instead:

<?php
class LCS
{
    public $str1;
    public $str2;
    public $c = array();

    /* Returns the longest common subsequence of string one and string two */
    public function getLCS($str1, $str2, $len1 = 0, $len2 = 0)
    {
        $this->str1 = $str1;
        $this->str2 = $str2;
        if ($len1 == 0) $len1 = strlen($str1);
        if ($len2 == 0) $len2 = strlen($str2);
        $this->initC($len1, $len2);
        return $this->printLCS($this->c, $len1, $len2);
    }

    /* Returns the similarity of two strings (strlen() counts bytes, not characters) */
    public function getSimilar($str1, $str2)
    {
        $len1 = strlen($str1);
        $len2 = strlen($str2);
        $len = strlen($this->getLCS($str1, $str2, $len1, $len2));
        return $len * 2 / ($len1 + $len2);
    }

    /* Fills the table: c[i][j] is the LCS length of the first i bytes of str1 and the first j bytes of str2 */
    public function initC($len1, $len2)
    {
        $this->c = array();
        for ($i = 0; $i <= $len1; $i++) $this->c[$i][0] = 0;
        for ($j = 0; $j <= $len2; $j++) $this->c[0][$j] = 0;
        for ($i = 1; $i <= $len1; $i++) {
            for ($j = 1; $j <= $len2; $j++) {
                if ($this->str1[$i - 1] == $this->str2[$j - 1]) {
                    $this->c[$i][$j] = $this->c[$i - 1][$j - 1] + 1;
                } else if ($this->c[$i - 1][$j] >= $this->c[$i][$j - 1]) {
                    $this->c[$i][$j] = $this->c[$i - 1][$j];
                } else {
                    $this->c[$i][$j] = $this->c[$i][$j - 1];
                }
            }
        }
    }

    /* Walks the table back from c[i][j] to rebuild the subsequence */
    public function printLCS($c, $i, $j)
    {
        if ($i == 0 || $j == 0) {
            return "";
        }
        if ($this->str1[$i - 1] == $this->str2[$j - 1]) {
            return $this->printLCS($c, $i - 1, $j - 1) . $this->str2[$j - 1];
        } else if ($c[$i - 1][$j] >= $c[$i][$j - 1]) {
            return $this->printLCS($c, $i - 1, $j);
        } else {
            return $this->printLCS($c, $i, $j - 1);
        }
    }
}

$lcs = new LCS();
// Returns the longest common subsequence
$lcs->getLCS("Hello word", "hello");
// Returns the similarity
echo $lcs->getSimilar("Jilin Poultry Industry Company Fire has caused 112 deaths", "Jilin Bao Yuan Fung poultry Industry company Fire has caused 112 people killed");

With the original Chinese headlines, the reported output is 0.90322580645161, which looks significantly more accurate.
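Because strlen() and string offsets work on bytes, a byte-based LCS gives every Chinese character in UTF-8 the weight of three bytes. A possible character-based variant is sketched below. The function name lcs_similarity_chars is hypothetical, the sample strings are made up for illustration, and the sketch assumes both inputs are valid UTF-8.

```php
<?php
// Hypothetical helper: LCS-based similarity measured in characters, not bytes.
// Assumes both inputs are valid UTF-8.
function lcs_similarity_chars(string $a, string $b): float
{
    // Split into UTF-8 characters; the /u modifier makes PCRE treat input as UTF-8.
    $x = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY);
    $y = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY);
    $n = count($x);
    $m = count($y);
    if ($n + $m === 0) {
        return 1.0; // two empty strings are treated as identical
    }

    // $prev[$j] holds the LCS length of the processed prefix of $x
    // and the first $j characters of $y.
    $prev = array_fill(0, $m + 1, 0);
    for ($i = 1; $i <= $n; $i++) {
        $curr = [0];
        for ($j = 1; $j <= $m; $j++) {
            $curr[$j] = ($x[$i - 1] === $y[$j - 1])
                ? $prev[$j - 1] + 1
                : max($prev[$j], $curr[$j - 1]);
        }
        $prev = $curr;
    }

    // Same ratio as getSimilar(): twice the LCS length over the combined length.
    return 2 * $prev[$m] / ($n + $m);
}

// Made-up sample strings: 3 shared characters out of 3 + 4, i.e. 6/7.
echo lcs_similarity_chars("数据库", "数据仓库"), "\n";
```

Keeping only two rows of the table avoids storing the full matrix, which is enough when only the similarity ratio is needed and not the subsequence itself.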
