Improved PHP string-similarity functions: similar_text(), levenshtein(), and LCS for Chinese text
similar_text(), Chinese character version
The code is as follows:
<?php
// Split a UTF-8 string into an array of characters
function split_str($str) {
    preg_match_all("/./u", $str, $arr);
    return $arr[0];
}
// Similarity detection: count the distinct characters of $str2 that also occur in $str1
function similar_text_cn($str1, $str2) {
    $arr_1 = array_unique(split_str($str1));
    $arr_2 = array_unique(split_str($str2));
    $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1));
    return $similarity;
}
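To illustrate why a character-aware version is useful, the sketch below compares the built-in similar_text(), which works on bytes, with a character-based count. The helper names chars_utf8() and shared_chars() are hypothetical and not from the listing above; shared_chars() uses array_intersect(), which should give the same count as the array_diff() formula. It assumes UTF-8 input.

```php
<?php
// Split a UTF-8 string into characters; returns an empty array if the input is not valid UTF-8.
function chars_utf8(string $s): array {
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Count the distinct characters of $b that also occur in $a.
function shared_chars(string $a, string $b): int {
    $ua = array_unique(chars_utf8($a));
    $ub = array_unique(chars_utf8($b));
    return count(array_intersect($ub, $ua));
}

$a = "中国人";
$b = "中文";

$byteLength = strlen($b);             // length in bytes
$charLength = count(chars_utf8($b));  // length in characters
$byteScore  = similar_text($a, $b);   // built-in, counts matching bytes
$charScore  = shared_chars($a, $b);   // character-based sketch

echo "$byteLength bytes, $charLength characters\n"; // 6 bytes, 2 characters
echo "similar_text: $byteScore, shared characters: $charScore\n";
```

Each of these Chinese characters occupies three bytes in UTF-8, so the byte-based score overstates how much the two strings have in common.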
levenshtein(), Chinese character version
The code is as follows:
<?php
// Split a multibyte string into an array of characters
function mbStringToArray($string, $encoding = 'UTF-8') {
    $arrayResult = array();
    while ($iLen = mb_strlen($string, $encoding)) {
        array_push($arrayResult, mb_substr($string, 0, 1, $encoding));
        $string = mb_substr($string, 1, $iLen, $encoding);
    }
    return $arrayResult;
}
// Edit distance, computed per character rather than per byte
function levenshtein_cn($str1, $str2, $costReplace = 1, $encoding = 'UTF-8') {
    $count_same_letter = 0;
    $d = array();
    $mb_len1 = mb_strlen($str1, $encoding);
    $mb_len2 = mb_strlen($str2, $encoding);
    $mb_str1 = mbStringToArray($str1, $encoding);
    $mb_str2 = mbStringToArray($str2, $encoding);
    // First column: distance from each prefix of $str1 to the empty string
    for ($i1 = 0; $i1 <= $mb_len1; $i1++) {
        $d[$i1] = array();
        $d[$i1][0] = $i1;
    }
    // First row: distance from the empty string to each prefix of $str2
    for ($i2 = 0; $i2 <= $mb_len2; $i2++) {
        $d[0][$i2] = $i2;
    }
    for ($i1 = 1; $i1 <= $mb_len1; $i1++) {
        for ($i2 = 1; $i2 <= $mb_len2; $i2++) {
            // Compare whole characters; indexing $str1 and $str2 directly would compare bytes
            if ($mb_str1[$i1 - 1] === $mb_str2[$i2 - 1]) {
                $cost = 0;
                $count_same_letter++;
            } else {
                $cost = $costReplace; // replace
            }
            $d[$i1][$i2] = min($d[$i1 - 1][$i2] + 1, // delete
                $d[$i1][$i2 - 1] + 1, // insert
                $d[$i1 - 1][$i2 - 1] + $cost);
        }
    }
    return $d[$mb_len1][$mb_len2];
    // Alternative return value:
    // return array('distance' => $d[$mb_len1][$mb_len2], 'count_same_letter' => $count_same_letter);
}
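The difference between byte-level and character-level edit distance can be seen with a short comparison. This is a compact two-row sketch under stated assumptions, not the full-matrix listing above: levenshtein_chars() is a hypothetical name, input is assumed to be UTF-8, and every operation costs 1.

```php
<?php
// Character-level edit distance that keeps only two rows of the table in memory.
function levenshtein_chars(string $a, string $b): int {
    $x = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $y = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $prev = range(0, count($y)); // distances from the empty prefix of $a
    foreach ($x as $i => $cx) {
        $curr = [$i + 1];        // distance from this prefix of $a to the empty string
        foreach ($y as $j => $cy) {
            $curr[] = min(
                $prev[$j + 1] + 1,                 // delete $cx
                $curr[$j] + 1,                     // insert $cy
                $prev[$j] + ($cx === $cy ? 0 : 1)  // keep or replace
            );
        }
        $prev = $curr;
    }
    return $prev[count($y)];
}

$byteDistance = levenshtein("中国", "中文");        // built-in, compares bytes
$charDistance = levenshtein_chars("中国", "中文");  // compares characters

echo $byteDistance, "\n"; // 3: the second characters differ in all three of their bytes
echo $charDistance, "\n"; // 1: one character replaced
```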
Longest common subsequence, LCS()
The code is as follows:
<?php
// Longest common subsequence, English (single-byte) version
function lcs_en($str_1, $str_2) {
    $len_1 = strlen($str_1);
    $len_2 = strlen($str_2);
    $len = $len_1 > $len_2 ? $len_1 : $len_2;
    $dp = array();
    // Initialize the first row and the first column to 0
    for ($i = 0; $i <= $len; $i++) {
        $dp[$i] = array();
        $dp[$i][0] = 0;
        $dp[0][$i] = 0;
    }
    for ($i = 1; $i <= $len_1; $i++) {
        for ($j = 1; $j <= $len_2; $j++) {
            if ($str_1[$i - 1] == $str_2[$j - 1]) {
                $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1;
            } else {
                $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1];
            }
        }
    }
    return $dp[$len_1][$len_2];
}
// Split a multibyte string into an array of characters
function mbStringToArray($string, $encoding = 'UTF-8') {
    $arrayResult = array();
    while ($iLen = mb_strlen($string, $encoding)) {
        array_push($arrayResult, mb_substr($string, 0, 1, $encoding));
        $string = mb_substr($string, 1, $iLen, $encoding);
    }
    return $arrayResult;
}
// Longest common subsequence, Chinese (multibyte) version
function lcs_cn($str1, $str2, $encoding = 'UTF-8') {
    $mb_len1 = mb_strlen($str1, $encoding);
    $mb_len2 = mb_strlen($str2, $encoding);
    $mb_str1 = mbStringToArray($str1, $encoding);
    $mb_str2 = mbStringToArray($str2, $encoding);
    $len = $mb_len1 > $mb_len2 ? $mb_len1 : $mb_len2;
    $dp = array();
    // Initialize the first row and the first column to 0
    for ($i = 0; $i <= $len; $i++) {
        $dp[$i] = array();
        $dp[$i][0] = 0;
        $dp[0][$i] = 0;
    }
    for ($i = 1; $i <= $mb_len1; $i++) {
        for ($j = 1; $j <= $mb_len2; $j++) {
            if ($mb_str1[$i - 1] == $mb_str2[$j - 1]) {
                $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1;
            } else {
                $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1];
            }
        }
    }
    return $dp[$mb_len1][$mb_len2];
}
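As a usage illustration, the same recurrence can be written with two rows instead of a full table. The sketch below is one possible compact form, assuming UTF-8 input; lcs_chars() is a hypothetical name, not part of the listing above.

```php
<?php
// Length of the longest common subsequence, compared character by character.
function lcs_chars(string $a, string $b): int {
    $x = preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $y = preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $prev = array_fill(0, count($y) + 1, 0); // row for the empty prefix of $a
    foreach ($x as $cx) {
        $curr = [0];                          // column for the empty prefix of $b
        foreach ($y as $j => $cy) {
            $curr[] = ($cx === $cy)
                ? $prev[$j] + 1                   // characters match: extend the subsequence
                : max($prev[$j + 1], $curr[$j]);  // otherwise carry over the better neighbour
        }
        $prev = $curr;
    }
    return $prev[count($y)];
}

$lcsChinese = lcs_chars("中国人民", "中华人民共和国");
$lcsEnglish = lcs_chars("ABCBDAB", "BDCABA");

echo $lcsChinese, "\n"; // 3: the common subsequence is 中人民
echo $lcsEnglish, "\n"; // 4: for example BCBA
```

Keeping only the previous and current rows reduces memory from the product of the two lengths to the length of the second string, at the cost of not being able to recover the subsequence itself.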
(100 points) [PHP] List some of the string-handling functions you are familiar with!
addcslashes
addslashes
bin2hex
chop
chr
chunk_split
convert_cyr_string - Convert from one Cyrillic character set to another
convert_uudecode
convert_uuencode
count_chars
crc32
crypt
echo
explode
fprintf
get_html_translation_table
hebrev
hebrevc
hex2bin - Decodes a hexadecimally encoded binary string
html_entity_decode - Convert all HTML entities to their applicable characters
htmlentities - Convert all applicable characters to HTML entities
htmlspecialchars_decode - Convert special HTML entities back to characters
htmlspecialchars - Convert special characters to HTML entities
implode - Join array elements with a string
join
lcfirst - Make a string's first character lowercase
levenshtein - Calculate Levenshtein distance between two strings
localeconv - Get numeric formatting information
ltrim - Strip whitespace (or other characters) from the beginning of a string
md5_file
metaphone - Calculate the metaphone key of a string
money_format - Formats a number as a currency string
nl_langinfo - Query language and locale information
nl2br
number_format - Format a number with grouped thousands
ord
parse_str
print
printf
quoted_printable_decode - Convert a quoted-printable string to an 8 bit string
quoted_printable_encode - Convert an 8 bit string to a quoted-printable string
quotemeta - Quote meta characters
rtrim
setlocale - Set locale information
sha1_file
sha1
soundex - Calculate the soundex key of a string
sprintf - Return a formatted string
sscanf - Parses input from a string according to a ...
Can someone explain PHP's levenshtein() function in plain terms? I can't understand the manual.
W3school's explanation:
The levenshtein() function returns the Levenshtein distance between two strings.
The Levenshtein distance, also known as the edit distance, is the minimum number of edit operations needed to convert one string into the other. The permitted edits are replacing one character with another, inserting a character, and deleting a character.
For example, converting kitten to sitting takes three edits:
sitten (k→s)
sittin (e→i)
sitting (→g)
By default, levenshtein() gives every operation (replace, insert, and delete) the same weight. However, you can define the cost of each operation by setting the optional insert, replace, and delete parameters.
Note: the "cost" is the weight. In the original poster's example, Hello World→ello World, the "H" has to be deleted. Deletion is weighted by the fifth parameter, which is 30 there, so the function returns 30.
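The explanation above can be checked with a few calls to the built-in function. In this sketch the deletion cost of 30 comes from the example being discussed; the insertion cost of 10 and the replacement cost of 20 are assumed values chosen only for illustration.

```php
<?php
// Default weights: every insert, replace, and delete costs 1.
$plain = levenshtein("kitten", "sitting");

// Custom weights are passed in the order insertion, replacement, deletion.
$insertCost  = 10; // assumed value
$replaceCost = 20; // assumed value
$deleteCost  = 30; // the fifth parameter from the example above

// Removing "H" is a single deletion, so only the deletion cost applies.
$deleted = levenshtein("Hello World", "ello World", $insertCost, $replaceCost, $deleteCost);

// Going the other way adds "H" back, so only the insertion cost applies.
$inserted = levenshtein("ello World", "Hello World", $insertCost, $replaceCost, $deleteCost);

echo $plain, "\n";    // 3
echo $deleted, "\n";  // 30
echo $inserted, "\n"; // 10
```

With unequal weights the distance is no longer symmetric, which is why the direction of the conversion matters.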