Chinese search engine, Chinese word segmentation is the most fundamental part of the system, because the current word based Chinese search algorithm is not too good. Of course, this article is not to do research on Chinese search engine, but to share if you use PHP to do a site search engine. This article is an article in this system
The PHP class for Chinese word segmentation is below, using the Proc_open () function to execute the word breaker, and through the pipeline and its interaction, enter the text to be participle, read the word segmentation results.
Class nlp{
private static $cmd _path;
Do not end with '/'
static function Set_cmd_path ($path) {
Self:: $cmd _path = $path;
}
Private function cmd ($STR) {
$descriptorspec = Array (
0 = = Array ("Pipe", "R"),
1 = = Array ("Pipe", "w"),
);
$cmd = self:: $cmd _path. "/ictclas";
$process = Proc_open ($cmd, $descriptorspec, $pipes);
if (Is_resource ($process)) {
$str = Iconv (' utf-8 ', ' GBK ', $str);
Fwrite ($pipes [0], $STR);
$output = Stream_get_contents ($pipes [1]);
Fclose ($pipes [0]);
Fclose ($pipes [1]);
$return _value = Proc_close ($process);
}
/*
$cmd = "printf ' $input ' |". Self:: $cmd _path. "/ictclas";
EXEC ($cmd, $output, $ret);
$output = Join ("n", $output);
*/
$output = Trim ($output);
$output = Iconv (' GBK ', ' utf-8 ', $output);
return $output;
}
/**
* Word breaker, return word list.
*/
function Tokenize ($STR) {
$tokens = Array ();
$output = Self::cmd ($input);
if ($output) {
$ps tutorial = preg_split ('/s+/', $output);
foreach ($ps as $p) {
List ($seg, $tag) = explode ('/', $p);
$item = Array (
' Seg ' = $seg,
' Tag ' = $tag,
);
$tokens [] = $item;
}
}
return $tokens;
}
}
Nlp::set_cmd_path (DirName (__file__));
?>
Easy to use (make sure the Ictclas compiled executables and dictionaries are in the current directory):
Require_once (' nlp.php ');
Var_dump (nlp::tokenize (' Hello, world! '));
?>
Webmaster experience,
If you want to do search engine segmentation, the need for a strong thesaurus and more intelligent pinyin and writing, habits and other functions of the library.
http://www.bkjia.com/PHPjc/444776.html www.bkjia.com true http://www.bkjia.com/PHPjc/444776.html techarticle Chinese search engine, Chinese word segmentation is the most basic part of the whole system, because the Chinese search algorithm based on the word is not very good. Of course, this article is not to search for Chinese ...