PHP Simple Chinese Word Code _php tutorial

Source: Internet
Author: User
Chinese search engine, Chinese word segmentation is the most fundamental part of the system, because the current word based Chinese search algorithm is not too good. Of course, this article is not to do research on Chinese search engine, but to share if you use PHP to do a site search engine. This article is an article in this system

The PHP class for Chinese word segmentation is below, using the Proc_open () function to execute the word breaker, and through the pipeline and its interaction, enter the text to be participle, read the word segmentation results.

Class nlp{
private static $cmd _path;

Do not end with '/'
static function Set_cmd_path ($path) {
Self:: $cmd _path = $path;
}

Private function cmd ($STR) {
$descriptorspec = Array (
0 = = Array ("Pipe", "R"),
1 = = Array ("Pipe", "w"),
);
$cmd = self:: $cmd _path. "/ictclas";
$process = Proc_open ($cmd, $descriptorspec, $pipes);

if (Is_resource ($process)) {
$str = Iconv (' utf-8 ', ' GBK ', $str);
Fwrite ($pipes [0], $STR);
$output = Stream_get_contents ($pipes [1]);

Fclose ($pipes [0]);
Fclose ($pipes [1]);

$return _value = Proc_close ($process);
}

/*
$cmd = "printf ' $input ' |". Self:: $cmd _path. "/ictclas";
EXEC ($cmd, $output, $ret);
$output = Join ("n", $output);
*/

$output = Trim ($output);
$output = Iconv (' GBK ', ' utf-8 ', $output);

return $output;
}

/**
* Word breaker, return word list.
*/
function Tokenize ($STR) {
$tokens = Array ();

$output = Self::cmd ($input);
if ($output) {
$ps tutorial = preg_split ('/s+/', $output);
foreach ($ps as $p) {
List ($seg, $tag) = explode ('/', $p);
$item = Array (
' Seg ' = $seg,
' Tag ' = $tag,
);
$tokens [] = $item;
}
}

return $tokens;
}
}
Nlp::set_cmd_path (DirName (__file__));
?>
Easy to use (make sure the Ictclas compiled executables and dictionaries are in the current directory):

Require_once (' nlp.php ');
Var_dump (nlp::tokenize (' Hello, world! '));
?>


Webmaster experience,

If you want to do search engine segmentation, the need for a strong thesaurus and more intelligent pinyin and writing, habits and other functions of the library.


http://www.bkjia.com/PHPjc/444776.html www.bkjia.com true http://www.bkjia.com/PHPjc/444776.html techarticle Chinese search engine, Chinese word segmentation is the most basic part of the whole system, because the Chinese search algorithm based on the word is not very good. Of course, this article is not to search for Chinese ...

  • Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.