The code is as follows:
header('Content-Type: text/html; charset=utf-8');
// Absolute path of this script's directory, normalized to forward slashes.
define('APP_ROOT', str_replace('\\', '/', dirname(__FILE__)));
// Sample input; use an actual Chinese-language string here when testing segmentation.
$test = 'Here is a Chinese test code!';

// Extract the top 5 keywords from $title using the pure-PHP PSCWS4 class.
function get_tags_arr($title)
{
    require_once(APP_ROOT . '/pscws4.class.php');
    $pscws = new PSCWS4();
    $pscws->set_dict(APP_ROOT . '/scws/dict.utf8.xdb');
    $pscws->set_rule(APP_ROOT . '/scws/rules.utf8.ini');
    $pscws->set_ignore(true);      // skip punctuation
    $pscws->send_text($title);
    $words = $pscws->get_tops(5);  // the 5 highest-weighted words
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    $pscws->close();
    return $tags;
}
print_r(get_tags_arr($test));
//============================================================
// Segment $content with PhpAnalysis and return the result string.
function get_keywords_str($content)
{
    require_once(APP_ROOT . '/phpanalysis.class.php');
    PhpAnalysis::$loadInit = false;
    $pa = new PhpAnalysis('utf-8', 'utf-8', false);
    $pa->LoadDict();
    $pa->SetSource($content);
    $pa->StartAnalysis(false);
    $tags = $pa->GetFinallyResult();
    return $tags;
}
print(get_keywords_str($test));
Related download addresses
SCWS - Simple Chinese Word Segmentation system
SCWS makes no conceptual innovations. It uses a self-collected word-frequency dictionary, supplemented by rule sets for proper names, personal names, place names, numbers, and dates. In small-scale testing its accuracy was roughly 90% to 95%, which is enough for small and medium-sized search engines, keyword extraction, and similar uses. SCWS is written in pure C, targets Unix-like operating systems as its main platform, and provides a shared library so that it can be embedded in existing software systems. It also supports the GBK, UTF-8, and BIG5 Chinese character encodings, and it segments text efficiently.
System platform: Windows/Unix
Development language: C
How to use: PHP extension
Demo URL: http://www.ftphp.com/scws/demo.php
Official open-source website: http://www.ftphp.com/scws/
Sunny Maple's note: As a PHP extension, it integrates easily with existing PHP-based web systems, which is a major advantage.
PHPAnalysis - a component-free PHP word segmentation system
PHPAnalysis is based on string-matching segmentation, also called mechanical segmentation. Following a given strategy, it matches the Chinese character string to be analyzed against the entries of a "sufficiently large" machine dictionary; if a string is found in the dictionary, the match succeeds and a word is recognized. By scanning direction, string-matching segmentation divides into forward matching and reverse matching. By which length is preferred, it divides into maximum (longest) matching and minimum (shortest) matching. By whether it is combined with part-of-speech tagging, it divides into pure segmentation and integrated segmentation and tagging.
System platform: PHP environment
Development language: PHP
How to use: HTTP service
Demo URL: http://www.itgrass.com/phpanalysis/
Official open-source website: http://www.itgrass.com/phpanalysis/
Sunny Maple's note: Simple and easy to use, and suitable for simple applications, but on large data volumes its computational efficiency is lower than that of the previous systems.
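To make the string-matching idea concrete, here is a minimal sketch of forward maximum matching in PHP. It is not PHPAnalysis's actual implementation: the function name forward_max_match, the toy dictionary (an array keyed by word), and the $maxLen parameter are all assumptions made for illustration.

```php
<?php
/**
 * Forward maximum matching (illustrative sketch only).
 * At each position, take the longest dictionary entry that starts there;
 * if no multi-character entry matches, emit a single character.
 *
 * $dict is a hypothetical toy dictionary (word => true), not the
 * dictionary format used by PHPAnalysis or SCWS.
 */
function forward_max_match(string $text, array $dict, int $maxLen = 4): array
{
    // Split into UTF-8 characters so multi-byte Chinese text is handled correctly.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // input was not valid UTF-8
    }
    $n = count($chars);
    $words = [];
    $i = 0;
    while ($i < $n) {
        $matched = false;
        // Try the longest candidate first, then shrink it one character at a time.
        for ($len = min($maxLen, $n - $i); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $i, $len));
            if (isset($dict[$candidate])) {
                $words[] = $candidate;
                $i += $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary entry starts here: fall back to one character.
            $words[] = $chars[$i];
            $i++;
        }
    }
    return $words;
}

$dict = ['中文' => true, '分词' => true, '测试' => true, '中文分词' => true];
print_r(forward_max_match('中文分词测试', $dict)); // yields ['中文分词', '测试']
```

Because the longest candidate is tried first, '中文分词' wins over the shorter entry '中文'. Reverse matching would scan from the end of the string instead, and minimum matching would try the shortest candidate first.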
After trying several of these systems, the basic word segmentation function works without problems in all of them, but they differ in how they split certain individual words, and each system determines part of speech differently.
http://www.php.net/codes/40139.html