This example uses two well-known Chinese word segmentation tools, SCWS and PhpAnalysis. For more information, see the Related section after the code.
The code is as follows:
header("Content-Type: text/html; charset=utf-8");
// Absolute path of this script's directory, with forward slashes on every platform
define('APP_ROOT', str_replace('\\', '/', dirname(__FILE__)));
$test = 'Here is a Chinese test code!';
// Return the top five keywords of $title as an array, using PSCWS4
function get_tags_arr($title)
{
    // require_once avoids redeclaring the class if the function is called twice
    require_once(APP_ROOT . '/pscws4.class.php');
    $pscws = new PSCWS4();
    $pscws->set_dict(APP_ROOT . '/scws/dict.utf8.xdb');
    $pscws->set_rule(APP_ROOT . '/scws/rules.utf8.ini');
    $pscws->set_ignore(true);
    $pscws->send_text($title);
    $words = $pscws->get_tops(5);
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    $pscws->close();
    return $tags;
}
print_r(get_tags_arr($test));
// ======================================================================================
// Return the segmentation result of $content as produced by PhpAnalysis
function get_keywords_str($content) {
    require_once(APP_ROOT . '/phpanalysis.class.php');
    PhpAnalysis::$loadInit = false;
    $pa = new PhpAnalysis('utf-8', 'utf-8', false);
    $pa->LoadDict();
    $pa->SetSource($content);
    $pa->StartAnalysis(false);
    $tags = $pa->GetFinallyResult();
    return $tags;
}
print(get_keywords_str($test));
Related
SCWS - Simple Chinese Word Segmentation system
Conceptually, SCWS contains nothing especially innovative. It uses a word frequency dictionary compiled by the project itself, supplemented by a set of rules for proper nouns, personal names, place names, numbers, and dates. In small-scale tests its accuracy is roughly 90% to 95%, which is generally enough for small and medium-sized search engines, keyword extraction, and similar uses. SCWS is written in pure C, targets Unix-like operating systems as its main platform, and provides a shared library so that it can be embedded in existing software. It supports GBK, UTF-8, BIG5, and other Chinese character encodings, and its segmentation is fast.
System Platform: Windows/Unix
Development Language: C
Usage: PHP extension
Demo URL: http://www.ftphp.com/scws/demo.php
Official open source website: http://www.ftphp.com/scws/
Qingfeng's note: because it is available as a PHP extension, it integrates easily with existing PHP-based web systems, which is a major advantage.
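For comparison with the PSCWS4 class used in the code above, here is a minimal sketch of the same keyword extraction through the scws PHP extension. It assumes the extension is installed; the function name scws_top_words and the parameters $dictPath and $rulePath are hypothetical and must point to a real dictionary and rule file on your system. When the extension is not loaded, the function simply returns an empty array.

```php
<?php
// Sketch only: assumes the scws PHP extension is available.
// scws_top_words, $dictPath and $rulePath are illustrative names.
function scws_top_words($text, $dictPath, $rulePath, $limit = 5)
{
    if (!extension_loaded('scws')) {
        return array(); // extension missing: nothing to segment with
    }
    $so = scws_new();
    $so->set_charset('utf8');
    $so->set_dict($dictPath);
    $so->set_rule($rulePath);
    $so->set_ignore(true);          // ignore punctuation
    $so->send_text($text);
    $tops = $so->get_tops($limit);  // most significant words first
    $so->close();

    $words = array();
    if (is_array($tops)) {
        foreach ($tops as $item) {
            $words[] = $item['word'];
        }
    }
    return $words;
}
```

The call sequence mirrors get_tags_arr above, so switching between the pure-PHP class and the extension should mostly be a matter of how the object is created.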
PhpAnalysis - component-free PHP word segmentation system
PhpAnalysis uses string-matching segmentation, also called mechanical segmentation. Following a given strategy, it matches the Chinese string being analyzed against the entries of a "sufficiently large" machine dictionary; when a substring is found in the dictionary, the match succeeds and a word is recognized. String-matching methods can be classified in three ways: by scanning direction, into forward and reverse matching; by length priority, into maximum (longest) and minimum (shortest) matching; and by whether they are combined with part-of-speech tagging, into pure segmentation methods and integrated methods that segment and tag together.
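The matching strategies described above can be illustrated with a small sketch of forward and reverse maximum matching. This is not PhpAnalysis's actual implementation: the function names, the four-entry toy dictionary, and the maximum word length of 4 are all assumptions made for the example.

```php
<?php
// Illustrative sketch only: a toy dictionary, not PhpAnalysis's real data.

/** Split a UTF-8 string into an array of characters. */
function utf8_chars($text)
{
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Forward maximum matching: scan left to right and, at each position,
 * take the longest dictionary entry that starts there.
 */
function forward_max_match($text, array $dict, $maxLen = 4)
{
    $chars = utf8_chars($text);
    $count = count($chars);
    $words = array();
    $i = 0;
    while ($i < $count) {
        $matched = $chars[$i]; // fall back to a single character
        $step = 1;
        for ($len = min($maxLen, $count - $i); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $i, $len));
            if (isset($dict[$candidate])) {
                $matched = $candidate;
                $step = $len;
                break;
            }
        }
        $words[] = $matched;
        $i += $step;
    }
    return $words;
}

/** Reverse maximum matching: the same idea, scanning right to left. */
function reverse_max_match($text, array $dict, $maxLen = 4)
{
    $chars = utf8_chars($text);
    $words = array();
    $end = count($chars);
    while ($end > 0) {
        $matched = $chars[$end - 1]; // fall back to a single character
        $step = 1;
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dict[$candidate])) {
                $matched = $candidate;
                $step = $len;
                break;
            }
        }
        array_unshift($words, $matched);
        $end -= $step;
    }
    return $words;
}

// Dictionary words become array keys so lookups are a simple isset().
$dict = array_flip(array('研究', '研究生', '生命', '起源'));
print_r(forward_max_match('研究生命起源', $dict)); // 研究生 / 命 / 起源
print_r(reverse_max_match('研究生命起源', $dict)); // 研究 / 生命 / 起源
```

The two directions split the same sentence differently, which is one reason segmentation systems can disagree on how to divide certain words.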
System platform: PHP environment
Development Language: PHP
Usage: HTTP service
Demo URL: http://www.itgrass.com/phpanalysis/
Official open source website: http://www.itgrass.com/phpanalysis/
Qingfeng's note: it is simple to implement and easy to use, and it suits simple applications, but on large volumes of data it is less efficient than the previous system.
I tried several systems and found that basic segmentation works well in all of them, but they differ in how they split certain words and in how they assign parts of speech.
http://www.bitsCN.com/codes/40139.html