ThinkPHP 3.2: using SCWS Chinese word segmentation to extract keywords
SCWS stands for Simple Chinese Word Segmentation, that is, a simple Chinese word segmentation system.
1. Download the official SCWS PHP class (PSCWS4, the fourth version of PSCWS, is used here)
http://www.xunsearch.com/scws/down/pscws4-20081221.tar.bz2
Download the XDB dictionary file (the UTF-8 Simplified Chinese dictionary package is used here)
http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
2. Unpack the SCWS class archive and place two files under the ThinkPHP\Library\Org\Util directory: pscws4.class.php (renamed here to Pscws.class.php) and xdb_r.class.php (renamed here to the uppercase XDB_R.class.php).
3. Modify Pscws.class.php
Add a namespace:
namespace Org\Util;
Change the class name from PSCWS4 to Pscws, so that it matches the file name.
Remove the line require_once (dirname(__FILE__) . '/xdb_r.class.php'); the file no longer exists under that name, and the class is now located through its namespace.
Then modify XDB_R.class.php
Add the same namespace:
namespace Org\Util;
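The effect of the namespace change can be seen in a minimal sketch. The two classes below are empty stand-ins, not the real SCWS code, and the method name openDictionary is hypothetical. The sketch assumes that the real Pscws class creates its XDB_R object by the short class name. Inside the Org\Util namespace that short name resolves to \Org\Util\XDB_R, which is why both files need the same namespace declaration.

```php
<?php
namespace Org\Util;

// Stand-in for XDB_R.class.php (the real class reads the .xdb dictionary).
class XDB_R
{
}

// Stand-in for Pscws.class.php after the class has been renamed.
class Pscws
{
    // Hypothetical method name, used only for illustration.
    public function openDictionary()
    {
        // Unqualified class name: resolved relative to the current
        // namespace, that is, to \Org\Util\XDB_R.
        return new XDB_R();
    }
}

$pscws = new \Org\Util\Pscws();
echo get_class($pscws->openDictionary()), "\n"; // Org\Util\XDB_R
```

In the real project the two classes live in separate files, and the lookup of XDB_R.class.php is left to the framework's class loading rather than to an explicit require_once.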
4. Unpack the XDB dictionary file
Create a new dict folder under the Public\admin directory, extract dict.utf8.xdb from the dictionary package into that folder, and then copy the rules.utf8.ini file that ships with the SCWS class into the same folder.
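5. Add a constant definition to the entry file; it holds the path of the directory that contains the dictionary file and the rule file. Note that ThinkPHP 3.2 also uses the CONF_PATH constant for the application's configuration directory, so defining it here overrides that default.
define('CONF_PATH', dirname(__FILE__) . '/Public/admin/dict/');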
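Before the path is used in a controller, it can help to confirm that both files really are in the directory it points to. The following is a minimal sketch: the function name missing_scws_files is hypothetical, and the directory and file names are assumed from steps 4 and 5 (adjust the letter case to match the actual folders on a case-sensitive file system).

```php
<?php
// Hypothetical helper: returns the names of the SCWS data files that are
// missing from $dir. The two file names are assumed from steps 4 and 5.
function missing_scws_files($dir)
{
    $missing = array();
    foreach (array('dict.utf8.xdb', 'rules.utf8.ini') as $file) {
        // Normalise the trailing separator so both "dir" and "dir/" work.
        if (!is_file(rtrim($dir, '/\\') . '/' . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Same path expression as the constant defined in the entry file.
$dir = dirname(__FILE__) . '/Public/admin/dict/';
$missing = missing_scws_files($dir);
if ($missing) {
    echo 'Missing: ' . implode(', ', $missing) . "\n";
} else {
    echo "All SCWS files found\n";
}
```

Run from the same directory as the entry file, this prints which of the two files still needs to be copied.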
6. Create a private method in the IndexController.class.php controller that other methods can call
/**
 * Chinese word segmentation
 * @param string $title Text to be segmented
 * @param int    $num   Number of keywords to return; optional
 */
private function get_tags($title, $num = null)
{
    $pscws = new \Org\Util\Pscws('utf8');
    // Dictionary and rule files from the directory defined in the entry file
    $pscws->set_dict(CONF_PATH . 'dict.utf8.xdb');
    $pscws->set_rule(CONF_PATH . 'rules.utf8.ini');
    $pscws->set_ignore(true);
    $pscws->send_text($title);
    $words = $pscws->get_tops($num);
    $pscws->close();
    // Keep only the word of each returned entry
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    return implode(',', $tags);
}

/**
 * Product search results page
 */
public function search()
{
    $rzt = $this->get_tags("The new calf leather small pointed heel heel shoes 910033 Gray Sheep Huangliang ( 7.31 shipments)");
    print_r($rzt);
}
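The step that turns the segmentation result into a comma-separated string can be tried on its own, without a dictionary. This is a minimal sketch: words_to_tags is a hypothetical helper, and the sample array is made up; it only assumes that each entry carries a 'word' key, as the controller method does. The helper also returns an empty string for non-array input, on the assumption that get_tops may return false when there is no text to segment.

```php
<?php
// Hypothetical helper mirroring the foreach/implode step of get_tags().
function words_to_tags($words)
{
    // Assumption: the segmenter may hand back a non-array value (such as
    // false) when it has nothing to work on; treat that as "no tags".
    if (!is_array($words)) {
        return '';
    }
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    return implode(',', $tags);
}

// Made-up sample data; only the 'word' key is relied on.
$sample = array(
    array('word' => 'shoes'),
    array('word' => '910033'),
);
echo words_to_tags($sample), "\n"; // shoes,910033
echo words_to_tags(false), "\n";   // prints an empty line
```

Keeping the guard in a separate function means the controller does not pass a non-array value to foreach when the input string is empty.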
The results shown are:
Patent leather, shoes, pointed toe, high heel, new, delivery, 910033,7.31,39