Using SCWS Chinese word segmentation to extract keywords in ThinkPHP 3.2

Source: Internet
Author: User

This article shows how to use SCWS Chinese word segmentation to extract keywords in ThinkPHP 3.2.

SCWS stands for Simple Chinese Word Segmentation, a simple Chinese word segmentation system.
1. Download the official SCWS class (PSCWS version 4 is used here):
http://www.xunsearch.com/scws/down/pscws4-20081221.tar.bz2
Then download the XDB dictionary file (the UTF-8 Simplified Chinese dictionary pack is used here):
http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
2. Extract the SCWS class files and place them in the ThinkPHP\Library\Org\Util directory: Pscws.class.php (renamed here from pscws4.class.php) and XDB_R.class.php (renamed here from xdb_r.class.php, with the name in uppercase).
3. Modify Pscws.class.php
Add a namespace:

namespace Org\Util;

Rename the class to Pscws, matching the file name.

Remove the line require_once(dirname(__FILE__) . '/xdb_r.class.php'); since that file has been renamed.

Modify XDB_R.class.php
Add the same namespace:

namespace Org\Util;
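
Why both files need the same namespace may be easier to see in a small stand-alone sketch. The two classes below are empty stand-ins, not the real SCWS code. They only illustrate how PHP resolves an unqualified class name such as XDB_R when it is used inside the Org\Util namespace; the assumption is that ThinkPHP's autoloader then maps that name to a file under ThinkPHP\Library\Org\Util.

```php
<?php
namespace Org\Util;

// Stand-in for the real dictionary reader in XDB_R.class.php.
class XDB_R {}

// Stand-in for the renamed segmenter class in Pscws.class.php.
class Pscws {
    public $xdb;

    public function __construct() {
        // Unqualified name: resolved relative to the current namespace,
        // so this refers to \Org\Util\XDB_R.
        $this->xdb = new XDB_R();
    }
}

// Callers elsewhere use the fully qualified name.
$pscws = new \Org\Util\Pscws();
echo get_class($pscws->xdb), "\n"; // prints Org\Util\XDB_R
```

If only one of the two files declared the namespace, the unqualified XDB_R inside Pscws would resolve to a class that does not exist.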

4. Extract the XDB dictionary file
Create a new dict folder under the Public\Admin directory, extract dict.utf8.xdb from the dictionary package into it, and then copy the rules.utf8.ini file that ships with the SCWS class into the same directory.
5. Add a constant definition to the entry file. It holds the path to the directory containing the dictionary file and the rule file:

define("CONF_PATH", dirname(__FILE__) . "/Public/Admin/dict/");
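
As a minimal sketch of this step, the snippet below defines the path constant and checks that both data files are actually in place before the segmenter is used. The constant name SCWS_DICT_PATH and the helper missing_scws_files() are hypothetical and not part of the article, SCWS, or ThinkPHP. A separate name is used because ThinkPHP 3.2 appears to use CONF_PATH for its own configuration directory; treat that as an assumption to verify against your installation.

```php
<?php
// Hypothetical constant name; the article itself uses CONF_PATH.
// Adjust the path to the directory created in step 4.
define('SCWS_DICT_PATH', dirname(__FILE__) . '/Public/Admin/dict/');

/**
 * Return the names of the required SCWS data files that are absent from $dir.
 * $dir is expected to end with a slash.
 */
function missing_scws_files($dir) {
    $missing = array();
    foreach (array('dict.utf8.xdb', 'rules.utf8.ini') as $file) {
        if (!is_file($dir . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

$missing = missing_scws_files(SCWS_DICT_PATH);
if ($missing) {
    echo 'Missing SCWS files: ', implode(', ', $missing), "\n";
}
```

A check like this can make a wrong path (for example, a letter-case mismatch on a case-sensitive file system) visible early.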

6. In the IndexController.class.php controller, add a private method that other methods can call:

    /**
     * Chinese word segmentation
     * @param string $title text to segment
     * @param int    $num   number of keywords to return; optional
     */
    private function get_tags($title, $num = null) {
        $pscws = new \Org\Util\Pscws('utf8');
        // CONF_PATH is the constant defined in the entry file
        $pscws->set_dict(CONF_PATH . 'dict.utf8.xdb');
        $pscws->set_rule(CONF_PATH . 'rules.utf8.ini');
        $pscws->set_ignore(true);
        $pscws->send_text($title);
        $words = $pscws->get_tops($num);
        $pscws->close();
        // Collect only the words themselves
        $tags = array();
        foreach ($words as $val) {
            $tags[] = $val['word'];
        }
        return implode(',', $tags);
    }

    /**
     * Product search results page
     */
    public function search() {
        $rzt = $this->get_tags("The new calf leather small pointed heel heel shoes 910033 Gray Sheep Huangliang ( 7.31 shipments)");
        print_r($rzt);
    }
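
The last part of get_tags(), turning the segmenter's result into a comma-separated string, can be sketched on its own. The helper words_to_tags() below is hypothetical and not part of PSCWS or ThinkPHP. It assumes that the result may be a non-array value (such as false) when there is nothing to segment, and that each entry carries a 'word' key as in the code above.

```php
<?php
/**
 * Turn a get_tops()-style result into a comma-separated tag string.
 * Hypothetical helper, not part of PSCWS or ThinkPHP.
 */
function words_to_tags($words) {
    // Assumption: the segmenter may return false (or another non-array)
    // when it has no text to work on.
    if (!is_array($words)) {
        return '';
    }
    $tags = array();
    foreach ($words as $val) {
        // Skip entries without a usable 'word' value.
        if (isset($val['word']) && $val['word'] !== '') {
            $tags[] = $val['word'];
        }
    }
    return implode(',', $tags);
}

// Sample data shaped like the entries read in get_tags(); values are made up.
$sample = array(array('word' => 'shoes'), array('word' => 'leather'));
echo words_to_tags($sample), "\n"; // prints shoes,leather
echo words_to_tags(false), "\n";   // prints an empty line
```

Guarding against a non-array result avoids a warning from foreach when the input text is empty.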

The results shown are:

Patent leather, shoes, pointed toe, high heel, new, delivery, 910033,7.31,39

