PHP Chinese word segmentation and automatic keyword extraction: an introduction

Source: Internet
Author: User
The code is as follows:
<?php
header('Content-Type: text/html; charset=utf-8');
// Absolute path of this script's directory, normalized to forward slashes.
define('APP_ROOT', str_replace('\\', '/', dirname(__FILE__)));
// Sample input; both segmenters expect Chinese text here.
$test = 'Here is a Chinese test code!';

// Return the top 5 keywords of $title as an array, using PSCWS4.
function get_tags_arr($title)
{
    require_once APP_ROOT . '/pscws4.class.php';
    $pscws = new PSCWS4();
    $pscws->set_dict(APP_ROOT . '/scws/dict.utf8.xdb');
    $pscws->set_rule(APP_ROOT . '/scws/rules.utf8.ini');
    $pscws->set_ignore(true);   // skip punctuation
    $pscws->send_text($title);
    $words = $pscws->get_tops(5);
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    $pscws->close();
    return $tags;
}
print_r(get_tags_arr($test));
//============================================================
// Return the segmentation result of $content as a string, using PhpAnalysis.
function get_keywords_str($content)
{
    require_once APP_ROOT . '/phpanalysis.class.php';
    PhpAnalysis::$loadInit = false;
    $pa = new PhpAnalysis('utf-8', 'utf-8', false);
    $pa->LoadDict();
    $pa->SetSource($content);
    $pa->StartAnalysis(false);
    $tags = $pa->GetFinallyResult();
    return $tags;
}
print(get_keywords_str($test));

Related downloads

SCWS - Simple Chinese Word Segmentation system

SCWS is not conceptually innovative. It uses a self-collected word-frequency dictionary, supplemented by rule sets for proper nouns, personal names, place names, numbers, and dates. In small-scale testing its accuracy was roughly 90% to 95%, which is enough for small and medium-sized search engines, keyword extraction, and similar uses. SCWS is written in pure C, targets Unix-like operating systems as its main platform, and provides a shared library so that it can be embedded in existing software systems. It also supports Chinese character encodings such as GBK, UTF-8, and BIG5, and segments text efficiently.

System platform: Windows/Unix
Development language: C
How to use: PHP extension

Demo URL: http://www.ftphp.com/scws/demo.php
Open Source Official website: http://www.ftphp.com/scws/

Sunny Maple's note: Because it is a PHP extension, it integrates easily with existing PHP-based web systems, which is a big advantage.

PhpAnalysis - a PHP word segmentation system that requires no extension

The PhpAnalysis word segmentation system is based on string matching, also called mechanical word segmentation. Following a chosen strategy, it matches the Chinese character string being analyzed against the entries of a "sufficiently large" machine dictionary; if a string is found in the dictionary, the match succeeds and a word is recognized. String-matching methods can be classified in three ways. By scanning direction, they are divided into forward matching and reverse matching. By which length takes priority, they are divided into maximum (longest) matching and minimum (shortest) matching. By whether segmentation is combined with part-of-speech tagging, they are divided into pure segmentation methods and integrated segmentation-and-tagging methods.
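To make the string-matching idea concrete, here is a minimal sketch of forward maximum matching. It is not PhpAnalysis's actual implementation: the function name forward_max_match, the four-entry dictionary, and the maximum word length of 4 are all assumptions chosen for illustration, and the code assumes PHP 7 or later with the mbstring extension.

```php
<?php
// Hypothetical helper (not part of PhpAnalysis): forward maximum matching.
// $dict maps known words to any non-null value; $maxLen is the longest
// word length, in characters, that will be tried.
function forward_max_match(string $text, array $dict, int $maxLen = 4): array
{
    $words = [];
    $len = mb_strlen($text, 'UTF-8');
    $pos = 0;
    while ($pos < $len) {
        $matched = false;
        // Scan forward: try the longest candidate first, then shrink it.
        for ($size = min($maxLen, $len - $pos); $size > 1; $size--) {
            $candidate = mb_substr($text, $pos, $size, 'UTF-8');
            if (isset($dict[$candidate])) {
                $words[] = $candidate;
                $pos += $size;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary entry starts here: emit a single character.
            $words[] = mb_substr($text, $pos, 1, 'UTF-8');
            $pos++;
        }
    }
    return $words;
}

// Toy dictionary, assumed for this example only.
$dict = array_flip(['中文', '分词', '系统', '中文分词']);
print_r(forward_max_match('中文分词系统', $dict));
```

With this toy dictionary the longest entry wins, so the input is split into 中文分词 and 系统 rather than 中文, 分词, and 系统. Reverse matching would run the same loop from the end of the string, and minimum matching would try the shortest candidate first.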

System Platform: PHP Environment

Development language: PHP

How to use: HTTP service

Demo URL: http://www.itgrass.com/phpanalysis/
Open Source Official website: http://www.itgrass.com/phpanalysis/

Sunny Maple's note: It is simple and easy to use, and adequate for simple applications, but on large data volumes it is less efficient than the systems described earlier.

Having tried several systems, I found that basic word segmentation works without problems in all of them. They differ in how they split certain individual words and in how they determine part of speech.

Http://www.php.net/codes/40139.html
