Introduction to automatic keyword extraction with PHP Chinese word segmentation

Last Update:2018-04-03 Source: Internet

Author: User


We have used SCWS and PHPAnalysis, two well-known Chinese word segmentation libraries; both are described under Related. The code is as follows:
<?php
header('Content-Type: text/html; charset=utf-8');
define('APP_ROOT', str_replace('\\', '/', dirname(__FILE__)));
$test = 'Here is a Chinese test code!';

// Return the five top-ranked words of $title as an array, using PSCWS4.
function get_tags_arr($title)
{
    // require_once avoids redeclaring the class if the function is called twice.
    require_once APP_ROOT . '/pscws4.class.php';
    $pscws = new PSCWS4();
    $pscws->set_dict(APP_ROOT . '/scws/dict.utf8.xdb');
    $pscws->set_rule(APP_ROOT . '/scws/rules.utf8.ini');
    $pscws->set_ignore(true); // ignore punctuation
    $pscws->send_text($title);
    $words = $pscws->get_tops(5);
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    $pscws->close();
    return $tags;
}
print_r(get_tags_arr($test));
// ======================================================================
// Return the segmentation result of $content as a string, using PhpAnalysis.
function get_keywords_str($content)
{
    require_once APP_ROOT . '/phpanalysis.class.php';
    PhpAnalysis::$loadInit = false;
    $pa = new PhpAnalysis('utf-8', 'utf-8', false);
    $pa->LoadDict();
    $pa->SetSource($content);
    $pa->StartAnalysis(false);
    $tags = $pa->GetFinallyResult();
    return $tags;
}
print(get_keywords_str($test));
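
The get_tops(5) call above asks the segmenter for its five highest-ranked words. As a rough illustration of the idea, and not of how SCWS actually weights words, the sketch below ranks already-segmented words by raw frequency. The function name top_keywords, its parameters, and the sample word list are all hypothetical.

```php
<?php
// Rank already-segmented words by raw frequency and keep the top $limit.
// Simplified illustration: real segmenters also weight words (for example
// by rarity), which is not modelled here.

function top_keywords(array $words, int $limit = 5, array $stopWords = []): array
{
    $counts = [];
    foreach ($words as $word) {
        // Count UTF-8 characters without depending on the mbstring extension.
        $length = count(preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY));
        // Skip stop words and single-character tokens, which rarely make good tags.
        if ($length < 2 || in_array($word, $stopWords, true)) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    arsort($counts); // highest count first, keys preserved
    // PHP turns numeric string keys into integers, so cast back to strings.
    return array_map('strval', array_slice(array_keys($counts), 0, $limit));
}

// Hypothetical output of a segmenter for a short text.
$words = ['分词', '系统', '分词', '的', '关键词', '分词', '系统'];
echo implode(' / ', top_keywords($words, 2)), "\n"; // 分词 / 系统
```

Words with equal counts may come out in either order on older PHP versions, so a real implementation would need an explicit tie-breaking rule.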

Related

SCWS: Simple Chinese Word Segmentation system

Conceptually, SCWS contains nothing particularly innovative. It uses a word frequency dictionary that its author collected, supplemented by a set of rules for proper nouns, personal names, place names, numbers, and dates. In small-scale tests its accuracy is roughly 90% to 95%, which is generally enough for small and medium-sized search engines, keyword extraction, and similar uses. SCWS is written in pure C, targets Unix-like operating systems as its main platform, and provides a shared library so that it can be embedded in existing software systems. It supports GBK, UTF-8, BIG5, and other Chinese character encodings, and it segments text efficiently.

System Platform: Windows/Unix
Development Language: C
Usage: PHP extension

Demo URL: http://www.ftphp.com/scws/demo.php
Open Source official website: http://www.ftphp.com/scws/

Qingfeng notes: because it is available as a PHP extension, SCWS integrates easily with existing PHP-based web systems, which is a major advantage.

PHPAnalysis: a PHP word segmentation system that requires no extension

The PHPAnalysis word segmentation system is based on string matching, also called mechanical word segmentation. Following a chosen strategy, it matches the Chinese character string being analyzed against the entries of a "sufficiently large" machine dictionary; if a string is found in the dictionary, the match succeeds and a word is recognized. String-matching methods can be classified in three ways: by scanning direction, into forward and reverse matching; by which length is tried first, into maximum (longest) and minimum (shortest) matching; and by whether segmentation is combined with part-of-speech tagging, into pure segmentation methods and integrated methods that segment and tag together.
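
To make the scanning-direction distinction concrete, here is a minimal sketch of forward and reverse maximum matching over a tiny dictionary. It illustrates the general technique only and is not taken from PHPAnalysis; the function names fmm_segment and rmm_segment, the $maxLen parameter, and the four-word dictionary are all assumptions made for the example.

```php
<?php
// Forward and reverse maximum matching over a tiny, hypothetical dictionary.

// Split a UTF-8 string into characters without needing the mbstring extension.
function utf8_chars(string $text): array
{
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Scan left to right, always taking the longest dictionary word that fits.
function fmm_segment(string $text, array $dict, int $maxLen = 4): array
{
    $chars = utf8_chars($text);
    $n = count($chars);
    $words = [];
    $i = 0;
    while ($i < $n) {
        $step = 1; // fall back to a single character when nothing matches
        for ($len = min($maxLen, $n - $i); $len > 1; $len--) {
            if (isset($dict[implode('', array_slice($chars, $i, $len))])) {
                $step = $len;
                break;
            }
        }
        $words[] = implode('', array_slice($chars, $i, $step));
        $i += $step;
    }
    return $words;
}

// Scan right to left, always taking the longest dictionary word that fits.
function rmm_segment(string $text, array $dict, int $maxLen = 4): array
{
    $chars = utf8_chars($text);
    $end = count($chars);
    $words = [];
    while ($end > 0) {
        $step = 1;
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            if (isset($dict[implode('', array_slice($chars, $end - $len, $len))])) {
                $step = $len;
                break;
            }
        }
        // Prepend so the result stays in reading order.
        array_unshift($words, implode('', array_slice($chars, $end - $step, $step)));
        $end -= $step;
    }
    return $words;
}

// Hypothetical mini-dictionary: keys are words, values are unused flags.
$dict = ['研究' => true, '研究生' => true, '生命' => true, '起源' => true];
$text = '研究生命起源';
echo implode(' / ', fmm_segment($text, $dict)), "\n"; // 研究生 / 命 / 起源
echo implode(' / ', rmm_segment($text, $dict)), "\n"; // 研究 / 生命 / 起源
```

The two directions disagree on this sentence, which is one reason different segmenters can divide the same text differently.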

System platform: PHP environment

Development Language: PHP

Usage: HTTP service

Demo URL: http://www.itgrass.com/phpanalysis/
Open Source official website: http://www.itgrass.com/phpanalysis/

Qingfeng notes: it is simple to implement and easy to use, and it suits simple applications, but on large volumes of data it is less efficient than the systems described before it.

I tried several systems and found that basic word segmentation works acceptably in all of them, but they differ in how they divide certain words and in how they determine parts of speech.

http://www.bitsCN.com/codes/40139.html
