This article shows how to use SCWS from PHP to implement full-text search for MySQL. It is shared here for reference; the details follow.
SCWS is a solid Chinese word segmenter that is easy to learn. It includes rule sets for proper nouns, personal names, place names, numbers, dates and similar patterns, so it can split a sentence directly into keywords according to those rules, with an accuracy of 90% to 95%. To use it, follow the installation instructions to place the SCWS extension in PHP's extension directory, download the rule and dictionary files, and reference them in the PHP configuration file.
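As a rough illustration of the configuration step described above, the php.ini entries might look like the following. The directory /usr/local/scws/etc and the file name scws.so are assumptions about a typical Unix install, not values given in this article; adjust them to wherever the extension, dictionary and rule files actually live.

```ini
[scws]
; Load the extension (assumes scws.so sits in PHP's extension_dir)
extension = scws.so
; Default character set; it should match the dictionary and rule files
scws.default.charset = utf8
; Assumed directory that holds the dictionary and rule files
scws.default.fpath = /usr/local/scws/etc
```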
Changes in this SCWS release:
1) Modified the PHP extension code to support PHP 5.4.x.
2) Fixed the problem that the limit parameter of scws_get_tops in the PHP extension could not be set below 10.
3) libscws adds scws_fork(), which creates a branch from an existing SCWS instance and shares its dictionary and rule set, mainly for multi-threaded development.
4) Added Win32 DLL builds of the extension for some versions.
The PHP example code is as follows:
<?php
// Instantiate the word segmenter
$so = scws_new();
// Set the character encoding used for segmentation
$so->set_charset('utf-8');
// Set the dictionary (a UTF-8 dictionary is used here)
$so->set_dict('/path/dict.utf8.xdb');
// Set the segmentation rule file
$so->set_rule('/path/rules.utf8.ini');
// Ignore punctuation when segmenting
$so->set_ignore(true);
// Enable compound segmentation, e.g. "中国人" returns the three words "中国", "人" and "中国人"
$so->set_multi(true);
// Automatically aggregate leftover single characters into two-character words
$so->set_duality(true);
// The text to segment
$so->send_text("Welcome to IT development in the Martian Era");
// Fetch the results batch by batch; to extract high-frequency words, use get_tops() instead
while ($tmp = $so->get_result())
{
    print_r($tmp);
}
$so->close();
?>
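The loop above only prints each batch of results. As a minimal sketch of a possible next step, the helper below flattens such batches into a list of distinct words that could then be stored or searched. The name collect_words is hypothetical, and the sketch assumes each batch is a list of arrays carrying a 'word' key, which is how the SCWS documentation describes the output of get_result().

```php
<?php
// Hypothetical helper: flatten get_result() batches into a list of distinct words.
// Assumes every batch is a list of arrays that carry a 'word' key.
function collect_words(array $batches): array
{
    $seen = [];
    foreach ($batches as $batch) {
        foreach ($batch as $item) {
            if (isset($item['word']) && $item['word'] !== '') {
                $seen[$item['word']] = true; // array keys deduplicate the words
            }
        }
    }
    // Numeric-looking keys are turned into integers by PHP, so cast back to strings.
    return array_map('strval', array_keys($seen));
}
```

In the example above, each $tmp could be appended to a $batches array inside the while loop, and collect_words($batches) called after the loop ends.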
Note: as in the example above, the input text, the dictionary and the rule file must all use the same character set. In addition, some MySQL 4.x versions do not support Chinese full-text search; as a workaround, you can store the location codes that correspond to the keywords rather than the keywords themselves, so that full-text search still works.
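The workaround in the note can be sketched roughly as follows. This is only an illustration under stated assumptions: instead of real location codes, it uses the hexadecimal form of each word's bytes as an ASCII-only stand-in, and the helper names encode_word and encode_words are hypothetical. The idea is the same either way: index and query ASCII tokens rather than raw Chinese text.

```php
<?php
// Hypothetical helper: turn one word into an ASCII-only token.
// Hex of the raw bytes stands in for a location code here (an assumption).
function encode_word(string $word): string
{
    return bin2hex($word);
}

// Hypothetical helper: encode a list of words into one space-separated string.
function encode_words(array $words): string
{
    return implode(' ', array_map('encode_word', $words));
}

// Index time: store this string in a FULLTEXT-indexed column.
// The byte escapes spell the UTF-8 words "中国" and "人".
$indexed = encode_words(["\xE4\xB8\xAD\xE5\x9B\xBD", "\xE4\xBA\xBA"]);

// Query time: encode the search terms the same way before using them in MATCH ... AGAINST.
$query = encode_words(["\xE4\xBA\xBA"]);
```

One side effect of this choice is that even a single three-byte UTF-8 character becomes a six-character token, which stays above MySQL's default minimum indexed word length (ft_min_word_len, 4 by default for MyISAM full-text indexes).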
Version List
Version | Type | Platform | Performance
scws-1.1.x | C code | *Unix*/*PHP* | Accuracy: 95%, recall: 91%, speed: 1.2 MB/sec; PHP extension segmentation speed: 250 KB/sec
php_scws.dll (1) | PHP extension library | Windows/PHP 4.4.x | Accuracy: 95%, recall: 91%
php_scws.dll (2) | PHP extension library | Windows/PHP 5.2.x | Accuracy: 95%, recall: 91%
php_scws.dll (3) | PHP extension library | Windows/PHP 5.3.x | Accuracy: 95%, recall: 91%
php_scws.dll (4) | PHP extension library | Windows/PHP 5.4.x | Accuracy: 95%, recall: 91%
PSCWS23 | PHP source code | Any (UTF-8 not supported) | Accuracy: 93%, recall: 89%
PSCWS4 | PHP source code | Any | Accuracy: 95%, recall: 91%
I hope this article will help you with your PHP program design.