Install Sphinx
Recommended version: Sphinx 2.0.7-release (http://sphinxsearch.com)
    wget http://sphinxsearch.com/files/sphinx-2.0.8-release.tar.gz
    tar zxvf sphinx-2.0.8-release.tar.gz
    cd sphinx-2.0.8-release
    ./configure --prefix=/usr/local/webserver/sphinx --with-mysql=
    make && make install

Note:
    --prefix: where to install Sphinx; my installation directory is "/usr/local/webserver/sphinx"
    --with-mysql: the MySQL installation directory
For other parameters, run ./configure --help. The parameters above are the recommended ones.
Run the Sphinx searchd command. If it starts and prints its startup message, the installation is successful. Depending on your configuration, /usr/local/webserver/sphinx/bin needs to be added to the PATH environment variable first.
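If the shell cannot find searchd after installation, the bin directory is most likely missing from PATH. The following is a minimal sketch, assuming the install prefix used in this guide. The helper name add_to_path is made up for illustration and is not part of Sphinx.

```shell
#!/bin/sh
# Assumption: Sphinx was installed with --prefix=/usr/local/webserver/sphinx.
SPHINX_BIN=/usr/local/webserver/sphinx/bin

# add_to_path is an illustrative helper: prepend a directory to PATH
# only if it is not already listed there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

add_to_path "$SPHINX_BIN"
export PATH

# command -v reports whether the shell can now resolve searchd.
if command -v searchd >/dev/null 2>&1; then
    echo "searchd found at: $(command -v searchd)"
else
    echo "searchd not found; check the install prefix" >&2
fi
```

To make the change permanent, the PATH assignment can be placed in a shell profile such as ~/.bash_profile.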
Install SCWS
Download page: http://www.xunsearch.com/scws/download.php
Installing SCWS
    wget http://www.xunsearch.com/scws/down/scws-1.2.2.tar.bz2
    tar xvf scws-1.2.2.tar.bz2
    cd scws-1.2.2
    ./configure --prefix=/usr/local/webserver/scws
    make && make install

Installing the PHP scws extension (go back to the SCWS source directory scws-1.2.2)
    cd phpext/
    /usr/local/webserver/php/bin/phpize
    ./configure --with-scws=/usr/local/webserver/scws/ --with-php-config=/usr/local/webserver/php/bin/php-config
    make && make install

Configure php.ini by adding the following:
    [scws]
    extension = scws.so
    scws.default.charset = utf8
    ; The following path is the value of --with-scws at compile time
    scws.default.fpath = /usr/local/webserver/scws/
Check that the extension is loaded:
    php phpinfo.php | grep scws
    scws
    scws bugreport => http://www.xunsearch.com/scws
    scws.default.charset => utf8 => utf8
    scws.default.fpath => /usr/local/webserver/scws/ => /usr/local/webserver/scws/
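The same check can be scripted without a phpinfo.php file, since php -m prints the list of loaded modules. This is a sketch only. The function name has_scws and the PHP_BIN variable are illustrative names, and PHP_BIN is assumed to point at the php CLI built earlier (for example /usr/local/webserver/php/bin/php).

```shell
#!/bin/sh
# Assumption: PHP_BIN points at the php CLI; it defaults to whatever
# "php" resolves to on PATH.
PHP_BIN=${PHP_BIN:-php}

# has_scws is an illustrative helper: succeed if "php -m" (the list of
# loaded modules) contains a line that is exactly "scws".
has_scws() {
    "$PHP_BIN" -m 2>/dev/null | grep -qix 'scws'
}

if has_scws; then
    echo "scws extension is loaded"
else
    echo "scws extension is NOT loaded; check extension = scws.so in php.ini" >&2
fi
```

If the extension is reported as not loaded, confirm that the php.ini being edited is the one the CLI actually reads (php --ini lists it).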
Install the SCWS dictionary
    cd /usr/local/webserver/scws/etc/
    wget http://www.xunsearch.com/scws/down/scws-dict-chs-gbk.tar.bz2
    tar xvjf scws-dict-chs-gbk.tar.bz2
    wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
    tar xvjf scws-dict-chs-utf8.tar.bz2

Note: SCWS only works locally on the machine where it is installed. In a clustered deployment, SCWS must be installed on every web server.
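Because every web server needs its own copy, it can help to confirm on each machine that the dictionaries were unpacked. The sketch below rests on one assumption worth checking: that the two archives unpack to dict.xdb (GBK) and dict.utf8.xdb (UTF-8). The helper name missing_dicts and the SCWS_ETC variable are illustrative.

```shell
#!/bin/sh
# Assumption: the two archives unpack to dict.xdb (GBK) and
# dict.utf8.xdb (UTF-8); adjust the names if your archives differ.
SCWS_ETC=${SCWS_ETC:-/usr/local/webserver/scws/etc}

# missing_dicts is an illustrative helper: print each expected
# dictionary file that is absent from the given directory.
missing_dicts() {
    for name in dict.xdb dict.utf8.xdb; do
        [ -f "$1/$name" ] || echo "$name"
    done
}

missing=$(missing_dicts "$SCWS_ETC")
if [ -z "$missing" ]; then
    echo "all dictionaries present in $SCWS_ETC"
else
    echo "missing in $SCWS_ETC:" $missing >&2
fi
```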
Brief introduction
SCWS stands for Simple Chinese Word Segmentation (that is, a simple Chinese word segmentation system).
It is a mechanical Chinese word segmentation engine based on a word-frequency dictionary, and it can cut a whole passage of Chinese text into words with mostly correct results. The word is the smallest morpheme unit in Chinese, but unlike English, written Chinese does not separate words with spaces, so segmenting accurately and quickly has always been the central difficulty of Chinese word segmentation.
SCWS is written in pure C and does not depend on any external libraries. It can be embedded in an application directly as a dynamic link library, and the supported Chinese encodings include GBK and UTF-8. A PHP extension is also provided, so the segmenter can be used quickly and conveniently from PHP.
The segmentation algorithm itself contains little that is new. It uses a word-frequency dictionary collected by the author, supplemented by rule sets for proper nouns, personal names, place names, numbers, years, and the like, to achieve basic segmentation. In small-scale tests the accuracy was between 90% and 95%, which is generally enough for small search engines, keyword extraction, and similar uses. The first prototype version was released at the end of 2005.
SCWS was developed by hightman and is released under the BSD license. The source code is hosted on GitHub.