I. Introduction to Coreseek
Official website: http://www.coreseek.cn/
Coreseek is Chinese full-text search software, released as open source under the GPLv2 license. It is developed on top of Sphinx and published independently, and it specializes in Chinese search and information processing. Typical application scenarios include industry/vertical search, forum/site search, database search, document/literature search, information retrieval, and data mining. Commercial use (for example, embedding it in other programs) requires a commercial license.
Coreseek is a full-text search engine with Chinese support, intended to give other applications fast, low-footprint, high-relevance Chinese full-text search. It integrates easily with SQL databases and scripting languages.
The native search API shipped with the Sphinx release supports PHP, Python, Perl, Ruby, and Java. The search API is very lightweight and can be ported to a new language within a few hours. Third-party APIs and plug-ins add support for Perl, C#, Haskell, Ruby on Rails, and possibly other languages or frameworks.
II. Installing Coreseek
Note: This tutorial installs Coreseek on CentOS with MySQL as the data source. MySQL installation is not covered.
1. Download the Coreseek 3.2 stable release (for other versions, see the official website)
cd /usr/local/src/
wget http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz
tar xzvf coreseek-3.2.14.tar.gz
cd coreseek-3.2.14
Several packages must be installed before Coreseek can be built. The command below is for 64-bit CentOS; for other systems, see http://www.coreseek.cn/product_install/install_on_bsd_linux/#deps
yum install make gcc g++ gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel
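Before moving on, it can help to confirm that the build tools are actually on the PATH. The following is a minimal sketch, not part of the official instructions: missing_tools is a hypothetical helper name, and the list of commands checked is an assumption based on the packages named above (the -devel packages provide headers rather than commands, so they are not covered).

```shell
#!/bin/sh
# Print the name of every given command that cannot be found on PATH.
# Prints nothing when all commands are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed list of build commands; adjust to your system.
missing_tools make gcc g++ libtool autoconf automake
```

If the script prints nothing, all of the listed commands were found; any name it prints still needs to be installed.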
2. Install mmseg
$ cd mmseg-3.2.14
$ ./bootstrap    # warnings in the output can be ignored; errors must be resolved
$ ./configure --prefix=/usr/local/mmseg3
$ make && make install
$ cd ..
## If you see "libtool: unrecognized option '--tag=CC'", see the libtool problem solution
## After installation, the dictionaries and configuration files used by mmseg are installed into /usr/local/mmseg3/etc
## Chinese word segmentation test; if the output is garbled, check the locale and UTF-8 display settings of the current environment
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc src/t1.txt
Chinese/x/x Word/x Test/x
Chinese/x Shanghai/x
Word Splite took: 1 ms.
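If the segmentation output looks garbled, the locale is the first thing to check. The sketch below is one way to do that check; is_utf8_locale is a hypothetical helper name, and the en_US.UTF-8 locale suggested in the message is an assumption that may not be installed on your system.

```shell
#!/bin/sh
# Return 0 if the given locale name uses a UTF-8 character set, 1 otherwise.
is_utf8_locale() {
    case "$1" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective character-type locale: LC_ALL overrides LC_CTYPE, which overrides LANG.
current_locale=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}

if is_utf8_locale "$current_locale"; then
    echo "Locale '$current_locale' is UTF-8"
else
    # The suggested locale name is an assumption; use one listed by `locale -a`.
    echo "Locale '$current_locale' is not UTF-8; try: export LANG=en_US.UTF-8"
fi
```

The terminal emulator itself must also be set to UTF-8, which this script cannot verify.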
3. Install Coreseek
$ cd csft-3.2.14
## Run configure to set up the build:
$ sh buildconf.sh
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
If configure reports that the MySQL include files cannot be found, use the following command instead, adjusting the MySQL paths to your installation:
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql-includes=/alidata/server/mysql/include/ --with-mysql-libs=/alidata/server/mysql/bin/
$ make && make install
4. Test Coreseek
$ cd ../testpack
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf
## Normal output looks like this:
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
total 0 reads, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 0 writes, 0.000 sec, 0.0 kb/call avg, 0.0 msec/call avg
##
## csft-4.0 prints this instead: ERROR: nothing to do.
##
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
## Normal output when indexing all data (csft-4.0 is similar):
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
indexing index 'xml'...
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 7585 bytes
total 0.075 sec, 101043 bytes/sec, 39.96 docs/sec
total 2 reads, 0.000 sec, 5.6 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 3.9 kb/call avg, 0.0 msec/call avg
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf xml
## Normal output when indexing one named index (csft-4.0 is similar):
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
indexing index 'xml'...
collected 3 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 3 docs, 7585 bytes
total 0.069 sec, 109614 bytes/sec, 43.35 docs/sec
total 2 reads, 0.000 sec, 5.6 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 3.9 kb/call avg, 0.0 msec/call avg
$ /usr/local/coreseek/bin/search -c etc/csft.conf
## Normal output for a test search (csft-4.0 is similar):
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
index 'xml': query '': returned 3 matches of 3 total in 0.093 sec
displaying matches:
1. document=1, weight=1, published=Thu Apr 1 22:20:07, author_id=1
2. document=2, weight=1, published=Thu Apr 1 23:25:48, author_id=1
3. document=3, weight=1, published=Thu Apr 1 12:01:00, author_id=2
words:
$ /usr/local/coreseek/bin/search -c etc/csft.conf -a Twittter and opera both provide search services
## Normal output for a keyword test search (csft-4.0 is similar):
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
index 'xml': query 'Twittter and opera both provide search services': returned 3 matches of 3 total in 0.038 sec
displaying matches:
1. document=3, weight=24, published=Thu Apr 1 12:01:00, author_id=2
2. document=1, weight=4, published=Thu Apr 1 22:20:07, author_id=1
3. document=2, weight=3, published=Thu Apr 1 23:25:48, author_id=1
words:
1. 'Twittter': 1 documents, 3 hits
2. 'AND': 3 documents, hits
3. 'Opera': 1 documents, hits
4. 'All': 2 documents, 4 hits
5. 'Offer': 0 documents, 0 hits
6. 'Up': 3 documents, hits
7. 'Search': 2 documents, 5 hits
8. 'Service': 1 documents, 1 hits
$ /usr/local/coreseek/bin/searchd -c etc/csft.conf
## Normal output when starting the search service (csft-4.0 is similar):
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2010,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file 'etc/csft.conf'...
listening on all interfaces, port=9312
III. Configuring Coreseek to use a MySQL data source
1. Configure the csft_mysql.conf file
Copy the sample MySQL configuration file into the etc/ directory of the Coreseek installation (e.g. /usr/local/coreseek/etc/):
cp /usr/local/src/coreseek-3.2.14/testpack/etc/csft_mysql.conf /usr/local/coreseek/etc/
cd /usr/local/coreseek/etc/
vi csft_mysql.conf
Adjust the environment-specific values below to match your own setup (the original post highlighted them in red).
Official reference: data source configuration, MySQL data source: http://www.coreseek.cn/products-install/datasource/
For other data sources, see the official documentation.
==============================================================
# source definition
source phperz
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass = xxxx
    sql_db = phperz
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, descs, status FROM article
    # the first column of sql_query (id) must be an integer
    # title and descs are string/text fields and are full-text indexed
    sql_attr_uint = status    # the value read from SQL must be an integer
    # sql_attr_timestamp = date_added    # the value read from SQL must be an integer; used as a time attribute
    sql_query_info_pre = SET NAMES utf8    # sets the correct character set for command-line queries
    sql_query_info = SELECT * FROM article WHERE id=$id    # for command-line queries: reads the original row from the database
}
# index definition
index phperz
{
    source = phperz    # name of the corresponding source
    path = /usr/local/coreseek/var/data/phperz    # change to the absolute path you actually use, for example: /usr/local/coreseek/var/...
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    # Chinese word segmentation settings; for details see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath = /usr/local/mmseg3/etc/    # setting for BSD and Linux; must end with /
    # charset_dictpath = etc/    # setting for Windows; must end with /; an absolute path is best, for example: C:/usr/local/coreseek/etc/...
    charset_type = zh_cn.utf-8
}
# global indexer definition
indexer
{
    mem_limit = 128M
}
# searchd service definition
searchd
{
    listen = 9312
    read_timeout = 5
    max_children = 30
    max_matches = 1000
    seamless_rotate = 0
    preopen_indexes = 0
    unlink_old = 1
    pid_file = /usr/local/coreseek/var/log/searchd_mysql.pid    # change to the absolute path you actually use, for example: /usr/local/coreseek/var/...
    log = /usr/local/coreseek/var/log/searchd_mysql.log    # change to the absolute path you actually use, for example: /usr/local/coreseek/var/...
    query_log = /usr/local/coreseek/var/log/query_mysql.log    # change to the absolute path you actually use, for example: /usr/local/coreseek/var/...
}
==============================================================
2. Build the index
Change the paths to match your own installation.
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all
An error you may encounter:
ERROR: index 'phperz': sql_connect: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) (DSN=mysql://root:***@localhost:3306/phperz).
This happens because the MySQL socket file is not at the path the client expects.
Confirm the actual path of your mysql.sock and create a symbolic link to it, for example:
ln -s /tmp/mysql.sock /var/lib/mysql/mysql.sock
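To find out where the socket actually lives before creating the link, a small search over likely locations can help. This is only a sketch: first_existing is a hypothetical helper name, and the three candidate paths are assumptions about common MySQL layouts, not values taken from this installation.

```shell
#!/bin/sh
# Print the first path from the argument list that exists; return 1 if none do.
first_existing() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate socket locations are assumptions; adjust them to your MySQL setup.
if sock=$(first_existing /tmp/mysql.sock /var/lib/mysql/mysql.sock /var/run/mysqld/mysqld.sock); then
    echo "MySQL socket found at: $sock"
else
    echo "No socket found; check the socket setting in my.cnf" >&2
fi
```

As an alternative to the symbolic link, the Sphinx source block accepts a sql_sock setting that points the indexer directly at the socket path found this way.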
3. Once indexing has finished, run a test search:
/usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf I am a little apple
Test results (see below):
Coreseek fulltext 3.2 [Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
using config file '/usr/local/coreseek/etc/csft.conf'...
index 'mysql': query 'I am a little apple': returned 1 matches of 1 total in 0.003 sec
displaying matches:
1. document=291, weight=4, prize=1
id=291
winner_name=Chaoli
subject_name=I am a little apple
school_name=Beijing Haidian District First Kindergarten
sub_url=http://www.xxxxx.com
prize=1
words:
1. 'Me': documents, hits
2. 'Yes': documents, hits
3. 'Little': 5 documents, 5 hits
4. 'Apple': 2 documents, 2 hits
------------------ end of test results ------------------
IV. Installing the Sphinx extension for PHP (using Coreseek from PHP)
cd /web/src/coreseek-3.2.14/csft-3.2.14/api/libsphinxclient
./configure --prefix=/usr/local/sphinxclient
make && make install
cd /web/src/sphinx-1.3.0
/usr/local/php/bin/phpize
./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/sphinxclient
make
make install
Edit php.ini (vi /usr/local/php/etc/php.ini) and add the following two lines:
[Sphinx]
extension=sphinx.so
The Sphinx extension is now installed. Restart Apache and test it.
The test code is as follows:
<?php
$cl = new SphinxClient();
// Set the Sphinx server address and port; use localhost if searchd runs on this machine
$cl->setServer("192.168.1.23", 9312); // must match the searchd listen port
// Return the matches as a plain array
$cl->setArrayResult(true);
$cl->setMatchMode(SPH_MATCH_BOOLEAN);
$result = $cl->query('I am a little apple', 'mysql'); // arguments: keywords, index name
if ($result === false) {
    echo "Query failed: " . $cl->getLastError() . ".\n";
}
else {
    if ($cl->getLastWarning()) {
        echo "WARNING: " . $cl->getLastWarning() . "\n";
    }
    print_r($result);
}
?>
V. Routine Coreseek maintenance
Start the search service:
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf
Stop the search service:
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop
Build the index:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all
Rebuild the index while searchd is running:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
Add the start command to the system startup scripts so that searchd starts at boot.
Add the rebuild-index command to a scheduled task (cron) so that it runs daily.
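The daily rebuild can be expressed as a crontab entry. The sketch below only prints a suggested line so it can be reviewed before being added with crontab -e. The function names, the 03:00 schedule, and the log file path are all assumptions chosen for illustration; the indexer and configuration paths are the ones used in this article.

```shell
#!/bin/sh
# Paths follow the ones used in this article.
INDEXER=/usr/local/coreseek/bin/indexer
CONF=/usr/local/coreseek/etc/csft_mysql.conf

# Build the rebuild command line without running the indexer.
reindex_command() {
    printf '%s -c %s --all --rotate\n' "$INDEXER" "$CONF"
}

# Hypothetical crontab line: run at 03:00 every day and append output to a log.
# Both the schedule and the log path are assumptions.
cron_line() {
    printf '0 3 * * * %s >> /var/log/coreseek_reindex.log 2>&1\n' "$(reindex_command)"
}

cron_line
```

For starting searchd at boot, one common choice on CentOS is to append the start command to /etc/rc.local; that file is an assumption here, since the article does not name a specific mechanism.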