Setting up coreseek (sphinx + mmseg3): detailed installation and configuration, PHP sphinx extension installation, and a PHP call example
This document covers installation, incremental indexing, the PHP extension, and API call examples in one place, to save you the time of searching through a large number of articles.
Installing coreseek (sphinx + mmseg3)
[Step 1] Install mmseg3 first
cd /var/install
wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
tar zxvf coreseek-4.1-beta.tar.gz
cd coreseek-4.1-beta
cd mmseg-3.2.14
./bootstrap
./configure --prefix=/usr/local/mmseg3
make && make install
If you get "error: cannot find input file: src/Makefile.in" or a similar error, run the following commands in sequence. (Running aclocal also failed for me until libtool was installed, hence the first command.)
yum -y install libtool
aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
Once libtool is installed, run the commands above starting from aclocal, and then repeat the original installation steps.
[Step 2] Install coreseek
# Install coreseek
$ cd csft-3.2.14    # or cd csft-4.0.1, or cd csft-4.1
$ sh buildconf.sh    # Warnings in the output can be ignored; errors must be resolved
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
# If a MySQL problem is reported, see http://www.coreseek.cn/product_install/install_on_bsd_linux/?mysql for how to install the MySQL data source dependencies
$ make && make install
$ cd ..
# Command-line test of mmseg word segmentation and coreseek search (set the character set to zh_CN.UTF-8 beforehand so that Chinese displays correctly)
$ cd testpack
$ cat var/test.xml    # Chinese text should display here
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test.xml
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
$ /usr/local/coreseek/bin/search -c etc/csft.conf network search
If indexing reports an error such as "xmlpipe2 support NOT compiled in", the XML libraries are missing.
Install them with the following command:
yum -y install expat expat-devel
After the installation, re-compile coreseek and then generate the index.
The result is as follows:
Coreseek Fulltext 4.1 [ Sphinx 2.0.2-dev (r2922)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)
 using config file 'etc/csft.conf'...
index 'xml': query 'network search': returned 1 matches of 1 total in 0.000 sec
displaying matches:
1. document=1, weight=1590, published=Thu Apr 1 07:20:07 2010, author_id=1
words:
1. 'Network': 1 documents, 1 hits
2. 'Search': 2 documents, 5 hits
The following describes how to configure sphinx with MySQL.
Create a sphinx counter table by executing the following statement in the coreseek_test database:
CREATE TABLE sph_counter( counter_id INTEGER PRIMARY KEY NOT NULL, max_doc_id INTEGER NOT NULL);
Create a configuration file that connects sphinx to MySQL:
# vi /usr/local/coreseek/etc/csft_mysql.conf
# MySQL data source configuration. For details, see: http://www.coreseek.cn/products-install/mysql/
# First import var/test/documents.sql into the database, then configure the MySQL user, password, and database below
# source definition
source main    # source name
{
    type                 = mysql
    sql_host             = localhost
    sql_user             = root
    sql_pass             = 123456
    sql_db               = coreseek_test
    sql_port             = 3306
    sql_query_pre        = SET NAMES utf8
    sql_query_pre        = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM hr_spider_company    # update sph_counter
    sql_query            = SELECT * FROM hr_spider_company WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)    # read rows up to the ID recorded in sph_counter
    # The first column returned by sql_query (id) must be an integer
    # title and content are string/text fields and are fully indexed; adjust to the actual database fields
    sql_attr_uint        = from_id     # value read from SQL must be an integer; adjust to the actual database field
    sql_attr_uint        = link_id     # value read from SQL must be an integer; adjust to the actual database field
    sql_attr_uint        = add_time    # value read from SQL must be an integer; adjust to the actual database field
}
# incremental source definition
source delta : main    # the name must match the one used in the index definition
{
    sql_query_pre        = SET NAMES utf8
    sql_query            = SELECT * FROM hr_spider_company WHERE id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)    # read rows newer than the ID recorded in sph_counter
    sql_query_post_index = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM hr_spider_company    # update sph_counter
}
# index definition
index main    # the name must match the source definition
{
    source           = main    # corresponding source name
    path             = /usr/local/coreseek/var/data/mysql    # change to the actual absolute path, e.g. /usr/local/coreseek/var/...
    docinfo          = extern
    mlock            = 0
    morphology       = none
    min_word_len     = 1
    html_strip       = 0
    # Chinese word segmentation configuration. For details, see: http://www.coreseek.cn/products-install/coreseek_mmseg/
    charset_dictpath = /usr/local/mmseg3/etc/    # on BSD and Linux, the path must end with /
    charset_type     = zh_cn.utf-8
}
index delta : main    # the name must match the source definition
{
    source           = delta
    path             = /usr/local/coreseek/var/data/delta
}
# global indexer definition
indexer
{
    mem_limit        = 128M
}
# searchd service definition
searchd
{
    listen           = 9312
    read_timeout     = 5
    max_children     = 30
    max_matches      = 1000
    seamless_rotate  = 0
    preopen_indexes  = 0
    unlink_old       = 1
    pid_file         = /usr/local/coreseek/var/log/searchd_mysql.pid    # change to the actual absolute path, e.g. /usr/local/coreseek/var/...
    log              = /usr/local/coreseek/var/log/searchd_mysql.log    # change to the actual absolute path, e.g. /usr/local/coreseek/var/...
    query_log        = /usr/local/coreseek/var/log/query_mysql.log    # change to the actual absolute path, e.g. /usr/local/coreseek/var/...
    binlog_path      =    # disable binlog
}
My test table is named hr_spider_company; change it to your own table name as needed.
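To see how the two sources divide the table, here is a minimal Python sketch that replays the queries from the configuration above against an in-memory SQLite database. SQLite stands in for MySQL purely for illustration, the company_name column and the sample rows are made up, the ORDER BY is added only to make the result deterministic, and build_main, build_delta, and demo are hypothetical helper names that are not part of coreseek.

```python
import sqlite3

TABLE_SQL = (
    "CREATE TABLE sph_counter (counter_id INTEGER PRIMARY KEY NOT NULL, "
    "max_doc_id INTEGER NOT NULL)",
    "CREATE TABLE hr_spider_company (id INTEGER PRIMARY KEY, company_name TEXT)",
)
UPDATE_COUNTER = "REPLACE INTO sph_counter SELECT 1, MAX(id) FROM hr_spider_company"
COUNTER_VALUE = "(SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)"


def build_main(db):
    """Mimic source 'main': record the current max id, then read rows up to it."""
    db.execute(UPDATE_COUNTER)  # plays the role of sql_query_pre
    rows = db.execute(
        f"SELECT id FROM hr_spider_company WHERE id <= {COUNTER_VALUE} ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def build_delta(db):
    """Mimic source 'delta': read rows above the counter, then advance the counter."""
    rows = db.execute(
        f"SELECT id FROM hr_spider_company WHERE id > {COUNTER_VALUE} ORDER BY id"
    ).fetchall()
    db.execute(UPDATE_COUNTER)  # plays the role of sql_query_post_index
    return [row[0] for row in rows]


def demo():
    db = sqlite3.connect(":memory:")
    for statement in TABLE_SQL:
        db.execute(statement)
    insert = "INSERT INTO hr_spider_company VALUES (?, ?)"
    db.executemany(insert, [(1, "a"), (2, "b"), (3, "c")])  # made-up rows
    main_ids = build_main(db)
    db.executemany(insert, [(4, "d"), (5, "e")])  # rows added after the full build
    first_delta = build_delta(db)
    db.execute(insert, (6, "f"))  # one more row before the next delta build
    second_delta = build_delta(db)
    return main_ids, first_delta, second_delta


if __name__ == "__main__":
    print(demo())
```

Running demo() returns ([1, 2, 3], [4, 5], [6]): the full build covers the rows that existed when it ran, and each delta build picks up only the rows added since the counter last moved. As far as the SQL alone goes, rows from an earlier delta build are no longer selected by the next one, so it seems worth merging delta into main before the delta is rebuilt if those rows must stay searchable.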
Call command list:
Start the background service (must be enabled)
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf
Build the full index (must be run once before querying or testing)
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
Execute incremental index
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
Merge indexes
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
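(Add --merge-dst-range deleted 0 0 to keep multiple keywords from pointing to the same document. The option filters the destination index during the merge, keeping only documents whose deleted attribute is 0, so it assumes the index defines such an attribute.)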
Backend service test
# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf aaa
Stop the background service
# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop
Automated commands:
crontab -e
*/1 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
*/5 * * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0
30 1 * * * /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
The schedule above runs the incremental index every minute, merges the indexes every five minutes, and rebuilds the full index at 01:30 every day.
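Since the delta job fires every minute and the merge every five, two indexer runs can be scheduled at the same moment. One possible way to keep them from overlapping is a small wrapper that takes a non-blocking file lock before launching the command. This is only a sketch for Linux: the lock path and the run_exclusive helper are assumptions made for illustration and are not part of coreseek.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/coreseek-indexer.lock"  # hypothetical lock file location


def run_exclusive(command, lock_path=LOCK_PATH):
    """Run command while holding an exclusive lock on lock_path.

    Returns the command's exit code, or None when another run
    already holds the lock and this run is skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # a previous job is still running; skip this tick
        # The lock is released when lock_file is closed on leaving the block
        return subprocess.call(command)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run
    exit_code = run_exclusive(sys.argv[1:])
    sys.exit(0 if exit_code is None else exit_code)
```

A crontab entry could then call the wrapper with the indexer command as its arguments, for example python3 /path/to/wrapper.py /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate, where /path/to/wrapper.py is a placeholder.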
Installing the sphinx PHP extension
The official Coreseek tutorial recommends that PHP include a PHP file directly. In fact, PHP has a standalone sphinx module that can operate coreseek directly (coreseek is sphinx!). It is already part of the official PHP function library, and the efficiency gain is considerable. However, the PHP module depends on the libsphinxclient package.
[Step 1] Install libsphinxclient
# cd /var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient/
# ./configure --prefix=/usr/local/sphinxclient
configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in    # error: configure failed
// Handling the configure error:
If "config.status: error: cannot find input file: src/Makefile.in" is reported during compilation, run the following commands and then compile again:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean
// Run configure and compile again
# ./configure --prefix=/usr/local/sphinxclient
# make && make install
[Step 2] Install the sphinx PHP extension
http://pecl.php.net/package/sphinx
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini sphinx.ini
# vi sphinx.ini
extension=sphinx.so
# service php-fpm restart
Open phpinfo and check whether the sphinx module is supported.
Example of php calling sphinx:
<?php
$s = new SphinxClient;
$s->setServer("127.0.0.1", 9312);
$s->setMatchMode(SPH_MATCH_PHRASE);
$s->setMaxQueryTime(30);
$res = $s->query("BMW", 'main');    // "BMW" is the keyword, 'main' is the data source
$err = $s->getLastError();
var_dump(array_keys($res['matches']));    // IDs of the matched documents
echo "<br>" . "Read the value in the database by the obtained ID." . "<br>";
echo '<pre>';
var_dump($res);
var_dump($err);
echo '</pre>';
Output result:
array(20) {
  [0] => int(1513) [1] => int(42020) [2] => int(57512) [3] => int(59852)
  [4] => int(59855) [5] => int(60805) [6] => int(94444) [7] => int(94448)
  [8] => int(99229) [9] => int(107524) [10] => int(111918) [11] => int(148)
  [12] => int(178) [13] => int(595) [14] => int(775) [15] => int(860)
  [16] => int(938) [17] => int(1048) [18] => int(1395) [19] => int(1657)
}
Read the value in the database by the obtained ID.
array(10) {
  ["error"] => string(0) ""
  ["warning"] => string(0) ""
  ["status"] => int(0)
  ["fields"] => array(17) {
    [0] => string(3) "cid" [1] => string(8) "link_url" [2] => string(12) "company_name"
    [3] => string(9) "type_name" [4] => string(10) "trade_name" [5] => string(5) "scale"
    [6] => string(8) "homepage" [7] => string(7) "address" [8] => string(9) "city_name"
    [9] => string(8) "postcode" [10] => string(7) "contact" [11] => string(9) "telephone"
    [12] => string(6) "mobile" [13] => string(3) "fax" [14] => string(5) "email"
    [15] => string(11) "description" [16] => string(11) "update_time"
  }
  ["attrs"] => array(3) { ["from_id"] => string(1) "1" ["link_id"] => string(1) "1" ["add_time"] => string(1) "1" }
  ["matches"] => array(20) {
    [1513] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "3171471" ["add_time"] => string(10) "1394853454" } }
    [42020] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2248093" ["add_time"] => string(10) "1394913884" } }
    [57512] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2684470" ["add_time"] => string(10) "1394970833" } }
    [59852] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "3" ["link_id"] => string(1) "0" ["add_time"] => string(10) "1394977527" } }
    [59855] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "3" ["link_id"] => string(1) "0" ["add_time"] => string(10) "1394977535" } }
    [60805] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "3" ["link_id"] => string(1) "0" ["add_time"] => string(10) "1394980072" } }
    [94444] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "3" ["link_id"] => string(1) "0" ["add_time"] => string(10) "1395084115" } }
    [94448] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "3" ["link_id"] => string(1) "0" ["add_time"] => string(10) "1395084124" } }
    [99229] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "1297992" ["add_time"] => string(10) "1395100520" } }
    [107524] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "5" ["link_id"] => string(10) "4294967295" ["add_time"] => string(10) "1395122053" } }
    [111918] => array(2) { ["weight"] => int(2) ["attrs"] => array(3) { ["from_id"] => string(1) "5" ["link_id"] => string(10) "4294967295" ["add_time"] => string(10) "1395127953" } }
    [148] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2770294" ["add_time"] => string(10) "1394852562" } }
    [178] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2474558" ["add_time"] => string(10) "1394852579" } }
    [595] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(6) "534804" ["add_time"] => string(10) "1394852862" } }
    [775] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "3230353" ["add_time"] => string(10) "1394852980" } }
    [860] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2549233" ["add_time"] => string(10) "1394853048" } }
    [938] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "3191382" ["add_time"] => string(10) "1394853114" } }
    [1048] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "3234645" ["add_time"] => string(10) "1394853174" } }
    [1395] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2661219" ["add_time"] => string(10) "1394853375" } }
    [1657] => array(2) { ["weight"] => int(1) ["attrs"] => array(3) { ["from_id"] => string(1) "2" ["link_id"] => string(7) "2670624" ["add_time"] => string(10) "1394853540" } }
  }
  ["total"] => int(543)
  ["total_found"] => int(543)
  ["time"] => float(0.109)
  ["words"] => array(1) { ["BMW"] => array(2) { ["docs"] => int(543) ["hits"] => int(741) } }
}
string(0) ""
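The example prints a placeholder where the rows would be read from the database by the matched IDs. A minimal Python sketch of that step follows, using an in-memory SQLite table in place of the real MySQL table. The sample rows and the fetch_in_match_order and demo helpers are hypothetical; only the IDs and the table and column names are taken from the output above.

```python
import sqlite3


def fetch_in_match_order(db, match_ids):
    """Fetch rows whose id is in match_ids, in the same order as match_ids."""
    if not match_ids:
        return []
    placeholders = ", ".join("?" for _ in match_ids)
    sql = (
        "SELECT id, company_name FROM hr_spider_company "
        f"WHERE id IN ({placeholders})"
    )
    by_id = {row[0]: row for row in db.execute(sql, list(match_ids))}
    # IN (...) does not preserve order, so re-order by the search ranking;
    # IDs no longer present in the table are skipped.
    return [by_id[doc_id] for doc_id in match_ids if doc_id in by_id]


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hr_spider_company (id INTEGER PRIMARY KEY, company_name TEXT)"
    )
    db.executemany(
        "INSERT INTO hr_spider_company VALUES (?, ?)",
        [(148, "company 148"), (1513, "company 1513"), (42020, "company 42020")],
    )
    # IDs in the order the search ranked them; 57512 is absent from this toy table
    return fetch_in_match_order(db, [1513, 42020, 57512, 148])


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In the PHP example the list of IDs would come from array_keys($res['matches']); re-ordering in application code keeps the ranking that the search returned, since the database is free to return the rows in any order.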