Set up coreseek (sphek + mmseg3) Detailed installation configuration + php sphek extension installation +

Source: Internet
Author: User
A document contains installation, Incremental backup, extension, and api call examples, saving time for searching a large number of articles. Build coreseek (sphinx + mmseg3) install [Step 1] install mmseg3cdvarinstallwgethttp: www.coreseek.cnuploadscsft4.0coreseek-4.1-beta.tar.gz tarzxvfcoreseek-4 first

A document contains installation, Incremental backup, extension, and api call examples, saving time for searching a large number of articles. Set up coreseek (sphinx + mmseg3) and install [Step 1] first install mmseg3 cd/var/install wget http: // www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz tar zxvf coreseek

A document contains installation, Incremental backup, extension, and api call examples, saving time for searching a large number of articles.

Set up coreseek (sphinx + mmseg3) Installation

[Step 1] first install mmseg3

Cd/var/installwget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gztar zxvf coreseek-4.1-beta.tar.gzcd coreseek-4.1-betacd mmseg-3.2.14. /bootstrap. /configure -- prefix =/usr/local/mmseg3make & make install: error: cannot find input file: src/Makefile. in or other errors similar to errors... solution: execute the following commands in sequence. An error occurs when I run 'aclocal'. For the solution, see the following description: yum-y install libtoolaclocallibtoolize -- forceautomake -- add-missingautoconfautoheadermake clean.

After 'libtool' is installed, continue to run the preceding commands starting from 'aclocal', and then run the initial installation process.

[Step 2] install coreseek

# Install coreseek $ cd csft-3.2.14 or cd csft-4.0.1 or cd csft-4.1 $ sh buildconf. sh # The output warning information can be ignored. If an error occurs, you need to solve the problem $. /configure -- prefix =/usr/local/coreseek -- without-unixodbc -- with-mmseg-separated des =/usr/local/mmseg3/include/mmseg/--- mmseg-libs =/usr/local/mmseg3/lib/-- with-mysql # If a mysql problem is prompted, for more information about how to install MySQL Data Sources, see http://www.coreseek.cn/product_install/install_on_bsd_linux/?mysql$ make & make install $ cd .. # command line test mmseg word segmentation, coreseek search (you need to set the character set to zh_CN.UTF-8 in advance to ensure the correct display of Chinese) $ cd testpack $ cat var/test. xml # Chinese $/usr/local/mmseg3/bin/mmseg-d/usr/local/mmseg3/etc var/test. xml $/usr/local/coreseek/bin/indexer-c etc/csft. conf -- all $/usr/local/coreseek/bin/search-c etc/csft. conf network search

The xmlpipe2 support NOT compiled in. To use xmlpipe2 and install missing XML libra error occurs.

Run the following command:

yum -y install expat expat-devel

After the installation, re-compile coreseek and then generate the index.

The result is as follows:

Coreseek Fulltext 4.1 [sph00002.0.2-dev (r2922)] Copyright (c) 2007-2011, Beijing Choice Software Technologies Inc (http://www.coreseek.com) using config file 'etc/csft. conf '... index 'xml': query'network search': returned 1 matches of 1 total in 0.000 sec displaying matches: 1. document = 1, weight = 1590, published = Thu Apr 1 07:20:07 2010, author_id = 1 words: 1. 'network': 1 agents, 1 hits 2. 'search': 2 documents, 5 hits

The following describes how to configure sphtracing and mysql.


Create a sphinx statistical table and execute it in the coreseek_test database.

CREATE TABLE sph_counter(    counter_id INTEGER PRIMARY KEY NOT NULL,    max_doc_id INTEGER NOT NULL);

Create a configuration file for configuring sphinx and mysql

# vi /usr/local/coreseek/etc/csft_mysql.conf

# MySQL Data Source configuration. For details, see: http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#conf-reference# Source main {type = mysql SQL _host = localhost SQL _user = root SQL _pass = 123456 SQL _db = coreseek_test SQL _port = 3306 SQL _query_pre = set names utf8 SQL _query_pre = REPLACE INTO sph_counter SELECT 1, MAX (id) FROM hr_spider_company; SQL _query = SELECT * FROM hr_spider_company WHERE id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) # the id of the first column of SQL _query must be an integer. # title and content are used as string/text fields, SQL _attr_uint = id # The value read from SQL must be an integer SQL _attr_uint = from_id # The value read from SQL must be an integer, full-text search SQL _attr_uint = link_id # The value read from SQL must be an integer. Full-text search SQL _attr_uint = add_time # The value read from SQL must be an integer, full-text search SQL _field_string = link_url # string field (full-text search is supported and original text information can be returned) SQL _field_string = company_name # string field (full-text search is supported and original text information can be returned) SQL _field_string = type_name # string field (full-text search is supported and original text information can be returned) SQL _field_string = trade_name # string field (full-text search is supported and original text information can be returned) SQL _field_string = email # string field (full-text search is supported and original text information can be returned) SQL _field_string = description # string field (full-text search is supported and original text information can be returned) SQL _query_info_pre = set names utf8 # When querying through the command line, SET the correct character SET SQL _query_info = SELECT id, from_id, link_id, company_name, type_name, trade_name, address, description, FROM_UNIXTIME (add_time) AS add_time FROM hr_spider_company WHERE id = $ id # When querying through the command line, read the original data FROM the database} source delta: main {SQL _query_pre = set names utf8 SQL _query = SELECT * FROM nation WHERE id> (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) SQL _query_post_index = REPLACE INTO sph_counter SELECT 1, MAX (id) FROM hr_spider_company} # index defines index main {source = main # corresponding source name path =/usr/local/coreseek/var/data/mysql # change to the actual absolute path, example:/usr/local/coreseek/var /... docinfo = extern mlock = 0 morphology = none min_word_len = 1 html_strip = 0 # Chinese Word Segmentation configuration. For details, see: http://www.coreseek.cn/products-install/coreseek_mmseg/ Charset_dictpath =/usr/local/mmseg3/etc/# BSD, set in Linux environment,/character end charset_type = zh_cn.utf-8} index delta: main {source = delta path =/usr/local/coreseek/var/data/delta} # global index definition indexer {mem_limit = 128 M} # searchd service definition searchd {listen = 9312 read_timeout = 5 max_children = 30 max_matches = 1000 bytes = 0 preopen_indexes = 0 unlink_old = 1 pid_file =/usr/local/coreseek/var/log/searchd_mysql.pid # change it to the actual absolute value. path, example:/usr/local/coreseek/var /... log =/usr/local/coreseek/var/log/searchd_mysql.log # change it to the actual absolute path, for example,/usr/local/coreseek/var /... query_log =/usr/local/coreseek/var/log/query_mysql.log # change it to the actual absolute path, for example,/usr/local/coreseek/var /... binlog_path = # disable binlog}

The name of my test table is hr_spider_company. You only need to change it to your own table name as needed.

Call Command list:

Start the Background Service (must be enabled)

# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf

Index execution (query and test must be performed once)

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate

Execute incremental Index

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate

Merge Indexes

/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 0

(To prevent multiple keywords from pointing to the same document, add -- merge-dst-range deleted 0 0)

Backend service test

# /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf  aaa

Disable background services

# /usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf --stop

Automated commands:

crontab -e

*/1 * * * * /bin/sh /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate*/5 * * * * /bin/sh /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge main delta --rotate --merge-dst-range deleted 0 030 1 * * *  /bin/sh /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate

The following task plan means to execute an incremental index every minute, merge the index every five minutes, and execute the overall index at every day.

Extension installation and installation of Sphinx


In the official Coreseek tutorial, php is recommended to directly include a php file. In fact, php has an independent sphsf-module that can directly operate coreseek (coreseek is sphsf !) I have already entered the official php function library, and the efficiency improvement is not a little bit! However, the php module depends on the libsphinxclient package.

[Step 1] install libsphinxclient

# Cd/var/install/coreseek-4.1-beta/csft-4.1/api/libsphinxclient /#. /configure -- prefix =/usr/local/sphinxclientconfigure: creating. /config. statusconfig. status: creating Makefileconfig. status: error: cannot find input file: Makefile. in # error: configure failed // handle configure error: A config is reported during compilation. status: error: cannot find input file: src/Makefile. in, and then run the following command to re-compile the Code: # aclocal # libtoolize -- force # automake -- add-missing # autoconf # autoheader # make clean // compile from new configure #. /configure # make & make install

[Step 2] install the PHP extension of sphinx

http://pecl.php.net/package/sphinx# wget http://pecl.php.net/get/sphinx-1.3.0.tgz# tar zxvf sphinx-1.3.0.tgz# cd sphinx-1.3.0# phpize# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient# make && make install# cd /etc/php.d/# cp gd.ini  sphinx.ini# vi sphinx.iniextension=sphinx.so# service php-fpm restart

Open phpinfo and check whether the sphinx module is supported.

Example of php calling sphinx:

 SetServer ("127.0.0.1", 9312); $ s-> setMatchMode (SPH_MATCH_PHRASE); $ s-> setMaxQueryTime (30); $ res = $ s-> query ("BMW ", 'main'); # [BMW] keyword, [main] data source $ err = $ s-> GetLastError (); var_dump (array_keys ($ res ['matches']); echo"
"." Read the value in the database by the obtained ID. "."
"; Echo'
';    var_dump($res);    var_dump($err);    echo '
';

Call Example 2: pagination supported

 SetServer ("192.168.252.132", 9312); // SPH_MATCH_ALL, match all query words (default mode); SPH_MATCH_ANY, match any of the query words; SPH_MATCH_EXTENDED2, supports Special operators to query $ s-> setMatchMode (SPH_MATCH_ALL); $ s-> setMaxQueryTime (30); // sets the maximum search time $ s-> SetArrayResult (false ); // whether to replace the Matches key with ID $ s-> SetSelect ("*"); // you can specify the content of the returned information, equivalent to SQL $ s-> SetRankingMode (SPH_RANK_BM25); // sets the scoring mode. SPH_RANK_BM25 may degrade the quality of the query results containing multiple words. // $ S-> SetSortMode (SPH_SORT_EXTENDED); // The result may be inaccurate if this parameter is added. // $ s-> SetSortMode (SPH_SORT_EXTENDED, "from_id asc, id desc "); // set the sorting mode for matching items. SPH_SORT_EXTENDED combines columns in a way similar to SQL, in ascending or descending order. $ Weights = array ('Company _ name' => 20); $ s-> SetFieldWeights ($ weights); // set the field weight $ s-> SetLimits (0, 30,100 0, 0); // set the result set Offset SetLimits (cheap quantity, number of matching items, default 1000 Number of query result sets, and stop when the threshold value reaches) // $ s-> SetFilter ($ attribute, $ values, $ exclude = false); // set attribute filtering // $ s-> SetGroupBy ($ attribute, $ func, $ groupsort = "@ group desc"); // set the attributes of the group $ res = $ s-> query ('@ * ""', 'main ', '-- single-0-query --'); # [BMW] keyword, [news] Data source // code highlight $ tags = array (); $ tags_name = array (); foreach ($ res ['matches'] as $ key => $ value) {$ tags [] = $ value ['attrs']; $ company_name [] = $ value ['attrs'] ['Company _ name']; $ description [] = $ value ['attrs'] ['description'];} $ company_name = $ s-> BuildExcerpts ($ company_name, 'main', 'auto', $ opts = array (); // execute highlighting, the index name must not contain * $ description = $ s-> BuildExcerpts ($ description, 'main', 'auto', $ opts = array (). // you can highlight the index, * foreach ($ tags as $ k => $ v) cannot be used for index names) {$ tags [$ k] ['Company _ name'] = $ company_name [$ k]; // overwrite $ tags [$ k] ['description'] = $ description [$ k] After highlighting; // overwrite $ I = 0 after highlighting} // overwrite $ I = 0 after highlighting; foreach ($ res ['matches'] as $ key => $ value) {$ res ['matches'] [$ key] = $ tags [$ I]; $ I ++;} $ err = $ s-> GetLastError (); echo'
';    var_export($res);    var_export($err);    echo '
';

There is a lot of need for reference: http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#api-reference

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.