Configuration and use of Coreseek and Sphinx

Preface

For details on installing Sphinx, see the Sphinx installation notes.
For details on installing Coreseek, see the Coreseek installation notes.

Once Sphinx and Coreseek are installed, we can already get satisfactory search results. One problem remains: newly written data has to be re-indexed before Sphinx can find it.
Because the existing data volume is large, rebuilding the full index takes a long time. If the data does not need to be synchronized in real time, you can simply rebuild the index every night.
If near-real-time synchronization is required, for example new records becoming searchable within a few minutes, you need an incremental index,
and then merge the incremental index into the primary index at night.
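For the simple case without real-time requirements, the nightly rebuild can be driven by cron. A hypothetical crontab entry (the indexer path matches the configuration used later in this article; the 03:00 schedule is an arbitrary choice):

```shell
# Hypothetical crontab entry: rebuild all indexes at 03:00 every night.
# --all rebuilds every index defined in the config file;
# --rotate swaps the new index in without stopping searchd.
0 3 * * * /usr/local/coreseek4/bin/indexer --all --rotate
```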

Configuration

In Sphinx you need to configure two data sources and two indexes: a primary index and an incremental index, where the incremental index inherits from the primary one.

Because the indexes are merged at a scheduled time, before the next merge the incremental index only has to cover the rows that were changed or added after the last merge.

To know when that last merge happened, we need an auxiliary table that records the last merge time for the incremental index.

The structure of the auxiliary table is very simple: it holds a single time field, and it always contains exactly one record.

CREATE TABLE t_blog_time_sphinx (
    c_id   INTEGER PRIMARY KEY NOT NULL,
    c_time DATETIME NOT NULL
);

The configuration of sphinx is as follows:

# Primary data source
source main_source
{
    type               = mysql
    sql_host           = 127.0.0.1
    sql_user           = test
    sql_pass           = test
    sql_db             = test
    sql_port           = 3306
    sql_query_pre      = SET NAMES utf8
    # NOTE: the source table name was garbled in the original; t_blog is assumed here
    sql_query          = SELECT c_id, c_title, c_content, c_year, c_month, c_day, c_modifytime, c_createtime FROM t_blog
    sql_attr_uint      = c_year
    sql_attr_uint      = c_month
    sql_attr_uint      = c_day
    sql_attr_timestamp = c_modifytime
    sql_attr_timestamp = c_createtime
    sql_field_string   = c_title
    sql_field_string   = c_content
}

# Incremental data source: only rows changed since the last merge
source main_inc_source : main_source
{
    sql_query_pre      = SET NAMES utf8
    sql_query          = SELECT c_id, c_title, c_content, c_year, c_month, c_day, c_modifytime, c_createtime FROM t_blog WHERE c_modifytime > (SELECT c_time FROM t_blog_time_sphinx LIMIT 1)
}

# Primary index
index main_index
{
    source             = main_source
    path               = /usr/local/coreseek4/var/data/main_index
    docinfo            = extern
    charset_type       = zh_cn.utf-8
    charset_dictpath   = /usr/local/mmseg3/etc/
    ngram_len          = 0
}

# Incremental index
index main_inc_index : main_index
{
    source             = main_inc_source
    path               = /usr/local/coreseek4/var/data/main_inc_index
}

# Indexer settings
indexer
{
    mem_limit          = 32M
}

# Search daemon
searchd
{
    listen             = 9312
    listen             = 9306:mysql41
    log                = /usr/local/coreseek4/var/log/searchd.log
    query_log          = /usr/local/coreseek4/var/log/query.log
    client_timeout     = 300
    read_timeout       = 5
    max_children       = 30
    pid_file           = /usr/local/coreseek4/var/log/searchd.pid
    max_matches        = 1000
    seamless_rotate    = 1
    preopen_indexes    = 1
    unlink_old         = 1
    mva_updates_pool   = 1M
    max_packet_size    = 8M
    max_filters        = 256
    max_filter_values  = 4096
    max_batch_queries  = 32
    workers            = threads # for RT to work
}
Start sphinx

The first step is to insert an initial time into the auxiliary table.

INSERT INTO t_blog_time_sphinx (c_time) VALUES (NOW());

Step 2: build the primary index and the incremental index.

/usr/local/coreseek4/bin/indexer main_index
/usr/local/coreseek4/bin/indexer main_inc_index

Step 3: start the search daemon.

/usr/local/coreseek4/bin/searchd
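To confirm the daemon actually came up, one quick check is the pid file declared in the searchd section of the config. A minimal sketch, assuming the paths from the configuration above:

```shell
# Check whether searchd is alive via its pid file (path from the config above)
PIDFILE=/usr/local/coreseek4/var/log/searchd.pid
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "searchd is running (pid $(cat "$PIDFILE"))"
else
    echo "searchd is not running"
fi
```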
Scheduled task

The scheduled tasks need to do the following:

  1. Periodically rebuild the incremental index during the day
  2. Merge the incremental index into the primary index at night
  3. Update the auxiliary table's time to the current time minus a few minutes, so that a few minutes of data are indexed twice rather than missed
# Rebuild the incremental index
/usr/local/coreseek4/bin/indexer main_inc_index --rotate
# Merge the incremental index into the primary index
/usr/local/coreseek4/bin/indexer --merge main_index main_inc_index --rotate
# Move the auxiliary table's merge time forward (minus 10 minutes to avoid missing data)
UPDATE t_blog_time_sphinx SET c_time = NOW() - INTERVAL 10 MINUTE;
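These tasks can then be wired into cron. A hypothetical schedule (every 5 minutes for the incremental rebuild, 04:00 for the merge; merge_sphinx.sh is an assumed wrapper script that runs the merge command followed by the UPDATE statement):

```shell
# Hypothetical crontab: adjust paths and times to your environment.
# Every 5 minutes: rebuild the small incremental index.
*/5 * * * * /usr/local/coreseek4/bin/indexer main_inc_index --rotate
# At 04:00: merge into the primary index and bump the merge time
# (merge_sphinx.sh is an assumed wrapper around the merge + UPDATE above).
0 4 * * * /path/to/merge_sphinx.sh
```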
Php Testing Program

In the Coreseek test directory you can find the sphinxapi.php file; copy it to the appropriate location in your PHP source tree.

For details on how to assemble full-text query strings, refer to the official documentation.

<?php
// Include the sphinx API
include('api/coreseek_sphinxapi.php');

// Initialize the sphinx client
$sphinx = new SphinxClient();
$sphinx->SetServer($ip, $port);

// Set the attribute filter
if (isset($_GET["year"]) && strlen($_GET["year"]) > 0) {
    $sphinx->SetFilter("c_year", array($_GET["year"]));
}

// Assemble the full-text query string
$query = "";
if (isset($_GET["title"]) && strlen($_GET["title"]) > 0) {
    $query .= " | " . trim($_GET["title"]);
}
if (isset($_GET["content"]) && strlen($_GET["content"]) > 0) {
    $query .= " | " . trim($_GET["content"]);
}
$query = trim($query, " |");

// Start the search; query both the incremental and the primary index
$res = $sphinx->Query($query, 'main_inc_index, main_index');

echo "Query = $query\n";

// Output the result; GetLastError and GetLastWarning are useful for debugging
print_r($sphinx->GetLastError());
print_r($sphinx->GetLastWarning());
print_r($res);

This article is from http://tiankonguse.github.io; original address: http://tiankonguse.github.io/blog/2014/11/06/sphinx-config-and-use/. Thanks to the original author for sharing.
