There are plenty of Sphinx configuration guides on the Internet, but most of them have problems of one kind or another, so I spent some time researching the topic and wrote up this summary for future reference. I also hope it helps readers who want to learn Sphinx avoid the same detours. For how to install Coreseek, see http://blog.chinaunix.net/uid-2063795-id-3261834.html.
I. Sphinx Configuration
- Introduction to the Sphinx configuration file structure
The Sphinx configuration file structure is as follows:
source source_name1
{
# Define a data source: database connection parameters such as the host, user name, and password,
# plus sql_query, sql_query_pre, sql_query_range, etc. (detailed in the example below)
......
}
index index_name1
{
source = source_name1
# Full-text index settings
......
}
indexer
{
# Options for the indexer program, such as memory limits
......
}
searchd
{
# Options for the searchd daemon
......
}
A single configuration file may define multiple source and index blocks.
- Sphinx configuration example, explained
Next, we will walk through a complete configuration example:
# Define a data source
source search_main
{
# Database type
type = mysql
# Database host name or IP address
sql_host = localhost
# User account for connecting to the database
sql_user = root
# Password for the database connection
sql_pass = test123
# Database name
sql_db = test
# SQL statements executed after connecting, before any data is fetched
sql_query_pre = SET NAMES utf8
sql_query_pre = SET SESSION query_cache_type = OFF
# Create the sph_counter table used for incremental indexing
sql_query_pre = CREATE TABLE IF NOT EXISTS sph_counter \
    (counter_id INTEGER PRIMARY KEY NOT NULL, max_doc_id INTEGER NOT NULL)
# Before fetching data, record the table's current maximum ID in sph_counter
sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(searchid) FROM v9_search
# The main data-fetching query; the first column must be a unique positive integer (the document ID).
# Note the <= here, so the row recorded as max_doc_id itself lands in the main index; the delta source uses >.
sql_query = SELECT searchid, typeid, id, adddate, data FROM v9_search WHERE \
    searchid <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) \
    AND searchid >= $start AND searchid <= $end
# sql_attr_uint and sql_attr_timestamp declare attribute columns usable for filtering and sorting via the API
sql_attr_uint = typeid
sql_attr_uint = id
sql_attr_timestamp = adddate
# Ranged-query setup
sql_query_range = SELECT MIN(searchid), MAX(searchid) FROM v9_search
# Ranged-query step size
sql_range_step = 1000
# Delay between ranged-query steps
sql_ranged_throttle = 0
# Query used by the CLI search tool for debugging
sql_query_info = SELECT * FROM v9_search WHERE searchid = $id
}
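To make the ranged-query settings concrete, here is a small Python sketch (illustrative only, not Coreseek code) of the windowing the indexer performs: it first runs sql_query_range to get MIN and MAX, then issues sql_query repeatedly with $start/$end bound to successive windows of sql_range_step IDs.

```python
def ranged_windows(min_id, max_id, step):
    """Yield ($start, $end) pairs the way a ranged fetch steps through IDs."""
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)  # last window may be shorter
        yield (start, end)
        start = end + 1

# With MIN(searchid)=1, MAX(searchid)=3500 and sql_range_step=1000,
# the indexer would issue four fetches:
for window in ranged_windows(1, 3500, 1000):
    print(window)
# → (1, 1000) (1001, 2000) (2001, 3000) (3001, 3500)
```

Fetching in windows like this keeps each SQL result set small, which avoids hammering MySQL with one huge query; sql_ranged_throttle inserts a pause between windows for the same reason.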
# Define the incremental (delta) source
source search_main_delta : search_main
{
# Redefining sql_query_pre replaces all the inherited pre-queries,
# so the sph_counter watermark is not overwritten when the delta is built
sql_query_pre = SET NAMES utf8
# The delta source fetches only rows added since the main index was last built.
# Note: if a newly added searchid is smaller than the max_doc_id recorded when
# the main index was built, it will be missed.
sql_query = SELECT searchid, typeid, id, adddate, data FROM v9_search WHERE \
    searchid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) \
    AND searchid >= $start AND searchid <= $end
sql_query_range = SELECT MIN(searchid), MAX(searchid) FROM v9_search WHERE \
    searchid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}
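The division of labor between the two sources can be sketched in a few lines of Python (a toy model, not Coreseek internals): building the main index freezes the ID watermark in the counter, and the delta source picks up everything above it.

```python
docs = {}                      # searchid -> row, standing in for the v9_search table
counter = {"max_doc_id": 0}    # stands in for the sph_counter row

def build_main_index():
    # sql_query_pre: REPLACE INTO sph_counter SELECT 1, MAX(searchid) ...
    counter["max_doc_id"] = max(docs) if docs else 0
    # main sql_query: ... WHERE searchid <= max_doc_id
    return {i: r for i, r in docs.items() if i <= counter["max_doc_id"]}

def build_delta_index():
    # delta sql_query: ... WHERE searchid > max_doc_id
    return {i: r for i, r in docs.items() if i > counter["max_doc_id"]}

docs.update({1: "a", 2: "b", 3: "c"})
main = build_main_index()       # indexes IDs 1..3, watermark is now 3
docs.update({4: "d", 5: "e"})   # rows added after the main build
delta = build_delta_index()     # picks up IDs 4 and 5 only
```

This also makes the caveat above visible: a row inserted later with searchid 2 would fall below the watermark and appear in neither index until the main index is rebuilt.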
# Define the main index, index_search_main
index index_search_main
{
# Source to index
source = search_main
# Path where the generated index files are stored
path = /usr/local/coreseek/var/data/index_search_main
# Document attribute storage mode; extern stores attributes separately from the document lists
docinfo = extern
# Memory locking for cached data; 0 means do not lock
mlock = 0
# Morphology preprocessors to apply; none disables them
morphology = none
# Minimum indexed word length
min_word_len = 1
# Character-set encoding; zh_cn.utf-8 matches the utf8 encoding used in the database
charset_type = zh_cn.utf-8
# Directory where the word-segmentation dictionary files are read from
charset_dictpath = /usr/local/mmseg3/etc
# Stopword file: words listed here are not indexed
stopwords = /usr/local/coreseek/var/data/stopwords.txt
# Whether to strip HTML tags from the incoming full-text data
html_strip = 0
}
# Define the incremental (delta) index
index index_search_main_delta : index_search_main
{
source = search_main_delta
path = /usr/local/coreseek/var/data/index_search_main_delta
}
# Indexer configuration options
indexer
{
# Memory limit for index generation
mem_limit = 512M
}
# Options for the searchd daemon
searchd
{
# Listening addresses and ports
# listen = 127.0.0.1
# listen = 172.16.88.100:3312
listen = 3312
listen = /var/run/searchd.sock
# Log file location
log = /usr/local/coreseek/var/log/searchd.log
# Query log location
query_log = /usr/local/coreseek/var/log/query.log
# Read timeout, in seconds, for network client requests
read_timeout = 5
# Maximum number of child processes
max_children = 300
# PID file of the searchd process
pid_file = /usr/local/coreseek/var/log/searchd.pid
# Maximum number of matches the daemon keeps in memory per index and can return to the client
max_matches = 100000
# Seamless rotation: prevents searchd from stalling while rotating indexes that require
# a lot of data to be precached; queries remain available at all times, served from
# either the old or the new index
seamless_rotate = 1
# Forcibly pre-open all index files at startup
preopen_indexes = 1
# After rotation, delete the old index copies (files with the .old extension)
unlink_old = 1
# Shared pool size for updates to MVA attributes
mva_updates_pool = 1M
# Maximum allowed network packet size
max_packet_size = 32M
# Maximum allowed number of filters per query
max_filters = 256
# Maximum allowed number of values per filter
max_filter_values = 4096
}
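Once searchd is listening, applications query it through the client API. Below is a hedged Python sketch, assuming the sphinxapi.py module bundled with Coreseek (under the api/ directory of the source tree) is on the Python path; the host and port match the listen directive above.

```python
# Sketch of querying searchd via the bundled sphinxapi.py client.
# Assumption: sphinxapi.py from the Coreseek/Sphinx source tree is importable.
try:
    from sphinxapi import SphinxClient
    have_api = True
except ImportError:          # sphinxapi.py not on PYTHONPATH
    have_api = False

HOST, PORT = '127.0.0.1', 3312   # matches "listen = 3312" above

if have_api:
    cl = SphinxClient()
    cl.SetServer(HOST, PORT)
    cl.SetLimits(0, 20)                        # return the first 20 matches
    res = cl.Query('game', 'index_search_main')
    if res:
        for m in res['matches']:
            print(m['id'], m['attrs'])         # document ID plus typeid/id/adddate attributes
    else:
        print('query failed:', cl.GetLastError())
```

To search both the main and delta indexes in one call, pass them together, e.g. `cl.Query('game', 'index_search_main index_search_main_delta')`.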
II. Sphinx Management
- Generate the Chinese word-segmentation dictionary (newer releases ship a pre-built dictionary in /usr/local/mmseg3/etc)
cd /usr/local/mmseg3/etc
/usr/local/mmseg3/bin/mmseg -u unigram.txt
mv unigram.txt.uni uni.lib
- Generate a Chinese thesaurus (synonym) library for Sphinx
# With a thesaurus, a search for "Shenzhen" will also match entries containing, for example, "Shenzhen Bay"
/data/software/sphinx/coreseek-3.2.14/mmseg-3.2.14/script/build_thesaurus.py unigram.txt > thesaurus.txt
/usr/local/mmseg3/bin/mmseg -t thesaurus.txt
Put the resulting thesaurus.lib in the same directory as uni.lib.
- Build all indexes
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all
If the searchd daemon is already running, add the --rotate option:
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all --rotate
- Start the searchd daemon
/usr/local/coreseek/bin/searchd --config /usr/local/coreseek/etc/sphinx.conf
- Rebuild the main index
Write a shell script, add it to crontab, and schedule it to rebuild the main index once a day.
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main
- Build the incremental index
Write a shell script, add it to crontab, and schedule it to run every 10 minutes.
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main_delta
- Merge the incremental index into the main index
Write a shell script, add it to a scheduled task, and run it once every 15 minutes.
/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --merge index_search_main index_search_main_delta --rotate
- Search the indexes from the command line with the search tool
/usr/local/coreseek/bin/search --config /usr/local/coreseek/etc/sphinx.conf game
III. References
http://baobeituping.iteye.com/blog/870354
http://www.sphinxsearch.org/sphinx-tutorial
http://blog.s135.com/post/360/
http://youngerblue.iteye.com/blog/1513140