Configuration and Management of Sphinx

Source: Internet
Author: User

There are many Sphinx configuration guides on the Internet, but each has its problems, so I spent some time researching the topic and wrote up this summary for future reference. I also hope it helps readers who want to learn Sphinx avoid the same detours. For how to install coreseek, see http://blog.chinaunix.net/uid-2063795-id-3261834.html.

1. Sphinx Configuration

  1.1 Introduction to the Sphinx configuration file structure

The Sphinx configuration file structure is as follows:

source source_name1
{
    # Add a data source; database connection parameters such as the host,
    # user name, and password are set here.

    # sql_query, sql_query_pre, and sql_query_range are covered in detail
    # in the example below.

    ......
}

index index_name1
{
    source = source_name1

    # Full-text index settings

    ......
}

indexer
{
    # indexer program options, such as the memory limit

    ......
}

searchd
{
    # searchd daemon parameters

    ......
}

A configuration file may define multiple source and index sections.
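Sections can also inherit from one another by appending a colon and the parent section's name; the delta source and index later in this article use exactly this syntax. A minimal sketch with illustrative names:

```
# src_delta inherits every setting of src_main and overrides only what it redefines
source src_delta : src_main
{
    sql_query = SELECT id, title FROM documents WHERE id > 1000
}
```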

 

  1.2 A Sphinx configuration example, explained

The following example walks through the configuration options in detail:

# Define a data source
source search_main
{
    # Database type
    type = mysql

    # Database host name or IP address
    sql_host = localhost

    # Account used to connect to the database
    sql_user = root

    # Password for that account
    sql_pass = test123

    # Database name
    sql_db = test

    # SQL statements executed after connecting, before data is fetched
    sql_query_pre = SET NAMES utf8
    sql_query_pre = SET SESSION query_cache_type = OFF

    # Create the sph_counter table used for incremental indexing
    sql_query_pre = CREATE TABLE IF NOT EXISTS sph_counter \
        (counter_id INTEGER PRIMARY KEY NOT NULL, max_doc_id INTEGER NOT NULL)

    # Before fetching data, record the table's current maximum ID in sph_counter
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(searchid) FROM v9_search

    # The main fetch query; the first column must be a unique positive integer
    # document ID (<= ensures the row recorded as max_doc_id is not skipped,
    # since the delta source below fetches only rows strictly above it)
    sql_query = SELECT searchid, typeid, id, adddate, data FROM v9_search WHERE \
        searchid <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) \
        AND searchid >= $start AND searchid <= $end

    # sql_attr_uint and sql_attr_timestamp declare columns available to the API
    # for filtering and sorting
    sql_attr_uint = typeid
    sql_attr_uint = id
    sql_attr_timestamp = adddate

    # Ranged-query settings
    sql_query_range = SELECT MIN(searchid), MAX(searchid) FROM v9_search

    # Ranged-query step size
    sql_range_step = 1000

    # Delay between ranged-query steps, in milliseconds
    sql_ranged_throttle = 0

    # Query used by the search CLI tool for debugging
    sql_query_info = SELECT * FROM v9_search WHERE searchid = $id
}
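To see how the ranged-query settings play together: the indexer first runs sql_query_range to obtain the ID bounds, then issues sql_query repeatedly, substituting $start and $end one step at a time. Assuming, for illustration, MIN(searchid) = 1 and MAX(searchid) = 3500, the fetch queries would look roughly like this (the final step is clamped to the maximum):

```
SELECT ... FROM v9_search WHERE ... AND searchid >= 1    AND searchid <= 1000
SELECT ... FROM v9_search WHERE ... AND searchid >= 1001 AND searchid <= 2000
SELECT ... FROM v9_search WHERE ... AND searchid >= 2001 AND searchid <= 3000
SELECT ... FROM v9_search WHERE ... AND searchid >= 3001 AND searchid <= 3500
```

Fetching in steps keeps each result set small, which avoids holding long locks on the source table while a large index is being built.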

# Define the incremental (delta) source, inheriting from search_main
source search_main_delta : search_main
{
    sql_query_pre = SET NAMES utf8

    # The delta source only fetches rows added since the main index was last built.
    # A newly added row whose searchid is smaller than the max_doc_id recorded
    # when the main index was built will be missed.
    sql_query = SELECT searchid, typeid, id, adddate, data FROM v9_search WHERE \
        searchid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) \
        AND searchid >= $start AND searchid <= $end

    sql_query_range = SELECT MIN(searchid), MAX(searchid) FROM v9_search WHERE \
        searchid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
}
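The comment in the delta source notes that late-arriving rows with a small searchid are missed; a related problem is documents that end up in both the main and delta indexes when the two are searched together. Sphinx 0.9.9+, on which coreseek is based, offers sql_query_killlist for the latter: document IDs it returns suppress matches for those IDs in indexes searched earlier in the same query. This option is not part of the original article's configuration; a hedged sketch of how it could be added to the delta source:

```
# Inside search_main_delta: suppress main-index copies of any document
# that the delta index also carries (kill-list, Sphinx 0.9.9+)
sql_query_killlist = SELECT searchid FROM v9_search WHERE \
    searchid > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)
```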

 

# Define the main index, index_search_main
index index_search_main
{
    # Index source
    source = search_main

    # Storage path for the generated index files
    path = /usr/local/coreseek/var/data/index_search_main

    # Attribute storage mode; extern stores document attributes separately
    # from the full-text index data
    docinfo = extern

    # Memory locking for cached data; 0 means do not lock
    mlock = 0

    # Morphology processors; none disables morphology
    morphology = none

    # Minimum length of an indexed word
    min_word_len = 1

    # Character-set encoding; utf-8 here, matching the database encoding
    charset_type = zh_cn.utf-8

    # Directory from which the word segmenter reads its dictionary files
    charset_dictpath = /usr/local/mmseg3/etc

    # Stopword file; words listed in it are not indexed
    stopwords = /usr/local/coreseek/var/data/stopwords.txt

    # Whether to strip HTML tags from the incoming full-text data
    html_strip = 0
}

# Define the incremental (delta) index, inheriting from index_search_main
index index_search_main_delta : index_search_main
{
    source = search_main_delta
    path = /usr/local/coreseek/var/data/index_search_main_delta
}

 

# indexer configuration options
indexer
{
    # Memory limit for index generation
    mem_limit = 512M
}

 

# searchd daemon options
searchd
{
    # Listen addresses and ports
    # listen = 127.0.0.1
    # listen = 172.16.88.100:3312
    listen = 3312
    listen = /var/run/searchd.sock

    # Log file location
    log = /usr/local/coreseek/var/log/searchd.log

    # Query log location
    query_log = /usr/local/coreseek/var/log/query.log

    # Read timeout for network client requests, in seconds
    read_timeout = 5

    # Maximum number of child processes
    max_children = 300

    # PID file of the searchd process
    pid_file = /usr/local/coreseek/var/log/searchd.pid

    # Maximum number of matches the daemon keeps in memory per index
    # and can return to the client
    max_matches = 100000

    # Seamless rotation: prevents searchd from stalling while rotating indexes
    # that need a large amount of data to be prefetched; queries stay available
    # at all times, answered from either the old or the new index
    seamless_rotate = 1

    # Forcibly pre-open all index files at startup
    preopen_indexes = 1

    # After index rotation, delete the old index copies with the .old extension
    unlink_old = 1

    # Shared pool size for multi-valued attribute (MVA) updates
    mva_updates_pool = 1M

    # Maximum allowed network packet size
    max_packet_size = 32M

    # Maximum allowed number of filters per query
    max_filters = 256

    # Maximum number of values allowed per filter
    max_filter_values = 4096
}

 

2. Sphinx Management

  2.1 Generate the Chinese word-segmentation dictionary (newer releases ship a pre-built dictionary in /usr/local/mmseg3/etc)

cd /usr/local/mmseg3/etc
/usr/local/mmseg3/bin/mmseg -u thesaurus.txt
mv thesaurus.txt.uni uni.lib

  2.2 Generate a Chinese synonym library for Sphinx

# With a synonym library, a search for "Shenzhen" also matches entries such as "Shenzhen Bay".
/data/software/sphinx/coreseek-3.2.14/mmseg-3.2.14/script/build_thesaurus.py unigram.txt > thesaurus.txt
/usr/local/mmseg3/bin/mmseg -t thesaurus.txt

Put the resulting thesaurus.lib in the same directory as uni.lib.

  2.3 Build all indexes

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all

If the searchd daemon is already running, add the --rotate option:

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --all --rotate
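The choice between the two commands above can be automated. A small sketch, assuming searchd's PID file lives at the path configured in this article; the echo makes it a dry run, so remove it to actually invoke the indexer:

```shell
#!/bin/sh
# Rebuild all indexes, adding --rotate only when searchd is already running,
# judged by the presence of its PID file.
CONF=/usr/local/coreseek/etc/sphinx.conf
PIDFILE=/usr/local/coreseek/var/log/searchd.pid

ROTATE=""
if [ -f "$PIDFILE" ]; then
    ROTATE="--rotate"
fi

# Dry run: print the command that would be executed
echo /usr/local/coreseek/bin/indexer --config "$CONF" --all $ROTATE
```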

  2.4 Start the searchd daemon

/usr/local/coreseek/bin/searchd --config /usr/local/coreseek/etc/sphinx.conf

  2.5 Rebuild the main index

Write a shell script, add it to crontab, and schedule it to rebuild the main index once a day.

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main

  2.6 Rebuild the incremental index

Write a shell script, add it to crontab, and schedule it to run every 10 minutes.

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main_delta

  2.7 Merge the incremental index into the main index

Write a shell script and add it to a scheduled task that runs every 15 minutes.

/usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --merge index_search_main index_search_main_delta --rotate
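The three scheduled jobs described above (daily main rebuild, delta rebuild every 10 minutes, merge every 15 minutes) can be collected into one crontab. The 10- and 15-minute intervals come from this article; the daily time of 03:00 is only an illustrative choice:

```
# Rebuild the main index once a day (03:00 is illustrative)
0 3 * * *    /usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main
# Rebuild the delta index every 10 minutes
*/10 * * * * /usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --rotate index_search_main_delta
# Merge the delta index into the main index every 15 minutes
*/15 * * * * /usr/local/coreseek/bin/indexer --config /usr/local/coreseek/etc/sphinx.conf --merge index_search_main index_search_main_delta --rotate
```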

  2.8 Search the indexes from the command line with the search tool

/usr/local/coreseek/bin/search --config /usr/local/coreseek/etc/sphinx.conf game

 

 

3. References

http://baobeituping.iteye.com/blog/870354
http://www.sphinxsearch.org/sphinx-tutorial
http://blog.s135.com/post/360/
http://youngerblue.iteye.com/blog/1513140
