[Sphinx] Full-text indexing: Sphinx usage and configuration

Source: Internet
Author: User

------------------------------------------------------------------------------------

Search falls into two types:

1. Searching structured data: SQL statements query content stored in a database.

2. Searching unstructured data such as text and images: full-text search.

Full-text search can be done in two ways:

1. Sequential scan: every record is examined in turn, as with a LIKE query or a REGEXP (regular expression) match in SQL.

2. Index scan: part of the unstructured data (such as its terms) is extracted and reorganized so that it has structure, and an index is built over the extracted data.
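
To make the difference between the two approaches concrete, here is a minimal Python sketch. The sample documents and the function names (sequential_scan, build_index, index_scan) are made up for illustration and are not part of Sphinx, and the word splitting is deliberately naive.

```python
import re

# Hypothetical sample documents, keyed by integer ID.
DOCS = {
    1: "Sphinx builds a full-text index",
    2: "MySQL stores structured data",
    3: "A full-text search engine needs an index",
}


def sequential_scan(docs, word):
    """Read every document in turn, much like LIKE or REGEXP in SQL."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sorted(doc_id for doc_id, text in docs.items() if pattern.search(text))


def build_index(docs):
    """Extract the words once, up front, and map each word to the IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(token, set()).add(doc_id)
    return index


def index_scan(index, word):
    """Answer the query with one dictionary lookup instead of rereading every text."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(DOCS)
print(sequential_scan(DOCS, "index"))
print(index_scan(INDEX, "index"))
```

Both calls return the same document IDs; the difference is that the sequential scan rereads all texts on every query, while the index scan pays that cost once when the index is built.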

Full-text search with an index consists of two phases:

1. Index creation: extract the terms from each text together with the ID of the text, and build an index table.

2. Index search: split the search input into terms, match them against the index table to obtain IDs, and use the IDs to fetch the text content.

  

How an index is created:

1. Pass the documents to be indexed to the tokenizer.

The tokenizer splits each document into separate words, removes punctuation, and removes stop words (such as "the" and "a"); each language has its own set of stop words.

The results are called tokens.

2. Pass the resulting tokens to the language processing component (linguistic processor).

The language processing component applies language-specific normalization to the tokens. For English, for example, it converts them to lowercase, reduces words to their root form by rule, such as "cars" to "car" (stemming), and maps words to their dictionary form, such as "drove" to "drive" (lemmatization).

3. Pass the resulting terms to the index component (indexer).

The indexer uses the terms and the document IDs to create a dictionary, sorts the dictionary alphabetically, and merges identical terms into posting lists (the inverted lists of document IDs).
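
The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Sphinx is implemented: the stop-word set, the LEMMAS lookup table and the strip-a-trailing-s stemming rule are hypothetical stand-ins for real linguistic components.

```python
import re
from collections import defaultdict

# Illustrative only: real stop-word lists and lemmatizers are far larger.
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}
LEMMAS = {"drove": "drive", "driven": "drive"}


def tokenize(text):
    """Step 1: split into words, dropping punctuation and stop words."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if w.lower() not in STOP_WORDS]


def normalize(token):
    """Step 2: lowercase, look up irregular forms, then strip a plural 's'."""
    term = token.lower()
    if term in LEMMAS:
        return LEMMAS[term]
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def build_postings(docs):
    """Step 3: pair terms with document IDs, sort the terms, merge into posting lists."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[normalize(token)].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = {
    1: "The cars drove to the city.",
    2: "A car and a driver",
    3: "Students drive cars",
}
index = build_postings(docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

Because "cars" and "car", and "drove" and "drive", are normalized to the same term before indexing, a query for either form finds the same documents.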

How an index is searched:

1. Enter a query and submit it to Sphinx.

2. Sphinx performs lexical analysis, syntax analysis, and language processing on the query.

3. Sphinx searches the index for the documents that match the resulting syntax tree.

4. Sphinx sorts the results by the relevance of each document to the query.
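
A rough Python sketch of these search steps follows. It assumes the simplest possible query syntax (all terms must match, an implicit AND) and scores relevance by raw term counts; Sphinx's real query parser and rankers are much richer, and every name here is hypothetical.

```python
import re
from collections import Counter


def terms_of(text):
    """Lexical analysis and language processing, reduced to lowercasing and word splitting."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = {}
    for doc_id, text in docs.items():
        for term, count in Counter(terms_of(text)).items():
            index.setdefault(term, {})[doc_id] = count
    return index


def search(index, query):
    """Return the IDs of documents containing every query term, most relevant first."""
    query_terms = terms_of(query)
    if not query_terms:
        return []
    # Step 3: intersect the posting lists (implicit AND between the terms).
    matches = None
    for term in query_terms:
        ids = set(index.get(term, {}))
        matches = ids if matches is None else matches & ids
    # Step 4: score by total occurrences of the query terms; ties are broken by ID.
    scored = [(sum(index[t][d] for t in query_terms), d) for d in matches]
    return [d for _, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


docs = {
    1: "linux kernel notes",
    2: "linux desktop search",
    3: "linux search: search tips for linux",
}
index = build_index(docs)
print(search(index, "Linux search"))
```

Document 1 is dropped because it lacks "search", and document 3 is ranked ahead of document 2 because the query terms occur in it more often.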

Sphinx is short for SQL Phrase Index; it is a SQL-based full-text search engine. Coreseek is a full-text search engine built on Sphinx that supports Chinese.

Advantages of Sphinx:

High-speed indexing (up to 10 MB/second on a modern CPU);

High-performance search (average response time under 0.1 seconds on 2-4 GB of text data);

Handles large volumes of data (known to handle more than 100 GB of text, and up to 100 million documents on a single-CPU system);

Good relevance ranking, based on a composite of phrase proximity ranking and statistical (BM25) ranking;

Supports distributed search;

Generates document excerpts (summaries with highlighting);

Can be used as a search service through a MySQL storage engine;

Supports Boolean, phrase, word proximity, and other query types;

Supports multiple full-text fields per document (up to 32).
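
For readers unfamiliar with BM25, the statistical half of the ranking mentioned above, here is a minimal Python sketch of the common textbook formula. The constants k1 = 1.2 and b = 0.75 and the "+ 1" smoothed IDF variant are assumptions chosen for illustration; Sphinx's own implementation and constants may differ, and the phrase proximity part is not modeled at all.

```python
import math


def bm25_scores(docs, query_terms, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    docs maps a document ID to its list of tokens.
    """
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    scores = {}
    for doc_id, tokens in docs.items():
        score = 0.0
        for term in query_terms:
            tf = tokens.count(term)  # term frequency in this document
            if tf == 0:
                continue
            df = sum(1 for other in docs.values() if term in other)  # document frequency
            # Rare terms get a higher weight; the +1 keeps the weight positive.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term frequency saturates, and long documents are penalized.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores[doc_id] = score
    return scores


docs = {
    1: ["sphinx", "search", "engine"],
    2: ["mysql", "database", "engine"],
    3: ["sphinx", "sphinx", "index"],
}
scores = bm25_scores(docs, ["sphinx"])
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
```

A document that mentions the query term more often scores higher, but less than proportionally, and a document without the term scores zero.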

Disadvantages of Sphinx:

The indexed data must have a primary key

The primary key must be an integer

Sphinx is not responsible for storing the data

Configuration is not flexible

Using Sphinx:

1. Download Sphinx:

wget http://sphinxsearch.com/files/sphinx-2.2.8-release.tar.gz

Compile and install:

cd /public/sphinx-2.2.8

./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql

(If MySQL was installed from RPM: ./configure --prefix=/usr/local/sphinx --with-mysql=/usr)

make && make install

Installation creates four directories under /usr/local/sphinx. The three used here are:

bin: executables, including indexer (the index builder) and searchd (the search daemon)

etc: configuration files

var: index data

Create a database table:

SHOW DATABASES;   -- list all databases

CREATE DATABASE test;

USE test;

CREATE TABLE post (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL DEFAULT '',
    content TEXT DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

DESC post;   -- view the table structure

INSERT INTO post (title, content) VALUES ('Linux', 'linux11111');

2. Configure Sphinx:

cd /usr/local/sphinx/etc/

cp sphinx.conf.dist sphinx.conf   # edit a copy so the original sample file stays intact

vim sphinx.conf

Configuration file structure:

# Main data source (the name "main" can be changed)
source main {
    type          = mysql              # database type
    sql_host      = localhost          # MySQL host
    sql_user      = test               # MySQL user name
    sql_pass      =                    # MySQL password
    sql_db        = test               # MySQL database
    sql_port      = 3306               # MySQL port
    sql_sock      = /tmp/mysql.sock    # needed on Linux: path to the socket file
    sql_query_pre = SET NAMES utf8     # connection character set
    sql_query_pre = SET SESSION query_cache_type=OFF   # disable the query cache
    # SQL statement that fetches the data to index
    sql_query     = \
        SELECT id, title, content FROM post
    # Attribute declarations are commented out: the sample file declares them
    # for Sphinx's default documents table, which is not used here
    #sql_attr_uint      = group_id
    #sql_attr_timestamp = date_added
}

# Incremental data source, inherits from the main data source
source mainthrottled : main {}

# Main index (the name "main" can be changed)
index main {
    source = main
    path   = /usr/local/sphinx/var/data/main
}

# Incremental index
index test1stemmed : test1 {}

# Distributed index
index dist1 {}

# Real-time index
index rt {}

# Indexer settings (tune the memory limit for best results)
indexer {
    mem_limit = 256M    # memory limit; default 128M, 256M recommended
    # other settings can keep their defaults
}

# Search daemon (listening port)
searchd {
    # the defaults are fine; the default port is 9312
}

# Common settings
common {
}

In vim:

:set nu             (show line numbers; :set nonu hides them again)

:311,314s/^/#/g     (comment out the incremental data source)

:628,632s/^/#/g     (comment out the incremental index)

:639,696s/^/#/g     (comment out the distributed index)

3. Create the index:

Indexes are built with the indexer command:

-c         specify the configuration file

--all      rebuild all indexes

--rotate   rotate the index: rebuild it while searchd is running, without stopping the service; if searchd is running and this option is omitted, indexer reports an error

--merge    merge indexes, for example to merge an incremental index into the main index

Build all indexes: /usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf --all

Or build a specific index: /usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/sphinx.conf main
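
If the build is driven from a script, the two invocations above can be assembled programmatically. The following Python helper is a hypothetical convenience wrapper, not something shipped with Sphinx; it only combines the options described in this section (-c, --all, --rotate) and assumes the installation paths used in this article.

```python
import shlex

# Paths taken from the installation steps in this article.
INDEXER = "/usr/local/sphinx/bin/indexer"
CONFIG = "/usr/local/sphinx/etc/sphinx.conf"


def indexer_command(indexes=None, rotate=False, indexer=INDEXER, config=CONFIG):
    """Return the argument list for one indexer run; no index names means --all."""
    argv = [indexer, "-c", config]
    if rotate:
        # Needed when searchd is already running.
        argv.append("--rotate")
    if indexes:
        argv.extend(indexes)
    else:
        argv.append("--all")
    return argv


print(shlex.join(indexer_command()))
print(shlex.join(indexer_command(["main"], rotate=True)))
```

Passing the returned list to subprocess.run(argv, check=True) would start the build; building the list separately keeps the command easy to log and to test without Sphinx installed.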

(1) If indexer reports this error:

"ERROR: index 'main': sql_connect: Can't connect to local MySQL server through socket '/tmp/mysql.sock'"

then /tmp/mysql.sock was not found. Locate the socket with find / -name mysql.sock -print and set the correct path in sphinx.conf,

for example: sql_sock = /var/lib/mysql/mysql.sock, then save and exit.

(2) Building the index again then gives a warning:

"WARNING: Attribute count is 0: switching to none docinfo"

Setting docinfo = none in the index section of sphinx.conf removes the warning. (http://sphinxsearch.com/docs/current.html#conf-docinfo)

Once indexer finishes and prints its build summary, the index was built successfully.

Link: http://www.cnblogs.com/farwish/p/3961962.html

@ Black eyed poet <farwish.com>
