Installing, Configuring, and Using Sphinx

Sphinx is a full-text search engine developed by the Russian developer Andrew Aksyonoff. It is designed to give other applications full-text search that is fast, uses little disk space, and returns highly relevant results. Sphinx integrates easily with SQL databases and scripting languages. It currently has built-in data sources for MySQL and PostgreSQL, and it can also read XML data in a specific format from standard input. New data sources (for example, native support for other types of DBMS) can be added by modifying the source code.

1. Sphinx Chinese Word Segmentation

Unlike languages that separate words with spaces, Chinese text has to be segmented into words according to meaning. Most databases, MySQL included, currently do not support Chinese full-text search. To search Chinese text you need an additional plug-in such as Coreseek or SFC.

  • Coreseek is the most widely used Chinese full-text search solution built on Sphinx. It provides LibMMSeg, a Chinese word segmentation package designed for Sphinx, along with binary distributions for multiple systems, including rpm and deb packages and Windows binaries.
  • SFC is another Chinese word segmentation plug-in, provided by "happy brother". Its Chinese dictionary uses xdict. According to its introduction, tests on Linux show that the current version indexes at roughly half the speed of indexing UTF-8 English text, that is, about half of the officially claimed speed. Most of the time is spent on word splitting.

2. Installation

There are two ways to use Sphinx with MySQL:
(1) Call the API, for example by querying through the API functions or methods available for PHP, Java, and other languages. The advantages are that MySQL does not need to be recompiled, the server processes are loosely coupled, and programs can call the API flexibly. The disadvantage is that an existing search program needs some modification. Recommended for programmers.
(2) Use the plug-in method (SphinxSE): compile Sphinx as a MySQL plug-in and retrieve results with specific SQL statements. This combines conveniently on the SQL side and returns data directly to the client without a second query. In the program you only need to modify the corresponding SQL. However, this is inconvenient for programs built on frameworks, such as those that use an ORM. You also need to recompile MySQL, and pluggable storage engines are supported only in MySQL 5.1 and later. This method suits system administrators.

Use the first method for installation:

Software environment:

  • Operating System: CentOS-5.2
  • Database: mysql-5.0.77-3.el5, mysql-devel (to use the SphinxSE storage plug-in, use mysql-5.1 or later)
  • Compilation software: gcc-c++, autoconf, automake
  • Sphinx: sphinx-0.9.9 (latest stable version)

Installation:

  • [root@localhost ~]# yum install -y mysql-devel
  • [root@localhost ~]# yum install -y automake autoconf
  • [root@localhost ~]# cd /usr/local/src/
  • [root@localhost src]# wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
  • [root@localhost src]# tar zxvf sphinx-0.9.9.tar.gz
  • [root@localhost src]# cd sphinx-0.9.9
  • [root@localhost sphinx-0.9.9]# ./configure --prefix=/usr/local/sphinx   # Note: Sphinx supports MySQL by default.
  • [root@localhost sphinx-0.9.9]# make && make install   # Warnings can be ignored.

After the installation is complete, check that /usr/local/sphinx contains the three directories bin, etc, and var. If it does, the installation succeeded.
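
The directory check can also be scripted. The following is a minimal sketch, assuming the /usr/local/sphinx prefix used in this article. The function name check_sphinx_prefix is hypothetical and is not part of Sphinx.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphinx): verify that an install prefix
# contains the bin, etc, and var directories created by "make install".
check_sphinx_prefix() {
    prefix="$1"
    for dir in bin etc var; do
        if [ ! -d "$prefix/$dir" ]; then
            echo "missing: $prefix/$dir"
            return 1
        fi
    done
    echo "ok: $prefix"
}
```

Call it as check_sphinx_prefix /usr/local/sphinx. A non-zero exit status means at least one of the three directories is missing.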

Sfc Installation

Coreseek Installation

3. Configuration

  • [root@localhost ~]# cd /usr/local/sphinx/etc   # enter the Sphinx configuration directory
  • [root@localhost etc]# cp sphinx.conf.dist sphinx.conf   # create the Sphinx configuration file
  • [root@localhost etc]# vim sphinx.conf   # edit sphinx.conf

Example configuration file:

# Index source
source article_src
{
    type = mysql   # data source type
    sql_host = 192.168.1.10   # MySQL host
    sql_user = root   # MySQL user name
    sql_pass = pwd   # MySQL password
    sql_db = test   # MySQL database name
    sql_port = 3306   # MySQL port
    sql_query_pre = SET NAMES utf8   # encoding used when reading from MySQL. Pay special attention to this: many people cannot retrieve Chinese text because the database encoding is GBK or another non-UTF-8 encoding
    sql_query = SELECT id, title, cat_id, member_id, content, created FROM sphinx_article   # SQL statement that fetches the data

    # Attributes used for filtering or conditional queries

    sql_attr_uint = cat_id   # unsigned integer attribute
    sql_attr_uint = member_id
    sql_attr_timestamp = created   # UNIX timestamp attribute

    sql_query_info = SELECT * FROM sphinx_article WHERE id=$id   # used when testing from the command-line interface (CLI)
}

# Index

index article
{
    source = article_src   # index source declared above
    path = /usr/local/sphinx/var/data/article   # index file storage path and index file name
    docinfo = extern   # document information storage method
    mlock = 0   # memory locking for cached data
    morphology = none   # morphology (has no effect on Chinese)
    min_word_len = 1   # minimum length of indexed words
    charset_type = utf-8   # data encoding

    # Character table. Note: with this approach Sphinx splits Chinese text
    # character by character, that is, it indexes single characters. For real
    # Chinese word segmentation you must use a plug-in such as Coreseek or SFC.

    charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z
}

# Indexer configuration
indexer
{
    mem_limit = 256M   # memory limit
}

# Sphinx service process
searchd
{
    # listen = 9312   # listening port. As of this version, port 9312 is officially registered with IANA; earlier versions defaulted to port 3312

    log = /usr/local/sphinx/var/log/searchd.log   # service process log. If Sphinx misbehaves, look here for useful information; rotate errors can usually be diagnosed from this file
    query_log = /usr/local/sphinx/var/log/query.log   # log of client queries. Note: to collect statistics on keywords, you can analyze this file
    read_timeout = 5   # request timeout
    max_children = 30   # maximum number of searchd processes that can run at the same time
    pid_file = /usr/local/sphinx/var/log/searchd.pid   # process ID file
    max_matches = 1000   # maximum number of results returned by a query
    seamless_rotate = 1   # whether seamless switchover is supported. Usually required for incremental indexing
}
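
The names declared in index blocks are what the indexer and search tools take as arguments. As a small sketch for listing them, assuming the configuration uses the standard lowercase index keyword and lives at the path used in this article; list_indexes is a hypothetical helper name, not a Sphinx command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphinx): print the name of every
# "index NAME" block declared in a Sphinx configuration file.
list_indexes() {
    conf="${1:-/usr/local/sphinx/etc/sphinx.conf}"
    # Match lines whose first word is exactly "index"; the "indexer"
    # section and commented-out lines do not match.
    awk '$1 == "index" && NF >= 2 { print $2 }' "$conf"
}
```

For the example configuration above, list_indexes /usr/local/sphinx/etc/sphinx.conf would be expected to print the single name article.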

4. Create an index

[root@localhost sphinx]# bin/indexer -c etc/sphinx.conf article   # build the index; replace article with --all to build every index in the configuration file
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'etc/sphinx.conf'...
indexing index 'article'...
collected 1000 docs, 0.2 MB
sorted 0.4 Mhits, 99.6% done
total 1000 docs, 210559 bytes
total 3.585 sec, 58723 bytes/sec, 278.89 docs/sec
total 2 reads, 0.031 sec, 1428.8 kb/call avg, 15.6 msec/call avg
total 11 writes, 0.032 sec, 671.6 kb/call avg, 2.9 msec/call avg
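
Once searchd is running, an index is normally rebuilt with the indexer's --rotate option so that searches keep working during the rebuild; this is the switchover that the seamless_rotate setting refers to. The following is a minimal sketch, assuming the paths used in this article. The reindex function and the INDEXER and SPHINX_CONF variables are hypothetical names introduced here.

```shell
#!/bin/sh
# Assumed locations, matching the install prefix used in this article.
INDEXER="${INDEXER:-/usr/local/sphinx/bin/indexer}"
SPHINX_CONF="${SPHINX_CONF:-/usr/local/sphinx/etc/sphinx.conf}"

# Hypothetical wrapper: rebuild the named indexes, or all of them when no
# name is given, while searchd keeps serving queries.
reindex() {
    if [ "$#" -eq 0 ]; then
        set -- --all
    fi
    # --rotate writes the new index files next to the live ones and then
    # signals searchd to switch over to them.
    "$INDEXER" -c "$SPHINX_CONF" --rotate "$@"
}
```

For example, reindex article rebuilds only the article index, and reindex with no arguments rebuilds everything in the configuration file.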

5. Applications

The previous step created an index. Now we test it. There are two ways to test: from the CLI and through API calls.

(1) To test from the CLI, use the search command that ships with Sphinx:

[root@localhost sphinx]# bin/search -c etc/sphinx.conf Liu Li
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'etc/sphinx.conf'...

index 'mdmloginlog': query 'Liu Li': returned 6 matches of 6 total in 0.000 sec

displaying matches:
1. document=2, weight=2
2. document=3, weight=2
3. document=4, weight=2
4. document=5, weight=2
5. document=7, weight=2
6. document=8, weight=2

words:
1. 'Liu': 6 documents, 6 hits
2. 'Li': 6 documents, 6 hits

(2) To test with the PHP API, first start the Sphinx service process (searchd) and open port 9312 in the CentOS firewall.

[root@localhost sphinx]# bin/searchd -c etc/sphinx.conf &   # run searchd in the background
[1] 5759
[root@localhost sphinx]# Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file 'etc/sphinx.conf'...
listening on all interfaces, port=9312

[1]+ Done bin/searchd -c etc/sphinx.conf
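
The job-control line above reports that the launching command has finished, so it does not by itself confirm that the daemon is still alive. One way to check is sketched below, assuming pid_file points to /usr/local/sphinx/var/log/searchd.pid as in the example configuration. searchd_running is a hypothetical helper name, not a Sphinx command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphinx): succeed if the process recorded
# in the searchd pid file is alive.
searchd_running() {
    pid_file="${1:-/usr/local/sphinx/var/log/searchd.pid}"
    [ -r "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests whether the process exists and
    # can be signalled by the current user (run as root or as searchd's owner).
    kill -0 "$pid" 2>/dev/null
}
```

A typical use is searchd_running && echo "searchd is up", run as the same user that started searchd.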
