Sphinx Configuration + PHP

Source: Internet
Author: User
Tags: mysql, index

    1.1. Why use Sphinx

Suppose you run a forum whose data has grown past one million records, and many users complain that forum search is very slow. In that case you can consider using Sphinx (other full-text search programs or methods would work too).

    1.2. What is Sphinx?

Sphinx is a high-performance full-text search package developed by the Russian developer Andrew Aksyonoff. It is dual-licensed under the GPL and a commercial license.

Full-text retrieval is an information retrieval technique that takes all the textual information of a document as the retrieval object. The retrieved object may be the article's title, its author, or its summary or content.

    1.3. Features of Sphinx

• High indexing speed (close to 10 MB/s on modern CPUs);

• High search speed (the average query takes less than 0.1 seconds on 2-4 GB of text);

• High scalability (up to 100 GB of text or 100 M documents on a single CPU);

• Good relevance ranking;

• Distributed search;

• Document summary (excerpt) generation;

• Searching from within MySQL through a pluggable storage engine;

• Boolean, phrase, and word-proximity queries;

• Multiple full-text fields per document (up to 32 by default);

• Multiple attributes per document;

• Stopwords;

• Single-byte encodings and UTF-8.

    1.4. Download and install Sphinx

Open http://www.coreseek.cn/news/7/52/ and find the version that matches your operating system. I am on Windows, so I download the Coreseek Win32 generic version; on Linux you can download the source package and build and install it yourself. Why is the program we download called Coreseek? Coreseek is software built on top of Sphinx with some modifications. Its Chinese-language support is better than that of stock Sphinx, so we use it.

After the download completes, extract the archive wherever you like. For example, I extract it to the E: drive and rename the directory to coreseek. That completes the installation; the install directory is E:\coreseek\.

    1.5. Using Sphinx

These are the steps needed to use Sphinx:

1) Have some data to index

2) Set up the Sphinx configuration file

3) Build the index

4) Start Sphinx

5) Use it (query through the API or with the search.exe program)

Step 1: (Import data)

The database, table, and data needed for the test are provided in the attachment. Download it and import it into MySQL.

Step 2: (Set up the configuration file)

Next, create a Sphinx configuration file E:\coreseek\etc\mysql.conf with the following contents:

source mysql
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = test
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, addtime, title, content FROM post
    sql_attr_timestamp = addtime
}

index mysql
{
    source = mysql
    path = E:/coreseek/var/data/mysql
    charset_dictpath = E:/coreseek/etc/
    charset_type = zh_cn.utf-8
}

searchd
{
    listen = 9312
    max_matches = 1000
    pid_file = E:/coreseek/var/log/searchd_mysql.pid
    log = E:/coreseek/var/log/searchd_mysql.log
    query_log = E:/coreseek/var/log/query_mysql.log
}

Let's go over the meaning of each item in this configuration file.

source mysql{}: defines a data source named mysql. Any other name works too, for example: source xxx{}

type: the data source type

sql_*: database connection settings such as sql_host and sql_pass; these need no explanation

sql_query: the query run at indexing time. Where possible, do not use WHERE or GROUP BY here; leave filtering and grouping to Sphinx, which handles them more efficiently. Note: the SELECT list must include a unique primary key and the fields to be searched in full text, and any fields you intend to filter on must be selected as well

sql_query_pre: an SQL command executed before sql_query; there can be more than one

sql_attr_*: options beginning with this prefix declare attribute fields. Any field used for filtering, sorting, or grouping (the equivalent of WHERE, ORDER BY, GROUP BY) must be declared as an attribute. Different field types use different option names; for example, sql_attr_timestamp above declares a timestamp attribute.

index mysql{}: defines an index named mysql. Any other name works too, for example: index xxx{}

source: the data source to index, that is, the name defined by source xxx

path: the index file path. For example, E:/coreseek/var/data/mysql means the files are stored in the E:/coreseek/var/data/ directory as several index files named mysql with different extensions

charset_dictpath: the location of the dictionary files read by the word segmenter; required when word segmentation is enabled. When libmmseg is used as the segmenter, make sure the dictionary file uni.lib exists in the specified directory

charset_type: the character set, for example charset_type = zh_cn.gbk

searchd{}: configuration of the Sphinx daemon

listen: the listening port

max_matches: the maximum number of matches returned; however many documents match, at most the 1000 set here are returned

pid_file: path of the PID file

log: the searchd (full-text search) log file

query_log: the query log file

That is all for this configuration file. Many more parameters can be configured; see the documentation for details.
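The notes on sql_query and sql_attr_* say that filtering and sorting are better left to Sphinx than to SQL. The following is a minimal sketch of what that can look like from the PHP side, not a definitive implementation. It assumes the sphinxapi.php file shipped with Coreseek sits next to the script, that searchd is listening on localhost:9312, and that addtime is declared as an attribute as in the configuration above. The helper name applyRecentFilter and the 7-day window are hypothetical.

```php
<?php
// Hypothetical helper: restrict a search to posts added in the last $days
// days and sort them by the addtime attribute.
// $client is expected to be a SphinxClient instance from sphinxapi.php.
function applyRecentFilter($client, $days, $now, $sortMode)
{
    $since = $now - $days * 86400;

    // Rough equivalent of "WHERE addtime BETWEEN $since AND $now",
    // but evaluated by Sphinx on the attribute.
    $client->SetFilterRange('addtime', $since, $now);

    // With SPH_SORT_ATTR_DESC this is the equivalent of "ORDER BY addtime DESC".
    $client->SetSortMode($sortMode, 'addtime');

    // First page of 20 results; the offset must stay within max_matches.
    $client->SetLimits(0, 20);

    return $since;
}

// Usage: only runs when sphinxapi.php is actually present (assumption).
if (file_exists(__DIR__ . '/sphinxapi.php')) {
    require_once __DIR__ . '/sphinxapi.php';

    $sc = new SphinxClient();
    $sc->SetServer('localhost', 9312);
    applyRecentFilter($sc, 7, time(), SPH_SORT_ATTR_DESC);

    $res = $sc->Query('Sphinx', 'mysql');
    if ($res === false) {
        echo 'Query failed: ' . $sc->GetLastError() . "\n";
    } else {
        print_r($res);
    }
}
```

Passing the client and the current time in as parameters keeps the helper easy to try out without a running daemon.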

Step 3: (Build the index)

Click Start > Run, type cmd, and press Enter to open the command-line tool.

E:\coreseek\bin\indexer --config E:\coreseek\etc\mysql.conf --all

This command calls the indexer program to build all indexes.

If you only want to build a single index, run: E:\coreseek\bin\indexer --config E:\coreseek\etc\mysql.conf <index name> (the index name is the one defined in the configuration file)

--config and --all are parameters of the indexer program; if you want to know more about its parameters, see the documentation.

If the output contains no FATAL or ERROR messages after the command runs, the index files were built successfully. For example, I see:

......... Omitted .........

using config file 'e:\coreseek\etc\mysql.conf'...

indexing index 'mysql'...

collected 4 docs, 0.0 MB

......... Omitted .........

Step 4: (Start Sphinx)

In the same command-line window, run:

E:\coreseek\bin\searchd --config E:\coreseek\etc\mysql.conf

After running it, I see several messages:

using config file 'e:\coreseek\etc\mysql.conf'...

listening on all interfaces, port=9312

accepting connections

Don't worry about what these messages mean; Sphinx has started successfully.

Do not close this command-line window now, because closing it also stops Sphinx. If that is inconvenient, you can install Sphinx as a system service so that it runs in the background.

To install the system service, just enter the following command on the command line:

E:\coreseek\bin\searchd --config E:\coreseek\etc\mysql.conf --install

After installation, remember to start the service. If you don't know how to start a Windows service, look it up yourself.
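Whether searchd runs in a console window or as a service, it can be handy to confirm from PHP that something is accepting connections on the configured port before querying. The sketch below uses only PHP's built-in fsockopen; the helper name isSearchdReachable is hypothetical, and 9312 is the listen port from the configuration above. It only checks that the TCP port is open, not that the process behind it is really Sphinx.

```php
<?php
// Hypothetical helper: returns true if something accepts TCP connections
// on $host:$port, false otherwise.
function isSearchdReachable($host, $port, $timeoutSeconds = 1.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning PHP emits when the connection is refused.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeoutSeconds);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo isSearchdReachable('localhost', 9312)
    ? "searchd is reachable\n"
    : "searchd is not reachable\n";
```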

Step 5: (Use Sphinx)

Create a search directory under the web root (it does not have to be in the root, and any directory name works), copy the file E:\coreseek\api\sphinxapi.php into it (sphinxapi.php is the official Sphinx API), and start writing the PHP program.

Create a file in the search directory with any name you like; I call it index.php. Its contents are as follows:

<?php

include 'sphinxapi.php'; // Load the Sphinx API

$sc = new SphinxClient(); // Instantiate the API client

// Set the server: the first parameter is the Sphinx server address,
// the second is the port searchd listens on
$sc->SetServer('localhost', 9312);

// Run the query: the first parameter is the search keyword, the second is the
// name of the index to query (mysql, as defined in the configuration file).
// Multiple index names are separated by commas, and * means all indexes.
$res = $sc->Query('Sphinx', 'mysql');

print_r($res);

Printed result:

Array
(
    ......... Omitted .........
    [matches] => Array
        (
            [2] => Array
                (
                    [weight] => 2
                    [attrs] => Array
                        (
                            [addtime] => 1282622004
                        )
                )
            [4] => Array
                (
                    [weight] => 2
                    [attrs] => Array
                        (
                            [addtime] => 1282622079
                        )
                )
        )
    ......... Omitted .........
)

matches holds the query results, but it does not seem to be the data we want: the contents of the title and content fields are not returned. According to the official documentation, Sphinx does not connect to MySQL to fetch data at query time; it only computes results from its own index. So to get the data we want through the Sphinx API, we must query MySQL again, using the IDs in the Sphinx result.

The keys in the query result mean the following:

2: the document's unique primary key

weight: the match weight

attrs: the attributes configured with sql_attr_*
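To make the key meanings concrete, here is a small sketch that walks a result array and pulls out the id, weight, and addtime of each match. The helper name summarizeMatches is hypothetical, and the sample array simply mirrors the printed result above instead of coming from a live server. In the stock sphinxapi.php, Query() returns false on failure and GetLastError() gives the reason, so the helper treats a non-array result as "no rows".

```php
<?php
// Hypothetical helper: flatten a SphinxClient::Query() result into rows of
// id, weight, and addtime. Returns an empty array when the query failed
// or nothing matched.
function summarizeMatches($res)
{
    $rows = array();
    if (!is_array($res) || empty($res['matches'])) {
        return $rows;
    }
    foreach ($res['matches'] as $id => $match) {
        $rows[] = array(
            'id'      => (int) $id,                       // primary key
            'weight'  => (int) $match['weight'],          // match weight
            'addtime' => (int) $match['attrs']['addtime'], // sql_attr_timestamp
        );
    }
    return $rows;
}

// Sample data shaped like the printed result above.
$res = array('matches' => array(
    2 => array('weight' => 2, 'attrs' => array('addtime' => 1282622004)),
    4 => array('weight' => 2, 'attrs' => array('addtime' => 1282622079)),
));

foreach (summarizeMatches($res) as $row) {
    // date() formats the timestamp in the server's configured time zone.
    echo $row['id'] . "\t" . $row['weight'] . "\t"
        . date('Y-m-d H:i:s', $row['addtime']) . "\n";
}
```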

At this point the search engine is more than half done; the rest you can finish yourself.

For example:

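<?php

// When nothing matched, the result has no 'matches' key, so guard against that.
$ids = isset($res['matches']) ? array_keys($res['matches']) : array(); // Get the primary keys

if ($ids) {

    $ids = join(',', $ids);

    // Note: the mysql_* functions were removed in PHP 7.0; on newer PHP use mysqli or PDO.
    $query = mysql_query("SELECT * FROM post WHERE id IN ({$ids})");

    while ($row = mysql_fetch_assoc($query)) {

        // .....

    }

}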
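The loop above relies on the old mysql_* functions, and a plain IN (...) query does not return rows in the order Sphinx ranked them. As one possible alternative, the sketch below uses PDO with bound placeholders and then restores the ranking order in PHP. The helper name fetchPostsInMatchOrder is hypothetical, and the connection details in the usage comment are assumptions taken from the configuration earlier in this article.

```php
<?php
// Hypothetical helper: fetch the matched posts with PDO and return them in
// the same order as $ids (the order in which Sphinx returned the matches).
function fetchPostsInMatchOrder(PDO $pdo, array $ids)
{
    if (count($ids) === 0) {
        return array();
    }

    // Force integers and a 0-based list so the values bind positionally.
    $ids = array_values(array_map('intval', $ids));
    $placeholders = implode(',', array_fill(0, count($ids), '?'));

    $stmt = $pdo->prepare("SELECT * FROM post WHERE id IN ($placeholders)");
    $stmt->execute($ids);

    // Index the fetched rows by id ...
    $byId = array();
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $byId[(int) $row['id']] = $row;
    }

    // ... then emit them in the order of $ids, skipping ids with no row.
    $ordered = array();
    foreach ($ids as $id) {
        if (isset($byId[$id])) {
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

// Usage (assumed connection settings, matching the source block above):
// $pdo = new PDO('mysql:host=localhost;port=3306;dbname=test;charset=utf8', 'root', '');
// $ids = isset($res['matches']) ? array_keys($res['matches']) : array();
// $posts = fetchPostsInMatchOrder($pdo, $ids);
```

Binding the ids as parameters avoids building SQL by string concatenation, and reordering in PHP keeps the query itself simple.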

For more Sphinx configuration options and program parameters, see the Sphinx documentation.
