- 1. Why use Sphinx
Suppose you run a forum whose data has grown past one million records, and many users complain that forum search is very slow. In that case you can consider using Sphinx (other full-text search programs or methods would also work, of course).
- 2. What is Sphinx?
Sphinx is a high-performance full-text search package developed by the Russian developer Andrew Aksyonoff. It is dual-licensed under the GPL and a commercial license.
Full-text retrieval is an information retrieval technique that treats all of a document's text as the object of the search. The searched text may be an article's title, its author, its summary, or its content.
- 3. Features of Sphinx
• High-speed indexing (close to 10 MB/s on modern CPUs);
• High-speed searching (average query time under 0.1 seconds on 2-4 GB of text);
• High scalability (up to 100 GB of text, or 100 million documents, on a single CPU);
• Good relevance ranking;
• Distributed search;
• Document summary (excerpt) generation;
• Searching from within MySQL through a pluggable storage engine;
• Boolean, phrase, and word-proximity queries;
• Multiple full-text fields per document (up to 32 by default);
• Multiple attributes per document;
• Word breaking;
• Single-byte encodings and UTF-8.
- 4. Download and install Sphinx
Open http://www.coreseek.cn/news/7/52/ and find the version that matches your operating system. I am on Windows, so I downloaded the generic Coreseek Win32 build; on Linux you can download the source package and build and install it yourself. A note on why the download is called Coreseek: Coreseek is software built on top of Sphinx, with some modifications that give it better Chinese-language support than stock Sphinx, so that is what we use here.
After the download finishes, extract the archive wherever you like. I extracted it to the E: drive and renamed the directory to coreseek. That completes the installation; the install directory is E:\coreseek\.
- 5. Using Sphinx
Using Sphinx involves the following steps:
1) Prepare the data
2) Set up the Sphinx configuration file
3) Build the index
4) Start Sphinx
5) Use it (query through the API or the search.exe program)
Step 1: (Import data)
The database, table, and sample data needed for this test are provided in the attachment; download it and import it into MySQL.
Step 2: (Set up the configuration file)
Next, create a Sphinx configuration file E:\coreseek\etc\mysql.conf with the following contents:
source mysql
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = test
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, addtime, title, content FROM post
    sql_attr_timestamp = addtime
}
index mysql
{
    source = mysql
    path = E:/coreseek/var/data/mysql
    charset_dictpath = E:/coreseek/etc/
    charset_type = zh_cn.utf-8
}
searchd
{
    listen = 9312
    max_matches = 1000
    pid_file = E:/coreseek/var/log/searchd_mysql.pid
    log = E:/coreseek/var/log/searchd_mysql.log
    query_log = E:/coreseek/var/log/query_mysql.log
}
Let's go through the meaning of each item in this configuration file.
source mysql{}: defines a data source named mysql; any other name works too, for example source xxx{}
type: the data source type
sql_*: database connection settings such as sql_host and sql_pass; these need no explanation
sql_query: the query that fetches the data at indexing time. Avoid WHERE and GROUP BY here where possible and leave that work to Sphinx, which filters and groups more efficiently. Note: the selected fields must include a unique primary key and the fields to be searched in full text, and any field you intend to filter on must be selected too
sql_query_pre: an SQL command executed before sql_query; there can be more than one
sql_attr_*: items starting with this prefix declare attribute fields. Every field that would appear in a WHERE, ORDER BY, or GROUP BY must be declared as an attribute, and each field type has its own directive; sql_attr_timestamp above, for example, declares a timestamp
index mysql{}: defines an index named mysql; any other name works too, for example index xxx{}
source: the name of the data source to index, as defined by source xxx{}
path: where the index files are stored. With E:/coreseek/var/data/mysql, the files are actually created in the E:/coreseek/var/data/ directory as several files named mysql with different extensions
charset_dictpath: the location of the dictionary files read by the word segmenter; required when word segmentation is enabled. When using LibMMSeg as the segmenter, make sure the dictionary file uni.lib exists in this directory
charset_type: the character set, for example charset_type = zh_cn.gbk
searchd{}: configuration of the Sphinx daemon
listen: the port to listen on
max_matches: the maximum number of matches kept per query; however many documents match, at most the 1000 set here are returned
pid_file: path of the PID file
log: path of the full-text search (searchd) log
query_log: path of the query log
That covers this configuration file. There are many more options; see the documentation for details.
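As a rough illustration of leaving the WHERE-style filtering to Sphinx, the sketch below restricts a search to a time window using the addtime attribute declared through sql_attr_timestamp. The helper name applyTimeWindow and its parameters are made up for this example. SetFilterRange and SetLimits are methods of the SphinxClient class in sphinxapi.php, and the value 1000 passed to SetLimits is an assumption chosen to match max_matches in the searchd section.

```php
<?php
// Hypothetical helper (not part of sphinxapi.php): restricts a search to
// documents whose addtime attribute lies between $from and $to (inclusive).
// $client is expected to be a SphinxClient instance.
function applyTimeWindow($client, $from, $to, $limit = 20)
{
    if ($from > $to) {
        throw new InvalidArgumentException('$from must not be later than $to');
    }
    // Plays the role of "WHERE addtime BETWEEN $from AND $to" in sql_query;
    // this works because addtime is declared via sql_attr_timestamp.
    $client->SetFilterRange('addtime', (int) $from, (int) $to);
    // Return the first $limit matches and keep at most 1000 per query,
    // mirroring max_matches = 1000 (an assumption taken from the config).
    $client->SetLimits(0, (int) $limit, 1000);
    return $client;
}
```

Once a SphinxClient object exists (see step 5), a call such as applyTimeWindow($sc, strtotime('-7 days'), time()) placed before the query would limit the results to the last week.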
Step 3: (Build the index)
Click Start > Run, type cmd, and press Enter to open a command prompt.
E:\coreseek\bin\indexer --config E:\coreseek\etc\mysql.conf --all
This command calls the indexer program to build all indexes.
To build only one index, pass its name instead of --all, for example: E:\coreseek\bin\indexer --config E:\coreseek\etc\mysql.conf mysql (the last argument is an index name defined in the configuration file)
--config and --all are options of the indexer program; see the documentation if you want to know more about them.
If the output contains no FATAL or ERROR messages, the index was built successfully. For example, I see:
......... Omitted.........
using config file 'e:\coreseek\etc\mysql.conf'...
indexing index 'mysql'...
collected 4 docs, 0.0 MB
......... Omitted.........
Step 4: (Start Sphinx)
In the same command prompt, run:
E:\coreseek\bin\searchd --config E:\coreseek\etc\mysql.conf
After running it, I saw a fair amount of output:
using config file 'e:\coreseek\etc\mysql.conf'...
listening on all interfaces, port=9312
accepting connections
Never mind what each line means; Sphinx has started successfully.
Do not close this command prompt window, because closing it also shuts Sphinx down. If that is inconvenient, you can install Sphinx as a system service so that it runs in the background.
To install the service, enter the following command:
E:\coreseek\bin\searchd --config E:\coreseek\etc\mysql.conf --install
After installing, remember to start the service. If you do not know how to start a Windows service, a quick web search will show you.
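Before moving on to the PHP code, it can help to confirm that something is actually accepting connections on the configured port. The sketch below is one possible check using PHP's fsockopen. The function name isSearchdListening is made up for this example, and the test only proves that a TCP connection to the port succeeds, not that the process behind it is searchd.

```php
<?php
// Hypothetical helper: returns true if a TCP connection to $host:$port
// succeeds within $timeout seconds (9312 is the listen value from the config).
function isSearchdListening($host = '127.0.0.1', $port = 9312, $timeout = 1.0)
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the PHP warning that fsockopen emits when the connection fails.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo isSearchdListening() ? "searchd port is reachable\n" : "searchd port is not reachable\n";
```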
Step 5: (Use Sphinx)
Create a search directory under the web root (it does not have to be under the web root, and any directory name will do), and copy E:\coreseek\api\sphinxapi.php into it (sphinxapi.php is the official Sphinx API for PHP). Now we can start writing PHP.
Create a file in the search directory; any name works, and I call it index.php. Its contents are as follows:
<?php
include 'sphinxapi.php'; // load the Sphinx API
$sc = new SphinxClient(); // instantiate the client
$sc->SetServer('localhost', 9312); // set the server: the first argument is the Sphinx server address, the second is the port Sphinx listens on
$res = $sc->Query('Sphinx', 'mysql'); // run the query: the first argument is the search keyword, the second is the index name (mysql, as defined in the configuration file); separate multiple index names with commas, or use * for all indexes
print_r($res);
Printed result:
Array
(
......... Omitted.........
    [matches] => Array
        (
            [2] => Array
                (
                    [weight] => 2
                    [attrs] => Array
                        (
                            [addtime] => 1282622004
                        )
                )
            [4] => Array
                (
                    [weight] => 2
                    [attrs] => Array
                        (
                            [addtime] => 1282622079
                        )
                )
        )
......... Omitted.........
)
matches holds the query results, but it does not seem to contain the data we want: the contents of the title and content fields are missing. According to the official documentation, Sphinx does not connect to MySQL to fetch data at query time; it computes the result purely from its own index. So to get the actual data through the Sphinx API, we must query MySQL again using the IDs in the result.
The keys in the query result mean the following:
2: the unique primary key
weight: the relevance weight
attrs: the attributes configured through sql_attr_*
At this point the search engine is more than half done; the rest you can finish on your own.
For example:
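To make the structure of these keys concrete, here is a small sketch that walks over a result shaped like the printed output above and turns each match into a flat row. The sample array only reproduces the matches part shown earlier, and the helper name flattenMatches is made up for this example. It also treats a false result as empty, since Query() returns false when the request fails.

```php
<?php
// Sample data shaped like the print_r output above (only the 'matches' part).
$res = array(
    'matches' => array(
        2 => array('weight' => 2, 'attrs' => array('addtime' => 1282622004)),
        4 => array('weight' => 2, 'attrs' => array('addtime' => 1282622079)),
    ),
);

// Hypothetical helper: flattens a Query() result into simple rows.
// A failed query (false) or a result without matches yields an empty list.
function flattenMatches($res)
{
    $rows = array();
    if ($res === false || empty($res['matches'])) {
        return $rows;
    }
    foreach ($res['matches'] as $id => $match) {
        $rows[] = array(
            'id'      => (int) $id,                        // unique primary key
            'weight'  => (int) $match['weight'],           // relevance weight
            'addtime' => (int) $match['attrs']['addtime'], // from sql_attr_timestamp
        );
    }
    return $rows;
}

foreach (flattenMatches($res) as $row) {
    echo $row['id'], ' ', $row['weight'], ' ', $row['addtime'], "\n";
}
```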
<?php
// Note: the mysql_* functions used here were removed in PHP 7.
$ids = array_keys($res['matches']); // get the primary keys
$ids = join(',', $ids);
$query = mysql_query("SELECT * FROM post WHERE id IN ({$ids})");
while ($row = mysql_fetch_assoc($query)) {
    .....
}
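Two details are worth handling around this lookup: an empty ID list would produce IN (), which is a SQL syntax error, and MySQL does not return the rows of an IN (...) query in Sphinx's relevance order. The sketch below shows one way to deal with both. The helper names buildIdList and sortRowsByMatchOrder are made up for this example, and the code assumes each database row carries the id column selected in sql_query.

```php
<?php
// Hypothetical helper: builds a "1,2,3" list for an IN (...) clause,
// or returns null when there is nothing to look up.
function buildIdList($res)
{
    if ($res === false || empty($res['matches'])) {
        return null;
    }
    // Cast to int so that only numeric IDs reach the SQL string.
    return implode(',', array_map('intval', array_keys($res['matches'])));
}

// Hypothetical helper: puts database rows back into Sphinx's match order.
function sortRowsByMatchOrder(array $rows, $res)
{
    $byId = array();
    foreach ($rows as $row) {
        $byId[(int) $row['id']] = $row;
    }
    $sorted = array();
    foreach (array_keys($res['matches']) as $id) {
        // Skip IDs that are in the index but no longer in the table.
        if (isset($byId[$id])) {
            $sorted[] = $byId[$id];
        }
    }
    return $sorted;
}
```

With these helpers, the query is only run when buildIdList returns a non-null string, and the fetched rows are passed through sortRowsByMatchOrder before being displayed.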
For more on Sphinx configuration, program options, and so on, see the Sphinx documentation.