Applying PHP+Sphinx to build efficient in-site search engines

Source: Internet
Author: User
1. Why Use Sphinx

Suppose you run a forum that now holds more than 1,000,000 records, and many users complain that forum search is very slow. In that case you can consider using Sphinx (other full-text search programs or approaches would of course work too).

2. What is Sphinx?

Sphinx is a high-performance full-text search package developed by the Russian developer Andrew Aksyonoff. It is dual-licensed under the GPL and a commercial license.
Full-text retrieval is an information retrieval technique that treats all the textual information in a document as the search target. The text being searched may be an article's title, its author, its summary, or its body.
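
To make the idea of full-text retrieval concrete, here is a minimal Python sketch of an inverted index, the general data structure behind engines of this kind. It is an illustration only and not how Sphinx is implemented internally; the sample posts and the function names build_index and search are made up for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# Hypothetical sample posts, keyed by id.
posts = {
    1: "Sphinx full text search",
    2: "MySQL LIKE search is slow",
    3: "Full text search with Sphinx and PHP",
}
index = build_index(posts)
print(search(index, "sphinx search"))  # [1, 3]
```

A lookup touches only the posting sets of the query words instead of scanning every row, which is why an index of this kind stays fast as the data grows.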

3. Features of Sphinx

• High-speed indexing (nearly 10 MB/s on a modern CPU);
• High-speed search (average query time under 0.1 seconds on 2-4 GB of text);
• High scalability (up to 100 GB of text, or 100 M documents, on a single CPU);
• Good relevance ranking;
• Distributed search;
• Document summary (excerpt) generation;
• Searching from within MySQL through a pluggable storage engine;
• Boolean, phrase, and synonym queries;
• Multiple full-text fields per document (32 at most by default);
• Multiple attributes per document;
• Word segmentation;
• Single-byte encodings and UTF-8.

4. Download and install Sphinx

Open http://www.coreseek.cn/news/7/52/ and find the version that matches your operating system. I am on Windows, so I download the Coreseek Win32 generic build; Linux users can download the source package and compile and install it themselves. Why is the program we download called Coreseek? Coreseek is software built on top of Sphinx with some modifications, and it supports Chinese better than stock Sphinx does, so that is what we use.
After the download finishes, extract the archive wherever you like. I extract it to the E: drive and rename the extracted directory to coreseek. That completes the Coreseek installation; the install directory is E:\coreseek\.

5. Using Sphinx

To use Sphinx, I need to do the following:
1) Have some data to search
2) Set up the Sphinx configuration file
3) Build the index
4) Start Sphinx
5) Query it (through the API or with the search.exe program)

Item 1: (Import data)
Space here is limited, so the database, table, and data needed for the test are provided in the attachment; download it and import it into MySQL.

Item 2: (Set up the configuration file)
Next, create a Sphinx configuration file, E:\coreseek\etc\mysql.conf, with the following contents:
source mysql
{
    type                = mysql
    sql_host            = localhost
    sql_user            = root
    sql_pass            =
    sql_db              = test
    sql_port            = 3306
    sql_query_pre       = SET NAMES utf8
    sql_query           = SELECT id,addtime,title,content FROM post
    sql_attr_timestamp  = addtime
}

index mysql
{
    source              = mysql
    path                = E:/coreseek/var/data/mysql
    charset_dictpath    = E:/coreseek/etc/
    charset_type        = zh_cn.utf-8
}

searchd
{
    listen              = 9312
    max_matches         = 1000
    pid_file            = E:/coreseek/var/log/searchd_mysql.pid
    log                 = E:/coreseek/var/log/searchd_mysql.log
    query_log           = E:/coreseek/var/log/query_mysql.log
}

Let's go through what each item in this configuration file means.
source mysql{}: defines a data source named mysql; any other name works too, for example source xxx{}
type: the data source type
sql_*: database connection settings such as sql_host and sql_pass; these need no explanation
sql_query: the query run when building the index. Where possible, avoid WHERE and GROUP BY here and leave that work to Sphinx, which filters and groups more efficiently. Note: the selected columns must include a unique primary key and the fields to be searched in full text, and any field you intend to filter on must be selected as well
sql_query_pre: SQL commands executed before sql_query; there can be more than one
sql_attr_*: settings beginning with this prefix declare attribute fields. Every field used for filtering, sorting, or grouping (WHERE, ORDER BY, GROUP BY) must be declared as an attribute, and each field type has its own setting name; sql_attr_timestamp above, for example, declares a timestamp.
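
The roles of the selected columns can be sketched in a few lines of Python. The sketch assumes the usual Sphinx convention: the first column of sql_query is the document ID, columns declared with a sql_attr_* setting become attributes, and the remaining columns are indexed as full-text fields. The function name classify_columns is hypothetical and only serves this illustration.

```python
def classify_columns(columns, attributes):
    """Split sql_query columns into document id, attributes and full-text fields."""
    doc_id, rest = columns[0], columns[1:]
    attrs = [c for c in rest if c in attributes]
    fields = [c for c in rest if c not in attributes]
    return {"id": doc_id, "attributes": attrs, "fields": fields}


# Columns from the sql_query above; addtime is declared via sql_attr_timestamp.
roles = classify_columns(["id", "addtime", "title", "content"], {"addtime"})
print(roles)
# {'id': 'id', 'attributes': ['addtime'], 'fields': ['title', 'content']}
```

With the configuration above, title and content are searchable as text, while addtime is available for filtering and sorting.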

index mysql{}: defines an index named mysql; any other name works too, for example index xxx{}
source: the name of the data source, i.e. the xxx in source xxx{}
path: the path of the index files. For example, E:/coreseek/var/data/mysql actually stores the files in the E:/coreseek/var/data/ directory, as several index files that are all named mysql but have different extensions
charset_dictpath: the directory of the dictionary file read by the word segmenter; required when word segmentation is enabled. When libmmseg is the segmenter, make sure the dictionary file uni.lib is in this directory
charset_type: the character set, for example charset_type = zh_cn.gbk
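
Because path is a file-name prefix rather than a directory, the index files can be listed by globbing on that prefix. The Python sketch below shows this on a temporary directory. The three empty files are stand-ins created only for the demonstration; their extensions (.sph, .spi, .spd) are among those Sphinx uses, but the exact set written by your version may differ.

```python
import glob
import os
import tempfile


def index_files(path_prefix):
    """List the files that share the index path prefix, e.g. <prefix>.sph."""
    return sorted(glob.glob(path_prefix + ".*"))


# Demonstration on stand-in files; a real prefix would be E:/coreseek/var/data/mysql.
with tempfile.TemporaryDirectory() as data_dir:
    prefix = os.path.join(data_dir, "mysql")
    for ext in ("sph", "spi", "spd"):
        open(f"{prefix}.{ext}", "w").close()
    names = [os.path.basename(p) for p in index_files(prefix)]

print(names)  # ['mysql.spd', 'mysql.sph', 'mysql.spi']
```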

searchd{}: Sphinx daemon configuration
listen: the listening port
max_matches: the maximum number of matches returned; no matter how many documents match, at most the 1000 set here are returned
pid_file: the path of the PID file
log: the searchd (full-text search) log
query_log: the query log

That covers this configuration file. Many more parameters can be configured; see the documentation for details.
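
One easy mistake in a file like this is an index whose source setting names a data source that was never defined. The Python sketch below checks for that. It is a deliberately simplified parser that assumes plain "kind name { key = value }" blocks with no inheritance, no line continuations, and no braces inside values, so it does not cover the full configuration syntax. The names parse_config and undefined_sources are hypothetical, and the embedded configuration is a trimmed copy of the one above.

```python
import re

# Matches "kind name { ... }" and "kind { ... }" blocks without nesting.
BLOCK_RE = re.compile(r"(\w+)(?:\s+(\w+))?\s*\{([^}]*)\}")


def parse_config(text):
    """Parse the blocks of a simplified Sphinx config into a list of dicts."""
    blocks = []
    for kind, name, body in BLOCK_RE.findall(text):
        settings = {}
        for line in body.splitlines():
            line = line.strip()
            if line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # A repeated key (e.g. several sql_query_pre lines) keeps the last value.
            settings[key.strip()] = value.strip()
        blocks.append({"kind": kind, "name": name, "settings": settings})
    return blocks


def undefined_sources(blocks):
    """Return the names of indexes whose source matches no source block."""
    sources = {b["name"] for b in blocks if b["kind"] == "source"}
    return [b["name"] for b in blocks
            if b["kind"] == "index" and b["settings"].get("source") not in sources]


CONFIG = """
source mysql
{
    type = mysql
    sql_query = SELECT id,addtime,title,content FROM post
}

index mysql
{
    source = mysql
    path = E:/coreseek/var/data/mysql
}

searchd
{
    listen = 9312
}
"""

blocks = parse_config(CONFIG)
print(undefined_sources(blocks))  # [] means every index refers to a defined source
```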

Item 3: (Build the index)
Click Start, choose Run, type cmd, and press Enter to open the command-line tool, then run:
E:\coreseek\bin\indexer --config e:\coreseek\etc\mysql.conf --all
This command simply calls the indexer program to build all the indexes.

If you only want to build one index, you can do this: e:\coreseek\bin\indexer --config e:\coreseek\etc\mysql.conf <index name> (the index name is the one defined in the configuration file)
--config and --all are parameters of the indexer program; readers who want to know about the other parameters can consult the documentation.
If the output after running the command contains no FATAL or ERROR messages, the index files were built successfully. For example, I see:
......... (omitted) .........
using config file 'e:\coreseek\etc\mysql.conf'...
indexing index 'mysql'...
collected 4 docs, 0.0 MB
......... (omitted) .........
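
The rule of thumb above (no FATAL or ERROR in the output means the build worked) is easy to automate. The following Python sketch wraps the indexer call and applies that check. The function names are hypothetical, and scanning for the two keywords is only the heuristic described in this article, not an official success criterion; run_indexer is defined but not called here because it needs a real Coreseek installation.

```python
import subprocess


def build_indexer_command(indexer, config, index=None):
    """Return the argument list for one index, or for all indexes when index is None."""
    return [indexer, "--config", config, index if index else "--all"]


def index_succeeded(output):
    """Apply the article's rule of thumb: no FATAL or ERROR in the output."""
    upper = output.upper()
    return "FATAL" not in upper and "ERROR" not in upper


def run_indexer(indexer, config, index=None):
    """Run indexer and report whether its output looks clean."""
    result = subprocess.run(build_indexer_command(indexer, config, index),
                            capture_output=True, text=True)
    return index_succeeded(result.stdout + result.stderr)


# The sample output shown above, checked without running anything.
sample = r"""using config file 'e:\coreseek\etc\mysql.conf'...
indexing index 'mysql'...
collected 4 docs, 0.0 MB"""
print(index_succeeded(sample))  # True
```

On the machine from this article the call would look like run_indexer(r"E:\coreseek\bin\indexer", r"e:\coreseek\etc\mysql.conf").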

Item 4: (Start Sphinx)
Again on the command line:
E:\coreseek\bin\searchd --config e:\coreseek\etc\mysql.conf
After running it, a bunch of output appears:
using config file 'e:\coreseek\etc\mysql.conf'...
listening on all interfaces, port=9312
accepting connections
Don't worry about what these messages mean; the point is that Sphinx has started.
With this output on screen, the command-line window must stay open, because closing it also shuts down Sphinx. If that is inconvenient, you can install Sphinx as a system service so that it runs in the background.
To install the service, just enter the following command on the command line:
E:\coreseek\bin\searchd --config e:\coreseek\etc\mysql.conf --install
After installation, remember to start the service. If you don't know how to start a Windows service, search for it online.
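
Whether searchd runs in a console window or as a service, a quick way to confirm that it is accepting connections is to open a TCP connection to the port set by listen. The Python sketch below does that. The function name is_listening is hypothetical, and a successful connection only shows that something is listening on the port, not that it is searchd.

```python
import socket


def is_listening(host="127.0.0.1", port=9312, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 9312 is the listen port from the configuration file above.
    print("searchd reachable:", is_listening())
```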