Use PHP + Sphinx to build an efficient intra-site search engine

Source: Internet
Author: User
1. Why use Sphinx

Suppose you run a forum whose data has grown past 100 million entries, and many users report that forum search is very slow. In that case, consider using Sphinx (other full-text search programs or methods would also work, of course).

2. What is Sphinx

Sphinx is a high-performance full-text search package developed by Andrew Aksyonoff, a Russian developer. It is dual-licensed under the GPL and a commercial license.
Full-text search is an information retrieval technique that treats all of a document's text as the object of retrieval. The text searched may be an article's title, its author, its abstract, or its body.
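
To make the idea of full-text search concrete, here is a minimal Python sketch of an inverted index, the general data structure behind this kind of retrieval. It is a toy for illustration only and does not describe how Sphinx is implemented; the function names and the two sample documents are made up for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        # Every text field (title, content, ...) is searchable.
        for text in fields.values():
            for word in tokenize(text):
                index[word].add(doc_id)
    return index

def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Hypothetical sample documents, invented for this sketch.
docs = {
    1: {"title": "Installing Sphinx", "content": "Download and unpack the package."},
    2: {"title": "Forum search", "content": "Searching a large forum with Sphinx."},
}
index = build_index(docs)
print(search(index, "sphinx forum"))
```

The lookup never scans the documents themselves; it only intersects the id sets stored per word, which is why an index built in advance answers queries much faster than a scan of every row.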

3. Sphinx features

• High indexing speed (close to 10 MB/s on modern CPUs);
• High search speed (average query time under 0.1 seconds on 2-4 GB of text);
• High scalability (up to 100 GB of text and 100 million documents on a single CPU);
• Good relevance ranking;
• Distributed search;
• Generation of document excerpts;
• Searching from within MySQL through a pluggable storage engine;
• Boolean, phrase, and word-proximity queries;
• Multiple full-text fields per document (up to 32 by default);
• Multiple attributes per document;
• Word breaking;
• Both single-byte encodings and UTF-8.

4. Download and install Sphinx

Open http://www.coreseek.cn/news/7/52/ and find the version for your operating system. I am on Windows, for example, so I download the general Coreseek Win32 build; on Linux you can download the source package and compile and install it yourself. A word on why the download is Coreseek: Coreseek is software built on Sphinx. It makes some changes to Sphinx and supports Chinese better than Sphinx does, so that is what we use.
After the download finishes, extract the archive wherever you like. For example, extract it to the root of the E: drive and rename the directory to coreseek. That completes the installation; the installation directory is E:\coreseek\.

5. Use Sphinx

Using Sphinx involves the following steps:
1) First, have some data.
2) Create a Sphinx configuration file.
3) Generate the index.
4) Start Sphinx.
5) Run queries (by calling the API or the search.exe program).

Step 1: (import data)
We need a database, a table, and some data to test with. Space is limited, so these are provided in the attachment; download it and import it into MySQL.

Step 2: (create a configuration file)
Next, create a Sphinx configuration file, E:\coreseek\etc\mysql.conf, with the following content:
source mysql
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = test
    sql_port = 3306
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, addtime, title, content FROM post
    sql_attr_timestamp = addtime
}

index mysql
{
    source = mysql
    path = E:/coreseek/var/data/mysql
    charset_dictpath = E:/coreseek/etc/
    charset_type = zh_cn.utf-8
}

searchd
{
    listen = 9312
    max_matches = 1000
    pid_file = E:/coreseek/var/log/searchd_mysql.pid
    log = E:/coreseek/var/log/searchd_mysql.log
    query_log = E:/coreseek/var/log/query_mysql.log
}

First, here is what each item in this configuration file means.
source mysql {}: defines a data source named mysql. Any other name works too, for example source xxx {}.
type: the data source type.
sql_*: database-related settings, such as sql_host and sql_pass.
sql_query: the SQL query used to build the index. Do not use WHERE or GROUP BY here; leave filtering and grouping to Sphinx, which handles them more efficiently. Note: the selected fields must include a unique primary key and the fields to be full-text searched.
sql_query_pre: an SQL command executed before sql_query runs. There can be more than one.
sql_attr_*: options starting with sql_attr declare attribute fields. Every field used for filtering (where), sorting (order by), or grouping (group by) must be declared as an attribute, and each field type has its own option name; sql_attr_timestamp above, for example, declares a timestamp attribute.

index mysql {}: defines an index named mysql. Any other name works too, for example index xxx {}.
source: the name of the data source, as defined by source xxx {}.
path: where the index files are stored. With E:/coreseek/var/data/mysql, the files actually go in the E:/coreseek/var/data/ directory, where several index files named mysql but with different extensions are created.
charset_dictpath: the location from which the word segmenter reads its dictionary files. It is required when word segmentation is enabled. When LibMMSeg is used as the segmentation library, make sure the dictionary file uni.lib is in this directory.
charset_type: the character set, for example charset_type = zh_cn.gbk.

searchd {}: configuration of the Sphinx daemon.
listen: the listening port.
max_matches: the maximum number of matches returned. However many records match a search, at most the 1000 set here are returned.
pid_file: path of the PID file.
log: the full-text search log.
query_log: the query log.

Many more parameters can be set in the configuration file; check the documentation for them.
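
The relationships described above (an index names a source that must be defined, and a source needs an sql_query) can be checked mechanically. The following is a minimal Python sketch under simplifying assumptions: it reads only the plain "section name { key = value }" layout shown in this article, ignores anything else the real configuration format allows, and keeps only the last value when a key is repeated. The function names are hypothetical and are not part of Sphinx or Coreseek.

```python
def parse_sections(text):
    """Parse 'kind name { key = value ... }' blocks into a nested dict."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "{":
            continue
        if line == "}":
            current = None
        elif current is None:
            # Header line such as 'source mysql' or just 'searchd'.
            parts = line.replace("{", " ").split()
            kind = parts[0]
            name = parts[1] if len(parts) > 1 else ""
            current = sections.setdefault((kind, name), {})
        else:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return sections

def find_problems(sections):
    """Report indexes with an undefined source and sources without sql_query."""
    problems = []
    source_names = {name for kind, name in sections if kind == "source"}
    for (kind, name), options in sections.items():
        if kind == "source" and "sql_query" not in options:
            problems.append(f"source {name}: no sql_query")
        if kind == "index" and options.get("source") not in source_names:
            problems.append(f"index {name}: source is not defined")
    return problems

# A shortened copy of the configuration from this article.
SAMPLE = """
source mysql
{
    type = mysql
    sql_query = SELECT id, addtime, title, content FROM post
    sql_attr_timestamp = addtime
}
index mysql
{
    source = mysql
    path = E:/coreseek/var/data/mysql
}
searchd
{
    listen = 9312
}
"""

sections = parse_sections(SAMPLE)
print(find_problems(sections))
```

A check like this only catches naming mistakes before indexing; the indexer itself remains the authority on whether a configuration file is valid.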

Step 3: (generate the index)
Start -> Run -> type cmd and press Enter to open the command-line tool, then run:
E:\coreseek\bin\indexer --config E:\coreseek\etc\mysql.conf --all
This command calls the indexer program to build all the indexes.

To index only one data source, run: E:\coreseek\bin\indexer --config E:\coreseek\etc\mysql.conf index_name (where index_name is the index name defined in the configuration file)
--config and --all are parameters of the indexer program. See the documentation for more.
If no FATAL or ERROR appears after the command runs, the index files were generated successfully.
......... Omitted .........
using config file 'E:\coreseek\etc\mysql.conf'...
indexing index 'mysql'...
collected 4 docs, 0.0 MB
......... Omitted .........
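
The success rule above (no FATAL or ERROR in the output) is easy to automate when the indexer is run from a script. Here is a minimal Python sketch. The helper names are hypothetical, the paths are the ones used in this article, and the failure line in the example is invented for illustration rather than copied from real indexer output.

```python
import subprocess

def indexing_succeeded(output):
    """Return True when the output mentions neither FATAL nor ERROR."""
    return not any(marker in output for marker in ("FATAL", "ERROR"))

def run_indexer(indexer, config, index_name=None):
    """Run indexer for one index, or for all indexes when index_name is None."""
    command = [indexer, "--config", config, index_name or "--all"]
    completed = subprocess.run(command, capture_output=True, text=True)
    return indexing_succeeded(completed.stdout + completed.stderr)

# Checking two sample outputs without running anything.
good_output = "indexing index 'mysql'...\ncollected 4 docs, 0.0 MB\n"
bad_output = "FATAL: (made-up failure line for illustration)\n"
print(indexing_succeeded(good_output))
print(indexing_succeeded(bad_output))

# On a machine with Coreseek installed as described, one would call for example:
# run_indexer(r"E:\coreseek\bin\indexer", r"E:\coreseek\etc\mysql.conf")
```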

Step 4: (start Sphinx)
In the same command-line window, run:
E:\coreseek\bin\searchd --config E:\coreseek\etc\mysql.conf
Running it prints quite a lot of output:
using config file 'E:\coreseek\etc\mysql.conf'...
listening on all interfaces, port=9312
accepting connections
Never mind what all of this means; Sphinx is now running.
Do not close this command-line window, because closing it shuts Sphinx down. If that bothers you, you can install Sphinx as a system service and let it run in the background.
To install it as a system service, enter the following command:
E:\coreseek\bin\searchd --config E:\coreseek\etc\mysql.conf --install
After installation, remember to start the service. If it fails to start, I cannot help; search for the answer yourself.
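
Whether searchd was started from the command line or as a service, one simple way to confirm that it is accepting connections is to try opening a TCP connection to the port set in listen. The Python sketch below does only that; it does not speak the Sphinx protocol, and the function name is hypothetical. It assumes searchd runs on the local machine on port 9312, as configured above.

```python
import socket

def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Prints True when something accepts connections on port 9312, otherwise False.
print(is_listening("127.0.0.1", 9312))
```

An open port only shows that some process is listening there; a real query through the API is still the definitive test.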
