LNMP + Sphinx: searching big data in seconds


Sphinx is a full-text search engine developed by Andrew Aksyonoff, a Russian developer. It is intended to give other applications fast, low-footprint, highly relevant full-text search. Sphinx integrates easily with SQL databases and scripting languages. It currently supports MySQL and PostgreSQL as data sources and can also read XML data in a specific format from standard input.


The main features of Sphinx are as follows:

a) High-speed indexing (peak performance of up to 10 MB/s on current CPUs);

b) High-performance search (on 2–4 GB of text data, the average response time per query is under 0.1 seconds);

c) Handles large volumes of data (known to cope with well over a few gigabytes of text, and with millions of documents on a single-CPU system);

d) A good relevance algorithm: a composite ranking based on phrase proximity and statistics (BM25);

e) Distributed search;

f) Phrase search;

g) Document excerpt (snippet) generation;

h) Can serve searches as a MySQL storage engine;

i) Multiple search modes, such as Boolean, phrase, and word similarity;

j) Multiple full-text fields per document (up to 32);

k) Multiple additional attributes per document (e.g., grouping information, timestamps);

l) Word breaking.
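To make the "statistics (BM25)" part of the ranking concrete, here is a minimal sketch of textbook BM25 scoring in Python. It is not Sphinx's actual ranker (which also combines phrase proximity and uses its own BM25 variant); the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration. The sample texts are the `content` values of the `documents` table used in the query validation step.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with textbook BM25.

    k1 and b are common textbook defaults, not values taken from Sphinx.
    """
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n     # average document length

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "this is my test document number one. also checking search within phrases.",
    "this is my test document number two",
    "this is another group",
    "this is to test groups",
]
scores = bm25_scores("test", docs)
for doc_id, score in zip((1, 2, 3, 4), scores):
    print(doc_id, round(score, 4))
```

Documents that do not contain the query term score zero, and among matching documents with the same term frequency the shorter ones score higher, which is the length normalization BM25 is known for.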


Although MySQL's MyISAM engine provides full-text indexing, its performance is disappointing. Databases are also not well suited to this kind of work, so it is better to hand it to a dedicated program and reduce the load on the database. Using Sphinx as the full-text indexing tool for MySQL is therefore a good choice. I spent this week learning the tool; what follows is a rough record of that process, kept as a memo and, I hope, useful to others who are learning it.


1. Installing Sphinx

    wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
    tar -xf sphinx-2.2.11-release.tar.gz && cd sphinx-2.2.11-release
    ./configure --prefix=/usr/local/spinx --with-mysql
    make && make install
    ln -s /usr/local/mysql/lib/libmysqlclient.so.18 /usr/lib64/

Install libsphinxclient (required by the PHP module):

    cd api/libsphinxclient
    ./configure --prefix=/usr/local/sphinx
    make && make install

Note that the main install prefix is spelled /usr/local/spinx here, and the index, log, and command paths in steps 3 and 4 use that same prefix; libsphinxclient goes to /usr/local/sphinx.

2. Installing the PHP extension

    wget http://pecl.php.net/get/sphinx-1.3.3.tgz
    tar zxf sphinx-1.3.3.tgz && cd sphinx-1.3.3
    /usr/local/php/bin/phpize    # generates ./configure for a PECL extension
    ./configure --with-php-config=/usr/local/php/bin/php-config --with-sphinx=/usr/local/sphinx/
    make && make install


3. Create a configuration file

Copy the sample configuration from the Sphinx install prefix and edit it:

    cp /usr/local/spinx/etc/sphinx-min.conf.dist /usr/local/spinx/etc/sphinx.conf

    #
    # Minimal Sphinx configuration sample (clean, simple, functional)
    #

    source src1
    {
        type                = mysql

        sql_host            = localhost
        sql_user            = root
        sql_pass            = www.123
        sql_db              = test
        sql_port            = 3306  # optional, default is 3306

        sql_query           = \
            SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
            FROM documents

        sql_attr_uint       = group_id
        sql_attr_timestamp  = date_added
    }

    index test1
    {
        source              = src1
        path                = /usr/local/spinx/var/data/test1
    }

    indexer
    {
        mem_limit           = 32M
    }

    searchd
    {
        listen              = 9312
        listen              = 9306:mysql41
        log                 = /usr/local/spinx/var/log/searchd.log
        query_log           = /usr/local/spinx/var/log/query.log
        read_timeout        = 5
        max_children        = 30
        pid_file            = /usr/local/spinx/var/log/searchd.pid
        seamless_rotate     = 1
        preopen_indexes     = 1
        unlink_old          = 1
        workers             = threads  # for RT to work
        binlog_path         = /usr/local/spinx/var/data
    }
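A path typo in this file (for example a stray space inside a directory name) only shows up when indexer or searchd fails to start. As a sanity check, the following sketch lists the directories that the path-valued directives above point into. The helper name `required_dirs` and the simplified line parser are assumptions for illustration; it only understands single-line `key = value` settings and is not a full sphinx.conf parser.

```python
import os
import re

# Directives from the configuration above whose values are filesystem paths.
PATH_KEYS = {"path", "log", "query_log", "pid_file", "binlog_path"}


def required_dirs(conf_text):
    """Return the sorted list of directories the config expects to exist."""
    dirs = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"^(\w+)\s*=\s*(.+)$", line)
        if not match or match.group(1) not in PATH_KEYS:
            continue
        key, value = match.group(1), match.group(2).strip()
        # binlog_path is itself a directory; the others name a file or file prefix.
        dirs.add(value if key == "binlog_path" else os.path.dirname(value))
    return sorted(dirs)


sample = """
index test1
{
    source      = src1
    path        = /usr/local/spinx/var/data/test1
}
searchd
{
    log         = /usr/local/spinx/var/log/searchd.log
    query_log   = /usr/local/spinx/var/log/query.log
    pid_file    = /usr/local/spinx/var/log/searchd.pid
    binlog_path = /usr/local/spinx/var/data
}
"""

for directory in required_dirs(sample):
    print(directory, "exists" if os.path.isdir(directory) else "MISSING")
```

In practice you would read the real file (for example `open("/usr/local/spinx/etc/sphinx.conf").read()`) instead of the inline sample.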


4. Create an index and start

    /usr/local/spinx/bin/indexer -c /usr/local/spinx/etc/sphinx.conf --all
    /usr/local/spinx/bin/searchd -c /usr/local/spinx/etc/sphinx.conf

5. Query validation

    cd /root/sphinx-2.2.11-release/api
    python test.py test

    DEPRECATED: Do not call this method or, even better, use SphinxQL instead of an API
    Query 'test ' retrieved 3 of 3 matches in 0.000 sec
    Query stats:
        'test' found 5 times in 3 documents

    Matches:
    1. doc_id=1, weight=2, group_id=1, date_added=2016-11-30 01:21:20
    2. doc_id=2, weight=2, group_id=1, date_added=2016-11-30 01:21:20
    3. doc_id=4, weight=1, group_id=2, date_added=2016-11-30 01:21:20


    mysql> select * from documents;
    +----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
    | id | group_id | group_id2 | date_added          | title           | content                                                                   |
    +----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
    |  1 |        1 |         5 | 2016-11-30 01:21:20 | test one        | this is my test document number one. also checking search within phrases. |
    |  2 |        1 |         6 | 2016-11-30 01:21:20 | test two        | this is my test document number two                                       |
    |  3 |        2 |         7 | 2016-11-30 01:21:20 | another doc     | this is another group                                                     |
    |  4 |        2 |         8 | 2016-11-30 01:21:20 | doc number four | this is to test groups                                                    |
    +----+----------+-----------+---------------------+-----------------+---------------------------------------------------------------------------+
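The deprecation notice printed by test.py recommends SphinxQL, and the configuration already opens a MySQL-protocol listener on port 9306. The sketch below shows one way the same query might be issued from Python. It assumes the third-party PyMySQL package is installed and that its handshake works against this searchd version, which was not verified here; `build_match_query` and `search` are hypothetical helper names, and the selected columns are the attributes declared in the configuration.

```python
def build_match_query(index, text, limit=10):
    """Return (sql, params) for a SphinxQL full-text query on the given index."""
    # Index names cannot be bound as parameters, so validate before formatting.
    if not index.isidentifier():
        raise ValueError("unexpected index name: %r" % index)
    sql = "SELECT id, group_id, date_added FROM {} WHERE MATCH(%s) LIMIT {}".format(
        index, int(limit)
    )
    return sql, (text,)


def search(text, host="127.0.0.1", port=9306):
    """Run the query against searchd's mysql41 listener (assumed reachable)."""
    import pymysql  # third-party; imported lazily so the builder works without it

    conn = pymysql.connect(host=host, port=port, user="")
    try:
        with conn.cursor() as cur:
            cur.execute(*build_match_query("test1", text))
            return cur.fetchall()
    finally:
        conn.close()


sql, params = build_match_query("test1", "test")
print(sql, params)
```

With searchd running, `search("test")` should return the matching document IDs from the test1 index. The stock mysql command-line client can also connect to the same listener, for example with `mysql -h 127.0.0.1 -P 9306`.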


Reference URLs:

http://blog.csdn.net/wangjiuwang/article/details/52002172

http://www.cnblogs.com/findgor/p/5644540.html

http://www.sphinxsearch.org/sphinx-faq


This article comes from the "Do not abandon! Do not give up" blog; please keep this source link: http://thedream.blog.51cto.com/6427769/1878194
