Use and testing of "Sphinx" Mysql+sphinx full-Text Search

Source: Internet
Author: User
Tags mysql client python script

Introduction: Recently online due to some indescribable reasons need to filter some words in the database, so the more lines on the line is the Select C1,c2 fron tb1 where C1 like '%name% ' similar to this look of fuzzy query. began to think of the full-text index for Word segmentation, but also think that these are online databases, and the data in the table is tens of millions of rows of data, online DDL or use of third-party tools, for during the modification will block the line DML operations, And for the full-text index maintenance also need to consume a certain amount of resources, so temporarily dispel this idea. In the discussion and development of the actual business, and finally intends to choose a third-party full-text Search tool, where the full-text Search tool Sphinx.

Installation and testing of Sphinx:

    1. Installation of Sphinx

Yum Install Sphinx-y


2. Configuration files

#
# Minimal Sphinx Configuration sample (clean, simple, functional)
#

SOURCE Sbtest_sbtest1
{
type= MySQL

sql_host= localhost
sql_user= Root
sql_pass= Redhat
sql_db= Sbtest
sql_port= 3306# Optional, default is 3306

sql_query= \
SELECT userid as ID, \
K, \
C, \
Pad \
From Sbtest1 \
where C like '%03679578678% ' \

Sql_attr_uint = k #数据类型int或者timestamp
Sql_field_string = C #数据类型text或者string
sql_attr_string = Pad #数据类型string
}

Index Sbtest1
{
Source= Sbtest_sbtest1
Path=/var/lib/sphinx/sbtest1 #索引文件的路径
}

Index TESTRT
{
type= RT
rt_mem_limit= 128M

Path=/VAR/LIB/SPHINX/TESTRT

rt_field= Title
rt_field= Content
rt_attr_uint= GID
}

Indexer
{
mem_limit= 128M
}

Searchd
{
listen= 9312
listen= 9306:mysql41 #对外提供服务的端口, similar to the 3306 port in MySQL
log=/var/log/sphinx/searchd.log
query_log=/var/log/sphinx/query.log
Read_timeout= 5
max_children= 30
Pid_file=/var/run/sphinx/searchd.pid
Seamless_rotate= 1
Preopen_indexes= 1
Unlink_old= 1
workers= Threads # for RT
Binlog_path=/var/lib/sphinx/
}

Note: In the configuration file, it is important to note that:

A. The Sql_query parameter is followed by an SQL statement, but the first field of the SQL statement must be an ID field, which does not need to be defined when the parameter type is defined, otherwise sphinx cannot recognize the attribute.

B. You must include a field of type sql_filed_string when defining the attribute, otherwise sphinx think you do not have to use full-text search for this feature.

3. Create an index

Indexer Sbtest1

4. Start the SEARCHD service

Service Searchd Start

5. View Searchd Service Status

[Email protected] data]# Searchd--status
Sphinx 2.3.2-id64-beta (4409612)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)

Using config file '/etc/sphinx/sphinx.conf ' ...

SEARCHD status
--------------
uptime:80403
Connections:11
maxed_out:0
Command_search:7
command_excerpt:0
command_update:0
command_delete:0
command_keywords:0
command_persist:0
Command_status:1
command_flushattrs:0
agent_connect:0
agent_retry:0
Queries:7
dist_queries:0
query_wall:0.010
Query_cpu:off
dist_wall:0.000
dist_local:0.000
dist_wait:0.000
Query_reads:off
Query_readkb:off
Query_readtime:off
avg_query_wall:0.001
Avg_query_cpu:off
avg_dist_wall:0.000
avg_dist_local:0.000
avg_dist_wait:0.000
Avg_query_reads:off
Avg_query_readkb:off
Avg_query_readtime:off
qcache_max_bytes:16777216
qcache_thresh_msec:3000
Qcache_ttl_sec:60
qcache_cached_queries:0
qcache_used_bytes:0
qcache_hits:0

6. Test using: MySQL Client

[[email protected] data]# mysql-uroot-predhat-h127.0.0.1-p9306-e "Select Id,k,c,pad from Sbtest1 where match (' 0367957 8678 ') "
MySQL: [Warning] Using a password on the command line interface can is insecure.
+------+----------+-------------------------------------------------------------------------------------------- -----------------------------+-------------------------------------------------------------+
| id   | k        | c                                                                                                                              | pad                                                            |
+------+----------+-------------------------------------------------------------------------------------------- -----------------------------+-------------------------------------------------------------+
| 1 | 15324329 |  64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603 | 78326593386-76411244360-77646817949-14319822046-41963083322 |
+------+----------+-------------------------------------------------------------------------------------------- -----------------------------+-------------------------------------------------------------+

7.python test Use

#!/usr/bin/env python
#coding: Utf-8

Import Pymysql

con = pymysql.connect (host= ' 127.0.0.1 ', port=9306, user= "", passwd= "", db= "")
With Con.cursor (pymysql.cursors.DictCursor) as cur:
Cur.execute ("select * from Sbtest1 where match ('%03679578678% ')")
Print (Cur.fetchall ())

[email protected] ~]# python sphinx.py
[{u ' C ': ' 64733237507-56788752464-03679578678-53343296505-31167207241-10603050901-64148678956-82738243332-73912678420-24566188603 ' , U ' k ': 15324329, U ' pad ': ' 78326593386-76411244360-77646817949-14319822046-41963083322 ', U ' ID ': 1}]

8. Table structure:

 [email protected] 18:06:  [ Sbtest]> show create table sbtest1 \g 
*************************** 1.  row ***************************
       table: sbtest1
create table: create table  ' Sbtest1 '   (
   ' id '  int (one)  not  null auto_increment,
   ' K '  int (one)  NOT NULL DEFAULT  ' 0 ',
   ' C '  char (+)  NOT NULL DEFAULT  ',
   ' pad '  char (60)  NOT NULL DEFAULT  ',
  PRIMARY KEY  (' id '),
  key   ' K_1 '   (' K ')
)  engine=innodb auto_increment=25000000 default charset=utf8mb4

The data for table Sbtest1 in the test environment has 25 million rows, and the test SQL is select * from Sbtest1 where C like '%03679578678% '. When MySQL executes this SQL, it takes nearly 4 minutes to use the index only for full table scans, but it takes more than 70 seconds to create the index sphinx. After the index has been created successfully, the basic execution of the Python script or the cost of using the MySQL client is basically at the millisecond level. Performance has improved a lot, and there is no link between Sphinx and MySQL after the index has been created successfully. Does not affect MySQL operations, Sphinx can be deployed independently on a single server.

9. Updating index files: Incrementally updating index files

Indexer--rotate test1

Sphinx Mysql+sphinx use and testing of full-text search

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.