How to install and configure Sphinx + MySQL + mmseg

Source: Internet
Author: User

What is Sphinx?

Sphinx is a full-text search engine released under GPLv2. Commercial licensing (for example, embedding Sphinx in other programs) requires contacting Sphinxsearch.com.
Generally, Sphinx is a standalone search engine designed to give other applications fast, space-efficient, highly relevant full-text search. Sphinx integrates easily with SQL databases and scripting languages.
It currently has built-in support for MySQL and PostgreSQL as data sources, and it can also read data in a specific format from standard input. Other data sources (for example, native support for other DBMS types) can be added by modifying the source code.
The search API is available for PHP, Python, Perl, Ruby, and Java, and Sphinx can also be used as a MySQL storage engine. The search API is very simple and can be ported to a new language within a few hours.
Sphinx is short for SQL Phrase Index; unfortunately, it shares its name with CMU's Sphinx speech recognition project.

Sphinx features

High-speed indexing (peak performance of up to 10 MB/s on a modern CPU);

High-performance search (average response time under 0.1 seconds on 2 to 4 GB of text data);

Large-scale data processing (known to handle more than 100 GB of text data, and 100 M documents on a single-CPU system);

Excellent relevance: a composite ranking method based on phrase proximity and statistics (BM25);

Distributed search;

Document excerpt (snippet) generation;

Can serve searches from within MySQL as a storage engine;

Multiple search modes such as Boolean, phrase, and word proximity;

Documents can have multiple full-text fields (up to 32);

Documents can have multiple additional attributes (such as group IDs and timestamps);

Stopword support;

Support for single-byte encodings and UTF-8;

Native MySQL support (both MyISAM and InnoDB);

Native PostgreSQL support.

I. Required files
mmseg-0.7.3.tar.gz: mmseg Chinese word segmentation library

mysql-5.1.26-rc.tar.gz: MySQL 5.1.26 source code

sphinx-0.9.9.tar.gz: Sphinx 0.9.9-release source code

fix-crash-in-excerpts.patch: Sphinx word segmentation patch (fixes a crash in excerpts)

sphinx-0.98rc2.zhcn-support.patch: Sphinx patch adding Chinese word segmentation support

II. Installation
1. Install libmmseg

tar -zxvf mmseg-0.7.3.tar.gz
cd mmseg-0.7.3
./configure --prefix=/usr/local/mmseg
make
make install
cd ..

mmseg is now installed. Test it:

mmseg
Coreseek COS(tm) MM Segment 1.0
Copyright By Coreseek.com All Right Reserved.
Usage: mmseg
-u Unigram Dictionary
-r Combine with -u, used a plain text build Unigram Dictionary, default Off
-b Synonyms Dictionary
-h print this help and exit

If mmseg fails to run, run the following commands (note the >> so that the existing /etc/ld.so.conf is appended to, not overwritten):

echo '/usr/local/mmseg/lib' >> /etc/ld.so.conf
ldconfig -v
ln -s /usr/local/mmseg/bin/mmseg
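Running the echo twice would list the directory twice, and a single > would wipe the file. A minimal sketch of an idempotent version; add_ld_path is a hypothetical helper (not part of mmseg), and the config file is a parameter so it can be tried on a scratch copy first.

```shell
# Hypothetical helper: add a directory to an ld.so.conf-style file only once.
# $1 = library directory, $2 = config file (defaults to /etc/ld.so.conf)
add_ld_path() {
    dir="$1"
    conf="${2:-/etc/ld.so.conf}"
    # -x: whole-line match, -F: fixed string, so a longer path does not count
    if ! grep -qxF "$dir" "$conf" 2>/dev/null; then
        echo "$dir" >> "$conf"   # append; never truncate the existing file
    fi
}

# Real use (as root):
#   add_ld_path /usr/local/mmseg/lib && ldconfig
```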

2. Recompile MySQL

Two patches must be applied to the Sphinx source before building:

tar -zxvf sphinx-0.9.8-rc2.tar.gz
cd sphinx-0.9.8-rc2
patch -p1 < ../sphinx-0.98rc2.zhcn-support.patch
patch -p1 < ../fix-crash-in-excerpts.patch

I had already installed MySQL 5.1.26, so its installation steps are skipped here.
MySQL build directory:
/root/lemp/mysql-5.1.26-rc/
MySQL installation directory:
/opt/mysql

Stop MySQL before installing:

/opt/mysql/bin/mysql.server stop

Next, copy the mysqlse folder from the Sphinx source directory to mysql-5.1.26-rc/storage/sphinx, so that the SphinxSE storage engine is compiled together with MySQL:

cp -rf mysqlse /root/lemp/mysql-5.1.26-rc/storage/sphinx

cd /root/lemp/mysql-5.1.26-rc
make clean
sh BUILD/autorun.sh
# This step is required; do not omit it.

Then reconfigure and recompile:

CFLAGS="-O3" CXX=gcc CXXFLAGS="-O3 -felide-constructors -fno-exceptions -fno-rtti" \
./configure --prefix=/opt/mysql --localstatedir=/opt/mysql/var --sysconfdir=/opt/mysql \
--without-debug --with-unix-socket-path=/opt/mysql.sock --with-big-tables \
--with-charset=gbk --with-collation=gbk_chinese_ci \
--with-client-ldflags=-all-static --with-mysqld-ldflags=-all-static \
--enable-assembler --with-extra-charsets=gbk,gb2312,utf8 --with-pthread \
--enable-thread-safe-client --with-innodb --with-plugins=sphinx

make
make install

If configure fails with:

configure: error: unknown plugin: sphinx

the fix is:

sudo yum install autoconf automake libtool
sh BUILD/autorun.sh
./configure -h

and check that the help output lists at least the Sphinx plugin:

=== Sphinx Storage Engine ===
Plugin Name: sphinx
Description: Sphinx Storage Engines
Supports build: static and dynamic
Configurations: max, max-no-ndb
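The check can be scripted. A minimal sketch, assuming the plugin block printed by ./configure --help contains a "Plugin Name:" line followed by the name, as in the output above; has_plugin is a hypothetical helper.

```shell
# Hypothetical helper: succeed if configure's help text (on stdin) lists a plugin.
# $1 = plugin name to look for, e.g. "sphinx"
has_plugin() {
    awk -v want="$1" '
        # match lines such as "  Plugin Name:      sphinx"
        $1 == "Plugin" && $2 == "Name:" && $3 == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Real use, from the MySQL source root after sh BUILD/autorun.sh:
#   ./configure --help | has_plugin sphinx && echo "SphinxSE plugin found"
```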

Then compile again.

make may fail with errors like:

../libtool: line 466: CDPATH: command not found
../libtool: line 1144: func_opt_split: command not found
libtool: Version mismatch error. This is libtool 2.2.6, but the
libtool: definition of this LT_INIT comes from an older release.
libtool: You should recreate aclocal.m4 with macros from libtool 2.2.6
libtool: and run autoconf again.
make[1]: *** [conf_to_src] Error 63
make[1]: Leaving directory '/home/andychu/lemp2/mysql-5.1.26-rc/strings'
make: *** [all-recursive] Error 1

This error occurs when the libtool versions do not match. Copy the installed libtool over the one in the build directory (the MySQL source root):

cp /usr/local/bin/libtool .

Then recompile.

On Fedora, libtinfo.so.5 may be missing from the link. To fix it, edit client/Makefile:

cd client
vim Makefile

Find the LIBS line and append /lib/libtinfo.so.5 to it.
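The vim edit can also be done non-interactively. A minimal sketch, assuming the Makefile has one line starting with "LIBS =" (as automake-generated Makefiles normally do); append_lib is a hypothetical helper, and it writes through a temporary file instead of relying on GNU sed's -i.

```shell
# Hypothetical helper: append a library to the "LIBS = ..." line of a Makefile.
# $1 = Makefile path, $2 = library to append (e.g. /lib/libtinfo.so.5)
append_lib() {
    mk="$1"; lib="$2"
    # already present: leave the file alone so repeated runs are harmless
    grep -q "^LIBS *=.*$lib" "$mk" && return 0
    sed "s|^\(LIBS *=.*\)\$|\1 $lib|" "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
}

# Real use, from the MySQL source root:
#   append_lib client/Makefile /lib/libtinfo.so.5
```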

After compiling, start MySQL and check whether the SphinxSE storage engine was compiled in:

/opt/mysql/bin/mysql.server start
/opt/mysql/bin/mysql -uroot -p
mysql> show engines;
+------------+---------+-----------------------------------------------------------+--------------+----+------------+
| Engine     | Support | Comment                                                   | Transactions | XA | Savepoints |
+------------+---------+-----------------------------------------------------------+--------------+----+------------+
| CSV        | YES     | CSV storage engine                                        | NO           | NO | NO         |
| SPHINX     | YES     | Sphinx storage engine 0.9.9                               | NO           | NO | NO         |
| MEMORY     | YES     | Hash based, stored in memory, useful for temporary tables | NO           | NO | NO         |
| MRG_MYISAM | YES     | Collection of identical MyISAM tables                     | NO           | NO | NO         |
| MyISAM     | DEFAULT | Default engine as of MySQL 3.23 with great performance    | NO           | NO | NO         |
+------------+---------+-----------------------------------------------------------+--------------+----+------------+
5 rows in set (0.00 sec)

The SPHINX row shows that the SphinxSE engine is included.
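To script this check, the mysql client's batch mode (-N -B: tab-separated output without column headers) is convenient. A sketch; sphinxse_enabled is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical helper: succeed if SHOW ENGINES output (tab-separated, as printed
# by "mysql -N -B") lists the SPHINX engine with Support YES or DEFAULT.
sphinxse_enabled() {
    awk -F '\t' '
        toupper($1) == "SPHINX" && ($2 == "YES" || $2 == "DEFAULT") { ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}

# Real use:
#   /opt/mysql/bin/mysql -uroot -p -N -B -e "SHOW ENGINES" | sphinxse_enabled \
#       && echo "SphinxSE is available"
```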

3. Install Sphinx

Run configure in the patched Sphinx source directory:

./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql/ \
--with-mysql-includes=/usr/local/mysql/include/mysql/ --with-mysql-libs=/usr/local/mysql/lib/mysql/ \
--with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg

If the build then fails because the header files cannot be found:

tokenizer_zhcn.cpp:1:30: SegmenterManager.h: No such file or directory
tokenizer_zhcn.cpp:2:23: Segmenter.h: No such file or directory

clean up and reconfigure with the paths used in this installation (MySQL in /opt/mysql):

make clean
./configure --prefix=/usr/local/sphinx --with-mysql=/opt/mysql \
--with-mysql-includes=/opt/mysql/include/mysql --with-mysql-libs=/opt/mysql/lib/mysql \
--with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg

The link step may then fail with:

/root/sphinx/sphinx-0.9.8-rc2/src/tokenizer_zhcn.cpp:34: undefined reference to `libiconv_close'
collect2: ld returned 1 exit status

The fix suggested on the official website:
"In the meantime I've changed the configure file and set
#define USE_LIBICONV 0 in line 8179."
In other words, edit the configure script so that #define USE_LIBICONV is 0 instead of 1, then rebuild:

make clean
./configure --prefix=/usr/local/sphinx --with-mysql=/opt/mysql \
--with-mysql-includes=/opt/mysql/include/mysql --with-mysql-libs=/opt/mysql/lib/mysql \
--with-mmseg-includes=/usr/local/mmseg/include/mmseg --with-mmseg-libs=/usr/local/mmseg/lib --with-mmseg

vi configure
Type /define USE_LIBICONV to find the line, press i to enter insert mode, change 1 to 0, then press Esc and type :wq to save and exit.
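The same change can be made without an interactive editor. A minimal sketch, assuming the line in configure reads exactly "#define USE_LIBICONV 1" at the start of a line (its line number, 8179 in the quote above, may vary between versions); disable_libiconv is a hypothetical helper.

```shell
# Hypothetical helper: switch "#define USE_LIBICONV 1" to 0 in a configure script.
# $1 = path to configure (run from the Sphinx source directory)
disable_libiconv() {
    cfg="$1"
    sed 's/^\(#define USE_LIBICONV\) 1$/\1 0/' "$cfg" > "$cfg.tmp" || return 1
    cat "$cfg.tmp" > "$cfg"   # cat keeps configure's executable bit, unlike mv
    rm -f "$cfg.tmp"
}

# Real use:
#   disable_libiconv ./configure && grep 'define USE_LIBICONV' ./configure
```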

Create a Sphinx configuration file from the bundled sample:

cd /usr/local/sphinx/etc
cp sphinx.conf.dist sphinx.conf

4. Configure Sphinx

Edit /usr/local/sphinx/etc/sphinx.conf:

type = mysql
# some straightforward parameters for SQL source types
sql_host = localhost
sql_user = root
sql_pass =
sql_db = test
sql_port = 3306 # optional, default is 3306
address = 127.0.0.1 # for security, listen on the local machine only
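The settings above belong to different sections of sphinx.conf: the sql_* lines go in a source block and address goes in the searchd block. As a rough sketch of how the pieces fit together, the function below prints a minimal configuration for the example documents table. The section names (src1, test1), the sql_query, the paths and port 3312 (matching the SphinxSE connection string used later) are assumptions to adapt; this is not a copy of the shipped sphinx.conf.dist.

```shell
# Hypothetical helper: print a minimal sphinx.conf for the example "documents"
# table in the test database. Names and paths below are assumptions.
minimal_sphinx_conf() {
    cat <<'EOF'
source src1
{
    type      = mysql
    sql_host  = localhost
    sql_user  = root
    sql_pass  =
    sql_db    = test
    sql_port  = 3306
    sql_query = SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content FROM documents
    sql_attr_uint      = group_id
    sql_attr_timestamp = date_added
}

index test1
{
    source = src1
    path   = /usr/local/sphinx/var/data/test1
}

searchd
{
    address  = 127.0.0.1
    port     = 3312
    log      = /usr/local/sphinx/var/log/searchd.log
    pid_file = /usr/local/sphinx/var/log/searchd.pid
}
EOF
}

# Real use (back up the existing file first):
#   minimal_sphinx_conf > /usr/local/sphinx/etc/sphinx.conf
```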


5. Create the index

After Sphinx is installed, the sphinx directory contains three subdirectories: bin, etc, and var.
bin holds the Sphinx executables, including searchd (the query server), search (the search query tool), and indexer (the indexing tool).

For the tests below, first import the example.sql script that ships with Sphinx into MySQL. It creates the test database with a documents table and some sample data:

/opt/mysql/bin/mysql -uroot -p < /usr/local/sphinx/etc/example.sql

Build the index with:

/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf test1

test1 is the name of the index to build. To build all indexes defined in the configuration, pass --all instead.
Note:
indexer may fail to find the shared library libmysqlclient.so.16 because of differing database versions. In that case copy /opt/mysql/lib/mysql/libmysqlclient.so.16.0.0 to /usr/lib, or create a symbolic link to it.
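A sketch of the symbolic-link variant. link_mysqlclient is a hypothetical helper; it assumes the library file name from the note above and takes both directories as parameters so it can be tried on scratch directories. Run ldconfig afterwards when linking into a system directory such as /usr/lib.

```shell
# Hypothetical helper: expose libmysqlclient.so.16 through a symlink.
# $1 = directory holding libmysqlclient.so.16.0.0 (e.g. /opt/mysql/lib/mysql)
# $2 = directory the dynamic linker searches (e.g. /usr/lib)
link_mysqlclient() {
    src="$1/libmysqlclient.so.16.0.0"
    if [ ! -f "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    # -f replaces a stale link left over from an earlier MySQL build
    ln -sf "$src" "$2/libmysqlclient.so.16"
}

# Real use (as root):
#   link_mysqlclient /opt/mysql/lib/mysql /usr/lib && ldconfig
```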

6. Start the search server

/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf          # start
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf --stop   # stop
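searchd records its process ID in a PID file, and a quick way to tell whether it is up is to check that the process named there still exists. A sketch, assuming the PID file is /usr/local/sphinx/var/log/searchd.pid (the path the service script later in this article also uses); searchd_running is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the PID recorded in searchd's pid file is alive.
# $1 = pid file (defaults to the path assumed above)
searchd_running() {
    pidfile="${1:-/usr/local/sphinx/var/log/searchd.pid}"
    [ -s "$pidfile" ] || return 1   # no pid file, or an empty one
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # (run as root or as the user searchd runs under)
    kill -0 "$pid" 2>/dev/null
}

# Real use:
#   searchd_running && echo "searchd is up" || echo "searchd is not running"
```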

Sphinx can be queried in roughly three ways:

1. From inside the database engine (SphinxSE)

2. With the search command-line tool:
/usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx.conf test

3. Through the PHP API; for details, see sphinxapi.php.

III. Using SphinxSE to call Sphinx from MySQL

1. Querying through SphinxSE
First, create a table dedicated to the index:

CREATE TABLE `sphinx` (
  `id` INT(11) NOT NULL,
  `weight` INT(11) NOT NULL,
  `query` VARCHAR(255) NOT NULL,
  `catalogid` INT NOT NULL,
  `edituserid` INT NOT NULL,
  `hits` INT NULL,
  `addtime` INT NOT NULL,
  KEY `query` (`query`)
) ENGINE=SPHINX DEFAULT CHARSET=utf8 CONNECTION='sphinx://localhost:3312/test1';

test1 is the index name, as defined in sphinx.conf.
Once the table exists, it can be used from MySQL. For example:

SELECT doc.* FROM documents doc JOIN sphinx ON (doc.id = sphinx.id) WHERE query = 'doc;mode=any';

The result lists the records that contain the word doc.
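The value compared with the query column is not ordinary SQL text: SphinxSE reads it as the search words followed by semicolon-separated options such as mode=any. A small sketch that assembles such a string; sphinxse_query is a hypothetical helper.

```shell
# Hypothetical helper: build the SphinxSE query string "words;opt=value;...".
# $1 = search words, remaining args = options such as mode=any
sphinxse_query() {
    q="$1"; shift
    for opt in "$@"; do
        q="$q;$opt"
    done
    printf '%s\n' "$q"
}

# Real use: the printed string is the value for the query column, e.g.
#   q=$(sphinxse_query doc mode=any)   # doc;mode=any
```

Values are joined as-is and no SQL quoting is done, so the words should not contain single quotes, backslashes, or semicolons.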

For more information about the query syntax and Sphinx configuration, see:
http://www.sphinxsearch.com/doc.html

2. Chinese word segmentation
Generate the dictionary. In the mmseg source directory:

cd data
mmseg -u unigram.txt

This creates the file unigram.txt.uni under data. That is the generated dictionary; rename it to uni.lib and put it in a readable directory:

cp unigram.txt.uni /usr/local/sphinx/uni.lib

Edit the configuration file sphinx.conf (/usr/local/sphinx/etc/sphinx.conf) and add the following to the index section:

charset_type = zh_cn.utf-8
charset_dictpath = /usr/local/sphinx/
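A common mistake here is a charset_dictpath that does not contain uni.lib. A sketch of a check, assuming (as the steps above imply) that the patched indexer looks for uni.lib inside charset_dictpath; check_dictpath is a hypothetical helper that reads the first uncommented charset_dictpath line of a config file.

```shell
# Hypothetical helper: verify that charset_dictpath in sphinx.conf points at a
# directory containing uni.lib. $1 = path to sphinx.conf
check_dictpath() {
    # value up to the first blank or "#" (inline comments are allowed)
    dir=$(sed -n 's/^[[:space:]]*charset_dictpath[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" | head -n 1)
    if [ -z "$dir" ]; then
        echo "charset_dictpath not set in $1" >&2
        return 1
    fi
    if [ ! -r "$dir/uni.lib" ]; then
        echo "no readable uni.lib in $dir" >&2
        return 1
    fi
    echo "dictionary found: ${dir%/}/uni.lib"
}

# Real use:
#   check_dictpath /usr/local/sphinx/etc/sphinx.conf
```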

Add a row of Chinese data to the database:

INSERT INTO `test`.`documents` (
  `id`, `group_id`, `group_id2`, `date_added`, `title`, `content`
) VALUES (NULL, '3', '9', NOW(), 'Sphinx Chinese search', 'Sphinx is an SQL-based full-text search engine that can be combined with MySQL and PostgreSQL for full-text search. It provides more professional search functions than the database itself, making it easier for applications to implement professional full-text search. Sphinx provides dedicated search API interfaces for several scripting languages, such as PHP, Python, Perl, and Ruby, and also provides a storage engine plug-in for MySQL.');

(If searchd is already running, stop it first.)
Note: after adding data, the index must be rebuilt before the new data can be searched. Rebuild the index, and once that succeeds, start searchd again:

/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf
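The stop, reindex, start sequence can be wrapped in one function so that searchd is only started again when indexing succeeded. A sketch: reindex_all is a hypothetical helper, with the Sphinx bin directory and config path as parameters. (indexer also has a --rotate option for rebuilding while searchd keeps running; this sketch sticks to the stop-and-restart approach described above.)

```shell
# Hypothetical helper: stop searchd, rebuild every index, start searchd again.
# $1 = Sphinx bin directory, $2 = sphinx.conf path
reindex_all() {
    bin="${1:-/usr/local/sphinx/bin}"
    conf="${2:-/usr/local/sphinx/etc/sphinx.conf}"
    # stopping fails harmlessly if searchd was not running
    "$bin/searchd" --config "$conf" --stop 2>/dev/null
    # if indexing fails, leave searchd stopped and report the error
    "$bin/indexer" --config "$conf" --all || return 1
    "$bin/searchd" --config "$conf"
}

# Real use:
#   reindex_all /usr/local/sphinx/bin /usr/local/sphinx/etc/sphinx.conf
```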

Indexing speed on an IDE hard disk:

indexing index 'test1'...
collected 423228 docs, 637.2 MB
sorted 125.5 Mhits, 100.0% done
total 423228 docs, 637201412 bytes
total 753.401 sec, 845766.13 bytes/sec, 561.76 docs/sec
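The two rates on the last line are the totals divided by the elapsed time. A quick way to recompute them from the figures above; rate is a hypothetical helper that uses awk for the floating-point division.

```shell
# Hypothetical helper: print count / seconds with two decimals.
rate() {
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.2f\n", n / s }'
}

rate 423228 753.401      # docs/sec:  561.76
rate 637201412 753.401   # bytes/sec: 845766.61
```

The docs/sec figure matches the log exactly. The bytes/sec figure differs from the logged 845766.13 in the decimals because the elapsed time is printed rounded to milliseconds.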

You can then run a query such as the following, for example from phpMyAdmin:

SELECT doc.* FROM documents doc JOIN sphinx ON (doc.id = sphinx.id)
WHERE query = 'design;mode=any';

It doesn't seem to return anything...

Edit sphinx.conf and uncomment the following line:

sql_query_pre = SET NAMES utf8

Rebuild the index and restart searchd; the search then returns results.
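The uncommenting can be scripted too. A sketch, assuming the line appears in sphinx.conf commented out with a leading # (for example "# sql_query_pre = SET NAMES utf8"); enable_set_names is a hypothetical helper. Because sql_query_pre is executed by indexer when it fetches data, the index has to be rebuilt for the change to take effect.

```shell
# Hypothetical helper: uncomment "sql_query_pre = SET NAMES utf8" in sphinx.conf,
# keeping the line's indentation. $1 = path to sphinx.conf
enable_set_names() {
    conf="$1"
    sed 's/^\([[:space:]]*\)#[[:space:]]*\(sql_query_pre[[:space:]]*=[[:space:]]*SET NAMES utf8\)/\1\2/' \
        "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}

# Real use, followed by a rebuild so indexer picks up the change:
#   enable_set_names /usr/local/sphinx/etc/sphinx.conf
#   /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all
```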

References:
http://www.coreseek.com/uploads/pdf/sphinx_doc_zhcn_0.9.pdf
http://www.sphinxsearch.com/wiki/doku.php?id=sphinx_chinese_tutorial
http://www.cnblogs.com/hushixiu/articles/1295605.html
http://blog.xoyo.com/dcyhldcyhl/article/839863.shtml
http://blog.sina.com.cn/s/blog_5aefd9770100axf1.html
http://blog.s135.com/post/360/

Updated on
Encoding Solution (not tested)
-------------------------------------------------------------------------------
Convert existing table data (a GB18030 dump.sql into a UTF-8 dump_utf8.sql):
iconv -f GB18030 -t UTF-8 -o dump_utf8.sql dump.sql
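Before importing the converted dump, it is worth checking that it really is valid UTF-8. A sketch, assuming the iconv command-line tool exits with a non-zero status when it meets an invalid byte sequence (glibc's iconv does); is_utf8 is a hypothetical helper.

```shell
# Hypothetical helper: succeed if a file is valid UTF-8.
# Converting UTF-8 to UTF-8 fails on any invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Real use:
#   iconv -f GB18030 -t UTF-8 -o dump_utf8.sql dump.sql && is_utf8 dump_utf8.sql
```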

You can also use existing GBK data directly without converting it, but then you need to set the connection character sets:
mysql_query("SET character_set_client = 'gbk'", $conn);
mysql_query("SET character_set_connection = 'gbk'", $conn); // sets character_set_connection and collation_connection
// mysql_query("SET collation_connection = 'gbk_chinese_ci'", $conn);
mysql_query("SET character_set_results = 'utf8'", $conn);
With these three settings, query results come back UTF-8 encoded, which suits Sphinx.

mysql_query("SET SESSION query_cache_type = OFF", $conn);
// the queries indexer runs while building the index do not need to be cached
-------------------------------------------------------------------------------

Updated on
The Coreseek website seems to have problems and the files cannot be downloaded from it. The installation files I used (3.6 MB in total) are:
build_delta_index.sh
build_main_index.sh
fix-crash-in-excerpts.patch
mmseg-0.7.3.tar.gz
sphinx-0.9.8-rc2.tar.gz
sphinx-0.98rc2.zhcn-support.patch
sphinx.conf
sphinxapi.php
test.php
test2.php


-------------------------------------------------------------------------------

Updated on
Use service to control Sphinx

First add a sphinx user in the (already existing) website group, and change the owner of the sphinx directory:

useradd -d /usr/local/sphinx -g website -s /sbin/nologin sphinx
chown -R sphinx:website /usr/local/sphinx

Create the /etc/init.d/sphinx script:

#!/bin/sh
#
# sphinx: Startup script for Sphinx search
#
# chkconfig: 345 86 14
# description: This is a daemon for high performance full text \
#              search of MySQL and PostgreSQL databases. \
#              See http://www.sphinxsearch.com/ for more info.
#
# processname: searchd
# pidfile: $sphinxlocation/var/log/searchd.pid

# Source function library.
. /etc/rc.d/init.d/functions

processname=searchd
servicename=sphinx
username=sphinx
sphinxlocation=/usr/local/sphinx
pidfile=$sphinxlocation/var/log/searchd.pid
searchd=$sphinxlocation/bin/searchd

RETVAL=0

PATH=$PATH:$sphinxlocation/bin

start() {
    echo -n $"Starting Sphinx daemon: "
    daemon --user=$username --check $servicename $processname
    RETVAL=$?
    echo
    # only create the lock file if searchd actually started
    [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$servicename
}

stop() {
    echo -n $"Stopping Sphinx daemon: "
    $searchd --stop
    #killproc -p $pidfile $servicename -TERM
    RETVAL=$?
    echo
    if [ $RETVAL -eq 0 ]; then
        rm -f /var/lock/subsys/$servicename
        rm -f $pidfile
    fi
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status $processname
        RETVAL=$?
        ;;
    restart)
        stop
        sleep 3
        start
        ;;
    condrestart)
        if [ -f /var/lock/subsys/$servicename ]; then
            stop
            sleep 3
            start
        fi
        ;;
    *)
        echo $"Usage: $0 {start|stop|status|restart|condrestart}"
        ;;
esac

exit $RETVAL

Make the script executable and register it as a service so that it starts automatically at boot:

chmod 755 /etc/init.d/sphinx
chkconfig --add sphinx
chkconfig --level 345 sphinx on
chkconfig --list | grep sphinx   # check
service sphinx start     # start
service sphinx stop      # stop (the official script had some problems on my AS4, so I modified it rather crudely)
service sphinx restart   # restart
service sphinx status    # check whether it is running

Check that searchd is running as the sphinx user:

ps aux | grep searchd
sphinx   24612  0.0  0.3  11376  6256 pts/1    S    searchd
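To check the user in a script instead of reading ps output by eye, ps can print only the user of each searchd process. A sketch assuming a procps-style ps that supports -C (select by command name) and -o user=; searchd_user_ok is a hypothetical helper that reads that output on stdin.

```shell
# Hypothetical helper: succeed if at least one searchd process is listed and all
# of them run as the expected user. $1 = expected user (default: sphinx)
searchd_user_ok() {
    want="${1:-sphinx}"
    awk -v want="$want" '
        NF { seen = 1; if ($1 != want) bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Real use:
#   ps -C searchd -o user= | searchd_user_ok sphinx && echo "searchd runs as sphinx"
```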

