Sphinx-for-Chinese: installation, Chinese word segmentation configuration, and API calls

Source: Internet
Author: User
Our project recently needed a new full-text search feature for goods, so I decided to use Sphinx. Because the search requires Chinese word segmentation, I chose Sphinx-for-Chinese. Coreseek is another option; I suggest picking one of these two rather than the original Sphinx, whose support for Chinese is not very good. The MySQL on our servers had been compiled without the Sphinx extension, and recompiling MySQL to add it was not feasible for the time being: the project runs on multiple servers, and MySQL cannot be recompiled without affecting the existing business. So the program calls Sphinx from the outside through its API. The Sphinx API is available for PHP, Python, Ruby, Java, and other languages, which covers most needs. I program in PHP, so the examples in this article use the PHP version of the API.

I. Installation and configuration of Sphinx and preparation of test data
1. Preparation before installation
(1) Make sure MySQL, GCC, and the common development packages are installed
(2) Download sphinx-for-chinese-1.10.1 (from the official Sphinx-for-Chinese download page) and the Chinese word segmentation dictionary xdict_1.1

cd /usr/local/src
wget -c http://sphinx-for-chinese.googlecode.com/files/sphinx-for-chinese-1.10.1-dev-r2287.tar.gz
wget -c http://sphinx-for-chinese.googlecode.com/files/xdict_1.1.tar.gz

2. Install sphinx-for-chinese-1.10.1

cd /usr/local/src
tar zxvf sphinx-for-chinese-1.10.1-dev-r2287.tar.gz
cd sphinx-for-chinese-1.10.1-dev-r2287
# If MySQL is installed in the default location, use the following command
./configure --prefix=/usr/local/sphinx-for-chinese-1.10.1 --with-mysql

If configure stops with an error at this point, MySQL is most likely not installed in the default location; use the command below instead.

# If MySQL is not installed in the default location (watch out for this if you compiled MySQL yourself), specify the locations of MySQL's include and lib directories (Sphinx needs the .h header files in them at compile time) with --with-mysql-includes and --with-mysql-libs.

./configure --prefix=/usr/local/sphinx-for-chinese-1.10.1 --with-mysql-includes=/usr/local/webserver/mysql/include/mysql --with-mysql-libs=/usr/local/webserver/mysql/lib/mysql
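
The two flags follow directly from the MySQL installation prefix. The sketch below derives them, assuming the include/mysql and lib/mysql layout used in the command above; the helper name mysql_flags is hypothetical and not part of Sphinx or MySQL.

```shell
# Hypothetical helper: build the two configure flags from a MySQL install prefix.
# Assumes headers live in <prefix>/include/mysql and libraries in <prefix>/lib/mysql.
mysql_flags() {
    mysql_prefix="${1%/}"   # drop one trailing slash, if present
    echo "--with-mysql-includes=$mysql_prefix/include/mysql --with-mysql-libs=$mysql_prefix/lib/mysql"
}

# Example call (not executed here):
# ./configure --prefix=/usr/local/sphinx-for-chinese-1.10.1 $(mysql_flags /usr/local/webserver/mysql)
```

If your MySQL uses a different directory layout, pass the two directories to configure by hand instead.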

Once configure finishes without errors, run the following make and make install commands

make
make install
# Finally, run
ls /usr/local/sphinx-for-chinese-1.10.1/

If the three directories bin, etc, and var are listed, the installation was successful
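
The same check can be scripted. This is a minimal sketch; the function name check_sphinx_install is hypothetical, and it only tests that the three directories exist, not that the binaries inside them work.

```shell
# Hypothetical helper: report whether bin, etc and var exist under an install prefix.
check_sphinx_install() {
    prefix="$1"
    missing=""
    for dir in bin etc var; do
        # Collect every expected directory that is absent
        [ -d "$prefix/$dir" ] || missing="$missing $dir"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Example call (not executed here):
# check_sphinx_install /usr/local/sphinx-for-chinese-1.10.1
```

The function prints ok and returns 0 when all three directories are present, and otherwise lists the missing ones and returns 1, so it can be used in a deployment script.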

3. Enable Chinese word segmentation in Sphinx

cd /usr/local/src
tar zxvf xdict_1.1.tar.gz 
/usr/local/sphinx-for-chinese-1.10.1/bin/mkdict xdict_1.1.txt xdict_1.1

If you get an error such as bin/mkdict: error while loading shared libraries: libmysqlclient.so.16, create the symbolic link below. The location of libmysqlclient.so.16.0.0 in the command depends on where your compiled MySQL is installed.

ln -s /usr/local/webserver/mysql/lib/mysql/libmysqlclient.so.16.0.0 /usr/lib/libmysqlclient.so.16
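
If you are not sure where the library lives, you can search for it before creating the link. A minimal sketch, assuming the file name starts with libmysqlclient.so.16 as in the error message; the helper name find_mysqlclient is hypothetical.

```shell
# Hypothetical helper: print the first libmysqlclient.so.16* file found under a directory.
find_mysqlclient() {
    search_dir="$1"
    # Errors (for example unreadable or missing directories) are silenced;
    # sort makes the choice deterministic when several files match
    find "$search_dir" -name 'libmysqlclient.so.16*' 2>/dev/null | sort | head -n 1
}

# Example call (not executed here):
# find_mysqlclient /usr/local/webserver/mysql
```

The function prints nothing when no matching file exists, which usually means the MySQL client library is installed under a different prefix or in a different version.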

Then run the command again

/usr/local/sphinx-for-chinese-1.10.1/bin/mkdict xdict_1.1.txt xdict_1.1
# The message "Chinese Dictionary was successfully created!" means the Chinese word segmentation dictionary was generated
cp xdict_1.1 /usr/local/sphinx-for-chinese-1.10.1/etc/xdict_1.1

4. Configure Sphinx

vi /usr/local/sphinx-for-chinese-1.10.1/etc/sphinx.conf

Enter the following content

# Sphinx basic configuration

# Index source
source goods_src
{
    # Database type
    type = mysql
    # MySQL host IP
    sql_host = localhost
    # MySQL user name
    sql_user = sphinxuser
    # MySQL password
    sql_pass = sphinxpass
    # MySQL database
    sql_db = sphinx
    # MySQL port (if a firewall restricts it, open it)
    sql_port = 3306
    # MySQL socket file (defaults to /tmp/mysql.sock; specify it if yours differs)
    sql_sock = /tmp/mysql.sock
    # Connection encoding (if the database is not UTF-8, searches will probably find nothing)
    sql_query_pre = SET NAMES utf8
    # SQL statement that fetches the data
    sql_query = SELECT goods_id, goods_id AS goods_id_new, goods_name, goods_color, goods_name AS goods_name_search, goods_color AS goods_color_search FROM goods_test
    # The settings below declare attributes used for filtering or conditional queries.
    # These columns are returned in the query results; columns not listed below,
    # such as goods_color_search and goods_name_search in the SQL statement,
    # are the full-text fields that are searched.
    # Unsigned integer
    # goods_id is the primary key; declaring it here makes the indexer report
    # "'goods_id' not found", so goods_id_new is used instead
    sql_attr_uint = goods_id_new
    # String type
    sql_attr_string = goods_name
    sql_attr_string = goods_color
    # Used for testing from the command-line interface (CLI); usually not needed
    # sql_query_info = SELECT * FROM goods_test WHERE goods_id = $id;
}

# Index
index goods
{
    # Index source
    source = goods_src
    # Index file location
    path = /usr/local/sphinx-for-chinese-1.10.1/var/data/goods
    # Document attribute storage mode (default is extern)
    docinfo = extern
    # Lock cached data in memory
    mlock = 0
    # Morphology processing (has no effect on Chinese)
    morphology = none
    # Minimum indexed word length
    min_word_len = 1
    # Data encoding (set to utf-8 to index Chinese)
    charset_type = utf-8
    # Chinese word segmentation dictionary
    chinese_dictionary = /usr/local/sphinx-for-chinese-1.10.1/etc/xdict_1.1
    # Minimum indexed prefix length
    min_prefix_len = 0
    # Minimum indexed infix length
    min_infix_len = 1
    # N-gram length for non-alphabetic data (for CJK indexing)
    ngram_len = 1
    # Whether to strip HTML tags from the content
    html_strip = 0
}

# Indexer settings
indexer
{
    # Memory limit: default 32M, maximum 2047M, 256M to 1024M recommended
    mem_limit = 256M
}

# Settings for the searchd service process
searchd
{
    # Listening address and port; several forms are possible, default is port 9312 on the local host
    # listen = 127.0.0.1
    # listen = 192.168.0.1:9312
    # listen = 9312
    # listen = /var/run/searchd.sock
    # searchd log path
    log = /usr/local/sphinx-for-chinese-1.10.1/var/log/searchd.log
    # Query log path
    query_log = /usr/local/sphinx-for-chinese-1.10.1/var/log/query.log
    # Read timeout
    read_timeout = 5
    # Request timeout (the value is missing in the source; when the line is omitted Sphinx uses its default of 300 seconds)
    # client_timeout =
    # Maximum number of searchd processes
    max_children = 30
    # Process ID file
    pid_file = /usr/local/sphinx-for-chinese-1.10.1/var/log/searchd.pid
    # Maximum number of matches returned per query
    max_matches = 1000
    # Whether to support seamless rotation (required for incremental indexing)
    seamless_rotate = 1
    # Whether to open all index files in advance at startup
    preopen_indexes = 0
    # Whether to delete old index files
    unlink_old = 1
    # MVA updates pool size (default 1M)
    mva_updates_pool = 1M
    # Maximum allowed network packet size (default 8M)
    max_packet_size = 8M
    # Maximum number of filters per query (default 256)
    max_filters = 256
    # Maximum number of values per filter (default 4096)
    max_filter_values = 4096
    # Maximum number of queries per batch (the value is missing in the source; the Sphinx default is 32)
    # max_batch_queries =
}
# End of Sphinx configuration file
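
Before building the index it can help to confirm that the paths named in sphinx.conf actually exist. The sketch below reads a single setting from the file; the helper name conf_value is hypothetical, and it assumes one "key = value" setting per line with comments on their own lines.

```shell
# Hypothetical helper: print the value of the first uncommented "key = value" line in a config file.
conf_value() {
    key="$1"
    file="$2"
    # Match optional indentation, the exact key, "=", then capture the value
    # without surrounding whitespace; commented-out lines do not match
    sed -n "s/^[[:space:]]*$key[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$file" | head -n 1
}

# Example (not executed here): check that the dictionary named in the config exists
# dict=$(conf_value chinese_dictionary /usr/local/sphinx-for-chinese-1.10.1/etc/sphinx.conf)
# [ -f "$dict" ] || echo "dictionary not found: $dict"
```

This is only a convenience for spotting path typos; it does not validate the configuration the way Sphinx itself does.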

Chinese characters may appear garbled on Linux; this can be ignored

5. Create the test database and add test data (connect to your MySQL server first)

Execute the following commands in MySQL

mysql> CREATE DATABASE sphinx COLLATE 'utf8_general_ci';
mysql> GRANT ALL PRIVILEGES ON sphinx.* TO 'sphinxuser'@'%' IDENTIFIED BY 'sphinxpass';
mysql> GRANT ALL PRIVILEGES ON sphinx.* TO 'sphinxuser'@'localhost' IDENTIFIED BY 'sphinxpass';
mysql> USE sphinx;
mysql> CREATE TABLE IF NOT EXISTS `goods_test` (
         `goods_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'commodity id',
         `goods_name` varchar(255) NOT NULL COMMENT 'commodity name',
         -- the length of goods_color is unreadable in the source; 255 is an assumed value
         `goods_color` varchar(255) NOT NULL COMMENT 'product color',
         PRIMARY KEY (`goods_id`)
       ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='commodity table, sphinx example' AUTO_INCREMENT=11;
mysql> INSERT INTO `goods_test` (`goods_id`, `goods_name`, `goods_color`) VALUES
         (1, 'hot-selling backpack', 'black'),
         (2, 'hot-selling laptop backpack', 'gray'),
         (3, 'Colorful fashion makeup bag', 'black'),
         (4, 'Colorful fashion makeup bag', 'Blue'),
         (5, 'Colorful fashion makeup bag', 'Pink'),
         (6, 'Extreme sexy women''s shirts', 'Black'),
         (7, 'Personality manifesto Men''s short-sleeved shirt', 'Blue'),
         (8, 'Personality manifesto Men''s short-sleeved shirt', 'Red'),
         (9, 'Personality manifesto Men''s short sleeve shirts', 'green'),
         (10, 'Personality manifesto Men''s short sleeve shirt', 'black');

All right, everything is set up; now you can start working with Sphinx.

II. Working with Sphinx
1. Build the index (if the configuration file has changed, regenerate the index files; if the searchd process from step 3 below is running, stop it first)

# Build the goods index [make sure the MySQL database on the host you connect to is running properly and port 3306 is reachable]

/usr/local/sphinx-for-chinese-1.10.1/bin/indexer -c /usr/local/sphinx-
