Sphinx: installation, configuration, and application
Sphinx is a full-text search engine developed by the Russian developer Andrew Aksyonoff. It is intended to give other applications a full-text search function with high speed, low space usage, and high result relevance. Sphinx integrates easily with SQL databases and scripting languages. It currently has built-in data sources for MySQL and PostgreSQL, and it can also read XML data in a specific format from standard input. By modifying the source code, you can add new data sources (for example, native support for other types of DBMS).
1. Sphinx and Chinese word segmentation
Chinese full-text search depends on segmenting text into words by meaning. Currently, most databases, MySQL among them, do not support Chinese full-text search. To search the full text of Chinese content, you need an additional plug-in such as Coreseek or sfc.
- Coreseek is the most widely used Chinese full-text search solution built on Sphinx. It provides LibMMSeg, a Chinese word segmentation package designed for Sphinx, and also ships binary distributions for multiple systems, including rpm, deb, and Windows binary packages.
- sfc (sphinx-for-chinese) is another Chinese word segmentation plug-in, provided by a user known as "happy brother". Its Chinese dictionary uses xdict. According to its introduction, tests on a Linux platform show that the current version indexes at roughly half the speed of indexing UTF-8 English text, that is, half the officially claimed speed. Most of the time is spent on word segmentation.
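The need for a segmentation plug-in can be illustrated with a small Python sketch. It is illustrative only and is not part of Sphinx, Coreseek, or sfc: the function names are hypothetical, and the per-character fallback is an assumption about the simplest possible workaround, not a description of how either plug-in segments text.

```python
def whitespace_tokens(text):
    """Split on whitespace, the way a naive tokenizer for English would."""
    return text.split()


def single_char_tokens(text):
    """Fallback: treat every non-space character as its own token."""
    return [ch for ch in text if not ch.isspace()]


english = "full text search engine"
chinese = "全文检索引擎"  # "full-text search engine", written without spaces

print(whitespace_tokens(english))   # four separate words
print(whitespace_tokens(chinese))   # one undivided token
print(single_char_tokens(chinese))  # one token per character
```

Whitespace splitting finds no word boundaries in the Chinese string, which is why a dictionary-based segmenter is needed to obtain real words rather than single characters.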
2. Installation
There are two ways to use Sphinx with MySQL:
(1) Call the API, for example through the API functions or methods provided for PHP, Java, and other languages. The advantages are that MySQL does not need to be recompiled, the server processes are loosely coupled, and programs can call the API flexibly and conveniently. The disadvantage is that if you already have search code, parts of it must be modified. Recommended for programmers.
(2) Use the plug-in method (SphinxSE): compile Sphinx into a MySQL plug-in and use specific SQL statements for retrieval. Its advantages are that it combines conveniently on the SQL side and returns data directly to the client without a second query. In the program you only need to modify the corresponding SQL; however, this is inconvenient for programs built on frameworks, for example those that use an ORM. You also need to recompile MySQL, and pluggable storage engines are supported only in mysql-5.1 and later. Recommended for system administrators.
This article uses the first method for installation.
Software environment:
- Operating system: CentOS 5.2
- Database: mysql-5.0.77-3.el5 and mysql-devel (to use the SphinxSE storage plug-in, use mysql-5.1 or later)
- Build tools: gcc-c++, autoconf, automake
- Sphinx: sphinx-0.9.9 (latest stable version)
Installation:
- [root@localhost ~]# yum install -y mysql-devel
- [root@localhost ~]# yum install -y automake autoconf
- [root@localhost ~]# cd /usr/local/src/
- [root@localhost src]# wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
- [root@localhost src]# tar zxvf sphinx-0.9.9.tar.gz
- [root@localhost src]# cd sphinx-0.9.9
- [root@localhost sphinx-0.9.9]# ./configure --prefix=/usr/local/sphinx   # Note: Sphinx supports MySQL by default.
- [root@localhost sphinx-0.9.9]# make && make install   # Warnings can be ignored.
After the installation completes, check that /usr/local/sphinx contains the three directories bin, etc, and var. If it does, the installation succeeded.
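The directory check can also be scripted. The following is a minimal Python sketch, assuming the /usr/local/sphinx prefix used in this article; the helper name `missing_dirs` is hypothetical, and the demonstration runs against a temporary directory so that it works on a machine without Sphinx installed.

```python
import os
import tempfile


def missing_dirs(prefix, expected=("bin", "etc", "var")):
    """Return the expected subdirectories that are absent under prefix."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(prefix, name))]


# Demonstration against a throwaway directory instead of a real install.
with tempfile.TemporaryDirectory() as demo_prefix:
    os.mkdir(os.path.join(demo_prefix, "bin"))
    os.mkdir(os.path.join(demo_prefix, "etc"))
    print(missing_dirs(demo_prefix))  # "var" was never created here
```

On the real server you would call `missing_dirs("/usr/local/sphinx")`; an empty list means all three directories are present.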
Sfc installation
Coreseek installation
3. Configuration
- [root@localhost ~]# cd /usr/local/sphinx/etc   # enter the Sphinx configuration directory
- [root@localhost etc]# cp sphinx.conf.dist sphinx.conf   # create the Sphinx configuration file
- [root@localhost etc]# vim sphinx.conf   # edit sphinx.conf
Example configuration file:
##### Index source #####
source article_src
{
    type = mysql                    # data source type
    sql_host = 192.168.1.10         # MySQL host
    sql_user = root                 # MySQL user name
    sql_pass = pwd                  # MySQL password
    sql_db = test                   # MySQL database name
    sql_port = 3306                 # MySQL port
    # Connection encoding used for retrieval. Pay special attention to this: many people
    # cannot retrieve Chinese text because the database encoding is GBK or another non-UTF-8 encoding.
    sql_query_pre = SET NAMES utf8
    sql_query = SELECT id, title, cat_id, member_id, content, created FROM sphinx_article   # SQL statement that fetches the data
    ##### Attributes used for filtering or conditional queries #####
    sql_attr_uint = cat_id          # unsigned integer attribute
    sql_attr_uint = member_id
    sql_attr_timestamp = created    # UNIX timestamp attribute
    sql_query_info = SELECT * FROM sphinx_article WHERE id=$id   # used when testing from the command-line interface (CLI)
}
##### Index #####
index article
{
    source = article_src                        # index source
    path = /usr/local/sphinx/var/data/article   # index file storage path and index file name
    docinfo = extern                            # document information storage method
    mlock = 0                                   # memory locking of cached data
    morphology = none                           # morphology (has no effect on Chinese)
    min_word_len = 1                            # minimum length of indexed words
    charset_type = utf-8                        # data encoding
    # Character table. Note: with this approach Sphinx splits Chinese text character by
    # character and indexes single characters. To use Chinese word segmentation, you must
    # use a segmentation plug-in such as Coreseek or sfc.
    charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z
}
##### Indexer configuration #####
indexer
{
    mem_limit = 256M    # memory limit
}
##### Sphinx service process #####
searchd
{
    # listen = 9312     # listening port. Starting with this version, port 9312 is officially registered with IANA; earlier versions defaulted to 3312.
    log = /usr/local/sphinx/var/log/searchd.log      # service process log. If Sphinx behaves abnormally, look here for useful information; rotate errors, for example, can usually be diagnosed from this log.
    query_log = /usr/local/sphinx/var/log/query.log  # log of client queries. Note: to collect statistics on keywords, analyze this log file.
    read_timeout = 5    # request timeout
    max_children = 30   # maximum number of searchd processes that can run simultaneously
    pid_file = /usr/local/sphinx/var/log/searchd.pid # process ID file
    max_matches = 1000  # maximum number of results returned by a query
    seamless_rotate = 1 # whether seamless rotation is enabled; usually required for incremental indexing
}
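The charset_table rule in the index block maps full-width digits (U+FF10..U+FF19) to 0..9 and full-width letters (U+FF41..U+FF5A and U+FF21..U+FF3A) to a..z. A minimal Python sketch of that folding is shown below to make the mapping concrete. The function names are hypothetical and this is not Sphinx code; in particular, the sketch leaves every other character unchanged, whereas Sphinx treats characters missing from the table as separators.

```python
def fold_char(ch):
    """Map one character according to the three full-width ranges."""
    cp = ord(ch)
    if 0xFF10 <= cp <= 0xFF19:      # U+FF10..U+FF19 -> 0..9
        return chr(cp - 0xFF10 + ord("0"))
    if 0xFF41 <= cp <= 0xFF5A:      # U+FF41..U+FF5A -> a..z
        return chr(cp - 0xFF41 + ord("a"))
    if 0xFF21 <= cp <= 0xFF3A:      # U+FF21..U+FF3A -> a..z (lowercased)
        return chr(cp - 0xFF21 + ord("a"))
    return ch                       # simplification: leave the rest as-is


def fold_text(text):
    """Apply fold_char to every character of a string."""
    return "".join(fold_char(ch) for ch in text)


print(fold_text("Ｓｐｈｉｎｘ０９９"))  # full-width input folded to ASCII
```

Folding of this kind lets a query typed with an ordinary keyboard match text that was stored with full-width digits or letters.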
4. Create an index
[root@localhost sphinx]# bin/indexer -c etc/sphinx.conf article   # create the index files
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'etc/sphinx.conf'...
indexing index 'article'...
collected 1000 docs, 0.2 MB
sorted 0.4 Mhits, 99.6% done
total 1000 docs, 210559 bytes
total 3.585 sec, 58723 bytes/sec, 278.89 docs/sec
total 2 reads, 0.031 sec, 1428.8 kb/call avg, 15.6 msec/call avg
total 11 writes, 0.032 sec, 671.6 kb/call avg, 2.9 msec/call avg
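The throughput figures in the indexer summary are the totals divided by the elapsed time. The short Python sketch below recomputes them from the numbers printed above; the helper name `per_second` is hypothetical. The results differ slightly from the printed rates, presumably because the elapsed time is shown with only three decimal places.

```python
def per_second(total, elapsed_sec):
    """Average rate: a total divided by the elapsed time in seconds."""
    return total / elapsed_sec


# Values copied from the indexer summary lines.
total_bytes = 210559
total_docs = 1000
elapsed_sec = 3.585

print(f"{per_second(total_bytes, elapsed_sec):.0f} bytes/sec")
print(f"{per_second(total_docs, elapsed_sec):.2f} docs/sec")
```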
5. Application
In the previous step we created an index; now we test it. There are two ways to test: from the command line (CLI) and through API calls.
(1) To test from the command line, use the search tool that ships with Sphinx:
[root@localhost sphinx]# bin/search -c etc/sphinx.conf Liu Li
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'etc/sphinx.conf'...
index 'mdmloginlog': query 'Liu Li ': returned 6 matches of 6 total in 0.000 sec
displaying matches:
1. document=2, weight=2
2. document=3, weight=2
3. document=4, weight=2
4. document=5, weight=2
5. document=7, weight=2
6. document=8, weight=2
words:
1. 'Liu': 6 documents, 6 hits
2. 'Li': 6 documents, 6 hits
(2) Test with the PHP API. Before testing, start the Sphinx service process and open port 9312 in the CentOS firewall.
[root@localhost sphinx]# bin/searchd -c etc/sphinx.conf &   # run Sphinx in the background
[1] 5759
[root@localhost sphinx]# Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff
using config file 'etc/sphinx.conf'...
listening on all interfaces, port=9312
[1]+ Done bin/searchd -c etc/sphinx.conf
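Before writing any API client code, it can help to confirm that the searchd port is actually reachable through the firewall. The following is a minimal Python sketch using only the standard library; the helper name `port_is_open` and the 127.0.0.1 address are placeholders for illustration. It only tests that a TCP connection can be opened and does not speak the searchd protocol; real queries would go through one of the API clients shipped in the api directory of the Sphinx distribution.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; use the address of the searchd host.
    print(port_is_open("127.0.0.1", 9312))
```

If this prints False when run from the client machine while searchd is running, the firewall rule for port 9312 is the first thing to check.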
Reference http://www.sphinxsearch.org/sphinx-tutorial