DedeCMS Sphinx Configuration

First, some background on Sphinx full-text indexing. Given the practical requirements, we will focus mainly on the Chinese-language support of Sphinx full-text indexing. Here, thanks go to Li Mounan for his contribution to the Chinese support for Sphinx full-text indexing!

Official site: http://www.sphinxsearch.com/
Official documentation: http://www.sphinxsearch.com/docs/
Chinese support: http://www.coreseek.cn/
Chinese user manual (download): http://www.coreseek.cn/uploads/pdf/sphinx_doc_zhcn_0.9.pdf
Chinese online manual: http://www.coreseek.cn/docs/coreseek_3.2-sphinx_0.9.9.html

1. Installing Sphinx on Windows
1.1. Preparations
Download coreseek 3.2.13 from http://www.coreseek.cn/products/ft_down/. Here we take the Windows environment as an example:
Download coreseek-3.2.13-win32.zip and decompress it directly; we assume here that it is extracted to D:\coreseek-3.2.13-win32. A quick look at a few of its directories:

[D:\coreseek-3.2.13-win32\api] API directory, containing PHP, Python, Ruby and other usage examples; test_coreseek.php is a good example of Chinese search.

[D:\*****\bin] application directory, which contains the following files:
* indexer: used to create full-text indexes;
* search: a simple CLI test program for querying full-text indexes;
* searchd: a daemon; other software can perform full-text searches through it;
* sphinxapi: a set of searchd client API libraries for popular web scripting languages (PHP, Python, Perl, Ruby, Java);
* spelldump: a simple command-line tool that extracts entries from dictionaries in ispell or myspell (bundled with OpenOffice) format; these entries can be used to customize an index when using wordforms;
* indextool: a utility that dumps debugging information about indexes; this tool was added in coreseek 3.1 (Sphinx 0.9.9-RC2);
* mmseg: the tool and library that coreseek uses for Chinese word segmentation and dictionary handling.

[D:\***\etc] Sphinx configuration directory
[D:\*****\var] Sphinx variables, indexes & log storage directory

1.2. Create a configuration file
Because DedeCMS uses MySQL, we start from the Sphinx template for MySQL: copy csft_mysql.conf and rename the copy to csft_dedecmsv57.conf. Since we only need full-text search here, we make the following configuration.
First, create a counter table in DedeCMS. You can run the following code in the DedeCMS back end under [System] > [SQL command line tool]:
CREATE TABLE `dede_sphinx` (
  `countid` int(11) unsigned NOT NULL,
  `maxaid` int(11) unsigned NOT NULL,
  PRIMARY KEY (`countid`)
) ENGINE=MyISAM DEFAULT CHARSET=gbk;
This is a counter table for Sphinx; it is used to build the index in batches when the data volume is large.
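To make the mechanics concrete, here is how this counter table is driven during indexing; the statements below simply mirror the sql_query_pre, sql_query_range and delta-source queries from the configuration that follows:

```sql
-- Before a full (re)build, record the current highest article id:
REPLACE INTO dede_sphinx SELECT 1, MAX(id) FROM dede_archives;

-- The main source asks for the id range it should index:
SELECT 1, maxaid FROM dede_sphinx WHERE countid = 1;

-- The delta source later indexes only articles added after that id:
-- ... WHERE arc.id > (SELECT maxaid FROM dede_sphinx WHERE countid = 1)
```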
After the data table has been created, modify the Sphinx configuration file, csft_dedecmsv57.conf. Its content is as follows:
--------------------------------------------------------------------------------------------

# source definition
source mysql
{
    type = mysql

    # basic database server settings
    sql_host = 192.168.0.103
    sql_user = dedev57
    sql_pass = dedecms
    sql_db   = dedecmsv57gbk
    sql_port = 3306

    # set the connection encoding; we use GBK here. For UTF-8, set it as follows:
    # sql_query_pre = SET NAMES utf8
    sql_query_pre = SET NAMES gbk

    # fetch data incrementally, in ranged steps
    sql_range_step = 1000

    # record the current highest document id
    sql_query_pre = REPLACE INTO dede_sphinx SELECT 1, MAX(id) FROM dede_archives

    # main query; the id in the first column returned by sql_query must be an integer.
    # title and body are indexed as full-text (string/text) fields
    sql_query = SELECT arc.id, arc.typeid, arc.typeid2, arc.sortrank, arc.flag, arc.channel, arc.ismake, arc.arcrank, arc.click, arc.title, arc.color, arc.writer, arc.source, arc.litpic, arc.pubdate, arc.senddate, arc.mtype, arc.description, arc.badpost, arc.goodpost, arc.scores, arc.lastpost, arc.keywords, arc.mid, art.body FROM dede_archives AS arc LEFT JOIN dede_addonarticle AS art ON arc.id = art.aid WHERE arc.id >= $start AND arc.id <= $end

    # retrieve the current maximum id to index
    sql_query_range = SELECT 1, maxaid FROM dede_sphinx WHERE countid = 1

    # attributes; the values read from SQL must be integers
    sql_attr_uint = typeid
    sql_attr_uint = typeid2
    sql_attr_uint = channel
    sql_attr_uint = click
    sql_attr_uint = badpost
    sql_attr_uint = goodpost
    sql_attr_uint = scores
    sql_attr_uint = mid
    # time attributes; the values read from SQL must be integer timestamps
    sql_attr_timestamp = pubdate
    sql_attr_timestamp = senddate
    sql_attr_timestamp = lastpost

    # fetch the original row from the database for command-line (search.exe) queries
    sql_query_info = SELECT arc.*, art.body FROM dede_archives AS arc LEFT JOIN dede_addonarticle AS art ON arc.id = art.aid WHERE arc.id = $id
}

source delta
{
    type = mysql

    # basic database server settings
    sql_host = 192.168.0.103
    sql_user = dedev57
    sql_pass = dedecms
    sql_db   = dedecmsv57gbk
    sql_port = 3306
    sql_query_pre = SET NAMES gbk

    # incremental index: start from the recorded maximum id
    sql_query = SELECT arc.id, arc.typeid, arc.typeid2, arc.sortrank, arc.flag, arc.channel, arc.ismake, arc.arcrank, arc.click, arc.title, arc.color, arc.writer, arc.source, arc.litpic, arc.pubdate, arc.senddate, arc.mtype, arc.description, arc.badpost, arc.goodpost, arc.scores, arc.lastpost, arc.keywords, arc.mid, art.body FROM dede_archives AS arc LEFT JOIN dede_addonarticle AS art ON arc.id = art.aid WHERE arc.id > (SELECT maxaid FROM dede_sphinx WHERE countid = 1)

    # after indexing, advance the recorded maximum id
    sql_query_post = REPLACE INTO dede_sphinx SELECT 1, MAX(id) FROM dede_archives

    sql_attr_uint = typeid
    sql_attr_uint = typeid2
    sql_attr_uint = channel
    sql_attr_uint = click
    sql_attr_uint = badpost
    sql_attr_uint = goodpost
    sql_attr_uint = scores
    sql_attr_uint = mid
    # time attributes; the values read from SQL must be integer timestamps
    sql_attr_timestamp = pubdate
    sql_attr_timestamp = senddate
    sql_attr_timestamp = lastpost

    # fetch the original row from the database for command-line queries
    sql_query_info = SELECT arc.*, art.body FROM dede_archives AS arc LEFT JOIN dede_addonarticle AS art ON arc.id = art.aid WHERE arc.id = $id
}

# index definition
index mysql
{
    source = mysql      # name of the corresponding source
    path = D:/coreseek-3.2.13-win32/var/data/mysql
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
    html_strip = 0
    # charset_dictpath = /usr/local/mmseg3/etc/   # setting on Linux/BSD; the path must end with /
    charset_dictpath = D:/coreseek-3.2.13-win32/etc/   # setting on Windows; the path must end with /
    charset_type = zh_cn.gbk
}

index delta : mysql
{
    min_word_len = 1
    source = delta
    path = D:/coreseek-3.2.13-win32/var/data/delta
}

# global indexer settings
indexer
{
    mem_limit = 128M
}

# searchd service settings
searchd
{
    listen = 9312
    read_timeout = 5
    max_children = 30
    max_matches = 1000
    seamless_rotate = 0
    preopen_indexes = 0
    unlink_old = 1
    pid_file = D:/coreseek-3.2.13-win32/var/log/searchd_mysql.pid
    log = D:/coreseek-3.2.13-win32/var/log/searchd_mysql.log
    query_log = D:/coreseek-3.2.13-win32/var/log/query_mysql.log
}

-------------------------------------------------------------------------------------------------------

1.3. Create the indexes
After the configuration is complete, create the index. Open [Run] in the Start menu, enter "cmd" to open a command prompt, then enter:

d: & cd d:\coreseek-3.2.13-win32\bin

This switches to Sphinx's bin directory. Then execute:

indexer.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf mysql --rotate

Sphinx now starts building the index. If the data volume is large, this may take a while, so be patient (Figure 1).

[Figure 1]

To create the incremental index, run the following command:
indexer.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf delta --rotate

1.4. Test whether search works
After the indexes are built, we can check whether matching content can be found. Continue in the same cmd window with the following command:

search.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf dedecms

If data is returned normally (Figure 2), the index was created successfully.

[Figure 2]

2. Using Sphinx with the DedeCMS program
2.1. Start the Sphinx service
In the preceding steps we successfully generated an index. For a client to be able to query it, we need to start the Sphinx service.
Run the following command in cmd:
searchd.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf
The Sphinx service is now running (Figure 3). We can write a simple example to test it:

[Figure 3]

sphinx_test.php
--------------------------------------------------------------------------------------------------------
<?php
set_time_limit(0);
require_once(dirname(__FILE__) . "/include/common.inc.php");

$sph = new SphinxClient();

$mode = SPH_MATCH_ANY;   // matching mode
$host = "localhost";     // searchd host
$port = 9312;            // searchd port

$sph->SetServer($host, $port);
$sph->SetArrayResult(true);
$sph->SetMatchMode($mode);

$res = $sph->Query('织梦内容管理系统');  // search for DedeCMS's Chinese name

// var_dump($sph);
// var_dump($res);
$total = count($res['matches']);
for ($i = 0; $i < $total; $i++)
{
    var_dump($res['matches'][$i]);
}

-----------------------------------------------------------------------------------------------------
Run sphinx_test.php; if the communication is working, matching content is returned.
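Because the configuration declares typeid, channel, pubdate and the other columns as attributes, a client can also filter, sort and paginate on them. The following is only a sketch using the standard SphinxClient API; the variable name $sph and the filter values are illustrative:

```php
<?php
// Sketch — assumes a SphinxClient instance $sph that has already been
// connected with SetServer(), as in sphinx_test.php above.
$sph->SetFilter('typeid', array(1, 2));                          // only typeid 1 or 2
$sph->SetFilterRange('pubdate', strtotime('-30 days'), time());  // published in the last 30 days
$sph->SetSortMode(SPH_SORT_ATTR_DESC, 'pubdate');                // newest first
$sph->SetLimits(0, 20);                                          // first page, 20 results
$res = $sph->Query('dedecms');                                   // then query as before
```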

[Figure 4]

2.2. Create a dedesphinx service
We have started the service via searchd.exe, but the drawback is that the cmd window must stay open. The solution is as follows:
Switch to the bin directory and run the following command:

searchd.exe --install -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf --servicename dedesphinx

This creates a dedesphinx service on the system, with no console window required (Figure 5).
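Once installed, dedesphinx can be managed like any other Windows service. A sketch based on searchd's documented Windows service options (--install / --delete / --servicename):

```bat
rem start the search daemon as a service
net start dedesphinx
rem stop it
net stop dedesphinx
rem uninstall the service again
searchd.exe --delete --servicename dedesphinx
```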

[Figure 5]

2.3. Update and maintenance
For a full-text search index, we need to regenerate the index from time to time. If the data volume is small, you can simply re-run the index creation command to rebuild everything. If the data volume is large, we update the full-text index on a schedule.

If content is updated frequently, the following command needs to run every minute or so (you can put it in a script and run it via the Windows Task Scheduler):

Generate the incremental index:

indexer.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf delta --rotate

Of course, you also need to merge the incremental index into the main index mysql once a day, by executing:

indexer.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf --merge mysql delta --rotate
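The scheduled task mentioned above can be set up with a small batch file plus the Windows schtasks command. This is only a sketch: the script path, task name and one-minute schedule are assumptions to adapt.

```bat
@echo off
rem update_delta.bat - rebuild the incremental index (illustrative script)
d: & cd d:\coreseek-3.2.13-win32\bin
indexer.exe -c d:\coreseek-3.2.13-win32\etc\csft_dedecmsv57.conf delta --rotate
```

You could then register it with the Task Scheduler, for example: schtasks /create /sc minute /mo 1 /tn "SphinxDelta" /tr "d:\coreseek-3.2.13-win32\bin\update_delta.bat" (the task name here is illustrative).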
