"Finishing" the Chinese search engine coreseek4 installation of Linux, and PHP using Sphinx three ways (sphinxapi,sphinx PHP extension, sphinxse as the MySQL storage engine)


One, software preparation
    1. coreseek4.1 (includes the latest CoreSeek beta and mmseg, plus a test package with ready-made configurations for built-in Chinese word segmentation and search, the MySQL data source, the Python data source, RT real-time indexes, and more)
    2. The MySQL source bundle (it must match the version of MySQL you have installed)

To avoid missing dependencies during the installation, first install the build prerequisites:

yum install make gcc gcc-c++ libtool autoconf automake imake mysql-devel libxml2-devel expat-devel
# or, on Debian/Ubuntu:
apt-get install make gcc g++ automake libtool mysql-client libmysqlclient15-dev libxml2-dev libexpat1-dev

Two, quick CoreSeek installation
1, install mmseg3
## Prerequisite: the operating system's base development libraries and the MySQL client
## libraries are installed, so the MySQL and XML data sources are supported
## Install mmseg
$ cd mmseg-3.2.14
$ ./bootstrap    # warnings in the output can be ignored; errors must be resolved
$ ./configure --prefix=/usr/local/mmseg3
$ make && make install
$ cd ..

"Note" If you compile Mmseg prompt cannot find input file:src/makefile.in fail, you can try the following workaround:

# if any of the commands below report that they are not installed, install them first
aclocal
libtoolize --force    # this step may print an error; it can be ignored
automake --add-missing
autoconf
autoheader
make clean
./configure --prefix=/usr/local/mmseg3

2, install Coreseek
## Install coreseek
$ cd csft-4.1
$ sh buildconf.sh    # warnings in the output can be ignored; errors must be resolved
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
## If configure complains about MySQL, see the MySQL data source installation notes;
## to save trouble, you can point configure at your MySQL installation directly:
$ ./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql-includes="MySQL installation directory"/include --with-mysql-libs="MySQL installation directory"/lib
$ make && make install
$ cd ..

"Note" If the Sphinx/coreseek 4.1 execution buildconf.sh Error, unable to generate configure file problems, click this delivery solution. It is worth mentioning that VI ' replaces all ' tricks: input in the last line mode: 1, $s/t val = Expreval (This->m_parg, Tmatch)/t val = This->expreval (This->m_parg, Tmatch)/g "Replace from first line to last row

3, test mmseg word segmentation and coreseek search
## Test mmseg segmentation and coreseek search
## (the character set zh_CN.UTF-8 must be set beforehand, so that Chinese displays correctly)
$ cd testpack
$ cat var/test/test.xml    # the Chinese text should display correctly at this point
$ /usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
$ /usr/local/coreseek/bin/indexer -c etc/csft.conf --all
$ /usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索


1. If running /usr/local/coreseek/bin/indexer -c etc/csft.conf --all reports
/usr/local/coreseek/bin/indexer: error while loading shared libraries: libmysqlclient.so.18: cannot open shared object file: No such file or directory
then edit the /etc/ld.so.conf file with vi, append /usr/local/mysql/lib at the end of the file, save, and run the ldconfig command.

  2. If running /usr/local/coreseek/bin/indexer -c etc/csft.conf --all reports "xmlpipe2 support is NOT compiled in. To use xmlpipe2, install missing XML libraries", run apt-get install expat-* or yum install expat-devel, then re-run configure and reinstall coreseek.

Three, configuring the MySQL data source for Sphinx:
1, first, be clear about what the three main files in coreseek's bin directory do:
1. /usr/local/coreseek/bin/indexer — the indexer, used to create/update/merge indexes of the data sources
   /usr/local/coreseek/bin/indexer -c etc/csft_mysql.conf --all    # index all data sources configured in csft_mysql.conf
2. /usr/local/coreseek/bin/search — a search utility for testing searches against the data sources
   /usr/local/coreseek/bin/search -c etc/csft_mysql.conf 网络搜索    # test whether the data sources configured in csft_mysql.conf contain "网络搜索"
3. /usr/local/coreseek/bin/searchd — the search daemon, the service that accepts queries, processes them, and returns result sets
   /usr/local/coreseek/bin/searchd -c etc/csft_mysql.conf    # start the service with the searchd settings from csft_mysql.conf
    2, next, decide which table, in which MySQL database, we want to index.
Take this case as an example: an online chat project uses the database test_qq, with the table say storing the chat content. The project has a "find chat records" feature, and this table will eventually grow huge, so to speed up searching we index the say table (as a Sphinx data source). The say table has the following structure:
mysql> use test_qq;
Database changed
mysql> desc say;
+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| id      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| fromid  | int(10) unsigned | NO   |     | 0       |                |
| toid    | int(10) unsigned | NO   |     | 0       |                |
| content | text             | NO   |     | NULL    |                |
+---------+------------------+------+-----+---------+----------------+
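For reference, the structure above corresponds roughly to the following DDL. This is a sketch reconstructed from the desc output; the character set clause is an assumption, chosen to match the zh_CN.UTF-8 setup used elsewhere in this post:

```sql
CREATE TABLE say (
    id      INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,  -- message id
    fromid  INT(10) UNSIGNED NOT NULL DEFAULT 0,       -- sender id
    toid    INT(10) UNSIGNED NOT NULL DEFAULT 0,       -- recipient id
    content TEXT NOT NULL,                             -- chat text (the full-text field)
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;  -- utf8 assumed
```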
To configure the MySQL data source, copy the csft_mysql.conf file from "coreseek package dir"/testpack/etc into coreseek's etc directory:
cp /"coreseek package dir"/testpack/etc/csft_mysql.conf /usr/local/coreseek/etc/
Modify Csft_mysql.conf
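The modified file was shown in a collapsed "View Code" block in the original post and did not survive extraction. As a rough sketch only (host, credentials, and log paths are assumptions; the fields follow the say table above), the data source and index sections might look like:

```
source mysql
{
    type             = mysql
    sql_host         = localhost
    sql_user         = root
    sql_pass         =                 # set your own password
    sql_db           = test_qq
    sql_port         = 3306
    sql_query_pre    = SET NAMES utf8
    sql_query        = SELECT id, fromid, toid, content FROM say
    sql_attr_uint    = fromid          # attributes: filterable, not full-text indexed
    sql_attr_uint    = toid
    sql_query_info   = SELECT * FROM say WHERE id=$id
}

index mysql
{
    source           = mysql
    path             = /usr/local/coreseek/var/data/mysql
    charset_type     = zh_cn.utf-8
    charset_dictpath = /usr/local/mmseg3/etc/    # mmseg dictionary directory
}

searchd
{
    listen           = 9312
    read_timeout     = 5
    max_matches      = 1000
    pid_file         = /usr/local/coreseek/var/log/searchd_mysql.pid
    log              = /usr/local/coreseek/var/log/searchd_mysql.log
    query_log        = /usr/local/coreseek/var/log/query_mysql.log
}
```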
After the modification, first start the searchd background service:
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/csft_mysql.conf
Then build the full index:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all
Now you can use search to test searching:
/usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft_mysql.conf
If you get search results, the configuration succeeded. The problem is that the indexer builds a one-off index: if the index was built at 10:00 am, search can only find content from before 10:00 am. If more people send messages after 10:00 am, the new rows in the say table are not indexed, and a search test cannot find them. So let's look at incremental indexing.

Configuring incremental indexes:
1. Create an index-offset table sph_say_counter in MySQL. Each time the index runs, sph_say_counter is updated with the maximum id covered by that run; the next incremental run indexes only the records beyond that maximum id, then logs its own maximum id back into sph_say_counter, and so on.

sph_say_counter table structure:

mysql> desc sph_say_counter;
+------------+---------------------+------+-----+---------+----------------+
| Field      | Type                | Null | Key | Default | Extra          |
+------------+---------------------+------+-----+---------+----------------+
| id         | tinyint(3) unsigned | NO   | PRI | NULL    | auto_increment |
| max_offset | int(10) unsigned    | NO   |     | 0       |                |
+------------+---------------------+------+-----+---------+----------------+
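For reference, the table can be created, and seeded with the single row id = 1 that the incremental queries read, roughly like this (a sketch; the column types are taken from the desc output above):

```sql
CREATE TABLE sph_say_counter (
    id         TINYINT(3) UNSIGNED NOT NULL AUTO_INCREMENT,
    max_offset INT(10) UNSIGNED NOT NULL DEFAULT 0,
    PRIMARY KEY (id)
);
-- seed the row that the delta source's subquery reads (WHERE id = 1)
INSERT INTO sph_say_counter (id, max_offset) VALUES (1, 0);
```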

  2. Add the incremental index configuration to csft_mysql.conf

# add the following configuration
source delta : mysql    # this syntax makes delta inherit all settings of the mysql source;
                        # identical settings in the child configuration override the parent
{
        # start indexing from the records beyond max_offset
        sql_query       = SELECT id, fromid, toid, content FROM say WHERE id > ( SELECT max_offset FROM sph_say_counter WHERE id = 1 )
        # update the maximum indexed id after indexing
        # (sql_query_pre runs before the main query, sql_query_post after it)
        sql_query_post  = REPLACE INTO sph_say_counter ( id, max_offset ) SELECT 1, MAX(id) FROM say
}
index delta : mysql
{
        source          = delta
        path            = /usr/local/coreseek/var/data/delta
}

Now run the incremental index:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate
after which the newly added content can be found through search.

You can then merge the indexes:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge mysql delta --rotate --merge-dst-range deleted 0 0
This merges the delta source's index into the mysql source's index (--merge-dst-range deleted 0 0 prevents multiple keywords from pointing to the same document).


So that the sph_say_counter table is also updated when the full index runs, add this line to source mysql as well:

sql_query_pre           = REPLACE INTO sph_say_counter ( id, max_offset ) SELECT 1, MAX(id) FROM say

At this point, running
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
rebuilds all indexes and updates the sph_say_counter table.


At this point, the complete csft_mysql.conf consists of the original configuration plus the delta source, delta index, and sql_query_pre additions above (the original post shows the full file in a collapsed "View Code" block).

  3. Add the index runs to the Linux scheduled tasks, to perform the incremental, merge, and full indexing on a schedule

For convenience, we put the full index, the incremental index, and the index merge into 3 shell files respectively.

Then run crontab -e and add the following:

*/1 * * * * /bin/sh /usr/local/coreseek/sh/delta.sh > /dev/null 2>&1    ## incremental index every minute
*/5 * * * * /bin/sh /usr/local/coreseek/sh/merge.sh > /dev/null 2>&1    ## merge indexes every 5 minutes
30 1 * * *  /bin/sh /usr/local/coreseek/sh/all.sh   > /dev/null 2>&1    ## full index every day at 1:30
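The three shell files themselves are not shown in the original post. Assuming they simply wrap the indexer commands from earlier (the paths and file names are the ones used in the crontab above), they might look like:

```
## /usr/local/coreseek/sh/delta.sh  -- incremental index (crontab: every minute)
#!/bin/sh
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf delta --rotate

## /usr/local/coreseek/sh/merge.sh  -- merge indexes (crontab: every 5 minutes)
#!/bin/sh
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --merge mysql delta --rotate --merge-dst-range deleted 0 0

## /usr/local/coreseek/sh/all.sh    -- full index (crontab: daily at 1:30)
#!/bin/sh
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft_mysql.conf --all --rotate
```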
Define the exact execution times yourself. After this step, the MySQL data source configuration is basically complete.

Four, three ways to use Sphinx/CoreSeek from PHP:

Four-1, using sphinxapi.php

The "coreseek package dir"/testpack/api directory provides the PHP interface file sphinxapi.php, which contains a SphinxClient class; you can copy it into your project directory and include it.

<?php
/*
 * test_sph.php
 * SphinxClient class test
 */
$key = trim($_GET['key']);
echo $key;

include('sphinxapi.php');

$sp = new SphinxClient();
$sp->SetServer('localhost', 9312);
$sp->SetArrayResult(true);
$sp->SetMatchMode(SPH_MATCH_ALL);
$sp->SetSortMode(SPH_SORT_RELEVANCE);

$res = $sp->Query($key, 'mysql');    // 'mysql' is the index name from csft_mysql.conf
echo '<pre>'; print_r($res); echo '</pre>';

if (isset($res['matches'])) {
    // fetch the original rows for the matched ids
    // (the database connection was cut off in the original post; mysqli is assumed here)
    $mysql = new mysqli('localhost', 'root', '');
    $mysql->query('SET NAMES utf8');
    $mysql->query('USE test_qq');
    $sql = 'SELECT * FROM say WHERE id IN (';
    foreach ($res['matches'] as $v) {
        $sql .= $v['id'] . ',';
    }
    $sql = trim($sql, ',') . ')';
    echo $sql;
    foreach ($mysql->query($sql) as $v) {
        echo '<pre>'; print_r($v); echo '</pre>';
    }
} else {
    echo 'no records found';
}

In a browser, open localhost/test_sph.php?key=<search keyword> to view the search results.

Four-2, installing PHP's Sphinx extension

Besides including the sphinxapi.php file directly, you can also install PHP's Sphinx extension module to call SphinxClient directly, which is more efficient than including the API file. The Sphinx extension depends on the libsphinxclient package, so install that first.

1, install libsphinxclient
# cd "coreseek package dir"/csft-4.1/api/libsphinxclient/
# ./configure --prefix=/usr/local/sphinxclient

configure: creating ./config.status
config.status: creating Makefile
config.status: error: cannot find input file: Makefile.in    # configure failed

// Handling the configure error: the build reported
// "config.status: error: cannot find input file: src/Makefile.in",
// so regenerate the build files and compile again:
# aclocal
# libtoolize --force
# automake --add-missing
# autoconf
# autoheader
# make clean

// configure again from scratch
# ./configure
# make && make install
2, install the Sphinx PHP extension
# wget http://pecl.php.net/get/sphinx-1.3.0.tgz    # from http://pecl.php.net/package/sphinx
# tar zxvf sphinx-1.3.0.tgz
# cd sphinx-1.3.0
# phpize
# ./configure --with-php-config=/usr/bin/php-config --with-sphinx=/usr/local/sphinxclient
# make && make install
# cd /etc/php.d/
# cp gd.ini sphinx.ini
# vi sphinx.ini        # change its contents to:
extension=sphinx.so
# service php-fpm restart

After installation, you can use the test_sph.php script from section Four-1 to test searching.


Paging test:
<?php
header("Content-Type: text/html; charset=utf-8");
require("./sphinxapi.php");

$s = new SphinxClient;
$s->SetServer("", 9312);

// SPH_MATCH_ALL matches all query terms (default mode);
// SPH_MATCH_ANY matches any query term;
// SPH_MATCH_EXTENDED2 supports queries with special operators
$s->SetMatchMode(SPH_MATCH_ALL);
$s->SetMaxQueryTime(30);           // set the maximum search time
$s->SetArrayResult(false);         // whether to key the matches array by id
$s->SetSelect("*");                // content to return, like an SQL SELECT list
$s->SetRankingMode(SPH_RANK_BM25); // ranking mode; SPH_RANK_BM25 may lower result
                                   // quality for queries containing multiple words
//$s->SetSortMode(SPH_SORT_EXTENDED);   // found to give inaccurate results here
//$s->SetSortMode(SPH_SORT_EXTENDED, "from_id asc, id desc");
          // sort matches SQL-style, ascending or descending per column

$weights = array('company_name' => 20);
$s->SetFieldWeights($weights);     // set field weights
$s->SetLimits(0, 30, 1000, 0);
          // result set offset: SetLimits(offset, limit,
          //   max matches for the query (default 1000), cutoff threshold)
//$s->SetFilter($attribute, $values, $exclude = false);          // attribute filter
//$s->SetGroupBy($attribute, $func, $groupsort = "@group desc"); // grouping attribute

$res = $s->Query('@* "car"', 'main', '--single-0-query--');
          // "car" is the keyword, "main" is the data source (index)

// keyword highlighting
$tags = array();
$company_name = array();
$description = array();
foreach ($res['matches'] as $key => $value) {
    $tags[] = $value['attrs'];
    $company_name[] = $value['attrs']['company_name'];
    $description[]  = $value['attrs']['description'];
}
$company_name = $s->BuildExcerpts($company_name, 'main', 'car', $opts = array());
          // build highlights; the index name here must not be "*"
$description  = $s->BuildExcerpts($description, 'main', 'car', $opts = array());
          // build highlights; the index name here must not be "*"

foreach ($tags as $k => $v) {
    $tags[$k]['company_name'] = $company_name[$k];  // after highlighting
    $tags[$k]['description']  = $description[$k];   // after highlighting
}

// overwrite the matches with the highlighted attributes
$i = 0;
foreach ($res['matches'] as $key => $value) {
    $res['matches'][$key] = $tags[$i];
    $i++;
}

$err = $s->GetLastError();
echo '<pre>';
var_export($res);
var_export($err);
echo '</pre>';

For more information on the SphinxClient class, refer to the official documentation.

Four-3, compiling and installing SphinxSE for MySQL 5.5.x

SphinxSE installs directly as a MySQL storage engine, so PHP can use Sphinx full-text indexing without any code changes.

#--------------- Basic setup: compile MySQL 5.5 -----------
# In the commands below, VERSION stands for your MySQL version, e.g. 5.5.8 or 5.5.9
$ tar xzvf coreseek-4.1-beta.tar.gz
$ tar xzvf mysql-VERSION.tar.gz    # the MySQL source bundle downloaded earlier
$ cp -r coreseek-4.1-beta/csft-4.1/mysqlse mysql-VERSION/storage/sphinx
# the command above copies the mysqlse folder into the storage folder and renames it sphinx
$ cd mysql-VERSION
$ cmake . -DCMAKE_BUILD_TYPE=Release -DWITH_SPHINX_STORAGE_ENGINE=1
# if the cmake command is missing, first install it: yum -y install cmake
# if the command above warns "Bison executable not found in PATH", install bison and rerun it
# if the ncurses library is missing, apt-get install libncurses-dev
#   or yum -y install ncurses-devel, then delete the CMakeCache.txt file and run cmake again
# to see the available cmake parameters, run: cmake . -LH
$ make

#---------- Install SphinxSE into an already installed MySQL 5.5 -----------
# First run the "compile MySQL 5.5" steps above.
# Special note: the MySQL source must match the currently installed 5.5 version.
$ cp storage/sphinx/ha_sphinx.so /path_to_your_mysql_5.5/lib/plugin
$ mysql -uroot -p???
mysql> INSTALL PLUGIN sphinx SONAME "ha_sphinx.so";
mysql> SHOW ENGINES;    # SPHINX appearing in the Engine column means the install succeeded

Test using SphinxSE:

#------------------------- Test SphinxSE --------------------------------------
mysql> use test_qq;
mysql> CREATE TABLE documents_sphinxse
    -> ( id INTEGER UNSIGNED NOT NULL,
    ->   weight INTEGER NOT NULL,
    ->   query VARCHAR(3072) NOT NULL,
    ->   group_id INTEGER,
    ->   INDEX(query) )
    -> ENGINE=SPHINX CONNECTION="sphinx://localhost:9312/mysql";

#----------- Run a SphinxSE query -----------------------------
mysql> SELECT * FROM documents_sphinxse WHERE query='网络搜索;mode=any';

#-------- Run a joined SphinxSE query to fetch the original row data: -----------
mysql> SELECT dse.*, d.title FROM documents_sphinxse AS dse
    ->   LEFT JOIN documents AS d USING (id)
    ->   WHERE query='网络搜索;mode=any';
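The query column's string can carry more search options than mode alone; offset/limit are handy for paging result sets straight from SQL. A sketch (option names follow the SphinxSE documentation; the index and keyword are the ones configured above):

```sql
-- paginate: skip the first 10 matches and return the next 10,
-- keeping up to 1000 candidate matches on the Sphinx side
SELECT * FROM documents_sphinxse
WHERE query='网络搜索;mode=any;offset=10;limit=10;maxmatches=1000';
```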

Article Source: http://www.cnblogs.com/GaZeon/p/5327578.html

"Finishing" the Chinese search engine coreseek4 installation of Linux, and PHP using Sphinx three ways (sphinxapi,sphinx PHP extension, sphinxse as the MySQL storage engine)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.