Sphinx installation and API study notes

Source: Internet
Author: User

Sphinx is an SQL-based full-text search engine that can be combined with MySQL and PostgreSQL. It offers more specialized search features than the databases provide on their own.

Install Sphinx

There are two ways to use Sphinx with MySQL:

1. Call it through the API, using the client functions or methods provided for languages such as PHP and Java. The advantages are that MySQL does not need to be recompiled, the search server runs as a separate, loosely coupled process, and programs can call it flexibly. The disadvantage is that an application with existing search code has to be partly modified. This approach is recommended for programmers.

2. Use the plug-in method (SphinxSE): compile Sphinx into MySQL as a plug-in storage engine and retrieve results with specific SQL statements. Its advantage is that it combines conveniently on the SQL side and returns data directly to the client, with no secondary query; only the corresponding SQL statements in the program need to change. However, this is inconvenient for programs built on a framework, for example one that uses an ORM. It also requires recompiling MySQL, and only MySQL 5.1 and later support plug-in storage engines.
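
With the API method, the search call returns matching document IDs, and the application then fetches the full rows from the database in a second query. The Python sketch below illustrates that two-step pattern under stated assumptions: `fake_search`, the `posts` table, and its rows are hypothetical stand-ins, an in-memory SQLite database takes the place of MySQL, and no real searchd is contacted.

```python
import sqlite3


def fake_search(query):
    """Hypothetical stand-in for a search call: returns document IDs in rank order."""
    return [3, 1]


def fetch_rows(conn, ids):
    """Secondary query: load the rows for the given IDs, keeping the rank order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = "SELECT id, title FROM posts WHERE id IN (%s)" % placeholders
    by_id = {row[0]: row for row in conn.execute(sql, ids).fetchall()}
    return [by_id[i] for i in ids if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(1, "Install Sphinx"), (2, "MySQL tips"), (3, "Sphinx API notes")],
)

results = fetch_rows(conn, fake_search("sphinx"))
print(results)
```

The database returns rows in its own order, so the sketch reorders them to follow the ranking that the search step produced.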

These notes cover the first method, calling Sphinx through the API. Install Sphinx as follows:

# Download the latest stable version
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar xzvf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9
./configure --prefix=/usr/local/sphinx/ --with-mysql --enable-id64
make
make install

Note: a Sphinx build installed this way does not support Chinese word segmentation.


3. Sphinx full-text search with Chinese word segmentation

Full-text search in Chinese differs from search in Latin-script languages such as English. English text is broken into words at special characters such as spaces, while Chinese text has to be segmented into words based on meaning. There are two Chinese word segmentation plug-ins for Sphinx:

1. Coreseek

Coreseek is the most widely used Chinese full-text search package for Sphinx. It is developed on top of Sphinx and provides LibMMSeg, a Chinese word segmentation package designed for Sphinx.

2. sfc (sphinx-for-chinese)

sfc is another Chinese word segmentation plug-in, provided by happy brother. Its Chinese dictionary uses xdict.
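
To make the difference between space-based word breaking and dictionary-based segmentation concrete, here is a small Python sketch. It uses a toy forward-maximum-matching routine and a four-word dictionary (`TOY_DICT`), both hypothetical; it is not the algorithm or dictionary that LibMMSeg or xdict actually use.

```python
def whitespace_tokens(text):
    """Word breaking for space-delimited languages such as English."""
    return text.split()


def forward_max_match(text, dictionary, max_len=4):
    """Toy segmentation: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing in the dictionary matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


TOY_DICT = {"网络", "搜索", "全文", "检索"}

print(whitespace_tokens("network search"))
print(whitespace_tokens("网络搜索"))
print(forward_max_match("网络搜索", TOY_DICT))
```

Splitting on spaces leaves the Chinese string as a single token, while the dictionary lookup separates it into two words.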

This section describes how to install Coreseek.

4. Install Coreseek (Sphinx with Chinese search support)

1. Install or upgrade autoconf

Coreseek requires autoconf 2.64 or later, so autoconf must be upgraded first; otherwise the build reports an error. Download autoconf-2.64.tar.bz2 from http://download.chinaunix.net/download.php?id=29328&resourceid=648 and install it as follows:

tar -jxvf autoconf-2.64.tar.bz2
cd autoconf-2.64
./configure
make
make install

2. Download coreseek

The new version of coreseek puts the dictionary and sphinx source program in a package, so you only need to download the coreseek package.

wget http://www.wapm.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz

3. Install mmseg (the dictionary used by coreseek)

tar xzvf coreseek-3.2.14.tar.gz
cd coreseek-3.2.14   # the mmseg and csft sources are inside the unpacked directory
cd mmseg-3.2.14
./bootstrap   # Warnings in the output can be ignored; errors must be resolved.
./configure --prefix=/usr/local/mmseg3
make && make install
cd ..

4. Install coreseek (Sphinx)

cd csft-3.2.14
sh buildconf.sh   # Warnings in the output can be ignored; errors must be resolved.
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
make && make install
cd ..

5. Test mmseg word segmentation and coreseek search

Note: set the system locale to zh_CN.UTF-8 beforehand so that Chinese text displays correctly. On my system, en_US.UTF-8 also works.

cd testpack
cat var/test.xml   # Chinese characters should display correctly at this point
/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test.xml
/usr/local/coreseek/bin/indexer -c etc/csft.conf --all
/usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索

The query (Chinese for "network search") should be segmented into two words, and the response should end with statistics like these:

Words:

1. '网络': 1 documents, 1 hits

2. '搜索': 2 documents, 5 hits

6. Generate the mmseg dictionary and configuration file

In the new version, these are generated automatically.


Complete usage and important settings of the Sphinx API, noted here so that they are not forgotten

$cl = new SphinxClient();
// Host and port of searchd. This installation uses localhost and port 3312; use the port that your searchd configuration listens on. The port must be an integer.
$cl->SetServer("localhost", 3312);
// Optional. Sets a weight for each full-text field, in the order the fields are defined in sql_query. SetWeights is deprecated; prefer SetFieldWeights(), which sets the weights by field name.
$cl->SetWeights(array(100, 1));


// Match mode:
// SPH_MATCH_ALL: matches all query words (AND); this is the default mode.
// SPH_MATCH_ANY: matches any of the query words (OR).
// SPH_MATCH_PHRASE: treats the whole query as a phrase whose words must match in order.
// SPH_MATCH_BOOLEAN: treats the query as a Boolean expression.
// SPH_MATCH_EXTENDED: treats the query as an expression in Sphinx's internal query language.
// There is also a special "full scan" mode, which is activated automatically when both of the following conditions are met:
// 1. The query string is empty (its length is zero).
// 2. docinfo storage is set to extern.
// In full scan mode every indexed document is considered a match. The matches are still filtered, sorted, and grouped, but no real full-text search is performed. This mode can be used to unify full-text and non-full-text search code, or to take load off the SQL server (a Sphinx scan is sometimes faster than the equivalent MySQL query).
// Pass the constant itself, not a quoted string.
$cl->SetMatchMode(SPH_MATCH_ALL);
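
The first three match modes can be pictured with a plain-Python model. This is only a sketch of the semantics described in the comments above, assuming lower-cased, space-separated tokens; it does not call Sphinx, and the function names are hypothetical.

```python
def tokens(text):
    return text.lower().split()


def match_all(doc, query):
    """AND semantics: every query word occurs in the document."""
    words = set(tokens(doc))
    return all(w in words for w in tokens(query))


def match_any(doc, query):
    """OR semantics: at least one query word occurs in the document."""
    words = set(tokens(doc))
    return any(w in words for w in tokens(query))


def match_phrase(doc, query):
    """Phrase semantics: the query words occur adjacently and in order."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


doc = "sphinx full text search engine"
print(match_all(doc, "search sphinx"))
print(match_any(doc, "mysql search"))
print(match_phrase(doc, "text full"))
```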

// Match only documents whose forum_id is 1, 3, or 7. With the third argument set to true, as in $cl->SetFilter("forum_id", array(1, 3, 7), true), the filter is inverted: only documents whose forum_id is not 1, not 3, and not 7 match.
$cl->SetFilter("forum_id", array(1, 3, 7));
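
The include and exclude behavior of the filter can be modeled in a few lines of Python. The `passes_filter` helper and the sample documents are hypothetical; the sketch only mirrors the rule stated in the comment above.

```python
def passes_filter(value, allowed, exclude=False):
    """Keep a document if its attribute is in `allowed`, or not in it when exclude is set."""
    found = value in allowed
    return not found if exclude else found


docs = [{"id": 10, "forum_id": 1}, {"id": 11, "forum_id": 2}, {"id": 12, "forum_id": 7}]

included = [d["id"] for d in docs if passes_filter(d["forum_id"], {1, 3, 7})]
excluded = [d["id"] for d in docs if passes_filter(d["forum_id"], {1, 3, 7}, exclude=True)]
print(included, excluded)
```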

// Grouping functions:
// SPH_GROUPBY_DAY: extracts the year, month, and day from a timestamp, in YYYYMMDD format.
// SPH_GROUPBY_WEEK: extracts the year and the day number (counted from the start of the year) of the first day of the week from a timestamp, in YYYYNNN format.
// SPH_GROUPBY_MONTH: extracts the year and month from a timestamp, in YYYYMM format.
// SPH_GROUPBY_YEAR: extracts the year from a timestamp, in YYYY format.
// SPH_GROUPBY_ATTR: groups by the attribute value itself.
// The final result set contains one best match per group. The grouping function value and the number of matches in each group are returned as the "virtual" attributes @group and @count.
// For sorting: SPH_SORT_RELEVANCE ignores any additional parameters and always sorts by relevance. All other sort modes require an additional sort clause, whose syntax depends on the mode.
$cl->SetGroupBy("UserName", SPH_GROUPBY_ATTR, $groupsort);
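
As an illustration of the date-based group keys, the following Python sketch derives the YYYYMMDD, YYYYMM, and YYYY values from Unix timestamps and counts the matches per group, similar in spirit to @group and @count. It interprets timestamps as UTC, which is an assumption of this sketch, and it leaves out the week key because its exact numbering is not spelled out here. `group_key` and `ts` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone


def group_key(timestamp, mode):
    """Return the numeric group key for a Unix timestamp (UTC assumed in this sketch)."""
    t = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    if mode == "day":
        return t.year * 10000 + t.month * 100 + t.day  # YYYYMMDD
    if mode == "month":
        return t.year * 100 + t.month  # YYYYMM
    if mode == "year":
        return t.year  # YYYY
    raise ValueError("unsupported mode: " + mode)


def ts(year, month, day):
    """Build a Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


posts = [ts(2010, 1, 1), ts(2010, 1, 1), ts(2010, 1, 15), ts(2011, 3, 2)]
per_month = Counter(group_key(p, "month") for p in posts)

print(group_key(posts[0], "day"))
print(dict(per_month))
```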

$cl->SetGroupDistinct($distinct);
/*
$cl->SetGroupBy("category", SPH_GROUPBY_ATTR, "@count desc");
$cl->SetGroupDistinct("vendor");
Equivalent to:
SELECT id, weight, all-attributes,
COUNT(DISTINCT vendor) AS @distinct,
COUNT(*) AS @count
FROM products
GROUP BY category
ORDER BY @count DESC
*/
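
The grouped query in the comment above can be mirrored in plain Python to see what @count and @distinct would hold. The sample rows and the `group_with_distinct` helper are made up for illustration; ties on the count are broken by category name, which is a choice of this sketch.

```python
def group_with_distinct(rows):
    """Return (category, count, distinct vendors) tuples, largest count first."""
    groups = {}
    for category, vendor in rows:
        count, vendors = groups.get(category, (0, set()))
        vendors.add(vendor)
        groups[category] = (count + 1, vendors)
    summary = [(c, n, len(v)) for c, (n, v) in groups.items()]
    summary.sort(key=lambda item: (-item[1], item[0]))
    return summary


rows = [("books", "acme"), ("books", "acme"), ("books", "zenith"), ("games", "acme")]
print(group_with_distinct(rows))
```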



// Sort modes:
// SPH_SORT_RELEVANCE: sorts by relevance in descending order (best matches first).
// SPH_SORT_ATTR_DESC: sorts by an attribute in descending order (larger values first).
// SPH_SORT_ATTR_ASC: sorts by an attribute in ascending order (smaller values first).
// SPH_SORT_TIME_SEGMENTS: sorts by time segment (last hour, day, week, or month) in descending order, then by relevance in descending order.
// SPH_SORT_EXTENDED: sorts by a combination of columns in ascending or descending order, in an SQL-like way.
// SPH_SORT_EXPR: sorts by an arithmetic expression.

$cl->SetSortMode(SPH_SORT_EXTENDED, "post_date");
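
An SQL-like sort clause of the kind SPH_SORT_EXTENDED accepts can be pictured with the Python sketch below. The tiny clause parser, the `weight` and `post_date` keys, and the rule that a missing direction means ascending are all assumptions of this sketch, not a description of how Sphinx parses the clause.

```python
def sort_extended(matches, clause):
    """Sort dicts by a clause such as "weight DESC, post_date ASC"."""
    keys = []
    for part in clause.split(","):
        bits = part.split()
        descending = len(bits) > 1 and bits[1].lower() == "desc"
        keys.append((bits[0], descending))
    result = list(matches)
    # Python's sort is stable, so applying the keys from last to first
    # yields the combined multi-column order.
    for name, descending in reversed(keys):
        result.sort(key=lambda m: m[name], reverse=descending)
    return result


matches = [
    {"id": 1, "weight": 2, "post_date": 20},
    {"id": 2, "weight": 5, "post_date": 10},
    {"id": 3, "weight": 5, "post_date": 30},
]
ordered = sort_extended(matches, "weight DESC, post_date ASC")
print([m["id"] for m in ordered])
```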

// Starting at offset 0, return $limit matches. The third argument (the maximum number of matches kept for the query) is set to $limit when $limit exceeds 1000, and to 1000 otherwise.
$cl->SetLimits(0, $limit, ($limit > 1000) ? $limit : 1000);
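
The arithmetic behind that call is easy to check in isolation. In this Python sketch, `limits` reproduces the ternary expression from the PHP line, and `page` is a hypothetical model of how an offset and limit select a window from an already sorted list capped at the maximum number of matches.

```python
def limits(offset, limit, floor=1000):
    """Mirror of ($limit > 1000) ? $limit : 1000 for the third argument."""
    max_matches = limit if limit > floor else floor
    return offset, limit, max_matches


def page(sorted_matches, offset, limit, max_matches):
    """Model: keep at most max_matches results, then cut out the requested window."""
    return sorted_matches[:max_matches][offset:offset + limit]


print(limits(0, 20))
print(limits(0, 5000))
print(page(list(range(50)), 10, 5, 1000))
```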


// Ranking modes:
// SPH_RANK_PROXIMITY_BM25: the default mode. Both the phrase proximity score and the BM25 score are computed and combined.
// SPH_RANK_BM25: a statistical ranking mode that uses BM25 only (like most full-text search engines). It is faster, but may lower result quality for queries that contain more than one word.
// SPH_RANK_NONE: ranking is disabled. This is the fastest mode and is effectively the same as Boolean search. Every match is assigned a weight of 1.
// SPH_RANK_WORDCOUNT: ranks by the number of keyword occurrences. The ranker counts keyword occurrences in each field, multiplies each count by the field weight, and sums the products as the final result.
// SPH_RANK_PROXIMITY: added in version 0.9.9-rc1. Returns the raw phrase proximity value as the result. It is used internally to emulate SPH_MATCH_ALL queries.
// SPH_RANK_MATCHANY: added in version 0.9.9-rc1. Returns the rank as it was previously computed in SPH_MATCH_ANY mode. It is used internally to emulate SPH_MATCH_ANY queries.
// SPH_RANK_FIELDMASK: added in version 0.9.9-rc2. Returns a 32-bit mask in which the Nth bit corresponds to the Nth full-text field, counting from 0. A bit is set to 1 only if the corresponding field contains a keyword that satisfies the query.

// Pass the constant itself, not a quoted string.
$cl->SetRankingMode(SPH_RANK_PROXIMITY_BM25);
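
Two of the ranking modes described above are simple enough to model directly. The Python sketch below follows the wording of the SPH_RANK_WORDCOUNT and SPH_RANK_FIELDMASK descriptions, assuming space-separated lower-case tokens and treating "contains a keyword" as the only query condition; the function names and sample fields are hypothetical, and the real rankers work on Sphinx's index rather than on raw strings.

```python
def wordcount_rank(fields, weights, keywords):
    """Sum over fields of (keyword occurrences in the field) * (field weight)."""
    total = 0
    for name, text in fields.items():
        words = text.lower().split()
        hits = sum(words.count(k) for k in keywords)
        total += hits * weights.get(name, 1)
    return total


def field_mask(field_texts, keywords):
    """Set bit N when the Nth field (counting from 0) contains any keyword."""
    mask = 0
    for n, text in enumerate(field_texts):
        words = set(text.lower().split())
        if any(k in words for k in keywords):
            mask |= 1 << n
    return mask


fields = {"title": "sphinx search", "content": "search engine for full text search"}
weights = {"title": 100, "content": 1}

print(wordcount_rank(fields, weights, ["search"]))
print(field_mask(["sphinx search", "mysql notes", "search tips"], ["search"]))
```

With the weights 100 and 1, a single hit in the title outweighs many hits in the content, which is the effect the field weights set earlier are meant to have.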

// PHP only. Controls the format of the returned result set (matches as an array or as a hash). The $arrayresult parameter should be a boolean.
// If $arrayresult is false (the default), the matches are returned as a PHP hash in which the document ID is the key and the other information (weight and attributes) is the value.
// If $arrayresult is true, the matches are returned as a plain array that holds all the information about each match, including the document ID.
$cl->SetArrayResult(true);
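
The two result shapes can be compared side by side in Python, with a dict standing in for the PHP hash and a list for the plain array. The sample matches and the key names `weight` and `attrs` are illustrative assumptions of this sketch.

```python
raw_matches = [
    (7, 2, {"forum_id": 1}),
    (3, 1, {"forum_id": 3}),
]


def as_hash(matches):
    """Hash shape: the document ID is the key, weight and attributes are the value."""
    return {doc_id: {"weight": w, "attrs": a} for doc_id, w, a in matches}


def as_array(matches):
    """Array shape: one entry per match, with the document ID stored inside it."""
    return [{"id": doc_id, "weight": w, "attrs": a} for doc_id, w, a in matches]


print(as_hash(raw_matches))
print(as_array(raw_matches))
```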



// Connects to the searchd server, runs the given query with the settings made so far, and returns the result set.
// $query is the query string, and $index is a string containing one or more index names.
// If a general error occurs, the call returns false and sets the message that GetLastError() returns. If the search succeeds, it returns the result set.
// An optional $comment is written to the query log in front of the search terms, which is very useful for debugging. Currently, a comment must be shorter than 128 characters.
// The default value of $index is "*", which means all local indexes are queried. An index name may contain letters (a-z), digits (0-9), minus signs (-), and underscores (_); all other characters are treated as separators. Therefore the example calls in the comment block below are all valid and search the same two indexes:

$res = $cl->Query($query, $index);
/*
$cl->Query("test query", "main delta");
$cl->Query("test query", "main;delta");
$cl->Query("test query", "main, delta");
*/
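
The rule that every character outside letters, digits, minus signs, and underscores acts as a separator can be expressed as a short regular expression. The Python sketch below is a model of that rule only; `index_names` is a hypothetical helper, and accepting upper-case letters is an assumption of this sketch.

```python
import re


def index_names(spec):
    """Split an index string on any run of characters not allowed in index names."""
    return [name for name in re.split(r"[^A-Za-z0-9_-]+", spec) if name]


for spec in ("main delta", "main;delta", "main, delta"):
    print(index_names(spec))
```

All three spellings yield the same two index names, which is why the three example calls are equivalent.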
