Configure and use Sphinx

Source: Internet
Author: User

In the previous article, I covered how to install Sphinx. In this article, I explain how to configure and use it.

Taking my blog as an example, I want to run a full-text search over my blog posts.

The posts are stored in the table wp_posts. Its fields are described as follows:


The key part is configuring sphinx.conf.

Both index generation and searching are driven by this file. To run full-text search, you must configure it so that Sphinx knows which fields to index and which fields will be used in WHERE, GROUP BY and ORDER BY.

```
# Data source
source wordpress_s
{
    type          = mysql
    sql_host      = localhost
    sql_user      = test
    sql_pass      = 111
    sql_db        = wordpress
    sql_port      = 3306    # optional, default is 3306

    # Statement run before the main query (sets the connection character set)
    sql_query_pre = SET NAMES utf8

    # Main query: fetches the content to be full-text indexed. According to the
    # official advice, do not use WHERE or GROUP BY here; letting Sphinx do the
    # filtering and grouping is more efficient. The SELECT list must contain at
    # least one unique primary key (if it is not named id, alias it AS id) and
    # the full-text fields. Columns used for filtering must be selected too.
    sql_query = SELECT ID, post_title, post_content, post_name, post_author FROM wp_posts

    # sql_attr_* declares attributes: every column you plan to use in WHERE,
    # ORDER BY or GROUP BY must be declared here, so that it can be used for
    # filtering and sorting at search time. Types include sql_attr_uint,
    # sql_attr_string, sql_attr_timestamp, etc. For details, see
    # http://www.coreseek.cn/docs/coreseek_4.1-sphinx_2.0.1-beta.html#attributes
    sql_attr_uint    = ID
    sql_field_string = post_title
    sql_field_string = post_name
    # sql_attr_timestamp = date_added

    # sql_query_info: document info query. Optional, empty by default, and valid
    # only for MySQL sources. It is used only by the command-line search tool,
    # for debugging, to fetch and display the information for each matched
    # document ID. It must contain the $id macro, which is replaced with the
    # matched document ID ($id seems to be the only macro available here).
    sql_query_info = SELECT * FROM wp_posts WHERE id = $id

    # strip_html = 0    # whether to strip HTML tags; newer versions no longer
    #                   # seem to support this, use html_strip in the index instead
}

# Index
index wordpress_i
{
    source         = wordpress_s                            # source name
    path           = /home/admin/sphinx/var/data/wordperss  # index file path
    docinfo        = extern
    min_prefix_len = 0    # minimum prefix length to index
    min_infix_len  = 1    # minimum infix length to index
    # Minimum indexed word length. 1 lets you search single characters; the
    # smaller the value, the more precise the matching, but the longer it
    # takes to build the index.
    min_word_len   = 1
    charset_type   = utf-8    # the data is UTF-8 encoded
    charset_table  = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
    ngram_chars    = U+3000..U+2FA1F
    ngram_len      = 1    # split non-letter (CJK) text into single characters
    html_strip     = 1    # strip HTML tags
}

indexer
{
    mem_limit = 32M
}

searchd
{
    listen          = 9312
    listen          = 9306:mysql41
    log             = /home/admin/sphinx/var/log/searchd.log
    query_log       = /home/admin/sphinx/var/log/query.log
    read_timeout    = 5
    max_children    = 30
    pid_file        = /home/admin/sphinx/var/log/searchd.pid
    max_matches     = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
    workers         = threads    # for RT to work
    binlog_path     = /home/admin/sphinx/var/data
}
```

Sphinx supports several attribute types:

sql_attr_uint and sql_attr_bigint # 32-bit unsigned integer and 64-bit signed integer. Use these two types for integer database columns, including dates that have been converted to integers (see TO_DAYS() below).

sql_attr_float # 32-bit (single-precision) floating-point value. You can use this type, for example, to store geographic coordinates. Note that it keeps only about seven significant decimal digits; if you need higher precision, this type cannot provide it.

sql_attr_bool # A Boolean (single-bit) value, similar to MySQL's TINYINT.

sql_attr_timestamp # A UNIX timestamp, i.e. a date/time value counted in seconds from 1970-01-01 00:00:00 UTC. You cannot use DATE or DATETIME columns directly; convert them to timestamps with the UNIX_TIMESTAMP() function. If you only need the date, you can use the TO_DAYS() function to convert a DATE column to an integer.

sql_attr_string and sql_field_string # Strings (obviously!). However, sql_attr_string is only stored and returned with search results and is not full-text indexed, while sql_field_string is full-text indexed and also stored as a string attribute.
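To see what the float and timestamp notes mean in practice, here is a small Python sketch (not Sphinx code). It round-trips a value through a 32-bit IEEE 754 float, which is how sql_attr_float values are described as being stored, and computes a UNIX timestamp the way UNIX_TIMESTAMP() would for a UTC datetime. The coordinate value is made up for illustration.

```python
import struct
from datetime import datetime, timezone


def as_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through a 32-bit IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def unix_timestamp(dt: datetime) -> int:
    """Seconds since 1970-01-01 00:00:00 UTC for a timezone-aware datetime.

    MySQL's UNIX_TIMESTAMP() interprets its argument in the session time zone;
    here the datetime carries its own time zone instead.
    """
    return int(dt.timestamp())


lat = 31.2304167891          # hypothetical coordinate with more digits than float32 keeps
stored = as_float32(lat)     # what survives in a 32-bit float
print(lat, stored)           # the two differ after roughly 7 significant digits
print(unix_timestamp(datetime(2014, 1, 1, tzinfo=timezone.utc)))
```

If the lost digits matter (for example, for precise coordinates), one workaround under these assumptions is to store a scaled integer (such as microdegrees) in an integer attribute instead.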

The configuration file is made up mainly of the following sections:

```
source source_name_1
{
    # data source settings
}

index index_name_1
{
    source = source_name_1
}

source source_name_2
{
    # data source settings
}

index index_name_2
{
    source = source_name_2
}

indexer
{
    mem_limit = 32M    # memory limit while indexing
}

searchd
{
    # settings for the searchd daemon itself
}
```

The settings in the source section are as follows:

```
# type           database type; currently mysql and pgsql
# sql_host       database host address
# sql_user       database user name
# sql_pass       database password
# sql_db         database name
# sql_port       database port
# sql_query_pre  statements run before the main query; to use utf8, you must
#                run SET NAMES utf8 here
# sql_query      the query that fetches the content for full-text search. Do not
#                use WHERE or GROUP BY here; leave filtering and grouping to
#                Sphinx, which does them more efficiently.
#                Note: the SELECT list must contain at least one unique primary
#                key (e.g. ARTICLESID), the fields to be full-text searched, and
#                the fields you plan to use in WHERE. No ORDER BY is needed here.
# sql_attr_*     attribute declarations: the fields you originally planned to
#                use in WHERE, ORDER BY and GROUP BY should be declared here
```
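These rules decide how each column selected by sql_query is treated. As a simplified mental model: the first column is the document ID, columns declared with sql_attr_* are attributes, columns declared with sql_field_string are both full-text fields and string attributes, and every other column is a full-text field. The helper below is hypothetical (not part of Sphinx), and declaring post_author as an attribute is an assumption for illustration.

```python
from typing import List, Sequence, Tuple


def classify_columns(columns: Sequence[str],
                     attr_columns: Sequence[str] = (),
                     field_string_columns: Sequence[str] = ()) -> Tuple[str, List[str], List[str]]:
    """Split a sql_query SELECT list into (document ID, full-text fields, attributes).

    Simplified model: first column = document ID; sql_attr_* columns are
    attributes only; sql_field_string columns are both full-text fields and
    string attributes; all remaining columns are full-text fields.
    """
    doc_id, rest = columns[0], list(columns[1:])
    attrs_only = set(attr_columns)
    field_strings = set(field_string_columns)
    fields = [c for c in rest if c not in attrs_only]
    attributes = [c for c in rest if c in attrs_only or c in field_strings]
    return doc_id, fields, attributes


doc_id, fields, attributes = classify_columns(
    ["ID", "post_title", "post_content", "post_name", "post_author"],
    attr_columns=["post_author"],                      # hypothetical: sql_attr_uint = post_author
    field_string_columns=["post_title", "post_name"],  # as in the configuration above
)
print(doc_id, fields, attributes)
```

In this model post_content is searchable but is not an attribute, which is why its text is not returned with search results.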

The index section is as follows:

```
source        = wordpress_s                            # source name
path          = /home/admin/sphinx/var/data/wordperss  # index file path (file name without extension)
docinfo       = extern
charset_type  = utf-8                                  # character set encoding
# valid word characters and their case folding
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# characters handled by unigram (single-character) segmentation
ngram_chars   = U+3000..U+2FA1F
ngram_len     = 1                                      # index each Chinese character as its own token
html_strip    = 1                                      # strip HTML tags
```
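To illustrate what charset_table, ngram_chars and ngram_len = 1 do to text, here is a simplified Python tokenizer that applies the same three rules. It is a hypothetical approximation for illustration only, not Sphinx's actual tokenizer, which also handles HTML stripping, min_word_len, prefixes/infixes and more.

```python
from typing import List, Optional


def fold_char(ch: str) -> Optional[str]:
    """Map one character through the charset_table above; None means separator."""
    cp = ord(ch)
    if "0" <= ch <= "9" or "a" <= ch <= "z" or ch == "_":
        return ch                      # 0..9, a..z, _ are kept as-is
    if "A" <= ch <= "Z":
        return ch.lower()              # A..Z->a..z
    if 0x410 <= cp <= 0x42F:
        return chr(cp + 0x20)          # U+410..U+42F->U+430..U+44F (Cyrillic capitals)
    if 0x430 <= cp <= 0x44F:
        return ch                      # U+430..U+44F (Cyrillic lowercase)
    return None


def is_ngram_char(ch: str) -> bool:
    """ngram_chars = U+3000..U+2FA1F, a range that covers CJK characters (among others)."""
    return 0x3000 <= ord(ch) <= 0x2FA1F


def tokenize(text: str) -> List[str]:
    tokens: List[str] = []
    word: List[str] = []

    def flush() -> None:
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_ngram_char(ch):
            flush()
            tokens.append(ch)          # ngram_len = 1: every CJK character is its own token
            continue
        folded = fold_char(ch)
        if folded is None:
            flush()                    # characters outside the table separate words
        else:
            word.append(folded)
    flush()
    return tokens


print(tokenize("Sphinx 全文搜索"))
```

This is why, with ngram_len = 1, a Chinese word is matched character by character rather than as a whole word.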

After configuration, you can use bin/indexer to generate indexes. The following describes several frequently used commands.

```
# Rebuild all indexes. -c is short for --config; if it is not given,
# indexer loads sphinx.conf by default
> bin/indexer -c /home/admin/sphinx/etc/sphinx.conf --all

# Rebuild a single index, e.g. wordpress_i
> bin/indexer -c /home/admin/sphinx/etc/sphinx.conf [index name]

# Rebuild with rotation: indexer builds an additional copy of the index; once
# it is done, it signals searchd, which renames the original index to *.old
# and loads the new copy
> bin/indexer --config /home/myuser/sphinx.conf --all --rotate

# Merge the incremental index into the main index
> bin/indexer --merge main addtion_index --rotate
```
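If you rebuild indexes from a script or a cron job, it can help to build the indexer argument list in one place. Here is a minimal Python sketch. indexer_cmd is a hypothetical helper, not part of Sphinx, and it only uses the flags shown in the commands above (--config, --all, --rotate, --merge).

```python
import shlex
from typing import List, Optional, Sequence, Tuple


def indexer_cmd(config: str,
                indexes: Sequence[str] = (),
                rotate: bool = False,
                merge: Optional[Tuple[str, str]] = None,
                binary: str = "bin/indexer") -> List[str]:
    """Build an indexer argument list (hypothetical helper)."""
    cmd = [binary, "--config", config]
    if merge is not None:
        dst, src = merge               # --merge <dst-index> <src-index>: src is merged into dst
        cmd += ["--merge", dst, src]
    elif indexes:
        cmd += list(indexes)           # rebuild only the named indexes
    else:
        cmd.append("--all")            # rebuild every index in the config
    if rotate:
        cmd.append("--rotate")         # build a new copy, then signal searchd to switch
    return cmd


conf = "/home/admin/sphinx/etc/sphinx.conf"
print(shlex.join(indexer_cmd(conf, rotate=True)))
print(shlex.join(indexer_cmd(conf, merge=("main", "addtion_index"), rotate=True)))
```

The returned list can be passed to subprocess.run(cmd, check=True). Running it requires indexer to actually exist at the given path.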

How to search?

Start the search service

> bin/searchd

The corresponding stop command is:

> bin/searchd --stop

Search

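> bin/search -c etc/sphinx.conf 'people'

This runs a full-text search for "people" (a two-character Chinese word in the original, rendered here in translation). Because ngram_len is 1, the word is split into its two single characters, one meaning "man" and one the plural marker "s", and both are searched. This gives the most thorough matching, but building the index takes longer.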

A large amount of information is displayed once the command succeeds.

The most important parts are the index, the search terms, the number of matches and the search time.

In the list of matches, document indicates which record (by ID) was hit.

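To use it from PHP:

```php
<?php
// sphinxapi.php is the PHP client shipped with Sphinx/Coreseek (api/ directory)
require "sphinxapi.php";

$q = $argv[1];                      // search term from the command line

$cl = new SphinxClient();
$cl->SetServer('localhost', 9312);  // searchd host and API port (listen = 9312)
$cl->SetArrayResult(true);          // return matches as a list, not a map keyed by ID
$cl->SetLimits(0, 10);              // offset 0, at most 10 matches
$result = $cl->Query($q);           // searches all indexes ("*") by default

if ($result === false) {
    die("Query failed: " . $cl->GetLastError() . "\n");
}
print_r($result);
```

Run it:

> /home/admin/fpm-php/bin/php test.php 'people'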

Then look at the results.

But there seems to be a problem:

I don't know why the text content is not displayed; I only get some basic information. Is the only way to take the IDs and fetch the detailed data from MySQL?
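As far as I understand, this is expected: Sphinx indexes full-text fields but does not store their contents, so a search returns document IDs, weights and attribute values only. Columns declared with sql_field_string or sql_attr_string (post_title and post_name in the configuration above) come back as attributes, but post_content is not stored. The usual approach is therefore exactly that: take the IDs and load the rows from MySQL. Below is a minimal sketch of that step, written in Python for brevity. The helper names are hypothetical, the result is assumed to be shaped like the array-style result printed above, and no database connection is made.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple


def ids_from_result(result: Mapping[str, Any]) -> List[int]:
    """Document IDs in Sphinx's ranking order.

    Assumes the layout produced with SetArrayResult(true): result["matches"] is
    a list of {"id": ..., "weight": ..., "attrs": {...}}. The "matches" key is
    treated as optional, since it may be missing when nothing matched.
    """
    return [int(m["id"]) for m in result.get("matches", [])]


def build_detail_query(ids: Sequence[int]) -> Tuple[str, List[int]]:
    """Parameterized MySQL query for the full rows (%s placeholders, as used by PyMySQL)."""
    if not ids:
        raise ValueError("no document IDs to look up")
    placeholders = ", ".join(["%s"] * len(ids))
    sql = ("SELECT ID, post_title, post_content FROM wp_posts "
           f"WHERE ID IN ({placeholders})")
    return sql, list(ids)


def order_like_sphinx(ids: Sequence[int], rows: Sequence[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """WHERE ID IN (...) returns rows in no guaranteed order; restore Sphinx's ranking."""
    by_id = {row["ID"]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


# Hypothetical data standing in for a Sphinx result and the rows MySQL returned
result = {"matches": [{"id": 42, "weight": 2}, {"id": 7, "weight": 1}]}
ids = ids_from_result(result)
sql, params = build_detail_query(ids)
rows = [{"ID": 7, "post_title": "B"}, {"ID": 42, "post_title": "A"}]
print(sql, params)
print([r["post_title"] for r in order_like_sphinx(ids, rows)])
```

With a driver such as PyMySQL, you would pass sql and params to cursor.execute() using a dictionary cursor and feed the fetched rows to order_like_sphinx(). The same three steps carry over directly to the PHP client.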

 
