Sphinx Simple Configuration

Source: Internet
Author: User

Example: sphinx.conf fragment:

    ...
    sql_query = SELECT id, title, content, author_id, forum_id, post_date FROM my_forum_posts
    sql_attr_uint = author_id
    sql_attr_uint = forum_id
    sql_attr_timestamp = post_date
    ...

Example: Application code (PHP):

    // only search posts by author whose ID is 123
    $cl->SetFilter("author_id", array(123));
    // only search posts in sub-forums 1, 3 and 7
    $cl->SetFilter("forum_id", array(1, 3, 7));
    // sort found posts by posting date in descending order
    $cl->SetSortMode(SPH_SORT_ATTR_DESC, "post_date");
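
The effect of the two filters and the sort mode can be modelled outside Sphinx. The following is a minimal Python sketch, not the Sphinx API: the `Post` record, the sample data, and the `search_posts` helper are all hypothetical. It assumes only that a filter matches when the attribute equals any of the listed values, and that several filters combine with AND.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    # Hypothetical in-memory stand-in for a row of my_forum_posts.
    id: int
    author_id: int
    forum_id: int
    post_date: int  # UNIX timestamp, like a sql_attr_timestamp attribute


def search_posts(posts, filters, sort_attr):
    """Keep posts that pass every filter, then sort by sort_attr, newest first.

    filters maps an attribute name to the set of accepted values.
    """
    matched = [
        p for p in posts
        if all(getattr(p, attr) in accepted for attr, accepted in filters.items())
    ]
    return sorted(matched, key=lambda p: getattr(p, sort_attr), reverse=True)


posts = [
    Post(id=1, author_id=123, forum_id=1, post_date=1000),
    Post(id=2, author_id=123, forum_id=2, post_date=2000),
    Post(id=3, author_id=456, forum_id=3, post_date=3000),
    Post(id=4, author_id=123, forum_id=7, post_date=4000),
]

found = search_posts(
    posts,
    filters={"author_id": {123}, "forum_id": {1, 3, 7}},
    sort_attr="post_date",
)
print([p.id for p in found])  # [4, 1]
```

Post 2 is rejected by the forum filter and post 3 by the author filter; the two remaining posts come back newest first.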

Attributes are referred to by name, and attribute names are case-insensitive (note: so far, Sphinx does not support Chinese attribute names). Attributes are not full-text indexed; they are stored in the index as is.
All document IDs must be unique, nonzero, unsigned integers (32-bit or 64-bit, depending on how Sphinx was built).
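
The ID constraint can be checked before the data reaches the indexer. The following is a minimal Python sketch, assuming the IDs arrive as plain integers; `validate_doc_ids` and its `bits` parameter are hypothetical names and not part of Sphinx.

```python
def validate_doc_ids(ids, bits=32):
    """Return a list of problems found in ids; an empty list means all IDs are valid.

    bits is 32 or 64 and stands in for the document ID width of the Sphinx build.
    """
    max_id = (1 << bits) - 1
    problems = []
    seen = set()
    for doc_id in ids:
        if doc_id <= 0:
            problems.append(f"{doc_id}: must be a nonzero unsigned integer")
        elif doc_id > max_id:
            problems.append(f"{doc_id}: does not fit in {bits} bits")
        elif doc_id in seen:
            problems.append(f"{doc_id}: duplicate ID")
        seen.add(doc_id)
    return problems


print(validate_doc_ids([1, 2, 3]))  # []
for problem in validate_doc_ids([1, 0, 1, 2**32]):
    print(problem)
```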

When building an index, Sphinx fetches text documents from the specified data source, splits the text into words, and folds the case of each word, so that "Abc", "ABC" and "abc" are all treated as the same word (or, more formally, term).

In order to get the job done correctly, Sphinx needs to know:

    • What encoding the source text is in;
    • Which characters are letters and which are not;
    • Which characters need to be converted, and what they are converted to.

These can be configured separately for each index with the charset_type and charset_table options. charset_type specifies whether the document encoding is single-byte (SBCS) or UTF-8. In Coreseek, if Chinese word segmentation is enabled through charset_dictpath, the GBK and BIG5 encodings can also be used, although internally the text is still converted to UTF-8 before processing. charset_table specifies a table that maps letter characters to their case-folded versions. Characters that do not appear in this table are considered non-letters and are treated as word separators during both indexing and searching.
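
How a character table drives both case folding and word splitting can be illustrated with a short Python sketch. This is a simplified model, not Sphinx's tokenizer: `FOLD_TABLE` and `tokenize` are hypothetical names, and the table covers only digits, the underscore, and ASCII letters.

```python
import string

# Hypothetical stand-in for a charset_table: each key is a character that
# counts as part of a word, and each value is its case-folded form.
FOLD_TABLE = {c: c for c in string.digits + "_" + string.ascii_lowercase}
FOLD_TABLE.update({c: c.lower() for c in string.ascii_uppercase})


def tokenize(text, table=FOLD_TABLE):
    """Split text into folded words; any character not in table is a separator."""
    words = []
    current = []
    for ch in text:
        if ch in table:
            current.append(table[ch])
        elif current:
            # A character outside the table ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(tokenize("Abc ABC abc"))        # ['abc', 'abc', 'abc']
print(tokenize("Hello, World-2024"))  # ['hello', 'world', '2024']
```

Because the comma, the space, and the hyphen are absent from the table, they act purely as separators and never appear in the resulting words.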

In Coreseek, when Chinese word segmentation is enabled, the system uses the code table built into MMSEG (it is hardcoded in the MMSEG program), so charset_table has no effect once segmentation is turned on.
