How to build a search system using Coreseek (Sphinx)

Source: Internet
Author: User
Tags: elastic search

Coreseek in practice: a summary

This article contains the following content:

    • Typical architecture of Coreseek
    • Real-time solutions
    • Experience with MMSEG word segmentation
    • Experience with synonyms
    • Next steps

Typical architecture of Coreseek

In a typical Coreseek deployment, an incremental index provides near-real-time search, so newly added records cannot be found immediately. For records that already exist in the search system, changes to non-string fields are generally applied by calling the update method. If the search engine is also expected to return other business fields, those fields are likewise only near-real-time. The typical architecture can handle this situation; Figure 1 shows the current typical architecture.

Figure 1
This scheme is weak on real-time behavior, so it suits businesses with low real-time requirements. However, if the search engine is required to return other business fields, it is best to look them up in a key-value (K-V) storage system by the DocId in the search results.
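
The DocId lookup described above can be sketched as follows. This is a minimal illustration, not part of Coreseek: the `hydrate_results` function, the field names, and the plain dictionary standing in for the K-V store are all assumptions made for the example.

```python
def hydrate_results(doc_ids, kv_store, fields):
    """Attach current business fields to search hits, keeping the rank order."""
    hydrated = []
    for doc_id in doc_ids:
        record = kv_store.get(doc_id)
        if record is None:
            # The record no longer exists in the business store,
            # so the stale search hit is dropped.
            continue
        row = {"doc_id": doc_id}
        for field in fields:
            row[field] = record.get(field)
        hydrated.append(row)
    return hydrated


# Hypothetical K-V store: a dict keyed by DocId (a real system might use Redis).
kv_store = {
    101: {"title": "Paper diapers, size M", "price": 59.0},
    102: {"title": "Paper diapers, size L", "price": 65.0},
}

# DocIds as returned by the search engine, best match first.
ranked_ids = [102, 999, 101]

for row in hydrate_results(ranked_ids, kv_store, ["title", "price"]):
    print(row)
```

Because the business fields are read at query time, they are as fresh as the K-V store itself, while the index only has to hold what is needed for matching and ranking.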

Real-time Solutions

If new records must be searchable after only a short delay and you do not use Sphinx RT mode, you have to find another option. In the current Coreseek version, RT mode is not very mature or stable, while the latest version of Sphinx has no Chinese word segmentation support. So the current typical architecture needs an additional small system, and the search results of the two are merged.
This is the system architecture shown in Figure 2.

Figure 2
Of course, Elasticsearch is also an option. In this design, business record updates can be pushed to the intermediate DB asynchronously through a message queue (MQ), which decouples the systems. If a message fails to be delivered, no reconciliation is needed, because the subsequent incremental index build is equivalent to reconciliation.
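
The result merging in Figure 2 can be sketched as follows. This is a hedged illustration under stated assumptions: the `merge_results` function and the hit layout (`doc_id`, `score`, `title`) are hypothetical, and the scores of the two systems are assumed to be comparable, which a real deployment would have to ensure.

```python
def merge_results(realtime_hits, main_hits, limit=10):
    """Merge hits from the small real-time system and the main index.

    When both systems return the same doc_id, the real-time copy wins,
    because it reflects the most recent version of the record.
    """
    merged = {}
    for hit in main_hits:
        merged[hit["doc_id"]] = hit
    for hit in realtime_hits:
        merged[hit["doc_id"]] = hit  # newer copy overrides the main-index copy

    # Assumption: scores from both systems are on the same scale.
    ranked = sorted(merged.values(), key=lambda h: h["score"], reverse=True)
    return ranked[:limit]


main_hits = [
    {"doc_id": 1, "score": 0.9, "title": "old title"},
    {"doc_id": 2, "score": 0.5, "title": "unchanged"},
]
realtime_hits = [
    {"doc_id": 1, "score": 0.8, "title": "new title"},
    {"doc_id": 3, "score": 0.7, "title": "just added"},
]

for hit in merge_results(realtime_hits, main_hits):
    print(hit["doc_id"], hit["title"])
```

Once the next incremental index build picks up the new records, the real-time system can discard them, which is why a lost message only delays visibility instead of causing permanent inconsistency.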

Experience with MMSEG word segmentation

Lessons from using MMSEG:
Adding words to the word dictionary (the thesaurus) helps, but which words to add takes care. In my experience, do not add Chinese words longer than 5 characters: Coreseek's default maximum word length is 5, and a longer word can be split into independent words that are then combined. After several rounds of adjusting the dictionary, my impression is that splitting text into fewer segments gives relatively higher accuracy. This is only my personal experience.
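
To illustrate why long dictionary entries are rarely needed, here is a minimal sketch of plain forward maximum matching. It is a simpler technique than the full MMSEG algorithm and is not Coreseek's implementation; the `segment` function, the sample dictionary, and the `max_len=5` default are assumptions chosen to mirror the word-length limit mentioned above.

```python
def segment(text, dictionary, max_len=5):
    """Greedy forward maximum matching with a maximum word length."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical dictionary containing only short words.
dictionary = {"南京", "西路"}

# The longer phrase is covered by two short dictionary words,
# so it does not need its own entry.
print(segment("南京西路地铁", dictionary))
```

A phrase made of short dictionary words is still fully tokenized, and a phrase query on the combined tokens can match it, so adding the long form as its own entry mostly adds maintenance cost.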

Experience with synonyms

Coreseek does not seem to support synonyms very well, so I later modified the source code: at index time, the synonyms of each keyword are looked up and indexed along with it. If you need the source code, you can contact me.
Here is how synonyms can be used:
- Searching for the same type of product
Suppose users search for "urine is not wet" products. "Urine is not wet" actually means paper diapers, but the phrase does not appear in our product names. In this case, you can add a synonym entry to mmseg:
Paper Diapers
-Urine is not wet,
- Abbreviations for some brands
Users like to search for some brands by a short form. For example, the product brand is Avanade, but users may search with Cerviño instead.
If the keyword Avanade is put into the thesaurus, a user searching for Cerviño will probably find nothing. If Avanade is not in the thesaurus, the indexer splits it into: AI | Then, if the user's keyword Cerviño is not in the thesaurus either, it is also split into: | and the search can match. If either one is in the thesaurus, the other must be in the thesaurus as well.
Then add the synonym entry:
Avanade
-Cerviño,
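
The index-time synonym expansion described in this section can be sketched as follows. This is not the author's modified Coreseek source, which is not shown in the article; `parse_thesaurus` and `expand_tokens` are hypothetical helpers, and the parser assumes the two-line entry layout used in the examples above (a keyword line followed by a line starting with "-" that lists comma-separated synonyms).

```python
def parse_thesaurus(text):
    """Parse entries of the form 'keyword' / '-synonym1,synonym2,'."""
    synonyms = {}
    keyword = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("-"):
            if keyword is None:
                continue  # synonym line without a preceding keyword; ignore it
            words = [w.strip() for w in line[1:].split(",") if w.strip()]
            synonyms.setdefault(keyword, []).extend(words)
        else:
            keyword = line
    return synonyms


def expand_tokens(tokens, synonyms):
    """Emit each token followed by its synonyms, as an indexer might."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        expanded.extend(synonyms.get(token, []))
    return expanded


THESAURUS = """
Paper Diapers
-Urine is not wet,
Avanade
-Cerviño,
"""

synonyms = parse_thesaurus(THESAURUS)
print(expand_tokens(["Avanade", "Paper Diapers"], synonyms))
```

Because the synonyms are written into the index next to the original keyword, a query for either form finds the same document without any query-time rewriting.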

Next steps

A possible next step is to add the Coreseek Chinese word segmenter to the latest version of Sphinx and use RT mode.
