Differences between Coreseek, sphinx-for-chinese, and Sphinx + Scws

Source: Internet
Author: User

Sphinx is an SQL-based full-text search engine. It is widely used by many websites.

Sphinx has the following features:

A) High-speed indexing (on modern CPUs, peak throughput can reach 10 MB/s);

B) High-performance search (on 2-4 GB of text data, the average response time per query is under 0.1 seconds);

C) Handling of massive data sets (it is known to handle over 100 GB of text, and up to 100 million documents on a single-CPU system).

Sphinx itself does not support Chinese.

The main problem is word segmentation. English text only needs to be split on spaces, but Chinese is written without spaces between words, so finding the word boundaries is much harder.
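A minimal sketch of the problem, using only Python's built-in `str.split()` (the sample strings are made up for illustration): whitespace splitting finds the words in an English phrase but returns a Chinese phrase as a single unbroken token.

```python
# English: whitespace already marks the word boundaries.
english = "full text search engine"
english_tokens = english.split()

# Chinese: there are no spaces between words, so split() finds nothing to cut.
chinese = "全文搜索引擎"
chinese_tokens = chinese.split()

print(english_tokens)  # ['full', 'text', 'search', 'engine']
print(chinese_tokens)  # ['全文搜索引擎']
```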

Word segmentation is used in two places:

1. Segmenting the raw data when the index is built

2. Segmenting the user's input when a query is run against the index
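The two places can be sketched with a toy inverted index. This is only an illustration under stated assumptions: the dictionary is a made-up six-word list, and `segment` uses simple forward maximum matching rather than the algorithm of any of the products discussed here. The point is that the same segmenter runs at index time and at query time, so both sides agree on what a "word" is.

```python
from collections import defaultdict

# Hypothetical toy dictionary; real segmenters ship far larger ones.
DICTIONARY = {"全文", "搜索", "引擎", "搜索引擎", "中文", "分词"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


def build_index(documents):
    """Place 1: segment the raw data and record which documents hold each token."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in segment(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Place 2: segment the query, then keep documents containing every token."""
    token_sets = [index.get(token, set()) for token in segment(query)]
    return set.intersection(*token_sets) if token_sets else set()


documents = {1: "中文分词", 2: "全文搜索引擎", 3: "中文搜索引擎"}
index = build_index(documents)

print(segment("全文搜索引擎"))              # ['全文', '搜索引擎']
print(sorted(search(index, "搜索引擎")))    # [2, 3]
print(sorted(search(index, "中文")))        # [1, 3]
```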

Currently, the three most common solutions are Coreseek, sphinx-for-chinese, and Sphinx + Scws.

1. Coreseek is a package developed in China on top of Sphinx. Currently, its most stable version is based on the classic Sphinx 0.9.9 release.

Advantages: Mature documentation and an established community. Its mmseg word segmenter is currently the most practical one available in China and can be used for both index segmentation and query segmentation.

Disadvantages: Slow development and infrequent version updates; slow indexing.

Policy: Maintain the dictionary through a dictionary management back end and regenerate the dictionary files on a regular basis. The suite then performs word segmentation and indexing automatically.

Applicable scenarios: Ordinary users with ordinary search needs; suitable for typical websites.

2. sphinx-for-chinese is a Chinese-language extension, also built through secondary development on the classic Sphinx 0.9.9 release.

Advantages: Easy to deploy and operate; word segmentation is built in and can be used for both index segmentation and query segmentation.

Disadvantages: Infrequent version updates, relatively weak word segmentation, and slow indexing.

Policy: Same as for Coreseek.

Applicable scenarios: Ordinary users who need to build a search site quickly.

3. Sphinx + Scws consists of two independent systems that are deployed separately, following the principle of high cohesion and low coupling. This option is strongly recommended.

Advantages: The two systems are relatively independent and each can run on its own server; the word segmentation service can be reused for other purposes; version updates arrive faster.

Disadvantages: Deployment and use are somewhat more complicated; index-time segmentation can only use unigram (single-character) segmentation, which produces a large amount of index data.

Policy: Call the word segmentation service first, then call the search service.

Applicable scenarios: More demanding users who want to build a polished, well-behaved search.
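The third option can be sketched roughly as follows. Everything here is an assumption made for illustration: `segment_query` is a hypothetical stand-in for the separately deployed segmentation service (it does not call Scws), the word list is made up, and the index is a toy character-position map rather than a real Sphinx index. It shows the two ideas from the list above: documents are indexed one character at a time, and a query is first segmented into words, each of which is then matched as a run of consecutive characters.

```python
from collections import defaultdict


def unigram_tokens(text):
    """Unigram segmentation: each non-space character is its own token."""
    return [ch for ch in text if not ch.isspace()]


def build_unigram_index(documents):
    """Map each character to {doc_id: [positions]}; every character is indexed."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for pos, ch in enumerate(unigram_tokens(text)):
            index[ch][doc_id].append(pos)
    return index


def segment_query(query, words):
    """Hypothetical stand-in for the external segmentation service (longest match)."""
    tokens, i = [], 0
    longest = max(len(word) for word in words)
    while i < len(query):
        for size in range(min(longest, len(query) - i), 0, -1):
            piece = query[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


def phrase_match(index, word):
    """Documents in which the characters of `word` appear consecutively."""
    chars = unigram_tokens(word)
    if not chars or chars[0] not in index:
        return set()
    hits = set()
    for doc_id, starts in index[chars[0]].items():
        for start in starts:
            if all(start + k in index.get(ch, {}).get(doc_id, [])
                   for k, ch in enumerate(chars)):
                hits.add(doc_id)
                break
    return hits


def search(index, query, words):
    # Step 1: segment the query. Step 2: match each word against the unigram index.
    tokens = [t for t in segment_query(query, words) if t.strip()]
    results = [phrase_match(index, t) for t in tokens]
    return set.intersection(*results) if results else set()


documents = {1: "中文分词", 2: "全文搜索引擎", 3: "中文搜索引擎"}
index = build_unigram_index(documents)
words = {"中文", "搜索", "引擎"}  # made-up word list for the stand-in segmenter

print(unigram_tokens("搜索引擎"))                    # ['搜', '索', '引', '擎']
print(sorted(search(index, "中文搜索", words)))      # [3]
```

Because every character becomes an index entry with its positions, the index grows with the raw character count of the data, which is the size cost noted under the disadvantages.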

That concludes this comparison of Coreseek, sphinx-for-chinese, and Sphinx + Scws. I hope it serves as a useful reference.
