Sphinx and Coreseek: Powerful open-source full-text search engine

Source: Internet
Author: User
Introduction
Sphinx is an SQL-based full-text search engine that can be combined with MySQL or PostgreSQL to provide full-text search. It offers more specialized search functions than the databases themselves, which makes it easier for applications to implement professional full-text retrieval. Sphinx provides dedicated search API interfaces for several scripting languages, such as PHP, Python, Perl, and Ruby. It also provides a storage engine plug-in for MySQL.

A single Sphinx index can contain up to 100 million records. With 10 million records, a query takes 0.x seconds (millisecond level). Index creation is also fast: building an index of 1 million records takes 3 to 4 minutes, an index of 10 million records can be built within 50 minutes, and rebuilding an incremental index that contains only the latest 100,000 records takes just tens of seconds.
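The split between a large index and a small incremental index can be pictured with a toy in-memory model. This is only a sketch and not how Sphinx is implemented: `build_index`, `search`, and the sample records are hypothetical names made up for illustration. It shows why rebuilding just the small index of recent records is cheap, while a query still consults both indexes.

```python
from collections import defaultdict

def build_index(records):
    """Build a toy inverted index: word -> set of record ids."""
    index = defaultdict(set)
    for rec_id, text in records.items():
        for word in text.lower().split():
            index[word].add(rec_id)
    return index

def search(word, *indexes):
    """Look a word up in every index and merge the matching ids."""
    hits = set()
    for index in indexes:
        hits |= index.get(word.lower(), set())
    return sorted(hits)

# Hypothetical data: a large, rarely rebuilt main set and a small set of new records.
main_records = {1: "open source search engine", 2: "full text search with MySQL"}
delta_records = {3: "incremental index for new search records"}

main_index = build_index(main_records)    # rebuilt rarely (expensive at scale)
delta_index = build_index(delta_records)  # rebuilt often (cheap, few records)

# A new record arrives: only the small incremental index is rebuilt.
delta_records[4] = "Chinese word segmentation"
delta_index = build_index(delta_records)

print(search("search", main_index, delta_index))        # [1, 2, 3]
print(search("segmentation", main_index, delta_index))  # [4]
```

In a real deployment the same idea is expressed through index configuration rather than application code; the manual describes the actual mechanism.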


Features
The main features of Sphinx include:
High-speed indexing (nearly 10 MB/s on modern CPUs);
High-speed search (the average query on 2-4 GB of text takes less than 0.1 seconds);
High scalability (up to 100 GB of text and 100 million documents on a single CPU);
Provides good relevance ranking;
Supports distributed search;
Generates document excerpts (summaries);
Supports searching from within MySQL through the plug-in storage engine;
Supports Boolean, phrase, and word proximity queries;
Supports multiple full-text fields per document (up to 32 by default);
Supports multiple attributes per document;
Supports word breaking;
Supports single-byte and UTF-8 encodings.
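To make the difference between a Boolean query and a phrase query concrete, here is a minimal sketch on a toy positional index. It is an illustration only, not Sphinx's query engine or its API: `build_positional_index`, `boolean_and`, `phrase`, and the sample documents are all hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each word to {doc_id: [positions of the word in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def boolean_and(index, words):
    """Ids of documents that contain every word, in any order."""
    doc_sets = [set(index.get(w.lower(), {})) for w in words]
    return sorted(set.intersection(*doc_sets)) if doc_sets else []

def phrase(index, words):
    """Ids of documents that contain the words consecutively and in order."""
    words = [w.lower() for w in words]
    result = []
    for doc_id in boolean_and(index, words):
        starts = index[words[0]][doc_id]
        # The phrase matches if, from some start position, word i sits at start + i.
        if any(all(s + i in index[w][doc_id] for i, w in enumerate(words))
               for s in starts):
            result.append(doc_id)
    return result

docs = {
    1: "full text search engine",
    2: "search the full text",
    3: "text of the full report",
}
index = build_positional_index(docs)

print(boolean_and(index, ["full", "text"]))  # [1, 2, 3]
print(phrase(index, ["full", "text"]))       # [1, 2]
```

Document 3 contains both words but not next to each other in that order, so it satisfies the Boolean query and fails the phrase query.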


Coreseek
Coreseek's development history resembles that of Sphinx (which started in 2001) and can be traced back to 2006. At that time, we were looking for an acceptable Chinese search solution for a database-driven website, but none of the available solutions met the requirements completely and directly. The main problems were as follows:
· Search quality (for example, effective relevance algorithms similar to Google's): purely statistical methods perform very poorly, especially on collections of many short documents, such as forums and blogs.
· Search speed: the problem is especially noticeable when the search phrase contains stop words, for example "To be or not to be".
· Controllable disk and CPU consumption while building an index: in the current hardware environment, this is more important than index construction speed.
· Accuracy and efficiency of Chinese search: it is well known that only accurate Chinese word segmentation can improve the accuracy of Chinese search and greatly reduce the computational workload.
Through the Internet, we learned that countless people had similar requirements, so we explored and tried different approaches. After repeated practice, we finally developed the Coreseek full-text search engine based on Sphinx and MMSeg, and released it under the GPLv2 license so that enterprises and individuals can use it to solve Chinese search problems.
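To illustrate why a dictionary matters for Chinese word segmentation, the sketch below implements greedy forward maximum matching, the simplest dictionary-based approach. It is an assumption-laden toy and not Coreseek's or MMSeg's actual code: the real MMSeg algorithm is more sophisticated, and `max_match`, `max_len`, and the tiny dictionary here are hypothetical.

```python
def max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: always take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical mini dictionary; a real one holds many thousands of entries.
dictionary = {"全文", "检索", "全文检索", "引擎", "中文", "分词"}

print(max_match("中文全文检索引擎", dictionary))  # ['中文', '全文检索', '引擎']
```

Without segmentation, the engine would have to index single characters or arbitrary character pairs, which produces more index entries and less precise matches.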
Year after year, many other solutions have improved and new ones keep emerging. However, we still believe that none of them is good enough to make us migrate our search platform away from Sphinx. In recent years, Sphinx/Coreseek users have given us a lot of positive feedback and suggestions. We have also continuously improved the Python data source, which extends the application scope of Sphinx/Coreseek to data beyond the built-in source types and opens up many more application scenarios. The development of Sphinx/Coreseek will therefore clearly continue (and may continue until the end of the world).

Download
The original version of Sphinx can be downloaded from the official Sphinx website: http://www.sphinxsearch.com.
Coreseek can be downloaded from the official Coreseek website: http://www.coreseek.cn/download.

Extension
Sphinx 0.9.9/Coreseek 3.2 Chinese Reference Manual
For details about how to use Sphinx, refer to this manual.

Remarks
This is only an introduction (a memo). Read the manual carefully and set up an experimental environment to try things out.
