Sphinx: working principle, advantages, and disadvantages

Source: Internet
Author: User
What/What is Sphinx?

Definition: Sphinx is a full-text search engine.

Features:

Excellent indexing and search performance

Easy integration with SQL and XML data sources, searchable through the SphinxAPI, SphinxQL, or SphinxSE interfaces

Easy scaling through distributed search

High-speed indexing (peak performance of up to ~15 MB/s on contemporary CPUs)

High-performance search (150 to 250 queries per second against 1.2 GB of text, 1 million documents)

Why/Why use Sphinx

Usage scenario encountered

We ran into the following requirement: users need to find an article by searching on its title and on its body text, but titles and bodies are stored in different databases, and those databases sit in different data centers.

Options

A. Run a cross-database LIKE query directly in the database

Advantage: simple to implement.

Disadvantage: low efficiency and heavy network overhead.

B. Use the Sphinx search engine combined with Chinese word segmentation

Advantages: high efficiency and high scalability.

Disadvantage: Sphinx is not responsible for data storage.

Use the Sphinx search engine to index the data: the index is loaded in one go and then kept in memory. Users then only need to query the Sphinx server to retrieve the data. In addition, Sphinx avoids the disk I/O drawbacks associated with MySQL and so performs better.

Other typical usage scenarios

1. Fast, efficient, scalable core full-text search

With large data volumes, it is faster than full-text search on MyISAM and InnoDB.

It can build indexes over mixed data from multiple source tables and is not limited to the fields of a single table.

It can merge search results from multiple indexes.

It can optimize a full-text search according to additional conditions on attributes.

2. Efficient use of WHERE and LIMIT clauses

When a SELECT query has several WHERE conditions, performance suffers if index selectivity is poor or if the fields are not covered by any index. Sphinx can index keywords. The difference is that in MySQL the internal engine decides whether to use an index or a full scan, whereas Sphinx lets you choose the access method yourself. Because Sphinx keeps its data in RAM, it does little I/O. MySQL, by contrast, performs what is called semi-random disk I/O: it reads rows into the sort buffer, sorts them, and finally discards most of them. So Sphinx uses less memory and less disk I/O.

3. Optimizing GROUP BY queries

Sorting and grouping in Sphinx use a fixed amount of memory, which is slightly more efficient than a MySQL query even when the whole data set fits in RAM.
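To make the idea of sorting in a fixed amount of memory concrete, here is a minimal Python sketch. It is not Sphinx's actual implementation; the function name `top_n` and the sample rows are assumptions made for illustration. The sketch keeps only the N best rows in a bounded heap while streaming over the matches, instead of reading every row into a sort buffer and discarding most of them afterwards.

```python
import heapq

def top_n(rows, n, key):
    """Return the n rows with the largest key, best first, using O(n) memory."""
    heap = []  # min-heap of (key, -sequence, row); never holds more than n entries
    for seq, row in enumerate(rows):
        # -seq makes every entry unique, so the row dicts are never compared,
        # and among equal keys the most recently seen row is evicted first.
        entry = (key(row), -seq, row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # drop the current worst row
    return [row for _, _, row in sorted(heap, reverse=True)]

# Hypothetical matches with a relevance weight.
rows = [
    {"id": 1, "weight": 5},
    {"id": 2, "weight": 9},
    {"id": 3, "weight": 1},
    {"id": 4, "weight": 7},
]
best = top_n(rows, 2, key=lambda r: r["weight"])
```

However many rows are streamed through `top_n`, the heap never grows beyond `n` entries, which is the property the paragraph above describes.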

4. Producing result sets in parallel

Sphinx can produce several result sets from the same data simultaneously, again using a fixed amount of memory. A traditional SQL approach, by contrast, either runs two queries or creates a temporary table for each result set. Sphinx handles this with its multi-query mechanism: instead of issuing queries one after another, you put several queries into a batch and submit them in a single request.
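The benefit of batching can be sketched in a few lines of Python. This is a toy model rather than the Sphinx API: `DOCS`, `run_queries`, and the field names are hypothetical. It shows two grouping queries that share one keyword match being answered in a single pass over the data, instead of scanning once per query.

```python
from collections import Counter

# Hypothetical in-memory documents standing in for an index.
DOCS = [
    {"id": 1, "text": "sphinx full-text search", "author": "ann", "year": 2012},
    {"id": 2, "text": "mysql query tuning", "author": "bob", "year": 2012},
    {"id": 3, "text": "distributed sphinx search", "author": "ann", "year": 2013},
]

def run_queries(docs, keyword, group_fields):
    """Answer a batch of grouping queries that share one keyword match."""
    counters = {field: Counter() for field in group_fields}
    for doc in docs:                       # single pass over the data
        if keyword in doc["text"].split():
            for field in group_fields:     # update every result set at once
                counters[field][doc[field]] += 1
    return {field: dict(counter) for field, counter in counters.items()}

# One request, two result sets: matches grouped by author and by year.
results = run_queries(DOCS, "sphinx", ["author", "year"])
```

Here the match on `"sphinx"` is evaluated once per document and feeds both result sets, which is the kind of shared work a batched request makes possible.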

5. Scaling up and scaling out

Scaling up: adding CPUs/cores and expanding disk I/O.

Scaling out: adding machines, i.e. distributed Sphinx.

6. Aggregating sharded data

Sphinx is well suited to aggregating data that is distributed across different physical MySQL servers. Example: a 1 TB table holds 1 billion articles and is sharded by user ID across 10 MySQL servers. A query for a single user is of course fast. But suppose you need an archive page with paging that shows every article published by all of a user's friends. You then have to query several MySQL servers at the same time, which is slow. With Sphinx, you only need to create a few instances, map the frequently accessed article attributes from each table, and run the paged query against them, with about three lines of configuration in total.

How/How to use Sphinx

Sphinx workflow chart:

Flowchart explanation:

Database: the data source that Sphinx indexes. Because Sphinx is independent of the storage engine and of the database, the data source can be MySQL, PostgreSQL, XML, and so on.

indexer: the indexing program. It fetches data from the data source and builds a full-text index from it. indexer can be run periodically to keep the index updated as often as required.

searchd: the search daemon. It talks directly to the client program and uses the index built by indexer to process search queries quickly.

APP: the client program. It receives a search string from user input, sends the query to searchd, and displays the returned results.

The working principle of Sphinx

In the overall workflow, the indexer program extracts data from the database, splits it into words (word segmentation), builds one or more indexes from the resulting tokens, and hands them to the searchd program. The client can then search through API calls.
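The division of labor between the two programs can be sketched with a toy inverted index in Python. This is only an illustration under simplifying assumptions: `build_index` and `search` are hypothetical names, tokenization is a plain whitespace split (real Chinese text would need a proper word segmenter), and nothing here reflects Sphinx's on-disk index format.

```python
from collections import defaultdict

def build_index(rows):
    """Indexer step: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in rows:
        for token in text.lower().split():  # naive whitespace tokenization
            index[token].add(doc_id)
    return index

def search(index, query):
    """searchd step: return ids of documents that contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))

# Hypothetical rows, as if fetched from the data source.
rows = [
    (1, "Sphinx full-text search"),
    (2, "MySQL full-text index"),
    (3, "Sphinx and MySQL"),
]
index = build_index(rows)
hits = search(index, "sphinx mysql")
```

The index is built once from the source rows, and every later query is answered from the index alone without touching the rows again.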

Now that the working principle has been introduced, the next step is to put Sphinx to work. First, look at how Sphinx is configured.

Configuration of Sphinx

Data source configuration

Let's take a look at a sample configuration file for a data source:

    source Test
    {
        type          = mysql
        sql_host      = 127.0.0.1
        sql_user      = root
        sql_pass      = root
        sql_db        = Test
        sql_port      = 3306    # optional, default is 3306

        sql_query_pre = SET NAMES UTF8
        sql_query     = SELECT ID, name,
