The method of using Sphinx to implement multithread search in Mysql

The method of using Sphinx to implement multithread search in Mysql _mysql

Last Update:2017-01-19 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The queries in MySQL, Sphinx, and many databases and search engines are single-threaded. For example, to execute a query on a 32-CPU-Core, 16-disk R910 server, it uses at most one core and one disk. Yes, only one is used.

If the query is CPU intensive, it uses about 3% of the machine's CPU capacity (in the case of the 32 nuclear machines mentioned above). In the case of disk-intensive, approximately 6% of the machine IO capability is used (the same configuration as the previous example, with 16 disks consisting of RAID10 or RAID0).

Let me put it another way. If you run a query on a single core disk and take 10 seconds, it takes 10 seconds for the same query to be on a 32-core 16-disk machine, without any improvement.

You already knew that, didn't you? So, my question is-is there any way to improve it?

If it is Sphinx, great, the answer is YES! And it doesn't take too much effort. You don't even need to modify the application and the database, just slightly change the Sphinx configuration.

Plan

First of all, let me explain our goal.

The Sphinx itself supports distributed search, which has long been designed to scale horizontally. If the index is not placed on a single machine, you can have more than one machine to index different parts, set up an aggregation node, responsible for receiving requests from the application, and then send the request to all the data nodes at the same time, and finally the results of their return merged to return to the application. In the application it looks as if only one server is serving it.

Well, guess what? Ha, we can apply this function to a single machine, so that our query faster than n times. And now that Sphinx has supported this approach, we don't have to pretend to query which remote nodes.

Another benefit is that, after configuring a distributed search, indexes can be built in parallel!

It's still a point to note that while this approach can speed up the vast majority of queries, there are some exceptions. Because the results of parallel queries still need to be merged, the merge process is single-threaded. Also, merging includes some CPU-intensive operations, such as rating, sorting, or even count with group by, and if the volume of data is large, the merge process becomes a bottleneck.

To confirm this is also very simple, just look at the Sphinx query log, and see how many records each query matches, we have a number of minds.

Perform

Assume that the previous index configuration on the server is as follows (many of the details are omitted):

Copy Code code as follows:

SOURCE Src1
{
Type = MySQL
Sql_query = SELECT ID, text from table
}

Index idx1
{
Type = Plain
Source = Src1
}

Searchd
{
dist_threads = 0 # Default
}

Now we're using a machine with 3 CPU cores and disks to do the index--that's the idx1. Here is the configuration file that we changed:

Copy Code code as follows:

SOURCE Src1
{
Type = MySQL
Sql_query = SELECT ID, text from table
}

SOURCE Src1p0:src1
{
Sql_query = SELECT ID, text from table WHERE ID% 3 = 0;
}

SOURCE Src1p1:src1
{
Sql_query = SELECT ID, text from table WHERE ID% 3 = 1;
}

SOURCE Src1p2:src1
{
Sql_query = SELECT ID, text from table WHERE ID% 3 = 2;
}

Index Idx1_template
{
Type = Plain
Source = Src1
}

Index Idx1p0:idx1_template
{
Source = src0
}

Index Idx1p1:idx1_template
{
Source = Src1
}

Index Idx1p2:idx1_template
{
Source = Src2
}

Index idx1
{
Type = Distributed
Local = Idx1p0
Local = IDX1P1
Local = IDX1P2
}

Searchd
{
Dist_threads = 3
}

After you have done this, you need to rebuild the index. But now the index indexer command idx1p0 to IDX1P2 can be synchronized.

In addition, the use of different operations to separate data is not the best way, you can use a secondary table in MySQL to distinguish their scope, with sql_query_range use or whatever, depending on your data to determine.

Written in the last

I've always loved that Sphinx,sphinx can be so easily extended to the machines you need so much that it was used many years ago. Then, I thought, I didn't use this feature to make queries on a single machine go faster, as I always did. Well, that's not to say it's slow or what it is, it's just that the query is never too fast, is it?

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

The method of using Sphinx to implement multithread search in Mysql _mysql

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

The method of using Sphinx to implement multithread search in Mysql _mysql

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support