A mysql Optimization Problem

Last Update:2018-05-18 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

{Code ...} the channel field is channel01, channel02... channel20status indicates different status identifiers. For example, 1 indicates the number of opened requests, and 2 indicates the number of disabled requests: count the total number of ip addresses in different channels and different statuses each day...

** The log table structure is as follows ** id int (10) primary key # primary key ip varchar (32) # iptime int (10) # Time channel varchar (128) # channel id normal index status int (10) # status id

The channel field is channel01, channel02. .. channel20
Status indicates different status identifiers. For example, 1 indicates the number of opened times and 2 indicates the number of disabled times.

Requirement:
Count the total number of ip addresses in different statuses in different channels per day
Count the number of unique ip addresses in different statuses of different channels per day
Count the number of new ip addresses added per day for different statuses in different channels (ip addresses of the current day and ip addresses not included in the database before the current day)

The following is the number of items in channel channel01 and status = 1.
The current implementation method is as follows:
For example, the number
Use php to calculate the early morning timestamp of $ begin and early morning timestamp of $ end

** A -- total ip addresses of the current day $ total ** select count (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01; ** B -- number of unique ip addresses on the current day $ group ** SELECT COUNT (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** C -- ip address of the day $ ip_str (PHP processing) ** SELECT ip FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** D -- number of ip addresses that have appeared before the current day $ before ** select count (*) FROM log WHERE ip IN ($ ip_str) AND time <{$ begin} group by ip; ** E -- number of new ip addresses ** $ new = $ group-$ before

Now there are a total of 0.52 million data records in the Table. About 30 thousand new data records are added every day, and there are more than 20 thousand records after deduplication. The execution time of Step 4 (Statement D) is about 3 seconds.

Is there any way to optimize this SQL statement, or is there any other way to achieve this?

Reply content:

** The log table structure is as follows ** id int (10) primary key # primary key ip varchar (32) # iptime int (10) # Time channel varchar (128) # channel id normal index status int (10) # status id

The channel field is channel01, channel02. .. channel20
Status indicates different status identifiers. For example, 1 indicates the number of opened times and 2 indicates the number of disabled times.

Requirement:
Count the total number of ip addresses in different statuses in different channels per day
Count the number of unique ip addresses in different statuses of different channels per day
Count the number of new ip addresses added per day for different statuses in different channels (ip addresses of the current day and ip addresses not included in the database before the current day)

The following is the number of items in channel channel01 and status = 1.
The current implementation method is as follows:
For example, the number
Use php to calculate the early morning timestamp of $ begin and early morning timestamp of $ end

** A -- total ip addresses of the current day $ total ** select count (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01; ** B -- number of unique ip addresses on the current day $ group ** SELECT COUNT (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** C -- ip address of the day $ ip_str (PHP processing) ** SELECT ip FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** D -- number of ip addresses that have appeared before the current day $ before ** select count (*) FROM log WHERE ip IN ($ ip_str) AND time <{$ begin} group by ip; ** E -- number of new ip addresses ** $ new = $ group-$ before

Now there are a total of 0.52 million data records in the Table. About 30 thousand new data records are added every day, and there are more than 20 thousand records after deduplication. The execution time of Step 4 (Statement D) is about 3 seconds.

Is there any way to optimize this SQL statement, or is there any other way to achieve this?

Your table has a big problem.
Varchar (32) should not be used for IP addresses. It is much less efficient when you want to query them later. The general practice is to use unsigned int in combination with the inet_aton function.
Similarly, if the channel field remains unchanged, you can use enum to replace varchar. You can create an index on varchar (128), which is no more inefficient. If you really don't want to use enum, you can consider local indexes, such as the first 12 characters, depending on the situation.
The time field should be indexed. You can create an index that contains three fields (channel, status, time). The order is very important, but there are fewer than the first and the last.

In d, the efficiency of in is relatively low. If all ip addresses are used minus one day ago, group by displays the top data by default. The time may need to be sorted.

The SQL statement for removing duplicate IP addresses can be improved as follows:
B) number of unique ip addresses on the current day

SELECT COUNT(DISTINCT ip)   FROM log  WHERE time >= {$begin}        AND time < {$end}        AND status = 1        AND channel = channel01;

C) ip address deduplication list for the current day

SELECT DISTINCT ip  FROM log  WHERE time >= {$begin}        AND time < {$end}        AND status = 1        AND channel = channel01;

D) The deduplicated number of ip addresses that appeared before the current day is similar to B.

This problem should be solved using counters to avoid complicated logic queries. Otherwise, it is difficult to solve it when there is enough data.
Counters can be directly performed using db, memcache, redis, and so on.
Or it is not suitable to run scripts on a daily basis for data statistics and view mysql in real time.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

A mysql Optimization Problem

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

A mysql Optimization Problem

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support