A mysql Optimization Problem

Source: Internet
Author: User
{Code ...} the channel field is channel01, channel02... channel20status indicates different status identifiers. For example, 1 indicates the number of opened requests, and 2 indicates the number of disabled requests: count the total number of ip addresses in different channels and different statuses each day...
** The log table structure is as follows ** id int (10) primary key # primary key ip varchar (32) # iptime int (10) # Time channel varchar (128) # channel id normal index status int (10) # status id

The channel field is channel01, channel02. .. channel20
Status indicates different status identifiers. For example, 1 indicates the number of opened times and 2 indicates the number of disabled times.

Requirement:
Count the total number of ip addresses in different statuses in different channels per day
Count the number of unique ip addresses in different statuses of different channels per day
Count the number of new ip addresses added per day for different statuses in different channels (ip addresses of the current day and ip addresses not included in the database before the current day)

The following is the number of items in channel channel01 and status = 1.

The current implementation method is as follows:
For example, the number
Use php to calculate the early morning timestamp of $ begin and early morning timestamp of $ end

** A -- total ip addresses of the current day $ total ** select count (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01; ** B -- number of unique ip addresses on the current day $ group ** SELECT COUNT (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** C -- ip address of the day $ ip_str (PHP processing) ** SELECT ip FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** D -- number of ip addresses that have appeared before the current day $ before ** select count (*) FROM log WHERE ip IN ($ ip_str) AND time <{$ begin} group by ip; ** E -- number of new ip addresses ** $ new = $ group-$ before

Now there are a total of 0.52 million data records in the Table. About 30 thousand new data records are added every day, and there are more than 20 thousand records after deduplication. The execution time of Step 4 (Statement D) is about 3 seconds.

Is there any way to optimize this SQL statement, or is there any other way to achieve this?

Reply content:
** The log table structure is as follows ** id int (10) primary key # primary key ip varchar (32) # iptime int (10) # Time channel varchar (128) # channel id normal index status int (10) # status id

The channel field is channel01, channel02. .. channel20
Status indicates different status identifiers. For example, 1 indicates the number of opened times and 2 indicates the number of disabled times.

Requirement:
Count the total number of ip addresses in different statuses in different channels per day
Count the number of unique ip addresses in different statuses of different channels per day
Count the number of new ip addresses added per day for different statuses in different channels (ip addresses of the current day and ip addresses not included in the database before the current day)

The following is the number of items in channel channel01 and status = 1.

The current implementation method is as follows:
For example, the number
Use php to calculate the early morning timestamp of $ begin and early morning timestamp of $ end

** A -- total ip addresses of the current day $ total ** select count (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01; ** B -- number of unique ip addresses on the current day $ group ** SELECT COUNT (*) FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** C -- ip address of the day $ ip_str (PHP processing) ** SELECT ip FROM log WHERE time >={ $ begin} AND time <{$ end} AND status = 1 AND channel = channel01 group by ip; ** D -- number of ip addresses that have appeared before the current day $ before ** select count (*) FROM log WHERE ip IN ($ ip_str) AND time <{$ begin} group by ip; ** E -- number of new ip addresses ** $ new = $ group-$ before

Now there are a total of 0.52 million data records in the Table. About 30 thousand new data records are added every day, and there are more than 20 thousand records after deduplication. The execution time of Step 4 (Statement D) is about 3 seconds.

Is there any way to optimize this SQL statement, or is there any other way to achieve this?

Your table has a big problem.
Varchar (32) should not be used for IP addresses. It is much less efficient when you want to query them later. The general practice is to use unsigned int in combination with the inet_aton function.
Similarly, if the channel field remains unchanged, you can use enum to replace varchar. You can create an index on varchar (128), which is no more inefficient. If you really don't want to use enum, you can consider local indexes, such as the first 12 characters, depending on the situation.
The time field should be indexed. You can create an index that contains three fields (channel, status, time). The order is very important, but there are fewer than the first and the last.

In d, the efficiency of in is relatively low. If all ip addresses are used minus one day ago, group by displays the top data by default. The time may need to be sorted.

The SQL statement for removing duplicate IP addresses can be improved as follows:
B) number of unique ip addresses on the current day

SELECT COUNT(DISTINCT ip)   FROM log  WHERE time >= {$begin}        AND time < {$end}        AND status = 1        AND channel = channel01;

C) ip address deduplication list for the current day

SELECT DISTINCT ip  FROM log  WHERE time >= {$begin}        AND time < {$end}        AND status = 1        AND channel = channel01;

D) The deduplicated number of ip addresses that appeared before the current day is similar to B.

This problem should be solved using counters to avoid complicated logic queries. Otherwise, it is difficult to solve it when there is enough data.
Counters can be directly performed using db, memcache, redis, and so on.
Or it is not suitable to run scripts on a daily basis for data statistics and view mysql in real time.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.