1. Using COUNT() to generate summaries
COUNT() is easy to use, and COUNT(*) is fast for MyISAM tables. For BDB or InnoDB tables, however, avoid it where possible: the statement requires a full table scan and is slow. A workaround is to read the TABLE_ROWS value from the INFORMATION_SCHEMA database (for InnoDB this value is an estimate, not an exact count).
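A minimal sketch of a COUNT(*) summary. It assumes a hypothetical driver_log table (name, trav_date, miles) filled with made-up rows, and it uses Python's built-in sqlite3 module in place of MySQL so it runs without a server. The INFORMATION_SCHEMA workaround is MySQL-specific and is not shown.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# COUNT(*) counts every row in the table.
total_rows = conn.execute("SELECT COUNT(*) FROM driver_log").fetchone()[0]

# Adding a WHERE clause counts only the rows that match it.
long_trips = conn.execute(
    "SELECT COUNT(*) FROM driver_log WHERE miles > 200"
).fetchone()[0]

print(total_rows, long_trips)  # 10 5
```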
COUNT() can also be combined with the IF() function to count only the rows that meet a condition:
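MySQL's IF(cond, 1, NULL) works inside COUNT() because COUNT() skips NULL values. This sketch uses the standard CASE WHEN ... THEN 1 END expression instead (it also yields NULL when the condition is false) because it runs unchanged on sqlite3, which stands in for MySQL here. The driver_log rows are made up and the 200-mile threshold is arbitrary.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# Each CASE yields 1 for matching rows and NULL otherwise; COUNT() ignores NULLs,
# so both conditional counts are computed in a single pass over the table.
short_trips, long_trips = conn.execute("""
    SELECT COUNT(CASE WHEN miles <= 200 THEN 1 END) AS short_trips,
           COUNT(CASE WHEN miles > 200 THEN 1 END) AS long_trips
    FROM driver_log
""").fetchone()

print(short_trips, long_trips)  # 5 5
```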
If you use a summary frequently, create a view to simplify it:
Querying the view:
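A sketch of wrapping the conditional counts in a view, again on sqlite3 (standing in for MySQL) with made-up driver_log rows; the view name trip_summary is hypothetical. Once the view exists, the summary is a plain SELECT.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# Define the summary once as a view (trip_summary is a hypothetical name).
conn.execute("""
    CREATE VIEW trip_summary AS
    SELECT COUNT(CASE WHEN miles <= 200 THEN 1 END) AS short_trips,
           COUNT(CASE WHEN miles > 200 THEN 1 END) AS long_trips
    FROM driver_log
""")

# Using the summary is now a simple query against the view.
summary = conn.execute("SELECT short_trips, long_trips FROM trip_summary").fetchone()
print(summary)  # (5, 5)
```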
MIN() and MAX() are used in a similar way:
SUM() and AVG() generate summaries in the same way:
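A sketch computing all four summaries in one query, assuming the same made-up driver_log rows and sqlite3 as a stand-in for MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# Shortest trip, longest trip, total miles, and average trip length.
shortest, longest, total, average = conn.execute(
    "SELECT MIN(miles), MAX(miles), SUM(miles), AVG(miles) FROM driver_log"
).fetchone()

print(shortest, longest, total, average)  # 79 502 2265 226.5
```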
2. Using DISTINCT to eliminate duplicates
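A sketch of DISTINCT on its own and inside COUNT(), assuming the same made-up driver_log rows with sqlite3 standing in for MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# DISTINCT collapses repeated names into one row each.
drivers = [row[0] for row in
           conn.execute("SELECT DISTINCT name FROM driver_log ORDER BY name")]

# COUNT(DISTINCT ...) counts the distinct values instead of listing them.
n_drivers = conn.execute("SELECT COUNT(DISTINCT name) FROM driver_log").fetchone()[0]

print(drivers, n_drivers)  # ['Ben', 'Henry', 'Suzi'] 3
```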
3. Finding values associated with minimum and maximum values
Aggregate functions such as MIN() and MAX() cannot be used in a WHERE clause. To find, for example, the name of the city with the largest population, store the maximum population value in a user-defined variable, then compare each row against the variable's value:
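In MySQL the two steps look roughly like SELECT @max := MAX(pop) FROM ...; followed by SELECT ... WHERE pop = @max. sqlite3, used here so the sketch runs without a server, has no user-defined variables, so this version keeps the maximum in a Python variable and passes it back as a query parameter. The cities table and its populations are made up.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the cities table and figures are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, pop INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)", [
    ("Ashford", 3200000), ("Brookvale", 8700000), ("Cedar Point", 12400000),
    ("Dunmore", 650000), ("Elmwood", 5100000),
])

# Step 1: store the maximum in a variable (a Python stand-in for MySQL's @max).
max_pop = conn.execute("SELECT MAX(pop) FROM cities").fetchone()[0]

# Step 2: compare each row with the stored value to get the other columns.
largest = conn.execute(
    "SELECT name, pop FROM cities WHERE pop = ?", (max_pop,)
).fetchall()

print(largest)  # [('Cedar Point', 12400000)]
```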
Another way to select other columns from the row that holds a minimum or maximum value is to use a join: store the extreme value in another table, then join that table to the original table to find the rows containing that value.
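A sketch of the join technique, assuming made-up cities data and sqlite3 in place of MySQL; the temporary table name max_pop is hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the cities table and figures are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, pop INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)", [
    ("Ashford", 3200000), ("Brookvale", 8700000), ("Cedar Point", 12400000),
    ("Dunmore", 650000), ("Elmwood", 5100000),
])

# Put the extreme value into a separate (temporary) table.
conn.execute("CREATE TEMP TABLE max_pop AS SELECT MAX(pop) AS pop FROM cities")

# Join it back to the original table to recover the matching row(s).
largest = conn.execute("""
    SELECT cities.name, cities.pop
    FROM cities INNER JOIN max_pop ON cities.pop = max_pop.pop
""").fetchall()

print(largest)  # [('Cedar Point', 12400000)]
```

If several rows share the maximum, the join returns all of them.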
4. Dividing a summary into subgroups
To find out more precisely how many messages each sender sent from each host, group by two columns.
Here the count is the number of messages each sender (srcuser) sent from each host (srchost).
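A sketch of grouping by two columns. It assumes a hypothetical mail table (srcuser, srchost, dstuser, dsthost, size) with made-up rows, and sqlite3 standing in for MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the mail table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (srcuser TEXT, srchost TEXT, "
             "dstuser TEXT, dsthost TEXT, size INTEGER)")
conn.executemany("INSERT INTO mail VALUES (?, ?, ?, ?, ?)", [
    ("barb", "saturn", "tricia", "mars", 58274),
    ("tricia", "mars", "gene", "venus", 194925),
    ("phil", "mars", "phil", "saturn", 1048),
    ("barb", "venus", "barb", "venus", 271),
    ("tricia", "saturn", "phil", "venus", 2394482),
    ("phil", "venus", "barb", "venus", 10294),
    ("barb", "saturn", "tricia", "venus", 2291),
    ("phil", "mars", "tricia", "saturn", 5781),
])

# One group per (sender, host) combination; COUNT(*) counts messages in each.
per_sender_host = conn.execute("""
    SELECT srcuser, srchost, COUNT(*) AS messages
    FROM mail
    GROUP BY srcuser, srchost
    ORDER BY srcuser, srchost
""").fetchall()

for row in per_sender_host:
    print(row)
```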
The next query groups the rows of the mail table by srcuser, then displays the size of the largest and the smallest message sent in each subgroup:
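A sketch of per-subgroup MAX() and MIN(), assuming the same made-up mail rows with sqlite3 standing in for MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the mail table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (srcuser TEXT, srchost TEXT, "
             "dstuser TEXT, dsthost TEXT, size INTEGER)")
conn.executemany("INSERT INTO mail VALUES (?, ?, ?, ?, ?)", [
    ("barb", "saturn", "tricia", "mars", 58274),
    ("tricia", "mars", "gene", "venus", 194925),
    ("phil", "mars", "phil", "saturn", 1048),
    ("barb", "venus", "barb", "venus", 271),
    ("tricia", "saturn", "phil", "venus", 2394482),
    ("phil", "venus", "barb", "venus", 10294),
    ("barb", "saturn", "tricia", "venus", 2291),
    ("phil", "mars", "tricia", "saturn", 5781),
])

# Largest and smallest message size within each sender's subgroup.
sizes = conn.execute("""
    SELECT srcuser, MAX(size) AS largest, MIN(size) AS smallest
    FROM mail
    GROUP BY srcuser
    ORDER BY srcuser
""").fetchall()

for row in sizes:
    print(row)
```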
A recurring problem: when GROUP BY divides rows into subgroups, the only values you can reliably select are the grouping columns and summary values computed from each subgroup.
The query marked in red is an error: it groups by name, but trav_date is neither a grouping column nor a summary value computed from the subgroup. MySQL either rejects such a query (when ONLY_FULL_GROUP_BY is enabled) or returns a trav_date taken from an arbitrary row of the group. To retrieve the rows associated with each subgroup's minimum or maximum value, the usual technique is a join:
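A sketch of the join technique applied per subgroup: compute each driver's longest trip into a temporary table, then join back on both name and miles to recover the matching trav_date. It assumes made-up driver_log rows and sqlite3 in place of MySQL; the table name max_trip is hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# Per-subgroup maximum: one row per driver with that driver's longest trip.
conn.execute("""
    CREATE TEMP TABLE max_trip AS
    SELECT name, MAX(miles) AS miles FROM driver_log GROUP BY name
""")

# Join on both the group key and the maximum to get the full original rows.
longest = conn.execute("""
    SELECT d.name, d.trav_date, d.miles
    FROM driver_log AS d
    INNER JOIN max_trip AS t ON d.name = t.name AND d.miles = t.miles
    ORDER BY d.name
""").fetchall()

for row in longest:
    print(row)
```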
5. Selecting groups by their characteristics
The difference between WHERE and HAVING is that WHERE filters individual rows before they are grouped, whereas HAVING applies additional conditions to the subgroups after the rows have been selected and grouped, so it can test summary values such as COUNT(*).
You can also refer to column aliases in a HAVING clause:
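A sketch contrasting WHERE (applied to rows before grouping) with HAVING (applied to groups afterward), including an alias used in HAVING. It assumes made-up driver_log rows and sqlite3 in place of MySQL; both accept an alias in HAVING.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# WHERE drops short trips first; only the remaining rows are grouped and counted.
long_trip_counts = conn.execute("""
    SELECT name, COUNT(*) FROM driver_log
    WHERE miles > 100
    GROUP BY name ORDER BY name
""").fetchall()

# HAVING keeps only the groups whose count exceeds 2, using the alias "days".
busy_drivers = conn.execute("""
    SELECT name, COUNT(*) AS days FROM driver_log
    GROUP BY name
    HAVING days > 2
    ORDER BY name
""").fetchall()

print(long_trip_counts)  # [('Ben', 2), ('Henry', 4), ('Suzi', 2)]
print(busy_drivers)      # [('Ben', 3), ('Henry', 5)]
```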
Use HAVING together with COUNT() to determine whether values are unique:
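A sketch that finds recipients appearing exactly once, assuming the made-up mail rows and sqlite3 in place of MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the mail table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (srcuser TEXT, srchost TEXT, "
             "dstuser TEXT, dsthost TEXT, size INTEGER)")
conn.executemany("INSERT INTO mail VALUES (?, ?, ?, ?, ?)", [
    ("barb", "saturn", "tricia", "mars", 58274),
    ("tricia", "mars", "gene", "venus", 194925),
    ("phil", "mars", "phil", "saturn", 1048),
    ("barb", "venus", "barb", "venus", 271),
    ("tricia", "saturn", "phil", "venus", 2394482),
    ("phil", "venus", "barb", "venus", 10294),
    ("barb", "saturn", "tricia", "venus", 2291),
    ("phil", "mars", "tricia", "saturn", 5781),
])

# A value is unique if its group contains exactly one row.
unique_recipients = [row[0] for row in conn.execute("""
    SELECT dstuser FROM mail
    GROUP BY dstuser
    HAVING COUNT(*) = 1
""")]

print(unique_recipients)  # ['gene']
```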
This also works for combinations of values. For example, to find sender/recipient pairs that exchanged only one message, look for combinations that occur only once in the mail table:
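A sketch applying the same HAVING COUNT(*) = 1 test to a two-column combination, assuming the made-up mail rows and sqlite3 in place of MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the mail table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (srcuser TEXT, srchost TEXT, "
             "dstuser TEXT, dsthost TEXT, size INTEGER)")
conn.executemany("INSERT INTO mail VALUES (?, ?, ?, ?, ?)", [
    ("barb", "saturn", "tricia", "mars", 58274),
    ("tricia", "mars", "gene", "venus", 194925),
    ("phil", "mars", "phil", "saturn", 1048),
    ("barb", "venus", "barb", "venus", 271),
    ("tricia", "saturn", "phil", "venus", 2394482),
    ("phil", "venus", "barb", "venus", 10294),
    ("barb", "saturn", "tricia", "venus", 2291),
    ("phil", "mars", "tricia", "saturn", 5781),
])

# Group by the (sender, recipient) pair and keep pairs seen only once.
single_pairs = conn.execute("""
    SELECT srcuser, dstuser FROM mail
    GROUP BY srcuser, dstuser
    HAVING COUNT(*) = 1
    ORDER BY srcuser, dstuser
""").fetchall()

for pair in single_pairs:
    print(pair)
```

Here barb/tricia is excluded because that pair occurs twice.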
You can also group by the result of an expression:
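A sketch grouping trips by month, where the month is an expression computed from trav_date. It assumes made-up driver_log rows stored as 'YYYY-MM-DD' text in sqlite3 (standing in for MySQL), so substr() extracts the month; in MySQL a date function on a DATE column would be the more natural choice.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; dates are 'YYYY-MM-DD' text; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# The grouping key is an expression (year-month prefix), not a stored column.
trips_per_month = conn.execute("""
    SELECT substr(trav_date, 1, 7) AS month, COUNT(*) AS trips
    FROM driver_log
    GROUP BY month
    ORDER BY month
""").fetchall()

print(trips_per_month)  # [('2006-08', 7), ('2006-09', 3)]
```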
6. Categorizing non-categorical data
Consider, for example, data that contains no duplicate values:
Because no value is repeated, grouping on the column directly puts each row in its own group, which does not produce a useful summary.
Instead, divide the population values into categories of 5 million:
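A sketch of bucketing unique values into 5-million-person categories, assuming made-up city populations and sqlite3 in place of MySQL. In sqlite3, dividing two integers truncates, which matches FLOOR() for positive values; in MySQL, / returns a decimal, so FLOOR(pop/5000000) would be the equivalent expression there.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; the cities table and figures are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, pop INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)", [
    ("Ashford", 3200000), ("Brookvale", 8700000), ("Cedar Point", 12400000),
    ("Dunmore", 650000), ("Elmwood", 5100000),
])

# Integer division maps each population to a bucket: 0 = under 5M, 1 = 5M-10M, ...
buckets = conn.execute("""
    SELECT pop / 5000000 AS category, COUNT(*) AS cities
    FROM cities
    GROUP BY category
    ORDER BY category
""").fetchall()

for category, count in buckets:
    print(f"{category * 5}M to {(category + 1) * 5}M: {count}")
```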
7. Controlling summary display order
To group the rows by name and sort the subgroups by the number of days traveled:
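A sketch of sorting summary rows by a computed value, assuming made-up driver_log rows and sqlite3 in place of MySQL.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# ORDER BY sorts the summary rows by the aggregate, most days first.
by_days = conn.execute("""
    SELECT name, COUNT(*) AS days, SUM(miles) AS total_miles
    FROM driver_log
    GROUP BY name
    ORDER BY days DESC
""").fetchall()

print(by_days)  # [('Henry', 5, 1010), ('Ben', 3, 362), ('Suzi', 2, 893)]
```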
8. Finding the smallest or largest summary values
MIN() and MAX() cannot take the result of another aggregate function as their argument. For example, you can easily find the total mileage of each driver,
but trying to find the largest of those totals by nesting aggregates, as in MAX(SUM(miles)), does not work.
Instead, sort the summary rows and use LIMIT to select the first one:
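A sketch showing that the nested aggregate is rejected while ORDER BY ... LIMIT 1 works. It assumes made-up driver_log rows and sqlite3 in place of MySQL; sqlite3 reports the nested aggregate as an error at execute time, which the sketch catches through the generic sqlite3.Error class.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; driver_log and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE driver_log (name TEXT, trav_date TEXT, miles INTEGER)")
conn.executemany("INSERT INTO driver_log VALUES (?, ?, ?)", [
    ("Ben", "2006-08-30", 152), ("Suzi", "2006-08-29", 391),
    ("Henry", "2006-08-29", 300), ("Henry", "2006-08-27", 96),
    ("Ben", "2006-08-29", 131), ("Henry", "2006-08-26", 115),
    ("Suzi", "2006-09-02", 502), ("Henry", "2006-09-02", 296),
    ("Ben", "2006-09-02", 79), ("Henry", "2006-08-30", 203),
])

# Nesting one aggregate inside another is rejected by the database.
try:
    conn.execute("SELECT name, MAX(SUM(miles)) FROM driver_log GROUP BY name").fetchall()
    nested_ok = True
except sqlite3.Error as exc:
    nested_ok = False
    print("nested aggregate rejected:", exc)

# Workaround: sort the per-driver totals and keep only the first row.
top_driver = conn.execute("""
    SELECT name, SUM(miles) AS total
    FROM driver_log
    GROUP BY name
    ORDER BY total DESC
    LIMIT 1
""").fetchone()

print(top_driver)  # ('Henry', 1010)
```

If two drivers tie for the top total, LIMIT 1 returns only one of them.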
9. Generating reports that include both summaries and lists
Use Python to generate a report that contains summaries and lists:
```python
import Cookbook  # the post's helper module; Cookbook.connect() returns a MySQL connection

conn = Cookbook.connect()
cursor = conn.cursor()

# Compute each driver's summary (number of trips and total miles), keyed by name
name_map = {}
cursor.execute("""
    SELECT name, COUNT(name), SUM(miles)
    FROM driver_log GROUP BY name
""")
for (name, days, miles) in cursor.fetchall():
    name_map[name] = (days, miles)

# Select every trip, sorted by driver and date, and print a report that shows
# each driver's summary line followed by that driver's list of trips
cursor.execute("""
    SELECT name, trav_date, miles
    FROM driver_log ORDER BY name, trav_date
""")
cur_name = ""
for (name, trav_date, miles) in cursor.fetchall():
    if cur_name != name:  # first row for a new driver: print the summary
        print("Name: %s; days on road: %d; miles driven: %d"
              % (name, name_map[name][0], name_map[name][1]))
        cur_name = name
    print("date: %s, trip length: %d" % (trav_date, miles))

cursor.close()
conn.close()
```