<title>2014-06-06-sql-for-statistics</title>---layout:defaulttitle: Use SQL statements to do simple data statistics Category:tech---{{page.title}}
Have done a period of time data analysis work, when just took over the work, but also want to learn a special statistical tools;
Later found some simple statistics, with the database to do it, occasionally with awk to match, or Excel to do a diagram can be done;
Here is a simple introduction of some of the SQL statements I used, using the MySQL database syntax;
Part 1: Basic content
First of all, some SQL comes with statistical functions that are necessary, count,sum,avg,max,min;
These functions can be words too literally, all very simple;
Another basis is group by and order by;
The so-called group by IS in accordance with the statistics, generally with the above several words too literally function use, if the group to add conditions to use having;
The following examples are based on this table structure:
Job table fields: server_id (int), ID (int), Jobresult (varchar255), DT (timestamp), status (int);
Example:
{% highlight Java%}select server_id, COUNT (*) as NUM from job GROUP by server_id ORDER by Num; {% Endhighlight%}
This SQL means, from the job table, query how many records each server has, and sort by num from small to large;
Part2: You may not know the content;
1, by TIME statistics: for example, every five minutes for a statistical time,
Example: Query the number of records in all 5-minute periods of June 6, 2014 server_id 30;
{% highlight Java%}select HOUR (DT) as H, floor (MINUTE (DT)/5) as V,count (*) from job where server_id=30 and dt< "2014- 06-06 "and dt>" 2014-06-05 "Group by H,v; {% Endhighlight%}
Smart You can certainly understand what this means, hour and minute are to take the time function, floor is rounding;
2, by day statistics, a table has n days of data, statistics of daily data;
All you need to know is that a function can be substr (dt,1,10) as T, which is the date of the first 10 characters of the first 10 characters "xxxx-xx-xx" time;
If there are other ways to welcome the notification;
3, case when, multi-conditional statistics;
Example: Statistics on the success rate of all servers doing the task (Status=1 success) see below;
{% highlight Java%}select server_id,count (0) as job_num,100*sum (case is Status=1 then 1 else 0 end)/count (0) as Succ_rat E from job, group by server, order by rate; {% Endhighlight%}
Such a statement can be done, if you do not write, you can use more than one statement to achieve, it will be a lot of trouble;
PART3: Some tips:
1, build the index, if you want to query fast, then build index;
ALTER TABLE ' Analyip '. ' v2ip00 ' ADD UNIQUE ' NewIndex1 ' (' tk ');
2, update the data, according to the conditions of the two tables to update the data;
UPDATE v2ip01 a,tkip01 b SET b.v2ip=a.v2ip, B.sid=a.sid WHERE a.tk=b.tk;
3, distinct do not repeat, you can sum (distinct server_id) query how many different server_id;
4, Group_concat ();
With select Group_concat (ID) from job limit 10; The output is a comma-separated ID, which makes it easier to do further work, and the shell script is easy to implement if you don't write the program at the end;
{{page.date | date_to_string}}
From for notes (Wiz)
2014-06-06-sql-for-statistics