Source: Internet
Author: User


<title>2014-06-06-sql-for-statistics</title>---layout:defaulttitle: Use SQL statements to do simple data statistics Category:tech---{{page.title}}
Have done a period of time data analysis work, when just took over the work, but also want to learn a special statistical tools;
Later found some simple statistics, with the database to do it, occasionally with awk to match, or Excel to do a diagram can be done;
Here is a simple introduction of some of the SQL statements I used, using the MySQL database syntax;

Part 1: Basic content

First of all, some SQL comes with statistical functions that are necessary, count,sum,avg,max,min;
These functions can be words too literally, all very simple;
Another basis is group by and order by;
The so-called group by IS in accordance with the statistics, generally with the above several words too literally function use, if the group to add conditions to use having;
The following examples are based on this table structure:
Job table fields: server_id (int), ID (int), Jobresult (varchar255), DT (timestamp), status (int);
{% highlight Java%}select server_id, COUNT (*) as NUM from job GROUP by server_id ORDER by Num; {% Endhighlight%}
This SQL means, from the job table, query how many records each server has, and sort by num from small to large;

Part2: You may not know the content;

1, by TIME statistics: for example, every five minutes for a statistical time,
Example: Query the number of records in all 5-minute periods of June 6, 2014 server_id 30;
{% highlight Java%}select HOUR (DT) as H, floor (MINUTE (DT)/5) as V,count (*) from job where server_id=30 and dt< "2014- 06-06 "and dt>" 2014-06-05 "Group by H,v; {% Endhighlight%}
Smart You can certainly understand what this means, hour and minute are to take the time function, floor is rounding;
2, by day statistics, a table has n days of data, statistics of daily data;
All you need to know is that a function can be substr (dt,1,10) as T, which is the date of the first 10 characters of the first 10 characters "xxxx-xx-xx" time;
If there are other ways to welcome the notification;
3, case when, multi-conditional statistics;
Example: Statistics on the success rate of all servers doing the task (Status=1 success) see below;
{% highlight Java%}select server_id,count (0) as job_num,100*sum (case is Status=1 then 1 else 0 end)/count (0) as Succ_rat E from job, group by server, order by rate; {% Endhighlight%}
Such a statement can be done, if you do not write, you can use more than one statement to achieve, it will be a lot of trouble;

PART3: Some tips:

1, build the index, if you want to query fast, then build index;
ALTER TABLE ' Analyip '. ' v2ip00 ' ADD UNIQUE ' NewIndex1 ' (' tk ');
2, update the data, according to the conditions of the two tables to update the data;
UPDATE v2ip01 a,tkip01 b SET b.v2ip=a.v2ip, B.sid=a.sid WHERE a.tk=b.tk;
3, distinct do not repeat, you can sum (distinct server_id) query how many different server_id;
4, Group_concat ();
With select Group_concat (ID) from job limit 10; The output is a comma-separated ID, which makes it easier to do further work, and the shell script is easy to implement if you don't write the program at the end;

{{page.date | date_to_string}}

From for notes (Wiz)


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

Tags Index: