In MySQL, how does one calculate the median of a group of data ?, Mysql Median
To obtain the median of a set of data (for example, the median of income in a region or company), we need to first divide this task into three smaller tasks:
Example:
Table creation statement:
Create table 'Welcome '('name' VARCHAR (10) not null default '', 'Welcome' INT (11) not null default '0 ') ENGINE = InnoDB default charset = utf8; insert into test. income (name, income) VALUES ('marat', 20000); insert into test. income (name, income) VALUES ('Lee 4', 12000); insert into test. income (name, income) VALUES ('zhang san', 10000); insert into test. income (name, income) VALUES ('wang 'er, 16000); insert into test. income (name, income) VALUES ('tuhao', 40000 );
Query statement of Task 1:
SELECT a1.name, a1.income, count(*) AS rankFROM income AS a1, income AS a2WHERE a1.income < a2.income OR (a1.income = a2.income AND a1.name <= a2.name)GROUP BY a1.name, a1.incomeORDER BY rank;
Query statement of Task 2:
SELECT (COUNT(*) + 1) DIV 2FROM income;
Query statement of Task 3:
SELECT income AS medianFROM (SELECT a1.name, a1.income, count(*) AS rank FROM income AS a1, income AS a2 WHERE a1.income < a2.income OR (a1.income = a2.income AND a1.name <= a2.name) GROUP BY a1.name, a1.income ORDER BY rank) a3WHERE rank = (SELECT (COUNT(*) + 1) DIV 2 FROM income)
So far, we have found a way to obtain the median from a group of data.
Next, we will introduce another method to optimize ranking statements.
We all know how to sort a group of data. In this example, the implementation method is as follows:
SELECT name, incomeFROM incomeORDER BY income DESC
Can we go further and add a column to the query result. The data in this column is the ranking?
We can achieve this through three methods of custom variables:
- The first variable is used to record the income of the current row of data.
- The second variable is used to record the income of the last row of data.
- The third variable is used to record the ranking of the data in the current row.
SET @curr_income := 0;SET @prev_income := 0;SET @rank := 0;SELECT name, @curr_income := income AS income, @rank := if(@prev_income != @curr_income, @rank + 1, @rank) AS rank, @prev_income := @curr_income AS dummyFROM incomeORDER BY income DESC
The query result is as follows:
Then, find the ranking number of the median to further find the median of income:
SET @curr_income := 0;SET @prev_income := 0;SET @rank := 0;SELECT income AS medianFROM (SELECT name, @curr_income := income AS income, @rank := if(@prev_income != @curr_income, @rank + 1, @rank) AS rank, @prev_income := @curr_income AS dummy FROM income ORDER BY income DESC) AS a1WHERE a1.rank = (SELECT (COUNT(*) + 1) DIV 2 FROM income)
So far, we have found two ways to solve the median problem. Spread flowers.