標籤:
目前postgresql中的部分函數在hive中也是存在的,所以今天就以postgresql為例來進行說明,這個過程同時可以應用到hive中。
1、建立表
CREATE TABLE employee ( empid INT, deptid INT, salary DECIMAL (10, 2));
2、匯入資料
INSERT INTO employee VALUES(1, 10, 5500.00); INSERT INTO employee VALUES(2, 10, 4500.00); INSERT INTO employee VALUES(3, 20, 1900.00); INSERT INTO employee VALUES(4, 20, 4800.00); INSERT INTO employee VALUES(6, 40, 14500.00); INSERT INTO employee VALUES(7, 40, 44500.00); INSERT INTO employee VALUES(10, 40, 44500.00); INSERT INTO employee VALUES(11, 40, 44501.00); INSERT INTO employee VALUES(8, 50, 6500.00); INSERT INTO employee VALUES(9, 50, 7500.00);
3、應用
--每個部門工資最高的 select * from (select e.*,row_number() over(partition by e.deptid order by e.salary desc) rank from employee e ) eewhere ee.rank =1;
--應該用rank,對於工資相同的應該都顯示出來select * from (select e.*,rank() over(partition by e.deptid order by e.salary desc) rank from employee e ) eewhere ee.rank =1;
--rank和dense_rank的區別就是前者佔位,後者不佔位--ntile可以看作是把有序的資料集合平均分配到expr指定的數量的桶中,將桶號分配給每一行。--如果不能平均分配,則較小桶號的桶分配額外的行,並且各個桶中能放的行數最多相差1。select e.*, ntile(3) over(partition by e.deptid order by e.salary desc) ntile, rank() over(partition by e.deptid order by e.salary desc) rank, dense_rank() over(partition by e.deptid order by e.salary desc) drank, row_number() over(partition by e.deptid order by e.salary desc) rownum, first_value(salary) over(partition by e.deptid order by e.salary desc) fv, last_value(salary) over(partition by e.deptid order by e.salary asc) lv from employee e
好了,就這些了,通過例子能很清楚的看到各個函數的應用
postgresql/hive中函數學習