Previously, our team implemented row_number with TRANSFORM. Implementing it as a UDF is more convenient because it avoids the hassle of the TRANSFORM syntax.
The test table used is:
hive> desc row_number_test;
OK
id1 int
id2 string
age int
score double
name string
hive> select * from row_number_test;
OK
2 t04 25 60.0 youlia
1 t01 20 85.0 liujiannan
1 t02 24 70.0 zengqiu
2 t03 30 88.0 hongqu
2 t03 27 70.0 yongqi
1 t02 19 75.0 wangdong
1 t02 24 70.0 zengqiu
Before calling the function, you must distribute and sort the data in a subquery. The UDF sees one row at a time and detects a new partition only by comparing each row with the previous one, so rows of the same partition must arrive together and in the desired order. For example, take the following Oracle statement:
select row_number() over (partition by id1 order by age desc) from row_number_test;
The equivalent Hive statement is:
select row_number(id1) from -- pass the partition-by column to row_number
(select * from row_number_test distribute by id1 sort by id1, age desc) t;
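To illustrate what the subquery provides, here is a small plain-Java sketch (not Hive code) that mimics a single reducer: it sorts the sample rows by id1 and then by age descending, as "sort by id1, age desc" would, and numbers them by watching for a change in id1, the same idea the UDF uses. The class and method names are hypothetical, it assumes all rows reach one reducer, and it needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class RowNumberSimulation {
    // The columns of row_number_test that this example needs.
    record Row(int id1, String id2, int age, String name) {}

    // The sample rows shown above, in their original (unsorted) order.
    static final List<Row> SAMPLE = List.of(
            new Row(2, "t04", 25, "youlia"),
            new Row(1, "t01", 20, "liujiannan"),
            new Row(1, "t02", 24, "zengqiu"),
            new Row(2, "t03", 30, "hongqu"),
            new Row(2, "t03", 27, "yongqi"),
            new Row(1, "t02", 19, "wangdong"),
            new Row(1, "t02", 24, "zengqiu"));

    // Mimics one reducer: "sort by id1, age desc", then number rows within each id1.
    static List<String> rowNumbers(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt(Row::id1)
                .thenComparing(Comparator.comparingInt(Row::age).reversed()));

        List<String> result = new ArrayList<>();
        Integer previousKey = null; // id1 of the previous row; null before the first row
        int rowNum = 0;
        for (Row r : sorted) {
            if (!Objects.equals(previousKey, r.id1())) {
                previousKey = r.id1(); // new partition: restart numbering
                rowNum = 0;
            }
            rowNum++;
            result.add(r.name() + " " + rowNum);
        }
        return result;
    }

    public static void main(String[] args) {
        rowNumbers(SAMPLE).forEach(System.out::println);
    }
}
```

Running it numbers zengqiu, zengqiu, liujiannan, wangdong as 1 to 4 and hongqu, yongqi, youlia as 1 to 3, matching the first query result in this article apart from the order in which the two partitions are printed.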
If partition by has two columns:
select row_number() over (partition by id1, id2 order by score) from row_number_test;
The equivalent Hive statement is:
select row_number(id1, id2) -- pass the partition-by columns to row_number
from (select * from row_number_test distribute by id1, id2 sort by id1, id2, score) t;
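With two partition-by columns, a new partition starts when either column changes. A minimal plain-Java sketch of that comparison, treating the (id1, id2) values of each row as one list-valued key, is shown below; it also runs the same logic on unsorted input to show why the sort is required. CompositeKeyRowNumber and rowNumbers are hypothetical names, and this is an illustration rather than Hive code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

public class CompositeKeyRowNumber {
    // Numbers rows in the order given, restarting at 1 whenever the partition key changes.
    // Like the UDF, this is only correct if rows with the same key are adjacent.
    static <T> List<Integer> rowNumbers(List<T> rows, Function<T, List<?>> partitionKey) {
        List<Integer> result = new ArrayList<>();
        List<?> previousKey = null; // key of the previous row; null before the first row
        int rowNum = 0;
        for (T row : rows) {
            List<?> key = partitionKey.apply(row);
            if (!Objects.equals(previousKey, key)) { // List.equals compares element by element
                previousKey = key;
                rowNum = 0;
            }
            result.add(++rowNum);
        }
        return result;
    }

    public static void main(String[] args) {
        // (id1, id2) of each row, ordered as "sort by id1, id2, score" leaves them.
        List<List<Object>> sorted = List.of(
                List.of(1, "t01"), List.of(1, "t02"), List.of(1, "t02"), List.of(1, "t02"),
                List.of(2, "t03"), List.of(2, "t03"), List.of(2, "t04"));
        // (id1, id2) of each row in the table's original, unsorted order.
        List<List<Object>> unsorted = List.of(
                List.of(2, "t04"), List.of(1, "t01"), List.of(1, "t02"), List.of(2, "t03"),
                List.of(2, "t03"), List.of(1, "t02"), List.of(1, "t02"));

        System.out.println(rowNumbers(sorted, row -> row));   // [1, 1, 2, 3, 1, 2, 1]
        System.out.println(rowNumbers(unsorted, row -> row)); // [1, 1, 1, 1, 2, 1, 2]
    }
}
```

In the unsorted case the three (1, t02) rows fall into two separate runs, so they are numbered 1, then 1, 2 instead of 1, 2, 3.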
The query results are as follows:
1.
select id1, id2, age, score, name, row_number(id1) rn from (select * from row_number_test distribute by id1 sort by id1, age desc) t;
OK
2 t03 30 88.0 hongqu 1
2 t03 27 70.0 yongqi 2
2 t04 25 60.0 youlia 3
1 t02 24 70.0 zengqiu 1
1 t02 24 70.0 zengqiu 2
1 t01 20 85.0 liujiannan 3
1 t02 19 75.0 wangdong 4
2.
select id1, id2, age, score, name, row_number(id1, id2) rn from (select * from row_number_test distribute by id1, id2 sort by id1, id2, score) t;
OK
2 t04 25 60.0 youlia 1
1 t02 24 70.0 zengqiu 1
2 t03 27 70.0 yongqi 1
1 t02 24 70.0 zengqiu 2
1 t02 19 75.0 wangdong 3
1 t01 20 85.0 liujiannan 1
2 t03 30 88.0 hongqu 2
The following code implements evaluate with a variable-length argument list (Object... args), so the same method accepts one, two, or more partition-by columns, up to MAX_VALUE. The code is for reference only:
package com.hadoopbook.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// deterministic = false: the result depends on the rows seen before, not only on the arguments.
@UDFType(deterministic = false)
public class Row_number extends UDF {
    private static final int MAX_VALUE = 50;                 // maximum number of partition-by columns
    // Instance fields (not static), so separate Row_number instances keep separate counters.
    private String[] comparedColumn = new String[MAX_VALUE]; // partition key of the previous row
    private int rowNum = 1;                                   // row number to return next

    // args: the partition-by column values of the current row.
    public int evaluate(Object... args) {
        String[] columnValue = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            columnValue[i] = String.valueOf(args[i]);
        }
        // First call: remember the partition key of the first row.
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++) {
                comparedColumn[i] = columnValue[i];
            }
        }
        // Partition key changed: store the new key and restart numbering at 1.
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }
        // Same partition as the previous row: keep counting.
        return rowNum++;
    }

    public static void main(String[] args) {
        Row_number t = new Row_number();
        System.out.println(t.evaluate(123));
        System.out.println(t.evaluate(123));
        System.out.println(t.evaluate(123));
        System.out.println(t.evaluate(1234));
        System.out.println(t.evaluate(1234));
        System.out.println(t.evaluate(1234));
        System.out.println(t.evaluate(1235));
    }
}
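The fixed MAX_VALUE array caps the number of partition-by columns at 50. A minimal plain-Java sketch of the same counting logic without that cap keeps a copy of the previous key and compares it with java.util.Arrays.equals. PartitionRowCounter and next are hypothetical names; the class has no Hive dependency and is meant only to show the logic, which could be moved into evaluate.

```java
import java.util.Arrays;

// Hypothetical plain-Java counterpart of Row_number.evaluate, with no Hive dependency.
public class PartitionRowCounter {
    private String[] previousKey; // partition key of the previous row; null before the first call
    private int rowNum = 0;       // row number returned for the previous row

    // Returns the row number of the current row within its partition.
    public int next(Object... partitionValues) {
        String[] key = new String[partitionValues.length];
        for (int i = 0; i < partitionValues.length; i++) {
            key[i] = String.valueOf(partitionValues[i]);
        }
        // Arrays.equals(null, key) is false, so the first call also starts a partition.
        if (!Arrays.equals(previousKey, key)) {
            previousKey = key; // new partition: remember its key and restart at 1
            rowNum = 0;
        }
        return ++rowNum;
    }

    public static void main(String[] args) {
        PartitionRowCounter c = new PartitionRowCounter();
        int[] ids = {123, 123, 123, 1234, 1234, 1234, 1235};
        for (int id : ids) {
            System.out.println(c.next(id)); // prints 1 2 3 1 2 3 1, one per line
        }
    }
}
```

Because a new key array is allocated on every call, storing it directly as previousKey is safe; no element-by-element copy is needed.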