Implementing a row_number UDF in Hive and the problems encountered
This post adds a row number to each row of Hive data. The first thing to consider when numbering rows is that all the data must go through a single reduce task. The code comes first:
package xx.xxxxx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Non-deterministic: the result depends on the rows seen so far.
@UDFType(deterministic = false)
public class RowNumber extends UDF {
    // Maximum number of key columns that can be compared.
    private static int MAX_VALUE = 50;
    // Key column values of the previous row.
    private static String comparedColumn[] = new String[MAX_VALUE];
    // Number to assign to the next row.
    private static int rowNum = 1;

    public int evaluate(Object... args) {
        String columnValue[] = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            columnValue[i] = args[i].toString();
        }
        // First row: remember its key columns.
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++)
                comparedColumn[i] = columnValue[i];
        }
        for (int i = 0; i < columnValue.length; i++) {
            // The key changed: remember the new key and restart at 1.
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }
        // Same key as the previous row: keep counting.
        return rowNum++;
    }
}
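To make the counting rule easier to follow, here is a minimal stand-alone sketch of the same logic. It is not the UDF itself: RowNumberSketch is a hypothetical class with no Hive dependency, and it uses instance fields instead of static ones so that it can be run and tested anywhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone class (not part of Hive): it mirrors the
// evaluate() logic of the UDF without extending the Hive UDF base class.
class RowNumberSketch {
    private String[] compared = null; // key columns of the previous row
    private int rowNum = 1;           // number to assign to the next row

    int evaluate(Object... args) {
        String[] current = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            current[i] = String.valueOf(args[i]);
        }
        // The first row, or a row whose key differs, restarts the count at 1.
        if (compared == null || !Arrays.equals(compared, current)) {
            compared = current;
            rowNum = 1;
        }
        return rowNum++;
    }

    public static void main(String[] args) {
        RowNumberSketch rn = new RowNumberSketch();
        String[] keys = {"a", "a", "b", "b", "b", "c"};
        List<Integer> numbers = new ArrayList<>();
        for (String key : keys) {
            numbers.add(rn.evaluate(key));
        }
        System.out.println(numbers); // prints [1, 2, 1, 2, 3, 1]
    }
}
```

When the argument is a constant, as in row_number(1), the key never changes, so the count never restarts and the rows are simply numbered 1, 2, 3, and so on in the order they arrive.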
Build the jar, add it to the Hive session, and create the function:
add jar /home/hdbatch/jars/iclickhiveudf.jar;
create temporary function row_number as 'cn.iclick.hive.udf.RowNumber';
Note, however, what happens when I try to number the rows of a table with the following SQL statement:
create table test_tony as select row_number(1), tid from (select distinct tid from cookie where i_date = 20131105) t order by tid;
The preceding statement numbers the rows incorrectly. It generates 11 reduce tasks, so 11 rows end up with the same number, which is wrong. Why is the statement interpreted differently from what was intended? Running EXPLAIN on the SQL statement shows the cause: it is the predicate push-down problem that comes up when writing a non-deterministic UDF.
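One way to picture the symptom is the sketch below. It rests on assumptions that are not in the original: each reduce task keeps its own copy of the counter, and rows are dealt to the tasks round-robin (Hive actually partitions rows by key). ReducerCounterSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation, not Hive code: every task has its own counter,
// just as every reduce task has its own copy of the UDF's static fields.
class ReducerCounterSketch {
    // Numbers `rows` rows that are dealt round-robin (an assumption) to
    // `tasks` independent tasks, each counting from 1.
    static List<Integer> number(int rows, int tasks) {
        int[] next = new int[tasks];
        Arrays.fill(next, 1);
        List<Integer> numbers = new ArrayList<>();
        for (int row = 0; row < rows; row++) {
            int task = row % tasks;
            numbers.add(next[task]++);
        }
        return numbers;
    }

    // Counts how many rows received the given number.
    static int countOf(List<Integer> numbers, int value) {
        int count = 0;
        for (int n : numbers) {
            if (n == value) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 11 tasks: the number 1 is handed out once per task.
        System.out.println(countOf(number(22, 11), 1)); // prints 11
        // 1 task: every number is handed out exactly once.
        System.out.println(countOf(number(22, 1), 1));  // prints 1
    }
}
```

With a single task the numbers are unique, which is why the numbering has to run in one reduce task.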