Use a user-defined function (UDF) in Hive to implement the row_number analytic function

Source: Internet
Author: User
Previously, our department used transform to implement row_number. Implementing it as a UDF is more convenient to use and avoids the cumbersome transform syntax.

The test table used is:

hive> desc row_number_test;
OK
id1 int
id2 string
age int
score double
name string

hive> select * from row_number_test;
OK
2 t04 25 60.0 youlia
1 t01 20 85.0 liujiannan
1 t02 24 70.0 zengqiu
2 t03 30 88.0 hongqu
2 t03 27 70.0 yongqi
1 t02 19 75.0 wangdong
1 t02 24 70.0 zengqiu

Before calling the UDF, you must distribute and sort the data in a subquery, because the function numbers rows in the order it receives them. For example, take the following SQL statement in Oracle:

select row_number() over (partition by id1 order by age desc) from row_number_test;

The Hive statement should be:

select row_number(id1) -- pass the partition by field to the row_number function

from (select * from row_number_test distribute by id1 sort by id1, age desc) t;
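The subquery matters because a function of this kind sees one row at a time and only remembers the key of the previous row, so rows with the same key have to arrive consecutively. Below is a minimal plain-Java sketch of that behaviour with no Hive dependency; the names AdjacentCounter and number are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdjacentCounter {
    // Counts consecutive repeats of a key, restarting at 1 whenever the
    // key differs from the one on the previous row.
    static List<Integer> number(List<String> keys) {
        List<Integer> out = new ArrayList<>();
        String previous = null;
        int rowNum = 0;
        for (String key : keys) {
            if (!key.equals(previous)) {
                rowNum = 0;
                previous = key;
            }
            out.add(++rowNum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Keys grouped together, as after "distribute by id1 sort by id1".
        System.out.println(number(Arrays.asList("1", "1", "2"))); // [1, 2, 1]
        // The same keys interleaved: every row looks like the start of a group.
        System.out.println(number(Arrays.asList("1", "2", "1"))); // [1, 1, 1]
    }
}
```

With interleaved keys every row gets number 1, which is why the input has to be distributed and sorted by the partition fields first.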

If partition by has two fields:

select row_number() over (partition by id1, id2 order by score) from row_number_test;

The Hive statement should be:

select row_number(id1, id2) -- pass the partition by fields to the row_number function

from (select * from row_number_test distribute by id1, id2 sort by id1, id2, score) t;

The query results are as follows:

1.

select id1, id2, age, score, name, row_number(id1) rn from (select * from row_number_test distribute by id1 sort by id1, age desc) t;

OK
2 t03 30 88.0 hongqu 1
2 t03 27 70.0 yongqi 2
2 t04 25 60.0 youlia 3
1 t02 24 70.0 zengqiu 1
1 t02 24 70.0 zengqiu 2
1 t01 20 85.0 liujiannan 3
1 t02 19 75.0 wangdong 4

2.

select id1, id2, age, score, name, row_number(id1, id2) rn from (select * from row_number_test distribute by id1, id2 sort by id1, id2, score) t;

OK
2 t04 25 60.0 youlia 1
1 t02 24 70.0 zengqiu 1
2 t03 27 70.0 yongqi 1
1 t02 24 70.0 zengqiu 2
1 t02 19 75.0 wangdong 3
1 t01 20 85.0 liujiannan 1
2 t03 30 88.0 hongqu 2
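To cross-check the numbers in the second result, the same logic can be emulated in plain Java on the sample rows: sort by (id1, id2, score), then restart the counter whenever (id1, id2) changes. This is only a sketch under that reading of the query; the class RowNumberCheck and its Row type are hypothetical, and the age column is left out because the query does not sort on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RowNumberCheck {
    // One row of the sample table, reduced to the columns the query uses.
    static final class Row {
        final int id1;
        final String id2;
        final double score;
        final String name;

        Row(int id1, String id2, double score, String name) {
            this.id1 = id1;
            this.id2 = id2;
            this.score = score;
            this.name = name;
        }
    }

    // Sorts by (id1, id2, score), then numbers rows within each (id1, id2) group.
    static List<String> number(List<Row> input) {
        List<Row> rows = new ArrayList<>(input);
        rows.sort(Comparator.comparingInt((Row r) -> r.id1)
                .thenComparing(r -> r.id2)
                .thenComparingDouble(r -> r.score));
        List<String> out = new ArrayList<>();
        Row previous = null;
        int rowNum = 0;
        for (Row r : rows) {
            boolean sameGroup = previous != null
                    && previous.id1 == r.id1 && previous.id2.equals(r.id2);
            rowNum = sameGroup ? rowNum + 1 : 1;
            out.add(r.name + ":" + rowNum);
            previous = r;
        }
        return out;
    }

    // The seven rows of row_number_test, in the order the table lists them.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row(2, "t04", 60.0, "youlia"),
                new Row(1, "t01", 85.0, "liujiannan"),
                new Row(1, "t02", 70.0, "zengqiu"),
                new Row(2, "t03", 88.0, "hongqu"),
                new Row(2, "t03", 70.0, "yongqi"),
                new Row(1, "t02", 75.0, "wangdong"),
                new Row(1, "t02", 70.0, "zengqiu"));
    }

    public static void main(String[] args) {
        // [liujiannan:1, zengqiu:1, zengqiu:2, wangdong:3, yongqi:1, hongqu:2, youlia:1]
        System.out.println(number(sampleRows()));
    }
}
```

The row order differs from the Hive output, where rows from different groups are interleaved, but each row receives the same rn value as in the result above.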

The following code is for reference only. The examples above exercise the evaluate method with one and two parameters; it can be adapted to take more.

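package com.hadoopbook.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Not deterministic: the result depends on the rows seen before the current one.
@UDFType(deterministic = false)
public class Row_number extends UDF {

    // Maximum number of partition by fields that can be passed in.
    private static int MAX_VALUE = 50;

    // Partition by values of the group currently being numbered.
    private static String comparedColumn[] = new String[MAX_VALUE];

    // Number to assign to the next row of the current group.
    private static int rowNum = 1;

    public int evaluate(Object... args) {
        // String.valueOf tolerates NULL column values.
        String columnValue[] = new String[args.length];
        for (int i = 0; i < args.length; i++)
            columnValue[i] = String.valueOf(args[i]);

        // First row of a group: remember its partition by values.
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++)
                comparedColumn[i] = columnValue[i];
        }

        // If any partition by value changed, a new group starts at 1.
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }

        // Same group as the previous row: keep counting.
        return rowNum++;
    }

    // Quick local check; prints 1, 2, 3, 1, 2, 3, 1.
    public static void main(String args[]) {
        Row_number t = new Row_number();
        System.out.println(t.evaluate(123));
        System.out.println(t.evaluate(123));
        System.out.println(t.evaluate(123));
        System.out.println(t.evaluate(1234));
        System.out.println(t.evaluate(1234));
        System.out.println(t.evaluate(1234));
        System.out.println(t.evaluate(1235));
    }
}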
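The class above keeps its state in static fields, so every use of the function inside the same JVM shares one counter. A possible variant, sketched here in plain Java without the Hive base class so that it runs on its own, keeps the state in instance fields and compares keys with Arrays.equals, so NULL values need no special handling. The class name RowNumberState is hypothetical and this is not the original author's code; how it would behave inside Hive has not been verified here.

```java
import java.util.Arrays;

class RowNumberState {
    // Key of the previous row; null until the first row has been seen.
    private Object[] previousKey;
    // Number assigned to the most recent row of the current group.
    private int rowNum;

    int evaluate(Object... args) {
        // Arrays.equals compares length and elements, treating null == null.
        if (previousKey == null || !Arrays.equals(previousKey, args)) {
            previousKey = args.clone();
            rowNum = 0;
        }
        return ++rowNum;
    }

    public static void main(String[] unused) {
        RowNumberState t = new RowNumberState();
        Object[][] keys = {{123}, {123}, {1234}, {1234, null}, {1234, null}};
        for (Object[] key : keys) {
            System.out.println(t.evaluate(key)); // prints 1, 2, 1, 1, 2
        }
    }
}
```

Because each instance has its own counter, two instances numbering different key sequences do not interfere with each other.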
