Implementing a row_number UDF in Hive and the problems encountered

Source: Internet
Author: User

The goal is to add a row number to each row of Hive data. For the numbering to come out right, all of the data must be processed in a single reduce task. Code first:

package xx.xxxxx.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

// Non-deterministic: the result depends on the rows seen so far, not only on the arguments.
@UDFType(deterministic = false)
public class RowNumber extends UDF {
    private static int MAX_VALUE = 50;
    // Column values of the previous row, used to detect a change of group.
    private static String comparedColumn[] = new String[MAX_VALUE];
    private static int rowNum = 1;

    public int evaluate(Object... args) {
        String columnValue[] = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            columnValue[i] = args[i].toString();
        }
        // First row of a group: remember its column values.
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++)
                comparedColumn[i] = columnValue[i];
        }

        // If any column differs from the previous row, start a new group at 1.
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }
        return rowNum++;
    }
}
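To make the counting logic easier to follow outside Hive, here is a minimal plain-Java sketch of the same idea. It is not the UDF itself: the class name GroupRowCounter and the method next are made up for illustration, and the state lives in an instance instead of static fields. It assumes rows with equal key columns arrive next to each other, which is also what the UDF above relies on.

```java
// Plain-Java sketch of the UDF's counting logic (no Hive dependency).
// GroupRowCounter and next(...) are hypothetical names used only here.
final class GroupRowCounter {
    private String[] previousKey; // key columns of the previous row; null before the first row
    private int rowNum = 0;       // last number handed out within the current group

    // Returns 1 for the first row of each run of equal keys, then 2, 3, ...
    int next(String... key) {
        if (previousKey == null || !java.util.Arrays.equals(previousKey, key)) {
            previousKey = key.clone(); // key changed: start a new group
            rowNum = 0;
        }
        return ++rowNum;
    }

    public static void main(String[] args) {
        GroupRowCounter counter = new GroupRowCounter();
        String[] tids = {"a", "a", "b", "b", "b", "c"}; // already sorted by key
        for (String tid : tids) {
            System.out.println(tid + " -> " + counter.next(tid));
        }
        // prints: a -> 1, a -> 2, b -> 1, b -> 2, b -> 3, c -> 1 (one pair per line)
    }
}
```

If every call passes the same constant key, the key never changes, so the counter simply keeps increasing and acts as a running row number over everything that one counter sees.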

Build the jar and create the function.

add jar /home/hdbatch/jars/iclickhiveudf.jar;
create temporary function row_number as 'cn.iclick.hive.udf.RowNumber';

There is a catch, however, when numbering the rows of a table. Of the two SQL statements I tried, this is the first:

create table test_tony as select row_number(1), tid from (select distinct tid from cookie where i_date = 20131105) t order by tid;

The statement above numbers the rows incorrectly. It launched 11 reduce tasks, and each task numbered its own rows starting from 1, so every row number appeared 11 times. Why does this happen? Looking at the EXPLAIN output for the statement shows the cause: the predicate push-down problem that comes up when writing a non-deterministic UDF.
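The duplicate numbers can be reproduced with a small simulation. The sketch below is only a model under stated assumptions: each reduce task is represented by its own counter (the way each task would hold its own copy of the static rowNum field), rows are spread round-robin as a stand-in for Hive's real partitioning, and the row count of 33 is invented for the example. The class and method names are hypothetical.

```java
// Simulation only: each "reduce task" is modelled as an independent counter.
// ReducerCounterSimulation and its methods are hypothetical names.
final class ReducerCounterSimulation {

    // Numbers totalRows rows that are spread round-robin over `tasks` counters.
    // Returns the number assigned to each row, in input order.
    static int[] assignRowNumbers(int totalRows, int tasks) {
        int[] lastNumber = new int[tasks]; // per-task counter state, starts at 0
        int[] assigned = new int[totalRows];
        for (int row = 0; row < totalRows; row++) {
            int task = row % tasks;        // stand-in for Hive's real partitioning
            assigned[row] = ++lastNumber[task];
        }
        return assigned;
    }

    // Counts how many rows received the given number.
    static int occurrences(int[] assigned, int number) {
        int count = 0;
        for (int value : assigned) {
            if (value == number) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] elevenTasks = assignRowNumbers(33, 11);
        int[] oneTask = assignRowNumbers(33, 1);
        // With 11 tasks the number 1 is handed out 11 times; with 1 task, once.
        System.out.println("11 tasks: " + occurrences(elevenTasks, 1));
        System.out.println("1 task:  " + occurrences(oneTask, 1));
    }
}
```

In this model the numbers are unique only when a single counter sees every row, which matches the requirement that the numbering run in one reduce task.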
For details, see:


