Generate a unique ID for each line of a big data file

Last Update:2015-06-09 Source: Internet

Author: User


Four main ideas:

1 Single-threaded processing

2 Ordinary multithreading

3 Hive

4 Hadoop
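As a baseline for the first idea, here is a minimal sketch of single-threaded numbering. The class name `LineNumberer`, the method `numberLines`, and the tab-separated output format are assumptions made for illustration, not something the original article specifies.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class LineNumberer {
    // Copies every line from `in` to `out`, prefixed with a 1-based ID and a tab.
    // Returns the number of lines written. (Hypothetical helper, for illustration.)
    static long numberLines(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        long id = 0; // long, because a big file can exceed the int range
        String line;
        while ((line = reader.readLine()) != null) {
            id++;
            writer.write(Long.toString(id));
            writer.write('\t');
            writer.write(line);
            writer.write('\n');
        }
        writer.flush();
        return id;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        long count = numberLines(new StringReader("a\nb\nc\n"), out);
        System.out.print(out);
        System.out.println(count + " lines");
    }
}
```

This is simple and yields contiguous IDs, but everything goes through one reader, which is why the other three ideas are worth considering for very large files.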

Some references I found:

Hadoop in Action notes-2, Hadoop input and output

https://book.douban.com/annotation/17068812/

TextInputFormat: the key is the file offset and the value is the entire row of data.

However, this offset seems to be the offset within a single file, not a global one.

Generate Auto-increment Id in Map-reduce Job

http://shzhangji.com/blog/2013/10/31/generate-auto-increment-id-in-map-reduce-job/
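Since a per-file offset cannot serve as a global ID, a MapReduce job needs another way to keep IDs from colliding across tasks. The following is a minimal sketch of one widely used scheme, written in plain Java without the Hadoop API: each task starts at its own task index and steps by the total number of tasks. The class `StridedIdGenerator` and its parameters are hypothetical names, and this is not necessarily the exact code of the post linked above.

```java
import java.util.HashSet;
import java.util.Set;

class StridedIdGenerator {
    private final int numTasks; // total number of parallel tasks (assumed known up front)
    private long next;          // next ID this task will hand out

    StridedIdGenerator(int taskIndex, int numTasks) {
        if (numTasks <= 0 || taskIndex < 0 || taskIndex >= numTasks) {
            throw new IllegalArgumentException("need 0 <= taskIndex < numTasks");
        }
        this.numTasks = numTasks;
        this.next = taskIndex;
    }

    // Returns taskIndex, taskIndex + numTasks, taskIndex + 2 * numTasks, ...
    long nextId() {
        long id = next;
        next += numTasks;
        return id;
    }

    public static void main(String[] args) {
        int numTasks = 3;
        Set<Long> seen = new HashSet<>();
        for (int task = 0; task < numTasks; task++) {
            StridedIdGenerator gen = new StridedIdGenerator(task, numTasks);
            for (int row = 0; row < 4; row++) {
                long id = gen.nextId();
                if (!seen.add(id)) {
                    throw new IllegalStateException("duplicate id " + id);
                }
                System.out.println("task " + task + " row " + row + " -> id " + id);
            }
        }
    }
}
```

IDs produced this way are unique across tasks, but they are not contiguous when tasks process different numbers of rows.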

Generate unique customer Id / insert unique rows in Hive

http://stackoverflow.com/questions/26855003/generate-unique-customer-id-insert-unique-rows-in-hive

Need to add auto increment column in a table using Hive

http://stackoverflow.com/questions/23082763/need-to-add-auto-increment-column-in-a-table-using-hive

https://hadooptutorial.info/writing-custom-udf-in-hive-auto-increment-column-hive/

Note that the annotation @UDFType(stateful = true) is required. Without it, the counter value would not be incremented in the Hive column; the function would just return the value 1 for all rows instead of the actual row number.

In the end I went with writing a Hive UDF.

package hive.udf;

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

/**
 * UDFRowSequence.
 */
@Description(name = "row_sequence",
    value = "_FUNC_() - Returns a generated row sequence number starting from 1")
@UDFType(deterministic = false, stateful = true) // the stateful parameter is necessary
public class UDFRowSequence extends UDF {
  private int result;

  public UDFRowSequence() {
    result = 0;
  }

  public int evaluate() {
    result++;
    return result;
  }
}

// End UDFRowSequence.java
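One caveat that seems worth checking before relying on this UDF for globally unique IDs: the counter is a field of the UDF object, so each task that instantiates the function would presumably count from 1 on its own. The plain-Java sketch below has no Hive dependency. `RowSequence` is a hypothetical stand-in that mirrors the class above, and the two instances play the role of two separate map tasks.

```java
class RowSequenceDemo {
    // Stand-in mirroring UDFRowSequence: the counter lives in the instance.
    static class RowSequence {
        private int result = 0;

        int evaluate() {
            result++;
            return result;
        }
    }

    public static void main(String[] args) {
        RowSequence taskA = new RowSequence(); // assumed: one instance per task
        RowSequence taskB = new RowSequence();
        System.out.println("task A: " + taskA.evaluate() + ", " + taskA.evaluate());
        System.out.println("task B: " + taskB.evaluate() + ", " + taskB.evaluate());
    }
}
```

Both instances return 1 and then 2, so the same ID would appear twice if the query were split across two tasks. If that turns out to happen in practice, running the numbering step in a single task or combining the counter with a task index are two possible ways around it.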

Author of this article: linger

This article link: http://blog.csdn.net/lingerlanlan/article/details/46430747
