Read HDFS files with a MapReduce program and write the results to HBase

1. Create a Maven project in Eclipse. The POM file is as follows (the duplicate hadoop-hdfs dependency in the original has been removed):

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>hadoop-hbase-maven</groupId>
    <artifactId>hadoop-hbase-maven</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.0.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.0.3</version>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.7</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
2. The Java program is as follows:

package com.lijie.hbase;

import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class Hadoop2HBase {

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // ZooKeeper quorum of the HBase cluster
        conf.set("hbase.zookeeper.quorum", "lijie");
        // Target HBase table
        conf.set(TableOutputFormat.OUTPUT_TABLE, "t1");

        Job job = new Job(conf, "hadoop2hbase");
        TableMapReduceUtil.addDependencyJars(job);

        job.setJarByClass(Hadoop2HBase.class);
        job.setMapperClass(HBaseMapper.class);
        job.setReducerClass(HBaseReducer.class);

        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TableOutputFormat.class);

        FileInputFormat.setInputPaths(job, "hdfs://192.168.80.123:9000/mytest/*");

        job.waitForCompletion(true);
    }

    static class HBaseMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, LongWritable, Text>.Context context)
                throws IOException, InterruptedException {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
            String[] split = value.toString().split("\t");
            // Row key = first field (id) + current timestamp, followed by the whole line
            context.write(key, new Text(split[0] + sdf.format(Calendar.getInstance().getTime())
                    + "\t" + value.toString()));
        }
    }

    static class HBaseReducer extends TableReducer<LongWritable, Text, NullWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<Text> values,
                Reducer<LongWritable, Text, NullWritable, Mutation>.Context context)
                throws IOException, InterruptedException {
            for (Text text : values) {
                String[] split = text.toString().split("\t");
                Put put = new Put(split[0].getBytes());
                put.addColumn("cf".getBytes(), "onecolumn".getBytes(), text.toString().getBytes());
                put.addColumn("cf".getBytes(), "id".getBytes(), split[1].getBytes());
                put.addColumn("cf".getBytes(), "name".getBytes(), split[2].getBytes());
                put.addColumn("cf".getBytes(), "age".getBytes(), split[3].getBytes());
                put.addColumn("cf".getBytes(), "addr".getBytes(), split[4].getBytes());
                context.write(NullWritable.get(), put);
            }
        }
    }
}
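To make the mapper's transformation concrete, here is a small standalone sketch (plain Java, no Hadoop dependencies; the class name RowKeyDemo and the sample line are illustrative only) of how one input line becomes the tab-separated value the reducer receives, and how its fields map to the cf columns:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class RowKeyDemo {
    public static void main(String[] args) {
        // One line of the sample input file: id \t name \t age \t addr
        String line = "1001\tlijie\t24\tshengzhen";
        String[] split = line.split("\t");

        // Same row-key scheme as HBaseMapper: id + yyyyMMddHHmmss timestamp
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMddHHmmss");
        String rowKey = split[0] + sdf.format(Calendar.getInstance().getTime());

        // Value the reducer sees: rowKey \t original line (5 fields total)
        String reduceValue = rowKey + "\t" + line;
        String[] fields = reduceValue.split("\t");

        System.out.println("row key : " + fields[0]);
        System.out.println("cf:id   : " + fields[1]);
        System.out.println("cf:name : " + fields[2]);
        System.out.println("cf:age  : " + fields[3]);
        System.out.println("cf:addr : " + fields[4]);
    }
}
```

Appending the timestamp keeps row keys unique even if the same id is loaded twice, which is why the scan below shows keys like 100120170209144933.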
3. Create the t1 table in HBase:

hbase(main):001:0> create 't1', 'cf'
0 row(s) in 0.7700 seconds
4. Simulate uploading a data file to HDFS. The file content (tab-separated: id, name, age, addr):

1001	lijie	24	shengzhen
1002	zhangsan	25	chongqing
1003	lisi	30	shanghai
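The upload itself might look like this (the file name users.txt is an assumption; the /mytest directory comes from the job's input path). The hdfs commands are guarded so the snippet also runs where no cluster is available:

```shell
# Write the sample data locally as a tab-separated file
printf '1001\tlijie\t24\tshengzhen\n1002\tzhangsan\t25\tchongqing\n1003\tlisi\t30\tshanghai\n' > users.txt

# Upload it to the directory the job reads from (only if an HDFS client is present)
if command -v hdfs >/dev/null; then
    hdfs dfs -mkdir -p /mytest
    hdfs dfs -put -f users.txt /mytest/
fi
```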
5. Execute the program and view the t1 table:

hbase(main):002:0> scan 't1'
ROW                   COLUMN+CELL
 100120170209144933   column=cf:addr, timestamp=1486651412673, value=shengzhen
 100120170209144933   column=cf:age, timestamp=1486651412673, value=24
 100120170209144933   column=cf:id, timestamp=1486651412673, value=1001
 100120170209144933   column=cf:name, timestamp=1486651412673, value=lijie
 100120170209144933   column=cf:onecolumn, timestamp=1486651412673, value=100120170209144933\x091001\x09lijie\x0924\x09shengzhen
 100220170209144933   column=cf:addr, timestamp=1486651412673, value=chongqing
 100220170209144933   column=cf:age, timestamp=1486651412673, value=25
 100220170209144933   column=cf:id, timestamp=1486651412673, value=1002
 100220170209144933   column=cf:name, timestamp=1486651412673, value=zhangsan
 100220170209144933   column=cf:onecolumn, timestamp=1486651412673, value=100220170209144933\x091002\x09zhangsan\x0925\x09chongqing
 100320170209144933   column=cf:addr, timestamp=1486651412673, value=shanghai
 100320170209144933   column=cf:age, timestamp=1486651412673, value=30
 100320170209144933   column=cf:id, timestamp=1486651412673, value=1003
 100320170209144933   column=cf:name, timestamp=1486651412673, value=lisi
 100320170209144933   column=cf:onecolumn, timestamp=1486651412673, value=100320170209144933\x091003\x09lisi\x0930\x09shanghai
3 row(s) in 0.1400 seconds

A simple demo to share.
