Running a Data Deduplication MapReduce Job on a Hadoop Cluster


The cluster has one master and two slaves, with IPs 192.168.1.2, 192.168.1.3, and 192.168.1.4. The Hadoop version is 1.2.1.

First, start Hadoop

Go into the Hadoop bin directory and start the cluster (for example, with start-all.sh).


Second, create the data file and upload it to HDFS

1. Under /home/hadoop, create a folder named file, and inside it create a file named hadoop_02

cd /home/hadoop

mkdir file

cd file


2. Write the data


The data format is:

2012-3-1 a

2012-3-2 b

2012-3-3 c

2012-3-4 d

2012-3-5 a

2012-3-6 b

2012-3-7 c

2012-3-3 c

You can copy and paste these lines repeatedly to make the data set larger.

(Learning Hadoop but have no data to work with? You can crawl data with Nutch, use a commercial crawler, or generate simulated data as needed.)
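For the "generate simulated data" option, a small sketch like the following (not part of the original post; the class name `GenData` is hypothetical) can produce any number of lines in the date-letter format shown above, with duplicates appearing naturally because the value space is small:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class GenData {
    // Generate n lines in the "2012-3-<day> <letter>" format used above.
    // A fixed seed makes the output reproducible.
    public static List<String> generate(int n, long seed) {
        Random rnd = new Random(seed);
        String[] letters = {"a", "b", "c", "d"};
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int day = 1 + rnd.nextInt(7); // days 1..7, as in the sample
            lines.add("2012-3-" + day + " " + letters[rnd.nextInt(letters.length)]);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : generate(20, 42L)) {
            System.out.println(line);
        }
    }
}
```

Redirect the output into ~/file/hadoop_02 (or write it there directly) before uploading.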

3. Upload to HDFS

(1) If the input directory does not exist in HDFS, create it:

hadoop fs -mkdir input

(2) View the HDFS files:

hadoop fs -ls

(3) Upload hadoop_02 to input:

hadoop fs -put ~/file/hadoop_02 input

(4) View the input file:

hadoop fs -ls input


4. In Eclipse, view the hadoop_02 file just uploaded to HDFS; its contents are as follows:


5. Create a MapReduce project and write the code

The data deduplication code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // Map copies the input value to the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text line = new Text(); // one line of data

        // Implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // Reduce copies the input key to the output key and emits it directly
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // Implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This line is the key: point the job at the cluster's JobTracker
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        String[] ioArgs = new String[] { "dedup_in", "dedup_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the map, combine, and reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
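The job works because the shuffle phase groups identical keys: the mapper emits each whole line as a key, so all duplicate lines arrive at the same reduce call, and the reducer writes each distinct key exactly once. The same effect can be checked locally, without a cluster, using an ordered set (this `LocalDedup` class is an illustrative sketch, not part of the original job):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class LocalDedup {
    // map: emit each line as a key; shuffle: group identical keys;
    // reduce: write each distinct key once. A TreeSet reproduces both
    // the deduplication and the sorted key order MapReduce produces.
    public static List<String> dedup(List<String> lines) {
        return new ArrayList<>(new TreeSet<>(lines));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "2012-3-1 a", "2012-3-2 b", "2012-3-3 c", "2012-3-4 d",
                "2012-3-5 a", "2012-3-6 b", "2012-3-7 c", "2012-3-3 c");
        // The duplicate "2012-3-3 c" line appears only once in the result
        for (String line : dedup(input)) {
            System.out.println(line);
        }
    }
}
```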

6. Run the code

Right-click the project class.

Set the input and output HDFS paths.

Part of the console output is as follows:

View the hadoop_22 file in the output directory; the results are as follows:


7. Shut down Hadoop

This completes the example.

Original link: http://www.cnblogs.com/baolibin528/p/4004707.html
