Running code on a Hadoop cluster: a worked example
 
The cluster has one master and two slaves, with IPs 192.168.1.2, 192.168.1.3, and 192.168.1.4. The Hadoop version is 1.2.1.
 
First, start Hadoop

Go to the bin directory of the Hadoop installation and start the cluster.
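A minimal sketch of starting the cluster (assuming Hadoop 1.2.1 is installed under /usr/local/hadoop; adjust the path to your installation):

cd /usr/local/hadoop/bin
./start-all.sh
jps     # verify that the daemons (NameNode, JobTracker, DataNode, TaskTracker) are up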
 
 
Second, create the data file and upload it to HDFS

1. Create a folder named file under the /home/hadoop directory, and create the file hadoop_02 inside it:

cd /home/hadoop

mkdir file

cd file
 
 
2. Write Data:
 
 
The data format is:
 
2012-3-1 a
2012-3-2 b
2012-3-3 c
2012-3-4 d
2012-3-5 a
2012-3-6 b
2012-3-7 c
2012-3-3 c
 
You can copy and paste these lines repeatedly so that there is more data to work with.

(What if you are learning Hadoop and have no data? You can crawl data with Nutch, capture it with commercial software, or generate simulated data as needed, as sketched below.)
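For example, a minimal sketch of simulating data in the same date-letter format (the class name GenData, the line count, and the output path are illustrative assumptions):

import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class GenData {
    public static void main(String[] args) throws IOException {
        Random rnd = new Random();
        FileWriter out = new FileWriter("/home/hadoop/file/hadoop_02");
        for (int i = 0; i < 10000; i++) {
            // A random day in March 2012 plus a random letter a-d, so duplicates occur
            out.write("2012-3-" + (1 + rnd.nextInt(31)) + " "
                    + (char) ('a' + rnd.nextInt(4)) + "\n");
        }
        out.close();
    }
}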
 
3. Upload to HDFS

(1) If there is no input directory in HDFS yet, create one:

hadoop fs -mkdir input

(2) View the HDFS file listing:

hadoop fs -ls

(3) Upload hadoop_02 to input:

hadoop fs -put ~/file/hadoop_02 input

(4) View the files in input:

hadoop fs -ls input
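To confirm the upload, you can print the file back out of HDFS:

hadoop fs -cat input/hadoop_02

Note that relative HDFS paths such as input resolve under /user/<your-username>.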
 
 
4. View the file hadoop_02 that was just uploaded to HDFS in Eclipse, as follows:
 
 
5. Create a MapReduce project and write code:
 
 
The data deduplication code is as follows:
 
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // The map copies the input value into the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {
        private static Text line = new Text(); // one line of data

        // Implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // The reduce copies the input key into the output key and emits it directly
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // Implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This line is critical: it points the job at the cluster's JobTracker
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        String[] ioArgs = new String[] { "dedup_in", "dedup_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the map, combine, and reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
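Why this removes duplicates: the map emits each input line as a key with an empty value, the shuffle groups identical keys together, and the reduce writes each distinct key exactly once, so the repeated line 2012-3-3 c appears only once in the output. If you prefer to run the job from the command line instead of Eclipse, a sketch (the jar name dedup.jar is an assumption; the input and output directories default to dedup_in and dedup_out as hard-coded above):

hadoop jar dedup.jar Dedup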
 
6. Run the code

Right-click the project class to run it, and set the input and output HDFS paths.
 
 
Part of the console output is as follows:
 
 
View the result file in the output directory; the deduplicated results are as follows:
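From the shell, the same result can be printed with (assuming the job wrote to dedup_out as in the code above; part-r-00000 is the standard reducer output file name):

hadoop fs -cat dedup_out/part-r-00000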
 
 
7. Shut down Hadoop
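A minimal sketch, again from the bin directory of the installation:

./stop-all.sh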
 
 
This completes the example.