Running Code on a Hadoop Cluster: A Worked Example

Source: Internet
Author: User
Tags: mkdir, static class, hadoop fs


The cluster has one master and two slaves, with IPs 192.168.1.2, 192.168.1.3, and 192.168.1.4. The Hadoop version is 1.2.1.

First, start Hadoop

Go to Hadoop's bin directory and run the start script (start-all.sh in Hadoop 1.x).


Second, create the data file and upload it to HDFS

1. In the /home/hadoop directory, create a folder named file, and inside it create the data file hadoop_02:

cd /home/hadoop

mkdir file

cd file


2. Write the data:


The data format is:

2012-3-1 a

2012-3-2 b

2012-3-3 c

2012-3-4 d

2012-3-5 a

2012-3-6 b

2012-3-7 c

2012-3-3 c

You can paste the data repeatedly to grow the data set.

(If you are learning Hadoop and have no data of your own: crawl some with Nutch, capture it with commercial tools, or generate simulated data as needed.)
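Instead of pasting by hand, you could also generate a larger test file programmatically. A minimal sketch in plain Java; the class name `GenerateData` and the row count of 1000 are arbitrary choices, while the file name hadoop_02 and record format follow the steps above:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class GenerateData {
    public static void main(String[] args) throws IOException {
        // Cycle through a short range of dates and letters so that
        // duplicate records appear, matching the sample data format.
        String[] letters = {"a", "b", "c", "d"};
        try (PrintWriter out = new PrintWriter(new FileWriter("hadoop_02"))) {
            for (int i = 0; i < 1000; i++) {
                int day = (i % 7) + 1;           // days 1..7 repeat
                String letter = letters[i % 4];  // letters cycle, producing duplicates
                out.println("2012-3-" + day + " " + letter);
            }
        }
    }
}
```

Because both the day and the letter cycle, the same line recurs many times, which is exactly what the deduplication job needs as input.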

3. Upload to HDFS

(1) If HDFS does not yet have an input directory, create one:

hadoop fs -mkdir input

(2) View the HDFS file listing:

hadoop fs -ls

(3) Upload hadoop_02 to input:

hadoop fs -put ~/file/hadoop_02 input

(4) View the input directory:

hadoop fs -ls input


4. View the file hadoop_02 that was just uploaded to HDFS in Eclipse; it appears as follows:


5. Create a MapReduce project and write the code:


The data deduplication code is as follows:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // The map copies the value of each input record into the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {

        private static Text line = new Text(); // one line of data

        // Implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // The reduce copies the input key into the output key and emits it directly
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        // Implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // This line is critical: it points the job at the cluster's JobTracker
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        String[] ioArgs = new String[] { "dedup_in", "dedup_out" };
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the map, combine, and reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
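Conceptually, because the mapper emits each line as a key and the shuffle groups identical keys before the reducer writes each key once, the job's output is the sorted set of distinct input lines. A plain-Java sketch of that semantics, with no Hadoop dependency (the class name `DedupLocal` is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class DedupLocal {
    // Returns the distinct input lines in sorted order, mirroring the
    // MapReduce job: map emits (line, ""), the shuffle groups equal keys,
    // and reduce writes each key exactly once.
    public static TreeSet<String> dedup(List<String> lines) {
        return new TreeSet<>(lines);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "2012-3-1 a", "2012-3-2 b", "2012-3-3 c", "2012-3-4 d",
            "2012-3-5 a", "2012-3-6 b", "2012-3-7 c", "2012-3-3 c");
        for (String line : dedup(input)) {
            System.out.println(line); // duplicate "2012-3-3 c" appears only once
        }
    }
}
```

Running this on the sample data above prints seven lines: the duplicate "2012-3-3 c" record is collapsed, just as in the job's HDFS output.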

6. Run the code

Right-click the project's main class and run it on the Hadoop cluster,


setting the input and output HDFS paths.


Part of the console output is as follows:


View the hadoop_22 file in the output directory; the results are as follows:


7. Stop Hadoop


This completes the example.
