Today I ran the weather data sample code from Hadoop: The Definitive Guide on my Hadoop cluster, and I'm recording the steps here.
Beforehand, neither Baidu nor Google turned up a concrete, step-by-step description of how to run a map-reduce job on a cluster. After some painful, headless-fly-style groping around, it worked. Good mood...
1 Prepare the weather data (a simplified version of the data in the Definitive Guide: characters 5-9 hold the year, 15-19 the temperature; a small parsing sketch follows the sample data)
aaaaa1990aaaaaa0039a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0031a
bbbbb1991bbbbbb0020a
ccccc1992cccccc0030c
ddddd1993dddddd0033d
eeeee1994eeeeee0031e
aaaaa1990aaaaaa0041a
bbbbb1991bbbbbb0040a
ccccc1992cccccc0040c
ddddd1993dddddd0043d
eeeee1994eeeeee0041e
aaaaa1990aaaaaa0044a
bbbbb1991bbbbbb0045a
ccccc1992cccccc0041c
ddddd1993dddddd0023d
eeeee1994eeeeee0041e
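To check that these offsets line up with the data, here is a minimal standalone sketch (my own illustration, not part of the job) that parses one sample line the same way the mapper will:

public class LineFormatCheck {
    public static void main(String[] args) {
        String line = "aaaaa1990aaaaaa0039a";                 // first sample line from above
        String year = line.substring(5, 9);                   // characters 5-9: "1990"
        int temp = Integer.parseInt(line.substring(15, 19));  // characters 15-19: 39
        System.out.println(year + " " + temp);                // prints: 1990 39
    }
}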
2 Write the map and reduce functions and the job driver (Job)
Simply put, it looks like this:
package hadoop.test;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    static class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // the prepared data is the simplified weather format: characters 5-9 are the year
            String year = line.substring(5, 9);
            // characters 15-19 are the temperature
            int airTemperature = Integer.parseInt(line.substring(15, 19));
            if (airTemperature != MISSING) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    static class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        try {
            Job job = new Job();
            job.setJarByClass(MaxTemperature.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(MaxTemperatureMapper.class);
            job.setReducerClass(MaxTemperatureReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
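Before submitting to the cluster, the mapper can be sanity-checked locally. A minimal sketch using MRUnit's MapDriver (an assumption on my part: the mrunit jar must be on the classpath; it is not required by the job itself):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;

public class MaxTemperatureMapperTest {
    public static void main(String[] args) throws Exception {
        // feed one sample line to the mapper and check the (year, temperature) pair it emits
        new MapDriver<LongWritable, Text, Text, IntWritable>()
                .withMapper(new MaxTemperature.MaxTemperatureMapper())
                .withInput(new LongWritable(0), new Text("aaaaa1990aaaaaa0039a"))
                .withOutput(new Text("1990"), new IntWritable(39))
                .runTest();
    }
}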
3 Package the code from step 2 into HadoopTest.jar and place it in a directory such as /home/hadoop/documents/
Then export HADOOP_CLASSPATH=/home/hadoop/documents/
(Choose a MainClass when packaging; if you don't, there seems to be an error at execution time. Eclipse's export dialog has a MainClass option.
Otherwise the MainClass name, including its package path, must be given after ***.jar when running the hadoop jar command,
for example: hadoop jar /home/hadoop/documents/HadoopTest.jar hadoop.test.MaxTemperature /user/hadoop/temperature output
)
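If you prefer packaging from the command line instead of Eclipse, something like this should work (a sketch assuming the compiled classes are under bin/ and manifest.txt contains the single line "Main-Class: hadoop.test.MaxTemperature"):
jar cvfm HadoopTest.jar manifest.txt -C bin/ .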
4 Put the data to be analyzed onto HDFS:
hadoop dfs -put /home/hadoop/documents/temperature ./temperature
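To confirm the upload, you can list it first:
hadoop dfs -ls ./temperature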
5 Start execution:
hadoop jar /home/hadoop/documents/HadoopTest.jar /user/hadoop/temperature output
This is not exactly the same as the command in the book, but the book runs it in local mode. I also don't yet know what export HADOOP_CLASSPATH=/home/hadoop/documents/ is for; without it, executing hadoop jar HadoopTest.jar /user/hadoop/temperature output didn't work. Exactly why, I'll keep exploring; that's it for now.
Here HadoopTest.jar is local, the data file to be analyzed (temperature) is on HDFS, the generated output is also on HDFS, and output is a folder.
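You can also list the output folder first; the reducer's result file here is part-r-00000:
hadoop dfs -ls ./output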
hadoop@hadoop1:~$ hadoop dfs -cat ./output/part-r-00000
1990 44
1991 45
1992 41
1993 43
1994 41