In stand-alone (local) mode, Hadoop does not use HDFS and does not start any Hadoop daemons; everything runs in a single JVM, and at most one reducer is used.
Create a new Hadoop-test Java project in Eclipse (note that Hadoop 1.x requires JDK 1.6 or later).
Download hadoop-1.2.1.tar.gz from the official Hadoop mirror at http://apache.fayea.com/apache-mirror/hadoop/common/
Unzip hadoop-1.2.1.tar.gz to obtain the hadoop-1.2.1 directory
Import the jar packages under the hadoop-1.2.1 directory and the hadoop-1.2.1\lib directory into the Hadoop-test project's build path
Next, write a MapReduce program (this program totals the account balance for each month).
Map:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MapBus extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    public void map(LongWritable key, Text date,
            OutputCollector<Text, LongWritable> output,
            Reporter reporter) throws IOException {
        // each input line looks like: 2013-01-11,-200
        String line = date.toString();
        if (line.contains(",")) {
            String[] tmp = line.split(",");
            // characters 5..6 of "yyyy-MM-dd" are the month
            String month = tmp[0].substring(5, 7);
            long money = Long.parseLong(tmp[1]);
            output.collect(new Text(month), new LongWritable(money));
        }
    }
}
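To see what the mapper does to each record without any Hadoop dependency, here is a minimal plain-Java sketch of the same parsing logic (the class and method names are mine, not part of the project above; the record format "yyyy-MM-dd,amount" is taken from the sample data):

```java
// Standalone sketch of the per-line parsing the mapper performs.
public class MapParseSketch {
    // Returns {month, amount} for one record, or null if the line
    // contains no comma (mirroring the mapper's guard above).
    static Object[] parseLine(String line) {
        if (!line.contains(",")) {
            return null;
        }
        String[] tmp = line.split(",");
        String month = tmp[0].substring(5, 7); // chars 5..6 of "2013-01-11"
        long money = Long.parseLong(tmp[1]);
        return new Object[] { month, money };
    }

    public static void main(String[] args) {
        Object[] kv = parseLine("2013-01-11,-200");
        System.out.println(kv[0] + " -> " + kv[1]); // prints "01 -> -200"
    }
}
```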
Reduce:
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ReduceBus extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    public void reduce(Text month, Iterator<LongWritable> money,
            OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        long totalMoney = 0;
        while (money.hasNext()) {
            totalMoney += money.next().get();
        }
        // emit one total per month, after draining the iterator
        output.collect(month, new LongWritable(totalMoney));
    }
}
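The reducer's job is just a running sum over the values for one key. A plain-Java sketch of that step (names are mine, for illustration only):

```java
import java.util.Arrays;
import java.util.Iterator;

// Standalone sketch of the reducer's summation: drain the iterator of
// per-record amounts for one month and return a single total.
public class ReduceSumSketch {
    static long sum(Iterator<Long> money) {
        long total = 0;
        while (money.hasNext()) {
            total += money.next();
        }
        return total;
    }

    public static void main(String[] args) {
        Iterator<Long> jan = Arrays.asList(100L, -100L, 500L).iterator();
        System.out.println(sum(jan)); // prints 500
    }
}
```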
Main:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class Wallet {
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("param error!");
            System.exit(-1);
        }
        JobConf jobConf = new JobConf(Wallet.class);
        jobConf.setJobName("My Wallet");
        // args[0] is the input directory, args[1] the output directory
        FileInputFormat.addInputPath(jobConf, new Path(args[0]));
        FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
        jobConf.setMapperClass(MapBus.class);
        jobConf.setReducerClass(ReduceBus.class);
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(LongWritable.class);
        try {
            JobClient.runJob(jobConf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
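Put together, the whole job is equivalent to grouping the "yyyy-MM-dd,amount" records by month and totaling each group. A hedged plain-Java sketch of that end-to-end computation (no Hadoop; class and method names are mine, and the sample records are abbreviated):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java equivalent of the map + reduce pipeline above:
// group records by month, then sum each group's amounts.
public class WalletLocalSketch {
    static Map<String, Long> monthlyTotals(List<String> lines) {
        Map<String, Long> totals = new TreeMap<String, Long>();
        for (String line : lines) {
            if (!line.contains(",")) {
                continue; // same guard as the mapper
            }
            String[] tmp = line.split(",");
            String month = tmp[0].substring(5, 7);
            long money = Long.parseLong(tmp[1]);
            Long old = totals.get(month);
            totals.put(month, (old == null ? 0L : old) + money);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2013-01-01,100", "2013-01-02,-100", "2013-02-01,100");
        System.out.println(monthlyTotals(lines)); // prints {01=0, 02=100}
    }
}
```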
You also need to prepare the files to be analyzed: create two files under the E:\cygwin_root\home\input path, one named 2013-01.txt and the other named 2013-02.txt
2013-01.txt:
2013-01-01,100
2013-01-02,-100
2013-01-07,100
2013-01-10,-100
2013-01-11,100
2013-01-21,-
2013-01-22,100
2013-01-25,-100
2013-01-27,100
2013-01-18,-100
2013-01-09,500
2013-02.txt:
2013-02-01,100
Once the run arguments are set, you can run the MapReduce program as a Java application. When run this way, the following error may appear:
java.io.IOException: Failed to set permissions of path:
\tmp\hadoop-linkage\mapred\staging\linkage1150562408\.staging to 0700
The main reason for this error is that later Hadoop versions add a permission check on the staging path; the easiest workaround is to replace hadoop-core-1.2.1.jar with hadoop-0.20.2-core.jar.
The following is the log printed while the MapReduce program runs:
14/02/11 10:54:16 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/02/11 10:54:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/11 10:54:16 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/02/11 10:54:16 INFO mapred.FileInputFormat: Total input paths to process : 2
14/02/11 10:54:17 INFO mapred.JobClient: Running job: job_local_0001
14/02/11 10:54:17 INFO mapred.FileInputFormat: Total input paths to process : 2
14/02/11 10:54:17 INFO mapred.MapTask: numReduceTasks: 1
14/02/11 10:54:17 INFO mapred.MapTask: io.sort.mb =
14/02/11 10:54:17 INFO mapred.MapTask: data buffer = 79691776/99614720
14/02/11 10:54:17 INFO mapred.MapTask: record buffer = 262144/327680
14/02/11 10:54:17 INFO mapred.MapTask: Starting flush of map output
14/02/11 10:54:18 INFO mapred.MapTask: Finished spill 0
14/02/11 10:54:18 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
14/02/11 10:54:18 INFO mapred.LocalJobRunner: file:/E:/cygwin_root/home/input/2013-01.txt:0+179
14/02/11 10:54:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
14/02/11 10:54:18 INFO mapred.MapTask: numReduceTasks: 1
14/02/11 10:54:18 INFO mapred.MapTask: io.sort.mb =
14/02/11 10:54:18 INFO mapred.MapTask: data buffer = 79691776/99614720
14/02/11 10:54:18 INFO mapred.MapTask: record buffer = 262144/327680
14/02/11 10:54:18 INFO mapred.MapTask: Starting flush of map output
14/02/11 10:54:18 INFO mapred.MapTask: Finished spill 0
14/02/11 10:54:18 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
14/02/11 10:54:18 INFO mapred.LocalJobRunner: file:/E:/cygwin_root/home/input/2013-02.txt:0+16
14/02/11 10:54:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
14/02/11 10:54:18 INFO mapred.LocalJobRunner:
14/02/11 10:54:18 INFO mapred.Merger: Merging 2 sorted segments
14/02/11 10:54:18 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 160 bytes
14/02/11 10:54:18 INFO mapred.LocalJobRunner:
14/02/11 10:54:18 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/02/11 10:54:18 INFO mapred.LocalJobRunner:
14/02/11 10:54:18 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/02/11 10:54:18 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to file:/E:/cygwin_root/home/output
14/02/11 10:54:18 INFO mapred.LocalJobRunner: reduce > reduce
14/02/11 10:54:18 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
14/02/11 10:54:18 INFO mapred.JobClient:  map 100% reduce 100%
14/02/11 10:54:18 INFO mapred.JobClient: Job complete: job_local_0001
14/02/11 10:54:18 INFO mapred.JobClient: Counters: 13
14/02/11 10:54:18 INFO mapred.JobClient:   FileSystemCounters
14/02/11 10:54:18 INFO mapred.JobClient:     FILE_BYTES_READ=39797
14/02/11 10:54:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=80473
14/02/11 10:54:18 INFO mapred.JobClient:   Map-Reduce Framework
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce input groups=2
14/02/11 10:54:18 INFO mapred.JobClient:     Combine output records=0
14/02/11 10:54:18 INFO mapred.JobClient:     Map input records=12
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce output records=2
14/02/11 10:54:18 INFO mapred.JobClient:     Spilled Records=24
14/02/11 10:54:18 INFO mapred.JobClient:     Map output bytes=132
14/02/11 10:54:18 INFO mapred.JobClient:     Map input bytes=195
14/02/11 10:54:18 INFO mapred.JobClient:     Combine input records=0
14/02/11 10:54:18 INFO mapred.JobClient:     Map output records=12
14/02/11 10:54:18 INFO mapred.JobClient:     Reduce input records=12
After the run completes, two files are generated under the E:\cygwin_root\home\output path: .part-00000.crc and part-00000. .part-00000.crc is a binary file, an internal file that holds the checksum of the part-00000 file; the final statistics are saved in the part-00000 file.
100
Note that the output path must be deleted before each run; otherwise the following error is reported:
org.apache.hadoop.mapred.FileAlreadyExistsException:
Output directory file:/E:/cygwin_root/home/output already exists
Hadoop performs this check so that, if a previous MapReduce run did not complete, the files it produced are not silently overwritten by the files generated when the program is run again.
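One convenient way to handle this during repeated test runs is to delete the old output directory from Java before launching the job. A hedged sketch, using only the standard library (the class name and the example path are mine; adjust the path to your own setup):

```java
import java.io.File;

// Recursively delete the previous job output directory before a re-run,
// to avoid FileAlreadyExistsException in repeated local tests.
public class CleanOutputDir {
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        return f.delete();
    }

    public static void main(String[] args) {
        // hypothetical path matching the example above
        File out = new File("E:/cygwin_root/home/output");
        if (out.exists()) {
            deleteRecursively(out);
        }
    }
}
```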