Big Data Learning: MapReduce Configuration and Implementing the WordCount Algorithm in Java
Configuring MapReduce requires configuring two more XML files on top of the previous configuration: yarn-site.xml and mapred-site.xml, both of which can be found under the etc/hadoop directory of the previously configured Hadoop installation.
The configuration process is as follows.
1. Configure yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.98.141</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
It is worth explaining that YARN's basic idea is to split the two main functions of the JobTracker (resource management and job scheduling/monitoring) into separate components. The main approach is to have one global ResourceManager (RM) and one ApplicationMaster (AM) per application, where an application is either a traditional MapReduce job or a DAG of jobs. YARN can be loosely understood the way Tomcat is understood for web projects: a platform that jobs run on. The core of YARN's layered structure is the ResourceManager; this entity controls the entire cluster and manages the allocation of applications to the underlying computing resources. The ResourceManager hands these resources out to the NodeManagers (YARN's per-node agents).
The value of the first property should be the hostname (or IP address) of your own machine, matching the IP used in your system configuration.
2. Configure mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
So the configuration is complete.
Start the virtual machine, start the YARN service, and run jps to check whether the two processes ResourceManager and NodeManager are present. If both appear, the configuration succeeded.
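A minimal sketch of that check, assuming Hadoop's sbin and bin directories are on the PATH (the process IDs below are only placeholders):

start-yarn.sh    # starts the ResourceManager and NodeManager daemons
jps              # lists the running JVM processes
# the output should include lines such as:
# 2101 ResourceManager
# 2208 NodeManager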
Running the WordCount example on the virtual machine
The WordCount example is located under hadoop --> share --> hadoop --> mapreduce; run it from hadoop-mapreduce-examples-2.7.3.jar.
Note that the first directory given after wordcount is the directory containing the files whose words are to be counted, and the second is the output directory. The output directory must not exist beforehand, otherwise an error is reported.
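A sketch of the command, assuming an input file /hadoop/hadoop.txt already exists on HDFS (the same paths the Java job below uses) and that /hadoop/out does not exist yet:

cd $HADOOP_HOME/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /hadoop/hadoop.txt /hadoop/out
hdfs dfs -cat /hadoop/out/part-r-00000    # view the word counts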
An introduction to the MapReduce workflow; roughly, it can be divided into the following steps:
1. Write the code
2. Configure the job
3. Submit the job
4. Initialize the job
5. Assign tasks
6. Execute tasks
7. Update task progress and status
MapReduce processes data in the form of key-value pairs (see the small example after this list):
1. The MapReduce framework reads the contents of the input file through the map phase, parsing each line of the file into a key-value pair <key,value>. The map function is called once for each pair; in it you write your own logic to process the input key and value and convert them into new intermediate key-value pairs, which are passed on to reduce.
2. Before reduce runs, a shuffle phase merges and sorts the output of the multiple map tasks.
3. In the reduce function you write your own logic to process the input key and its list of values and convert them into new key-value output.
4. The output of reduce is saved to a file.
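As a small illustration (the input line is made up for this example), the data flows roughly like this:

input line:     "hello world hello"
map output:     <hello,1> <world,1> <hello,1>
after shuffle:  <hello,[1,1]> <world,[1]>
reduce output:  <hello,2> <world,1>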
The above is my understanding of the MapReduce workflow after finishing my study; below, the WordCount algorithm is implemented in Java code.
Start by creating a Maven project and adding the following dependencies to pom.xml:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <version>1.8</version>
    <scope>system</scope>
    <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
Create the Map class
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {

    /**
     * keyin is the byte offset of the line, valuein is the line of data read,
     * keyout is the output key type, valueout is the output value type.
     */
    @Override
    protected void map(LongWritable key, Text value,
            Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        String line = value.toString();       // get the file content line by line
        String[] words = line.split(" ");     // split each line of content on spaces
        for (String word : words) {
            // the map output is spilled to an in-memory ring buffer
            context.write(new Text(word.trim()), new IntWritable(1));
        }
    }
}
Create a reduce class
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    /**
     * key is the type of the map output key, and the iterable holds the values
     * emitted by the map phase for that key; iterating lets us process each value once.
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // data processing: add up the counts emitted for this word
        for (IntWritable intWritable : values) {
            sum += intWritable.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Create Job Class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        MyJob myJob = new MyJob();
        ToolRunner.run(myJob, args);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();               // create the configuration object
        conf.set("fs.defaultFS", "hdfs://192.168.80.142:9000");
        // assign the task
        Job job = Job.getInstance(conf);
        job.setJarByClass(MyJob.class);
        job.setMapperClass(MyMap.class);
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // set the file input and output paths
        FileInputFormat.addInputPath(job, new Path("/hadoop/hadoop.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/hadoop/out"));
        job.waitForCompletion(true);
        return 0;
    }
}
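To run it, a rough sketch of the steps (the jar name wordcount.jar is only a placeholder for whatever your Maven build produces, and /hadoop/out must not already exist on HDFS):

mvn clean package
hadoop jar target/wordcount.jar MyJob
hdfs dfs -cat /hadoop/out/part-r-00000    # view the result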