MapReduce Execution Process Analysis (based on Hadoop 2.4) -- (2)
4.3 Map class
Create a Map class and a map function. The map function is defined in the org.apache.hadoop.mapreduce.Mapper class, which calls the map method once for each key-value pair it processes. You need to override this method. The setup and cleanup methods are also available: the setup method is called once when the map task starts to run, and the cleanup method is called once when the whole map task ends.
4.3.1 Introduction to Map
The Mapper class is a generic class with four type parameters (input key, input value, output key, and output value). Here, the input key is Object (by default the offset of the line), the input value is Text (Hadoop's counterpart of String), the output key is Text (the word), and the output value is IntWritable (Hadoop's counterpart of int). These Hadoop data types are similar to the Java data types, except that they are optimized for network serialization.
MapReduce has the following types similar to IntWritable:
BooleanWritable: standard Boolean value; ByteWritable: single-byte value; DoubleWritable: double-precision floating-point number; FloatWritable: single-precision floating-point number; IntWritable: integer; LongWritable: long integer; Text: text stored in UTF-8 format (similar to String in Java); NullWritable: used when the key or value in <key, value> is null.
All of these types implement the WritableComparable interface.
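To make the idea of a serializable, comparable wrapper type concrete, here is a minimal sketch in plain Java. IntBoxSketch is a hypothetical stand-in and not a Hadoop class; it only mirrors the write(DataOutput) / readFields(DataInput) pair of Hadoop's Writable interface together with compareTo, and the roundTrip helper is an assumption added for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for IntWritable; it does not depend on Hadoop.
final class IntBoxSketch implements Comparable<IntBoxSketch> {
    private int value;

    IntBoxSketch() { }
    IntBoxSketch(int value) { this.value = value; }

    int get() { return value; }

    // Serialize the wrapped int as four bytes.
    void write(DataOutput out) throws IOException { out.writeInt(value); }

    // Overwrite this object's state from the stream, so one instance can be reused.
    void readFields(DataInput in) throws IOException { value = in.readInt(); }

    @Override
    public int compareTo(IntBoxSketch other) { return Integer.compare(value, other.value); }

    // Serialize to a byte array and read the bytes back into a fresh object.
    static IntBoxSketch roundTrip(IntBoxSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            IntBoxSketch copy = new IntBoxSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new IntBoxSketch(42)).get()); // prints 42
    }
}
```

Because readFields overwrites the state of an existing object, a framework built on this pattern can reuse one instance for many records instead of allocating a new object each time.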
A Map task is an independent task that converts a set of input records into a set of intermediate records. The map method of the Mapper class maps each input key-value pair to a set of intermediate key-value pairs. The intermediate records do not need to be of the same type as the input records, and a given input key-value pair can be mapped to zero or more output key-value pairs.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
    }
Each input line is split into words, and each word is emitted with the write method of Context. Context is an abstract inner class of Mapper that implements the MapContext interface. Here, each parsed word is used as the key and the integer 1 as the corresponding value, indicating that the word appeared once. Map is a splitting process and reduce is a combining process. The number of Map tasks corresponds to the number of input splits. For details about how a Map task is executed, see the next section.
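The fragment above depends on Hadoop classes (Text, IntWritable, Context). As a rough, Hadoop-free illustration of the same tokenize-and-emit logic, the sketch below collects the emitted pairs in a list. The class name WordCountMapSketch and the list-based stand-in for context.write are assumptions made only for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java stand-in for the map step; no Hadoop classes are used.
final class WordCountMapSketch {

    // Mimics map(): split one input line into tokens and emit (word, 1) for each.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            // Stands in for context.write(word, one).
            emitted.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(map("hello hadoop hello")); // prints [hello=1, hadoop=1, hello=1]
    }
}
```

Note that the repeated word is emitted twice with the value 1; adding the counts up is left to the combiner or reducer.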
4.3.2 Map Task Analysis
After the Map task is submitted to Yarn, it is started by the ApplicationMaster. The task runs as a YarnChild process, in which the run method of MapTask is executed. Both MapTask and ReduceTask extend the abstract class Task.
The steps for running the run method are as follows:
Step 1:
Check whether the job has any Reduce tasks. If it has none, the map phase accounts for the whole progress of the task, and the submitted job ends when the Map tasks are completed. If it has, the map phase accounts for 66.7% of the task's progress and the sort phase for the remaining 33.3%.
Step 2:
Start the TaskReporter thread to update the current status.
Step 3:
Initialize the task, set the current status of the task to RUNNING, and set the output directory.
Step 4:
Determine whether the current task is a jobCleanup task, a jobSetup task, or a taskCleanup task, and handle it accordingly.
Step 5:
Call the runNewMapper method to execute a specific map.
Step 6:
After the task is complete, call the done method to clean up the task, update the counters, and update the task status.
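The progress split checked in Step 1 can be illustrated with a few lines of arithmetic. This is only a sketch: the class and method names are hypothetical, and the only values taken from the text are the two weights 0.667 and 0.333.

```java
// Hypothetical helper that combines per-phase progress into task progress.
final class PhaseProgressSketch {

    // mapProgress and sortProgress are fractions between 0 and 1.
    static float overallProgress(boolean hasReducers, float mapProgress, float sortProgress) {
        if (!hasReducers) {
            // Map-only job: the map phase governs the whole task.
            return mapProgress;
        }
        // With reducers: map weighs 66.7%, sort weighs 33.3%.
        return 0.667f * mapProgress + 0.333f * sortProgress;
    }

    public static void main(String[] args) {
        System.out.println(overallProgress(false, 0.5f, 0f));
        System.out.println(overallProgress(true, 1f, 0f));
        System.out.println(overallProgress(true, 1f, 1f));
    }
}
```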
4.3.3 runNewMapper Analysis
Next, let's take a look at the runNewMapper method. The code is as follows:
    private <INKEY,INVALUE,OUTKEY,OUTVALUE>
    void runNewMapper(final JobConf job,
                      final TaskSplitIndex splitIndex,
                      final TaskUmbilicalProtocol umbilical,
                      TaskReporter reporter
                      ) throws IOException, ClassNotFoundException,
                               InterruptedException {
      // make a task context so we can get the classes
      org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
        new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job, getTaskID(), reporter);

      // make a mapper
      org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE> mapper =
        (org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>)
          ReflectionUtils.newInstance(taskContext.getMapperClass(), job);

      // make the input format
      org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE> inputFormat =
        (org.apache.hadoop.mapreduce.InputFormat<INKEY,INVALUE>)
          ReflectionUtils.newInstance(taskContext.getInputFormatClass(), job);

      // rebuild the input split
      org.apache.hadoop.mapreduce.InputSplit split = null;

      split = getSplitDetails(new Path(splitIndex.getSplitLocation()),
          splitIndex.getStartOffset());

      LOG.info("Processing split: " + split);
      org.apache.hadoop.mapreduce.RecordReader<INKEY,INVALUE> input =
        new NewTrackingRecordReader<INKEY,INVALUE>
          (split, inputFormat, reporter, taskContext);

      job.setBoolean(JobContext.SKIP_RECORDS, isSkipping());
      org.apache.hadoop.mapreduce.RecordWriter output = null;

      // get an output object
      if (job.getNumReduceTasks() == 0) {
        output = new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
      } else {
        output = new NewOutputCollector(taskContext, job, umbilical, reporter);
      }

      org.apache.hadoop.mapreduce.MapContext<INKEY, INVALUE, OUTKEY, OUTVALUE>
        mapContext =
          new MapContextImpl<INKEY, INVALUE, OUTKEY, OUTVALUE>(job, getTaskID(),
              input, output, committer, reporter, split);
      org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>.Context
        mapperContext =
          new WrappedMapper<INKEY, INVALUE, OUTKEY, OUTVALUE>().getMapContext(mapContext);

      try {
        input.initialize(split, mapperContext);
        mapper.run(mapperContext);
        mapPhase.complete();
        setPhase(TaskStatus.Phase.SORT);
        statusUpdate(umbilical);
        input.close();
        input = null;
        output.close(mapperContext);
        output = null;
      } finally {
        closeQuietly(input);
        closeQuietly(output, mapperContext);
      }
    }
The main execution process of this method is:
Step 1:
Obtain the configuration information object TaskAttemptContextImpl, the user-written Mapper instance, the user-specified InputFormat object (TextInputFormat by default), and the input split (split) assigned to the task.
The TaskAttemptContextImpl class implements the TaskAttemptContext interface, which in turn extends the JobContext and Progressable interfaces and adds task-related information on top of JobContext. The TaskAttemptContextImpl object can be used to obtain many classes related to task execution, such as the user-defined Mapper class and the InputFormat class.
Step 2:
Construct a NewTrackingRecordReader object based on inputFormat. The RecordReader<K, V> that this object wraps (its real field) is a LineRecordReader when the default TextInputFormat is used. It reads the content of the split and passes it to the Mapper's map method for processing.
Step 3:
Create an org.apache.hadoop.mapreduce.RecordWriter object as the task output. If there is no reducer, this RecordWriter is a NewDirectOutputCollector(taskContext, job, umbilical, reporter), which writes directly to HDFS. If there is a reducer, the RecordWriter is a NewOutputCollector(taskContext, job, umbilical, reporter).
NewOutputCollector is the map output of a job that has reducers. The main object contained in this class is MapOutputCollector<K, V> collector, which is constructed by reflection:
    ReflectionUtils.newInstance(job.getClass(JobContext.MAP_OUTPUT_COLLECTOR_CLASS_ATTR, MapOutputBuffer.class, MapOutputCollector.class), job);
If the number of Reduce tasks is greater than 1, an org.apache.hadoop.mapreduce.Partitioner<K, V> is created (HashPartitioner.class by default) to partition the mapper output, that is, to decide which reducer each record is sent to. The write method of NewOutputCollector then calls collector.collect(key, value, partitioner.getPartition(key, value, partitions)). Otherwise, every record is assigned to partition 0.
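For illustration, the arithmetic used by the default hash partitioning can be sketched without Hadoop as follows. The class name HashPartitionSketch is hypothetical, and the sketch uses Java's String.hashCode, whereas Hadoop hashes its own key types (for example Text), so the partition numbers printed here will not match a real job.

```java
// Hadoop-free sketch of hash partitioning.
final class HashPartitionSketch {

    // Mask off the sign bit so the result is non-negative, then take the
    // remainder modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"hello", "hadoop", "map"}) {
            System.out.println(word + " -> partition " + getPartition(word, 3));
        }
    }
}
```

The same key always lands in the same partition, which is what guarantees that all values of one key reach the same reducer.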
Step 4:
Open the input file (construct a LineReader object that does the actual reading of the file content) and position the file pointer at the start of the split. This is done by the initialize method of LineRecordReader.
The object that actually reads the content is the LineReader field in of this class. It is initialized in the initialize method, which passes it the input stream that matches the type of the input file (compressed or uncompressed). During initialization, the following call is made on in:
    in.readLine(new Text(), 0, maxBytesToConsume(start));
The readLine method reads one line into the Text object passed as its first argument (str) and returns the number of bytes read.
The LineRecordReader.nextKeyValue() method sets two objects, key and value. The key is the offset of the current line in the input file (note that this is the offset within the whole file, not within a single split), and value is the line content read through the LineReader object in:
    in.readLine(value, maxLineLength, Math.max(maxBytesToConsume(pos), maxLineLength));
If no data is readable, false is returned. Otherwise, true is returned.
In addition, getCurrentKey() and getCurrentValue() return the current key and value. Before calling these two methods, you must call nextKeyValue() to assign new values to key and value; otherwise, the same pair is returned again.
This is how the reader is tied to the run method of org.apache.hadoop.mapreduce.Mapper.
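The calling convention described above (advance with nextKeyValue(), then read the current key and value) can be imitated in plain Java. The sketch below is not LineRecordReader: it reads from an in-memory string, handles only '\n' line endings, and ignores splits and compression. The class name LineReaderSketch is an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java imitation of the RecordReader calling convention.
final class LineReaderSketch {
    private final byte[] data;
    private int pos = 0;     // byte offset of the next unread line
    private long key;        // byte offset of the current line
    private String value;    // content of the current line

    LineReaderSketch(String text) {
        this.data = text.getBytes(StandardCharsets.UTF_8);
    }

    // Advances to the next line; returns false once the input is exhausted.
    boolean nextKeyValue() {
        if (pos >= data.length) {
            return false;
        }
        int end = pos;
        while (end < data.length && data[end] != '\n') {
            end++;
        }
        key = pos;
        value = new String(data, pos, end - pos, StandardCharsets.UTF_8);
        pos = end + 1;       // skip the newline
        return true;
    }

    long getCurrentKey() { return key; }

    String getCurrentValue() { return value; }

    public static void main(String[] args) {
        LineReaderSketch reader = new LineReaderSketch("hello world\nbye\n");
        while (reader.nextKeyValue()) {
            // prints "0<TAB>hello world" and then "12<TAB>bye"
            System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
    }
}
```

Calling getCurrentKey() twice without an intervening nextKeyValue() returns the same offset both times, which is the repetition the text warns about.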
Step 5:
Execute the run method of org.apache.hadoop.mapreduce.Mapper:
    public void run(Context context) throws IOException, InterruptedException {
      setup(context);
      try {
        while (context.nextKeyValue()) {
          map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
      } finally {
        cleanup(context);
      }
    }
Step 5.1:
First, the setup method is executed. It can be used to read custom parameters so that they are readily available in the following steps. The parameters are carried by the Context object, which is initialized in the runNewMapper method of the MapTask class:
    org.apache.hadoop.mapreduce.Mapper<INKEY,INVALUE,OUTKEY,OUTVALUE>.Context
      mapperContext = new WrappedMapper<INKEY, INVALUE, OUTKEY, OUTVALUE>().getMapContext(mapContext);
The LineRecordReader instance and the NewOutputCollector instance are passed into it. The nextKeyValue(), getCurrentKey(), and getCurrentValue() calls on the context are delegated to the corresponding methods of the reader, which is how nextKeyValue() in Mapper.run keeps obtaining new keys and values.
Step 5.2:
The map method called in the loop is the user-defined map. After the map logic has processed a record, it outputs the result with context.write(K, V). The write method calls NewOutputCollector.write, which in turn calls MapOutputBuffer.collect(key, value, partitioner.getPartition(key, value, partitions)). The collect method reports progress, serializes the data, and buffers it. It is mainly concerned with the Spill process, which is described in Section 4.3.4.
Step 5.3:
After all the data has been read, the cleanup method is called to do the clean-up work. You can override the cleanup method as needed.
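The setup / map / cleanup life cycle of Steps 5.1 to 5.3 is a template-method pattern. A small Hadoop-free sketch of it is shown below. MiniMapper and UpperCaseMapper are hypothetical classes written for this illustration; the real Mapper works on a Context rather than an Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class MapperLifecycleSketch {

    // Hypothetical miniature of the Mapper life cycle; not the Hadoop class.
    abstract static class MiniMapper {
        protected final List<String> log = new ArrayList<>();

        protected void setup() { log.add("setup"); }

        protected abstract void map(String record);

        protected void cleanup() { log.add("cleanup"); }

        // Same shape as Mapper.run: setup once, map per record, cleanup in finally.
        final List<String> run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
            return log;
        }
    }

    // A user-defined mapper only overrides what it needs.
    static final class UpperCaseMapper extends MiniMapper {
        @Override
        protected void map(String record) {
            log.add("map:" + record.toUpperCase());
        }
    }

    public static void main(String[] args) {
        List<String> log = new UpperCaseMapper().run(Arrays.asList("a", "b").iterator());
        System.out.println(log); // prints [setup, map:A, map:B, cleanup]
    }
}
```

Because cleanup sits in a finally block, it runs even when map throws, which makes it a reasonable place to release resources acquired in setup.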
Step 6:
Finally, the output is closed with output.close(mapperContext). This method calls MapOutputBuffer.flush(), which writes the remaining data to local files through the sortAndSpill() method and finally calls the mergeParts() method to merge all the spill files. The sortAndSpill method is described in Section 4.3.4.
4.3.4 Spill Analysis
Spill means overflow: spill processing writes data that overflows the in-memory buffer to disk. How does it work? The Spill process includes the following steps: collecting the output, sorting, spilling to disk, and merging.
Each Map task continuously writes its output, as <key, value> pairs, to a ring data structure in memory. A ring structure is used to make more effective use of the memory space and keep as much data in memory as possible.
This data structure is actually a byte array called kvbuffer. It holds not only the <key, value> data but also some index data; the region that stores the index data is given the alias kvmeta.
    kvbuffer = new byte[maxMemUsage];
    bufvoid = kvbuffer.length;
    kvmeta = ByteBuffer.wrap(kvbuffer).order(ByteOrder.nativeOrder()).asIntBuffer();
    setEquator(0);
    bufstart = bufend = bufindex = equator;
    kvstart = kvend = kvindex;
kvmeta is the index of the <key, value> records in kvbuffer. Each index entry is a four-tuple consisting of the start position of the value, the start position of the key, the partition number, and the length of the value, and takes up four ints. The kvmeta write pointer kvindex moves down four slots at a time, and the four fields of the tuple are then filled in one by one. For example, if the initial position of kvindex is -4, then after the first <key, value> is written, (kvindex + 0) holds the start position of the value, (kvindex + 1) the start position of the key, (kvindex + 2) the partition number, and (kvindex + 3) the length of the value. kvindex then jumps to position -8; after the second <key, value> and its index entry are written, kvindex jumps to position -12.
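The downward movement of kvindex can be tried out with a tiny buffer. The following sketch assumes a 64-byte kvbuffer with the equator at 0 and writes only the index entries, not the key/value bytes. The class KvMetaSketch and its addRecord method are hypothetical; the slot order follows the description above. Negative positions such as -4 appear here as their wrapped-around equivalents (12 in a 16-int view).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

final class KvMetaSketch {
    static final int VALSTART = 0;   // slot holding the start position of the value
    static final int KEYSTART = 1;   // slot holding the start position of the key
    static final int PARTITION = 2;  // slot holding the partition number
    static final int VALLEN = 3;     // slot holding the length of the value
    static final int NMETA = 4;             // ints per index entry
    static final int METASIZE = NMETA * 4;  // bytes per index entry

    final byte[] kvbuffer;
    final IntBuffer kvmeta;  // int view over the same bytes as kvbuffer
    int kvindex;             // slot (counted in ints) of the next index entry

    KvMetaSketch(int sizeBytes) {
        kvbuffer = new byte[sizeBytes];
        kvmeta = ByteBuffer.wrap(kvbuffer).order(ByteOrder.nativeOrder()).asIntBuffer();
        // With the equator at 0, the first entry sits just below it,
        // which wraps around to the top of the array.
        kvindex = ((0 - METASIZE + kvbuffer.length) % kvbuffer.length) / 4;
    }

    // Stores one index entry and moves kvindex down by NMETA slots.
    void addRecord(int partition, int keystart, int valstart, int vallen) {
        kvmeta.put(kvindex + PARTITION, partition);
        kvmeta.put(kvindex + KEYSTART, keystart);
        kvmeta.put(kvindex + VALSTART, valstart);
        kvmeta.put(kvindex + VALLEN, vallen);
        kvindex = (kvindex - NMETA + kvmeta.capacity()) % kvmeta.capacity();
    }

    public static void main(String[] args) {
        KvMetaSketch sketch = new KvMetaSketch(64);
        System.out.println(sketch.kvindex); // prints 12, i.e. -4 modulo 16
        sketch.addRecord(0, 0, 5, 3);
        System.out.println(sketch.kvindex); // prints 8, i.e. -8 modulo 16
        sketch.addRecord(1, 8, 13, 2);
        System.out.println(sketch.kvindex); // prints 4, i.e. -12 modulo 16
    }
}
```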
The <key, value> data region and the index data region are two adjacent, non-overlapping areas of kvbuffer. They are separated by a demarcation point (the equator), which is not fixed: it is updated after each Spill. The initial demarcation point is 0. The <key, value> data grows upward and the index data grows downward.
The default size of kvbuffer (maxMemUsage) is 100 MB. Several variables are involved:
(1) kvstart is the subscript for the start of a valid record;
(2) kvindex is the next record location;
(3) kvend is assigned a value derived from kvindex when a Spill starts. When the Spill ends, its value is assigned to kvstart, so that kvstart == kvend. In other words, if kvstart != kvend, the system is executing a spill; otherwise, kvstart == kvend and the system is in its normal working state;
(4) bufvoid marks the end of the part of the buffer that is actually in use;
(5) bufmark, used to mark the end of a record;
(6) The initial bufindex value is 0. After an Int-type key is written, bufindex increases to 4. After an Int-type value is written, bufindex increases to 8.
The data between kvindex and bufindex (including the equator) has not been spilled yet. When the space it occupies reaches the configured Spill percentage (80% by default), the startSpill method is called to start spilling the data. The method is:
    private void startSpill() {
      assert !spillInProgress;
      kvend = (kvindex + NMETA) % kvmeta.capacity();
      bufend = bufmark;
      spillInProgress = true;
      LOG.info("Spilling map output");
      LOG.info("bufstart = " + bufstart + "; bufend = " + bufmark +
               "; bufvoid = " + bufvoid);
      LOG.info("kvstart = " + kvstart + "(" + (kvstart * 4) +
               "); kvend = " + kvend + "(" + (kvend * 4) +
               "); length = " + (distanceTo(kvend, kvstart,
                     kvmeta.capacity()) + 1) + "/" + maxRec);
      spillReady.signal();
    }
This signals the spillReady condition, so that the SpillThread, which was started in the init method of MapOutputBuffer (an inner class of MapTask) and has been waiting, continues to run.
    while (true) {
      spillDone.signal();
      while (!spillInProgress) {
        spillReady.await();
      }
      try {
        spillLock.unlock();
        sortAndSpill();
      } catch (Throwable t) {
        sortSpillException = t;
      } finally {
        spillLock.lock();
        if (bufend < bufstart) {
          bufvoid = kvbuffer.length;
        }
        kvstart = kvend;
        bufstart = bufend;
        spillInProgress = false;
      }
    }
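The handshake between the collecting thread and the SpillThread can be reproduced in isolation with one lock and two conditions. The sketch below is a simplified model under stated assumptions: the names spillLock, spillReady, spillDone, and spillInProgress are borrowed from the code above, but the class SpillHandshakeSketch, the counter that stands in for sortAndSpill(), and the runDemo helper are hypothetical. Unlike the Hadoop loop, it keeps the lock held while "spilling", and the producer always waits for the spill to finish.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class SpillHandshakeSketch {
    private final ReentrantLock spillLock = new ReentrantLock();
    private final Condition spillReady = spillLock.newCondition();
    private final Condition spillDone = spillLock.newCondition();
    private boolean spillInProgress = false;
    private int spillsCompleted = 0;

    // Consumer: waits for a spill request, "spills", then reports completion.
    private void spillLoop() {
        spillLock.lock();
        try {
            while (true) {
                spillDone.signal();
                while (!spillInProgress) {
                    spillReady.await();
                }
                spillsCompleted++;       // stands in for sortAndSpill()
                spillInProgress = false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // interrupted: leave the loop
        } finally {
            spillLock.unlock();
        }
    }

    // Producer: requests a spill and blocks until the spill thread has finished it.
    private void spillAndWait() throws InterruptedException {
        spillLock.lock();
        try {
            spillInProgress = true;
            spillReady.signal();
            while (spillInProgress) {
                spillDone.await();
            }
        } finally {
            spillLock.unlock();
        }
    }

    // Runs the given number of request/complete round trips and returns the count.
    static int runDemo(int spills) {
        SpillHandshakeSketch sketch = new SpillHandshakeSketch();
        Thread spillThread = new Thread(sketch::spillLoop, "SpillThread");
        spillThread.setDaemon(true);
        spillThread.start();
        try {
            for (int i = 0; i < spills; i++) {
                sketch.spillAndWait();
            }
            spillThread.interrupt();
            spillThread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sketch.spillsCompleted;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(3)); // prints 3
    }
}
```

Both waits sit inside while loops that re-check spillInProgress, so a lost or spurious wake-up cannot make either side proceed too early.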
The SpillThread then calls the sortAndSpill method, which flushes data from the buffer to disk. Following the sorted kvmeta, the <key, value> data of each partition is written to the file; when the data of one partition is finished, the next partition is processed in order, until all partitions have been traversed (the number of partitions equals the number of reducers).
Step 1:
First, calculate the size of the file to be written:
    final long size = (bufend >= bufstart
        ? bufend - bufstart
        : (bufvoid - bufend) + bufstart) +
        partitions * APPROX_HEADER_LENGTH;
Step 2:
Obtain the name of the local (non-HDFS) file to write to. The name contains a sequence number, for example output/spill2.out. The code that builds the name is:
    return lDirAlloc.getLocalPathForWrite(MRJobConfig.OUTPUT + "/spill"
        + spillNumber + ".out", size, getConf());
Step 3:
Sort the data in the [bufstart, bufend) interval of the buffer kvbuffer in ascending order, first by partition number and then by key. After sorting, the data is grouped by partition, and all data within a partition is ordered by key;
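The sort order of this step (partition number first, key second) can be expressed as a composed comparator. This is a simplified sketch: the Rec type and the class name SpillSortSketch are hypothetical, keys are plain strings, and the records themselves are sorted, whereas the map-side buffer reorders only the index entries and compares serialized keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SpillSortSketch {

    // Hypothetical record type carrying just the two fields the sort looks at.
    static final class Rec {
        final int partition;
        final String key;

        Rec(int partition, String key) {
            this.partition = partition;
            this.key = key;
        }

        @Override
        public String toString() { return partition + ":" + key; }
    }

    // Ascending by partition number, then ascending by key within a partition.
    static final Comparator<Rec> BY_PARTITION_THEN_KEY =
        Comparator.comparingInt((Rec r) -> r.partition).thenComparing(r -> r.key);

    // Returns a sorted copy and leaves the input untouched.
    static List<Rec> sorted(List<Rec> input) {
        List<Rec> copy = new ArrayList<>(input);
        copy.sort(BY_PARTITION_THEN_KEY);
        return copy;
    }

    public static void main(String[] args) {
        List<Rec> recs = Arrays.asList(
            new Rec(1, "b"), new Rec(0, "z"), new Rec(1, "a"), new Rec(0, "c"));
        System.out.println(sorted(recs)); // prints [0:c, 0:z, 1:a, 1:b]
    }
}
```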
Step 4:
Construct an IFile.Writer object to write the output to the specified file. This object supports row-level compression.
    writer = new Writer<K, V>(job, out, keyClass, valClass, codec, spilledRecordsCounter);
If a Combiner is set (a Combiner is actually a Reducer), the data of each partition is aggregated once before it is written to the file. This is done through combinerRunner.combine(kvIter, combineCollector), which executes the run method of the reducer; the output, however, goes to a different place than that of a normal reducer: in the end, the append method of IFile.Writer is called to write it to the local file.
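What a combiner does to the data of one partition can be shown with a word-count style aggregation. The sketch below is Hadoop-free and makes its own assumptions: the class CombinerSketch is hypothetical, and summing is only one possible combine function.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CombinerSketch {

    // Sums the values of equal keys, as a word-count combiner would.
    // The LinkedHashMap keeps the keys in the order in which they first appear.
    static Map<String, Integer> combine(List<? extends Map.Entry<String, Integer>> sortedPairs) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : sortedPairs) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<SimpleEntry<String, Integer>> pairs = Arrays.asList(
            new SimpleEntry<>("a", 1), new SimpleEntry<>("a", 1), new SimpleEntry<>("b", 1));
        System.out.println(combine(pairs)); // prints {a=2, b=1}
    }
}
```

Three records go in and two come out, which is the point of combining before the spill: less data is written to disk and later sent to the reducers.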
Step 5:
Write the metadata to the in-memory index data structure SpillRecord. If the index held in memory exceeds 1 MB, it is written to a file with a name like output/spill2.out.index, where "2" is the sequence number of the Spill.
    if (totalIndexCacheMemory >= indexCacheMemoryLimit) {
      // create spill index file
      Path indexFilename =
          mapOutputFile.getSpillIndexFileForWrite(numSpills, partitions
              * MAP_OUTPUT_INDEX_RECORD_LENGTH);
      spillRec.writeToFile(indexFilename, job);
    } else {
      indexCacheList.add(spillRec);
      totalIndexCacheMemory +=
        spillRec.size() * MAP_OUTPUT_INDEX_RECORD_LENGTH;
    }
The index file stores not only the index data but also CRC32 checksum data. The index file is not necessarily created on disk: as long as the index fits in the in-memory cache (1 MB by default), it is kept in memory.
The index file describes where the data of each partition lies in the out file. For each partition, it mainly records the start offset, the raw (uncompressed) length, and the length of the data as stored in the file.
Step 6:
At the end of the Spill, the resetSpill method is called to reset the buffer state.
    private void resetSpill() {
      final int e = equator;
      bufstart = bufend = e;
      final int aligned = e - (e % METASIZE);
      // set start/end to point to first meta record
      // Cast one of the operands to long to avoid integer overflow
      kvstart = kvend = (int)
        (((long)aligned - METASIZE + kvbuffer.length) % kvbuffer.length) / 4;
      LOG.info("(RESET) equator " + e + " kv " + kvstart + "(" +
        (kvstart * 4) + ")" + " kvi " + kvindex + "(" + (kvindex * 4) + ")");
    }
That is to say, set the intermediate location of the remaining space in kvbuffer to the new demarcation point.
4.3.5 Merge
If a Map task outputs a large amount of data, it may spill several times, producing a large number of out files and index files distributed across different disks. The merge operation is then needed to combine these files.
Merge scans all local directories to obtain the index files and stores the index information in a list. Based on this list, it creates a file.out file and a file.out.index file to store the final output and index.
Each partition has a segment list that records the file name, start position, length, and so on of that partition's data in every Spill file. All segments of a partition are therefore merged into one segment. When a partition has many segments, they are merged in batches, in a way similar to heap sort. The final index data is still written to the index file. This corresponds to the mergeParts method.
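The heap-based merging of sorted segments can be sketched with a priority queue. The example below is a generic k-way merge over in-memory lists of strings, under the assumption that each segment is already sorted. The class SegmentMergeSketch and its Cursor helper are hypothetical, and the sketch does not model spill files, batching, or combiners.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SegmentMergeSketch {

    // One cursor per sorted segment: the current head plus the remaining items.
    private static final class Cursor implements Comparable<Cursor> {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }

        @Override
        public int compareTo(Cursor other) { return head.compareTo(other.head); }
    }

    // Merges already-sorted segments into one sorted list.
    static List<String> merge(List<List<String>> sortedSegments) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> segment : sortedSegments) {
            if (!segment.isEmpty()) {
                heap.add(new Cursor(segment.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();   // segment with the smallest head
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {
                smallest.head = smallest.rest.next();
                heap.add(smallest);          // re-insert with its new head
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("a", "d"), Arrays.asList("b", "c"));
        System.out.println(merge(segments)); // prints [a, b, c, d]
    }
}
```

Each output element costs one poll and at most one insert on a heap whose size is the number of segments, so the merge stays cheap even when there are many spill files.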
4.3.6 Related configuration options
That is all for the Map side. In summary, a Map task reads data and writes it into the memory buffer. When the buffer meets the spill condition, the data is sorted with quicksort and partitioned, and then spilled to a local data file and index file. If a combiner is set, an aggregation is also performed before the Spill. After all data has been processed, all spill files and index files are merged. If a combiner is set and the relevant conditions are met, another aggregation is performed before merging. The results of the map stage are stored locally (if there is a reducer) rather than in HDFS.
The above analysis involves the following configuration options:
mapreduce.job.map.output.collector.class: the map output collector class. The default value is MapTask.MapOutputBuffer;
mapreduce.map.sort.spill.percent: the buffer usage ratio at which a spill is triggered. The default value is 0.8;
mapreduce.task.io.sort.mb: the size of the memory buffer. The default value is 100 MB;
map.sort.class: the sort implementation class. The default value is QuickSort, that is, quicksort;
mapreduce.map.output.compress.codec: the compression codec for map output;
mapreduce.map.output.compress: whether to enable compression for map output. The default value is false.