Transferred from: http://www.cnblogs.com/forfuture1978/archive/2010/11/19/1882279.html
Transfer note: I had originally planned to analyze HDFS and MapReduce in detail in the Hadoop Learning Summary series, but while gathering material I found this article, and found that Caibinbupt has already analyzed the Hadoop source code in detail; it is recommended reading.
Transferred from http://blog.csdn.net/HEYUTAO007/archive/2010/07/10/5725379.aspx
References:
1. Caibinbupt's source code analysis: http://caibinbupt.javaeye.com/
2. Coderplay's JavaEye blog:
http://coderplay.javaeye.com/blog/295097
http://coderplay.javaeye.com/blog/318602
3. Javen-Studio Coffee House:
http://www.cppblog.com/javenstudio/articles/43073.html
I. MapReduce Overview
Map/Reduce is a distributed computing model for large-scale data processing, originally designed and implemented by Google engineers; Google has publicly released its complete MapReduce paper. Its definition is that Map/Reduce is a programming model and an associated implementation for processing and generating large data sets. The user defines a map function that processes a key/value pair to generate a batch of intermediate key/value pairs, and a reduce function that merges all intermediate values sharing the same key. Many real-world tasks can be expressed in this model.
II. How MapReduce Works
The Map/Reduce framework operates entirely on <key, value> pairs: the input to a job is a set of <key, value> pairs and the output is another set of <key, value> pairs, possibly of different types. The key and value classes must support serialization, so they must implement the Writable interface; in addition, the key class must implement the WritableComparable interface so that the framework can sort the data set.
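As a concrete illustration of this requirement, a minimal key class might look like the following sketch; WordKey is a hypothetical name used only for illustration, not a Hadoop class:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// A minimal key type: it must be serializable (write/readFields) so the framework can
// move it between machines, and comparable so the framework can sort intermediate keys.
public class WordKey implements WritableComparable<WordKey> {
  private String word = "";

  public void set(String word) { this.word = word; }

  public void write(DataOutput out) throws IOException {
    out.writeUTF(word);                    // serialize the key
  }

  public void readFields(DataInput in) throws IOException {
    word = in.readUTF();                   // deserialize the key
  }

  public int compareTo(WordKey other) {    // used by the framework's sort
    return word.compareTo(other.word);
  }
}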
The input and output types of the two phases of a Map/Reduce job are as follows:
map: <k1, v1> -> list<k2, v2>
reduce: <k2, list<v2>> -> <k3, v3>
The following example illustrates this process in detail.
WordCount is an example that ships with Hadoop; its goal is to count the number of occurrences of each word in a set of text files. Suppose the WordCount program is run on the following two text files:
Hello World Bye World
Hello Hadoop GoodBye Hadoop
1 Map input data
For text files Hadoop uses the LineRecordReader class by default: each line is read as one key/value pair, where the key is the byte offset of the line and the value is the content of the line.
The input to map1 is:
key1      value1
0         Hello World Bye World
The input to map2 is:
key1      value1
0         Hello Hadoop GoodBye Hadoop
2 Map output / Combine input
The output of map1 is:
key2      value2
Hello     1
World     1
Bye       1
World     1
The output of map2 is:
key2      value2
Hello     1
Hadoop    1
GoodBye   1
Hadoop    1
3 Combine output
The Combiner class merges the values of the same key; it is in fact also a Reducer implementation.
The output of combine1 is:
key2      value2
Hello     1
World     2
Bye       1
The output of combine2 is:
key2      value2
Hello     1
Hadoop    2
GoodBye   1
4 Reduce output
The Reducer class merges the values of the same key.
The output of reduce is:
key3      value3
Hello     2
World     2
Bye       1
Hadoop    2
GoodBye   1
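To connect the walkthrough above with actual code, here is a minimal WordCount Mapper and Reducer sketch written against the classic org.apache.hadoop.mapred API (Hadoop ships its own WordCount example; the class names here are illustrative). The same Reducer class can also be registered as the Combiner.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// map: <offset, line> -> list<word, 1>
public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, ONE);        // emit <word, 1> for every token
    }
  }
}

// reduce (also usable as the combiner): <word, list<count>> -> <word, sum>
class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();       // add up the counts for this word
    }
    output.collect(key, new IntWritable(sum));
  }
}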
III. MapReduce Framework Structure
1 Roles
1.1 JobTracker
JobTracker is a master service. JobTracker schedules each sub-task of a job to run on a TaskTracker and monitors them; if a task fails, it reruns it. In general, JobTracker should be deployed on a separate machine.
1.2 TaskTracker
TaskTracker is a slave service that runs on multiple nodes. TaskTracker is responsible for directly executing each task. TaskTrackers need to run on the DataNodes of HDFS.
1.3 JobClient
On the client side, each job uses the JobClient class to package the application and configuration parameters into a jar file, store it in HDFS, and submit its path to JobTracker; JobTracker then creates each Task (i.e., MapTask and ReduceTask) and distributes them to the various TaskTracker services for execution.
2 Data Structures
2.1 Mapper and Reducer
The most basic components of a MapReduce application running on Hadoop are a Mapper class, a Reducer class, and a driver program that creates a JobConf. Some applications also include a Combiner class, which is in fact also a Reducer implementation.
2.2 JobInProgress
After JobClient submits a job, JobTracker creates a JobInProgress to track and schedule the job and adds it to the job queue. Based on the input data sets defined in the submitted job jar (decomposed into FileSplits), JobInProgress creates a corresponding batch of TaskInProgress objects to monitor and schedule the MapTasks, and a specified number of TaskInProgress objects to monitor and schedule the ReduceTasks; by default there is one ReduceTask.
2.3 TaskInProgress
When JobTracker launches a task, it uses each TaskInProgress to launchTask, serializing the Task object (i.e., a MapTask or ReduceTask) to the corresponding TaskTracker service. After receiving it, the TaskTracker creates a matching TaskTracker.TaskInProgress (this TaskInProgress is a separate class used inside TaskTracker, not JobTracker's, but it plays a similar role) to monitor and schedule the Task. The concrete Task process is started through the TaskRunner object managed by that TaskInProgress. TaskRunner automatically loads the job jar, sets up the environment variables, and launches a separate Java child process to execute the Task, i.e., a MapTask or a ReduceTask; the MapTasks and ReduceTasks of a job do not necessarily run on the same TaskTracker.
2.4 MapTask and ReduceTask
A complete job automatically executes the Mapper, the Combiner (only when the JobConf specifies one), and the Reducer. The Mapper and Combiner are invoked by MapTask, and the Reducer is invoked by ReduceTask; the Combiner is in fact also an implementation of the Reducer interface. The Mapper reads the input data sets defined in the job jar as <key1, value1> pairs and processes them into intermediate <key2, value2> pairs. If a Combiner is defined, MapTask calls the Combiner on the Mapper output to merge the values of identical keys and reduce the size of the output set. After the MapTasks have completed, the ReduceTask process invokes the Reducer to produce the final <key3, value3> pairs. This process is described in more detail in the next section.
The figure below describes the main components of the Map/Reduce framework and their relationships:
3 Process
A MapReduce job is submitted to the JobTracker on the master node via JobClient.runJob(job); the JobTracker accepts the JobClient's request and adds the job to the job queue. JobTracker keeps waiting for jobs submitted by JobClient over RPC, while each TaskTracker keeps sending heartbeats to JobTracker over RPC, asking whether there is a task to run; if there is, JobTracker hands it a task to execute. So whenever JobTracker's job queue is not empty, the heartbeats sent by the TaskTrackers will get tasks distributed to them; this is a pull model. After receiving a task, the TaskTracker on the slave node starts it locally. The figure below gives a brief illustration.
The process by which Map/Reduce handles a job is described in detail below.
IV. JobClient
A MapReduce driver program is usually written roughly as follows (shown here with the classic org.apache.hadoop.mapred API, which the rest of this article analyzes; the class names and paths are placeholders):
Configuration conf = new Configuration();           // read the Hadoop configuration
JobConf job = new JobConf(conf, MyJob.class);       // instantiate a job (MyJob is the driver class)
job.setJobName("job name");
job.setMapperClass(MyMapper.class);                 // the Mapper type
job.setCombinerClass(MyCombiner.class);             // the Combiner type (optional)
job.setReducerClass(MyReducer.class);               // the Reducer type
job.setOutputKeyClass(Text.class);                  // the type of the output key
job.setOutputValueClass(IntWritable.class);         // the type of the output value
FileInputFormat.addInputPath(job, new Path("<input HDFS path>"));
FileOutputFormat.setOutputPath(job, new Path("<output HDFS path>"));
// ... other configuration ...
JobClient.runJob(job);                              // submit the job and wait for it to finish
1 Configuring the Job
JobConf is the interface through which the user describes a job to the framework. The items set above (the Mapper, Combiner and Reducer classes, the output key/value types, and the input and output paths) are some of the more critical customization points in the MapReduce process.
2 JobClient.runJob(): run the job and decompose the input data set
A MapReduce job uses the JobClient class to decompose the input data set into a batch of small data sets according to the InputFormat implementation class defined by the user in the JobConf; each small data set corresponds to one MapTask. By default JobClient uses the FileInputFormat class and calls FileInputFormat.getSplits() to generate the small data sets. If a data file is judged splittable by isSplitable(), the large file is decomposed into small FileSplits; of course, a FileSplit only records the path of the file in HDFS plus an offset and the split size. This information is packaged together into the jobFile's jar.
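As a hedged illustration of this split step, the sketch below asks a TextInputFormat directly how it would split an input directory, which is roughly what JobClient does before submitting; the input path is illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class SplitInspector {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    FileInputFormat.addInputPath(conf, new Path("/user/demo/input")); // illustrative path
    TextInputFormat format = new TextInputFormat();
    format.configure(conf);                           // TextInputFormat is JobConfigurable
    InputSplit[] splits = format.getSplits(conf, 1);  // 1 is only a hint; block size decides
    for (InputSplit split : splits) {
      FileSplit fileSplit = (FileSplit) split;        // a split is path + offset + length, no data
      System.out.println(fileSplit.getPath() + " start=" + fileSplit.getStart()
          + " length=" + fileSplit.getLength());
    }
  }
}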
JobClient then uses the submitJob(job) method to submit the job to the master. submitJob(job) internally calls submitJobInternal(job) to do the actual submission. The submitJobInternal(job) method first uploads three files to the Hadoop distributed file system HDFS: job.jar, job.split and job.xml.
job.xml: the job configuration, e.g. the Mapper, Combiner and Reducer types, the input and output format types, and so on.
job.jar: the jar package containing the various classes needed to execute this job, such as the Mapper and Reducer implementations.
job.split: information about the file splits, e.g. how many blocks the data is divided into, the block size (64 MB by default), and so on.
The path of these three files on HDFS is determined by the MapReduce system path given by the mapred.system.dir property in hadoop-default.xml, plus the job ID. mapred.system.dir defaults to /tmp/hadoop-user_name/mapred/system. After writing these three files, the method invokes JobTracker.submitJob(job) on the master node via RPC, at which point the job submission is complete.
3 Submit Job
Submitting the job file is implemented through the RPC module, which is described in detail in a separate chapter. The rough flow is that a proxy interface implemented by RPC inside the JobClient class calls JobTracker's submitJob() method, and JobTracker must implement the JobSubmissionProtocol interface.
After JobTracker has created the job, it returns a JobStatus object to the JobClient; it records status information about the job, such as the execution time and the progress of the map and reduce tasks. Based on this JobStatus object, JobClient creates a NetworkedJob RunningJob object, which periodically obtains execution statistics from JobTracker to monitor the job and print them to the user's console.
The classes and methods involved in creating a job are shown in the figure below.
V. JobTracker
As mentioned above, jobs are uniformly scheduled by JobTracker, and the concrete Tasks are distributed to the individual TaskTracker nodes for execution. Below we parse the execution process in detail, starting from the point where JobTracker receives the JobClient's submission request.
1 JobTracker initializes the job
1.1 JobTracker.submitJob() receives the request
When JobTracker receives a new job request (i.e., its submitJob() function is called), it creates a JobInProgress object and uses it to manage and schedule the tasks. At creation time, JobInProgress initializes a series of task-related parameters and calls the FileSystem to download all the job files uploaded by the JobClient into a temporary directory on the local file system. These include the uploaded *.jar package, the XML file recording the configuration information, and the file recording the split information.
1.2 JobTracker.JobInitThread: notifying the initialization thread
The listener class EagerTaskInitializationListener in JobTracker is responsible for initializing the tasks. JobTracker calls its jobAdded(job) to add the job to jobInitQueue, a List member variable of EagerTaskInitializationListener that manages the jobs waiting to be initialized. The resortInitQueue method sorts this queue by job priority, and then notifyAll() is called, which wakes a JobInitThread thread to handle the job initialization. JobInitThread receives the signal, takes the job at the head of the queue, i.e., the one with the highest priority, and calls TaskTrackerManager's initJob, which finally calls JobInProgress.initTasks() to perform the actual initialization work.
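The queue-and-notify mechanism described above can be sketched as follows. This is a simplified illustration of the pattern, not Hadoop's actual code; all names are stand-ins:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified sketch of the EagerTaskInitializationListener pattern: jobAdded() queues a
// job by priority and wakes the init thread, which pulls the highest-priority job and
// initializes it.
class JobInitQueueSketch {
  interface JobStub {
    int priority();      // smaller value = higher priority, for this sketch
    void initTasks();    // stands in for JobInProgress.initTasks()
  }

  private final List<JobStub> jobInitQueue = new ArrayList<JobStub>();

  public void jobAdded(JobStub job) {
    synchronized (jobInitQueue) {
      jobInitQueue.add(job);
      jobInitQueue.sort(Comparator.comparingInt(JobStub::priority)); // resortInitQueue
      jobInitQueue.notifyAll();                                      // wake the JobInitThread
    }
  }

  // body of the JobInitThread loop
  public void jobInitLoop() throws InterruptedException {
    while (true) {
      JobStub job;
      synchronized (jobInitQueue) {
        while (jobInitQueue.isEmpty()) {
          jobInitQueue.wait();              // sleep until jobAdded() signals
        }
        job = jobInitQueue.remove(0);       // take the highest-priority job
      }
      job.initTasks();                      // the actual initialization work
    }
  }
}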
1.3 JobInProgress.initTasks(): initializing the TaskInProgress objects
Tasks come in two kinds, MapTask and ReduceTask, and their management object is TaskInProgress.
First, JobInProgress creates the monitoring objects for the maps. In the initTasks() function, it obtains the list of RawSplits for the decomposed input data by calling JobClient's readSplitFile(), and then creates the corresponding number of map management objects, TaskInProgress, based on this list. In this process it also records the hosts of all the DataNode nodes holding the HDFS blocks corresponding to each RawSplit; this is obtained by FileSplit's getLocations() when the RawSplit is created, which in turn calls DistributedFileSystem's getFileCacheHints() (this detail is explained in the HDFS analysis). If the data is stored in the local file system, i.e., LocalFileSystem is used, there is only one location, "localhost".
After these TaskInProgress objects have been created, the initTasks() method uses the createCache() method to build the nonRunningMapCache for them, a cache of map tasks that are not yet running. When a slave-side TaskTracker later sends a heartbeat to the master, tasks can be taken directly from this cache and handed out for execution.
Next, JobInProgress creates the monitoring objects for the reduces. This is relatively simple: they are created according to the number of reduces specified in the JobConf, one ReduceTask by default. The TaskInProgress class is used for monitoring and scheduling the ReduceTasks as well, but with a different constructor; TaskInProgress creates either a concrete MapTask or a ReduceTask depending on its parameters. Similarly, initTasks() builds the nonRunningReduceCache member through the createCache() method.
Once JobInProgress has created all the TaskInProgress objects, it finally constructs a JobStatus, records that the job is running, and then calls JobHistory.JobInfo.logStarted() to record the job's execution log. At this point the job initialization process inside JobTracker is complete.
2 JobTracker schedules the job
Hadoop's default scheduler is JobQueueTaskScheduler, which implements a FIFO policy. It has two listener member variables: JobQueueJobInProgressListener and the EagerTaskInitializationListener mentioned above. JobQueueJobInProgressListener is another listener class of JobTracker; it contains a map used to manage and schedule all JobInProgress objects. jobAdded(job) also adds the job to JobQueueJobInProgressListener's map.
The most important method of JobQueueTaskScheduler is assignTasks, which implements the work scheduling. Concretely: when JobTracker receives a TaskTracker's heartbeat() call, it first checks whether the previous heartbeat response has been fully handled and whether any tasks need to be started or restarted; if everything is normal, it processes the heartbeat. It first checks how many map and reduce tasks this TaskTracker can still run, whether the number of tasks waiting to be dispatched exceeds that number, and whether it exceeds the cluster's average remaining task load. If none of these limits is exceeded, a MapTask or ReduceTask is assigned to this TaskTracker. Map tasks are generated with JobInProgress's obtainNewMapTask() method, which essentially calls JobInProgress's findNewMapTask() to access the nonRunningMapCache.
As explained under task initialization above, the createCache() method hangs the TaskInProgress objects waiting to execute on the network topology. findNewMapTask() searches from near to far, level by level: first the same node, then nodes in the same rack, then nodes in the same data center, until maxLevel levels have been searched. In this way, when JobTracker hands a task to a TaskTracker, it can quickly find the TaskTracker closest to the data and let it execute the task.
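Conceptually, this near-to-far search behaves like the following sketch (the types are illustrative stubs, not Hadoop's classes):

import java.util.List;
import java.util.Map;

// Locality-aware lookup: starting at the TaskTracker's own node, walk up the topology
// (node -> rack -> data center) until a cached, not-yet-running map task is found.
class LocalitySketch {
  interface Node { Node getParent(); }    // stand-in for a network topology node
  interface MapTaskStub { }               // stand-in for a map TaskInProgress

  MapTaskStub findNewMapTask(Node tracker, int maxLevel,
                             Map<Node, List<MapTaskStub>> nonRunningMapCache) {
    Node node = tracker;
    for (int level = 0; level < maxLevel && node != null; level++) {
      List<MapTaskStub> cached = nonRunningMapCache.get(node);
      if (cached != null && !cached.isEmpty()) {
        return cached.remove(0);          // node-local first, then rack-local, ...
      }
      node = node.getParent();            // move one topology level up
    }
    return null;                          // nothing local; caller falls back to any task
  }
}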
Finally a Task object is generated, wrapped in a LaunchTaskAction, and sent back to the TaskTracker, which then executes the task.
The process of generating a reduce task is similar, using the JobInProgress.obtainNewReduceTask() method, which essentially ends up calling JobInProgress's findNewReduceTask() to access the nonRunningReduceCache.
VI. TaskTracker
1 TaskTracker loads a Task into a child process
The execution of a task is actually launched by the TaskTracker. TaskTracker communicates with JobTracker periodically (every 10 seconds by default; see the HEARTBEAT_INTERVAL variable defined in the MRConstants class), reporting the execution status of its own tasks and receiving instructions from JobTracker. If it finds that it should execute a new task, it also starts it at this point, that is, during the TaskTracker's call to JobTracker's heartbeat() method; this call goes through the proxy interface of the IPC layer. The steps are described briefly one by one below.
1.1 TaskTracker.run(): connecting to JobTracker
The TaskTracker startup process initializes a series of parameters and services and then tries to connect to JobTracker (which must implement the InterTrackerProtocol interface). If the connection is broken, it keeps trying to reconnect to JobTracker in a loop and reinitializes all members and parameters.
1.2 TaskTracker.offerService(): the main loop
If connecting to the JobTracker service succeeds, TaskTracker calls the offerService() function and enters the main execution loop. This loop communicates with JobTracker every 10 seconds, calling transmitHeartbeat() and obtaining a HeartbeatResponse. It then calls HeartbeatResponse's getActions() to obtain all the instructions passed down by JobTracker, a TaskTrackerAction array. It then iterates over this array: if an element is a LaunchTaskAction (an instruction to start a new task), addToTaskQueue is called to add it to the queue of tasks to execute; otherwise it is added to the tasksToCleanup queue and handed to a taskCleanupThread, for example to execute a KillJobAction or KillTaskAction.
1.3 TaskTracker.transmitHeartbeat(): obtaining JobTracker's instructions
In transmitHeartbeat(), TaskTracker creates a new TaskTrackerStatus object recording the execution status of its current tasks, checks the number of currently executing tasks and the local disk space usage, and if it can accept a new task sets the askForNewTask parameter of heartbeat() to true. It then calls JobTracker's heartbeat() method through the IPC interface to send this over; the return value of heartbeat() is a TaskTrackerAction array.
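Taken together, the pull-style heartbeat loop of 1.2 and 1.3 behaves roughly like this simplified sketch; the interfaces below are illustrative stubs, not Hadoop's real IPC types:

// Simplified heartbeat loop: report status, ask for work if there is room, dispatch
// whatever actions come back, and sleep for the heartbeat interval.
class HeartbeatLoopSketch {
  interface Action { }
  interface JobTrackerStub {
    Action[] heartbeat(String trackerName, boolean askForNewTask);
  }

  void offerService(JobTrackerStub jobTracker, String trackerName)
      throws InterruptedException {
    while (true) {
      boolean askForNewTask = haveFreeSlotAndDiskSpace();     // checked in transmitHeartbeat(), see 1.3
      Action[] actions = jobTracker.heartbeat(trackerName, askForNewTask);
      for (Action action : actions) {
        dispatch(action);          // LaunchTaskAction -> addToTaskQueue, otherwise tasksToCleanup
      }
      Thread.sleep(10000);         // HEARTBEAT_INTERVAL: 10 seconds by default
    }
  }

  boolean haveFreeSlotAndDiskSpace() { return true; }         // placeholder for the real checks
  void dispatch(Action action) { /* launch or clean up the task */ }
}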
1.4 TaskTracker.addToTaskQueue: handing over to TaskLauncher
TaskLauncher is the thread class used to handle new tasks; it contains a queue, tasksToLaunch, of tasks waiting to run. TaskTracker.addToTaskQueue calls TaskTracker's registerTask, which creates a TaskTracker.TaskInProgress object to schedule and monitor the task and adds it to the runningTasks queue. The TaskInProgress is then added to tasksToLaunch and notifyAll() wakes up a thread, which takes a pending task from the tasksToLaunch queue and calls TaskTracker's startNewTask to run it.
1.5 TaskTracker.startNewTask(): starting a new task
This calls localizeJob() to actually initialize the task and start execution.
1.6 TaskTracker.localizeJob(): initializing the job directory, etc.
The main task of this function is to initialize the working directory workDir, copy the job jar package from HDFS to the local file system, and call RunJar.unJar() to unpack it into the working directory. It then creates a RunningJob and calls the addTaskToJob() function to add it to the runningJobs monitoring queue. The addTaskToJob method adds the task to the task list of the RunningJob it belongs to; if that RunningJob does not yet exist, it is created first and added to runningJobs. When this is done, launchTaskForJob() is called to start executing the task.
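A hedged sketch of this localization step, using only the HDFS FileSystem API and the RunJar.unJar() call mentioned above (the paths are illustrative):

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.RunJar;

// Copy the job jar out of HDFS into the task's working directory and unpack it there.
class LocalizeJobSketch {
  void localize(Configuration conf, Path jobJarOnHdfs, File workDir) throws IOException {
    FileSystem fs = jobJarOnHdfs.getFileSystem(conf);
    File localJar = new File(workDir, "job.jar");
    fs.copyToLocalFile(jobJarOnHdfs, new Path(localJar.getAbsolutePath())); // HDFS -> local disk
    RunJar.unJar(localJar, workDir);                                        // unpack the classes
  }
}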
1.7 TaskTracker.launchTaskForJob(): executing the task
The task is actually started by calling the launchTask() function of TaskTracker$TaskInProgress.
1.8 TaskTracker$TaskInProgress.launchTask(): executing the task
Before executing the task, it calls localizeTask() to update the JobConf file and write it to the local directory. It then creates a TaskRunner object by calling the task's createRunner() method and calls its start() method, which finally launches the task's own independent Java child process.
1.9 Task.createRunner(): creating the runner object
Task has two implementations, MapTask and ReduceTask, used to create map and reduce tasks respectively. MapTask creates a MapTaskRunner to launch the task child process, and ReduceTask creates a ReduceTaskRunner.
1.10 TaskRunner.start(): starting the child process
TaskRunner is responsible for putting a task into a process to execute. It calls the run() function to do this; the main work is to initialize the series of environment variables needed to start the Java child process, including setting the working directory workDir and the CLASSPATH environment variable, and then to load the job jar package. JvmManager is used to manage all the task child processes running on the TaskTracker. Each process is managed by a JvmRunner, which also runs in a separate thread. JvmManager's launchJvm method generates the corresponding JvmRunner depending on whether the task is a map or a reduce, and manages it in the corresponding JvmManagerForType process container. JvmManagerForType's reapJvm() assigns a new JVM process: if JvmManagerForType's slots are full, it looks for an idle process; if one belongs to the same job it is reused directly, otherwise that process is killed and replaced with a new one. If the slots are not full, a new child process is started. New processes are created with the spawnNewJvm method. spawnNewJvm uses the JvmRunner thread's run method, which generates a new process and runs it; the actual implementation calls runChild.
2 The child process executes MapTask
The real execution carrier is the Child class, which contains a main function. When the process starts, the relevant parameters are passed in; it unpacks these parameters, obtains the Task from the parent process via getTask(jvmId), constructs the corresponding Task instance, and then starts the Task with its run() method.
2.1 run()
This method is quite simple: after configuring the system's TaskReporter, it executes runJobCleanupTask, runJobSetupTask, runTaskCleanupTask or the Mapper, as appropriate. Since MapReduce now has two sets of APIs, MapTask needs to support both, so the execution of the Mapper is split into runNewMapper and runOldMapper; here we analyze runOldMapper.
2.2 runOldMapper
The first part of runOldMapper constructs the InputSplit the Mapper will process, then creates the Mapper's RecordReader, from which the map input is obtained. It then constructs the Mapper's output, which goes through a MapOutputCollector; there are two cases: if there is no Reducer, a DirectMapOutputCollector is used, otherwise a MapOutputBuffer is used.
After the Mapper's input and output have been constructed, the Mapper can be run by constructing the MapRunnable configured in the configuration file. The system currently has two MapRunnables: MapRunner and MultithreadedMapRunner. MapRunner is a single-threaded executor and is relatively simple; it uses reflection to instantiate the user-defined implementation of the Mapper interface as one of its members.
2.3 MapRunner's run method
It first creates the key and value objects, and then, for each <key, value> pair of the InputSplit, calls the map method of the user's Mapper interface implementation, processing one pair at a time. After each processed pair, the new key/value pairs must be collected with an OutputCollector, which spills them to a file or keeps them in memory for further processing such as sorting and combining.
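Paraphrased, the loop has the following shape; this is a sketch written against the old org.apache.hadoop.mapred interfaces, not the exact Hadoop source:

import java.io.IOException;

import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// One reusable key/value pair is refilled from the RecordReader and handed to the
// user's Mapper for every record in the split.
class MapRunnerSketch<K1, V1, K2, V2> {
  private final Mapper<K1, V1, K2, V2> mapper;

  MapRunnerSketch(Mapper<K1, V1, K2, V2> mapper) {
    this.mapper = mapper;
  }

  public void run(RecordReader<K1, V1> input, OutputCollector<K2, V2> output,
                  Reporter reporter) throws IOException {
    try {
      K1 key = input.createKey();
      V1 value = input.createValue();
      while (input.next(key, value)) {             // next <key, value> from the split
        mapper.map(key, value, output, reporter);  // user-defined map()
      }
    } finally {
      mapper.close();                              // let the Mapper release resources
    }
  }
}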
2.4 OutputCollector
The role of the OutputCollector is to collect the new key/value pairs obtained from each call to map and to spill them to a file or keep them in memory for further processing such as sorting and combining.
MapOutputCollector has two subclasses: MapOutputBuffer and DirectMapOutputCollector. DirectMapOutputCollector is used when no reduce phase is needed. If the Mapper is followed by a reduce task, the system uses MapOutputBuffer as the output: MapOutputBuffer uses a buffer to cache the map's results in memory, and uses several arrays to manage that buffer.
At the appropriate time, the data in the buffer is spilled to disk. The data is written to disk at the following times:
(1) When the memory buffer cannot hold the next, overly large key/value pair: the spillSingleRecord method.
(2) When the memory buffer is full: the SpillThread thread.
(3) When the Mapper's results have all been collected and the buffer needs a final cleanup: the flush method.
2.5 The SpillThread thread: spilling the buffered data to disk
(1) When a spill is needed, the sortAndSpill function is called, which sorts by partition and key. The default sort is quicksort.
(2) If there is no combiner, the records are output directly; otherwise CombinerRunner's combine is called to combine first and then output.
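The behaviour of sortAndSpill can be sketched as follows. Note that this is only a conceptual illustration with stub types: the real MapOutputBuffer sorts index arrays over a byte buffer rather than object lists:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sort the buffered records by (partition, key), then write each partition to the spill
// file, running the combiner on it first if one is configured.
class SortAndSpillSketch {
  static class Record {
    final int partition; final String key; final String value;
    Record(int partition, String key, String value) {
      this.partition = partition; this.key = key; this.value = value;
    }
  }
  interface Combiner { List<Record> combine(List<Record> partitionRecords); }
  interface SpillWriter { void append(Record r); }

  void sortAndSpill(List<Record> buffer, int numPartitions,
                    Combiner combiner, SpillWriter writer) {
    // sort by partition first, then by key within each partition
    Collections.sort(buffer, new Comparator<Record>() {
      public int compare(Record a, Record b) {
        int c = Integer.compare(a.partition, b.partition);
        return c != 0 ? c : a.key.compareTo(b.key);
      }
    });
    for (int p = 0; p < numPartitions; p++) {
      List<Record> partition = new ArrayList<Record>();
      for (Record r : buffer) {
        if (r.partition == p) partition.add(r);
      }
      List<Record> out = (combiner == null) ? partition : combiner.combine(partition);
      for (Record r : out) writer.append(r);   // one sorted segment per partition
    }
  }
}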
3 The child process executes ReduceTask
ReduceTask.run starts similarly to MapTask, including initialize() initialization, runJobCleanupTask(), runJobSetupTask() and runTaskCleanupTask(). The formal work then proceeds in three main steps: Copy, Sort, Reduce.
3.1 Copy
This step fetches the map output files from the servers that executed the individual map tasks. The copying is the responsibility of the ReduceTask.ReduceCopier class.
3.1.1 Class Diagram:
3.1.2 Process: starting from ReduceCopier.fetchOutputs
(1) Asking about tasks. This uses the GetMapEventsThread thread. The thread's run method repeatedly calls the getMapCompletionEvents method, which uses RPC to invoke getMapCompletionEvents of the TaskUmbilicalProtocol protocol; using its job ID, it asks its parent TaskTracker about the completion status of this job's map tasks (the TaskTracker in turn asks the JobTracker and then relays the answer). The call returns an array TaskCompletionEvent events[]; a TaskCompletionEvent contains information such as the task ID and the IP address.
(2) After the information about the servers that executed the relevant map tasks has been obtained, a MapOutputCopier thread is started to do the actual copy work; within a separate thread it is responsible for copying the files from one map task server. MapOutputCopier's run loop calls copyOutput, copyOutput calls getMapOutput, and the copy is done remotely over HTTP.
(3) The content fetched remotely by getMapOutput (it may of course also be local) exists as a MapOutput object; it can be kept serialized in memory or on disk, which is adjusted automatically according to memory usage.
(4) At the same time there are an in-memory merge thread, InMemFSMergeThread, and a file merge thread, LocalFSMerger, working in parallel. They merge-sort the downloaded files in advance (some may still be in memory; for simplicity both are referred to as files here) in order to save time later and reduce the number of input files, lightening the burden on the subsequent sort. InMemFSMergeThread's run loop calls doInMemMerge, which merges using the utility class Merger and, if a combiner is required, uses CombinerRunner.combine.
3.2 Sort
The sort work is effectively a continuation of the sorting described above. It is carried out after all the files have been copied, using the utility class Merger to merge all the files. After this process there is a new file that merges the output files of all the required map tasks, and the map output files copied from the other servers are deleted.
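The merge itself is a standard multi-way merge of already-sorted runs; conceptually it behaves like the sketch below (illustrative code, not the Merger class itself):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Every copied (already sorted) map output is an iterator of keys; a priority queue
// repeatedly yields the globally smallest head key, producing one fully sorted stream.
class MergeSketch {
  static class Run {
    final Iterator<String> it;
    String head;
    Run(Iterator<String> it) { this.it = it; this.head = it.next(); }
    boolean advance() {
      if (!it.hasNext()) return false;
      head = it.next();
      return true;
    }
  }

  static List<String> merge(List<Iterator<String>> sortedRuns) {
    PriorityQueue<Run> heap = new PriorityQueue<Run>((a, b) -> a.head.compareTo(b.head));
    for (Iterator<String> run : sortedRuns) {
      if (run.hasNext()) heap.add(new Run(run));
    }
    List<String> merged = new ArrayList<String>();
    while (!heap.isEmpty()) {
      Run r = heap.poll();
      merged.add(r.head);              // smallest remaining key across all runs
      if (r.advance()) heap.add(r);    // refill from the same run if it is not exhausted
    }
    return merged;
  }
}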
3.3 Reduce
The final phase of the reduce task. It prepares the keyClass ("mapred.output.key.class" or "mapred.mapoutput.key.class"), the valueClass ("mapred.mapoutput.value.class" or "mapred.output.value.class") and the comparator ("mapred.output.value.groupfn.class" or "mapred.output.key.comparator.class"), and finally calls the runOldReducer method. (There are again two sets of APIs; we analyze runOldReducer.)
3.3.1 runOldReducer
(1) Output. It prepares an OutputCollector to collect the output. Unlike in MapTask, this OutputCollector is simpler: it just opens a RecordWriter, and every collect is a write. The biggest difference is that the file system passed to the RecordWriter is basically the distributed file system, i.e., HDFS.
(2) Input. ReduceTask uses the prepared keyClass, valueClass, keyComparator and other custom classes to construct the key type needed by the Reducer and the iterator type for the values (here one key usually corresponds to a set of values).
(3) With the input and output in place, the user-defined Reducer is called in a loop, and with that the Reduce phase is complete.
MapReduce Source Code Analysis Summary