MapReduce Source Code Walkthrough

Source: Internet
Author: User

1. Steps to implement custom partitioning:
1.1 Analyze the business logic first and decide how many partitions are needed.
1.2 Write a class that extends org.apache.hadoop.mapreduce.Partitioner.
1.3 Override public int getPartition(...). Following the business logic (for example, by reading a database or a configuration), return the partition number for each record.
1.4 In the main method, register the partitioner: job.setPartitionerClass(DataPartitioner.class);
1.5 Set the number of reducers: job.setNumReduceTasks(6); every partition number that getPartition can return needs its own reduce task, so this value must be greater than the largest partition number.
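
The steps above can be sketched without a Hadoop dependency. The block declares a local stand-in with the same abstract getPartition(KEY, VALUE, int) method as org.apache.hadoop.mapreduce.Partitioner, plus a HashPartitioner that uses the same sign-masking formula as Hadoop's default partitioner. DataPartitioner's phone-prefix rule and its prefix table (standing in for the database or configuration lookup) are hypothetical examples of "specific business logic".

```java
import java.util.Map;

// Stand-in for org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>: same abstract
// method, declared locally so this sketch compiles without Hadoop on the classpath.
abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Same formula as Hadoop's default HashPartitioner: mask the sign bit so that a
// negative hashCode() still yields a partition in [0, numPartitions).
class HashPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE> {
    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Hypothetical business rule: K2 is a phone number, V2 is a traffic count, and
// records are routed by the phone number's three-digit prefix.
class DataPartitioner extends Partitioner<String, Long> {
    // Hypothetical stand-in for "read the database or configuration": prefix -> partition.
    private static final Map<String, Integer> PREFIX_TO_PARTITION = Map.of(
            "135", 1, "136", 1,
            "150", 2, "151", 2,
            "182", 3, "183", 3);

    @Override
    public int getPartition(String phone, Long traffic, int numPartitions) {
        String prefix = phone.length() >= 3 ? phone.substring(0, 3) : phone;
        // Prefixes missing from the table go to partition 0.
        int partition = PREFIX_TO_PARTITION.getOrDefault(prefix, 0);
        // Every partition number needs a reduce task; Hadoop also rejects
        // out-of-range values at runtime, so fail early with a clear message.
        if (partition >= numPartitions) {
            throw new IllegalArgumentException("partition " + partition
                    + " needs at least " + (partition + 1) + " reduce tasks");
        }
        return partition;
    }
}
```

In a real job, DataPartitioner would extend Hadoop's Partitioner instead of the stand-in and be registered with the calls from steps 1.4 and 1.5. With job.setNumReduceTasks(6), partitions 4 and 5 receive no records, so their reducers simply write empty output.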

2. Sorting: by default, MapReduce sorts the map output by K2. To customize the ordering, make the key class implement the WritableComparable interface, define the ordering in its compareTo method, and use that class as K2; the framework then sorts the keys using that rule.
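
A custom sort key can be sketched in plain Java. InfoBean, its fields, and the ordering rule (income descending, then expense ascending) are hypothetical; in Hadoop the class would implement org.apache.hadoop.io.WritableComparable<InfoBean>, whose contract is the write/readFields pair plus compareTo shown here. Sorting a list with Collections.sort stands in for the framework's sort of K2 during the shuffle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical K2 type. In Hadoop it would implement WritableComparable<InfoBean>
// (write/readFields for serialization plus compareTo for ordering); here it
// implements only java.lang.Comparable so the sketch needs no Hadoop jar.
class InfoBean implements Comparable<InfoBean> {
    String account;
    double income;
    double expense;

    // The framework creates keys reflectively, so a no-arg constructor is required.
    InfoBean() {}

    InfoBean(String account, double income, double expense) {
        this.account = account;
        this.income = income;
        this.expense = expense;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(account);
        out.writeDouble(income);
        out.writeDouble(expense);
    }

    public void readFields(DataInput in) throws IOException {
        account = in.readUTF();
        income = in.readDouble();
        expense = in.readDouble();
    }

    // Custom ordering: higher income first; ties broken by lower expense.
    @Override
    public int compareTo(InfoBean other) {
        int byIncome = Double.compare(other.income, this.income);
        if (byIncome != 0) {
            return byIncome;
        }
        return Double.compare(this.expense, other.expense);
    }

    // Serialize and deserialize, as happens when a key moves from map to reduce.
    InfoBean roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes));
            InfoBean copy = new InfoBean();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public String toString() {
        return account + "\t" + income + "\t" + expense;
    }
}
```

Unless a separate comparator is configured, compareTo also drives grouping on the reduce side: keys that compare as equal reach the same reduce call, so the ordering should distinguish every key that must stay separate.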

3. The role of the combiner is to merge map output on the map side, which reduces the amount of data transferred to the reducers.
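
A plain-Java word count shows what the combiner saves. CombinerDemo and its map/combine methods are hypothetical stand-ins for a mapper and a combiner class; in Hadoop the combiner is registered with job.setCombinerClass(...), and for word count the reducer class is often reused as the combiner.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Word count, modeled without Hadoop, to show what a combiner saves.
class CombinerDemo {
    // Map phase: emit (word, 1) for every token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Combiner: the reducer's summing logic applied locally to one map task's
    // output, so only one (word, partialCount) pair per word leaves the mapper.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            sums.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

For the split "a b a c a b", map emits six (word, 1) records and the combiner shrinks them to three: (a, 3), (b, 2), (c, 1). Reusing the summing logic is safe because addition is associative and commutative; Hadoop may run a combiner zero, one, or several times, so a combiner must not change the final result in any of those cases.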

4. MapReduce startup process (MRv1, JobTracker/TaskTracker)

start-mapred.sh --> hadoop-daemon.sh --> hadoop org.apache.hadoop.mapred.JobTracker

JobTracker call order:
main --> startTracker --> new JobTracker
  The constructor first creates the TaskScheduler, then an RPC server (interTrackerServer); TaskTrackers communicate with the JobTracker through this RPC mechanism.
--> offerService: serves requests externally; it starts the RPC server, initializes the JobTracker, and calls the TaskScheduler's start method
--> EagerTaskInitializationListener.start: starts the JobInitManager thread, so JobInitManager.run executes
--> JobInitManager.run: takes the first job from the jobInitQueue and hands it to a thread pool
--> InitJob.run --> JobTracker.initJob --> JobInProgress.initTasks, which allocates maps = new TaskInProgress[numMapTasks] and reduces = new TaskInProgress[numReduceTasks]
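
The job-initialization part of this chain is a producer/consumer hand-off: a queue feeding a manager thread that submits work to a thread pool. The sketch below models only that hand-off in plain Java; the *Sketch classes and their fields are hypothetical simplifications named after the Hadoop 1.x classes, not the real code, and the scheduler and RPC server are omitted.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical simplified stand-ins named after Hadoop 1.x classes.
class TaskInProgressSketch {
    final boolean isMap;
    final int partition;

    TaskInProgressSketch(boolean isMap, int partition) {
        this.isMap = isMap;
        this.partition = partition;
    }
}

class JobInProgressSketch {
    final int numMapTasks;
    final int numReduceTasks;
    TaskInProgressSketch[] maps;
    TaskInProgressSketch[] reduces;
    private final CountDownLatch initialized = new CountDownLatch(1);

    JobInProgressSketch(int numMapTasks, int numReduceTasks) {
        this.numMapTasks = numMapTasks;
        this.numReduceTasks = numReduceTasks;
    }

    // Counterpart of JobInProgress.initTasks(): one TaskInProgress per map and per reduce.
    void initTasks() {
        maps = new TaskInProgressSketch[numMapTasks];
        for (int i = 0; i < numMapTasks; i++) {
            maps[i] = new TaskInProgressSketch(true, i);
        }
        reduces = new TaskInProgressSketch[numReduceTasks];
        for (int i = 0; i < numReduceTasks; i++) {
            reduces[i] = new TaskInProgressSketch(false, i);
        }
        initialized.countDown(); // publishes maps/reduces to threads blocked in await()
    }

    boolean awaitInit(long timeoutMillis) {
        try {
            return initialized.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Counterpart of EagerTaskInitializationListener: a JobInitManager thread takes
// jobs off jobInitQueue and submits each one to a thread pool (the InitJob step).
class EagerInitSketch {
    private final BlockingQueue<JobInProgressSketch> jobInitQueue = new LinkedBlockingQueue<>();
    private final ExecutorService threadPool = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "InitJob");
        t.setDaemon(true);
        return t;
    });
    private final Thread jobInitManager = new Thread(this::runJobInitManager, "JobInitManager");

    void start() {
        jobInitManager.setDaemon(true);
        jobInitManager.start();
    }

    // Called when a job is submitted.
    void jobAdded(JobInProgressSketch job) {
        jobInitQueue.add(job);
    }

    // Counterpart of JobInitManager.run(): block for the next job, hand it to the pool.
    private void runJobInitManager() {
        try {
            while (true) {
                JobInProgressSketch job = jobInitQueue.take();
                // InitJob.run() -> JobTracker.initJob() -> JobInProgress.initTasks()
                threadPool.execute(job::initTasks);
            }
        } catch (InterruptedException e) {
            // terminate() interrupts the manager thread; leave the loop.
        }
    }

    // Stop the manager first so it cannot submit to an already shut-down pool.
    void terminate() {
        jobInitManager.interrupt();
        try {
            jobInitManager.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        threadPool.shutdown();
    }
}
```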

TaskTracker call order:
main --> new TaskTracker: the constructor calls initialize
--> initialize: calls RPC.waitForProxy to obtain a proxy object for the JobTracker
--> TaskTracker.run --> offerService
--> transmitHeartBeat: through the RPC mechanism, InterTrackerProtocol.heartbeat sends the TaskTracker's status to the JobTracker; the return value, a HeartbeatResponse, carries the JobTracker's instructions
--> heartbeatResponse.getActions(): returns the individual instructions; the TaskTracker checks each instruction's type, and launch-task instructions go to addToTaskQueue, which puts the task on the TaskLauncher's queue
--> TaskLauncher.run --> startNewTask
--> localizeJob: downloads the job's resources
--> launchTaskForJob --> launchTask --> runner.start(): starts the TaskRunner thread
--> TaskRunner.run --> launchJvmAndWait: starts the child Java process that runs the task
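
The TaskTracker side follows a similar pattern: the heartbeat response is a list of typed instructions, and launch instructions are queued for a separate TaskLauncher thread. The sketch below models that dispatch in plain Java. LaunchAction, KillAction, TaskLauncherSketch, HeartbeatDispatchSketch, and the stop marker are hypothetical simplifications; the real TaskTracker runs separate map and reduce launchers with slot accounting, and its startNewTask actually localizes the job and starts a child JVM.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-ins for Hadoop 1.x TaskTrackerAction subclasses
// (LaunchTaskAction, KillTaskAction); only a task id is carried here.
abstract class TrackerAction {
    final String taskId;
    TrackerAction(String taskId) { this.taskId = taskId; }
}

class LaunchAction extends TrackerAction {
    LaunchAction(String taskId) { super(taskId); }
}

class KillAction extends TrackerAction {
    KillAction(String taskId) { super(taskId); }
}

// Counterpart of TaskTracker.TaskLauncher: a thread that drains a task queue.
class TaskLauncherSketch extends Thread {
    private static final String STOP = "<stop>"; // hypothetical shutdown marker
    private final BlockingQueue<String> tasksToLaunch = new LinkedBlockingQueue<>();
    private final List<String> started = Collections.synchronizedList(new ArrayList<>());

    TaskLauncherSketch() {
        super("TaskLauncher");
        setDaemon(true);
    }

    void addToTaskQueue(String taskId) {
        tasksToLaunch.add(taskId);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String taskId = tasksToLaunch.take();
                if (STOP.equals(taskId)) {
                    return;
                }
                startNewTask(taskId);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // In Hadoop 1.x: localizeJob -> launchTaskForJob -> launchTask ->
    // TaskRunner.start() -> launchJvmAndWait. Here it only records the id.
    private void startNewTask(String taskId) {
        started.add(taskId);
    }

    // Enqueue the stop marker behind pending tasks, then wait for the thread to exit.
    void shutdownAndWait() {
        tasksToLaunch.add(STOP);
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> startedTasks() {
        synchronized (started) {
            return new ArrayList<>(started);
        }
    }
}

class HeartbeatDispatchSketch {
    // Counterpart of the action loop in offerService(): check the type of each action
    // from getActions(); launches go to the launcher's queue. Kill actions are only
    // collected here (the real TaskTracker queues them separately for cleanup).
    static List<String> dispatch(TrackerAction[] actions, TaskLauncherSketch launcher) {
        List<String> killed = new ArrayList<>();
        for (TrackerAction action : actions) {
            if (action instanceof LaunchAction) {
                launcher.addToTaskQueue(action.taskId);
            } else if (action instanceof KillAction) {
                killed.add(action.taskId);
            }
        }
        return killed;
    }
}
```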
