Introduction to Hadoop 2.2.0 pseudo-distributed MapReduce


I. Concept
MapReduce is a distributed computing model. Note: in Hadoop 2.x, MapReduce runs on YARN, and YARN supports a variety of computing frameworks, such as Storm and Spark; any program that runs on the JVM can run on YARN. A MapReduce job has two phases, map and reduce, and users only need to implement the two functions map() and reduce() (the inputs and outputs of both functions are key-value pairs) to obtain distributed computation. A code sketch is shown below.
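The original article omits the code sample; as an illustration only, here is a minimal WordCount-style sketch of the two user-supplied functions (the class and variable names are mine, not from the article):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // map(): key is the byte offset of the line's start, value is the line's contents.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);   // emit <word, 1>
            }
        }
    }

    // reduce(): receives <word, [1, 1, ...]> after the shuffle groups values by key.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);     // emit <word, total count>
        }
    }
}
```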

MapReduce framework design: in 1.x the manager (master) is the JobTracker and the workers are the TaskTrackers; in 2.x the manager is the ResourceManager and the workers are the NodeManagers. From 2.x onward, MapReduce runs on YARN.
MapReduce on YARN involves more entities than classic MapReduce, mainly the following:
      • The client, which submits the MapReduce job
      • The YARN ResourceManager, which coordinates the allocation of compute resources on the cluster
      • The YARN NodeManagers, which launch and monitor the compute containers (Containers) on machines in the cluster
      • The MapReduce application master, which coordinates the tasks running the MapReduce job
      • The distributed file system HDFS, which is used to share job files between the other entities
II. Implementation of the MapReduce process:
The input and output live in HDFS: the source data is read from HDFS, and the result is also stored back in HDFS.
There are three main stages: map → shuffle → reduce. Partitioning, sorting, and grouping (grouping by key) are done in the shuffle, and the reducer fetches its data from the shuffle. The input and output of both map and reduce are key-value pairs. (A driver sketch showing where these hooks are configured follows this list.)
1. Map task execution steps (brief):
   1) Read the contents of the input file and parse them into key-value pairs: one pair per line of input, where the key is the byte offset of the start of the line and the value is the contents of the line.
   2) Call the map function once for each key-value pair.
   3) Apply the user's own logic to the input key and value and transform them into new key-value output.
   4) Partition the output keys and values.
   5) Within each partition, sort and group the data by key; the values for the same key are placed into one collection.
   6) (Optional) Run a combiner over the grouped data.
2. Reduce task execution steps (brief):
   1) The outputs of multiple map tasks are copied over the network to the corresponding reduce nodes, according to partition.
   2) The outputs of the multiple map tasks are merged and sorted; the user's own reduce logic then transforms the input key and values into new key-value output.
   3) Save the reduce output to a file.
3. Shuffle implementation process (overview): covered at the end of this article (see Attachment 1) because of its length.
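As referenced above, here is a hedged driver sketch, building on the WordCount classes from the earlier sketch, showing where the map, combine, partition, and reduce hooks are configured (class names, paths, and the reducer count are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCount.TokenizerMapper.class);   // map logic (steps 2-3)
        job.setCombinerClass(WordCount.IntSumReducer.class);   // optional combiner (step 6)
        job.setPartitionerClass(HashPartitioner.class);        // partitioning (step 4), default anyway
        job.setReducerClass(WordCount.IntSumReducer.class);    // reduce logic
        job.setNumReduceTasks(3);                              // number of partitions/reducers

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output written back to HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```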
III. The MapReduce process in Hadoop 1.0 (there are some differences in 2.0):
1. After the jar package is written, the client runs a hadoop jar command to execute the main method of the MapReduce job. This main method constructs a Job object, which internally holds a JobClient; the JobClient holds a proxy object for the ResourceManager, so the JobClient can communicate with the ResourceManager. After this exchange, the ResourceManager gives the JobClient a job ID and a (relatively fixed) path under which to store the jar package.
2. The client concatenates the jar path given by the ResourceManager as a prefix and the job ID as a suffix into a unique path.
3. Based on the concatenated path, the client writes the job jar package into the HDFS file system. The client uses a FileSystem object (a utility class provided by Hadoop) to write the jar to HDFS; by default 10 copies of the jar are written (controlled by the mapreduce.client.submit.file.replication parameter in mapred-default.xml, shipped inside hadoop-mapreduce-client-core-2.2.0.jar), which makes it more efficient for the NodeManagers to read the jar from HDFS (other data is written with 3 replicas by default). After the program has run, the jar package is deleted. (A small FileSystem sketch illustrating this step follows section IV below.)
4. The client submits the job information, including the job ID, the HDFS location of the submission, and some configuration, as RPC method parameters to the ResourceManager, so the ResourceManager now has a description of the job.
5. The ResourceManager takes the job description and initializes that information into its scheduler (Hadoop ships with several schedulers).
6. The ResourceManager looks at how large the data (the compute resources) is, decides how many mappers and how many reducers to start, and places the task in the scheduler.
7. The worker NodeManagers can then pick up tasks through the heartbeat mechanism.
8. After picking up a task, a NodeManager downloads the task's jar package.
9. After downloading the jar, the NodeManager starts a Java child process, YarnChild (the mapper task or reduce task runs inside this child process). In the child process the mapper parses its data into key-value pairs, which are then passed on to the reduce side.
10. When the reduce computation is complete, the data is written back to HDFS.

IV. MapReduce in Hadoop 2.x, an introduction to each part:
      • ResourceManager (RM): the resource manager, which manages resource use across the cluster.
      • Application master (AM): manages the lifecycle of an application on the cluster; the application master negotiates compute resources for the application with the resource manager RM.
      • Container: each container has a specific memory upper limit; the processes of a particular application run inside these containers, and the containers are monitored by the NodeManager on the cluster node to ensure the application does not use more resources than were allocated to it.
      • NodeManager: manages the resources and tasks on each node. It has two main roles: regularly reporting the node's resource usage and the running state of each container to the RM, and receiving and handling requests from the AM to start or stop tasks.
Each submitted MapReduce job has a dedicated application master, which runs for as long as the application runs; the MapReduce tasks run in task containers (Containers), which are allocated by the resource manager (ResourceManager) and managed by the node manager (NodeManager).
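As an aside on step 3 of the Hadoop 1.0 flow above, here is a hedged sketch of staging a jar into HDFS with a higher replication factor through the FileSystem utility class. The paths are hypothetical; the real submission logic lives inside the Hadoop client itself.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StageJobJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default is 10 in mapred-default.xml; set here only to make the knob visible.
        conf.setInt("mapreduce.client.submit.file.replication", 10);

        FileSystem fs = FileSystem.get(conf);
        Path localJar = new Path("file:///tmp/myjob.jar");            // hypothetical local jar
        Path staging = new Path("/tmp/hadoop-yarn/staging/job_0001"); // hypothetical staging dir

        fs.mkdirs(staging);
        Path target = new Path(staging, "job.jar");
        // Copy the jar into HDFS, then raise its replication so that many
        // NodeManagers can fetch it in parallel.
        fs.copyFromLocalFile(false, true, localJar, target);
        short replication = (short) conf.getInt("mapreduce.client.submit.file.replication", 10);
        fs.setReplication(target, replication);
    }
}
```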
V. Hadoop 2.0 execution process:
First, job submission (Step 1 - Step 4). The job's submit() method creates an internal JobSubmitter instance and calls its submitJobInternal(). After submitting the job, waitForCompletion() polls the job's progress once per second and, if it finds a change since the last report, reports the progress to the console; when the job completes successfully the job counters are displayed, otherwise the error that caused the failure is logged to the console. The submission itself works as follows:
      • Ask the resource manager for a new job ID (in YARN nomenclature, an application ID). (Step 2)
      • Check the job's output specification: for example, if no output directory is specified or the specified output directory already exists, the job is not submitted and the error is returned to the MapReduce program.
      • Compute the job's input splits; if they cannot be computed, for example because the input path does not exist, the job is not submitted and the error is returned to the MapReduce program.
      • Copy the resources needed to run the job, including the job jar, the configuration, and the computed input split information, to HDFS (the jar is stored with 10 replicas by default). (Step 3)
      • Finally, submit the job by calling the submitApplication() method on the resource manager. (Step 4)
Second, job initialization (Step 5 - Step 7).
      • When the resource manager receives the call to submitApplication(), it hands the request to the YARN scheduler; the scheduler allocates a container, and the resource manager then launches the application master process in that container, under the node manager's management.
      • The application master of a MapReduce job is a Java application whose main class is MRAppMaster. It initializes the job by creating a number of bookkeeping objects to keep track of the job's progress, because it will receive progress and completion reports from the tasks. (Step 6)
      • Next, the application master retrieves the input splits computed by the client from the shared file system HDFS, and creates one map task object per split plus a number of reduce task objects determined by the mapreduce.job.reduces property. (Step 7)
      • The application master decides how to run the tasks that make up the MapReduce job: if the job is small, it may choose to run the tasks in the same JVM as itself. Such a job is said to be uberized, or run as an uber task. Specifically, a small job is one that has fewer than 10 mappers, only one reducer, and an input size smaller than a single HDFS block.
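The uber-task thresholds are configurable. A hedged sketch, assuming the property names found in Hadoop 2.x's mapred-default.xml (mapreduce.job.ubertask.*); the values shown are only an example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Allow small jobs to run "uberized" inside the application master's JVM.
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        // Thresholds that define a "small" job (matching the defaults described above).
        conf.setInt("mapreduce.job.ubertask.maxmaps", 9);
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
        // mapreduce.job.ubertask.maxbytes defaults to the HDFS block size when unset.

        Job job = Job.getInstance(conf, "small job");
        // ... normal job setup (mapper, reducer, input/output paths) would follow ...
    }
}
```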
Third, task assignment (Step 8).
      • If the job does not qualify to run as an uber task, the application master requests containers for all of the job's map tasks and reduce tasks from the ResourceManager. (Step 8)
      • The requests are piggybacked on heartbeat calls and include, for each map task, its data-locality information, in particular the hosts and corresponding racks of its input split. The scheduler uses this information to make scheduling decisions and replies through the heartbeat's return value; ideally a task is assigned to a data-local node, but if that is not possible the scheduler prefers a rack-local allocation over a non-local one.
Fourth, task execution (Step 9 - Step 11).
      • Once the ResourceManager's scheduler has assigned a container to a task, the application master starts the container by contacting the NodeManager. (Step 9a, Step 9b)
      • The task is run by a Java application whose main class is YarnChild. Before it runs the task, it localizes the resources the task needs, including the job configuration, the jar file, and any files from the distributed cache in HDFS. (Step 10)
      • Finally, it runs the map task or the reduce task. (Step 11)
Fifth, progress and status updates.
      • When running under YARN, a task reports its progress and status (including counters) to its application master every three seconds over the umbilical interface, and the application master builds up an aggregate view of the job. The client polls the application master every second (the interval can be set) for progress updates, which are usually displayed to the user.

Sixth, job completion.
      • In addition to polling the application master for progress, the client checks every 5 seconds whether the job has completed by calling waitForCompletion(); the polling interval can be set through a configuration property.
      • When the job is finished, the application master and the task containers clean up their working state, the OutputCommitter's job cleanup method is called, and the job history server archives the job information so the user can query it later when needed.
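As a hedged sketch of the client-side polling described above, using only the public Job API (the printed output and the one-second interval are illustrative; waitForCompletion(true) does roughly this for you):

```java
import org.apache.hadoop.mapreduce.Job;

public class ProgressPoller {
    // Submits the job asynchronously and polls its progress from the client side.
    public static boolean runWithPolling(Job job) throws Exception {
        job.submit();                                  // non-blocking submission
        while (!job.isComplete()) {
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(1000);                        // client poll interval (illustrative)
        }
        return job.isSuccessful();
    }
}
```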

Attachment 1: the shuffle process
Mapper stage:
1. One input split corresponds to one mapper. Each mapper has a circular memory buffer (a region of memory, 100 MB by default) that stores the mapper's output. When the buffer is 80% full, a background thread starts spilling its contents to disk (note that the mapper keeps writing output into the buffer in the meantime); if the buffer fills to 100% before the spill finishes, the map blocks until the write to disk completes.
2. Before the buffer's contents are written to disk, the data is partitioned. Without a custom partitioner class (partitioners are not discussed further here), the default HashPartitioner is used; its getPartition() returns (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, where numReduceTasks is the number of reducers and can be set in the program itself with job.setNumReduceTasks(reducerNumber). This does not guarantee that data is handed to the reducers evenly. (In this example three reducers are launched.)
3. After partitioning, each record carries a partition number; records of the same partition are grouped together by partition number and written to a small file (a spill file). A background thread then sorts the data within each partition according to the rules of the key type KEY2: if KEY2 is Text, in dictionary order; if KEY2 is LongWritable, in natural numeric order; a custom key type must implement WritableComparable so that it is sortable. Because the mapper writes into the memory buffer faster than the buffer can be spilled to disk, the mapper blocks whenever the buffer is full until the current contents have been written to a spill file on disk; the mapper then resumes writing into the buffer, and when the buffer again reaches 80% another thread spills it to another small file. In other words, every time the buffer reaches the spill threshold, a new spill file is created. Suppose three spill files are produced in total (other mappers may produce more); each spill file is partitioned and ordered by partition number, each has three partitions, and the data inside each partition is sorted by KEY2.
4. Combiner: before writing to disk, if a combiner is configured, it runs on the sorted output, making the map output more compact so that less data is written to disk and less data is passed to the reducers.
5. Finally, the small spill files are merged into one large file. Because merging disturbs the order, the large file again groups the data by partition number, merges the data of the same partition, and sorts each partition by KEY2, so the final large file is also partitioned and sorted. The mapper then reports to its superior, and at this point the map task is complete.
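The default partitioning logic quoted in step 2, written out as a stand-alone Partitioner purely for illustration (it mirrors HashPartitioner; a custom one like this would be registered with job.setPartitionerClass(...)):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Same logic as the default HashPartitioner, spelled out for illustration.
public class HashLikePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Mask off the sign bit so the result is non-negative, then bucket by reducer count.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```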
Reducer stage: the reducer actively fetches data from the mapper side (how it fetches, and from whom, is described below); the reducer pulls its partition of each mapper's output file over HTTP. Three reducers are started (only reducer number 0 is shown in the figure), so reducer 0 fetches the data of partition 0 from the large file produced by every mapper (4 mappers in this example, which is why "other maps" is 3 and the number of merges on the reducer side is 4), reducer 1 fetches partition 1 from all the large files, and reducer 2 fetches partition 2 from all the large files. (Question: since many machines in the cluster run many mappers, and each mapper produces a large file, how does the reducer know where to fetch the file data? See below.) Reducer 0 obtains its partition data from the large files of the 4 maps (the 4 merges in the figure), merges them into one large sorted file, and hands the data to the reduce computation; the reducer finally writes its result back to HDFS.
How the reducer gets its data, 1.0 version: in Hadoop 1.0, the MapReduce manager, the JobTracker, has to monitor all tasks and allocate resources, i.e., it decides which machine each mapper runs on. For example, if 1,000 MapReduce jobs are submitted, all of them are managed by that one JobTracker. In 2.0 this function is split: resource allocation is handed to the ResourceManager and task monitoring to the ApplicationMaster, so if you submit 1,000 MapReduce jobs, each one gets its own ApplicationMaster. The worker process in 1.0 is called the TaskTracker; the TaskTracker picks up tasks from the JobTracker through the heartbeat mechanism, and after picking up a task it starts a Java child process named Child (not YarnChild). One machine can run multiple Child processes, and a mapper or a reducer runs inside each Child.
For a Child process that runs a mapper: when the mapper inside the Child finishes, the data it produced remains on that machine's disk, and the mapping information describing it (the size of the mapper output, its storage location, the exact location of each partition, and so on) is reported to its superior, the TaskTracker, which in turn reports it to the JobTracker. The other Children running mappers also report the mapping information for their output upward, so eventually the JobTracker holds the mapping information for the data produced by all the mappers. (The communication between them uses RPC.)
Fetching data for the reducer: a background thread in the Child running the reducer keeps asking the JobTracker (through its own TaskTracker, to improve efficiency) via the RPC mechanism for the mapping of the mapper output data, to learn which disks to download the data from; the download itself uses the HTTP protocol. Because a reducer may fail, the TaskTracker does not delete the map output from disk as soon as the first reducer has retrieved it. Instead the TaskTracker waits until the reducers report completion to the JobTracker and the JobTracker notifies the TaskTrackers that they can delete the map output, which happens after the job is complete.
How the reducer gets its data, 2.x version: 2.0 provides the YARN platform, and anything that follows YARN's rules can run on YARN. YARN's manager is the ResourceManager and its workers are the NodeManagers. A machine running a NodeManager starts a YarnChild process when it receives a task (multiple YarnChild processes can be started on one NodeManager node), but the NodeManager is not responsible for managing YarnChild; YarnChild is managed by the MRAppMaster process, and each MapReduce job has its own MRAppMaster process. The NodeManager is only responsible for the state of the current node (the machine it runs on), such as memory usage and CPU usage.
Assume three machines run NodeManagers and the data volume is large, so three YarnChild processes need to be started (one YarnChild on each machine; if only one machine runs a NodeManager, all three YarnChild processes start on that machine). Two of them run mappers and one runs a reducer. The MRAppMaster is assigned to one of the three nodes, and only that machine runs the MRAppMaster process, which monitors the YarnChild processes that belong to the same MapReduce job. Because the machines are different, the MRAppMaster monitors the YarnChild processes on the other machines through RPC. When a mapper finishes running, the YarnChild leaves the output data on the local disk and reports the mapping information for that data to the MRAppMaster, while the YarnChild running the reducer queries the MRAppMaster for that mapping information.
Attachment 2: the number of mappers started. The key is determining the number of mappers, i.e., the number of input splits. After a file is uploaded to HDFS, it is first physically divided into blocks for storage (the default block size is 128 MB in 2.x and 64 MB in 1.x); the blocks are then logically divided into splits. By default the split size equals the block size (128 MB). To make a split smaller than a block, lower maxSize; to make a split larger than a block, raise minSize. Making one split equal to one block helps efficiency, because the mapper can then read its data directly from the DataNode on its own machine. For example, if the current MapReduce input consists of two files, a.txt of 130 MB and b.txt of 2 KB, then 3 mappers are started: a.txt gets two and b.txt gets one.
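A hedged sketch of nudging the split size above or below the block size, using the standard FileInputFormat helpers (the sizes chosen are illustrative):

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeExample {
    // Illustrative only: shrink or grow logical splits relative to the HDFS block size.
    public static void configureSplits(Job job) {
        long mb = 1024L * 1024L;
        // Cap splits at 64 MB: more mappers than one per block.
        FileInputFormat.setMaxInputSplitSize(job, 64 * mb);
        // Or force splits of at least 256 MB: fewer mappers, each spanning multiple blocks.
        // FileInputFormat.setMinInputSplitSize(job, 256 * mb);
    }
}
```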
