First, the basic concepts. In MapReduce, an application submitted for execution is called a job, and a unit of work carved out of a job to run on a compute node is called a task. In addition, the Hadoop Distributed File System (HDFS) handles data storage on each node and provides high-throughput reads and writes. Hadoop uses a master/slave architecture for both distributed storage and distributed computation. On a fully configured cluster ...
Hadoop version: 1.0.3. Problem description: as the number of daily MR jobs grew, users were often blocked when submitting jobs, which means the JobTracker was congested. Once this began to happen frequently, we increased the number of RPC handler threads on the JobTracker side and periodically analyzed the JobTracker's stack dumps; if the RPC handler thr ...
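For reference, the handler count in question is the mapred.job.tracker.handler.count property (default 10 in Hadoop 1.x). A minimal sketch of raising it, assuming the value is then placed in mapred-site.xml on the JobTracker host and the daemon restarted; the number 40 is only an illustration:

    import org.apache.hadoop.mapred.JobConf;

    public class HandlerCountCheck {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // In practice this belongs in mapred-site.xml on the JobTracker host;
            // setting it programmatically here only affects this process's view.
            conf.setInt("mapred.job.tracker.handler.count", 40);
            System.out.println(conf.getInt("mapred.job.tracker.handler.count", 10));
        }
    }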
Analysis of the JobTracker restart job-recovery process
1. Configuration items related to job recovery
Configuration item: mapred.jobtracker.restart.recover
Default value: false
Description: If true, jobs that were running before the JobTracker restart are recovered after the restart; if false, those jobs must be re-run.
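A minimal sketch of flipping this switch programmatically; in a real cluster the property belongs in mapred-site.xml on the JobTracker host, and the daemon must be restarted for it to take effect:

    import org.apache.hadoop.mapred.JobConf;

    public class RecoverFlag {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Recover jobs that were running when the JobTracker went down,
            // instead of forcing users to resubmit them.
            conf.setBoolean("mapred.jobtracker.restart.recover", true);
            System.out.println(conf.getBoolean("mapred.jobtracker.restart.recover", false));
        }
    }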
Check the hadoop-root-jobtracker log file in the logs directory.
2014-02-26 19:56:06,782 FATAL org.apache.hadoop.mapred.JobTracker: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: local
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:130)
    at org.apache.hadoop.mapred. ...
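The exception above is NetUtils rejecting the value "local", which is the LocalJobRunner marker rather than a host:port pair; a JobTracker daemon needs mapred.job.tracker set to a host:port value. A small sketch reproducing both sides (hadoop2:9001 is an assumed address):

    import org.apache.hadoop.net.NetUtils;

    public class AuthorityCheck {
        public static void main(String[] args) {
            try {
                // Fails exactly like the FATAL log entry: "local" has no port.
                NetUtils.createSocketAddr("local");
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());
            }
            // A proper host:port authority parses cleanly.
            System.out.println(NetUtils.createSocketAddr("hadoop2:9001"));
        }
    }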
Hadoop's implementation of MapReduce is likewise based on the master/slave structure: the JobTracker plays the Master role, and the TaskTrackers play the slave role. The Master accepts jobs submitted by clients, dispatches each of a job's subtasks to run on the slaves, and monitors them, re-running any tasks that fail; the slaves simply execute the tasks they are handed. When Hadoop starts, ...
An overview:
(1) Hadoop MapReduce uses the master/slave structure. *master: the cluster's single global manager, whose functions include job management, state monitoring, and task scheduling; in MapReduce this is the JobTracker. *slave: responsible for executing tasks and reporting task status back; in MapReduce this is the TaskTracker.
Part 2: JobTracker analysis:
JobTracker and TaskTracker
JobTracker corresponds to the NameNode
TaskTracker corresponds to the DataNode
DataNode and NameNode handle data storage.
JobTracker and TaskTracker handle MapReduce execution.
Several key concepts in MapReduce. As a whole, MapReduce execution can be divided into a few threads of control:
JobClient, JobTrac ...
This afternoon, when a colleague submitted a query through Hive, an execution error was thrown:
Opening the JobTracker management page, I found that the number of running jobs was zero while the TaskTracker heartbeats were normal. This anomaly made me suspect the JobTracker had stopped serving (normally the number of jobs running in the cluster is never this small), so I manually submitted a mapred job to test,
JobClient, JobTracker, and TaskTracker. 1. The JobClient uses the JobClient class to package the application's configured parameters into a ja ...
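A minimal sketch of that client-side packaging-and-submission flow using the classic org.apache.hadoop.mapred API; the job name and paths are placeholders, and the default identity mapper/reducer are used to keep the example short:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitSketch {
        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(SubmitSketch.class); // ships this class's jar
            conf.setJobName("submit-sketch");
            FileInputFormat.setInputPaths(conf, new Path("/tmp/in"));
            FileOutputFormat.setOutputPath(conf, new Path("/tmp/out"));
            // JobClient packages the jar, configuration, and input splits,
            // hands them to the JobTracker, and polls until the job finishes.
            JobClient.runJob(conf);
        }
    }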
The org.apache.hadoop.mapred.JobTracker class is a separate process with its own main function. The JobTracker is the hub for submitting and running MR jobs in a networked environment.
The main method has two major lines of code:
Create the JobTracker object:
JobTracker tracker = startTracker(new JobConf());
Start each service, including some important se ...
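Condensed to its skeleton, the entry point looks roughly like the sketch below (a paraphrase of the two lines just mentioned, wrapped in a class so it compiles standalone):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobTracker;

    public class MainSketch {
        public static void main(String[] args) throws Exception {
            // 1. Build the JobTracker from the on-disk configuration.
            JobTracker tracker = JobTracker.startTracker(new JobConf());
            // 2. Start the services (RPC server, HTTP server, expiry and
            //    initialization threads, ...) and block until shutdown.
            tracker.offerService();
        }
    }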
Assume the NameNode is on hadoop1 and the JobTracker on hadoop2.
1.1 The node hosting the NameNode is identified by the value of fs.default.name in the configuration file core-site.xml, e.g. hdfs://hadoop1:9000. The node hosting the JobTracker is identified by the value of mapred.job.tracker in the configuration file mapred-site.xml, modified here to hadoop2:9001.
1.2 On hadoop1, execute the command hadoop-daemo ...
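A quick way to confirm which addresses a client actually resolves is to read the two properties back from the configuration on the classpath; the printed values are whatever core-site.xml and mapred-site.xml supply:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;

    public class WhereAreTheMasters {
        public static void main(String[] args) {
            // Configuration loads core-site.xml from the classpath.
            Configuration core = new Configuration();
            System.out.println("NameNode:   " + core.get("fs.default.name"));
            // JobConf additionally loads mapred-site.xml.
            JobConf mapred = new JobConf();
            System.out.println("JobTracker: " + mapred.get("mapred.job.tracker"));
        }
    }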
To make it easy to customize the presentation of the Hadoop management interfaces (NameNode and JobTracker), the management interface is implemented with a proxy servlet. First of all, in the constructor of org.apache.hadoop.http.HttpServer, public HttpServer(String name, String bindAddress, int port, boolean findPort, Configuration conf, AccessControlList adminsAcl, Connector connector), add the following code to specify the resource bundle and U ...
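For orientation, the sketch below shows the general mechanism of hanging an extra servlet off org.apache.hadoop.http.HttpServer (a simpler constructor overload than the one quoted above). HelloServlet, the port, and the path are hypothetical, and HttpServer expects a webapps/<name> resource directory on the classpath for the name passed to the constructor:

    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.http.HttpServer;

    public class CustomUiSketch {
        // Hypothetical servlet standing in for the customized page.
        public static class HelloServlet extends HttpServlet {
            protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                    throws java.io.IOException {
                resp.getWriter().println("custom management view");
            }
        }

        public static void main(String[] args) throws Exception {
            // Assumes a webapps/custom directory is available on the classpath.
            HttpServer server = new HttpServer("custom", "0.0.0.0", 50099,
                    true, new Configuration());
            server.addServlet("hello", "/hello", HelloServlet.class);
            server.start();
        }
    }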
Source-level analysis of the listener initializing the job, the JobTracker handling the corresponding TaskTracker heartbeat, and the dispatcher assigning tasks
After the JobTracker and TaskTracker have started (see the source-level analysis of the JobTracker startup process and of the TaskTracker startup process), the TaskTracker communicates with the JobTracker through heartbeats.
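For readers without the source at hand, the shape of that heartbeat RPC (a paraphrase of InterTrackerProtocol.heartbeat in Hadoop 1.x, with stand-in types rather than the real ones) is roughly:

    interface HeartbeatSketch {
        Response heartbeat(Status trackerStatus,   // slots, host, per-task reports
                           boolean restarted,      // TaskTracker restarted since last beat?
                           boolean initialContact, // first beat after startup?
                           boolean acceptNewTasks, // free slots available?
                           short responseId);      // sequence number for de-duplication

        // Stand-ins for TaskTrackerStatus and HeartbeatResponse.
        class Status { /* host name, map/reduce slot counts, task statuses ... */ }
        class Response { /* next heartbeat interval, launch/kill actions ... */ }
    }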
Anyone familiar with the JobTracker knows that during job initialization, EagerTaskInitializationListener locks the JobInProgress and then performs initTasks (see the code for details). One step here writes initial data to HDFS and flushes it. Meanwhile, when the FairScheduler's update thread refreshes the resources in the resource pools, it holds the exclusive locks of the JobTracker and the FairScheduler and then computes t ...
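The hazard here is holding one lock across slow HDFS I/O while another thread acquires the same locks in a different order. The sketch below is deliberately generic, not Hadoop code; the two lock objects merely stand in for the JobInProgress lock and the JobTracker/FairScheduler locks:

    public class LockOrderSketch {
        private static final Object jobLock = new Object();       // ~ JobInProgress
        private static final Object schedulerLock = new Object(); // ~ JobTracker/FairScheduler

        public static void main(String[] args) {
            new Thread(() -> {
                synchronized (jobLock) {
                    pause();                         // ~ writing init data to HDFS
                    synchronized (schedulerLock) { } // now needs the scheduler
                }
            }).start();
            new Thread(() -> {
                synchronized (schedulerLock) {
                    pause();                         // ~ update thread recomputing pools
                    synchronized (jobLock) { }       // now needs the job
                }
            }).start();
            // With unlucky timing both threads block forever; with a merely slow
            // HDFS flush, everything queued behind these locks stalls with it.
        }

        private static void pause() {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        }
    }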
Part 3 of the source-level analysis of the MapReduce job submission process already explained that the user ultimately calls the JobTracker.submitJob method to submit the job to the JobTracker. The core of that method is JobTracker.addJob(JobID jobId, JobInProgress job), which passes the job to the scheduler's registered listeners, by default JobQueueJobInProgressListener and EagerTaskInitializationListener (this article only discusses the defau ...
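A simplified paraphrase of what addJob does, with plain stand-in types so it compiles on its own: register the job, then push it to every registered JobInProgressListener, which is how the scheduler's queue listener and the eager initialization listener learn about it:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class AddJobSketch {
        interface JobInProgressListener { void jobAdded(String job); }

        private final Map<String, String> jobs = new HashMap<>();
        private final List<JobInProgressListener> listeners = new ArrayList<>();

        synchronized String addJob(String jobId, String job) {
            jobs.put(jobId, job);                 // register under its JobID
            for (JobInProgressListener l : listeners) {
                l.jobAdded(job);                  // scheduler + init listener react
            }
            return job;                           // the real method returns a JobStatus
        }
    }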
newLISP tracks JobTracker status (newlispjobtracker)
The basic idea is to use newLISP to periodically download the JobTracker page, parse the table elements out of the HTML with regular expressions, and so obtain the latest MapReduce status.
Each time the status data is obtained it is stored in a MySQL database, and Tableau presents the state of the MapReduce cluster as reports.
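The original tool is written in newLISP; for consistency with the other examples here, the same fetch-and-regex idea is sketched in Java below. The URL (50030 is the standard JobTracker UI port, the host is assumed) and the cell-matching pattern are illustrative only:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class JtScrape {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://jobtracker-host:50030/jobtracker.jsp");
            StringBuilder html = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                String line;
                while ((line = in.readLine()) != null) html.append(line).append('\n');
            }
            // Crude table-cell extraction; a real scraper would target the
            // specific summary table and INSERT the fields into MySQL.
            Matcher m = Pattern.compile("<td[^>]*>([^<]*)</td>").matcher(html);
            while (m.find()) {
                System.out.println(m.group(1).trim());
            }
        }
    }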
This is the data of ...
To fundamentally address the performance bottlenecks of the old MapReduce framework, and to promote the longer-term development of Hadoop, starting with the 0.23.0 release Hadoop's MapReduce framework was completely refactored and changed radically.
The new Hadoop MapReduce framework is named MapReduce V2, or YARN. The fundamental idea of YARN's refactoring of MapReduce V1 is to split the JobTracker's two main functions, resource management and job scheduling/monitoring, into separate components.
Two JTs (CDH 4.2.0) hit OOM problems in the previous period, causing errors in the ETL process. Because most parameters of the newly taken-over cluster were still defaults, the CMS-related JVM parameters of the JT were modified; at the same ...
Hadoop's map/reduce framework is implemented on this same principle. The following describes the main components of the map/reduce framework and their relationships.
2.1 Overall Structure
2.1.1 Mapper and Reducer
The most basic components of a MapReduce application running on Hadoop are a Mapper class and a Reducer class, plus a driver program that creates the JobConf; some applications also include a Combiner class, which is itself an implementation of Reducer, as the sketch below shows.
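The sketch fills in those three pieces with the classic word-count example on the old org.apache.hadoop.mapred API; the paths are placeholders, and note that the Combiner slot is filled by the Reducer class itself:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> out, Reporter r)
                    throws IOException {
                StringTokenizer it = new StringTokenizer(value.toString());
                while (it.hasMoreTokens()) {
                    word.set(it.nextToken());
                    out.collect(word, ONE);   // emit (word, 1) for each token
                }
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> out, Reporter r)
                    throws IOException {
                int sum = 0;
                while (values.hasNext()) sum += values.next().get();
                out.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(WordCount.class);  // the driver builds JobConf
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);          // the Combiner is a Reducer
            conf.setReducerClass(Reduce.class);
            FileInputFormat.setInputPaths(conf, new Path("/tmp/in"));
            FileOutputFormat.setOutputPath(conf, new Path("/tmp/out"));
            JobClient.runJob(conf);
        }
    }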
2.1.2 JobTracker and TaskTracker
They are all scheduled by a single master service, the JobTracker, and multiple slave ...