CDH version: 5.10.0. IDE environment: Win7 64-bit, MyEclipse 2015. Spark mode: YARN. Submit mode: yarn-client. Under the same IDE environment, submitting Spark tasks to a standalone cluster had always been smooth; today I tested Spark on YARN mode (from the IDE, only yarn-client submission is possible). Everything else was basically unchanged, only the mode was switched, which resulted in the following error: Java.io.
1) Introduction
MRv1 has obvious shortcomings in scalability, reliability, resource utilization, and multi-framework support, which gave rise to the next-generation MapReduce computing framework, MapReduce Version 2. A big problem in MRv1 is that resource management and job scheduling are both thrown at the JobTracker, causing a serious single-point bottleneck, so MRv2 improves mainly at this point: it has the resource management module built into
The company's Spark cluster was recently migrated from the original standalone mode to Spark on YARN. When migrating the related programs, it turned out some adjustments were still needed. Below is part of the shell commands for submission under the two versions; from the commands you can see the difference, which is mainly that Spark on YARN does not work the same way, resulting in a different way of submitting. The script for the
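As a hedged illustration of that difference (the jar name, application class, and master URL below are all made-up placeholders), the two submit commands differ essentially only in the `--master` argument; since they need a cluster to actually run, they are only composed and printed here:

```shell
# Sketch: the same (hypothetical) app submitted to standalone vs. YARN.
# spark://master:7077, com.example.App, and myapp.jar are placeholders.
APP="--class com.example.App myapp.jar"
STANDALONE_CMD="spark-submit --master spark://master:7077 $APP"
YARN_CLIENT_CMD="spark-submit --master yarn --deploy-mode client $APP"
echo "$STANDALONE_CMD"
echo "$YARN_CLIENT_CMD"
```

Note that with YARN there is no fixed master URL to point at: the cluster is located through HADOOP_CONF_DIR, which in practice is the main thing that changes when migrating submit scripts.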
ResourceManager, ApplicationMaster, and NodeManager, three parts. Let's explain these three parts in detail. First, the ResourceManager is a central service; what it does is schedule and start the ApplicationMaster that each job belongs to, and additionally monitor whether the ApplicationMaster is still alive. Careful readers will notice that the tasks inside a job are also monitored, restarted, and so on: that is the reason the ApplicationMaster exists. The ResourceManager is responsible for the scheduling of jobs and resources. It receives J
Basic Structure of YARN
YARN is composed of a master and slaves: one ResourceManager corresponds to multiple NodeManagers;
YARN consists of the Client, ResourceManager, NodeManager, and ApplicationMaster;
The Client submits tasks to, and kills tasks through, the ResourceManager;
The ApplicationMaster is supplied by the corresponding application framework; each application corresponds to one ApplicationMaster, and the ApplicationMaster applies for resources from the ResourceManager
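The Client role described above maps onto the stock `yarn` command-line tool. A sketch (the application id is made up, and the commands need a running cluster, so they are only composed and printed here):

```shell
# Client-side operations against the ResourceManager (sketch; the id is hypothetical).
APP_ID="application_1400000000000_0001"
LIST_CMD="yarn application -list"
KILL_CMD="yarn application -kill $APP_ID"
echo "$LIST_CMD"   # the client asks the RM to list running applications
echo "$KILL_CMD"   # the client asks the RM to kill one application
```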
The recent move from Hadoop 1.x to Hadoop 2.x also reduced the amount of code on the platform by converting some Java programs into Scala. During the implementation, part of the Spark-on-YARN deployment followed the earlier Hadoop 1.x approach; on Hadoop 2.2+ there is basically no need to deploy things that way. The reason for this is Hadoop YARN's unified resource management. On the Spark
Execute the following command on a Hadoop 2.7.2 cluster: spark-shell --master yarn --deploy-mode client. The following error appeared: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. Viewing the cluster status in the YARN WebUI, the log shows: Container [pid=28920,containerid=container_1389136889967_0001_01_00
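Container logs of this shape typically continue with "is running beyond virtual/physical memory limits", i.e. the NodeManager killed the container for exceeding its memory allowance. One common (hedged) remedy is to relax or raise the NodeManager's virtual-memory check in yarn-site.xml; the values below are illustrative only, not recommendations:

```xml
<!-- yarn-site.xml: illustrative values only; tune for your cluster. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value> <!-- stop killing containers on virtual-memory overrun -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value> <!-- or instead raise the allowed vmem-to-pmem ratio -->
</property>
```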
MRv1 Disadvantages
1. The JobTracker is prone to single-point failure.
2. The JobTracker is overburdened: it is responsible not only for resource management but also for job scheduling; when it has too many tasks to handle, it consumes excessive resources.
3. When there are very many MapReduce jobs, the memory overhead becomes very large. On the TaskTracker side, representing resources simply as a number of MapReduce task slots is too crude and takes no account of CPU and memory footprint; if two tasks with large memory consumption
YARN is the MapReduce V2 version. It has many advantages over MapReduce V1: 1. The JobTracker's duties are split up: resource management tasks are the responsibility of the ResourceManager, while starting, running, and monitoring jobs is the responsibility of ApplicationMasters distributed across the cluster's nodes. This greatly reduces the single-point bottleneck and single-point risk that the JobTracker posed in MapReduce V1, and greatly improves the scalability
In the official introduction there is such a sentence:
Yarn is a package manager for your code. It allows you to use and share code with other developers from around the world. Yarn does this quickly, securely, and reliably so you don't ever have to worry.
The key meaning is fast, secure, and reliable: a package you have already downloaded will not be downloaded again, and it is guaranteed to work the same way across different systems.
Quick Install
macOS
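The macOS install step itself is missing from the excerpt; the standard routes at the time were Homebrew or npm. Since an installer cannot actually run here, the commands are only composed and printed:

```shell
# macOS install options for the Yarn package manager (sketch).
BREW_CMD="brew install yarn"
NPM_CMD="npm install -g yarn"
echo "$BREW_CMD"
echo "$NPM_CMD"
echo "yarn --version   # verify the install afterwards"
```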
Yet Another Resource Negotiator (YARN) Introduction
Apache Hadoop with MapReduce is the backbone of distributed data processing. With its horizontally scalable physical cluster architecture and its fine-grained processing framework, originally developed by Google, Hadoop has exploded in the new field of big data processing. Hadoop has also developed a rich and varied application ecosystem, including Apache Pig (a powerful scripting language) and Apache Hive (a data warehouse solution with a similar
The Hadoop 2.0 source code implements two YARN applications: one is MapReduce, and the other is a sample program showing how to write an application, DistributedShell. It can be considered YARN's WordCount sample program.
DistributedShell does what its name suggests: distributed shell execution. A string of shell commands, or a shell script, submitted by the user is controlled by the ApplicationMaster and assigned
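A hedged sketch of launching DistributedShell follows; the jar path and version are assumptions (adjust to your Hadoop install), and since it needs a live cluster, the command is only composed and printed here:

```shell
# Launching the DistributedShell example (sketch; jar path/version are assumptions).
DSHELL_JAR="$HADOOP_HOME/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.2.jar"
RUN_CMD="yarn jar $DSHELL_JAR org.apache.hadoop.yarn.applications.distributedshell.Client -jar $DSHELL_JAR -shell_command date -num_containers 2"
echo "$RUN_CMD"
```

Here `-shell_command date` is the command every container will run, and `-num_containers 2` asks the ApplicationMaster to request two containers from the ResourceManager.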
HA-Federation-HDFS + YARN cluster deployment mode
After an afternoon of attempts, I finally got the cluster set up, even though completing the setup did not feel strictly necessary. Still, it is worth studying, to lay the foundation for building the real environment.
The following is a cluster deployment of HA-Federation-HDFS + YARN.
First, let me describe my configuration:
The four nodes are started respectively:
1. bkjia117:
The Hadoop project I worked on before was based on version 0.20.2; after looking up materials I learned that this was the original Map/Reduce model. Official notes:
1.1.x - current stable version, 1.1 release
1.2.x - current beta version, 1.2 release
2.x.x - current alpha version
0.23.x - similar to 2.x.x but missing NN HA
0.22.x - does not include security
0.20.203.x - old legacy stable version
0.20.x - old legacy version
Description: the 0.20/0.22/1.1/CDH3 series use the original Map/Reduce model and are the stable versions; the 0.23/2.x/CDH4 series,
To learn the difference between MapReduce V1 (the earlier MapReduce) and MapReduce V2 (YARN), we first need to understand MapReduce V1's working mechanism and design ideas. First, take a look at the operation diagram of MapReduce V1. The components and functions of MapReduce V1 are: Client: the client, responsible for writing the MapReduce code and for configuring and submitting jobs. JobTracker: the core of the entire MapReduce framework, similar to a dispatcher
2. NodeManager: the framework agent on each node, primarily responsible for launching the containers required by applications, monitoring their resource usage (memory, CPU, disk, network, etc.), and reporting it to the scheduler. 3. ApplicationsManager: primarily responsible for receiving jobs, negotiating for the first container in which to run the ApplicationMaster, and providing the service of restarting a failed AM container. 4. ApplicationMaster: responsible for all the work within a job's life cycle, sim
Configuration recommendations:
1. In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had.
These properties no longer exist in YARN. Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nod
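For reference, a sketch of the YARN-side settings in yarn-site.xml. The values are examples only, and the second property name is the usual companion of the memory one, stated from general knowledge rather than from the truncated text above:

```xml
<!-- yarn-site.xml: example values only; size these for your nodes. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>  <!-- RAM the NodeManager may hand out to containers -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>     <!-- virtual cores it may hand out -->
</property>
```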
Introduction
Apache Hadoop YARN was added as a subproject of Hadoop alongside Hadoop Common (the core libraries), Hadoop HDFS (storage), and Hadoop MapReduce (the MapReduce implementation); it is also a top-level Apache project.
In Hadoop 2.0, each client submits various MapReduce applications to the MapReduce V2 framework running on YARN. In Hadoop 1.0, each client submits a MapReduce application to the MapReduc
Spark version: spark-1.1.0-bin-hadoop2.4 (download: http://spark.apache.org/downloads.html)
For more information about the server environment, see the previous blog post, Notes on configuration of an HBase CentOS production environment.
(Hbase-R is the ResourceManager; hbase-1, hbase-2, and hbase-3 are NodeManagers)
1. Installation and configuration (yarn-cluster mode; documentation reference: http://spark.apache.org/docs/latest/running-on-yarn.html)
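For the spark-1.1.0-bin-hadoop2.4 era, a hedged sketch of a yarn-cluster submission (paths, class, and resource sizes are placeholders; at that version the mode was spelled as the single master value yarn-cluster). Since this needs a live cluster, the command is only composed and printed:

```shell
# Sketch: submitting in yarn-cluster mode with Spark 1.1.0 (placeholders throughout).
export HADOOP_CONF_DIR=/etc/hadoop/conf   # how Spark locates the YARN cluster
SUBMIT_CMD="./bin/spark-submit --master yarn-cluster --class com.example.App --num-executors 3 --executor-memory 2g myapp.jar"
echo "$SUBMIT_CMD"
```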
Run the program in