We recently moved the platform from Hadoop 1.x to Hadoop 2.x, and along the way trimmed the codebase by rewriting some Java programs in Scala. During the rollout I noticed that many Spark-on-YARN deployment guides still follow the old Hadoop 1.x approach; on Hadoop 2.2+ most of those extra deployment steps are basically unnecessary, because Hadoop YARN now provides unified resource management.
From the Spark official website:
A Spark application runs on a cluster as an independent set of processes, coordinated by the SparkContext object in your main program (called the driver). Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (Spark's own standalone cluster manager, or Mesos/YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster; these are the worker processes that perform computation and store data for the application. Next, Spark sends your application code (defined in a JAR or in Python files passed to the SparkContext) to the executors. Finally, the SparkContext sends tasks to the executors to run. (In other words, everything is dispatched to the other nodes through the SparkContext, so holding a SparkContext is all you need; a minimal sketch of this flow appears at the end of this section.)

There are a few points worth noting about this architecture:

1. Each application gets its own executor processes, which stay up for the duration of the whole application and run tasks in multiple threads. The advantage is that applications are isolated from each other, both on the scheduling side (each driver schedules its own tasks) and on the execution side (tasks from different applications run in different JVMs). However, this also means that data cannot be shared between different Spark applications (SparkContext instances) without writing it to an external storage system.

2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor processes and those processes can communicate with each other, Spark is relatively easy to run even on a cluster manager (for example, Mesos/YARN) that also supports other applications.

3. Because the driver schedules tasks on the cluster, it should run close to the worker nodes, preferably on the same LAN. If you want to send requests to a remote cluster, a better choice is to open an RPC connection to the driver and have it submit operations from nearby, rather than running the driver far away from the worker nodes.

Cluster Manager Types

Spark currently supports three kinds of cluster managers:
(1) Standalone mode: a simple cluster manager included with Spark that makes it easy to set up a cluster.
(2) Apache Mesos mode: a general cluster manager that can also run Hadoop MapReduce and service applications.
(3) Hadoop YARN mode: the resource manager introduced in Hadoop 2.0.
In addition, Spark's EC2 launch scripts make it easy to start a standalone-mode cluster on Amazon EC2 (Amazon Elastic Compute Cloud).

Publishing Code to a Cluster

A recommended way to publish code to a cluster is through the SparkContext constructor, which can take a list of JAR files (Java/Scala) or .egg and .zip files (Python) to ship to the worker nodes. You can also dynamically add files to be sent with SparkContext.addJar and SparkContext.addFile (see the second sketch at the end of this section).

Monitoring

Each driver has a web UI, typically on port 4040, where you can see information about running tasks, executors, storage usage, and so on. Simply enter http://<driver node>:4040 in a browser. The monitoring guide also describes other monitoring options. (If you run Spark in YARN mode, the UI is only available while the application is running; once it stops, that data is gone, but you can persist the event logs, as sketched below.)

Job Scheduling

Spark allows resource allocation to be controlled both across applications (at the cluster manager level) and within an application (when multiple computations run on the same SparkContext); a sketch of the latter closes this section.
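As a minimal sketch of the driver/executor flow described above: the master URL, host name, and application name are placeholders, and a standalone cluster manager is assumed (with YARN the master would be set accordingly).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClusterModeSketch {
  def main(args: Array[String]): Unit = {
    // The driver: creating a SparkContext connects to the cluster manager,
    // which allocates executors for this application.
    val conf = new SparkConf()
      .setAppName("cluster-mode-sketch")
      .setMaster("spark://master-host:7077") // placeholder standalone master URL

    val sc = new SparkContext(conf)

    // Each task of this job runs in an executor JVM on a worker node;
    // only the final reduced value comes back to the driver.
    val sum = sc.parallelize(1 to 1000, numSlices = 8).reduce(_ + _)
    println(s"sum = $sum")

    sc.stop()
  }
}
```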
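The two code-publishing mechanisms above can be sketched as follows; the master URL and all jar and file paths are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object ShipCodeSketch {
  def main(args: Array[String]): Unit = {
    // Jars listed in the conf are shipped to worker nodes when executors start.
    val conf = new SparkConf()
      .setAppName("ship-code-sketch")
      .setMaster("spark://master-host:7077")
      .setJars(Seq("/path/to/my-app-deps.jar"))

    val sc = new SparkContext(conf)

    // Dynamically ship more dependencies after the context is up.
    sc.addJar("/path/to/extra-lib.jar")     // added to every executor's classpath
    sc.addFile("/path/to/lookup-table.txt") // executors read it via SparkFiles.get

    // Open the shipped file on the executor side.
    sc.parallelize(1 to 4).foreach { _ =>
      val localPath = SparkFiles.get("lookup-table.txt")
      println(s"file available at: $localPath")
    }

    sc.stop()
  }
}
```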
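For the monitoring note above, here is a sketch of persisting the UI's data via event logs; the HDFS directory is an assumption, and the Spark history server must be pointed at the same path to replay the logs after the YARN application finishes.

```scala
import org.apache.spark.SparkConf

// Enable event logging so that a finished application's UI data survives.
val conf = new SparkConf()
  .setAppName("event-log-sketch")
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "hdfs:///user/spark/applicationHistory") // placeholder path
```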
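And for within-application scheduling, a sketch using the fair scheduler, assuming an existing SparkContext `sc` started with spark.scheduler.mode=FAIR; the pool name "etl" and the input path are hypothetical.

```scala
// Jobs submitted from this thread are assigned to the "etl" pool
// (the pool would be defined in fairscheduler.xml).
sc.setLocalProperty("spark.scheduler.pool", "etl")
val n = sc.textFile("hdfs:///data/events").count() // this job runs in the "etl" pool
sc.setLocalProperty("spark.scheduler.pool", null)  // revert to the default pool
```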
Spark on YARN