Several Problems Encountered with Spark on YARN

Source: Internet
Author: User
1. Overview

In Spark's YARN mode, resource allocation is managed by YARN's ResourceManager. In current Spark versions, however, application logs can only be viewed through YARN's yarn logs command.

If you overlook small details when deploying and running a Spark application, problems can arise. This article records several of them.

2. Firewall

After the Spark package and configuration files were deployed, YARN mode would not run. The logs on the NodeManager side reported "connection refused": the NodeManagers could not connect to the client node where the driver runs, even though port 80 on the client was reachable. At the same time, the application log showed messages like:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

There was enough memory, yet no resources could be obtained! Checking the firewall revealed the cause: the client only allowed access to port 80 and blocked everything else. In yarn-client mode the executors on the cluster must connect back to the driver on the client, so the driver's ports have to be reachable from every NodeManager. If connections are refused while your program is running, check the firewall configuration first.
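A quick way to verify from a NodeManager whether a port on the client is reachable is a small probe script. The host name and port below are illustrative only (Spark picks a random driver port unless spark.driver.port is set; 4040 is the driver's web UI port):

```shell
# Probe whether a TCP port on a host accepts connections.
# Uses bash's built-in /dev/tcp pseudo-device with a 2-second timeout.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "echo > /dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "blocked"
  fi
}

# Run from a NodeManager against the client where the driver runs
# (client-host is a placeholder for your driver machine):
check_port client-host 4040
```

If the probe prints "blocked" for a port the driver is listening on, the firewall between the NodeManager and the client is the first suspect.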

3. Specifying the Spark driver host

After Spark was deployed, I ran the SparkPi example in yarn-cluster mode and yarn-client mode respectively.

Apart from a few basic properties, the Spark configuration files were left unchanged. The two modes behaved differently: yarn-cluster mode ran normally, while yarn-client mode always failed. The ResourceManager and NodeManager logs showed that the ApplicationMaster could never be found, which was strange. In addition, connections from the NodeManagers to the port the driver opened on the client were refused. Meanwhile, non-Spark jobs such as MapReduce tasks executed normally.

Checking the client configuration revealed that in the client's /etc/hosts file, one client IP address was mapped to multiple hostnames. The driver resolves the last one by default, say hostB, but the NodeManagers were configured with another hostname, hostA, so they could not reach the driver. To avoid affecting other programs that rely on the client's host list, the property spark.driver.host was set in spark-defaults.conf for yarn-client mode.
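A minimal sketch of the situation and the fix (the IP address and hostnames are made up for illustration):

```
# /etc/hosts on the client: one IP mapped to several names;
# the driver picks up the last one (hostB) by default.
192.168.0.10   hostA hostB

# spark-defaults.conf on the client: pin the driver address to the
# name the NodeManagers know. Only valid for yarn-client mode!
spark.driver.host   hostA
```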

After this configuration was in place, yarn-cluster mode stopped working! The cause had to be the parameter above, and indeed, after commenting it out, yarn-cluster mode ran again. The reason: in yarn-cluster mode the Spark entry function runs on the client, but the driver itself runs inside the ApplicationMaster, so the configuration above effectively pinned the ApplicationMaster's address. In reality, in yarn-cluster mode the ApplicationMaster is placed on a node chosen by the ResourceManager.
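For reference, the two submission modes for the SparkPi example look roughly like this in Spark 1.x (the jar path is illustrative and depends on your installation):

```shell
# yarn-cluster: the driver runs inside the ApplicationMaster on a node
# chosen by the ResourceManager, so spark.driver.host must NOT be pinned.
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-*.jar 10

# yarn-client: the driver runs on the client, so its hostname must be
# resolvable and reachable from every NodeManager.
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  lib/spark-examples-*.jar 10
```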

4. Viewing YARN logs

In the test environment, you can run yarn logs -applicationId XXX to view the logs of a finished application. However, running the same command in another environment always printed:

Logs not available at /tmp/nm/remote/logs/hadoop/logs/application_xxx_xxx

Log aggregation has not completed or is not enabled.

No log file was found under the corresponding NodeManager directory either. Yet /tmp/nm/remote/logs is exactly the directory specified in yarn-site.xml, so what is the reason? Is YARN log aggregation not working at all?
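For context, the relevant log-aggregation settings in yarn-site.xml look like this (the directory value matches the one in this environment; the property names are the standard Hadoop ones):

```xml
<!-- yarn-site.xml: enable log aggregation and set the remote target dir -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/nm/remote/logs</value>
</property>
```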

Go to a NodeManager and check its logs for the corresponding application:

2014-08-04 09:14:47,513 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Starting aggregate log-file for app application_xxx_xxx at /tmp/nm/remote/logs/spark/logs/application_xxx_xxx/hostB.tmp
2014-08-04 09:14:47,525 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_xxx_xxx_01_000007. Current good log dirs are /data/nm/log
2014-08-04 09:14:47,526 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_xxx_xxx_000001. Current good log dirs are /data/nm/log
2014-08-04 09:14:47,526 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /data/nm/log/application_xxx_xxx
2014-08-04 09:14:47,607 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Finished aggregate log-file for app application_xxx_xxx


So log aggregation does work; why can't the logs be viewed through the command? Then the path /tmp/nm/remote/logs/spark/logs/application_xxx_xxx/hostB.tmp in the log above stands out: the aggregated log path includes the user name. The yarn logs command was being run as the hadoop user, while the Spark application had been submitted and executed as the spark user. By default, yarn logs looks under the path for the current user, which is why no logs were found. Switch to the spark user and the logs appear!
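A sketch of the invocations involved (the application ID is a placeholder, as in the log above):

```shell
# As the hadoop user, the command looks under .../logs/hadoop/... and fails:
yarn logs -applicationId application_xxx_xxx

# Run it as the user who submitted the application instead:
sudo -u spark yarn logs -applicationId application_xxx_xxx

# Or name the owning user explicitly with the -appOwner option:
yarn logs -applicationId application_xxx_xxx -appOwner spark
```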

5. Summary

The problems were serious in that the applications could not run at all, yet their root causes were small details. In the final analysis, the deployment environment was complicated and we were not careful enough. We will continue to document related problems in the future, so that similar issues are easier to diagnose.
