Several Problems Encountered with Spark on YARN

Source: Internet
Author: User
1. Overview

In Spark's YARN mode, resource allocation is managed by YARN's ResourceManager. In current Spark versions, however, application logs can only be viewed through YARN's yarn logs command.

If you overlook small details when deploying and running a Spark application, problems can arise. This article records several of them.

2. Firewall

After the Spark package and configuration files were deployed, YARN mode would not run. The logs on the NodeManager side reported "connection refused": the NodeManagers could not connect to the client node where the driver runs, even though port 80 on the client was reachable. At the same time, the application log showed messages like:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

There was enough memory, yet no resources could be obtained! Checking the firewall revealed the cause: the client only allowed access to port 80 and blocked everything else. In yarn-client mode the executors on the cluster must connect back to the driver on the client, so the driver's ports have to be reachable from every NodeManager. If connections are refused while your program is running, check the firewall configuration first.
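A quick way to verify from a NodeManager whether a port on the client is reachable is a small probe script. The host name and port below are illustrative only (Spark picks a random driver port unless spark.driver.port is set; 4040 is the driver's web UI port):

```shell
# Probe whether a TCP port on a host accepts connections.
# Uses bash's built-in /dev/tcp pseudo-device with a 2-second timeout.
check_port() {
  local host=$1 port=$2
  if timeout 2 bash -c "echo > /dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "blocked"
  fi
}

# Run from a NodeManager against the client where the driver runs
# (client-host is a placeholder for your driver machine):
check_port client-host 4040
```

If the probe prints "blocked" for a port the driver is listening on, the firewall between the NodeManager and the client is the first suspect.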

3. Specifying the Spark driver host

After Spark was deployed, I ran the SparkPi example in yarn-cluster mode and yarn-client mode respectively.

Apart from a few basic properties, the Spark configuration files were left unchanged. The two modes behaved differently: yarn-cluster mode ran normally, while yarn-client mode always failed. The ResourceManager and NodeManager logs showed that the ApplicationMaster could never be found, which was strange. In addition, connections from the NodeManagers to the port the driver opened on the client were refused. Meanwhile, non-Spark jobs such as MapReduce tasks executed normally.

Checking the client configuration revealed that in the client's /etc/hosts file, one client IP address was mapped to multiple hostnames. The driver resolves the last one by default, say hostB, but the NodeManagers were configured with another hostname, hostA, so they could not reach the driver. To avoid affecting other programs that rely on the client's host list, the property spark.driver.host was set in spark-defaults.conf for yarn-client mode.
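A minimal sketch of the situation and the fix (the IP address and hostnames are made up for illustration):

```
# /etc/hosts on the client: one IP mapped to several names;
# the driver picks up the last one (hostB) by default.
192.168.0.10   hostA hostB

# spark-defaults.conf on the client: pin the driver address to the
# name the NodeManagers know. Only valid for yarn-client mode!
spark.driver.host   hostA
```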

After this configuration was in place, yarn-cluster mode stopped working! The cause had to be the parameter above, and indeed, after commenting it out, yarn-cluster mode ran again. The reason: in yarn-cluster mode the Spark entry function runs on the client, but the driver itself runs inside the ApplicationMaster, so the configuration above effectively pinned the ApplicationMaster's address. In reality, in yarn-cluster mode the ApplicationMaster is placed on a node chosen by the ResourceManager.
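For reference, the two submission modes for the SparkPi example look roughly like this in Spark 1.x (the jar path is illustrative and depends on your installation):

```shell
# yarn-cluster: the driver runs inside the ApplicationMaster on a node
# chosen by the ResourceManager, so spark.driver.host must NOT be pinned.
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  lib/spark-examples-*.jar 10

# yarn-client: the driver runs on the client, so its hostname must be
# resolvable and reachable from every NodeManager.
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  lib/spark-examples-*.jar 10
```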

4. Viewing YARN logs

In the test environment, you can run yarn logs -applicationId XXX to view the logs of a finished application. However, running the same command in another environment always printed:

Logs not available at /tmp/nm/remote/logs/hadoop/logs/application_xxx_xxx

Log aggregation has not completed or is not enabled.

No log file was found under the corresponding NodeManager directory either. Yet /tmp/nm/remote/logs is exactly the directory specified in yarn-site.xml, so what is the reason? Is YARN log aggregation not working at all?
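For context, the relevant log-aggregation settings in yarn-site.xml look like this (the directory value matches the one in this environment; the property names are the standard Hadoop ones):

```xml
<!-- yarn-site.xml: enable log aggregation and set the remote target dir -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/nm/remote/logs</value>
</property>
```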

Go to a NodeManager and check its logs for the corresponding application:

2014-08-04 09:14:47,513 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Starting aggregate log-file for app application_xxx_xxx at /tmp/nm/remote/logs/spark/logs/application_xxx_xxx/hostB.tmp
2014-08-04 09:14:47,525 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_xxx_xxx_01_000007. Current good log dirs are /data/nm/log
2014-08-04 09:14:47,526 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Uploading logs for container container_xxx_xxx_000001. Current good log dirs are /data/nm/log
2014-08-04 09:14:47,526 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting path : /data/nm/log/application_xxx_xxx
2014-08-04 09:14:47,607 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Finished aggregate log-file for app application_xxx_xxx


So log aggregation does work; why can't the logs be viewed through the command? Then the path /tmp/nm/remote/logs/spark/logs/application_xxx_xxx/hostB.tmp in the log above stands out: the aggregated log path includes the user name. The yarn logs command was being run as the hadoop user, while the Spark application had been submitted and executed as the spark user. By default, yarn logs looks under the path for the current user, which is why no logs were found. Switch to the spark user and the logs appear!
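A sketch of the invocations involved (the application ID is a placeholder, as in the log above):

```shell
# As the hadoop user, the command looks under .../logs/hadoop/... and fails:
yarn logs -applicationId application_xxx_xxx

# Run it as the user who submitted the application instead:
sudo -u spark yarn logs -applicationId application_xxx_xxx

# Or name the owning user explicitly with the -appOwner option:
yarn logs -applicationId application_xxx_xxx -appOwner spark
```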

5. Summary

The problems were serious in that the applications could not run at all, yet their root causes were small details. In the final analysis, the deployment environment was complicated and we were not careful enough. We will continue to document related problems in the future, so that similar issues are easier to diagnose.
