When a Spark application runs, the driver provides a web UI that shows the application's run information, but the web UI's port is closed as soon as the application completes, so the application's history cannot be viewed after it has finished. The Spark history server was created for this situation: configure the application to write its run information (the event log) to a specified directory, and the history server can then load that information and serve it on the web for users to browse.
To use the history server, configure the following parameters (in conf/spark-defaults.conf) on the client that submits the application:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop1:8000/sparklogs
spark.yarn.historyServer.address hadoop1:18080
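Alternatively, a rough sketch of passing the same settings per submission with `--conf` instead of editing spark-defaults.conf (the jar and class names here are placeholders, not from this article):

```shell
# Hypothetical application jar/class; the --conf values mirror the
# spark-defaults.conf settings above
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://hadoop1:8000/sparklogs \
  --class org.example.MyApp \
  myapp.jar
```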
Go to the $SPARK_HOME/sbin directory and run:
./start-history-server.sh
Note: the console will show that the start-up fails:
hadoop@node4:/usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin$ ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-node4.out
failed to launch org.apache.spark.deploy.history.HistoryServer:
  at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:A)
  ... 6 more
full log in /usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-node4.out
Opening the log file, we find the error: Logging directory must be specified.
Workaround: when starting the history server, you must supply a parameter telling it where the logs are stored. In our case, the storage path configured in spark-defaults.conf is hdfs://hadoop1:8000/sparklogs.
There are two ways to solve this:
1. Change the start command to
start-history-server.sh hdfs://node4:9000/directory
2. Leave the start command unchanged and add the following to conf/spark-env.sh:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node4:9000/directory"
Either way, once the history server is started, you can open http://node4:18080 in a browser to view the web page.
Appendix: configuring parameters in conf/spark-defaults.conf
Description of the history-server-related configuration parameters:
1) spark.history.updateInterval
Default value: 10
The interval, in seconds, for updating log-related information
2) spark.history.retainedApplications
Default value: 50
The number of application histories kept in memory. When this number is exceeded, the oldest application information is dropped; when a dropped application is accessed again, its page must be rebuilt.
3) spark.history.ui.port
Default value: 18080
The history server's web port
4) spark.history.kerberos.enabled
Default value: false
Whether to use Kerberos to log in when accessing the history server; useful when the persistence layer is on HDFS in a secure cluster. If set to true, the following two properties must also be configured:
5) spark.history.kerberos.principal
The Kerberos principal name used by the history server
6) spark.history.kerberos.keytab
The location of the Kerberos keytab file used by the history server
7) spark.history.ui.acls.enable
Default value: false
Whether ACLs are checked when users view application information. If enabled, only the application's owner and the users specified by spark.ui.view.acls can view that application's information.
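As an illustration (the user names below are placeholders, not from this article), enabling the ACL check in spark-defaults.conf might look like:

```
spark.history.ui.acls.enable true
spark.ui.view.acls           alice,bob
```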
8) spark.eventLog.enabled
Default value: false
Whether to log Spark events; these logs are used to reconstruct the web UI after the application finishes
9) spark.eventLog.dir
Default value: file:///tmp/spark-events
The path where event log information is stored; it can be an HDFS path beginning with hdfs:// or a local path beginning with file://. The directory must be created in advance.
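A short sketch of creating the directory ahead of time, for both kinds of path (the hdfs:// command assumes a running HDFS cluster, so it is shown commented out):

```shell
# Local file:// case: create the default event-log directory
mkdir -p /tmp/spark-events

# HDFS case (requires a running cluster; address taken from the
# configuration shown earlier in this article):
# hdfs dfs -mkdir -p hdfs://hadoop1:8000/sparklogs
```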
10) spark.eventLog.compress
Default value: false
Whether to compress the recorded Spark events (applies when spark.eventLog.enabled is true); Snappy is used by default
The parameters beginning with spark.history are configured via SPARK_HISTORY_OPTS in spark-env.sh; those beginning with spark.eventLog are configured in spark-defaults.conf.
Spark learning notes: using the Spark history server