Spark Learning Notes: Using the Spark History Server

Source: Internet
Author: User

While a Spark application is running, the driver provides a web UI that shows the application's run-time information, but that UI's port is closed as soon as the application completes, so the application's history cannot be viewed after it has finished running. The Spark history server was created to deal with this situation: the Spark application is configured to write its run information to a specified directory as it runs, and the history server can then load that information and present it on a web page for users to browse.

To use the history server, configure the following parameters (in conf/spark-defaults.conf) on the client that submits the application:

spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop1:8000/sparklogs
spark.yarn.historyServer.address hadoop1:18080
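As an aside, the same event-log properties can also be supplied per application on the spark-submit command line with --conf instead of editing spark-defaults.conf. This is only a sketch: the class name and jar below are placeholders, and the HDFS path is the article's example value.

```
# Hypothetical submission; com.example.YourApp and your-app.jar are placeholders.
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://hadoop1:8000/sparklogs \
  --class com.example.YourApp \
  your-app.jar
```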

Enter the $SPARK_HOME/sbin path and run:

./start-history-server.sh

Note: the console will show that the startup failed:

hadoop@node4:/usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin$ ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-node4.out
failed to launch org.apache.spark.deploy.history.HistoryServer:
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala)
        ... 6 more
full log in /usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-node4.out

Looking in the log file, we find the error "Logging directory must be specified".
Workaround: when starting the history server, a parameter must be supplied that tells it where the logs are stored; for example, the storage path we configured in spark-defaults.conf is hdfs://hadoop1:8000/sparklogs.
There are two ways to solve this:
1. Change the start command to:

start-history-server.sh hdfs://node4:9000/directory

2. Leave the start command unchanged and add the following to conf/spark-env.sh:

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node4:9000/directory"
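With either method, the directory that the log path points to must already exist in HDFS before the server starts (see the note on spark.eventLog.dir below). Assuming the standard Hadoop shell is available, it can be created roughly like this; treat it as a sketch, since the path is the article's example value:

```
hadoop fs -mkdir -p hdfs://node4:9000/directory
```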

Either way, after the history server is started you can view the web page by opening http://node4:18080 in a browser.

Appendix: configuration parameters in conf/spark-defaults.conf

History server-related configuration parameter description

1) spark.history.updateInterval
Default value: 10
The interval, in seconds, at which the history server refreshes its log information.

2) spark.history.retainedApplications
Default value: 50
The number of application histories kept in memory. When this value is exceeded, the oldest application information is dropped, and its page must be rebuilt when that application is accessed again.

3) spark.history.ui.port
Default value: 18080
The web port the history server listens on.

4) spark.history.kerberos.enabled
Default value: False
Whether the history server uses Kerberos to log in; this is useful when it accesses event logs on a secured HDFS cluster. If set to true, the following two properties must also be configured.

5) spark.history.kerberos.principal
The Kerberos principal name for the history server.

6) spark.history.kerberos.keytab
Location of the Kerberos keytab file for the history server.

7) spark.history.ui.acls.enable
Default value: False
Whether to check ACLs when authorizing users to view application information. If enabled, only the application's owner and the users specified by spark.ui.view.acls can view the application information.

8) spark.eventLog.enabled
Default value: False
Whether to log Spark events, which are used to reconstruct the web UI after the application completes.

9) spark.eventLog.dir
Default value: file:///tmp/spark-events
The directory that holds the event logs; it can be an HDFS path beginning with hdfs:// or a local path beginning with file://, and it must be created in advance.

10) spark.eventLog.compress
Default value: false
Whether to compress the logged Spark events (this requires spark.eventLog.enabled to be true); snappy is used by default.

Properties that begin with spark.history are configured through SPARK_HISTORY_OPTS in spark-env.sh, while those that begin with spark.eventLog go in spark-defaults.conf.
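The split described above can be sketched as follows, writing the two files into a scratch directory instead of a real $SPARK_HOME/conf (the hosts, ports, and paths are the article's example values, not requirements):

```shell
# Scratch directory standing in for $SPARK_HOME/conf
CONF_DIR=$(mktemp -d)

# spark.eventLog.* properties belong in spark-defaults.conf
# (whitespace-separated key/value pairs)
cat > "$CONF_DIR/spark-defaults.conf" <<'EOF'
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://node4:9000/directory
spark.eventLog.compress  false
EOF

# spark.history.* properties are passed as -D options
# through SPARK_HISTORY_OPTS in spark-env.sh
cat > "$CONF_DIR/spark-env.sh" <<'EOF'
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node4:9000/directory"
EOF

# Sanity check: each family of properties landed in the right file
grep -q '^spark\.eventLog\.' "$CONF_DIR/spark-defaults.conf"
grep -q 'spark\.history\.fs\.logDirectory' "$CONF_DIR/spark-env.sh"
echo "config split OK"
```

In a real deployment the two files would of course live in $SPARK_HOME/conf, and spark-env.sh would be sourced by the start scripts.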

