When a Spark application runs, the driver provides a web UI that shows the application's run information, but the web UI's port is closed as soon as the application completes, so the application's history cannot be viewed after it has finished. The Spark history server was created for this situation: configure the application to write its run information (the event log) to a specified directory, and the history server can then load that information and serve it on the web for users to browse.
To use the history server, configure the following parameters (in conf/spark-defaults.conf) on the client that submits the application:
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop1:8000/sparklogs
spark.yarn.historyServer.address hadoop1:18080
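Alternatively, a rough sketch of passing the same settings per submission with `--conf` instead of editing spark-defaults.conf (the jar and class names here are placeholders, not from this article):

```shell
# Hypothetical application jar/class; the --conf values mirror the
# spark-defaults.conf settings above
spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://hadoop1:8000/sparklogs \
  --class org.example.MyApp \
  myapp.jar
```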
Go to the $SPARK_HOME/sbin directory and run:
./start-history-server.sh
Note: the console will show that the start-up fails:
hadoop@node4:/usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin$ ./start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-node4.out
failed to launch org.apache.spark.deploy.history.HistoryServer:
  at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:A)
  ... 6 more
full log in /usr/local/spark/spark-1.1.0-bin-hadoop2.4/sbin/../logs/spark-hadoop-org.apache.spark.deploy.history.HistoryServer-1-node4.out
Opening the log file, we find the error: Logging directory must be specified.
Workaround: when starting the history server, you must supply a parameter telling it where the logs are stored. In our case, the storage path configured in spark-defaults.conf is hdfs://hadoop1:8000/sparklogs.
There are two ways to solve this:
1. Change the start command to
start-history-server.sh hdfs://node4:9000/directory
2. Leave the start command unchanged and add the following to conf/spark-env.sh:
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node4:9000/directory"
Either way, once the history server is started, you can open http://node4:18080 in a browser to view the web page.
Appendix: configuring parameters in conf/spark-defaults.conf
Description of the history-server-related configuration parameters:
1) spark.history.updateInterval
Default value: 10
The interval, in seconds, for updating log-related information
2) spark.history.retainedApplications
Default value: 50
The number of application histories kept in memory. When this number is exceeded, the oldest application information is dropped; when a dropped application is accessed again, its page must be rebuilt.
3) spark.history.ui.port
Default value: 18080
The history server's web port
4) spark.history.kerberos.enabled
Default value: false
Whether to use Kerberos to log in when accessing the history server; useful when the persistence layer is on HDFS in a secure cluster. If set to true, the following two properties must also be configured:
5) spark.history.kerberos.principal
The Kerberos principal name used by the history server
6) spark.history.kerberos.keytab
The location of the Kerberos keytab file used by the history server
7) spark.history.ui.acls.enable
Default value: false
Whether ACLs are checked when users view application information. If enabled, only the application's owner and the users specified by spark.ui.view.acls can view that application's information.
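As an illustration (the user names below are placeholders, not from this article), enabling the ACL check in spark-defaults.conf might look like:

```
spark.history.ui.acls.enable true
spark.ui.view.acls           alice,bob
```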
8) spark.eventLog.enabled
Default value: false
Whether to log Spark events; these logs are used to reconstruct the web UI after the application finishes
9) spark.eventLog.dir
Default value: file:///tmp/spark-events
The path where event log information is stored; it can be an HDFS path beginning with hdfs:// or a local path beginning with file://. The directory must be created in advance.
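A short sketch of creating the directory ahead of time, for both kinds of path (the hdfs:// command assumes a running HDFS cluster, so it is shown commented out):

```shell
# Local file:// case: create the default event-log directory
mkdir -p /tmp/spark-events

# HDFS case (requires a running cluster; address taken from the
# configuration shown earlier in this article):
# hdfs dfs -mkdir -p hdfs://hadoop1:8000/sparklogs
```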
10) spark.eventLog.compress
Default value: false
Whether to compress the recorded Spark events (applies when spark.eventLog.enabled is true); Snappy is used by default
The parameters beginning with spark.history are configured via SPARK_HISTORY_OPTS in spark-env.sh; those beginning with spark.eventLog are configured in spark-defaults.conf.
Spark learning notes: using the Spark history server