Spark history server startup error "Logging directory must be specified" resolved

Source: Internet
Author: User

I recently installed Spark in standalone mode on my computer. Spark itself started without any problems, but when I started the Spark history server, I got the following error:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
Spark Command: /usr/local/java/jdk1.7.0_67/bin/java -cp ::/usr/local/spark/conf:/usr/local/spark/lib/spark-assembly-1.1.0-hadoop2.4.0.jar:/usr/local/spark/lib/datanucleus-core-3.2.2.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.1.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.1.jar:/usr/local/hadoop/etc/hadoop -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.history.HistoryServer
========================================
15/01/07 15:08:55 INFO history.HistoryServer: Registered signal handlers for [TERM, HUP, INT]
15/01/07 15:08:55 INFO spark.SecurityManager: Changing view acls to: root,
15/01/07 15:08:55 INFO spark.SecurityManager: Changing modify acls to: root,
15/01/07 15:08:55 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(…)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(…)
        at java.lang.reflect.Constructor.newInstance(…)
        at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:187)
        at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.IllegalArgumentException: Logging directory must be specified.
        at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$2.apply(FsHistoryProvider.scala:41)
        at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$2.apply(FsHistoryProvider.scala:41)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:41)
        ... 6 more
/usr/local/spark/sbin/../logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-macor.out (END)
The error says that the log directory is not specified. I searched for half a day without finding out how to set it. The article "Spark History Server Process Analysis" contains the corresponding source-code analysis, and from it I learned that the spark.history.fs.logDirectory parameter was not specified, but I still did not know how to set it. Later, from the article "Spark History Server Configuration and Use", I learned that there are two ways to solve this:

1. Specify the value of spark.history.fs.logDirectory when starting the Spark history server, for example: hdfs://localhost:9000/sparkhistorylogs
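A sketch of option 1 (the HDFS URL and the /usr/local/spark install location are the article's example values; adjust both to your own setup):

```shell
# Pass spark.history.fs.logDirectory as a JVM system property via
# SPARK_HISTORY_OPTS before launching the history server.
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://localhost:9000/sparkhistorylogs"

# Then start the server (commented out here; run it on your own machine):
# /usr/local/spark/sbin/start-history-server.sh
```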

2. Configure it in conf/spark-defaults.conf
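A sketch of option 2 (SPARK_CONF here points at a local file purely for illustration; the real file is $SPARK_HOME/conf/spark-defaults.conf, and the HDFS URL is the article's example value):

```shell
# Append the event-log settings to spark-defaults.conf.
SPARK_CONF=./spark-defaults.conf   # stand-in for $SPARK_HOME/conf/spark-defaults.conf
cat >> "$SPARK_CONF" <<'EOF'
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://localhost:9000/eventlogs
EOF
```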

Description of the history server-related configuration parameters:

1) spark.history.updateInterval
Default value: 10
The interval, in seconds, for updating log-related information

2) spark.history.retainedApplications
Default value: 50
The number of application histories kept in memory. If this value is exceeded, the oldest application information is deleted; when the information of a deleted application is accessed again, its page has to be rebuilt.

3) spark.history.ui.port
Default value: 18080
The web port of the HistoryServer

4) spark.history.kerberos.enabled
Default value: false
Whether to use Kerberos to log in to the HistoryServer; this is useful when the persistence layer is on HDFS in a secure cluster. If set to true, also configure the following two properties:

5) spark.history.kerberos.principal
The Kerberos principal name used by the HistoryServer

6) spark.history.kerberos.keytab
The location of the HistoryServer's Kerberos keytab file

7) spark.history.ui.acls.enable
Default value: false
Whether to check ACLs when authorizing users to view application information. If enabled, only the application's owner and the users specified by spark.ui.view.acls can view the application information.

8) spark.eventLog.enabled
Default value: false
Whether to log Spark events, which are used to reconstruct the web UI after the application has finished

9) spark.eventLog.dir
Default value: file:///tmp/spark-events
The path where the event-log information is stored; it can be an HDFS path starting with hdfs:// or a local path starting with file://, and it needs to be created in advance

10) spark.eventLog.compress
Default value: false
Whether to compress the recorded Spark events; if spark.eventLog.enabled is true, snappy is used by default

Properties beginning with spark.history are configured in SPARK_HISTORY_OPTS; properties beginning with spark.eventLog are configured in spark-defaults.conf.
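As noted for spark.eventLog.dir above, the log directory must be created in advance. A minimal sketch (the HDFS command assumes a running HDFS and is commented out; the local path is the documented default):

```shell
# For an HDFS event-log path (run against your own cluster):
# hdfs dfs -mkdir -p /sparkhistorylogs

# For the default local path:
mkdir -p /tmp/spark-events
```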


spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://localhost:9000/eventlogs
spark.eventLog.compress  true

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://localhost:9000/sparkhistorylogs"

Parameter description:

spark.history.ui.port=7777: changes the port of the web UI to 7777; the port can be modified as needed

spark.history.fs.logDirectory=hdfs://localhost:9000/sparkhistorylogs: once this property is configured, the log path no longer needs to be specified explicitly when starting the history server; modify it according to your actual setup

spark.history.retainedApplications=3: specifies the number of application histories to keep; if this value is exceeded, the oldest application information is deleted

After adjusting these parameters, no argument needs to be passed at startup:

start-history-server.sh
This article is written as a memo and for learning; if anything is wrong, please point it out!


