Due to license restrictions, the Ganglia module is not part of the default build, so the binaries downloaded from the official website do not contain it; if you need it, you have to compile Spark yourself. When compiling Spark with Maven, add the -Pspark-ganglia-lgpl profile to package the Ganglia-related classes into spark-assembly-x.x.x-hadoopx.x.x.jar.
The command is as follows:
./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -DskipTests -Dhadoop.version=2.4.0 -Pspark-ganglia-lgpl
You can also build it with SBT:
SPARK_HADOOP_VERSION=2.4.0 SPARK_YARN=true SPARK_GANGLIA_LGPL=true sbt/sbt assembly
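After building, you can confirm that the Ganglia classes actually landed in the assembly jar. A minimal sketch; the jar path below is an assumption and depends on your Spark, Scala, and Hadoop versions:

```shell
#!/bin/sh
# Sketch: check whether the GangliaSink class was packaged into the assembly.
# The path below is an assumption -- adjust it to your own build output.
JAR="assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop2.4.0.jar"
if [ -f "$JAR" ]; then
  # List the archive contents and look for the Ganglia sink class
  jar tf "$JAR" | grep -i GangliaSink
else
  echo "jar not found at $JAR"
fi
```

If grep prints nothing for an existing jar, the -Pspark-ganglia-lgpl profile was not active during the build.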
Once the assembly contains the Ganglia classes, add the following configuration to the $SPARK_HOME/conf/metrics.properties file:
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.host=master
*.sink.ganglia.port=8080
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
*.sink.ganglia.ttl=1
*.sink.ganglia.mode=multicast
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
Apply the same configuration on all nodes.
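The configuration above uses multicast mode. If multicast is disabled on your network, the GangliaSink can instead send metrics directly to a single gmond receiver in unicast mode. A sketch under assumed values; the receiver host is hypothetical, and 8649 is the conventional gmond port rather than anything taken from this article:

```properties
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
# Unicast: point every node at one gmond receiver instead of a multicast group
*.sink.ganglia.mode=unicast
# Assumed receiver host; replace with the node running gmond
*.sink.ganglia.host=master
# Conventional gmond listening port
*.sink.ganglia.port=8649
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
```

In unicast mode the gmond on the receiver host must be configured with a matching udp_recv_channel.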
Once configured, restart the cluster and you can view the monitoring page at http://master/ganglia.
References:
http://www.iteblog.com/archives/1347
http://www.iteblog.com/archives/1341
Monitoring a Spark cluster with Ganglia on Ubuntu 14.10