Reference:
http://spark.incubator.apache.org/docs/latest/
http://spark.incubator.apache.org/docs/latest/spark-standalone.html
http://www.yanjiuyanjiu.com/blog/20130617/
1. Installing the JDK
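The guide does not spell out the JDK step. On a CentOS-style system (the guide later uses yum), something like the following should work; the exact package name is an assumption for that era (Spark 0.7.x ran on Java 6):

```shell
# Install an OpenJDK via yum (package name assumed; adjust for your distro)
sudo yum install -y java-1.6.0-openjdk-devel

# Verify the installation
java -version
```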
2. Install Scala 2.9.3
Spark 0.7.x depends on Scala 2.9.3, so we have to install Scala 2.9.3 first.
Download the scala-2.9.3.tgz and save it to your home directory (already on sg206).
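If the tarball is not already present (as it is on sg206), it can be fetched with wget; the download URL below is an assumption based on the usual scala-lang.org layout of that time:

```shell
# Download Scala 2.9.3 into the home directory (URL assumed)
cd ~
wget http://www.scala-lang.org/downloads/distrib/files/scala-2.9.3.tgz
```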
$ tar -zxf scala-2.9.3.tgz
$ sudo mv scala-2.9.3 /usr/lib
$ sudo vim /etc/profile
# add the following lines at the end
export SCALA_HOME=/usr/lib/scala-2.9.3
export PATH=$PATH:$SCALA_HOME/bin
# save and exit vim
# make the profile take effect immediately
$ source /etc/profile
# test
$ scala -version
3. Building Spark
$ cd /home
$ tar -zxf spark-0.7.3-sources.gz
$ cd spark-0.7.3
$ sbt/sbt package   # requires git (yum install git)
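In Spark 0.7.x the build can be sanity-checked by running one of the bundled examples through the ./run script; the class name below is the SparkPi example as shipped in that release:

```shell
# From the spark-0.7.3 directory, run the SparkPi example locally
# (local[2] = two worker threads on the local machine)
./run spark.examples.SparkPi local[2]
```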
4. Configuration Files
spark-env.sh
############
export SCALA_HOME=/usr/lib/scala-2.9.3
export SPARK_MASTER_IP=172.16.48.202
export SPARK_WORKER_MEMORY=10g
#############
slaves
Add the IP address of each worker (slave) node to the conf/slaves configuration file, one per line.
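A minimal conf/slaves file just lists one worker host or IP per line; the addresses below are placeholders chosen to match the master IP above, not values from the original guide:

```shell
# conf/slaves (example IPs, adjust to your cluster)
172.16.48.203
172.16.48.204
```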
5. Start and stop
bin/start-master.sh - starts a master instance on the machine the script is executed on.
bin/start-slaves.sh - starts a slave instance on each machine specified in the conf/slaves file.
bin/start-all.sh - starts both a master and a number of slaves as described above.
bin/stop-master.sh - stops the master that was started via the bin/start-master.sh script.
bin/stop-slaves.sh - stops the slave instances that were started via bin/start-slaves.sh.
bin/stop-all.sh - stops both the master and the slaves as described above.
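After start-all.sh, an interactive shell can be attached to the standalone master by setting the MASTER environment variable; the IP comes from SPARK_MASTER_IP above, and 7077 is the standalone master's default port:

```shell
# Standalone master URL, built from SPARK_MASTER_IP (default port 7077)
MASTER_IP=172.16.48.202
MASTER_URL="spark://${MASTER_IP}:7077"
echo "$MASTER_URL"

# Attach an interactive shell to the cluster (run from the spark-0.7.3 dir):
# MASTER="$MASTER_URL" ./spark-shell
```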
6. Browse the master's web UI (default http://localhost:8080). This is where you should be able to see all the worker nodes, as well as their CPU count and memory information.