1. Installing the JDK
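Spark and Scala both run on the JVM, so a JDK (1.6 or later) must be installed and JAVA_HOME set first. A minimal sketch, assuming a yum-based system such as CentOS (consistent with the yum hint in step 3); the package name and install path depend on your distribution:
$ sudo yum install java-1.7.0-openjdk-devel
$ sudo vim ~/.bash_profile
# Add at the end (adjust the path to where your JDK actually lives)
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk
export PATH=$PATH:$JAVA_HOME/bin
$ source ~/.bash_profile
$ java -version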
2. Install Scala 2.10
Spark 1.0.2 depends on Scala 2.10, so we have to install Scala 2.10 first.
Download scala-2.10.*.tgz and save it to your home directory (it is already present on sg206).
$ tar -zxvf scala-2.10.*.tgz
$ sudo mv scala-2.10.* /usr/lib
$ sudo vim ~/.bash_profile
# Add the following lines at the end
export SCALA_HOME=/usr/lib/scala-2.10.*
export PATH=$PATH:$SCALA_HOME/bin
# Save and exit vim
# Make the .bash_profile take effect immediately
$ source ~/.bash_profile
# Test
$ scala -version
3. Building Spark
$ cd /home
$ tar -zxf spark-0.7.3-sources.gz
$ cd spark-0.7.3
$ sbt/sbt package (requires a git environment: yum install git)
Or download the prebuilt spark-1.0.2-bin-hadoop2.tgz instead.
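If you go with the prebuilt package, a minimal sketch of unpacking it (the target directory is just an illustrative choice):
$ tar -zxvf spark-1.0.2-bin-hadoop2.tgz
$ sudo mv spark-1.0.2-bin-hadoop2 /usr/lib/spark-1.0.2
$ cd /usr/lib/spark-1.0.2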
4. Configuration files
spark-env.sh
############
export SCALA_HOME=/usr/lib/scala-2.10.*
export SPARK_MASTER_IP=172.16.48.202
export SPARK_WORKER_MEMORY=10g
export JAVA_HOME=***
#############
slaves
Add the IP addresses of the slave (worker) nodes to the conf/slaves configuration file, as shown in the sketch below.
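For illustration, spark-env.sh can be created from the template shipped with Spark, and conf/slaves simply lists one worker host or IP per line (the IPs below are hypothetical examples, not taken from this cluster):
$ cp conf/spark-env.sh.template conf/spark-env.sh
$ vim conf/spark-env.sh   # add the export lines shown above
$ vim conf/slaves
# one slave per line, e.g.
172.16.48.203
172.16.48.204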
5. Start and stop
bin/start-master.sh - Starts a master instance on the machine the script is executed on.
bin/start-slaves.sh - Starts a slave instance on each machine specified in the conf/slaves file.
bin/start-all.sh - Starts both a master and a number of slaves as described above.
bin/stop-master.sh - Stops the master that was started via the bin/start-master.sh script.
bin/stop-slaves.sh - Stops the slave instances that were started via bin/start-slaves.sh.
bin/stop-all.sh - Stops both the master and the slaves as described above.
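Once the start scripts have run, a quick sanity check is to list the Java processes with jps (ships with the JDK):
$ jps
# On the master node this should list a "Master" process; on each slave node, a "Worker" process.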
6. Browse the master's web UI (default http://localhost:8080). This is where you should be able to see all the worker nodes, as well as their CPU count and memory information.
7. Test:
Connect to Spark: spark-shell --master spark://192.168.148.42:7077
Enter the following commands (sc is the SparkContext that spark-shell creates automatically):
val file = sc.textFile("")
val info = file.filter(line => line.contains("info"))
info.count()
Command-line test:
spark-submit --master spark://192.168.148.42:7077 examples/src/main/python/pi.py 10
Write program test:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class AdminOperation {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("atest")
                .setMaster("spark://192.168.148.42:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> file = sc.textFile("hdfs://huangcun-hbase1:8020/test/test.txt");
        JavaRDD<String> errors = file.filter(new Function<String, Boolean>() {
            public Boolean call(String s) {
                return s.contains("ERROR");
            }
        });
        // Count all the errors
        errors.count();
    }
}
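Before submitting, the class has to be compiled against the Spark jars and packaged into the spark-test.jar used below; a rough sketch assuming the prebuilt distribution (the assembly jar name varies with your Spark/Hadoop build):
$ javac -cp lib/spark-assembly-*.jar AdminOperation.java
$ jar cf spark-test.jar AdminOperation*.class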
spark-submit --master spark://192.168.148.42:7077 --class AdminOperation spark-test.jar
With additional submission parameters attached:
spark-submit --master spark://bjjr-fanweiwe1.360buyad.local:7077 --class AdminOperation --executor-memory 20G --total-executor-cores <cores> C:\Users\Administrator.BJXX-20140806JH\spark-test.jar --jars <dependency jar files>
Spark Standalone mode installation