Spark is an open-source cluster computing framework similar to Hadoop, and it has attracted a lot of attention lately. The installation steps are described below.
1 Installing Scala
1.1 Download Scala. I chose 2.11.4, from http://www.scala-lang.org/download/.
1.2 Extract to a folder
tar -xzvf scala-2.11.4.tgz
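The SCALA_HOME set below assumes the extracted folder lives under /home/liucc/software/spark, so move it there first (a sketch; adjust the target to your own layout):
mv scala-2.11.4 /home/liucc/software/spark/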
1.3 Setting environment variables
sudo nano /etc/profile
export SCALA_HOME=/home/liucc/software/spark/scala-2.11.4
export PATH=$PATH:$SCALA_HOME/bin
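After saving, reload the profile so the new variables take effect in the current shell (assuming a Bourne-compatible shell):
source /etc/profile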
1.4 Check if the installation is successful
scala -version
2 Installing Spark
2.1 Download a pre-built Spark. I followed the post at http://www.aboutyun.com/thread-8160-1-1.html and chose the build for Hadoop 2.2.0.
Pre-built Spark downloads (work on both 32-bit and 64-bit systems):
Hadoop 1 installation package link: http://pan.baidu.com/s/1c0kZMLE password: d4om
Hadoop 2 installation package link: http://pan.baidu.com/s/1kT3czFD password: elpg
2.2 Extract to the appropriate directory
tar -xzvf spark-1.0.0-bin-hadoop2.tgz
2.3 Setting SPARK_HOME
export SPARK_EXAMPLES_JAR=/home/liucc/software/spark/spark-1.0.0/examples/target/scala-2.11.4/spar$
export SPARK_HOME=/home/liucc/software/spark/spark-1.0.0
Note: the SPARK_EXAMPLES_JAR setting is excerpted from PIG2. This step is actually the most critical one, and unfortunately neither the official documentation nor most online blogs mention it. I only happened to see it in two posts, "Running SparkPi" and "Null pointer exception when running ./run spark.examples.SparkPi local"; without this step you simply cannot get SparkPi to run. You can verify it with the example runner shown below.
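A quick check using the run-example script that ships with Spark 1.0 (the trailing argument is the number of slices to compute Pi with):
bin/run-example SparkPi 10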
2.4 Configure Spark: go to the conf directory and edit the spark-env.sh file
cp spark-env.sh.template spark-env.sh
nano spark-env.sh
export JAVA_HOME=/usr/dev/jdk1.7.0_51
export SCALA_HOME=/home/liucc/software/spark/scala-2.11.4
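Optionally, spark-env.sh can also pin the master host and worker memory. These two variables are standard in Spark 1.x standalone mode, but the values below are placeholders for this particular setup (the host name is taken from the web UI address in the next step):
export SPARK_MASTER_IP=centos.host1
export SPARK_WORKER_MEMORY=1g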
2.5 The configuration is now complete and you can start testing. Enter the Spark directory and start the master; you can then see the web UI at http://centos.host1:8080/
sbin/start-master.sh
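To bring up workers as well, Spark's sbin directory also provides start-all.sh, which starts the master plus every worker listed in conf/slaves (this assumes passwordless SSH to the worker hosts):
sbin/start-all.sh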
3 Test: run a first example on Spark, a WordCount that interacts with Hadoop
3.1 Upload a file to Hadoop, for example:
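A minimal sketch using the standard HDFS shell; the file name and target path here are only placeholders:
hadoop fs -put README.md /user/liucc/input/README.md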
3.2 Start the spark-shell from the Spark root directory
bin/spark-shell
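By default the shell runs in local mode. To attach it to the standalone master started above, pass --master with the master URL (the host name is assumed from the web UI address earlier; 7077 is the standalone master's default port):
bin/spark-shell --master spark://centos.host1:7077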
3.3 Enter the following Scala statements. If Scala is unfamiliar, it is worth studying; there is a very good public course at https://class.coursera.org/progfun-005
val file = sc.textFile("file to be counted")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
If everything works, you will see the word counts in the output.
3.4 The results can also be saved back to Hadoop
count.saveAsTextFile("directory to save to")
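saveAsTextFile writes one part-* file per partition under that directory; you can inspect them with the HDFS shell (the path below is a placeholder matching whatever directory you passed above):
hadoop fs -cat "directory to save to/part-*"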
---------------------------------------------------------------------------------------
Done. Of course, you can also test from Java in Eclipse; there are many write-ups online.