Development prerequisites:
jdk1.8.45
spark-2.0.0-bin-hadoop2.7 (keep one copy on Windows and one on Linux)
A Linux system (CentOS or another distribution) with Spark installed
hadoop-2.7.2 (one copy on Linux) with Hadoop installed
The development environment is built in the following steps:
1. Download scala-sdk-4.4.1-vfinal-2.11-win32.win32.x86_64.tgz (the Scala IDE for Eclipse bundle)
2. Extract the archive and run the eclipse executable inside it
3. Create a Scala project and add a Scala object named WordCount
4. Right-click the project, open Properties, and add the jars shipped with spark-2.0.0-bin-hadoop2.7 to the build path (they can be grouped into a custom user library):
5. Edit the code as follows:
import org.apache.spark._
import SparkContext._

object WordCount {
  def main(args: Array[String]) {
    if (args.length != 3) {
      println("Usage is org.test.WordCount <master> <input> <output>")
      return
    }
    val sc = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val textFile = sc.textFile(args(1))
    val result = textFile.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(2))
  }
}
6. Right-click the class and export it as a jar file:
7. Execute from the Spark deployment path (you can find the Spark master address in Spark's logs):
./spark-submit --num-executors 1 --executor-memory 1g --class WordCount \
  --master spark://10.130.41.59:7077 \
  spark-wordcount-in-scala.jar \
  spark://10.130.41.59:7077 \
  hdfs://hadoop:9000/user/hadoop/input \
  hdfs://hadoop:9000/user/hadoop/outspark
8. Parameter reference:
Run ./spark-submit --help to see all available options.
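The pipeline in step 5 (flatMap → map → reduceByKey) has a direct analogue on plain Scala collections, which is handy for checking the word-count logic before submitting the jar to a cluster. A minimal sketch (the object name LocalWordCount is mine, not from the tutorial; groupBy plus size stands in for reduceByKey):

```scala
// Word count over an in-memory collection: the same shape as the
// Spark job, but with no SparkContext or cluster required.
object LocalWordCount {
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))   // split every line into words
      .filter(_.nonEmpty)         // drop empty tokens from blank lines
      .groupBy(identity)          // word -> all of its occurrences
      .map { case (word, occs) => (word, occs.size) }
}
```

Once this produces the expected counts locally, the same logic can be trusted when it runs as RDD transformations on the cluster.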
Spark development under Eclipse (verified in practice)