1. Install Scala. Download link: https://downloads.lightbend.com/scala/2.12.3/scala-2.12.3.msi
Create the system variable SCALA_HOME with the value C:\Program Files (x86)\scala, then add %SCALA_HOME%\bin to the system PATH variable.
Then open a cmd window and run scala; the Scala REPL should start and print its version information.
2. Install the JDK. Download link: http://www.oracle.com/technetwork/java/javase/downloads/index.html. Spark 2.2 requires JDK 8, so download Java SE 8u144.
Create the system variable JAVA_HOME with the value C:\Program Files\Java\jdk1.8.0_144 and add %JAVA_HOME%\bin to the system PATH variable.
Create the system variable CLASSPATH with the value %JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar;
Open a cmd window and enter java -version to confirm that the JDK is installed.
3. Install Spark. Download link: http://spark.apache.org/downloads.html
Click the download link in step 4 of that page, then extract the contents of the archive into the C:\Spark folder.
Create the system variable SPARK_HOME with the value C:\Spark, then add %SPARK_HOME%\bin and %SPARK_HOME%\sbin to the system PATH variable.
4. Install Hadoop winutils. Download link: https://github.com/steveloughran/winutils. Select the Hadoop version you want, for example 2.8.1; you only need to download winutils.exe and copy it to the C:\Hadoop\bin folder.
Create the system variable HADOOP_HOME with the value C:\Hadoop and add %HADOOP_HOME%\bin to the PATH variable.
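Before moving on, it can help to confirm that the variables from steps 1 to 4 are actually visible to a newly opened cmd window. The following is a minimal sketch that can be pasted into the Scala REPL; the object name EnvCheck and its methods are hypothetical helpers made up for this illustration, and the variable names are simply the ones used in the steps above.

```scala
// Hypothetical helper (not part of Scala or Spark): reports whether the
// variables created in steps 1-4 are visible to the current process.
object EnvCheck {
  val required: Seq[String] =
    Seq("SCALA_HOME", "JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")

  // `get` looks up one variable; passing it in keeps the logic easy to try
  // with made-up values instead of the real environment.
  def report(get: String => Option[String]): Seq[String] =
    required.map(name => name -> get(name)).map {
      case (name, Some(value)) => s"$name = $value"
      case (name, None)        => s"$name is NOT set"
    }

  def main(args: Array[String]): Unit = {
    // System.getenv returns null for a missing variable; Option(...) turns that into None.
    report(name => Option(System.getenv(name))).foreach(println)
    println(s"Scala: ${scala.util.Properties.versionString}")
    println(s"Java:  ${System.getProperty("java.version")}")
  }
}
```

Calling EnvCheck.main(Array()) in the REPL prints one line per variable, followed by the Scala and Java versions in use. If a variable shows as not set, close the cmd window and open a new one, since windows that were already open do not pick up newly created system variables.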
5. Open cmd as Administrator and run spark-shell. If you encounter errors such as access-rights problems on C:\tmp\hive, run winutils.exe chmod -R 777 C:\tmp\hive
6. Open cmd as Administrator and run spark-shell again; the Spark shell should start and show the scala> prompt.
Most importantly, you should see the line: Spark context available as 'sc' (master = local[*], app id = local-1507235397368).
7. Spark Hello World Example
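At the scala> prompt, enter and run the following lines:
// Read the README as an RDD of lines
val textFile = sc.textFile("file:///Spark/README.md")
// Split each line into words on spaces
val tokenizedFileData = textFile.flatMap(line => line.split(" "))
// Pair each word with 1, then sum the counts per word
val countPrep = tokenizedFileData.map(word => (word, 1))
val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)
// Sort by count, highest first, and write the result out
val sortedCounts = counts.sortBy(kvPair => kvPair._2, false)
sortedCounts.saveAsTextFile("file:///OutputData/ReadMeWordCount")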
Open the C: drive; you should see an OutputData folder containing a ReadMeWordCount folder.
Inside it, open the files part-00000 and part-00001, which list the number of occurrences of each word in the README.md file.
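To see what each stage of the word count does without needing Spark, the same pipeline can be sketched on an ordinary in-memory collection. This is only an illustration under stated assumptions: the object name LocalWordCount and the sample sentences are made up, and groupBy followed by a sum stands in for Spark's reduceByKey, which has no direct counterpart on plain Scala collections.

```scala
// Illustration only: the word-count steps on a plain Seq[String] instead of an RDD.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Seq[(String, Int)] = {
    // flatMap: every line becomes several words, flattened into one sequence
    val tokens = lines.flatMap(line => line.split(" "))
    // map: pair each word with an initial count of 1
    val pairs = tokens.map(word => (word, 1))
    // stand-in for reduceByKey: group the pairs by word and add up the 1s
    val counts = pairs
      .groupBy { case (word, _) => word }
      .map { case (word, ones) => (word, ones.map(_._2).sum) }
      .toSeq
    // sort by count, highest first (the role of sortBy(..., false) in the Spark version)
    counts.sortBy { case (_, n) => -n }
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("spark is fast", "spark is fun") // made-up input
    wordCount(sample).foreach(println)
  }
}
```

Running LocalWordCount.main(Array()) prints one (word, count) pair per line, in the same shape as the lines in the part-0000x files. Words with equal counts may appear in any order relative to each other.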
Install Spark under Windows