(Revised to reflect the latest versions)
Without a doubt, Spark has become the hottest big data tool. This article details how to install SparkR so that you can be using it locally within 5 minutes.
Environment requirements: Java 7+, R, RStudio, and Rtools (https://cran.r-project.org/bin/windows/Rtools/)
Step One: Download Spark
Open http://spark.apache.org/ in your browser and click the green "Download Spark" button on the right.
You will see the following page:
Follow steps 1 to 3 to create the download link.
In "2. Choose a package type", select a pre-built type. Because we are going to run locally under Windows, choose "Pre-built for Hadoop 2.6 and later".
In "3. Choose a download type", select "Direct Download".
Once these are selected, the download link appears at "4. Download Spark".
Download the archive to your computer.
Step Two: Unzip the installation file
Extract it to the path "C:/apache/spark-1.4.1".
Step Three: Run from the command line (this step only works after R and the other environment variables have been configured; skip it if you do not need a command-line window)
Open a command-line window (Start → search box → enter cmd) and change to the Spark directory:
Enter the command ".\bin\sparkR".
After it starts you will see some log output; after about 15 seconds, if all goes well, you will see "Welcome to SparkR!"
Set Environment variables:
Right-click on "My Computer" and select "Properties":
Select "Advanced system settings".
Click "Environment Variables", find Path under "System variables", and append "C:\ProgramData\Oracle\Java\javapath;".
Step Four: Running in RStudio
" c:/apache/spark-1.6.1 " ). Libpaths (C (File.path (sys.getenv("spark_home""R") "lib"),. libpaths ()))
#注意把spark -1.6.1 directory under the R directory Sparkr into the library of R, or can not directly install SPARKR package?
R's library paths can be viewed with .libPaths(); new packages are installed into the first listed path (the default) unless told otherwise.
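As a sketch of the step above (the Spark install path is an assumption carried over from the earlier steps; adjust it to your own machine), the following base-R snippet prepends the SparkR library directory only if it actually exists:

```r
# Show R's library search paths; new packages install into the first one
print(.libPaths())

# Assumed Spark install location from the earlier steps -- adjust to yours
spark_home <- "C:/apache/spark-1.6.1"
sparkr_lib <- file.path(spark_home, "R", "lib")

# Prepend the SparkR library directory only if it exists, so this is
# safe to run even before Spark has been unpacked
if (dir.exists(sparkr_lib)) {
  .libPaths(c(sparkr_lib, .libPaths()))
}
```

The existence check avoids silently adding a dead path to the library search list.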
# Load the SparkR library
library(SparkR)

# Create a Spark context and a SQL context
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Create a SparkR DataFrame
df <- createDataFrame(sqlContext, faithful)
head(df)

# Create a simple local data.frame
localDF <- data.frame(name = c("John", "Smith", "Sarah"), age = c(19, 23, 18))

# Convert the local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- name: string (nullable = true)
#  |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- jsonFile(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql method provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()
# Another example: wordcount
# Source: http://www.cnblogs.com/hseagle/p/3998853.html
sc <- sparkR.init(master = "local", "RwordCount")
lines <- textFile(sc, "README.md")
Note: the textFile function can no longer be used after SparkR 1.4; SparkR must load data through the sqlContext instead, as follows:
people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")
In addition, CSV, Parquet, Hive data, and so on are supported.
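As an illustration (a sketch only: it needs a running SparkR session with the sqlContext created as above, and the file paths are hypothetical placeholders), reading Parquet and CSV sources looks like this; in SparkR 1.x, CSV support comes from the external spark-csv package, passed in when starting SparkR:

```r
# Parquet: the source type is passed as the third argument to read.df
peopleParquet <- read.df(sqlContext, "./people.parquet", "parquet")

# CSV: requires starting SparkR with the spark-csv package, e.g.
#   sc <- sparkR.init(master = "local",
#                     sparkPackages = "com.databricks:spark-csv_2.10:1.3.0")
peopleCSV <- read.df(sqlContext, "./people.csv", "com.databricks.spark.csv")
```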
words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
wordCount <- lapply(words, function(word) { list(word, 1L) })
counts <- reduceByKey(wordCount, "+", 2L)
output <- collect(counts)
for (wordcount in output) {
  cat(wordcount[[1]], ": ", wordcount[[2]], "\n")
}
Original article: http://www.r-bloggers.com/installing-and-starting-sparkr-locally-on-windows-os-and-rstudio/
Resources
1. Installation: http://blog.csdn.net/jediael_lu/article/details/45310321
2. Installation: http://thinkerou.com/2015-05/How-to-Build-Spark-on-Windows/
3. hseagle's blog: http://www.cnblogs.com/hseagle/p/3998853.html
4. Learning: http://www.r-bloggers.com/a-first-look-at-spark/
5. Learning: http://www.danielemaasit.com/getting-started-with-sparkr/
6. Error resolution: http://stackoverflow.com/questions/10077689/r-cmd-on-windows-7-error-r-is-not-recognized-as-an-internal-or-external-comm
7. SparkR official guide: http://spark.apache.org/docs/latest/sparkr.html#from-local-data-frames (Chinese version: http://www.iteblog.com/archives/1385)
"Go + fix" is installed locally under windows and Rstudio Sparkr