[Repost, with fixes] Installing SparkR Locally on Windows and Using It in RStudio

Source: Internet
Author: User

(Revised to bring it up to date)

Without a doubt, Spark has become one of the hottest big data tools. This article details how to install SparkR so that you can use it locally within five minutes.

Environment requirements: Java 7+, R, RStudio, and Rtools (https://cran.r-project.org/bin/windows/Rtools/).

Step One: Download Spark

In your browser, open http://spark.apache.org/ and click the green "Download Spark" button on the right.

You will see the following page:

Follow steps 1 through 3 on that page to create the download link.

In "2. Choose a package type", select a pre-built type. Because we are going to run locally under Windows, choose "Pre-built for Hadoop 2.6 and later".

In "3. Choose a download type", select "Direct Download".

Once those are selected, the download link appears under "4. Download Spark".

Download the archive to your computer.

Step Two: Unzip the installation file

Uncompress it to a path such as "C:/apache/spark-1.6.1" (the later examples assume this path).

Step Three: Run from the command line (this step only works after the R and other environment variables have been configured; skip it if you don't need a command-line session)

Open a command-line window (Start menu, type cmd in the search box), change to the Spark directory, e.g. "cd C:\apache\spark-1.6.1", then enter the command ".\bin\sparkR".

If it succeeds you will see some log output; after roughly 15 seconds, if all goes well, the message "Welcome to SparkR!" appears.
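Once the prompt appears, you are already inside an R session with SparkR loaded; in the 1.x shell the sc and sqlContext objects are created for you. As a minimal smoke test (a sketch, using R's built-in faithful dataset):

```r
# Inside the sparkR shell: sc and sqlContext already exist.
# Build a SparkR DataFrame from R's built-in faithful dataset and peek at it.
df <- createDataFrame(sqlContext, faithful)
head(df)
```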

Set Environment variables:

Right-click "My Computer" and select "Properties".

Select "Advanced system settings".

Click "Environment Variables", find Path under "System variables", and append "C:\ProgramData\Oracle\Java\javapath;".
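After editing Path, you can sanity-check the result from a plain R session before touching Spark. This is only a sketch; the paths are the ones assumed in the steps above, so adjust them to your machine:

```r
# Check that Windows can now resolve java via the Path entry added above
Sys.which("java")

# Check that the Spark unzip location from Step Two actually exists
file.exists("C:/apache/spark-1.6.1/bin")
```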

Step Four: Run in RStudio

Sys.setenv(SPARK_HOME = "C:/apache/spark-1.6.1")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

# Note: add the SparkR directory under spark-1.6.1/R/lib to R's library paths as above; otherwise the SparkR package cannot be loaded directly.

R's library paths can be listed with .libPaths(); new packages are installed by default into the first path in that list (the default library).

# Load the SparkR library
library(SparkR)

# Create a Spark context and a SQL context
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Create a SparkR DataFrame from the built-in faithful dataset
df <- createDataFrame(sqlContext, faithful)
head(df)

# Create a simple local data.frame
localDF <- data.frame(Name = c("John", "Smith", "Sarah"), Age = c(19, 23, 18))

# Convert the local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- Name: string (nullable = true)
#  |-- Age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- jsonFile(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table
registerTempTable(peopleDF, "people")

# SQL statements can be run using the sql method provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()


# Another example: wordcount
# source: http://www.cnblogs.com/hseagle/p/3998853.html
sc <- sparkR.init(master = "local", "RwordCount")
lines <- textFile(sc, "README.md")
Note: the textFile function can no longer be used after SparkR 1.4; SparkR must load the data through sqlContext, as follows:
people <- read.df(sqlContext, "./examples/src/main/resources/people.json", "json")
Besides JSON, sources such as CSV, Parquet, and Hive tables are also supported.
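As a hedged sketch of those other sources (SparkR 1.x API; every path and package coordinate below is a placeholder, not something from this article):

```r
# Parquet: pass "parquet" as the source type (path is a placeholder)
usersDF <- read.df(sqlContext, "path/to/users.parquet", "parquet")

# CSV: in SparkR 1.x this needs the external spark-csv package, pulled in
# when the context is created, e.g.
#   sc <- sparkR.init(sparkPackages = "com.databricks:spark-csv_2.10:1.4.0")
csvDF <- read.df(sqlContext, "path/to/data.csv",
                 source = "com.databricks.spark.csv", header = "true")

# Hive tables: use a HiveContext instead of the SQL context
hiveContext <- sparkRHive.init(sc)
hiveDF <- sql(hiveContext, "SELECT * FROM some_table")
```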
words <- flatMap(lines, function(line) { strsplit(line, " ")[[1]] })
wordCount <- lapply(words, function(word) { list(word, 1L) })
counts <- reduceByKey(wordCount, "+", 2L)
output <- collect(counts)
for (wordcount in output) {
  cat(wordcount[[1]], ": ", wordcount[[2]], "\n")
}

 

Original address: http://www.r-bloggers.com/installing-and-starting-sparkr-locally-on-windows-os-and-rstudio/

Resources

1. Installation: http://blog.csdn.net/jediael_lu/article/details/45310321

2. Installation: http://thinkerou.com/2015-05/How-to-Build-Spark-on-Windows/

3. hseagle's blog: http://www.cnblogs.com/hseagle/p/3998853.html

4. Learning: http://www.r-bloggers.com/a-first-look-at-spark/

5. Learning: http://www.danielemaasit.com/getting-started-with-sparkr/

6. Error resolution: http://stackoverflow.com/questions/10077689/r-cmd-on-windows-7-error-r-is-not-recognized-as-an-internal-or-external-comm

7. SparkR official guide: http://spark.apache.org/docs/latest/sparkr.html#from-local-data-frames (Chinese version: http://www.iteblog.com/archives/1385)

