Spark Development Environment Configuration


In the future, Spark, MapReduce, and MPI may well converge on a single platform, with no need for each to maintain its own separate focus; together they cover both cloud computing and high-performance computing, and they complement each other. With the Spark basics covered, let's set up the development environment; the main goal here is to read the source code. The recent "Apache Spark Source Code Reading" series is quite good. The environment configuration itself is not complex; for details, see https://github.com/apache/spark.

1. Download Code

 
git clone https://github.com/apache/spark.git

2. Build Spark

I am building against Hadoop 2.2.0, so I run the following:

 
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
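
The same pattern should apply to other Hadoop versions; for example, to build against Hadoop 1.2.1 without YARN (the version number is shown only as an illustration):

SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly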

3. Usage (see https://github.com/apache/spark for details)

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

 
./bin/spark-shell

Try the following command, which should return 1000:

scala> sc.parallelize(1 to 1000).count()
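
Building on the same shell session (sc is provided by the shell), here is a short sketch of a few more RDD operations; the expected values in the comments are derived, not from the original post:

scala> val nums = sc.parallelize(1 to 1000)
scala> nums.filter(_ % 2 == 0).count()   // evens in 1..1000: should return 500
scala> nums.map(_ * 2).reduce(_ + _)     // 2 * (1 + ... + 1000) = 1001000
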
Interactive Python Shell

Alternatively, if you prefer Python, you can use the Python shell:

 
./bin/pyspark

Then run the following command, which should also return 1000:

 
>>> sc.parallelize(range(1000)).count()

Example Programs

Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:

 
./bin/run-example SparkPi

This will run the Pi example locally.
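
For reference, SparkPi is a Monte Carlo estimate of Pi. The following is a minimal sketch of the same idea, not the exact SparkPi source; the object name and the "local" master URL are illustrative:

import org.apache.spark.SparkContext

// Sketch of a Monte Carlo Pi estimate (illustrative, not the actual SparkPi code).
object PiSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "PiSketch")
    val n = 100000
    // Sample n random points in the square [-1, 1] x [-1, 1]
    // and count how many fall inside the unit circle.
    val inside = sc.parallelize(1 to n).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    // Area ratio circle/square = Pi/4, so Pi ~= 4 * inside / n.
    println("Pi is roughly " + 4.0 * inside / n)
    sc.stop()
  }
}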

You can set the MASTER environment variable when running examples to submit them to a cluster. This can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client" to run on YARN, "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:

 
MASTER=spark://host:7077 ./bin/run-example SparkPi
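
Or, to run the same example locally with, say, four threads:

MASTER=local[4] ./bin/run-example SparkPi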

The example programs print usage help if no params are given.

Running Tests

Testing first requires building Spark. Once Spark is built, tests can be run using:

 
./sbt/sbt test
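
To run just one suite, sbt's standard test-only task should work; the suite name below is only an illustration:

./sbt/sbt "test-only org.apache.spark.rdd.RDDSuite"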


Using an IDE: Install IntelliJ IDEA and the Scala Plug-in

Download the IDEA tar.gz package from the official JetBrains website and unzip it. Run IDEA and install the Scala plug-in.

In the source code root directory, run the following command:

 
./sbt/sbt gen-idea

This generates the IDEA project files. In IDEA, click File -> Open Project, browse to the incubator-spark folder, and open the project. You can then modify the Spark code.
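
To sanity-check the setup, a minimal standalone job can be run straight from the IDE; the object name and the local master below are placeholders, not part of the original walkthrough:

import org.apache.spark.SparkContext

// Hypothetical smoke test: runs a tiny count locally to verify the IDE and build work.
object SmokeTest {
  def main(args: Array[String]) {
    val sc = new SparkContext("local[2]", "SmokeTest")
    println(sc.parallelize(1 to 1000).count())  // expect 1000
    sc.stop()
  }
}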

 

Reference: https://github.com/apache/spark

http://cn.soulmachine.me/blog/20140130/
