In the future, Spark, MapReduce, and MPI may well converge on the same platform, so their respective specialties no longer need to be kept apart; that would amount to unifying cloud computing and high-performance computing, and the two complement each other. With that in mind, let's look at the Spark basics, starting with the development environment; the main goal is to read the source code. The recent "Apache Spark Source Code Reading" series is quite good. The environment setup itself is not complex; for details, see https://github.com/apache/spark
1. Download Code
git clone https://github.com/apache/spark.git
2. Build Spark directly
I am building against Hadoop 2.2.0, so I run:
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
3. For specific usage, refer to https://github.com/apache/spark
Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
./bin/spark-shell
Try the following command, which should return 1000:
scala> sc.parallelize(1 to 1000).count()
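To go a bit further, you can chain a few more RDD operations in the same shell session. The snippet below is a small illustrative sketch of my own (sc is the SparkContext the shell creates for you), not part of the official quick start:
scala> val nums = sc.parallelize(1 to 1000)
scala> nums.filter(_ % 2 == 0).count()    // counts the even numbers, should return 500
scala> nums.map(_ * 2).reduce(_ + _)      // doubles each element and sums, should return 1001000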
Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:
./bin/pyspark
And run the following command, which should also return 1000:
>>> sc.parallelize(range(1000)).count()
Example programs
Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:
./bin/run-example SparkPi
will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit them to a cluster. It can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client" to run on YARN, "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:
MASTER=spark://host:7077 ./bin/run-example SparkPi
Many of the example programs print usage help if no params are given.
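For orientation, here is a rough sketch of what such an example program looks like as a standalone Scala application. It approximates the bundled SparkPi example; the object name MyPi and the argument handling are my own illustration, not the shipped source:
import scala.math.random
import org.apache.spark.SparkContext

object MyPi {
  def main(args: Array[String]) {
    // Master URL from the first argument, e.g. "local", "local[4]" or "spark://host:7077"
    val master = if (args.length > 0) args(0) else "local"
    val sc = new SparkContext(master, "MyPi")
    val n = 100000
    // Throw n random darts at the unit square; the fraction inside the unit circle approximates pi/4
    val hits = sc.parallelize(1 to n).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * hits / n)
    sc.stop()
  }
}
Packaged into a jar and placed on the classpath alongside the Spark assembly, a program like this can be launched in the same way as the bundled examples.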
Running Tests
Testing first requires building Spark. Once Spark is built, tests can be run using:
sbt/sbt test
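Spark's own tests are written with ScalaTest. As a rough sketch of the pattern (the suite name and test body here are hypothetical, not an actual Spark suite), a minimal test that exercises a local SparkContext looks something like this:
import org.scalatest.FunSuite
import org.apache.spark.SparkContext

class SimpleCountSuite extends FunSuite {
  test("parallelize and count on a local context") {
    val sc = new SparkContext("local", "SimpleCountSuite")
    try {
      // 100 elements in, 100 elements counted back out
      assert(sc.parallelize(1 to 100).count() === 100L)
    } finally {
      sc.stop()  // stop the context so later suites can create their own
    }
  }
}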
Using an IDE: Install IntelliJ IDEA and the Scala Plug-in
Download the IDEA tar.gz package from the official IDEA website and unpack it. Run IDEA and install the Scala plug-in.
In the source code root directory, run the following command:
sbt/sbt gen-idea
This generates the IDEA project files. Open IDEA, click File -> Open Project, browse to the incubator-spark folder, and open the project; you can then modify the Spark code.
References:
https://github.com/apache/spark
http://cn.soulmachine.me/blog/20140130/