Going forward, Spark, MapReduce, and MPI may all end up on a single platform, each with a different emphasis; they complement one another, roughly a union of cloud computing and high-performance computing. Having gone through the Spark basics, I now want to look at the development environment, mainly in order to read the source code. The recent "Apache Spark source code walkthrough" series is quite good, and I have read part of it. The environment setup itself is not complicated; for details see https://github.com/apache/spark
1. Download the code
git clone https://github.com/apache/spark.git
2. Build Spark directly
I am building against Hadoop 2.2.0, so I run the following:
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
3. For detailed usage, refer to https://github.com/apache/spark
Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:
./bin/spark-shell
Try the following command, which should return 1000:
scala> sc.parallelize(1 to 1000).count()
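Beyond counting a range, the same shell can poke at a real file. A minimal sketch, assuming the shell was started from the Spark source root so that README.md is present:
scala> val lines = sc.textFile("README.md")   // RDD of the file's lines
scala> lines.filter(_.contains("Spark")).count()
textFile builds an RDD over the file, and the filter/count pair runs a small job over its lines.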
Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:
./bin/pyspark
And run the following command, which should also return 1000:
>>> sc.parallelize(range(1000)).count()
Example Programs
Spark also comes with several sample programs in the examples directory. To run one of them, use ./bin/run-example <class> [params]. For example:
./bin/run-example SparkPi
will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit examples to a cluster. This can be a mesos:// or spark:// URL, "yarn-cluster" or "yarn-client" to run on YARN, "local" to run locally with one thread, or "local[N]" to run locally with N threads. You can also use an abbreviated class name if the class is in the examples package. For instance:
MASTER=spark://host:7077 ./bin/run-example SparkPi
Many of the example programs print usage help if no params are given.
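To get a feel for what these examples contain, here is a rough standalone sketch in the spirit of the bundled SparkPi program; the object name PiSketch, the local[2] master, and the sample count are placeholders of mine, not part of the Spark sources:

import org.apache.spark.{SparkConf, SparkContext}

object PiSketch {
  def main(args: Array[String]) {
    // Local master for a quick smoke test; pass a cluster URL to go distributed.
    val conf = new SparkConf().setAppName("PiSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val n = 100000
    // Throw n random darts at the unit square; count those landing in the unit circle.
    val hits = sc.parallelize(1 to n).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * hits / n)
    sc.stop()
  }
}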
Running Tests
Testing first requires building Spark. Once Spark is built, tests can be run using:
./sbt/sbt test
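The full suite takes a long time, so when modifying one area it helps to know what an individual suite looks like. A minimal sketch in the style of Spark's own tests, assuming ScalaTest is on the classpath (sbt pulls it in for the test configuration); SimpleRDDSuite is a hypothetical name:

import org.scalatest.FunSuite
import org.apache.spark.SparkContext

class SimpleRDDSuite extends FunSuite {
  test("parallelize and count") {
    // A local, single-JVM context is enough for a unit test.
    val sc = new SparkContext("local", "SimpleRDDSuite")
    try {
      assert(sc.parallelize(1 to 1000).count() === 1000)
    } finally {
      sc.stop() // release the context so later suites can create their own
    }
  }
}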
Using an IDE: install IntelliJ IDEA and the Scala plugin
Download the IDEA tar.gz package from the official IDEA site and extract it; that is all the installation takes. Run IDEA and install the Scala plugin.
From the source root directory, run the following command:
./sbt/sbt gen-idea
This generates the IDEA project files. In IDEA, click File -> Open Project, browse to the incubator-spark folder, and open the project; you can then edit the Spark code.
For details, see: https://github.com/apache/spark
http://cn.soulmachine.me/blog/20140130/