Before learning any technology, Spark included, be sure to first understand Spark correctly; for reference: Understanding Spark Correctly.
Here is how to configure a Spark-with-Python development environment on macOS.
I. Install Python
Spark 2.2.0 requires Python 2.6+ or Python 3.4+.
For reference:
http://jingyan.baidu.com/article/7908e85c78c743af491ad261.html
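To confirm that the interpreter you installed meets this requirement, here is a minimal sketch; the helper name spark_python_ok is my own, not part of Spark:

```python
import sys

# Hypothetical helper (not part of Spark): check whether a Python
# version tuple satisfies Spark 2.2.0's requirement of 2.6+ or 3.4+.
def spark_python_ok(version_info):
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor >= 6
    if major == 3:
        return minor >= 4
    return False

# Report whether the current interpreter is new enough
print(sys.version.split()[0],
      "OK" if spark_python_ok(sys.version_info) else "too old for Spark 2.2.0")
```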
II. Download the Spark binary package and configure environment variables
1. From the official website http://spark.apache.org/downloads.html, download the package spark-2.2.0-bin-hadoop2.6.tgz to a local disk, then unzip it.
2. Set environment variables:
cd ~
vi .bash_profile
export SPARK_HOME=/Users/tangweiqun/Desktop/bigdata/spark/spark-2.2.0-bin-hadoop2.6
export PATH=$PATH:$SCALA_HOME/bin:$M2_HOME/bin:$JAVA_HOME/bin:$SPARK_HOME/bin
source .bash_profile
3. Run chmod 744 ./* on the files in the bin directory under SPARK_HOME; otherwise you will get a permission-denied error.
Windows machines do not need this step.
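After sourcing .bash_profile, you can sanity-check that the Spark binaries are reachable from your PATH; a small sketch using only the standard library (shutil.which needs Python 3.3+):

```python
import shutil

# Look up spark-submit on the current PATH; returns None when the
# export above has not taken effect in this shell yet.
spark_submit = shutil.which("spark-submit")
if spark_submit:
    print("spark-submit found at", spark_submit)
else:
    print("spark-submit not on PATH; re-check .bash_profile and run 'source .bash_profile'")
```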
III. Install PyCharm
1. Download it from the official website https://www.jetbrains.com/pycharm/download/, then run the installer with the default options.
IV. Write wordcount.py and run it successfully
1. Create a project
File -> New Project
2. Configure PYTHONPATH in PyCharm
Run -> Edit Configurations, configured as follows:
Click the "+" button above, then fill in:
PYTHONPATH=/Users/tangweiqun/Desktop/bigdata/spark/spark-2.1.0-bin-hadoop2.6/python/:/Users/tangweiqun/Desktop/bigdata/spark/spark-2.1.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip
That is, add the Python-related dependencies shipped with the Spark installation package.
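The same PYTHONPATH value can be assembled programmatically from SPARK_HOME; a sketch for illustration (spark_python_paths is a hypothetical helper, and the py4j zip file name varies by Spark version, hence the glob):

```python
import glob
import os

# Hypothetical helper: given SPARK_HOME, return the entries PyCharm
# needs on PYTHONPATH: the python/ directory plus the py4j source zip
# under python/lib (glob handles the version-specific file name).
def spark_python_paths(spark_home):
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips

# With the path used in this article (adjust to your own unpack location):
print(os.pathsep.join(spark_python_paths(
    "/Users/tangweiqun/Desktop/bigdata/spark/spark-2.1.0-bin-hadoop2.6")))
```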
3. Add py4j-some-version.zip and pyspark.zip to the project
In order to be able to browse the source code, we need to attach these archives to the project, as follows:
Click "+ Add Content Root" and add the two zip packages under /Users/tangweiqun/Desktop/bigdata/spark/spark-2.1.0-bin-hadoop2.6/python/lib
4. Write the Spark word count and run it successfully
Create a Python file wordcount.py with the following contents:

from pyspark import SparkContext, SparkConf
import os
import shutil

if __name__ == "__main__":
    conf = SparkConf().setAppName("AppName").setMaster("local")
    sc = SparkContext(conf=conf)
    sourceDataRDD = sc.textFile("file:///Users/tangweiqun/test.txt")
    # flatMap: split each line into words, flattening the result
    wordsRDD = sourceDataRDD.flatMap(lambda line: line.split())
    # map each word to a (word, 1) pair
    keyValueWordsRDD = wordsRDD.map(lambda s: (s, 1))
    # reduceByKey: sum the counts for each word
    wordCountRDD = keyValueWordsRDD.reduceByKey(lambda a, b: a + b)
    outputPath = "/Users/tangweiqun/wordcount"
    # remove output from a previous run, since saveAsTextFile
    # fails if the path already exists
    if os.path.exists(outputPath):
        shutil.rmtree(outputPath)
    wordCountRDD.saveAsTextFile("file://" + outputPath)
    print(wordCountRDD.collect())
Right-click the file and choose Run; it runs successfully.
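To see what the pipeline above computes without starting Spark at all, the flatMap / map / reduceByKey chain can be mimicked in plain Python; a sketch for illustration only:

```python
from collections import Counter

# Plain-Python equivalent of the Spark word count above:
# the nested comprehension plays the role of flatMap, and
# Counter plays the role of map + reduceByKey.
def word_count(lines):
    words = [w for line in lines for w in line.split()]
    return dict(Counter(words))

print(word_count(["hello spark", "hello python"]))
# → {'hello': 2, 'spark': 1, 'python': 1}
```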
For a detailed, systematic understanding of the Spark core RDD APIs, see: Spark core RDD API Rationale
Spark 2.x In-Depth Series, Part 5: Configuring a Python Spark Development Environment