Integrating PySpark with PyCharm on Mac

Source: Internet
Author: User
Tags: pyspark

Prerequisites:

1. Spark is already installed. I use Spark 2.2.0.

2. A Python environment is already set up. I use Python 3.6.

First, install py4j.

Using pip, run the following command:

    pip install py4j

Using conda, run the following command:

    conda install py4j
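To confirm that the package landed in the interpreter PyCharm will use, a quick check like the one below may help. It is only a sketch: the helper name is_installed is made up for illustration, and it relies on nothing beyond the standard library's importlib.util.find_spec.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same interpreter that the PyCharm project uses.
    for name in ("py4j", "pyspark"):
        status = "found" if is_installed(name) else "NOT found"
        print(name, status)
```

If pyspark shows as not found at this stage, that is expected until the later steps make the Spark Python directory visible to the project.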

Second, create a project in PyCharm.

Select the Python environment while creating the project. Once the project opens, click Run > Edit Configurations > Environment variables.

Add PYTHONPATH and SPARK_HOME, where PYTHONPATH is the python directory inside the Spark installation path and SPARK_HOME is the Spark installation directory.

Then click OK and, back on the first dialog, click Apply and then OK.
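As a rough illustration of what the two variables should contain, the sketch below derives both values from the Spark installation directory. The function name spark_env_vars and the path /usr/local/spark-2.2.0 are assumptions for the example only; use the directory where your own Spark lives.

```python
import os


def spark_env_vars(spark_home):
    """Build the two environment variables from the Spark install directory."""
    return {
        "SPARK_HOME": spark_home,
        # PYTHONPATH points at the "python" directory inside the Spark installation.
        "PYTHONPATH": os.path.join(spark_home, "python"),
    }


if __name__ == "__main__":
    # Hypothetical install location; replace it with your own path.
    for key, value in spark_env_vars("/usr/local/spark-2.2.0").items():
        print(f"{key}={value}")
```

The printed NAME=value pairs are the kind of entries that go into the Environment variables field of the run configuration.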

Third, open Preferences > Project Structure > Add Content Root.

Add py4j-0.10.4-src.zip and pyspark.zip, both found in the lib folder under the python directory of the Spark installation path. Then click Apply and OK.
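The content roots make the two archives visible to the IDE. If you would rather locate them from a script, for example to check that they exist, something along these lines could work. It is a sketch, not part of the original setup: the helper names find_spark_zips and add_spark_to_sys_path are hypothetical, and it assumes the archives sit in python/lib under the Spark directory as described above.

```python
import glob
import os
import sys


def find_spark_zips(spark_home):
    """Return the py4j and pyspark zip archives under <spark_home>/python/lib, sorted."""
    lib_dir = os.path.join(spark_home, "python", "lib")
    zips = glob.glob(os.path.join(lib_dir, "py4j-*-src.zip"))
    zips += glob.glob(os.path.join(lib_dir, "pyspark.zip"))
    return sorted(zips)


def add_spark_to_sys_path(spark_home):
    """Prepend the archives to sys.path so that `import pyspark` can resolve."""
    for path in find_spark_zips(spark_home):
        if path not in sys.path:
            sys.path.insert(0, path)


if __name__ == "__main__":
    # Assumes SPARK_HOME is set; if it is not, nothing is added.
    home = os.environ.get("SPARK_HOME")
    if home:
        add_spark_to_sys_path(home)
        print(find_spark_zips(home))
```

The wildcard in the py4j pattern is there because the version in the file name differs between Spark releases.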

Fourth, write a PySpark word count to test the setup. I use a PySpark Streaming program.

The code is as follows:

wordcount.py

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # Create a local StreamingContext with two working threads and a batch interval of 1 second
    sc = SparkContext("local[2]", "NetWordCount")
    ssc = StreamingContext(sc, 1)

    # Create a DStream that will connect to hostname:port, like localhost:9999
    lines = ssc.socketTextStream("localhost", 9999)

    # Split each line into words
    words = lines.flatMap(lambda line: line.split(" "))

    # Count each word in each batch
    pairs = words.map(lambda word: (word, 1))
    wordCounts = pairs.reduceByKey(lambda x, y: x + y)

    # Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.pprint()

    ssc.start()             # Start the computation
    ssc.awaitTermination()  # Wait for the computation to terminate
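To see what the job computes for a single batch without starting Spark, here is a plain-Python sketch of the same split, pair, and sum logic. It is only an approximation for illustration: count_words is a made-up helper, and it uses collections.Counter in place of Spark's map and reduceByKey.

```python
from collections import Counter


def count_words(lines):
    """Mimic flatMap -> map -> reduceByKey for one batch of lines, without Spark."""
    # flatMap: split every line on single spaces and flatten into one list of words
    words = [word for line in lines for word in line.split(" ")]
    # map + reduceByKey: count one per occurrence and sum the counts per word
    return dict(Counter(words))


if __name__ == "__main__":
    print(count_words(["a b a d d d d"]))
```

For the line a b a d d d d this gives a count of 2 for a, 1 for b, and 4 for d. In the streaming job the order of the printed pairs is not guaranteed.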

First, run the following command in a terminal:

    nc -lk 9999

You can then right-click the file in PyCharm and run it. Then, in the terminal where the command above is running, type some words separated by spaces.

I entered the following:

    a b a d d d d

Then press Enter. PyCharm prints the following output:

    -------------------------------------------
    Time: ...
    -------------------------------------------
    ('b', 1)
    ('d', 4)
    ('a', 2)

At this point, the setup is complete.
