Win10 + Anaconda3: Configuring PySpark in a Virtual Environment (Python 3.5.3)

Tags: virtual environment, pyspark
1. Preface

After a full day of struggling I was thoroughly fed up: configuring PySpark inside a virtual environment kept throwing all kinds of errors, and I really did not want to uninstall my Python 3.6 installation. So I stuck with it for the whole day, finally found a configuration method that works, and got it set up successfully. Enough complaining; let's start.

2. Required Environment

Anaconda3 (mine is the latest version, Anaconda 4.3.1, 64-bit)

3. Install the Virtual Environment

1. Create a Python virtual environment.
Use the command conda create -n your_env_name python=3.5 anaconda (or 2.7, 3.6, and so on) to create a virtual environment named your_env_name with the Python version you choose. The your_env_name folder can be found under the envs directory of your Anaconda installation.
2. Activate the virtual environment:
activate your_env_name
3. Install packages into the virtual environment:
conda install -n your_env_name [package_name]
With that, a virtual environment named your_env_name is built, and its Python version is 3.5. You can check it from CMD: enter activate your_env_name, then start python and look at the version that is reported.
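To double-check which interpreter the activated environment actually uses, here is a minimal sketch (run it inside the activated environment):

# Run inside the activated environment: the version should be 3.5.x and the
# executable should live under ...\Anaconda3\envs\your_env_name
import sys
print(sys.version)
print(sys.executable)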

4. Resource Preparation

spark-2.2.0-bin-hadoop2.7.tgz

Download whichever Spark version you need; I use spark-2.0.1-bin-hadoop2.7. After extracting it, make sure the path contains no spaces (for example, avoid C:\Program Files). I put it under D:\spark and then added the environment variables. There are two ways:
1. Add the directories directly to Path: add D:\spark\spark-2.0.1-bin-hadoop2.7\bin and D:\spark\spark-2.0.1-bin-hadoop2.7\sbin to the Path environment variable.
2. Create a new SPARK_HOME environment variable with the value D:\spark\spark-2.0.1-bin-hadoop2.7, then add %SPARK_HOME%\bin and %SPARK_HOME%\sbin to Path.
To check whether this was added successfully, add the virtual environment's directory to the Path variable and move it above the original Python 3.6 entry, then type pyspark in CMD; the Spark shell should start. Note: if the virtual environment's path is not placed in front of the original Python path, this check only tells you whether the installation succeeded; it has nothing to do with whether it works in PyCharm, and you can move the entries back afterwards.
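To confirm the variables are visible from Python as well, here is a small sketch (SPARK_HOME only exists if you used the second method; the paths are this post's examples):

import os
# SPARK_HOME is only set if you chose method 2; otherwise this prints None
print(os.environ.get('SPARK_HOME'))
# List any Path entries that look like Spark directories
print([p for p in os.environ['PATH'].split(os.pathsep) if 'spark' in p.lower()])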

Spark depends on Hadoop, so you also need to download the corresponding Hadoop version. I downloaded Hadoop 2.7.1; you do not need a full Hadoop installation, but you do need hadoop.dll, winutils.exe, and so on. Download the Hadoop version (2.7.x here) that matches the Spark build you downloaded.
Unpack it and add the environment variables (just as with Spark, there are two methods: add the bin directory to Path, or set a HADOOP_HOME variable and reference it from Path). In my tests the stock archive seemed to be missing some of these files and I kept getting errors, so I downloaded a bin folder from another resource and used it to replace the bin folder inside my hadoop-2.7.1 directory.
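A quick way to confirm the Hadoop pieces are in place is to check for winutils.exe and hadoop.dll. A sketch, where D:\hadoop-2.7.1 is only an assumed example path, so substitute your own directory:

import os
hadoop_bin = r'D:\hadoop-2.7.1\bin'   # assumed example path, replace with your own
print(os.path.exists(os.path.join(hadoop_bin, 'winutils.exe')))
print(os.path.exists(os.path.join(hadoop_bin, 'hadoop.dll')))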
To call PySpark from PyCharm, you need to make the package importable: copy the pyspark folder under D:\spark\spark-2.0.1-bin-hadoop2.7\python into D:\ProgramData\Anaconda3\envs\tensorflow\Lib\site-packages (note: this is the site-packages path of the virtual environment you want the package copied into).

4.1 The Key Step

Note: the next step determines whether the configuration actually works.
1. In the D:\spark\spark-2.0.1-bin-hadoop2.7\bin folder, find the pyspark file and open it with Notepad++.
2. Find export PYSPARK_PYTHON and change that line to export PYSPARK_PYTHON=python3 (an alternative is sketched after this list).
3. Save. Done.
4. Restart the machine (you can skip this step; if the verification below still throws assorted errors, come back and do this step). After that you can happily use PySpark.
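If you would rather not edit the launcher script, a commonly used alternative (my own sketch, not part of the original walkthrough) is to set PYSPARK_PYTHON from the driver code before the SparkContext is created:

import os
# Tell Spark which Python to use for the workers; on Windows the full path
# to the virtual environment's python.exe is the safer choice
os.environ['PYSPARK_PYTHON'] = 'python3'

from pyspark import SparkContext
sc = SparkContext(appName='EnvCheck')
print(sc.version)
sc.stop()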
At this point our preparations are finally done, and it was not easy. In fact, the installation is now complete; surprisingly simple, isn't it? Now we can verify it.

5. Verify
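Before running the full word-count example below, a quick sanity check is to make sure the pyspark package copied into the virtual environment's site-packages can actually be imported; a minimal sketch:

# Run inside the virtual environment; if this fails, re-check the copy into site-packages
try:
    import pyspark
    print('pyspark found at', pyspark.__file__)
except ImportError as e:
    print('pyspark is not importable:', e)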

Here I borrow a word-count example from another post.

Create a new wordcount.py file and write the code below.
Note: the values passed to setAppName and setMaster must not contain spaces, otherwise you will get the error "Java gateway process exited before sending the driver its port number".

import sys
from operator import add

from pyspark import SparkContext


if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    lines = sc.textFile('words.txt')
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    output = counts.collect()
    for (word, count) in output:
        print('%s: %i' % (word, count))

    sc.stop()

The contents of words.txt are:

Good bad cool
Hadoop spark mlib
good spark mlib cool
spark bad
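For reference, with this words.txt the counts come out as below (splitting on single spaces and treating case as significant; the order of the lines printed from collect() may differ):

spark: 3
bad: 2
cool: 2
mlib: 2
Good: 1
good: 1
Hadoop: 1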

Then run it, and you will most likely find an error. I do not know exactly which error you will get, but you will almost certainly get one; that is the charming thing about installing PySpark in a virtual environment, errors of every flavour. Don't worry: it is because we have not added the project's environment variables. In the run configuration's environment variables, add SPARK_HOME with the path to your Spark directory, and then run again.
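If you prefer not to touch the run configuration, a workaround I have seen used (an assumption on my part, not from the original post) is to set SPARK_HOME at the top of the script, before pyspark is imported:

import os
# Assumed example path; point this at your own Spark directory
os.environ['SPARK_HOME'] = r'D:\spark\spark-2.0.1-bin-hadoop2.7'

from pyspark import SparkContext   # import only after SPARK_HOME is set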

If the word counts appear, that means we are done.
