Python 3.5 development on Spark 2.0: installing Anaconda and configuring Jupyter Notebook to reduce the difficulty of Python development
1. I won't cover installing Spark 2.0 itself; there are plenty of guides online, and if you get stuck you can leave me a message.
2. To develop in Python on Spark 2.0 we do not need to install Python separately; installing Anaconda is enough.
3. Anaconda download address: https://www.continuum.io/downloads. Both Python 3.5 and 2.7 versions are provided; with future learning in mind, I downloaded the latest 3.5. The page looks as follows:
4. Copy the installer to the Linux system via scp; I copied it to the downloads directory under my home directory, but you can place it wherever you prefer. The result looks as follows:
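For reference, the transfer from a local machine might look like the following sketch; the installer filename here is an example and depends on the exact version you downloaded, and the address is the virtual machine IP used later in this post:
scp Anaconda3-4.2.0-Linux-x86_64.sh root@192.168.85.100:~/downloads/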
5. Enter the installation command, as shown in the following figure:
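The installer is a shell script, so the command is along these lines (adjust the filename to whatever version you downloaded):
cd ~/downloads
bash Anaconda3-4.2.0-Linux-x86_64.sh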
6. Press Enter, as shown in the following figure:
7. Type yes to accept the license agreement, as shown in the following figure:
8. Press Enter to accept the default installation path; you can of course change it, but I kept the default here, as shown in the following figure:
9. At this point you can see that Anaconda already includes Python 3.5, so there is no need to install it separately, as shown in the following figure:
10. Wait a moment; the installation completes as shown in the following figure:
11. As the previous screenshot shows, the installer writes Anaconda's environment variables into .bashrc in the home directory by default. If we vim this file, we find that the environment variable has already been configured, as shown in the following figure:
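The line the installer appends is essentially the following, assuming the default install path of /root/anaconda3:
# added by the Anaconda3 installer
export PATH="/root/anaconda3/bin:$PATH"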
12. Now let's run pyspark first and look at the effect. We find that it uses Python 2.6.6 (the system Python) rather than Python 3.5, but it does start, as shown in the following figure:
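You can confirm which interpreter is which directly from the shell; the paths below assume the default install location:
/usr/bin/python --version            # the system Python (2.6.6 here)
/root/anaconda3/bin/python --version # Anaconda's Python (3.5)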
13. To make Spark use Anaconda's Python, we add the following configuration to the .bashrc file (note that PYSPARK_PYTHON must point to the interpreter itself, not just the bin directory):
export PYSPARK_PYTHON=/root/anaconda3/bin/python3
export IPYTHON="1"
The file then looks as shown in the following illustration:
14. Now enter the bin directory of the Spark installation and run ./pyspark to see what happens, as shown in the following figure:
15. We get an error: IPYTHON and IPYTHON_OPTS have been removed in Spark 2.0+. So we delete the IPYTHON="1" line, source .bashrc, restart the virtual machine, and run ./pyspark again. The interface is shown in the following illustration:
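After deleting the IPYTHON line, reloading the configuration and relaunching looks like this (the Spark path is whatever your installation uses):
source ~/.bashrc
cd $SPARK_HOME/bin   # or the bin directory of your Spark installation
./pyspark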
16. The configuration is successful.
17. Anaconda bundles IPython to make development easier. Remember the error we just saw? The new version replaces the IPYTHON settings, so we configure the two parameters the error message pointed us to, PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS, with the following commands:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
The interface is as follows:
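After adding these two lines, reload the configuration so they take effect in the current shell:
source ~/.bashrc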
At this point, restart ./pyspark; the startup interface has changed, as shown in the following figure:
We can now open a browser and enter the address shown in the red box; for example, I enter 192.168.85.100:8880, and the interface appears as shown in the following image:
And just like that, we have Jupyter configured as well. For how to use Jupyter, see the official site; I won't explain it here. Learning Spark with Python this way becomes much more convenient.
Summary: integrating Python development with Spark 2.0 mainly comes down to installing Anaconda and then configuring the PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS parameters. Next we will begin to learn Spark development.
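For reference, the relevant lines in .bashrc at the end of this walkthrough are the following; the paths assume Anaconda was installed to the default /root/anaconda3:
export PATH="/root/anaconda3/bin:$PATH"
export PYSPARK_PYTHON=/root/anaconda3/bin/python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"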