Environment:
Spark 2.0.0, Anaconda2
1. Spark IPython and Notebook installation and configuration
Method One: with this method you open the IPython Notebook through a web page, while a separately opened terminal still enters the PySpark shell.
If Anaconda is installed, you can get to the IPython interface directly in the following way; if it is not, see the link at the bottom to install the IPython-related packages yourself.
vi ~/.bashrc
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
source ~/.bashrc
Restart PySpark.
Starting a notebook with PySpark:
On the driver host, choose a directory, notebook_directory, in which to run the notebook. notebook_directory contains the .ipynb files that represent the different notebooks that can be served.
In notebook_directory, run pyspark with your desired runtime options. You should see startup output like the PySpark banner shown under Method Two below.
Reference:
IPython and Jupyter on Spark 2.0.0:
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html
Method Two:
Method two works with IPython, but Jupyter has a problem here; I am not sure whether it is caused by something else.
It is also possible to launch the PySpark shell in IPython, the enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To use IPython, set the PYSPARK_DRIVER_PYTHON variable to ipython when running bin/pyspark:
$ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
To use the Jupyter notebook (previously known as the IPython Notebook):
$ PYSPARK_DRIVER_PYTHON=jupyter ./bin/pyspark
You can customize the ipython or jupyter commands by setting PYSPARK_DRIVER_PYTHON_OPTS.
root@py-server:/server/bin# PYSPARK_DRIVER_PYTHON=ipython $SPARK_HOME/bin/pyspark
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jun  2 2016, 17:42:40)
Type "copyright", "credits" or "license" for more information.

IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/03 22:24:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/

Using Python version 2.7.12 (default, Jun  2 2016 17:42:40)
SparkSession available as 'spark'.

In [1]:
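At the In [1]: prompt you can check the objects the shell created; a minimal sketch, assuming the default sc and spark from the banner above:

sc.version   # Spark version string, '2.0.0' in this setup
spark        # the SparkSession announced in the banner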
2. Usage:
Open http://notebook_host:8880/ in a browser.
For example: http://spark01:8880/
New -> Python opens the Python interface.
Shift+Enter (or Shift+Return) executes the current cell.
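To confirm the notebook kernel is actually talking to Spark, you can run a tiny job in the first cell; a minimal sketch using only the built-in sc:

# distribute a small range across the cluster and sum it
sc.parallelize(range(100)).sum()   # expected result: 4950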
Note:
After PYSPARK_DRIVER_PYTHON is set to ipython, pyspark can only launch IPython unless the environment variable is restored (for example, by removing the export lines from ~/.bashrc and sourcing it again).
3. Test examples
Reference: "Spark for Python Developers"
Change file_in to your own file: if the file is local, use the corresponding statement; HDFS paths are the default. Just modify the specific address, as in the sketch below.
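A minimal sketch of such a test, assuming a hypothetical file path and the sc object PySpark creates (the paths below are placeholders, not from the book):

# hypothetical input paths -- replace with your own file
file_in = "file:///tmp/sample.txt"                        # local file (assumed path)
# file_in = "hdfs://py-server:9000/user/data/sample.txt"  # HDFS variant (assumed host/port)

lines = sc.textFile(file_in)                  # lazily read the file as an RDD of lines
print(lines.count())                          # number of lines in the file
words = lines.flatMap(lambda l: l.split())    # split each line into words
print(words.take(5))                          # first few words as a sanity check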