Configure IPython Notebook to Run Python Spark Programs
1.1, Install Anaconda
Anaconda's official website is https://www.anaconda.com; download the version that matches your system.
1.1.1, Download Anaconda
$ cd /opt/local/src/
$ wget -c https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
1.1.2, Install Anaconda
# -b runs the installer in batch mode; -p specifies the installation directory
$ bash Anaconda3-5.2.0-Linux-x86_64.sh -p /opt/local/anaconda -b
1.1.3, Configure the Anaconda-related environment variables
- Configure the environment variables
$ tail -n 8 ~/.bashrc
# Anaconda3
export ANACONDA_PATH=/opt/local/anaconda
export PATH=$ANACONDA_PATH/bin:$PATH
# PySpark
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
- Apply the environment variables
$ source ~/.bashrc
$ python --version
Python 3.6.5 :: Anaconda, Inc.
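As a quick sanity check on the variables configured above, the following is a minimal sketch (not part of the original setup) that reports whether PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON are set and resolve to an executable. The helper name check_pyspark_env and its status strings are hypothetical; it uses only the Python standard library.

```python
import os
import shutil

# Variable names taken from the ~/.bashrc snippet in this section.
REQUIRED_VARS = ("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON")


def check_pyspark_env(env=None):
    """Return {variable: status}; status is 'unset', 'not found', or 'ok'.

    Hypothetical helper for illustration. `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    report = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            report[name] = "unset"
        # shutil.which accepts a bare command name or a full path and
        # returns None when no matching executable is found.
        elif shutil.which(value, path=env.get("PATH", os.defpath)) is None:
            report[name] = "not found"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_pyspark_env().items():
        print(f"{name}: {status}")
```

Running it in a new shell after `source ~/.bashrc` should report both variables as ok if the paths above match your installation.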
1.2, Use PySpark in IPython Notebook
1.2.1, Create a working directory
$ mkdir ~/ipynotebook
$ cd ~/ipynotebook
1.2.2, Run PySpark in IPython Notebook
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
[TerminalIPythonApp] WARNING | Subcommand 'ipython notebook' is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use 'jupyter notebook' in the future
[I 14:21:56.030 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 14:21:56.030 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 14:21:56.037 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 14:21:56.037 NotebookApp] 0 active kernels
[I 14:21:56.037 NotebookApp] The Jupyter Notebook is running at:
[I 14:21:56.037 NotebookApp] http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
[I 14:21:56.037 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:21:56.040 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token:
    http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d&token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
[I 14:21:56.683 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
The http://localhost:8888 page then opens automatically in the default browser.
- Write a program in IPython Notebook
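When the notebook is started through the pyspark launcher as shown above, the kernel typically has a SparkContext predefined as sc, so a first cell can be as small as the sketch below. This is an illustrative example, not the program from the original post; sum_of_squares is a hypothetical helper, and the guard on sc only keeps the cell from failing when it runs outside a PySpark session.

```python
def sum_of_squares(sc, n):
    """Sum 1^2 + 2^2 + ... + n^2 using RDD operations."""
    numbers = sc.parallelize(range(1, n + 1))   # distribute the data
    return numbers.map(lambda x: x * x).sum()   # transform, then reduce


# `sc` is expected to be the SparkContext created by the pyspark launcher.
# The guard avoids a NameError when this cell runs without PySpark.
if "sc" in globals():
    print(sc.master)                 # which master this session is using
    print(sum_of_squares(sc, 100))   # 338350
```

Printing sc.master is a simple way to confirm which of the launch modes in this document the notebook is actually connected to.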
1.2.3, Run PySpark on Hadoop YARN in IPython Notebook
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
[TerminalIPythonApp] WARNING | Subcommand 'ipython notebook' is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use 'jupyter notebook' in the future
[I 14:50:48.149 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 14:50:48.149 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 14:50:48.157 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 14:50:48.157 NotebookApp] 0 active kernels
[I 14:50:48.157 NotebookApp] The Jupyter Notebook is running at:
[I 14:50:48.157 NotebookApp] http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
[I 14:50:48.157 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:50:48.161 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token:
    http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45&token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
- Write a program in IPython Notebook
$ yarn application -list
18/06/24 14:53:06 INFO client.RMProxy: Connecting to ResourceManager at node/192.168.20.10:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id                  Application-Name  Application-Type  User    Queue    State    Final-State  Progress  Tracking-URL
application_1529805293111_0001  PySparkShell      SPARK             hadoop  default  RUNNING  UNDEFINED    10%       http://node:4040
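To confirm from a script that the PySparkShell application is running, the listing above can be parsed. The sketch below is one possible approach, not an official YARN API: parse_yarn_app_list is a hypothetical helper, and it assumes the nine whitespace-separated columns shown above and an application name without spaces.

```python
# Column names in the order printed by `yarn application -list` above.
COLUMNS = ("id", "name", "type", "user", "queue",
           "state", "final_state", "progress", "tracking_url")


def parse_yarn_app_list(text):
    """Return one dict per application row found in the command output.

    Assumption: no field contains whitespace, so a plain split() is enough.
    Log lines and the header row are skipped.
    """
    apps = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        if not fields[0].startswith("application_"):
            continue
        apps.append(dict(zip(COLUMNS, fields)))
    return apps


# Row copied from the listing in this section.
SAMPLE = (
    "application_1529805293111_0001 PySparkShell SPARK hadoop default "
    "RUNNING UNDEFINED 10% http://node:4040"
)

if __name__ == "__main__":
    for app in parse_yarn_app_list(SAMPLE):
        print(app["id"], app["name"], app["state"])
```

In practice the text would come from running `yarn application -list` and capturing its standard output.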
1.2.4, Run PySpark on Spark Standalone in IPython Notebook
$ /opt/local/spark/sbin/start-master.sh
$ /opt/local/spark/sbin/start-slaves.sh
$ jps
13249 Jps
13027 Master
13188 Worker
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
[TerminalIPythonApp] WARNING | Subcommand 'ipython notebook' is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use 'jupyter notebook' in the future
[I 15:11:59.211 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 15:11:59.212 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 15:11:59.230 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 15:11:59.230 NotebookApp] 0 active kernels
[I 15:11:59.230 NotebookApp] The Jupyter Notebook is running at:
[I 15:11:59.230 NotebookApp] http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
[I 15:11:59.230 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:11:59.233 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token:
    http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea&token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
[I 15:12:02.594 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
- Write a program in IPython Notebook
- View the Spark Standalone web UI
1.3, Summary
Before starting IPython Notebook, change to its working directory, for example ~/ipynotebook; use whatever directory suits your setup.
1.3.1, Start IPython Notebook locally
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
#### or
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]
1.3.2, Start IPython Notebook on Hadoop YARN
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
#### or
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
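1.3.3, Start IPython Notebook on Spark Standalone
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m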
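The three launch modes in this summary differ only in a few environment variables and pyspark options, which the sketch below makes explicit. It is an illustration under stated assumptions rather than a recommended launcher: build_pyspark_launch and launch are hypothetical helpers, and the standalone case passes the master URL through `--master` instead of the MASTER variable used earlier in this document.

```python
import os
import subprocess


def build_pyspark_launch(mode, master_url=None, hadoop_conf_dir=None):
    """Return (extra_env, argv) for starting pyspark with a notebook driver.

    mode is 'local', 'yarn', or 'standalone' (hypothetical labels).
    """
    extra_env = {
        "PYSPARK_DRIVER_PYTHON": "ipython",
        "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    }
    if mode == "local":
        argv = ["pyspark", "--master", "local[*]"]
    elif mode == "yarn":
        if not hadoop_conf_dir:
            raise ValueError("yarn mode needs hadoop_conf_dir")
        extra_env["HADOOP_CONF_DIR"] = hadoop_conf_dir
        argv = ["pyspark", "--master", "yarn", "--deploy-mode", "client"]
    elif mode == "standalone":
        if not master_url:
            raise ValueError("standalone mode needs master_url")
        argv = ["pyspark", "--master", master_url]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return extra_env, argv


def launch(mode, **kwargs):
    """Start pyspark in the current directory; blocks until it exits."""
    extra_env, argv = build_pyspark_launch(mode, **kwargs)
    return subprocess.call(argv, env={**os.environ, **extra_env})


if __name__ == "__main__":
    # Only print the command here; call launch(...) to actually start it.
    env, argv = build_pyspark_launch(
        "yarn", hadoop_conf_dir="/opt/local/hadoop/etc/hadoop")
    print(env)
    print(" ".join(argv))
```

Keeping the command construction separate from the launch step makes it easy to inspect what would run before starting a notebook server.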