Configuring IPython Notebook to Run Python Spark Programs


1.1 Installing Anaconda

Anaconda's official website is https://www.anaconda.com; download the installer that matches your platform.

1.1.1 Downloading Anaconda
$ cd /opt/local/src/
$ wget -c https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
1.1.2 Installing Anaconda
# -b runs the installer in batch (non-interactive) mode; -p sets the installation directory
$ bash Anaconda3-5.2.0-Linux-x86_64.sh -p /opt/local/anaconda -b
1.1.3 Configuring the Anaconda Environment Variables
  • Set the environment variables (the lines below live at the end of ~/.bashrc)
$ tail -n 8 ~/.bashrc
# Anaconda3
export ANACONDA_PATH=/opt/local/anaconda
export PATH=$ANACONDA_PATH/bin:$PATH
# PySpark
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
  • Apply the environment variables
$ source ~/.bashrc
  • Verify
$ python --version
Python 3.6.5 :: Anaconda, Inc.
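
Beyond checking the Python version, it can help to confirm that the PySpark variables point at the Anaconda installation. The following is a minimal sketch, not part of the original setup: the function name `check_pyspark_env` is hypothetical, and the default path assumes the `/opt/local/anaconda` install directory used above.

```python
import os


def check_pyspark_env(env, anaconda_path="/opt/local/anaconda"):
    """Return a list of problems found in an environment mapping (hypothetical helper)."""
    # Values that the ~/.bashrc lines in this section are expected to produce.
    expected = {
        "PYSPARK_DRIVER_PYTHON": f"{anaconda_path}/bin/ipython",
        "PYSPARK_PYTHON": f"{anaconda_path}/bin/python",
    }
    problems = []
    for name, want in expected.items():
        got = env.get(name)
        if got is None:
            problems.append(f"{name} is not set")
        elif got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    # The Anaconda bin directory should be one of the PATH entries.
    bin_dir = f"{anaconda_path}/bin"
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append(f"{bin_dir} is not on PATH")
    return problems


if __name__ == "__main__":
    # Check the environment of the current shell session.
    for message in check_pyspark_env(os.environ) or ["environment looks consistent"]:
        print(message)
```

Run it in a new shell after `source ~/.bashrc`; an empty problem list means the three settings match the values shown above.
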
1.2 Using PySpark in IPython Notebook
1.2.1 Creating a Working Directory
$ mkdir ~/ipynotebook
$ cd ~/ipynotebook
1.2.2 Running PySpark from IPython Notebook
  • Start IPython Notebook
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 14:21:56.030 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 14:21:56.030 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 14:21:56.037 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 14:21:56.037 NotebookApp] 0 active kernels
[I 14:21:56.037 NotebookApp] The Jupyter Notebook is running at:
[I 14:21:56.037 NotebookApp] http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
[I 14:21:56.037 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:21:56.040 NotebookApp]
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d&token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
[I 14:21:56.683 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1

The page http://localhost:8888 opens automatically in the default browser.

  • Write a program in IPython Notebook
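
As an illustration of what such a notebook program might look like, here is a small word count written against the RDD API. It is a sketch rather than the program from the original article: the function name `word_count` and the sample sentences are made up, and it assumes the notebook was started through `pyspark` as shown above, so that the shell has already created the SparkContext `sc`.

```python
from operator import add


def word_count(sc, lines):
    """Count words in a list of strings using the RDD API of the given SparkContext."""
    pairs = (
        sc.parallelize(lines)                # distribute the input lines
        .flatMap(lambda line: line.split())  # split each line into words
        .map(lambda word: (word, 1))         # pair every word with a count of 1
        .reduceByKey(add)                    # sum the counts per word
        .collect()                           # bring the (word, count) pairs to the driver
    )
    return dict(pairs)


# In a notebook cell, where `sc` already exists:
# word_count(sc, ["spark on yarn", "spark standalone"])
```

Evaluating `sc.master` in another cell shows which master the session is attached to.
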

1.2.3 Running PySpark from IPython Notebook on Hadoop YARN
  • Start IPython Notebook
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 14:50:48.149 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 14:50:48.149 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 14:50:48.157 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 14:50:48.157 NotebookApp] 0 active kernels
[I 14:50:48.157 NotebookApp] The Jupyter Notebook is running at:
[I 14:50:48.157 NotebookApp] http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
[I 14:50:48.157 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:50:48.161 NotebookApp]
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45&token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
  • Write a program in IPython Notebook

  • Check the job in YARN
$ yarn application -list
18/06/24 14:53:06 INFO client.RMProxy: Connecting to ResourceManager at node/192.168.20.10:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1529805293111_0001          PySparkShell                   SPARK        hadoop     default             RUNNING           UNDEFINED              10%                    http://node:4040
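
If the listing needs to be checked from a script, the rows can be picked out of the text output. The sketch below is an assumption-laden illustration: `parse_yarn_applications` is a hypothetical helper, it assumes the column order shown above, and it assumes that application names contain no whitespace.

```python
def parse_yarn_applications(output):
    """Extract id, name and state from the text printed by `yarn application -list`."""
    apps = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with an application id; log and header lines do not.
        if len(fields) >= 6 and fields[0].startswith("application_"):
            # Column order: id, name, type, user, queue, state, ...
            apps.append({"id": fields[0], "name": fields[1], "state": fields[5]})
    return apps


# Sample text copied from the listing in this section.
sample = """\
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
application_1529805293111_0001          PySparkShell                   SPARK        hadoop     default             RUNNING           UNDEFINED              10%                    http://node:4040
"""

for app in parse_yarn_applications(sample):
    print(app["id"], app["name"], app["state"])
```

For the sample text this prints one row for the `PySparkShell` application in state `RUNNING`.
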
1.2.4 Running PySpark from IPython Notebook on Spark Standalone
  • Start Spark Standalone
$ /opt/local/spark/sbin/start-master.sh
$ /opt/local/spark/sbin/start-slaves.sh
$ jps
13249 Jps
13027 Master
13188 Worker
  • Start IPython Notebook
$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
[I 15:11:59.211 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
[I 15:11:59.212 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
[I 15:11:59.230 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
[I 15:11:59.230 NotebookApp] 0 active kernels
[I 15:11:59.230 NotebookApp] The Jupyter Notebook is running at:
[I 15:11:59.230 NotebookApp] http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
[I 15:11:59.230 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:11:59.233 NotebookApp]
    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea&token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
[I 15:12:02.594 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
  • Write a program in IPython Notebook

  • Check the Spark Standalone web UI
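
Besides the web UI, the notebook itself can confirm which mode it is running in, because `sc.master` holds the master string that was passed at startup. The helper below is a hypothetical sketch that maps the master strings used in this article to a mode name; it covers only those forms and makes no claim to handle every master URL Spark accepts.

```python
def describe_master(master):
    """Map a Spark master string to the deployment mode it represents."""
    if master.startswith("local"):
        return "local"
    if master in ("yarn", "yarn-client", "yarn-cluster"):
        return "yarn"
    if master.startswith("spark://"):
        return "standalone"
    return "unknown"


# In a notebook cell started through pyspark:
# describe_master(sc.master)
print(describe_master("spark://node:7077"))
```
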
1.3 Summary

Before starting IPython Notebook, change into its working directory, for example ~/ipynotebook; adjust the path to your own setup.

1.3.1 Starting IPython Notebook in Local Mode
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
#### or
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]
1.3.2 Starting IPython Notebook on Hadoop YARN
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
#### or
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
1.3.3 Starting IPython Notebook on Spark Standalone
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m
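
The three launch lines differ only in a few variables and flags, which the following sketch makes explicit. The function names `pyspark_notebook_command` and `as_shell_line` are hypothetical, the default paths and master URL are the ones used in this article, and the standalone case passes the master through `--master` rather than the `MASTER` variable shown above.

```python
import shlex


def pyspark_notebook_command(mode, master_url="spark://node:7077",
                             hadoop_conf_dir="/opt/local/hadoop/etc/hadoop"):
    """Return (extra environment variables, argv) for one of the three launch modes."""
    env = {
        "PYSPARK_DRIVER_PYTHON": "ipython",
        "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    }
    if mode == "local":
        argv = ["pyspark", "--master", "local[*]"]
    elif mode == "yarn":
        env["HADOOP_CONF_DIR"] = hadoop_conf_dir
        argv = ["pyspark", "--master", "yarn", "--deploy-mode", "client"]
    elif mode == "standalone":
        argv = ["pyspark", "--master", master_url, "--num-executors", "1",
                "--total-executor-cores", "1", "--executor-memory", "512m"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env, argv


def as_shell_line(env, argv):
    """Render the environment and argv as a single shell command line."""
    assignments = [f"{name}={shlex.quote(value)}" for name, value in env.items()]
    return " ".join(assignments + [shlex.quote(arg) for arg in argv])


for mode in ("local", "yarn", "standalone"):
    print(as_shell_line(*pyspark_notebook_command(mode)))
```

Note that `shlex.quote` wraps `local[*]` in single quotes, which keeps the shell from treating the brackets as a glob pattern.
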
