Configuring IPython Notebook to Run Python Spark Programs

Published: 2019-08-26 07:19:43

    1.1 Installing Anaconda

    Anaconda's official site is https://www.anaconda.com; download the installer that matches your platform.

    1.1.1 Downloading Anaconda

    $ cd /opt/local/src/
    $ wget -c https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh

    1.1.2 Installing Anaconda

    # -b runs the installer in batch (non-interactive) mode; -p sets the installation directory
    $ bash Anaconda3-5.2.0-Linux-x86_64.sh -p /opt/local/anaconda -b

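    1.1.3 Configuring the Anaconda environment variables

    • Append the following environment variables to ~/.bashrc (displayed here with tail)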
    $ tail -n 8 ~/.bashrc
    
    # Anaconda3
    export ANACONDA_PATH=/opt/local/anaconda
    export PATH=$ANACONDA_PATH/bin:$PATH
    
    # PySpark
    export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
    export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
    • Load the environment variables
    $ source ~/.bashrc
    • Verify
    $ python --version
    Python 3.6.5 :: Anaconda, Inc.
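
Beyond checking the Python version, it can help to confirm that the two PySpark variables set above actually point at runnable interpreters. The following is a minimal sketch, not part of the original setup; the helper name check_pyspark_env is hypothetical, and it only tests that each value resolves to an executable (either an absolute path or a name found on PATH).

```python
import os
import shutil


def check_pyspark_env(env=None):
    """Return {variable: (value, is_executable)} for the PySpark interpreter variables.

    check_pyspark_env is a hypothetical helper used for illustration only.
    """
    env = os.environ if env is None else env
    report = {}
    for name in ("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON"):
        value = env.get(name)
        # shutil.which accepts an absolute path or a bare command name
        # and returns None when nothing executable is found.
        ok = bool(value) and shutil.which(value) is not None
        report[name] = (value, ok)
    return report


if __name__ == "__main__":
    for name, (value, ok) in check_pyspark_env().items():
        status = "ok" if ok else "missing or not executable"
        print(f"{name}={value!r} -> {status}")
```

Running it in a fresh shell after `source ~/.bashrc` shows whether both variables were picked up.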

    1.2 Using PySpark in IPython Notebook

    1.2.1 Creating a working directory

    $ mkdir  ~/ipynotebook
    $ cd ~/ipynotebook

    1.2.2 Running PySpark in IPython Notebook

    • Start IPython Notebook
    $ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
    [TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
    [TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
    [I 14:21:56.030 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
    [I 14:21:56.030 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
    [I 14:21:56.037 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
    [I 14:21:56.037 NotebookApp] 0 active kernels
    [I 14:21:56.037 NotebookApp] The Jupyter Notebook is running at:
    [I 14:21:56.037 NotebookApp] http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
    [I 14:21:56.037 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [C 14:21:56.040 NotebookApp] 
    
        Copy/paste this URL into your browser when you connect for the first time,
        to login with a token:
            http://localhost:8888/?token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d&token=5b68718fdabe4488decf07703a3bd76bf46d5dc733a6617d
    [I 14:21:56.683 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1

    The page http://localhost:8888 opens automatically in the default browser.

    • Write a program in IPython Notebook
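
As an illustration of what such a program might look like, here is a small word-count cell. It is a sketch under the assumption that the notebook was launched through `pyspark` as shown above, so that the SparkContext `sc` is already defined in the kernel; the function name count_words and the sample sentences are made up for this example.

```python
def count_words(sc, lines):
    """Count word occurrences in `lines` with the RDD API of SparkContext `sc`.

    count_words is a hypothetical helper written for this example.
    """
    pairs = (
        sc.parallelize(lines)                   # distribute the input lines
        .flatMap(lambda line: line.split())     # one record per word
        .map(lambda word: (word, 1))            # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)        # sum the counts per word
        .collect()                              # bring the results back to the driver
    )
    return dict(pairs)


# `sc` is assumed to be predefined by the pyspark launcher; the guard keeps
# the cell harmless when it is run in a plain Python session.
if "sc" in globals():
    print(sc.master)  # shows which master the notebook is attached to
    print(count_words(sc, ["spark on yarn", "spark standalone"]))
```

Printing `sc.master` is a quick way to confirm which mode (local, YARN, or standalone) the notebook is actually running in.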


    1.2.3 Running PySpark from IPython Notebook on Hadoop YARN

    • Start IPython Notebook
    $ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
    [TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
    [TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
    [I 14:50:48.149 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
    [I 14:50:48.149 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
    [I 14:50:48.157 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
    [I 14:50:48.157 NotebookApp] 0 active kernels
    [I 14:50:48.157 NotebookApp] The Jupyter Notebook is running at:
    [I 14:50:48.157 NotebookApp] http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
    [I 14:50:48.157 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [C 14:50:48.161 NotebookApp] 
    
        Copy/paste this URL into your browser when you connect for the first time,
        to login with a token:
            http://localhost:8888/?token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45&token=8fe2c599dc39a23104dd6a058a0e05de3d9e88cfeda71b45
    
    • Write a program in IPython Notebook


    • Check the application in YARN
    $ yarn application -list
    18/06/24 14:53:06 INFO client.RMProxy: Connecting to ResourceManager at node/192.168.20.10:8032
    Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
                    Application-Id      Application-Name        Application-Type          User       Queue               State         Final-State         Progress                        Tracking-URL
    application_1529805293111_0001          PySparkShell                   SPARK        hadoop     default             RUNNING           UNDEFINED              10%                    http://node:4040
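
If the listing needs to be checked from a script rather than by eye, its rows can be split into fields. The sketch below is an assumption-laden illustration: the function name parse_yarn_applications and the field names are invented here, and it assumes columns are separated by tabs or by runs of two or more spaces, as in the output above. An application name containing double spaces would break this simple split.

```python
import re

# Field names are chosen for this example; they mirror the header columns above.
FIELDS = (
    "id", "name", "type", "user", "queue",
    "state", "final_state", "progress", "tracking_url",
)


def parse_yarn_applications(text):
    """Parse the application rows of `yarn application -list` output into dicts."""
    apps = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("application_"):
            continue  # skip log lines, the summary line, and the header row
        # Columns are separated by a tab (with optional padding) or 2+ spaces.
        cols = re.split(r"\s*\t\s*|\s{2,}", line)
        if len(cols) == len(FIELDS):
            apps.append(dict(zip(FIELDS, cols)))
    return apps


if __name__ == "__main__":
    sample = (
        "application_1529805293111_0001          PySparkShell"
        "                   SPARK        hadoop     default"
        "             RUNNING           UNDEFINED              10%"
        "                    http://node:4040"
    )
    for app in parse_yarn_applications(sample):
        print(app["name"], app["state"], app["tracking_url"])
```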

    1.2.4 Running PySpark from IPython Notebook on Spark Standalone

    • Start Spark Standalone
    $ /opt/local/spark/sbin/start-master.sh
    
    $ /opt/local/spark/sbin/start-slaves.sh
    
    $ jps
    13249 Jps
    13027 Master
    13188 Worker
    • Start IPython Notebook
    $ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m 
    [TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions.
    [TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook` in the future
    [I 15:11:59.211 NotebookApp] JupyterLab beta preview extension loaded from /opt/local/anaconda/lib/python3.6/site-packages/jupyterlab
    [I 15:11:59.212 NotebookApp] JupyterLab application directory is /opt/local/anaconda/share/jupyter/lab
    [I 15:11:59.230 NotebookApp] Serving notebooks from local directory: /home/hadoop/ipynotebook
    [I 15:11:59.230 NotebookApp] 0 active kernels
    [I 15:11:59.230 NotebookApp] The Jupyter Notebook is running at:
    [I 15:11:59.230 NotebookApp] http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
    [I 15:11:59.230 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [C 15:11:59.233 NotebookApp] 
    
        Copy/paste this URL into your browser when you connect for the first time,
        to login with a token:
            http://localhost:8888/?token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea&token=1972eb523fea28d541985df7ed2ce55cc2bfada7e31eb9ea
    [I 15:12:02.594 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
    • Write a program in IPython Notebook


    • View the Spark Standalone Web UI (by default the master serves it on port 8080, e.g. http://node:8080)

    1.3 Summary

    Before starting IPython Notebook, first change into its working directory, for example ~/ipynotebook; adjust the path to your own environment.

    1.3.1 Starting IPython Notebook in local mode

    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
    #### or
    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*]

    1.3.2 Starting IPython Notebook on Hadoop YARN

    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
    #### or
    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/opt/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

    1.3.3 Starting IPython Notebook on Spark Standalone

    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://node:7077 pyspark --num-executors 1 --total-executor-cores 1 --executor-memory 512m 
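
The three launch variants differ only in the master and, for YARN, in HADOOP_CONF_DIR. The sketch below gathers them in one place; the function name pyspark_notebook_launch is hypothetical, and it passes the master through the `--master` option instead of the MASTER variable used in some of the commands above. It only builds the environment and argument list and does not start anything itself.

```python
def pyspark_notebook_launch(mode, master_url=None, hadoop_conf_dir=None):
    """Return (extra_env, argv) for starting the notebook through `pyspark`.

    pyspark_notebook_launch is a hypothetical helper written for this summary.
    """
    env = {
        "PYSPARK_DRIVER_PYTHON": "ipython",
        "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    }
    if mode == "local":
        argv = ["pyspark", "--master", "local[*]"]
    elif mode == "yarn":
        if not hadoop_conf_dir:
            raise ValueError("yarn mode needs hadoop_conf_dir")
        env["HADOOP_CONF_DIR"] = hadoop_conf_dir
        argv = ["pyspark", "--master", "yarn", "--deploy-mode", "client"]
    elif mode == "standalone":
        if not master_url:
            raise ValueError("standalone mode needs master_url")
        argv = ["pyspark", "--master", master_url]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env, argv


if __name__ == "__main__":
    env, argv = pyspark_notebook_launch("standalone", master_url="spark://node:7077")
    print(env, argv)
    # To actually launch from the working directory, something like this could be used:
    #   import os, subprocess
    #   subprocess.run(argv, env={**os.environ, **env},
    #                  cwd=os.path.expanduser("~/ipynotebook"))
```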
