Docker on Spark


    1. Pull the image from the Docker registry
      docker pull sequenceiq/spark:1.4.0

    2. Build the Docker image
      docker build --rm -t sequenceiq/spark:1.4.0 .
      The -t option sets the tag of the sequenceiq/spark image you are building, just like ubuntu:13.10. The --rm option tells Docker to delete the temporary containers after the build completes: each instruction in the Dockerfile creates a temporary container, and you generally do not need these intermediate containers.
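As a quick sketch of this step (assuming a Dockerfile is present in the current directory), the build followed by a verification check might look like:

```shell
# Build the image from the Dockerfile in the current directory;
# --rm removes the intermediate containers created for each instruction
docker build --rm -t sequenceiq/spark:1.4.0 .

# Confirm the tagged image is now available locally
docker images sequenceiq/spark
```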

    3. Run the image

      • If you are using Boot2docker, make sure your VM has more than 2 GB of memory
      • In your /etc/hosts file, add $(boot2docker ip) as host 'sandbox' to make it easier to access your sandbox UI
      • Open the YARN UI ports when running the container

docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.4.0 bash
Or
docker run -d -h sandbox sequenceiq/spark:1.4.0

    • If you want to perform interactive operations (such as running shell commands), you must use the -i and -t parameters to exchange data with the container. However, when interacting with a container through a pipe, you do not need the -t parameter
    • -h sets the hostname
    • If you use -p or -P, the container will expose ports to the host; anyone who can connect to the host can then reach the corresponding port inside the container. With -P, Docker randomly picks an unoccupied host port between 49153 and 65535 and binds it to the container. You can use docker port to find this randomly bound port.

    • If you append -d=true or -d to docker run, the container runs in background (detached) mode. In that case, all I/O can only happen over network resources or shared volumes, because the container no longer listens on the terminal window from which you executed docker run. You can, however, re-attach to the container's session by executing docker attach. Note that the --rm option is not available when the container runs in background mode.
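A minimal sketch of the detached workflow (the container name spark-sandbox is illustrative, not part of the image):

```shell
# Start the container detached, under a name we choose ourselves
docker run -d --name spark-sandbox -h sandbox sequenceiq/spark:1.4.0

# Inspect its output without attaching to it
docker logs spark-sandbox

# Re-attach to the container's session (Ctrl-p Ctrl-q detaches again)
docker attach spark-sandbox
```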

    • -p 8088:8088 publishes the ResourceManager (cluster) UI port, and -p 8042:8042 publishes the NodeManager port
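For example, with the ports published as above, you could check the YARN UIs from the host; the container name spark-sandbox and the use of curl here are assumptions, not part of the original setup:

```shell
# Start a container with a fixed mapping for 8088 and random mappings (-P) for the rest
docker run -d --name spark-sandbox -p 8088:8088 -P -h sandbox sequenceiq/spark:1.4.0

# ResourceManager UI on the fixed mapping
curl -s http://sandbox:8088/cluster | head

# Ask Docker which random host port -P bound for the NodeManager port
docker port spark-sandbox 8042
```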

      1. Version
        Hadoop 2.6.0 and Apache Spark v1.4.0 on CentOS

      2. Test
        There are two deploy modes that can be used to launch Spark applications on YARN.

        • yarn-cluster mode
In yarn-cluster mode, the Spark driver runs inside an ApplicationMaster process which is managed by YARN on the cluster, and the client can go away after initiating the application. Estimating Pi (yarn-cluster mode):

# Execute the following command, which should write 'Pi is roughly 3.1418' into the logs
# Note: you must specify the --files argument in cluster mode to enable metrics
spark-submit --class org.apache.spark.examples.SparkPi \
  --files $SPARK_HOME/conf/metrics.properties \
  --master yarn-cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  $SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
    • yarn-client mode
In yarn-client mode, the driver runs in the client process, so the result is printed directly to the screen. Estimating Pi (yarn-client mode):

# Execute the following command, which should print 'Pi is roughly 3.1418' to the screen
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  $SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
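Since yarn-cluster mode writes the result into the YARN container logs rather than to the console, one way to check it is to fetch the aggregated logs afterwards; the application id below is purely illustrative and must be replaced by the id that spark-submit actually prints:

```shell
# Fetch the aggregated logs for the finished application and look for the result;
# substitute the real id reported by spark-submit
yarn logs -applicationId application_1434876939431_0001 | grep "Pi is roughly"
```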

Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.

