- Pull the image from the Docker registry
docker pull sequenceiq/spark:1.4.0
Building a Docker image
docker build --rm -t sequenceiq/spark:1.4.0 .
The -t option sets the tag of the sequenceiq/spark image you are building, just like ubuntu:13.10. The --rm option tells Docker to delete the intermediate containers after the build completes: each instruction in the Dockerfile creates a temporary container, and these temporary containers are generally not needed afterwards.
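To make the layer/intermediate-container behaviour concrete, here is a minimal sketch with an illustrative Dockerfile (this is not the actual sequenceiq/spark Dockerfile; the file name `Dockerfile.example` is made up for the demonstration):

```shell
# Write a minimal, illustrative Dockerfile; each instruction below produces
# one image layer via a temporary intermediate container during the build.
cat > Dockerfile.example <<'EOF'
FROM ubuntu:13.10
RUN echo "step 1" > /tmp/step1
RUN echo "step 2" > /tmp/step2
CMD ["cat", "/tmp/step1", "/tmp/step2"]
EOF

# With a Docker daemon available, you would build it like the article's command;
# --rm removes the intermediate containers once the build succeeds:
#   docker build --rm -t example/app:latest -f Dockerfile.example .
```

Each `RUN` line above would show up as its own step in the `docker build` output, which is exactly where the temporary containers come from.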
Running the image
- If using boot2docker, make sure your VM has more than 2GB of memory
- In your /etc/hosts file, add $(boot2docker ip) as host 'sandbox' to make it easier to access your sandbox UI
- Open the YARN UI ports when running the container
docker run -it -p 8088:8088 -p 8042:8042 -h sandbox sequenceiq/spark:1.4.0 bash
Or
docker run -d -h sandbox sequenceiq/spark:1.4.0 -d
- If you want to perform interactive operations (such as running a shell), you must use the -i and -t options so that you can exchange data with the container. However, when interacting with a container through a pipe, you do not need the -t option.
- -h sets the container's hostname
If you use -p or -P, the container opens ports to the host, so anything that can reach the host can also reach the inside of the container. With -P, Docker picks a random unoccupied port between 49153 and 65535 on the host and binds it to the container. You can use docker port to find this randomly bound port.
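Because the -P port is random, scripts typically recover it from the `docker port` output. A minimal sketch (the `host_port` helper is hypothetical; only the string parsing runs here, so no Docker daemon is needed):

```shell
# Hypothetical helper: extract the host port from a "docker port"-style
# mapping line such as "8088/tcp -> 0.0.0.0:49153" by taking everything
# after the last colon.
host_port() {
  echo "$1" | sed 's/.*://'
}

# With a running container you would feed it real output, e.g.:
#   host_port "$(docker port <container-id> 8088)"
host_port "8088/tcp -> 0.0.0.0:49153"   # prints 49153
```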
If you append -d=true or -d to docker run, the container runs in background (detached) mode. In that mode, all I/O can only be exchanged over the network or through shared volumes, because the container no longer listens to the terminal window in which you executed docker run. You can, however, re-attach to the container's output by running docker attach. Note that the --rm option cannot be used when the container runs in background mode.
-p 8088:8088 maps the ResourceManager (cluster) web UI port; -p 8042:8042 maps the NodeManager web UI port.
Version
Hadoop 2.6.0 and Apache Spark v1.4.0 on CentOS
Test
There are two deploy modes that can be used to launch Spark applications on YARN: yarn-cluster and yarn-client.
In yarn-cluster mode, the Spark driver runs inside an ApplicationMaster process which is managed by YARN on the cluster, and the client can go away after initiating the application.

Estimating Pi (yarn-cluster mode):

# Execute the following command, which should write "Pi is roughly 3.1418" into the logs
# Note: you must specify the --files argument in cluster mode to enable metrics
spark-submit --class org.apache.spark.examples.SparkPi \
--files $SPARK_HOME/conf/metrics.properties \
--master yarn-cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
Estimating Pi (yarn-client mode):

# Execute the following command, which should print "Pi is roughly 3.1418" to the screen
spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 1 \
$SPARK_HOME/lib/spark-examples-1.4.0-hadoop2.6.0.jar
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.