Background:
Spark 2.3.0 added support for using Kubernetes (k8s) as a native resource manager and scheduler for Spark. Running Spark with native k8s scheduling has the following advantages:
- Native scheduling: no second-level scheduler is needed; Spark uses the k8s scheduler directly, so it can be mixed with other applications on the same cluster.
- Resource isolation: tasks can be submitted to a specified namespace, so the native k8s quota mechanism can be reused to limit the resources a task may consume (a small sketch follows below).
- Resource allocation: resource limits can be specified per Spark task, so tasks are better isolated from each other.
- User-defined images: users can package their own application into the Spark base image, which is more flexible and convenient.
Trial conditions: a k8s cluster of version 1.7 or later is required, because a Spark-on-k8s task actually runs in the cluster as custom resources with a custom controller; k8s DNS and RBAC also need to be enabled.
Download Spark 2.3.0: https://www.apache.org/dyn/closer.lua/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
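As an aside on the resource-isolation point above, here is a minimal sketch of a dedicated namespace with a quota; the namespace name spark-jobs and the limits are illustrative assumptions, not part of the original trial:
# Create a namespace for Spark jobs and cap the total resources it may use
kubectl create namespace spark-jobs
kubectl create quota spark-quota --hard=cpu=8,memory=16Gi,pods=20 --namespace=spark-jobs
Jobs submitted with --conf spark.kubernetes.namespace=spark-jobs would then be bounded by this quota.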
Trial steps:
Build the image:
The following builds the base image, which contains Spark and the official examples; the trial in this article uses the official examples.
cd /path/to/spark-2.3.0-bin-hadoop2.7
docker build -t <your.image.hub/yourns>/spark:2.3.0 -f kubernetes/dockerfiles/spark/Dockerfile .
docker push <your.image.hub/yourns>/spark:2.3.0
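To confirm that the example jars really are inside the image at the path used later with local://, here is a quick optional sanity check (not part of the original instructions) that bypasses the image's entrypoint:
# List the example jars baked into the image
docker run --rm --entrypoint ls <your.image.hub/yourns>/spark:2.3.0 /opt/spark/examples/jars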
Users can package their own application on top of the base image and specify the main class and the application's in-container path at startup, so that their own application can be submitted as a task; a sketch of this follows below.
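A minimal sketch of such a user image, assuming a hypothetical application jar my-app.jar with main class com.example.MyApp (neither name comes from the original article):
# Dockerfile: layer the application jar on top of the Spark base image
FROM <your.image.hub/yourns>/spark:2.3.0
COPY my-app.jar /opt/spark/examples/jars/my-app.jar
Build and push it like the base image, then submit with --class com.example.MyApp, --conf spark.kubernetes.container.image pointing at the new image, and local:///opt/spark/examples/jars/my-app.jar as the application jar.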
Task submission:
bin/spark-submit \
--master k8s://<k8s apiserver address> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<your.image.hub/yourns>/spark:2.3.0 \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
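Right after submission you can check the job from the k8s side; a small sketch, where <driver-pod-name> is a placeholder to be looked up from the pod list:
# The driver (and later the executor) pods appear in the target namespace
kubectl get pods
# Follow the driver log
kubectl logs -f <driver-pod-name>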
For more parameters and their default values, refer to: 1. Spark running on k8s
Note the following pitfalls:
- The Spark examples are compiled with JDK 1.8; if startup reports "unsupported major.minor version 52.0", switch to a JDK 1.8 runtime.
- spark-submit loads the cluster configuration from ~/.kube/config by default, so place your k8s cluster config in that location.
- If the Spark driver fails with "Error: Could not find or load main class org.apache.spark.examples.SparkPi", the local:// boot parameter must be followed by the path of your Spark application inside the container.
- If the Spark driver throws "Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again", make sure the k8s nodes can reach each other over the network.
- If the Spark driver throws an exception saying system:serviceaccount:default:default "cannot get pods in the namespace default", it is a permission problem; execute the following two commands (a verification sketch follows them):
kubectl create rolebinding default-view --clusterrole=view --serviceaccount=default:default --namespace=default
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default --namespace=default
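To check that the bindings took effect, a quick verification sketch (impersonating the service account with --as requires sufficient rights on your own account):
# List the role bindings that were just created
kubectl get rolebinding --namespace=default
# Ask the API server whether the default service account may now read pods
kubectl auth can-i get pods --as=system:serviceaccount:default:default --namespace=default
Alternatively, the Spark documentation (reference 1) describes creating a dedicated service account and passing it via --conf spark.kubernetes.authenticate.driver.serviceAccountName.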
Then the task can be executed. Once the Spark demo is running, you can see that spark-submit acts as a controller that manages a single Spark task: it first creates the service and the driver; after the driver is running it starts the executors, whose number is set by --conf spark.executor.instances=5; and after the job completes the executors are deleted automatically, while the driver pod is cleaned up by the default GC mechanism.
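To watch this sequence and pick up the result afterwards, a small sketch; <driver-pod-name> is a placeholder, and the grep pattern assumes the SparkPi example, which prints its result to the driver log:
# Watch the driver and executor pods being created and torn down
kubectl get pods -w
# After completion, read the result from the finished driver pod and remove it if desired
kubectl logs <driver-pod-name> | grep "Pi is roughly"
kubectl delete pod <driver-pod-name>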
Reference:
1. Spark running on k8s
2. Issue #34377