Background:
Spark 2.3.0 added support for using Kubernetes (k8s) as a native resource manager and scheduler for Spark. Running Spark with native k8s scheduling has the following advantages:
- Native scheduling: no second-level scheduler is needed; Spark uses the k8s scheduler directly, so it can be mixed with other applications on the same cluster.
- Resource isolation: tasks can be submitted to a specified namespace, so the native k8s quota mechanism can be reused to limit the resources a task may consume (a small sketch follows below).
- Resource allocation: resource limits can be specified per Spark task, so tasks are better isolated from each other.
- User-defined images: users can package their own application into the Spark base image, which is more flexible and convenient.
Trial conditions: a k8s cluster of version 1.7 or later is required, because a Spark-on-k8s task actually runs in the cluster as custom resources with a custom controller; k8s DNS and RBAC also need to be enabled.
Download Spark 2.3.0: https://www.apache.org/dyn/closer.lua/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
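As an aside on the resource-isolation point above, here is a minimal sketch of a dedicated namespace with a quota; the namespace name spark-jobs and the limits are illustrative assumptions, not part of the original trial:
# Create a namespace for Spark jobs and cap the total resources it may use
kubectl create namespace spark-jobs
kubectl create quota spark-quota --hard=cpu=8,memory=16Gi,pods=20 --namespace=spark-jobs
Jobs submitted with --conf spark.kubernetes.namespace=spark-jobs would then be bounded by this quota.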
Trial steps:
Build the image:
The following builds the base image, which contains Spark and the official examples; the trial in this article uses the official examples.
cd /path/to/spark-2.3.0-bin-hadoop2.7
docker build -t <your.image.hub/yourns>/spark:2.3.0 -f kubernetes/dockerfiles/spark/Dockerfile .
docker push <your.image.hub/yourns>/spark:2.3.0
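To confirm that the example jars really are inside the image at the path used later with local://, here is a quick optional sanity check (not part of the original instructions) that bypasses the image's entrypoint:
# List the example jars baked into the image
docker run --rm --entrypoint ls <your.image.hub/yourns>/spark:2.3.0 /opt/spark/examples/jars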
Users can package their own application on top of the base image and specify the main class and the application's in-container path at startup, so that their own application can be submitted as a task; a sketch of this follows below.
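A minimal sketch of such a user image, assuming a hypothetical application jar my-app.jar with main class com.example.MyApp (neither name comes from the original article):
# Dockerfile: layer the application jar on top of the Spark base image
FROM <your.image.hub/yourns>/spark:2.3.0
COPY my-app.jar /opt/spark/examples/jars/my-app.jar
Build and push it like the base image, then submit with --class com.example.MyApp, --conf spark.kubernetes.container.image pointing at the new image, and local:///opt/spark/examples/jars/my-app.jar as the application jar.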
Task submission:
bin/spark-submit \
--master k8s://<k8s apiserver address> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<your.image.hub/yourns>/spark:2.3.0 \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
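Right after submission you can check the job from the k8s side; a small sketch, where <driver-pod-name> is a placeholder to be looked up from the pod list:
# The driver (and later the executor) pods appear in the target namespace
kubectl get pods
# Follow the driver log
kubectl logs -f <driver-pod-name>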
For more parameters and their default values, refer to: 1. Spark running on k8s
Note the following pitfalls:
- The Spark examples are compiled with JDK 1.8; if startup reports "unsupported major.minor version 52.0", switch to a JDK 1.8 runtime.
- spark-submit loads the cluster configuration from ~/.kube/config by default, so place your k8s cluster config in that location.
- If the Spark driver fails with "Error: Could not find or load main class org.apache.spark.examples.SparkPi", the local:// boot parameter must be followed by the path of your Spark application inside the container.
- If the Spark driver throws "Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again", make sure the k8s nodes can reach each other over the network.
- If the Spark driver throws an exception saying system:serviceaccount:default:default "cannot get pods in the namespace default", it is a permission problem; execute the following two commands (a verification sketch follows them):
kubectl create rolebinding default-view --clusterrole=view --serviceaccount=default:default --namespace=default
kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default --namespace=default
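To check that the bindings took effect, a quick verification sketch (impersonating the service account with --as requires sufficient rights on your own account):
# List the role bindings that were just created
kubectl get rolebinding --namespace=default
# Ask the API server whether the default service account may now read pods
kubectl auth can-i get pods --as=system:serviceaccount:default:default --namespace=default
Alternatively, the Spark documentation (reference 1) describes creating a dedicated service account and passing it via --conf spark.kubernetes.authenticate.driver.serviceAccountName.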
Then the task can be executed. Once the Spark demo is running, you can see that spark-submit acts as a controller that manages a single Spark task: it first creates the service and the driver; after the driver is running it starts the executors, whose number is set by --conf spark.executor.instances=5; and after the job completes the executors are deleted automatically, while the driver pod is cleaned up by the default GC mechanism.
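To watch this sequence and pick up the result afterwards, a small sketch; <driver-pod-name> is a placeholder, and the grep pattern assumes the SparkPi example, which prints its result to the driver log:
# Watch the driver and executor pods being created and torn down
kubectl get pods -w
# After completion, read the result from the finished driver pod and remove it if desired
kubectl logs <driver-pod-name> | grep "Pi is roughly"
kubectl delete pod <driver-pod-name>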
Reference:
1. Spark running on k8s
2. Issue #34377