Apache Spark 2.3 adds native Kubernetes support, with feature documentation download

Source: Internet
Author: User
Tags: documentation, joins, value of pi, k8s

Guiding Questions

1. What is Kubernetes?
2. How to try the new features on a Kubernetes cluster.
3. How to view the Spark resources created on the cluster and operate on them.

What we need to know before we start:

What is Kubernetes
Kubernetes (often abbreviated "k8s") is an open-source container cluster management project, originally designed and developed by Google and later contributed to the Cloud Native Computing Foundation. It provides a platform across host clusters that automates the deployment, scaling, and operation of application containers. Kubernetes typically works with the Docker container tool and consolidates multiple hosts running Docker containers into a cluster.


Introduction

Over the past year, the open source community has been dedicated to supporting data processing, analytics, and machine learning workloads on Kubernetes. New extensibility features in Kubernetes, such as custom resources and custom controllers, can be used to create deep integrations with individual applications and frameworks.

Traditionally, data processing workloads have run in dedicated setups such as the YARN/Hadoop stack. However, a unified control layer for all workloads on Kubernetes simplifies cluster management and increases resource utilization.




Apache Spark 2.3, with native Kubernetes support, combines two famous open-source projects: Apache Spark, a framework for large-scale data processing, and Kubernetes.

Apache Spark is an essential tool for data scientists, providing a powerful platform for applications ranging from large-scale data transformation to analytics to machine learning. Data scientists are increasingly adopting containers to improve their workflows, realizing benefits such as packaging dependencies and creating reproducible artifacts. Given that Kubernetes is the de facto standard for managing containerized environments, it is a natural fit to support the Kubernetes API in Spark.

Specifically, a native Spark application on Kubernetes acts as a custom controller that creates Kubernetes resources in response to requests made by the Spark scheduler. In contrast to deploying Apache Spark in standalone mode on Kubernetes, the native approach offers fine-grained management of Spark applications, improved elasticity, and seamless integration with logging and monitoring solutions. The community is also exploring advanced use cases, such as managing streaming workloads and leveraging service meshes such as Istio.
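The controller pattern described above can be illustrated with a small, self-contained sketch. This is plain Python with no Kubernetes client; the `Pod` record, `FakeCluster`, and `reconcile` function are hypothetical stand-ins for the real API, showing only the core idea: a reconcile loop creates or deletes executor pods until the observed state matches the desired state.

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    # Hypothetical stand-in for a Kubernetes pod object.
    name: str
    labels: dict = field(default_factory=dict)

class FakeCluster:
    """Toy in-memory 'cluster' standing in for the Kubernetes API server."""
    def __init__(self):
        self.pods = []

    def create_pod(self, pod):
        self.pods.append(pod)

    def delete_pod(self, name):
        self.pods = [p for p in self.pods if p.name != name]

    def executors(self, app):
        return [p for p in self.pods
                if p.labels.get("spark-role") == "executor"
                and p.labels.get("spark-app") == app]

def reconcile(cluster, app, desired_executors):
    """One pass of a controller loop: make observed state match desired state."""
    current = cluster.executors(app)
    # Scale up: create missing executor pods.
    for i in range(len(current), desired_executors):
        cluster.create_pod(Pod(f"{app}-exec-{i + 1}",
                               {"spark-role": "executor", "spark-app": app}))
    # Scale down: delete surplus executor pods.
    for pod in current[desired_executors:]:
        cluster.delete_pod(pod.name)

cluster = FakeCluster()
cluster.create_pod(Pod("spark-pi-driver",
                       {"spark-role": "driver", "spark-app": "spark-pi"}))
reconcile(cluster, "spark-pi", 5)   # Spark scheduler asks for 5 executors
print(len(cluster.executors("spark-pi")))  # 5
reconcile(cluster, "spark-pi", 2)   # scale back down
print(len(cluster.executors("spark-pi")))  # 2
```

The real integration works against the Kubernetes API server rather than an in-memory list, but the shape is the same: desired state in, resource creations and deletions out.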


To try it out on your Kubernetes cluster, simply download the official Apache Spark 2.3 release binaries. For example, below we run a simple Spark application that computes the mathematical constant pi across three Spark executors, each running in a separate pod. Note that this requires a cluster running Kubernetes 1.7 or above, a kubectl client configured to access it, and the required RBAC rules for the default namespace and service account.

$ kubectl cluster-info
Kubernetes master is running at https://xx.yy.zz.ww
$ bin/spark-submit \
    --master k8s://https://xx.yy.zz.ww \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar



To view the Spark resources created on the cluster, run the following kubectl command in a separate terminal window.

$ kubectl get pods -l 'spark-role in (driver, executor)' -w
NAME                                               READY  STATUS   RESTARTS  AGE
spark-pi-driver                                    1/1    Running  0         14s
spark-pi-da1968a859653d6bab93f8e6503935f2-exec-1   0/1    Pending  0         0s
...



The results can be streamed during job execution by running:

$ kubectl logs -f spark-pi-driver



When the application completes, you should see the calculated value of pi in the driver log.
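For intuition, the SparkPi example estimates pi with a Monte Carlo method: sample random points in the unit square and count how many fall inside the quarter circle. A minimal single-machine sketch of the same idea (plain Python, no Spark; the function name `estimate_pi` is our own, not from the Spark examples) looks like this:

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi: the fraction of uniform random points in
    the unit square that fall inside the quarter circle approaches pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))  # roughly 3.14
```

In the real SparkPi job, the sampling is partitioned across the executors (the pods created above) and the per-partition counts are summed by the driver, which then prints the estimate seen in the log.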

In Spark 2.3, we start with support for Spark applications written in Java and Scala, with resource localization from a variety of data sources including HTTP, GCS, and HDFS. We have also paid close attention to the failure and recovery semantics of Spark executors, laying a solid foundation for future development. Get started with the open-source documentation (https://spark.apache.org/docs/latest/running-on-kubernetes.html) right away.


Participate

There is a lot of exciting work to do in the near future. We are actively working on features such as dynamic resource allocation, in-cluster staging of dependencies, support for PySpark and SparkR, support for Kerberized HDFS clusters, as well as client mode and interactive execution environments for popular notebooks. For those who love Kubernetes's way of managing applications declaratively, we are also working on a Kubernetes Operator for spark-submit, which allows users to declaratively specify and submit Spark applications.

We are just getting started. We hope you will get involved and help us develop the project further.

Join the spark-dev and spark-user mailing lists [https://spark.apache.org/community.html].
Ask questions under the Kubernetes component of the Apache Spark JIRA [https://issues.apache.org/jira/issues/?jql=project+%3d+spark+and+component+%3d+kubernetes].
Attend our SIG meetings at 10 on Wednesday mornings [https://github.com/kubernetes/community/tree/master/sig-big-data].
Many thanks to the Apache Spark and Kubernetes contributors, spread across multiple organizations (Google, Databricks, Red Hat, Palantir, Bloomberg, Cloudera, PepperData, Datalayer, HyperPilot, and others), who spent hundreds of hours on this work. We look forward to seeing more people contribute to the project and help it develop further.



Document Download:

Description: PDF, 57 pages

Link: https://pan.baidu.com/s/1y4P2jYZ3aFHxSk3MwWa1MA Password: q1y7

