Work log: building a Spark cluster on k8s

Source: Internet
Author: User

Over the past few days I tried to build a Spark cluster on k8s and ran into a few pitfalls, which I share here.

For an introduction to Spark's components, refer to the official documentation and "A brief introduction to the big data ecosystem".

This article is based on the official k8s example.

For the specific steps, refer to the example in the k8s GitHub repository.

FAQ

Image pull problem

This method requires pulling images from gcr.io, which in China generally requires a VPN. Note that the gcr.io/google_containers/spark:1.5.2_v1 image cannot be replaced with index.tenxcloud.com/google_containers/spark. After the replacement, pulling the image fails with a "docker: filesystem layer verification failed" error.

You can change the image used by zeppelin-controller.yaml to index.tenxcloud.com/google_containers/zeppelin:v0.5.6_v1.

WebUI service usage issues

The kubectl proxy --port=8001 command given in the documentation only accepts proxy requests on 127.0.0.1. It therefore does not work in test or virtual-machine environments, where the address you connect to is not 127.0.0.1.
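This restriction comes from two kubectl proxy defaults. --address, the interface the proxy binds to, defaults to 127.0.0.1. --accept-hosts, a regex filter on the request's host, defaults to matching only localhost, 127.0.0.1 and [::1]. The minimal sketch below only builds the command line and the browser URL as strings; nothing here runs kubectl. The bind address 0.0.0.0, the accept-hosts pattern ".*", the node IP 192.168.1.10, the namespace spark-cluster and the service name spark-webui are all illustrative assumptions. The URL uses the /api/v1/namespaces/<ns>/services/<svc>/proxy/ path of current Kubernetes releases. Opening the proxy this way exposes the API server, with the proxy's credentials, to anyone who can reach that address.

```python
import shlex


def proxy_argv(port=8001, address="0.0.0.0", accept_hosts=".*"):
    """Build a kubectl proxy command that listens beyond the loopback interface.

    address: interface to bind (kubectl's default is 127.0.0.1).
    accept_hosts: regex for allowed hosts (kubectl's default only
    matches localhost, 127.0.0.1 and [::1]).
    """
    return [
        "kubectl", "proxy",
        f"--port={port}",
        f"--address={address}",
        f"--accept-hosts={accept_hosts}",
    ]


def service_proxy_url(host, port, namespace, service):
    """URL for reaching a Service through the API server proxy (current path format)."""
    return f"http://{host}:{port}/api/v1/namespaces/{namespace}/services/{service}/proxy/"


# Hypothetical values: replace with your node IP, namespace and service name.
print(shlex.join(proxy_argv()))  # shell-quoted form of the command (Python 3.8+)
print(service_proxy_url("192.168.1.10", 8001, "spark-cluster", "spark-webui"))
```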
Instead, use kubectl proxy --port=8001 --address=\

Error when running the PySpark example

The data source used in the example does not work. You can run the example against local files instead, for example: sc.textFile("/opt/spark/licenses/*").map(lambda s: len(s.split())).sum()

Zeppelin WebUI usage issues

Likewise, the Zeppelin WebUI can only be accessed through localhost or 127.0.0.1. You can work around this by setting the Zeppelin service type to NodePort. Refer to zeppelin-service.yaml in Spark-20160427.zip.
When you create the Zeppelin service from zeppelin-service.yaml, you can set the port with spec.ports.nodePort. If it is not set, Kubernetes allocates a random port from the node-port range. Run kubectl describe svc zeppelin | grep NodePort to see the assigned port. Then open <any node IP>:<nodePort> in a browser to reach the Zeppelin WebUI. Click "Create new note" and enter a note name.
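The NodePort setup described above can be sketched as a Service manifest. This minimal sketch builds it as a Python dict and prints it as JSON, which kubectl apply -f also accepts. The selector label component: zeppelin and the port numbers (80 on the Service, 8080 in the container) are assumptions and may differ from the zeppelin-service.yaml in the zip. A fixed nodePort must fall inside the cluster's node-port range, which is 30000-32767 by default.

```python
import json

# Default --service-node-port-range of the API server (30000-32767 inclusive).
NODE_PORT_RANGE = range(30000, 32768)


def zeppelin_service(node_port=None):
    """Return a NodePort Service manifest for Zeppelin as a dict.

    Assumed values: selector label component=zeppelin, Service port 80,
    container port 8080. If node_port is None, Kubernetes allocates one.
    """
    port = {"port": 80, "targetPort": 8080}
    if node_port is not None:
        if node_port not in NODE_PORT_RANGE:
            raise ValueError(f"nodePort {node_port} is outside 30000-32767")
        port["nodePort"] = node_port
    return {
        "kind": "Service",
        "apiVersion": "v1",
        "metadata": {"name": "zeppelin"},
        "spec": {
            "type": "NodePort",
            "selector": {"component": "zeppelin"},
            "ports": [port],
        },
    }


if __name__ == "__main__":
    # Usage: python zeppelin_svc.py | kubectl apply -f -
    print(json.dumps(zeppelin_service(node_port=30080), indent=2))
```

Leaving node_port unset gives the random-port behaviour described above; the assigned port then shows up in kubectl describe svc zeppelin.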

On the new page, run the following:

%pyspark
print(sc.textFile("/opt/spark/licenses/*").map(lambda s: len(s.split())).sum())
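You can sanity-check the number Zeppelin prints by running the same computation without Spark. Split each line on whitespace, count the tokens, and sum over every line of every file the glob matches. This is a local-only sketch that uses the standard library. It reads files as UTF-8 with undecodable bytes replaced, which is an assumption about the license files' encoding.

```python
import glob
import os


def count_words(pattern):
    """Total whitespace-separated tokens across all files matching pattern.

    Mirrors sc.textFile(pattern).map(lambda s: len(s.split())).sum():
    each line is split on whitespace and the per-line counts are summed.
    """
    total = 0
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue  # skip subdirectories matched by the glob
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                total += len(line.split())
    return total


if __name__ == "__main__":
    print(count_words("/opt/spark/licenses/*"))
```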

The example counts the words in all files under the local /opt/spark/licenses/ directory. Zeppelin displays the result after a few seconds.

Building with the Tenxcloud image registry

The build follows the YAML files under examples/spark/ in the k8s source tree. Copy all of those YAML files into your working directory.

Modify spark-master-controller.yaml and spark-worker-controller.yaml:
* In both files, change spec.template.spec.containers.command to "/start.sh"
* Change spec.template.spec.containers.image to index.tenxcloud.com/google_containers/spark-master:1.5.2_v1 and index.tenxcloud.com/google_containers/spark-worker:1.5.2_v1, respectively

Change the image used by zeppelin-controller.yaml to index.tenxcloud.com/google_containers/zeppelin:v0.5.6_v1.
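Each of the edits above sets one or two container fields per controller. This minimal sketch works on manifests that are already loaded as Python dicts. Parsing the YAML files themselves would need a library such as PyYAML, which is left out here. In a Kubernetes container spec, command is a list of strings, so "/start.sh" is written as ["/start.sh"]. The image names come from this article. The cut-down example manifest, including its original command, is hypothetical.

```python
import copy

# Target (image, command) per file, from the steps above; None = leave command as is.
TENXCLOUD = "index.tenxcloud.com/google_containers"
PATCHES = {
    "spark-master-controller.yaml": (f"{TENXCLOUD}/spark-master:1.5.2_v1", ["/start.sh"]),
    "spark-worker-controller.yaml": (f"{TENXCLOUD}/spark-worker:1.5.2_v1", ["/start.sh"]),
    "zeppelin-controller.yaml": (f"{TENXCLOUD}/zeppelin:v0.5.6_v1", None),
}


def patch_controller(manifest, image, command=None):
    """Return a copy of a ReplicationController with every container patched."""
    patched = copy.deepcopy(manifest)  # leave the caller's dict untouched
    for container in patched["spec"]["template"]["spec"]["containers"]:
        container["image"] = image
        if command is not None:
            container["command"] = list(command)
    return patched


# Hypothetical, cut-down manifest standing in for spark-master-controller.yaml.
master = {
    "kind": "ReplicationController",
    "spec": {"template": {"spec": {"containers": [
        {"name": "spark-master", "image": "gcr.io/google_containers/spark:1.5.2_v1",
         "command": ["/start-master"]},
    ]}}},
}
image, command = PATCHES["spark-master-controller.yaml"]
print(patch_controller(master, image, command)["spec"]["template"]["spec"]["containers"][0])
```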

Once the modifications are complete, start the cluster by following the steps in the official k8s example.

A simple spark-driver

The Zeppelin image is very large and takes a long time to pull. Instead, you can use the following spark-driver.yaml to create a simple spark-driver:

kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-driver
spec:
  replicas: 1
  selector:
    component: spark-driver
  template:
    metadata:
      labels:
        component: spark-driver
    spec:
      containers:
        - name: spark-driver
          image: index.tenxcloud.com/google_containers/spark-driver:1.5.2_v1
          resources:
            requests:
              cpu: 100m

Once it is created, you can open a PySpark shell in it with kubectl exec <spark-driver-podname> -it pyspark.
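You can script the pod-name lookup instead of doing it by hand. This minimal sketch selects the pod by the component=spark-driver label from spark-driver.yaml. It builds the exec command in the current kubectl form, which puts the container command after --. It runs kubectl only when find_driver_pod or open_pyspark is called. It assumes kubectl is on PATH and points at the right cluster and namespace.

```python
import subprocess


def get_pod_argv(label="component=spark-driver"):
    """kubectl command that prints the name of the first pod with the label."""
    return ["kubectl", "get", "pods", "-l", label,
            "-o", "jsonpath={.items[0].metadata.name}"]


def exec_argv(pod, *command):
    """kubectl exec command that runs `command` interactively in pod."""
    return ["kubectl", "exec", "-it", pod, "--", *command]


def find_driver_pod(label="component=spark-driver"):
    """Run kubectl and return the driver pod name (needs a reachable cluster)."""
    out = subprocess.run(get_pod_argv(label), check=True,
                         capture_output=True, text=True)
    return out.stdout.strip()


def open_pyspark():
    """Attach an interactive PySpark shell to the spark-driver pod."""
    subprocess.run(exec_argv(find_driver_pod(), "pyspark"), check=True)
```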

For the YAML configuration files, refer here.
