# Using Kubernetes to Build a Spark Cluster

I have recently been building a Spark cluster on Kubernetes, ran into a few pitfalls along the way, and am sharing them here.
For an introduction to Spark's components, refer to the official documentation and "A Brief Introduction to the Big Data Ecosystem". This article is based on the official Kubernetes Spark example; see the Kubernetes repository on GitHub for details.

## FAQ
### Image pull problem

This method requires access to gcr.io to pull images (in China a VPN is usually needed). Note that the gcr.io/google_containers/spark:1.5.2_v1 image cannot be replaced with index.tenxcloud.com/google_containers/spark; after that replacement, pulling the image fails with a "Docker: filesystem layer verification failed" error.
The image used by zeppelin-controller.yaml, however, can be changed to index.tenxcloud.com/google_containers/zeppelin:v0.5.6_v1.
### WebUI service usage issues

The `kubectl proxy --port=8001` command given in the documentation listens only on 127.0.0.1, so it cannot serve requests from a test environment or a virtual machine, since those requests do not originate from 127.0.0.1.
Instead, add the `--address` flag, e.g. `kubectl proxy --port=8001 --address=<host-ip>`, so that the proxy listens on an externally reachable address.
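The loopback-only behavior can be illustrated with a short Python sketch using plain sockets (independent of kubectl itself):

```python
import socket

# A server bound to 127.0.0.1 is reachable only via the loopback interface,
# so browsers on other machines (or a VM with its own IP) cannot connect.
# Binding to 0.0.0.0 listens on every interface instead, which is what the
# kubectl proxy --address flag controls.
loopback_only = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loopback_only.bind(("127.0.0.1", 0))   # default kubectl proxy behavior
loopback_addr = loopback_only.getsockname()[0]

all_interfaces = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
all_interfaces.bind(("0.0.0.0", 0))    # like passing --address=0.0.0.0
wildcard_addr = all_interfaces.getsockname()[0]

print(loopback_addr, wildcard_addr)    # -> 127.0.0.1 0.0.0.0

loopback_only.close()
all_interfaces.close()
```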
### PySpark example run error

The data source in the example is problematic; you can run against local files instead, e.g. `sc.textFile("/opt/spark/licenses/*").map(lambda s: len(s.split())).sum()`.
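What that line computes can be sketched in plain Python, with no Spark required; the sample lines below are illustrative:

```python
# Plain-Python sketch of the word count: split each line on whitespace,
# count the words, and sum over all lines -- the same logic as
# sc.textFile(...).map(lambda s: len(s.split())).sum().
sample_lines = ["Apache License", "Version 2.0, January 2004", ""]
word_count = sum(len(line.split()) for line in sample_lines)
print(word_count)  # -> 6
```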
### Zeppelin WebUI usage issues

Likewise, the Zeppelin WebUI can only be accessed through localhost or 127.0.0.1 by default; fix this by configuring the Zeppelin service type as NodePort. Refer to the zeppelin-service.yaml in spark-20160427.zip.
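Once the service is exposed as a NodePort, the assigned port can be pulled out of the `kubectl describe svc zeppelin` output programmatically; a sketch, where the sample output is hypothetical and the real output depends on your cluster:

```python
import re

# Hypothetical output of `kubectl describe svc zeppelin`.
describe_output = """\
Name:            zeppelin
Type:            NodePort
Port:            <unset>  80/TCP
NodePort:        <unset>  31576/TCP
Endpoints:       10.244.1.12:8080
"""

# Grab the numeric port from the NodePort line, equivalent to
# `kubectl describe svc zeppelin | grep NodePort`.
match = re.search(r"NodePort:\s+\S+\s+(\d+)/TCP", describe_output)
print(match.group(1))  # -> 31576
```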
When creating the Zeppelin service with zeppelin-service.yaml, you can specify the port via spec.ports.nodePort; if it is not specified, a random port is assigned. Use the `kubectl describe svc zeppelin | grep NodePort` command to look up the port, then open `<node-ip>:<nodePort>` for any node in a browser to reach the Zeppelin WebUI. Click "Create new note" and enter a note name.
In the new page, do the following:
```
%pyspark
print sc.textFile("/opt/spark/licenses/*").map(lambda s: len(s.split())).sum()
```
This example counts the words in all files under the local /opt/spark/licenses/ directory; after a few seconds Zeppelin displays the result.

## Building from the TenxCloud image library
Build from the YAML files under examples/spark/ in the Kubernetes source tree, copying all the YAML files to a working directory.
Modify spark-master-controller.yaml and spark-worker-controller.yaml:

* change spec.template.spec.containers.command to "/start.sh"
* change spec.template.spec.containers.image to index.tenxcloud.com/google_containers/spark-master:1.5.2_v1 and index.tenxcloud.com/google_containers/spark-worker:1.5.2_v1, respectively

Also change the image used by zeppelin-controller.yaml to index.tenxcloud.com/google_containers/zeppelin:v0.5.6_v1.
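The edits above can also be scripted. A minimal sketch, assuming the example YAML files reference the corresponding gcr.io images (the exact image names in your copy of the examples may differ):

```python
# Rewrite gcr.io image references in the example YAML text to the TenxCloud
# mirror, instead of editing each file by hand. The mapping reflects the
# images mentioned in this article and is an assumption about the YAML.
REPLACEMENTS = {
    "gcr.io/google_containers/spark-master:1.5.2_v1":
        "index.tenxcloud.com/google_containers/spark-master:1.5.2_v1",
    "gcr.io/google_containers/spark-worker:1.5.2_v1":
        "index.tenxcloud.com/google_containers/spark-worker:1.5.2_v1",
    "gcr.io/google_containers/zeppelin:v0.5.6_v1":
        "index.tenxcloud.com/google_containers/zeppelin:v0.5.6_v1",
}

def rewrite_images(yaml_text):
    """Return yaml_text with known gcr.io images swapped for mirror images."""
    for old, new in REPLACEMENTS.items():
        yaml_text = yaml_text.replace(old, new)
    return yaml_text

print(rewrite_images("image: gcr.io/google_containers/zeppelin:v0.5.6_v1"))
# -> image: index.tenxcloud.com/google_containers/zeppelin:v0.5.6_v1
```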
Once the modifications are complete, start the cluster by following the steps in the official Kubernetes example.

## A simple spark-driver
Because the Zeppelin image is very large, pulling it takes a long time. You can instead use the following spark-driver.yaml to create a simple spark-driver:
```yaml
kind: ReplicationController
apiVersion: v1
metadata:
  name: spark-driver
spec:
  replicas: 1
  selector:
    component: spark-driver
  template:
    metadata:
      labels:
        component: spark-driver
    spec:
      containers:
        - name: spark-driver
          image: index.tenxcloud.com/google_containers/spark-driver:1.5.2_v1
          resources:
            requests:
              cpu: 100m
```
Once it is created, you can open a PySpark shell with `kubectl exec <spark-driver-pod-name> -it pyspark`.
The YAML configuration files can be referenced here.