Use fabric to deploy Baidu BMR spark cluster nodes

Preface

The AI competition that my friends and I attended entered the finals a while ago, and I had long wanted to combine the data preprocessing stage with the deep learning stage. However, when merging the two parts of the code we ran into some problems, so I wrote a script specifically to handle them. Now that the competition is over, I want to write this part up in case other friends run into the same problem and it can help them. If anything is wrong, please correct me. Thank you!

Background: why each node of the cluster had to be deployed during the competition

During the preliminary round, in order to quickly provide data interfaces for the deep learning models that would be built later, we kept data preprocessing as an independent step and implemented it with the simplest Python. Since our code had to be ported to the computer the judges used for verification, there was a risk that some libraries would not be installed there and the program would fail to run.

Troubles

After entering the finals, we urgently needed to combine the two parts, so we had to reinstall our libraries. If we only had the Master node, the problem would be very simple: we could package the libraries directly and write a script. But in Baidu's BMR spark cluster the Slaves cannot access the external network, so we would have to log on to the Master node, ssh to each Slave over the Master's intranet, and only then run our script to deploy the program's running environment.

Proposal

Given this situation, is there a good way to run a single script on the Master that automatically deploys the running environment of every node in the cluster? After reading a book on spark best practices, I learned about the third-party Python library fabric.

Fabric

First, a brief introduction to fabric. For details about how to use fabric, refer to the official API docs. Here I will only introduce a small part of it.

Execute local tasks

Fabric provides local("shell"), where "shell" is a Linux shell command. For example:

from fabric.api import local
local('ls /root/')  # list the files in the /root/ folder
Execute a remote task

Fabric's power lies in the fact that it can execute commands not only locally but also on a remote server, even if fabric is not installed on that server. This is implemented through ssh, so we need to define three parameters:

env.hosts = ['ipaddress1', 'ipaddress2']
env.user = 'root'
env.password = 'fuckyou.'

Then you can use run("shell") to execute the tasks we need on the remote server. For example:

from fabric.api import run, env
env.hosts = ['ipaddress1', 'ipaddress2']
env.user = 'root'
env.password = 'fuckyou.'
run('ls /root/')  # list the files in the /root/ folder
Open a folder

Sometimes we need to enter a specific folder and then execute a script or command inside it. For that we need the following two interfaces:

Local
with lcd('/root/local/'):
    local('cat local.txt')  # cat the local.txt file under /root/local/
Remote
with cd('/root/distance/'):
    run('cat distance.txt')  # cat the remote file /root/distance/distance.txt
Execute a fabric task

We can use the command line:

fab --fabfile=filename.py job_func
# filename.py is a Python file written using fabric
# job_func is the fabric task function, i.e. the main function to be executed
# you can choose both names yourself; in the introduction below I use job.py and job
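
To make this concrete, here is a minimal fabfile sketch (a hypothetical example for illustration only, not the author's actual deployment script, which comes later):

# job.py -- a minimal fabfile
from fabric.api import local

def job():
    local('echo hello from fabric')  # executed by: fab --fabfile=job.py job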
Socket

Why use socket? As I mentioned in the previous article, the Baidu BMR cluster identifies the Slaves by hostname rather than by IP address. Since fabric needs IP addresses when setting the hosts in the environment, we have to use the hostname to locate the IP address.

You may wonder: why not just hard-code the Slaves' IP addresses? Because every time Baidu BMR creates a spark cluster, the intranet IP addresses it assigns change, and the last segment of the IP keeps growing.

In summary, we still use the hostname to get the IP address.

The gethostbyname interface

We can use the gethostbyname('hostname') interface: pass in a hostname and it returns an IPv4 address.
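
For example, resolving a hostname might look like the following sketch (the hostname here is made up purely for illustration; the real Baidu BMR hostnames come from the slaves file described below):

import socket

# resolve a hypothetical slave hostname to its intranet IPv4 address
slave_ip = socket.gethostbyname('slave-node-1')
print(slave_ip)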

Use fabric to write an automatic deployment script for each node

Obtain the Slaves' hostnames

As mentioned in the previous article, the Slaves' hostnames in Baidu BMR are stored in:

'/opt/bmr/hadoop/etc/hadoop/slaves'
Convert the hostnames to IP addresses and set the fabric env parameters

import socket
from fabric.api import env, local, lcd, cd, run, put

path = '/opt/bmr/hadoop/etc/hadoop/slaves'  # the slaves file mentioned above
host_list = []
f = open(path, 'r')
slaves_name = f.read().split('\n')
for i in range(1, slaves_name.__len__() - 1):
    temp_name = slaves_name[i]
    temp_ip = socket.gethostbyname(temp_name)  # resolve the hostname to its intranet IP
    ip_port = temp_ip + ":22"                  # append the ssh port for fabric
    host_list.append(ip_port)
    del temp_name
    del temp_ip
    del ip_port
env.user = 'root'
env.password = '*gdut728'
env.hosts = host_list
Write the job to be automatically deployed

Here, what I want to automatically deploy is:
1. Download the Python third-party library jieba.
2. Decompress the downloaded jieba package locally.
3. Enter the decompressed folder and install jieba locally.
4. Transfer the downloaded package to the Slaves nodes.
5. Decompress the jieba package on the remote end.
6. On the remote end, enter the decompressed folder and install jieba.

Convert the above steps into code, that is

def job():
    local_command = "wget https://pypi.python.org/packages/71/46/c6f9179f73b818d5827202ad1c4a94e371a29473b7f043b736b4dab6b8cd/jieba-0.39.zip#md5=ca00c0c82bf5b8935e9c4dd52671a5a9"
    local(local_command)
    jieba_unzip = "unzip jieba-0.39.zip"
    jieba_path = "/root/jieba-0.39/"
    jieba_install = "python setup.py install"
    local(jieba_unzip)
    with lcd(jieba_path):
        local("ls")
        local(jieba_install)
    with lcd('/root/'):
        put("jieba-0.39.zip", '/root')
    run(jieba_unzip)
    with cd(jieba_path):
        run("ls")
        run(jieba_install)
Statement

Finally, in the shell script I mentioned in the previous article, add:

yum -y install fabric && fab --fabfile=job.py job 

Then run ./start-hadoop-spark.sh and the running environment is deployed without any further worries. Out of laziness and to save trouble, I used Python and shell to write this automatic deployment script. In the process I learned a lot and ran into plenty of trouble, so I wrote this article in the hope of easing your configuration troubles~

The result is as follows (screenshots of Master, Slaves1, and Slaves2 were shown here in the original post).
