Using fabric to deploy the nodes of a Baidu BMR Spark cluster
Preface
The AI competition that my friends and I entered reached the finals a while ago, and I had long wanted to combine the data preprocessing stage with the deep learning stage. When merging the two parts of the code, however, we ran into some problems, so I wrote a dedicated script to handle the setup. Now that the competition is over, I want to write this part up in case other people hit the same problem, and I hope it helps them. If anything is wrong, please correct me. Thank you!
A prerequisite for deploying each node of the cluster during the competition
During the preliminary round, in order to quickly provide a data interface for the deep learning models that would be built later, we kept the data preprocessing independent and used the plainest Python possible. The concern was that our code had to be ported to the machine the judges used for verification, where some libraries might not be installed, and the program would then fail to run.
Troubles
After entering the finals, we had to reinstall our libraries because the two parts urgently needed to be combined. If everything ran only on the Master node, the problem would be simple: pack up the libraries and write a single script. But in Baidu's BMR Spark cluster the Slave nodes cannot access the external network, so we would have to log on to the Master node, ssh from it to each Slave over the intranet, and only then run our script to deploy the program's runtime environment.
Proposal
Given that, is there a good way to run a single script on the Master that automatically deploys the runtime environment on every node in the cluster? After reading a Spark best-practices book, I learned about the third-party Python library fabric.
Fabric
First, a brief introduction to fabric. For details on how to use it, refer to the official API docs; here I will only cover a small part.
Execute local tasks
Fabric provides local("shell"), where shell is a shell command on Linux. For example:
from fabric.api import local

local('ls /root/')  # list the files under /root/
Execute a remote task
What makes fabric powerful is that it can execute commands not only locally but also on a remote server, even if fabric is not installed on that server. It works over ssh, so we need to define three parameters:
env.hosts = ['ipaddress1', 'ipaddress2']
env.user = 'root'
env.password = 'fuckyou.'
Then you can use run("shell") to execute the tasks we need on the remote server. For example:
from fabric.api import run, env

env.hosts = ['ipaddress1', 'ipaddress2']
env.user = 'root'
env.password = 'fuckyou.'

run('ls /root/')  # list the files under /root/
Open a folder
Sometimes we need to enter a particular folder first and then execute a script or operate on a file inside it. Two interfaces are needed here:
Local
with lcd('/root/local/'):
    local('cat local.txt')  # cat the local.txt file under /root/local/
Remote
with cd('/root/distance/'):
    run('cat distance.txt')  # cat the remote file /root/distance/distance.txt
Execute a fabric task
We can use the command line
fab --fabfile=filename.py job_func
# filename.py is a Python file written with fabric
# job_func is the fabric task function to be executed
# both names are up to you; in what follows they are job.py and job
Socket
Why use socket? As mentioned in the previous article, the Baidu BMR cluster addresses the Slaves by hostname rather than by IP address. Since fabric needs IP addresses when setting the hosts in its environment, we have to resolve each hostname to its IP address.
You might wonder: why not just hard-code the Slaves' IP addresses? Because every time Baidu BMR creates a Spark cluster, the intranet IP addresses it hands out change, with the last segment growing each time.
All things considered, we use the hostname to look up the IP address.
Gethostbyname Interface
We can use the gethostbyname('hostname') interface: pass in a hostname and it returns the corresponding IPv4 address.
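As a minimal sketch of that lookup (using localhost here, since the real Slave hostnames only resolve inside the BMR intranet):

import socket

# resolve a hostname to its IPv4 address; in the deployment script below this
# is called with each Slave hostname read from the slaves file
print(socket.gethostbyname('localhost'))  # '127.0.0.1'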
Use fabric to write an automatic deployment script for each node
Obtain the Slaves hostnames
As mentioned in the previous article, Baidu BMR stores the Slaves hostnames in:
'/opt/bmr/Hadoop/etc/hadoop/slaves'
Convert the hostnames to IP addresses and set fabric's env parameters
import socket
from fabric.api import env

# path is the slaves file shown above
host_list = []
f = open(path, 'r')
slaves_name = f.read().split('\n')
for i in range(1, slaves_name.__len__() - 1):
    temp_name = slaves_name[i]
    temp_ip = socket.gethostbyname(temp_name)  # resolve the hostname to its intranet IP
    ip_port = temp_ip + ":22"                  # append the SSH port
    host_list.append(ip_port)
    del temp_name
    del temp_ip
    del ip_port

env.user = 'root'
env.password = '*gdut728'
env.hosts = host_list
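A note on the ":22" suffix: fabric host strings may include the SSH port in host:port form, so appending ":22" simply pins the connection to the default SSH port.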
Write the job to be automatically deployed
Here, what I want to automatically deploy is:
1. Download the Python third-party library jieba.
2. Decompress the downloaded jieba package locally.
3. Go to the decompressed folder and install jieba locally.
4. Transfer the downloaded package to the Slaves nodes.
5. Decompress the downloaded jieba package on the remote end.
6. On the remote end, go to the decompressed folder and install jieba.
Converting the above steps into code, we get:
def job():
    # 1. download the jieba source package locally
    local_command = "wget https://pypi.python.org/packages/71/46/c6f9179f73b818d5827202ad1c4a94e371a29473b7f043b736b4dab6b8cd/jieba-0.39.zip#md5=ca00c0c82bf5b8935e9c4dd52671a5a9"
    local(local_command)
    jieba_unzip = "unzip jieba-0.39.zip"
    jieba_path = "/root/jieba-0.39/"
    jieba_install = "python setup.py install"
    # 2. decompress the package locally
    local(jieba_unzip)
    # 3. enter the decompressed folder and install jieba locally
    with lcd(jieba_path):
        local("ls")
        local(jieba_install)
    # 4./5. transfer the package to the Slaves and decompress it remotely
    with lcd('/root/'):
        put("jieba-0.39.zip", '/root')
        run(jieba_unzip)
    # 6. enter the decompressed folder on the remote end and install jieba
    with cd(jieba_path):
        run("ls")
        run(jieba_install)
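With this saved as job.py on the Master node, the fab command shown earlier (fab --fabfile=job.py job) runs the whole deployment against every host in env.hosts.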
Statement
Finally, in the shell script I mentioned in the previous article, add
yum -y install fabric && fab --fabfile=job.py job
Then run ./start-hadoop-spark.sh and the runtime environment of every node is deployed with no further worries. Partly out of laziness and partly to save trouble, I used Python and shell to write this automatic deployment script; I learned a lot along the way and hit plenty of snags, so I wrote this article hoping to ease your configuration troubles.
The result is as follows:
Master:
Slaves1:
Slaves2: