Using Docker to Build a Large Data-Processing Cluster

Right after wrapping up the Android project (the project summary article is still unfinished), the company needed us to investigate a big-data processing platform, and the task landed on our department. Since the department has only one physical machine, and virtual machines start too slowly, I built a three-node data-analysis cluster in Docker by hand. It mainly includes an HDFS cluster (distributed storage), a YARN cluster (distributed resource management), and a Spark cluster (distributed computing).
Before getting into the main content, you should have the following background: Linux basics (I recommend "Bird Brother's Linux Private Kitchen"; I read the third edition, and there is now a newer one); the Docker concepts of image, container, and registry (recommended: "Docker: From Introduction to Practice"); and the basic concepts and principles of Hadoop.

The process of setting up the data-analysis cluster on CentOS 7 includes:

  • Installing Docker on CentOS 7 and creating a Hadoop image and three-node containers
  • Configuring the three-node HDFS cluster on Docker
  • Configuring the three-node YARN cluster on Docker
  • Configuring the three-node Spark cluster on Docker

(i) Install Docker and create a Hadoop image and three-node containers

1.1 Installing Docker

This article installs Docker on a CentOS 7 system. Docker requires a 64-bit Linux operating system with a kernel version of at least 3.10.

1.1.1 Install Docker
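
Before running the install script below, you can quickly confirm that the machine meets this requirement (a minimal check of my own; the exact release strings will vary):

# Kernel version should be 3.10 or later
uname -r
# Confirm the distribution release
cat /etc/centos-release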

curl -sSL https://get.docker.com/ | sh
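
After the script finishes, a quick verification (my own follow-up, not from the original walkthrough):

# Start the Docker daemon and confirm client and server versions
sudo systemctl start docker
docker version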
1.1.2 Configure a Docker accelerator (registry mirror) and start Docker at boot

This step requires registering an Aliyun (Alibaba Cloud) account: each account gets its own dedicated accelerator address, so substitute your own address where applicable.
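
The article does not show where the accelerator address is entered. On recent Docker versions it typically goes into /etc/docker/daemon.json before the restart below (a hedged sketch; the URL is a placeholder for your own dedicated address):

# /etc/docker/daemon.json -- replace the URL with your dedicated accelerator address
{
  "registry-mirrors": ["https://<your-id>.mirror.aliyuncs.com"]
}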

sudo cp -n /lib/systemd/system/docker.service /etc/systemd/system/docker.service
sudo systemctl daemon-reload
sudo service docker restart
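
The heading also promises starting Docker at boot, which the commands above do not cover; on systemd-based CentOS 7 that would presumably be:

# Enable the Docker service so it starts automatically at boot
sudo systemctl enable docker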
1.2 Creating a Hadoop image on Docker

1.2.1 Pull the CentOS image from the official Docker Hub registry
docker pull centos
# View the local image library
docker images
1.2.2 Build a CentOS image with SSH support

To configure passwordless SSH login between the nodes, SSH needs to be installed in the pulled CentOS image. Here a Dockerfile is used to build the image.

cd /usr/local
# Create a directory to hold the Dockerfile for the SSH-enabled CentOS image
mkdir -p dockerimagesfiles/centos7.ssh
cd dockerimagesfiles/centos7.ssh
# Create the Dockerfile for the SSH-enabled CentOS image
vi Dockerfile

# Dockerfile contents
# Build on top of the centos image
FROM centos
MAINTAINER dys
# Install SSH
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients
# Configure the root account
RUN echo "root:123456" | chpasswd
RUN echo "root   ALL=(ALL)       ALL" >> /etc/sudoers
# Generate the SSH host keys
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
# Configure the sshd service
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
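
Note that this Dockerfile only generates host keys and runs sshd; the per-user key exchange that actually makes logins between nodes passwordless happens later, once the containers are running. A minimal sketch of that step (hadoop0/hadoop1/hadoop2 are assumed container hostnames, not names from this section):

# On each node: generate a user key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Distribute each node's public key to every node's authorized_keys
ssh-copy-id root@hadoop0
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2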
1.2.3 Build the centos7-ssh image from the Dockerfile above

docker build -t="centos7-ssh" .
# After the build completes, view the installed images
docker images
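
You can optionally smoke-test the new image before moving on (my own check; the container name and host port are arbitrary):

# Run a container from the image and SSH into it as root/123456
docker run -d -p 10022:22 --name ssh-test centos7-ssh
ssh root@localhost -p 10022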
1.2.4 Build the Hadoop image

In the directory where the Hadoop image will be built, upload the downloaded jdk-8u101-linux-x64.tar.gz, hadoop-2.7.3.tar.gz, scala-2.11.8.tgz, and spark-2.0.1-bin-hadoop2.7.tgz alongside the Dockerfile. Note: configure the environment variables ahead of time in the Dockerfile; setting them inside a container after the image is built does not work, since they are lost when the container restarts.

cd /usr/local
# Create a directory to hold the Dockerfile for the Hadoop image
mkdir -p dockerimagesfiles/hadoop
cd dockerimagesfiles/hadoop
# Create the Dockerfile for the Hadoop image
vi Dockerfile

# Dockerfile contents
# Build on top of centos7-ssh
FROM centos7-ssh
# Install Java
ADD jdk-8u101-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_101 /usr/local/jdk1.8
# Configure the Java environment variables
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH
# Install Hadoop
ADD hadoop-2.7.3.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
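
The snippet above breaks off after the Hadoop step. Given the Scala and Spark tarballs listed earlier, the Dockerfile plausibly continues along these lines (a hedged sketch, not the author's exact file; the HADOOP_HOME/SCALA_HOME/SPARK_HOME values are assumptions):

# --- assumed continuation, not shown in the original ---
# Configure the Hadoop environment variables
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# Install Scala
ADD scala-2.11.8.tgz /usr/local
RUN mv /usr/local/scala-2.11.8 /usr/local/scala
ENV SCALA_HOME /usr/local/scala
ENV PATH $SCALA_HOME/bin:$PATH
# Install Spark
ADD spark-2.0.1-bin-hadoop2.7.tgz /usr/local
RUN mv /usr/local/spark-2.0.1-bin-hadoop2.7 /usr/local/spark
ENV SPARK_HOME /usr/local/spark
ENV PATH $SPARK_HOME/bin:$PATH

The image would then presumably be built the same way as in 1.2.3, e.g. docker build -t="hadoop" . (command assumed, not shown in this section).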
