Just after wrapping up the Android project, and before I had even finished the project summary article, the company asked us to research a big-data processing platform, and the task landed on our department. Since the department has only one physical machine, and virtual machines start too slowly, I built a three-node data analysis cluster in Docker myself. It mainly includes an HDFS cluster (distributed storage), a YARN cluster (distributed resource management), and a Spark cluster (distributed computing).
Before diving into the main text, you should have the following background: Linux basics (recommended: "Brother Bird's Linux Private Dishes"; I read the third edition, and a newer edition is now out); the Docker concepts of images, containers, and registries (recommended: "Docker: From Beginner to Practice"); and the basic concepts and principles of Hadoop.
The process of building the data analysis cluster on CentOS 7 covers: installing Docker on CentOS 7 and creating a Hadoop image and three-node containers; configuring a three-node HDFS cluster on Docker; configuring a three-node YARN cluster on Docker; and configuring a three-node Spark cluster on Docker.

(i) Install Docker and create the Hadoop image and three-node containers

1.1 Installing Docker

This article installs Docker on a CentOS 7 system. Docker requires a 64-bit Linux system with a kernel version of at least 3.10.

1.1.1 Install Docker
curl -sSL https://get.docker.com/ | sh
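Once the script finishes, a quick sanity check (these are standard Docker CLI commands, not from the original post):

docker --version
# if the daemon is up, this prints client and server details
sudo docker info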
1.1.2 Configure a Docker accelerator (registry mirror) and enable Docker at boot

This requires registering an Aliyun account; each account gets its own dedicated accelerator address, so substitute your own address in the configuration below.
sudo cp -n /lib/systemd/system/docker.service /etc/systemd/system/docker.service
sudo systemctl daemon-reload
sudo service docker restart
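The cp above places a copy of the unit file under /etc/systemd/system so it can be edited before reloading; the original post does not show the edit itself. A minimal sketch of one common way to set the accelerator, assuming a daemon.json-based setup and using a placeholder mirror URL that you must replace with your own Aliyun address:

sudo mkdir -p /etc/docker
# <your-id>.mirror.aliyuncs.com below is a placeholder, not a real address
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://<your-id>.mirror.aliyuncs.com"]
}
EOF

To actually start Docker on boot, systemctl enable is the standard command:

sudo systemctl enable docker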
1.2 Creating the Hadoop image on Docker

1.2.1 Pull the CentOS image from the official Docker Hub registry

docker pull centos
# list the local images
docker images
1.2.2 Generate a CentOS image with SSH support

To configure passwordless SSH login between the nodes, SSH must be installed into the pulled CentOS image. A Dockerfile is used to build this image.
cd /usr/local
# create a directory to hold the Dockerfile for the SSH-enabled CentOS image
mkdir -p dockerimagesfiles/centos7.ssh
# create the Dockerfile for the SSH-enabled CentOS image
vi Dockerfile

# Dockerfile content
# build on top of the centos image
FROM centos
MAINTAINER dys

# install ssh
RUN yum install -y openssh-server sudo
RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
RUN yum install -y openssh-clients

# set the root password
RUN echo "root:123456" | chpasswd
RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers

# generate the ssh host keys (-N '' avoids an interactive passphrase prompt during the build)
RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -N ''
RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ''

# configure the sshd service
RUN mkdir /var/run/sshd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
1.2.3 Build the centos7-ssh image from the Dockerfile above

docker build -t="centos7-ssh" .
# after the build completes, list the installed images
docker images
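Optionally, you can sanity-check the new image by starting a throwaway container and logging in over SSH. A sketch, not from the original post; the host port 10022 and the container name ssh-test are arbitrary choices:

docker run -d -p 10022:22 --name ssh-test centos7-ssh
# password is 123456, as set in the Dockerfile
ssh root@localhost -p 10022
# clean up the test container afterwards
docker rm -f ssh-test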
1.2.4 Build the Hadoop image

In the directory where you will build the Hadoop image, upload the downloaded jdk-8u101-linux-x64.tar.gz, hadoop-2.7.3.tar.gz, scala-2.11.8.tgz, and spark-2.0.1-bin-hadoop2.7.tgz, together with the Dockerfile. Note: configure the environment variables in the Dockerfile ahead of time; setting environment variables inside a container after the image is built does not persist into the image.
cd /usr/local
# create a directory to hold the Dockerfile for the Hadoop image
mkdir -p dockerimagesfiles/hadoop
# create the Dockerfile for the Hadoop image
vi Dockerfile

# Dockerfile content
# build on top of centos7-ssh
FROM centos7-ssh

# install java
ADD jdk-8u101-linux-x64.tar.gz /usr/local/
RUN mv /usr/local/jdk1.8.0_101 /usr/local/jdk1.8
# configure the JAVA environment variables
ENV JAVA_HOME /usr/local/jdk1.8
ENV PATH $JAVA_HOME/bin:$PATH

# install hadoop
ADD hadoop-2.7.3.tar.gz /usr/local
RUN mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
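The Dockerfile excerpt above cuts off here, before the Scala and Spark tarballs listed in 1.2.4 are added. A hedged sketch of how the remaining lines likely continue, following the same ADD/mv/ENV pattern; the exact paths and variable names are assumptions, not from the original:

# assumed continuation, mirroring the JDK/Hadoop pattern above
ENV HADOOP_HOME /usr/local/hadoop
ENV PATH $HADOOP_HOME/bin:$PATH

# install scala
ADD scala-2.11.8.tgz /usr/local
ENV SCALA_HOME /usr/local/scala-2.11.8
ENV PATH $SCALA_HOME/bin:$PATH

# install spark
ADD spark-2.0.1-bin-hadoop2.7.tgz /usr/local
RUN mv /usr/local/spark-2.0.1-bin-hadoop2.7 /usr/local/spark
ENV SPARK_HOME /usr/local/spark
ENV PATH $SPARK_HOME/bin:$PATH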