Hadoop is a complex mix of systems, and building a production Hadoop environment by hand is a hassle. But there is always someone willing to solve these seemingly painful problems for you, if not now, then sooner or later. CDH is Cloudera's packaged Hadoop distribution; for an introduction to CDH see www.cloudera.com, so I will not repeat it here. This article is mainly about using CDH 5.3 to install a Hadoop environment that can be used in production. Although Cloudera helps you solve the problem of installing Hadoop itself, be warned: installing Cloudera Manager is no easier than installing Hadoop, and there are many pitfalls, which the following walkthrough covers step by step.
Environment Preparation
Server preparation:
We are preparing a small cluster of 12 nodes, all running the Red Hat 6.4 Server x64 operating system. Hostnames are uniformly named server[1-12].cdhwork.org, and the intranet IP addresses are 192.168.10.[1-12]. Every server must have a DNS server configured (either 202.96.209.5 or 8.8.8.8), and the root password must be set to the same value on all servers.
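The hostname and DNS settings above can be scripted rather than edited by hand on each node. The sketch below is my own illustration, not part of the original procedure: it writes to copies of the real RHEL 6 files under a temporary directory so it can run without root; on a real node you would target /etc/sysconfig/network and /etc/resolv.conf directly as root.

```shell
# Illustrative sketch: persist the hostname and DNS server on a RHEL 6 node.
# $fakeroot stands in for / so the snippet runs unprivileged.
fakeroot=$(mktemp -d)
mkdir -p "$fakeroot/etc/sysconfig"
printf 'NETWORKING=yes\nHOSTNAME=localhost.localdomain\n' > "$fakeroot/etc/sysconfig/network"

# Persist the hostname (takes effect at next boot; run `hostname` too for now).
sed -i 's/^HOSTNAME=.*/HOSTNAME=server1.cdhwork.org/' "$fakeroot/etc/sysconfig/network"
# Point the resolver at a DNS server.
echo 'nameserver 8.8.8.8' > "$fakeroot/etc/resolv.conf"

grep '^HOSTNAME=' "$fakeroot/etc/sysconfig/network"
cat "$fakeroot/etc/resolv.conf"
```

On a real server you would also run `hostname server1.cdhwork.org` so the change applies immediately without a reboot.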
Server Role Assignment Table
| Server (cdhwork.org) | IP Address | Installed Roles |
| --- | --- | --- |
| server1 | 192.168.10.1 | CDH local mirror, Cloudera Manager, time server |
| server2 | 192.168.10.2 | Cloudera Management Service Host Monitor, Cloudera Management Service Service Monitor |
| server3 | 192.168.10.3 | HDFS NameNode, Hive Gateway, Impala Catalog Server, Cloudera Management Service Alert Publisher, Spark Gateway, ZooKeeper Server |
| server4 | 192.168.10.4 | HDFS SecondaryNameNode, Hive Gateway, Impala StateStore, Solr Server, Spark Gateway, YARN (MR2 Included) ResourceManager, ZooKeeper Server |
| server5 | 192.168.10.5 | HDFS Balancer, Hive Gateway, Hue Server, Cloudera Management Service Activity Monitor, Oozie Server, Spark Gateway, Sqoop 2 Server, ZooKeeper Server |
| server6 | 192.168.10.6 | HBase Master, Hive Gateway, MapReduce JobTracker, Solr Server, Spark Gateway, YARN (MR2 Included) JobHistory Server, ZooKeeper Server |
| server7 | 192.168.10.7 | HBase REST Server, HBase Thrift Server, Hive Metastore Server, HiveServer2, Key-Value Store Indexer (Lily HBase Indexer), Cloudera Management Service Event Server, Spark History Server |
| server8 | 192.168.10.8 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
| server9 | 192.168.10.9 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
| server10 | 192.168.10.10 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
| server11 | 192.168.10.11 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
| server12 | 192.168.10.12 | HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager |
Perform the following steps on every server as the root account.
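Since every step below has to be repeated on all 12 machines, it is worth scripting the fan-out. A minimal sketch, assuming root SSH keys have already been distributed to all nodes; here `echo` just prints what would run, and replacing it with `ssh` executes the command for real.

```shell
# Fan one command out to every node in the cluster.
cmd='chkconfig iptables off'   # any of the per-server steps below
targets=""
for i in $(seq 1 12); do
  target="root@server$i.cdhwork.org"
  echo "$target  $cmd"         # replace echo with: ssh "$target" "$cmd"
  targets="$targets $target"
done
```

Tools like pssh or a simple expect script can do the same job if you prefer not to roll your own loop.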
1. Turn off the firewall
/etc/init.d/iptables stop   # stop the firewall now
chkconfig iptables off      # do not start the firewall service at boot
2. Turn off SELinux
Run on the command line: setenforce 0
Edit the configuration file so the setting survives a reboot:
vi /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled

Change the line to SELINUX=disabled, then save and exit.
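Editing /etc/selinux/config by hand on 12 machines is error prone; the same change can be applied non-interactively with sed. A sketch that works on a throwaway copy of the file so it can be tried safely; on a real node, point sed at /etc/selinux/config itself.

```shell
# Non-interactive equivalent of the vi edit above.
cfg=$(mktemp)                                    # stand-in for /etc/selinux/config
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

# Rewrite only the SELINUX= line; SELINUXTYPE= is left untouched
# because the pattern requires "=" immediately after SELINUX.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

grep '^SELINUX=' "$cfg"
```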
3. Minimize use of swap

Run: sysctl vm.swappiness=0
Edit the configuration file so the setting survives a reboot:
vi /etc/sysctl.conf

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
vm.swappiness = 0

Add vm.swappiness = 0, then save and exit.
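Appending the line with a script is safer if it is guarded so that re-running the script does not add duplicates. A sketch of the guard-then-append pattern, run against a throwaway file standing in for /etc/sysctl.conf:

```shell
# Idempotent append: only add the line if it is not already present verbatim.
conf=$(mktemp)                                   # stand-in for /etc/sysctl.conf
line='vm.swappiness = 0'

grep -qxF "$line" "$conf" || echo "$line" >> "$conf"
grep -qxF "$line" "$conf" || echo "$line" >> "$conf"   # second run is a no-op

grep -c . "$conf"   # number of non-empty lines in the file
```

`-x` forces a whole-line match and `-F` disables regex interpretation, so the guard matches the exact text being appended.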
4. Disable Red Hat transparent hugepages

Run: echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
Edit the configuration file so the setting survives a reboot:
vi /etc/rc.local

#!/bin/sh
#
# This script is executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
touch /var/lock/subsys/local

Add the line echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag, then save and exit.
5. Modify /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# CDH local mirror
192.168.10.1    archive.cloudera.com
# Cloudera Manager
192.168.10.1    server1.cdhwork.org
# Cloudera Management Service Host Monitor, Cloudera Management Service Service Monitor
192.168.10.2    server2.cdhwork.org
# HDFS NameNode, Hive Gateway, Impala Catalog Server, Cloudera Management Service Alert Publisher, Spark Gateway, ZooKeeper Server
192.168.10.3    server3.cdhwork.org
# HDFS SecondaryNameNode, Hive Gateway, Impala StateStore, Solr Server, Spark Gateway, YARN (MR2 Included) ResourceManager, ZooKeeper Server
192.168.10.4    server4.cdhwork.org
# HDFS Balancer, Hive Gateway, Hue Server, Cloudera Management Service Activity Monitor, Oozie Server, Spark Gateway, Sqoop 2 Server, ZooKeeper Server
192.168.10.5    server5.cdhwork.org
# HBase Master, Hive Gateway, MapReduce JobTracker, Solr Server, Spark Gateway, YARN (MR2 Included) JobHistory Server, ZooKeeper Server, PostgreSQL 9.2
192.168.10.6    server6.cdhwork.org
# HBase REST Server, HBase Thrift Server, Hive Metastore Server, HiveServer2, Key-Value Store Indexer (Lily HBase Indexer), Cloudera Management Service Event Server, Spark History Server
192.168.10.7    server7.cdhwork.org
# HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager
192.168.10.8    server8.cdhwork.org
192.168.10.9    server9.cdhwork.org
192.168.10.10   server10.cdhwork.org
192.168.10.11   server11.cdhwork.org
192.168.10.12   server12.cdhwork.org
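Rather than typing twelve near-identical entries, the host lines can be generated with a loop. A small sketch; the output follows the naming scheme used above and could be appended to /etc/hosts on each node.

```shell
# Generate the server1..server12 host entries for this cluster's naming scheme.
hosts=$(for i in $(seq 1 12); do
  printf '192.168.10.%d\tserver%d.cdhwork.org\n' "$i" "$i"
done)
echo "$hosts"
```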
6. Configure the Yum source
cd /etc/yum.repos.d/
mv rhel-source.repo rhel-source.repo.bak
vi rhel-source.repo

[base]
name=CentOS-6.6 - Base
baseurl=http://mirrors.163.com/centos/6.6/os/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#released updates
[updates]
name=CentOS-$releasever - Updates
baseurl=http://mirrors.163.com/centos/6.6/updates/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#packages used/produced in the build but not released
#[addons]
#name=CentOS-$releasever - Addons
#baseurl=http://mirrors.163.com/centos/6.6/addons/x86_64/
#gpgcheck=1
#gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
baseurl=http://mirrors.163.com/centos/6.6/extras/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://mirrors.163.com/centos/6.6/centosplus/x86_64/
gpgcheck=1
enabled=0
Save and exit. The yum source is changed to make the later installation faster. Of course, this assumes all of your servers can reach the external network; if they cannot, either set up a proxy server so they can reach the network through it, or build a local yum mirror and serve it yourself. I recommend building a yum mirror: it is convenient and easy, and the only drawback is that it costs some disk space.
To build a mirror site yourself, use the wget -r <target> command to replicate everything under the target site, then serve it over HTTP with the httpd service. If your replicated site lives in /usr/site, you can create a symbolic link under /var/www/html/:

ln -s /usr/site /var/www/html/site

This way the mirror site is reachable at http://ip/site. Seriously, there are many practical tools under Linux, and wget and ln are among them.
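The symlink pattern can be tried safely on temporary directories before touching the real web root. A toy demonstration with made-up paths:

```shell
# Demonstrate the ln -s pattern on throwaway directories.
site=$(mktemp -d)          # stands in for /usr/site (the replicated mirror)
docroot=$(mktemp -d)       # stands in for /var/www/html (the httpd web root)

echo 'mirror content' > "$site/index.html"
ln -s "$site" "$docroot/site"

# Reading through the symlink reaches the mirrored file.
cat "$docroot/site/index.html"
```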
7. Update the server environment to the latest packages

yum update

This is done so that the later Cloudera installation hits fewer errors: Cloudera will often inexplicably report that a required RPM package does not exist, or that its version is too old. At first I did not know how to fix this; then I bit the bullet and did a full system update, and to my surprise that solved it! I still do not know why, but the problem was solved, was it not? So go ahead and update; if your network is fast enough, it will not take long.
CDH 5.3 Cluster Installation Notes: Environment Preparation (1)