CDH 5.3 Cluster Installation Notes: Environment Preparation (1)


Hadoop is a complex mix of systems, and building a Hadoop environment for production by hand is a hassle. But there are always experts in this world who will solve these seemingly painful problems for you, if not now, then sooner or later. CDH is Cloudera's packaged Hadoop distribution; for an introduction to CDH, see www.cloudera.com, so I won't say more about it here. This series is mainly about using CDH 5.3 to install a Hadoop environment that can be used in production. Although Cloudera helps you solve the problem of installing Hadoop itself, there is a catch: installing Cloudera Manager is no easier than installing Hadoop, and there are many pitfalls, which the following articles will step through.

Environment Preparation

Server preparation:

We are preparing a small cluster of 12 machines, all running the Red Hat 6.4 Server x64 operating system. The hostnames are uniformly named server[1-12].cdhwork.org, and the intranet IP addresses are 192.168.10.[1-12]. All servers must be configured with a DNS server (either 202.96.209.5 or 8.8.8.8), and the root password must be set to the same value on every server.
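
For reference, pointing a node at one of these DNS servers is just a matter of editing /etc/resolv.conf; a minimal sketch with both addresses from above:

nameserver 202.96.209.5
nameserver 8.8.8.8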

Server Role Assignment Table

Server (cdhwork.org)   IP Address      Installed roles
server1                192.168.10.1    CDH local mirror; Cloudera Manager; time server
server2                192.168.10.2    Cloudera Management Service Host Monitor; Cloudera Management Service Service Monitor
server3                192.168.10.3    HDFS NameNode; Hive Gateway; Impala Catalog Server; Cloudera Management Service Alert Publisher; Spark Gateway; ZooKeeper Server
server4                192.168.10.4    HDFS SecondaryNameNode; Hive Gateway; Impala StateStore; Solr Server; Spark Gateway; YARN (MR2 Included) ResourceManager; ZooKeeper Server
server5                192.168.10.5    HDFS Balancer; Hive Gateway; Hue Server; Cloudera Management Service Activity Monitor; Oozie Server; Spark Gateway; Sqoop 2 Server; ZooKeeper Server
server6                192.168.10.6    HBase Master; Hive Gateway; MapReduce JobTracker; Solr Server; Spark Gateway; YARN (MR2 Included) JobHistory Server; ZooKeeper Server
server7                192.168.10.7    HBase REST Server; HBase Thrift Server; Hive Metastore Server; HiveServer2; Key-Value Store Indexer (Lily HBase Indexer); Cloudera Management Service Event Server; Spark History Server
server8                192.168.10.8    HBase RegionServer; HDFS DataNode; Impala Daemon; MapReduce TaskTracker; YARN (MR2 Included) NodeManager
server9                192.168.10.9    HBase RegionServer; HDFS DataNode; Impala Daemon; MapReduce TaskTracker; YARN (MR2 Included) NodeManager
server10               192.168.10.10   HBase RegionServer; HDFS DataNode; Impala Daemon; MapReduce TaskTracker; YARN (MR2 Included) NodeManager
server11               192.168.10.11   HBase RegionServer; HDFS DataNode; Impala Daemon; MapReduce TaskTracker; YARN (MR2 Included) NodeManager
server12               192.168.10.12   HBase RegionServer; HDFS DataNode; Impala Daemon; MapReduce TaskTracker; YARN (MR2 Included) NodeManager

Perform all of the following steps on every server as the root account.
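
Since each step has to be repeated on all twelve machines, a small ssh loop saves a lot of typing. This is a minimal sketch, assuming root can ssh to each node by the hostnames above; the hostname command is just a placeholder for whatever step you are running:

for i in $(seq 1 12); do
    # run the same command on each node; substitute the real command for "hostname"
    ssh root@server$i.cdhwork.org "hostname"
done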

1. Turn off the firewall

/etc/init.d/iptables stop    # stop the firewall
chkconfig iptables off       # do not start the firewall service at boot
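
To confirm the firewall is really stopped and stays off after a reboot, a quick check with the standard RHEL 6 service tools:

service iptables status      # should report that the firewall is not running
chkconfig --list iptables    # every runlevel should show "off"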

2. Turn off SELinux
Execute on the command line: setenforce 0
Edit the configuration file to keep the settings after a restart:

vi /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
Change the value to SELINUX=disabled, then save and exit.
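
A quick way to confirm the SELinux state with the standard tools; getenforce reports Permissive right after setenforce 0, and Disabled after a reboot with the config change in place:

getenforce
sestatus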

3. Minimize use of swap space

Execute the command: sysctl -w vm.swappiness=0

Edit the configuration file to keep the settings after a restart:

vi /etc/sysctl.conf

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

vm.swappiness = 0
Add the line vm.swappiness = 0, then save and exit.
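
To confirm the kernel picked up the value, either of these standard checks will do:

sysctl vm.swappiness           # should print: vm.swappiness = 0
cat /proc/sys/vm/swappiness    # should print: 0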

4. Disable Red Hat transparent hugepage defragmentation

Execute the command: echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
Edit the configuration file to keep the settings after a restart:
vi /etc/rc.local

#!/bin/sh
#
# This script is executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
touch /var/lock/subsys/local
Add the echo never line before touch /var/lock/subsys/local, then save and exit.
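
To verify, read the file back; the kernel shows the active choice in brackets, so the output should have [never] selected, e.g. something like "always [never]":

cat /sys/kernel/mm/redhat_transparent_hugepage/defrag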

5. Modify /etc/hosts

Edit /etc/hosts on every server so that all cluster hostnames resolve consistently:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# CDH local mirror
192.168.10.1    archive.cloudera.com
# Cloudera Manager
192.168.10.1    server1.cdhwork.org
# Cloudera Management Service Host Monitor, Cloudera Management Service Service Monitor
192.168.10.2    server2.cdhwork.org
# HDFS NameNode, Hive Gateway, Impala Catalog Server, Cloudera Management Service Alert Publisher, Spark Gateway, ZooKeeper Server
192.168.10.3    server3.cdhwork.org
# HDFS SecondaryNameNode, Hive Gateway, Impala StateStore, Solr Server, Spark Gateway, YARN (MR2 Included) ResourceManager, ZooKeeper Server
192.168.10.4    server4.cdhwork.org
# HDFS Balancer, Hive Gateway, Hue Server, Cloudera Management Service Activity Monitor, Oozie Server, Spark Gateway, Sqoop 2 Server, ZooKeeper Server
192.168.10.5    server5.cdhwork.org
# HBase Master, Hive Gateway, MapReduce JobTracker, Solr Server, Spark Gateway, YARN (MR2 Included) JobHistory Server, ZooKeeper Server, PostgreSQL 9.2
192.168.10.6    server6.cdhwork.org
# HBase REST Server, HBase Thrift Server, Hive Metastore Server, HiveServer2, Key-Value Store Indexer (Lily HBase Indexer), Cloudera Management Service Event Server, Spark History Server
192.168.10.7    server7.cdhwork.org
# HBase RegionServer, HDFS DataNode, Impala Daemon, MapReduce TaskTracker, YARN (MR2 Included) NodeManager
192.168.10.8    server8.cdhwork.org
192.168.10.9    server9.cdhwork.org
192.168.10.10   server10.cdhwork.org
192.168.10.11   server11.cdhwork.org
192.168.10.12   server12.cdhwork.org
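
After saving, it is worth spot-checking that every name resolves to the intended address on each node; a simple loop (assuming bash brace expansion):

for h in archive.cloudera.com server{1..12}.cdhwork.org; do
    getent hosts $h    # prints the IP and name as resolved through /etc/hosts
done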

6. Configure the Yum source

cd /etc/yum.repos.d/
mv rhel-source.repo rhel-source.repo.bak
vi rhel-source.repo

[base]
name=CentOS-6.6 - Base
baseurl=http://mirrors.163.com/centos/6.6/os/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#released updates
[updates]
name=CentOS-$releasever - Updates
baseurl=http://mirrors.163.com/centos/6.6/updates/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#packages used/produced in the build but not released
#[addons]
#name=CentOS-$releasever - Addons
#baseurl=http://mirrors.163.com/centos/6.6/addons/x86_64/
#gpgcheck=1
#gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
baseurl=http://mirrors.163.com/centos/6.6/extras/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://mirrors.163.com/centos/6.6/centosplus/x86_64/
gpgcheck=1
enabled=0

Save the changes and exit. The yum source is changed to make the later installation faster. Of course, there is a prerequisite: all of your servers can access the external network. If they cannot, either install a proxy server and access the network through it, or build a yum source mirror directly and serve packages from that. I recommend building a yum mirror; it is convenient and easy, and the only drawback is that it takes some disk space.
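
After changing the repo file, rebuilding the yum metadata confirms the new source is actually in use:

yum clean all     # drop cached metadata from the old source
yum makecache     # fetch fresh metadata from mirrors.163.com
yum repolist      # base, updates and extras should be listed (centosplus is disabled)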

A self-built mirror site can use the wget -r <target> command to replicate all content under the target site, then expose it over the web with the httpd service. If your replicated site is in /usr/site, you can create a soft link under /var/www/html/ directly:

ln -s /usr/site /var/www/html/site

This way, you can use http://<ip>/site to access the mirror site. Seriously, there are many practical tools under Linux, and wget and ln are two of them.
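
As a concrete sketch of the whole mirroring approach, here is what it might look like for Cloudera's CDH 5 yum repository; the URL is illustrative, and note that on this cluster the download should happen before /etc/hosts redirects archive.cloudera.com to 192.168.10.1, or wget would just hit the local machine:

# -r recurse, -np never ascend to the parent directory,
# -nH drop the hostname directory, -P set the download prefix
wget -r -np -nH -P /usr/site http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/
ln -s /usr/site /var/www/html/site
# serve it over HTTP and keep httpd enabled at boot
service httpd start && chkconfig httpd on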

7. Update the server environment to the latest packages

yum update

This is done to make the Cloudera installation below less error-prone, because Cloudera will often inexplicably report that a dependent RPM package does not exist or that its version is too old. At first I did not know how to solve this; then I gritted my teeth and did a full system update, and unexpectedly that fixed it! I still do not know why, but it solved the problem, didn't it? So go ahead and run the update; if your network is fast enough, it will not take long.
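
If you want to run the update unattended on every machine (for example via the ssh loop from earlier), something like this works; the reboot is optional but picks up any updated kernel:

yum -y update    # -y answers yes to all prompts
reboot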
