Hadoop Cluster CDH System Setup (Part 1)


First of all, what is CDH?
Suppose you need to install a Hadoop cluster across 100 or even 1,000 servers, with a package of components including Hive, HBase, Flume, and so on, build it all within a day, and then still cope with subsequent system updates. That is when you need CDH.

Advantages of the CDH version:
Clear version divisions
Faster version updates
Support for Kerberos security authentication
Clear official documentation
Multiple installation methods (including the Cloudera Manager mode)

How the framework can be installed:
Yum
RPM
Tar
CM (CDH version only)

1. Resource requirements of each big data service component on the server

NameNode: memory can be estimated from the amount of HDFS metadata (roughly 1000 MB of memory per 1 million metadata entries); it also has certain network bandwidth requirements for data interaction with the DataNode slave nodes.
DataNode: disk space of 4-24 TB per node. HDFS itself keeps 3 replicas of the data for redundancy (backup of data).
ResourceManager: high network bandwidth requirements; often deployed together with the NameNode.
NodeManager: memory-hungry; deployed together with the DataNodes, because the MapReduce principle is to move computation rather than move data.
ZooKeeper: memory requirements are modest; disk read/write speed matters (use SSDs) and network bandwidth requirements are very high.
HBase Master: can be deployed with the NameNode and ResourceManager as a hot standby; because its load is low, its server resource requirements are not very high.
HBase RegionServer: memory-hungry; deployed together with the DataNodes.
    Write memory: each MemStore is 128 MB; the total MemStore size depends on the number of stores.
    Region structure: a region is a RowKey-range (row-direction) partition of an HBase table. Each region contains multiple stores, the number of stores being determined by the number of column families, and each store consists of one MemStore and multiple StoreFiles.
    Read memory: BlockCache, heap_size * 0.4 per RegionServer.

Spark cluster: high memory and CPU requirements.
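As a rough sketch of the sizing rules above (the 1000 MB per million metadata entries and the heap_size * 0.4 BlockCache fraction come from the text; the input values are hypothetical examples):

```shell
#!/bin/sh
# Rough capacity estimates based on the rules of thumb above.
# Assumed inputs (hypothetical example values):
metadata_entries=5000000        # number of HDFS metadata entries
regionserver_heap_mb=16384      # RegionServer heap size in MB

# NameNode: ~1000 MB of heap per 1 million metadata entries
namenode_heap_mb=$(( metadata_entries / 1000000 * 1000 ))

# RegionServer read cache: BlockCache = heap_size * 0.4
blockcache_mb=$(( regionserver_heap_mb * 4 / 10 ))

echo "Estimated NameNode heap:     ${namenode_heap_mb} MB"
echo "Estimated BlockCache per RS: ${blockcache_mb} MB"
```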
 

2. Cloudera Manager Technical Architecture

Server
Master node, deployed on a single server (the Server service process requires 8 GB of reserved memory)
Communicates with each slave node to gather the resource and process information sent from the nodes
Responsible for installing and deploying the cluster's framework components
Responsible for starting and stopping the cluster
......

Agent
Slave node
Also known as the host (the installation and deployment of the big data framework's service processes, and the execution of tasks, all happen on the host nodes)
Collects resource information and health information for the other framework components on its server, then reports it all to the Server

Database
CM requires a database to store metadata about the clusters it manages and health information for each managed service component process
Metadata: number of clusters, host names, deployment and allocation information for the frameworks, etc.
Hive also requires a database for its Metastore, which can share the same database instance as CM

Cloudera Management Service:
A set of monitoring components, i.e. a set of service processes
The components that communicate with the Server and actually monitor the cluster's resource information

Web UI:
CM provides a web-based operation interface
It uses an embedded Jetty server

3. Resource requirements

Server: 8 GB of reserved memory (for the single Server service process)
Agent: 1-2 GB of memory (excluding the other big data service processes); 30 GB of disk
Bandwidth: cluster intranet I/O of 100 MB/s or more

4. Software Environment
Linux
JDK 1.7
Database
MySQL
5.5 does not support installing Impala
5.6
– Chosen: specify the 5.6 version by modifying the Yum source, then install via Yum
Oracle
PostgreSQL
5. Network Configuration
Big data clusters typically run on an intranet
Hostname settings
bigdata01.beifeng.com
Underscores (_) cannot appear in hostnames
Disable IPv6
Turn off the firewall and disable SELinux
6. Users
It is recommended to use the root user for the initial basic setup

CM Installation Deployment Preparation:
I. Environment preparation
# free -m
# df -h
# cat /proc/cpuinfo | grep "processor" | wc -l    // view the total number of cores on the server
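These checks can be wrapped in a small per-node script (a sketch; the 8 GB memory and 30 GB disk thresholds come from the resource requirements above):

```shell
#!/bin/sh
# Quick pre-install resource check for a single node.
mem_mb=$(free -m | awk '/^Mem:/ {print $2}')                   # total memory in MB
disk_gb=$(df -BG / | awk 'NR==2 {gsub("G","",$4); print $4}')  # free space on / in GB
cores=$(grep -c "^processor" /proc/cpuinfo)                    # total number of cores

echo "memory: ${mem_mb} MB, free disk on /: ${disk_gb} GB, cores: ${cores}"

# Warn if below the requirements mentioned above
if [ "$mem_mb" -lt 8192 ]; then echo "WARNING: less than 8 GB of memory"; fi
if [ "$disk_gb" -lt 30 ]; then echo "WARNING: less than 30 GB of free disk on /"; fi
```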

1.  Configure IP, hostname, hosts mapping  (all servers)  

172.21.192.1 zyf1.hadoop
172.21.192.2 zyf2.hadoop
172.21.192.3 zyf3.hadoop
172.21.192.4 zyf4.hadoop
172.21.192.5 zyf5.hadoop

# vi /etc/sysconfig/network
HOSTNAME=zyf1.hadoop    // note: the hostname cannot contain spaces or line breaks
Verification: # hostname    (takes effect after a reboot)
# vi /etc/hosts
Verification: # ping zyf4.hadoop
Also configure the local hosts file on your Windows workstation.

2. Disable IPv6 (all servers)
# echo "alias net-pf-10 off" >> /etc/modprobe.d/dist.conf
# echo "alias ipv6 off" >> /etc/modprobe.d/dist.conf
Verification: # ip a    // no inet6 entries means IPv6 was disabled successfully (takes effect after a reboot)

3. Turn off the firewall, disable it at boot, and disable SELinux (all servers)
# service iptables stop && chkconfig iptables off && chkconfig --list | grep iptables
# vi /etc/selinux/config    (or # vi /etc/sysconfig/selinux)
Modify: SELINUX=disabled
Verification: # getenforce    // if it returns "Disabled", the change succeeded; takes effect after a reboot

4. Passwordless SSH login (all servers)
Every node in the cluster should be able to log in to the others without a password.
# ssh-keygen    // generates the public and private keys; run as the user in an interactive session
# ssh-copy-id zyf1.hadoop
# ssh-copy-id zyf2.hadoop    ...
Verification: # ssh zyf3.hadoop  then  # exit    // always exit after a test login ...

5. Install JDK 1.7 (all servers)
# yum -y install lrzsz*
# rpm -e --nodeps tzdata-java-2012j-1.el6.noarch java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64 java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
# mkdir -p /usr/java
# tar zxf /opt/softwares/jdk-7u67-linux-x64.tar.gz -C /usr/java/
# vi /etc/profile

export JAVA_HOME=/usr/java/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin

# source /etc/profile

# scp -r /usr/java/ zyf2.hadoop:/usr/    // copy the Java directory to the other nodes in the cluster
# scp /etc/profile zyf2.hadoop:/etc/
# source /etc/profile    // on each node after copying
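Steps 4-5 have to be repeated for every node; a loop like the following can save typing (a sketch using the hostnames from this article; the leading `echo` makes it a dry run that only prints the commands, so remove it to actually execute them):

```shell
#!/bin/sh
# Distribute SSH keys, the JDK directory, and /etc/profile to the other nodes.
# Dry run: each command is printed, not executed. Remove 'echo' to run for real.
for host in zyf2.hadoop zyf3.hadoop zyf4.hadoop zyf5.hadoop; do
    echo ssh-copy-id "$host"
    echo scp -r /usr/java/ "$host":/usr/
    echo scp /etc/profile "$host":/etc/
done
```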

6. Raise the maximum server resource limits available to users (all servers)

# ulimit -a
# vi /etc/security/limits.conf

* soft nofile 32728
* hard nofile 1024567
* soft nproc 65535
* hard nproc unlimited
* soft memlock unlimited
* hard memlock unlimited

soft: warning threshold
hard: hard error limit

Takes effect after a reboot.
Verification:
# ulimit -a
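A quick way to check whether the new limit has taken effect after logging back in (a sketch; 32728 is the soft nofile value configured above):

```shell
#!/bin/sh
# Compare the current soft open-files limit with the configured target.
target=32728
current=$(ulimit -Sn)   # soft limit on open file descriptors

echo "soft nofile: current=${current}, target=${target}"
if [ "$current" != "unlimited" ] && [ "$current" -lt "$target" ]; then
    echo "limits.conf change not in effect yet - log out and back in (or reboot)"
fi
```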

7. Cluster time server
Select one server as the time server for the cluster:
zyf1.hadoop

172.21.192.1 zyf1.hadoop
172.21.192.2 zyf2.hadoop
172.21.192.3 zyf3.hadoop
172.21.192.4 zyf4.hadoop
172.21.192.5 zyf5.hadoop

Modify the configuration file (on the server node):
# vi /etc/ntp.conf
restrict 172.21.192.0 mask 255.255.255.0 nomodify notrap    // uncomment this line and change the subnet to your cluster's subnet
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
Add the following two lines manually if they are not present:
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
Run the following command on all nodes (all servers):
# service ntpd start && chkconfig ntpd on && chkconfig --list | grep ntpd

Synchronize the bigdata01.beifeng.com system time with network time (on bigdata01.beifeng.com):

ntp.sjtu.edu.cn (NTP server of the Shanghai Jiao Tong University network center)
s1a.time.edu.cn
s1b.time.edu.cn (Tsinghua University)

# ntpdate -u us.pool.ntp.org

Write the system time to the local hardware clock (on bigdata01.beifeng.com):
# hwclock --localtime       // view the local hardware clock time
# hwclock --localtime -w    // write the system time to the local hardware clock

Automatically synchronize the local hardware clock with the system time (on bigdata01.beifeng.com):

# vi /etc/sysconfig/ntpdate
SYNC_HWCLOCK=yes
# vi /etc/sysconfig/ntpd
SYNC_HWCLOCK=yes

Effect:
    when the system shuts down, the system time is automatically written to the local hardware clock
    when the system boots, the hardware clock time is synchronized to the system time

Synchronize the other nodes of the cluster with the cluster's time server.
Set up a scheduled job (on every node except bigdata01.beifeng.com):
# crontab -e

        */20 * * * *  /usr/sbin/ntpdate -u 172.21.192.1

# service crond restart    // takes effect after the restart
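Installing that cron entry on every slave node can also be scripted (a sketch with the hostnames, interval, and server IP from this article; the loop is a dry run that only prints the commands, so remove the leading `echo` to execute them over ssh):

```shell
#!/bin/sh
# Push the ntpdate cron job to every slave node (dry run: commands are printed only).
cron_line='*/20 * * * * /usr/sbin/ntpdate -u 172.21.192.1'

for host in zyf2.hadoop zyf3.hadoop zyf4.hadoop zyf5.hadoop; do
    # Append the entry to root's crontab on the node, then restart crond.
    echo ssh "$host" "echo '$cron_line' >> /var/spool/cron/root && service crond restart"
done
```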

II. Installation and deployment of MySQL (on bigdata01.beifeng.com)
Version selection:
MySQL 5.6
Yum installs version 5.1 by default, so the Yum source needs to be modified
Install on the server node

1. Uninstall the version that ships with the system
# rpm -qa | grep -i mysql
# rpm -e --nodeps mysql-libs-5.1.66-2.el6_3.x86_64
2. Download MySQL's new Yum source
3. Install and update the MySQL source
# rpm -Uvh mysql57-community-release-el6-8.noarch.rpm
Two MySQL source files appear in the /etc/yum.repos.d directory
4. Modify MySQL's Yum source configuration files to pin the MySQL 5.6 version
# vi mysql-community.repo
# Enable to use MySQL 5.6
[mysql56-community]
enabled=1
[mysql57-community]
enabled=0
# vi mysql-community-source.repo
[mysql56-community-source]
enabled=1
[mysql-tools-preview-source]
enabled=1
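The same repo edits can be made non-interactively with `sed`. This is a sketch, demonstrated on a throwaway copy containing a minimal sample of the file (section names as in the steps above); point `repo` at /etc/yum.repos.d/mysql-community.repo to do it for real:

```shell
#!/bin/sh
# Flip the 'enabled' flags so that only the MySQL 5.6 repo is active.
repo=$(mktemp)
cat > "$repo" <<'EOF'
[mysql56-community]
enabled=0
[mysql57-community]
enabled=1
EOF

# Enable 5.6: set enabled=1 inside the [mysql56-community] section
sed -i '/\[mysql56-community\]/,/^\[/ s/enabled=0/enabled=1/' "$repo"
# Disable 5.7: set enabled=0 inside the [mysql57-community] section
sed -i '/\[mysql57-community\]/,/^\[/ s/enabled=1/enabled=0/' "$repo"

cat "$repo"
rm -f "$repo"
```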
5. Update the Yum cache
# yum makecache

6. View which MySQL versions the current Yum sources can install
# yum repolist enabled | grep -i mysql

7. Install MySQL 5.6 via Yum
# yum -y install mysql-community-server

"Complete!" indicates that the installation was successful.

8. Start MySQL and initialize the settings
# service mysqld start
# mysql_secure_installation

Set root Password? [y/n] Y
New Password:
Re-enter new password:

Remove anonymous users? [y/n] Y
... success!

Disallow Root login remotely? [y/n] n
... skipping.

Remove test database and access to it? [y/n] n
... skipping.

Reload privilege tables now? [y/n] Y

This is too much for one article; please continue with the next part!
