First of all, what is CDH?
Imagine installing a Hadoop cluster across 100 or even 1,000 servers, including components such as Hive, HBase, and Flume, completing the build within a day, and also planning for later system upgrades. That is exactly the situation CDH is designed for.
Advantages of the CDH version:
Clear Version Division
Faster version update
Support for Kerberos security authentication
Clear official documentation
Supports multiple installation methods (including Cloudera Manager)
How the framework is installed:
Yum
Rpm
Tar
CM (for CDH version only)
1. Server resource requirements of each big data service component
NameNode: memory sizing follows the amount of HDFS metadata (roughly 1000 MB of memory per 1 million metadata objects; see the quick calculation after this list); it also has some network bandwidth requirements for its interaction with the DataNode slave nodes.
DataNode: disk space of roughly 4-24 TB per node; no RAID is needed for redundancy, because HDFS itself keeps 3 replicas of the data.
ResourceManager: high network bandwidth requirements; often deployed together with the NameNode.
NodeManager: memory-intensive; deployed together with the DataNodes, because the MapReduce principle is to move computation to the data rather than move the data.
ZooKeeper: memory and disk capacity requirements are modest, but disk read/write speed (an SSD is preferred) and network bandwidth requirements are very high.
HBase Master: can be deployed together with the NameNode and ResourceManager as a hot standby; network bandwidth requirements are high, but because its load is low its demands on server resources are not.
HBase RegionServer: memory-intensive; deployed together with the DataNodes. Write memory: each MemStore is 128 MB, and the total MemStore size depends on the number of stores. A region is a row-key range of an HBase table; each region contains multiple stores, the number of stores is determined by the number of column families, and each store consists of one MemStore and multiple StoreFiles. Read memory: the BlockCache, roughly heap_size * 0.4 per RegionServer.
Spark: the cluster needs both memory and CPU.
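As a rough illustration of the sizing rules above (a back-of-the-envelope sketch: the 1000 MB per million objects and heap_size * 0.4 figures come from the notes above, while the 100-million-object and 32 GB-heap inputs are made-up examples; assumes bc is installed):
# echo "100 * 1000" | bc            // NameNode heap in MB for ~100 million metadata objects: 100000 MB, about 100 GB
# echo "32 * 1024 * 0.4" | bc       // RegionServer BlockCache in MB for a 32 GB heap (heap_size * 0.4): 13107.2 MB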
2. Cloudera Manager Technical Architecture
Server
Master node, deployed on a single server (the Server service process needs 8 GB of memory reserved)
Communicates with each slave node and gathers the resource and process information reported by the nodes
Responsible for installing and deploying the cluster's framework components
Responsible for starting and stopping the cluster
......
Agent
Slave node
Also called a host (the installation and deployment of the big data framework's service processes, and the tasks they run, all take place on the host nodes)
Collects the server's resource information and the health information of the other framework components, and reports it all to the Server
Database
CM needs a database to store metadata about the clusters it manages and the health information of each service component process
Metadata: number of clusters, host names, deployment and allocation information of the frameworks, and so on
Hive also needs a database to store its Metastore; it can share the same database instance with CM
Cloudera Management Service:
A set of monitoring components, running as a set of service processes
The components that communicate with the Server and actually monitor the cluster's resource information
Web UI:
CM provides a web-based operator interface
It uses an embedded Jetty server (see the quick check after this list)
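Once the Server is installed and running, a quick way to confirm the embedded Jetty web UI is up (a minimal check; it assumes the cloudera-scm-server service is installed and that CM is listening on its default web port 7180):
# service cloudera-scm-server status    // the Server service process
# curl -I http://zyf1.hadoop:7180       // an HTTP response means the web operator interface is reachable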
3. Resource requirements
The Server needs 8 GB of memory reserved (for a single Server service process)
Each Agent needs 1-2 GB of memory (not counting the other big data service processes)
Disk: 30 GB
Bandwidth: cluster intranet I/O of 100 MB/s or more (see the quick check below)
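A quick way to check whether a node meets these figures (a sketch; the interface name eth0 is an assumption and may differ on your servers):
# free -m | grep Mem            // total and free memory in MB
# df -h /                       // available disk space on the root filesystem
# ethtool eth0 | grep -i speed  // NIC link speed, e.g. 1000Mb/s on a gigabit intranet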
4. Software Environment
Linux
JDK 1.7
Database
Mysql
5.5: does not support the installation of Impala
5.6
– Selected here: specify version 5.6 by modifying the Yum source and install via Yum
Oracle
PostgreSQL
5. Network Configuration
Big data clusters typically run on an intranet
Host name Settings
Bigdata01.beifeng.com
Must not contain an underscore (_) (see the check after this list)
Disable IPv6
Turn off the firewall and disable SELinux
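A small check for the hostname rule above (a sketch that simply flags an underscore if one is present):
# hostname | grep -q '_' && echo "invalid hostname: contains _" || echo "hostname ok"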
6. Users
It is recommended to use the root user for this initial basic setup
Preparation for CM installation and deployment:
I. Environment preparation
# free -m
# df -h
# cat /proc/cpuinfo | grep "processor" | wc -l    // view the total number of CPU cores on the server
1. Configure IP, hostname, hosts mapping (all servers)
172.21.192.1 zyf1.hadoop
172.21.192.2 zyf2.hadoop
172.21.192.3 zyf3.hadoop
172.21.192.4 zyf4.hadoop
172.21.192.5 zyf5.hadoop
# vi /etc/sysconfig/network
HOSTNAME=zyf1.hadoop    // NOTE: the hostname must not contain spaces or line breaks
Verification: # hostname    // takes effect after a restart
# vi /etc/hosts    // add the mappings listed above
Verification: # ping zyf4.hadoop
Also configure the local hosts file on the Windows workstation.
2. Disable IPv6 (all servers)
# echo "alias net-pf-10 off" >> /etc/modprobe.d/dist.conf
# echo "alias ipv6 off" >> /etc/modprobe.d/dist.conf
Verification: # ip a    // no inet6 entries means IPv6 was disabled successfully; takes effect after a restart
3. Turn off the firewall, keep it from starting at boot, and disable SELinux (all servers)
# service iptables stop && chkconfig iptables off && chkconfig --list | grep iptables
# vi /etc/selinux/config    (or # vi /etc/sysconfig/selinux) and set SELINUX=disabled
Verification: # getenforce    // "Disabled" means success; takes effect after a restart
4. Passwordless SSH login (all servers), so that every node in the cluster can log in to every other node without a password
# ssh-keygen    // generate the public/private key pair for the current user in the interactive prompt
# ssh-copy-id zyf1.hadoop
# ssh-copy-id zyf2.hadoop
...
Verification: # ssh zyf3.hadoop and then # exit    // always exit after the test login
5. Install JDK 1.7 (all servers)
# yum -y install lrzsz*
# rpm -e --nodeps tzdata-java-2012j-1.el6.noarch java-1.6.0-openjdk-1.6.0.0-1.50.1.11.5.el6_3.x86_64 java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
# mkdir -p /usr/java
# tar zxf /opt/softwares/jdk-7u67-linux-x64.tar.gz -C /usr/java/
# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
# source /etc/profile
# scp -r /usr/java/ zyf2.hadoop:/usr/    // copy the Java directory to the other nodes in the cluster (see the loop sketch below)
# scp /etc/profile zyf2.hadoop:/etc/
# source /etc/profile    // on each node that received the new profile
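Instead of repeating the scp commands once for each node, the same copies can be done in a loop over all the other nodes (a sketch using the host names of this cluster; the file list and paths follow the steps above):
# for h in zyf2.hadoop zyf3.hadoop zyf4.hadoop zyf5.hadoop; do scp /etc/hosts /etc/profile $h:/etc/; scp -r /usr/java/ $h:/usr/; ssh $h "source /etc/profile && java -version"; done    // copies the hosts file, profile, and JDK, then prints the Java version on each node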
6. Modify the maximum resource limits the server allows for users (all servers)
# ulimit -a
# vi /etc/security/limits.conf
* soft nofile 32728
* hard nofile 1024567
* soft nproc 65535
* hard nproc unlimited
* soft memlock unlimited
* hard memlock unlimited
soft = warning (soft) limit, hard = hard limit that raises an error
Takes effect after logging in again (or after a restart)
Verification:
# ulimit -a
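The new limits only apply to fresh login sessions, so a simple way to confirm them is to open a new session and read the values back (a quick check; the numbers should match the limits.conf values above):
# ssh zyf2.hadoop "ulimit -n; ulimit -u"    // open files and max user processes for a new login session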
7. Cluster time server
Select a server as the time server for the cluster
zyf1.hadoop
172.21.192.1 zyf1.hadoop
172.21.192.2 zyf2.hadoop
172.21.192.3 zyf3.hadoop
172.21.192.4 zyf4.hadoop
172.21.192.5 zyf5.hadoop
Modify the configuration file (on the server node)
# vi /etc/ntp.conf
restrict 172.21.192.0 mask 255.255.255.0 nomodify notrap    // uncomment this line and change the segment to the cluster's own network segment
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
Add the following two lines manually if they are not already present:
server 127.127.1.0    # local clock
fudge 127.127.1.0 stratum 10
Run the following command on all nodes (all servers):
# service ntpd start && chkconfig ntpd on && chkconfig --list | grep ntpd
Synchronize the system time of the time server with network time (on zyf1.hadoop)
ntp.sjtu.edu.cn (Shanghai Jiaotong University Network Center NTP server address)
s1a.time.edu.cn
s1b.time.edu.cn (Tsinghua University)
# ntpdate -u us.pool.ntp.org
Modify the local hardware clock time (on zyf1.hadoop)
# hwclock --localtime    // view the local hardware clock time
# hwclock --localtime -w    // write the system time to the local hardware clock
Automatically synchronize the local hardware clock with the system time (on zyf1.hadoop)
# vi /etc/sysconfig/ntpdate
SYNC_HWCLOCK=yes
# vi /etc/sysconfig/ntpd
SYNC_HWCLOCK=yes
Effect: when the system shuts down, the system time is automatically written to the local hardware clock; when the system boots, the hardware clock time is synchronized back to the system time.
Synchronize the other nodes of the cluster with the cluster's time server
Set up a scheduled job (on every node except zyf1.hadoop)
# crontab -e
*/20 * * * * /usr/sbin/ntpdate -u 172.21.192.1
# service crond restart    // takes effect after the restart
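To confirm that time synchronization is actually working (a quick check; ntpq ships with the ntp package used above):
# ntpq -p                              // on zyf1.hadoop: the local clock 127.127.1.0 should appear in the peer list
# /usr/sbin/ntpdate -u 172.21.192.1    // on any other node: a one-off sync against the cluster time server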
II. Installation and deployment of MySQL (on zyf1.hadoop)
Version selection
MySQL 5.6
Yum installs version 5.1 by default, so the Yum source needs to be modified
Installing on the server node
1. Uninstall the MySQL that ships with the system
# rpm -qa | grep mysql
# rpm -e --nodeps mysql-libs-5.1.66-2.el6_3.x86_64
2. Download MySQL's new Yum source
3. Install/update the MySQL Yum source
# rpm -Uvh mysql57-community-release-el6-8.noarch.rpm
Two MySQL repo files appear in the /etc/yum.repos.d directory
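A quick check that the repo files are in place (these are the two files edited in the next step):
# ls /etc/yum.repos.d/ | grep mysql    // should list mysql-community.repo and mysql-community-source.repo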
4. Modify the MySQL Yum source configuration files to select version 5.6
# vi mysql-community.repo
# Enable to use MySQL 5.6
[mysql56-community]
enabled=1
[mysql57-community]
enabled=0
# vi mysql-community-source.repo
[mysql56-community-source]
enabled=1
[mysql-tools-preview-source]
enabled=1
5. Update Yum Cache
# yum makecache
6. Check which MySQL versions can currently be installed from the enabled Yum sources
# yum repolist enabled | grep mysql
7. Install MySQL 5.6 via Yum
# yum -y install mysql-community-server
"Complete!" at the end indicates that the installation was successful.
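To double-check which MySQL packages and version were actually installed (a quick verification):
# rpm -qa | grep mysql-community    // installed packages should show version 5.6.x
# mysql --version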
8. Start MySQL and initialize the settings
# service mysqld start
# mysql_secure_installation
Set root Password? [y/n] Y
New Password:
Re-enter new password:
Remove anonymous users? [y/n] Y
... success!
Disallow Root login remotely? [y/n] n
... skipping.
Remove test database and access to it? [y/n] n
... skipping.
Reload privilege tables now? [y/n] Y
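After securing MySQL, CM and the Hive Metastore will each need a database and an account (see the Database notes in section 2). A minimal sketch; the database names, user names, and passwords below are placeholders, not values required by CM or Hive:
# mysql -u root -p
mysql> CREATE DATABASE cmf DEFAULT CHARACTER SET utf8;        -- placeholder database for CM metadata
mysql> GRANT ALL ON cmf.* TO 'cmf'@'%' IDENTIFIED BY 'cmf_password';
mysql> CREATE DATABASE metastore DEFAULT CHARACTER SET utf8;  -- placeholder database for the Hive Metastore
mysql> GRANT ALL ON metastore.* TO 'hive'@'%' IDENTIFIED BY 'hive_password';
mysql> FLUSH PRIVILEGES;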
This is too much for a single article; please continue with the next one!