II. Installing Hadoop and Its Required Services
1. CDH Installation Overview
CDH stands for Cloudera's Distribution including Apache Hadoop, the Hadoop distribution published by Cloudera. CDH can be installed in three ways:
- Path A: automated installation by Cloudera Manager
- Path B: installation using Cloudera Manager parcels or packages
- Path C: manual installation using Cloudera Manager tarballs
The steps for each path are summarized below:
Step 1: Install the JDK. Cloudera Manager Server, the Cloudera Management Service, and CDH all require a JDK.
  All paths, two options: use the Cloudera Manager installer to install a supported version of the Oracle JDK under /usr/java on every host in the cluster, or install a supported Oracle JDK on every host from the command line and set the JAVA_HOME environment variable to the JDK installation directory.
Step 2: Set up the database. Cloudera Manager Server, the Cloudera Management Service, and some optional CDH services require a database that is installed, configured, and running.
  All paths, two options: use the Cloudera Manager installer to install, configure, and start an embedded PostgreSQL database, or install, configure, and start the database with a command-line package tool such as yum.
Step 3: Install Cloudera Manager Server. Install and start Cloudera Manager Server on a single host.
  Path A: install the server with the Cloudera Manager installer. This requires sudo access on the host and Internet access.
  Path B: install Cloudera Manager Server with the Linux package manager (for example, yum), edit the database properties, and start the server with the service command.
  Path C: unpack the tarball with Linux commands and start the server with the service command.
Step 4: Install the Cloudera Manager Agent. Install and start the Agent on all hosts.
  Path A: install the Agent on all hosts with the Cloudera Manager installation wizard.
  Path B, two options: install the Agent on all hosts with the Linux package manager (for example, yum), or install it with the Cloudera Manager installation wizard.
  Path C: unpack and start the Agent on all hosts with Linux commands.
Step 5: Install CDH and its services on all hosts.
  Path A: install CDH and its services with the Cloudera Manager installation wizard.
  Path B, two options: install CDH and its services with the Cloudera Manager installation wizard, or install them on all hosts with the Linux package manager (for example, yum).
  Path C: unpack on all hosts with Linux commands and start CDH and its services with the service command.
Step 6: Create, configure, and start CDH and its services on all hosts.
  Paths A, B, and C: use the Cloudera Manager installation wizard to assign roles to hosts and configure the cluster. Much of the configuration is automatic.
  Path C additionally: the Cloudera Manager API can be used to manage the cluster, which is useful for scripting preconfigured deployments.
2. Test environment
Host information:
| Host name | IP address |
|---|---|
| cdh1 | 172.16.1.101 |
| cdh2 | 172.16.1.102 |
| cdh3 | 172.16.1.103 |
| cdh4 | 172.16.1.104 |
Hardware configuration:
Each host: 4-core CPU, 8 GB memory, 100 GB hard disk
Software versions:
| Name | Version |
|---|---|
| Operating system | CentOS release 6.4 (Final), 64-bit |
| JDK | 1.7.0_80 |
| Database | MySQL 5.6.14 |
| JDBC driver | MySQL Connector/J (mysql-connector-java) 5.1.38 |
| Cloudera Manager | 5.7.0 |
| CDH | 5.7.0 |
3. Installation and Configuration
(1) Preparation (performed as root on all 4 hosts in the cluster)
- Download the required installation files from the following URLs:
http://archive.cloudera.com/cm5/cm/5/cloudera-manager-el6-cm5.7.0_x86_64.tar.gz
http://archive.cloudera.com/cdh5/parcels/5.7/CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel
http://archive.cloudera.com/cdh5/parcels/5.7/CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel.sha1
http://archive.cloudera.com/cdh5/parcels/5.7/manifest.json
- Check that the required OS packages are installed with the following command, replacing xxxx with the package name:
rpm -qa | grep xxxx
The following packages must be installed:
chkconfig
python (2.6 required for CDH 5)
bind-utils
psmisc
libxslt
zlib
sqlite
cyrus-sasl-plain
cyrus-sasl-gssapi
fuse
portmap (rpcbind)
fuse-libs
redhat-lsb
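Checking the packages one at a time with rpm -qa works, but a loop can report only the missing ones. The following is a minimal sketch that assumes bash and rpm on CentOS 6. The function name check_packages is hypothetical. It queries rpcbind because rpcbind is the CentOS 6 replacement for portmap.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the required packages that rpm does not report
# as installed. On CentOS 6, rpcbind is the package that replaces portmap.
REQUIRED_PACKAGES=(chkconfig python bind-utils psmisc libxslt zlib sqlite
  cyrus-sasl-plain cyrus-sasl-gssapi fuse rpcbind fuse-libs redhat-lsb)

# Prints one missing package name per line; returns 1 if any are missing.
check_packages() {
  local pkg missing=0
  for pkg in "${REQUIRED_PACKAGES[@]}"; do
    # rpm -q exits non-zero when the package is not installed
    if ! rpm -q "$pkg" >/dev/null 2>&1; then
      echo "$pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

To install only what is missing, run `missing=$(check_packages) || yum install -y $missing`. Here $missing is deliberately left unquoted so that the shell splits it into separate package names. Because of the `||`, yum runs only when at least one package is missing.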
- Configure hostname resolution
vi /etc/hosts
# Add the following 4 lines
172.16.1.101 cdh1
172.16.1.102 cdh2
172.16.1.103 cdh3
172.16.1.104 cdh4
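Editing /etc/hosts by hand on four machines invites typos and duplicate lines. The following is a minimal sketch of an idempotent alternative. The function name add_cluster_hosts is hypothetical. It appends an entry only when the hostname does not already appear on a non-comment line, so it is safe to run more than once. You can also pass it the path of a scratch copy instead of /etc/hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the four cluster entries to a hosts file,
# skipping any hostname that already appears there, so it is safe to re-run.
# The file defaults to /etc/hosts; pass another path to try it on a copy.
CLUSTER_HOSTS=("172.16.1.101 cdh1" "172.16.1.102 cdh2"
               "172.16.1.103 cdh3" "172.16.1.104 cdh4")

add_cluster_hosts() {
  local hosts_file="${1:-/etc/hosts}" entry ip name
  for entry in "${CLUSTER_HOSTS[@]}"; do
    read -r ip name <<< "$entry"
    # Search fields 2..NF of non-comment lines for the exact hostname
    if ! awk -v n="$name" '
           $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
           END { exit !found }' "$hosts_file"; then
      echo "$ip $name" >> "$hosts_file"
    fi
  done
}
```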
- Install the JDK
The JDK versions recommended for CDH 5 are 1.7.0_67, 1.7.0_75, and 1.7.0_80; 1.7.0_80 is installed here.
Note: install the same JDK version on every host, under /usr/java/jdk-version.
mkdir /usr/java/
mv jdk-7u80-linux-x64.tar.gz /usr/java/
cd /usr/java/
tar -zxvf jdk-7u80-linux-x64.tar.gz
chown -R root:root jdk1.7.0_80/
vi /etc/profile.d/java.sh
# Add the following 3 lines
export JAVA_HOME=/usr/java/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/jre/lib/*:$JAVA_HOME/lib/*
export PATH=$PATH:$JAVA_HOME/bin
# Make the environment variables take effect
source /etc/profile.d/java.sh
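You can confirm that the profile script took effect with a check like the following, run in a fresh login shell on each host. This is a sketch: check_jdk and java_version_from are hypothetical helper names. It relies on `java -version` printing a line such as `java version "1.7.0_80"` to standard error.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the JDK installed above.

# Print the quoted version string from `java -version` output read on stdin,
# e.g. 1.7.0_80 from the line: java version "1.7.0_80"
java_version_from() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if $JAVA_HOME/bin/java reports the expected version (default 1.7.0_80).
check_jdk() {
  local expected="${1:-1.7.0_80}" actual
  if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME is unset or has no bin/java: '${JAVA_HOME:-}'" >&2
    return 1
  fi
  # java -version writes its banner to stderr, so merge it into stdout
  actual=$("$JAVA_HOME/bin/java" -version 2>&1 | java_version_from)
  if [ "$actual" != "$expected" ]; then
    echo "expected JDK $expected but found '$actual'" >&2
    return 1
  fi
  echo "JDK $actual found at $JAVA_HOME"
}
```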
- Install, configure, and start the NTP service
yum install ntp
chkconfig ntpd on
ntpdate -u 202.112.29.82
vi /etc/ntp.conf
# Add the following 8 lines
driftfile /var/lib/ntp/drift
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1
server 202.112.29.82
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
# Start the NTP service
service ntpd start
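Once ntpd is running, `ntpq -p` lists its peers. The peer currently selected for synchronization is marked with `*` in the first column. The following sketch turns that into a pass/fail check; the function name ntp_selected_peer is hypothetical. After ntpd starts, it may take a few minutes before any peer is selected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ntpq -p` output on stdin and succeed only if
# ntpd has selected a system peer. ntpq marks that peer with '*' in the
# first column; the address is printed without the marker.
ntp_selected_peer() {
  awk 'substr($0, 1, 1) == "*" { print substr($1, 2); found = 1 }
       END { exit !found }'
}
```

Typical use on each host: `ntpq -p | ntp_selected_peer || echo "no NTP peer selected yet; retry later"`.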
- Create the cloudera-scm user and configure sudo
useradd --system --home=/opt/cm-5.7.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
usermod -a -G root cloudera-scm
echo USER=\"cloudera-scm\" >> /etc/default/cloudera-scm-agent
echo "Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin" >> /etc/sudoers
- Install and configure the MySQL database (installed on every host for convenience)
rpm -ivh mysql-5.6.14-1.el6.x86_64.rpm
vi /etc/profile.d/mysql.sh
# Add the following 2 lines
export MYSQL_HOME=/home/mysql/mysql-5.6.14
export PATH=$PATH:$MYSQL_HOME/bin
# Make the environment variables take effect
source /etc/profile.d/mysql.sh
# Change the root password
mysqladmin -u root password
# Edit the configuration file
vi /etc/my.cnf
# The contents are as follows
[mysqld]
transaction-isolation = READ-COMMITTED
log_bin = /data/mysql_binary_log
binlog_format = mixed
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_log_file_size = 512M
[mysqld_safe]
log-error = /data/mysqld.err
pid-file = /data/mysqld.pid
sql_mode = STRICT_ALL_TABLES
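A long option file like this one can easily end up setting the same option twice. The following sketch reports such repeats per section. The function name find_duplicate_options is hypothetical. It treats dashes and underscores in option names as equivalent, which matches how MySQL reads option names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report option names that occur more than once in the
# same [section] of a MySQL option file. Dashes and underscores in option
# names are treated as equivalent, as MySQL does.
find_duplicate_options() {
  awk '
    /^[ \t]*$/ || /^[ \t]*[#;]/ { next }   # skip blank and comment lines
    /^[ \t]*\[/ { section = $0; next }     # remember the current section
    {
      key = $0
      sub(/[ \t]*=.*/, "", key)            # drop the "= value" part
      gsub(/^[ \t]+|[ \t]+$/, "", key)     # trim surrounding whitespace
      gsub(/-/, "_", key)
      if (++seen[section SUBSEP key] == 2) print section " " key
    }
  ' "${1:-/etc/my.cnf}"
}
```

For example, a file that sets innodb_flush_log_at_trx_commit twice under [mysqld] produces the line `[mysqld] innodb_flush_log_at_trx_commit`.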
# Enable MySQL at boot
chkconfig mysql on
# Start MySQL
service mysql restart
# Create the metastore databases as needed
mysql -u root -p -e "CREATE DATABASE hive DEFAULT CHARACTER SET utf8; CREATE DATABASE rman DEFAULT CHARACTER SET utf8; CREATE DATABASE oozie DEFAULT CHARACTER SET utf8; GRANT ALL ON *.* TO 'root'@'%' IDENTIFIED BY 'mypassword';"
- Install the MySQL JDBC driver
tar -zxvf mysql-connector-java-5.1.38.tar.gz
cp ./mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /usr/share/java/mysql-connector-java.jar
- Configure passwordless SSH (so that every machine can log in to every other machine without a password)
# Generate a key pair on each of the four machines:
cd ~
ssh-keygen -t rsa
# Press Enter at every prompt
# On cdh1:
cd ~/.ssh/
ssh-copy-id cdh1
scp /root/.ssh/authorized_keys cdh2:/root/.ssh/
# On cdh2:
cd ~/.ssh/
ssh-copy-id cdh2
scp /root/.ssh/authorized_keys cdh3:/root/.ssh/
# On cdh3:
cd ~/.ssh/
ssh-copy-id cdh3
scp /root/.ssh/authorized_keys cdh4:/root/.ssh/
# On cdh4:
cd ~/.ssh/
ssh-copy-id cdh4
scp /root/.ssh/authorized_keys cdh1:/root/.ssh/
scp /root/.ssh/authorized_keys cdh2:/root/.ssh/
scp /root/.ssh/authorized_keys cdh3:/root/.ssh/
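To confirm that the key chain worked, you can try a non-interactive login from each host to every node. The following is a sketch; the function name check_passwordless_ssh is hypothetical. With `-o BatchMode=yes`, ssh fails instead of prompting, so any node it prints still needs a key. A node is also reported if its host key is not yet in known_hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: from the current host, try a non-interactive ssh
# login to every cluster node. BatchMode=yes makes ssh fail instead of
# prompting for a password, so any node printed here still needs a key.
CLUSTER_NODES=(cdh1 cdh2 cdh3 cdh4)

check_passwordless_ssh() {
  local node failed=0
  for node in "${CLUSTER_NODES[@]}"; do
    if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null >/dev/null 2>&1; then
      echo "$node"
      failed=1
    fi
  done
  return "$failed"
}
```

Run it on each of the four hosts. Every host should print nothing and return 0.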
(2) Install Cloudera Manager on cdh1
tar -xzvf cloudera-manager*.tar.gz -C /opt/
# Create the CM database
/opt/cm-5.7.0/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -pmypassword --scm-host localhost scm scm scm
# Configure the CM agent
vi /opt/cm-5.7.0/etc/cloudera-scm-agent/config.ini
# Set the CM server hostname to cdh1
server_host=cdh1
# Copy the three parcel-related files to /opt/cloudera/parcel-repo
cp CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel /opt/cloudera/parcel-repo/
cp CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel.sha1 /opt/cloudera/parcel-repo/
cp manifest.json /opt/cloudera/parcel-repo/
# Rename the .sha1 file to .sha
mv /opt/cloudera/parcel-repo/CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel.sha1 /opt/cloudera/parcel-repo/CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel.sha
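Before starting the server, it is worth confirming that the copied parcel matches its hash file, since a truncated download wastes a long distribution step later. The following sketch compares the parcel's SHA-1 with the first field of the .sha file. The function name verify_parcel is hypothetical. It assumes the hash in the .sha file is a plain hexadecimal SHA-1 string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check a parcel against the SHA-1 hash stored in its
# .sha file (first whitespace-separated field; the file may lack a newline).
verify_parcel() {
  local parcel="$1" sha_file="${2:-$1.sha}" expected actual
  expected=$(awk '{ print tolower($1); exit }' "$sha_file" 2>/dev/null)
  if [ -z "$expected" ]; then
    echo "no hash found in $sha_file" >&2
    return 1
  fi
  actual=$(sha1sum "$parcel" | awk '{ print $1 }')
  if [ "$actual" != "$expected" ]; then
    echo "SHA-1 mismatch for $parcel: expected $expected, got $actual" >&2
    return 1
  fi
  echo "$parcel: SHA-1 OK"
}
```

Usage: `verify_parcel /opt/cloudera/parcel-repo/CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel`.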
# Change the owner
chown -R cloudera-scm:cloudera-scm /opt/cloudera/
chown -R cloudera-scm:cloudera-scm /opt/cm-5.7.0/
# Copy the /opt/cm-5.7.0 directory to the other three hosts
scp -r -p /opt/cm-5.7.0 cdh2:/opt/
scp -r -p /opt/cm-5.7.0 cdh3:/opt/
scp -r -p /opt/cm-5.7.0 cdh4:/opt/
(3) Create the /opt/cloudera/parcels directory on each host and change its owner
mkdir -p /opt/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
(4) Start the CM server on cdh1
/opt/cm-5.7.0/etc/init.d/cloudera-scm-server start
# Startup takes some time; follow its progress with:
tail -f /opt/cm-5.7.0/log/cloudera-scm-server/cloudera-scm-server.log
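Watching the log works interactively, but a script needs a way to detect when the console port (7180, from the console URL used in step (6)) starts accepting connections. The following is a sketch. The function name wait_for_port is hypothetical. It assumes bash was built with /dev/tcp redirection support and that coreutils `timeout` is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until a TCP port accepts connections or a
# deadline (in seconds, default 300) passes. Uses bash's /dev/tcp
# redirection plus coreutils `timeout`; no other tools are needed.
wait_for_port() {
  local host="$1" port="$2" max_wait="${3:-300}" waited=0
  while [ "$waited" -lt "$max_wait" ]; do
    # Cap each connection attempt at 2 seconds in case the connect hangs
    if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "gave up waiting for $host:$port after ${max_wait}s" >&2
  return 1
}
```

Usage: `wait_for_port cdh1 7180 600`.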
(5) Start the CM agent on all hosts
mkdir /opt/cm-5.7.0/run/cloudera-scm-agent
chown cloudera-scm:cloudera-scm /opt/cm-5.7.0/run/cloudera-scm-agent
/opt/cm-5.7.0/etc/init.d/cloudera-scm-agent start
(6) Log in to the CM console, then install and configure CDH 5 and its services
Open the console at:
http://172.16.1.101:7180/
The login page is shown in the figure.
The default user name and password are both admin. Log in to reach the Welcome page, accept the license agreement as shown in the figure, and click Continue.
On the Release Notes page (shown in the figure), click Continue.
On the Services description page (shown in the figure), click Continue.
On the Select Hosts page (shown in the figure), select all four hosts and click Continue.
On the Select Repository page (shown in the figure), click Continue.
On the Cluster Installation page (shown in the figure), click Continue.
On the Verification page (shown in the figure), click Finish.
On the Cluster Setup page (shown in the figure), select the services you need and click Continue.
On the Custom Role Assignments page (shown in the figure), keep the defaults and click Continue.
On the Database Setup page, fill in the connection details, click Test Connection (shown in the figure), and then click Continue.
On the Review Changes page, keep the defaults and click Continue.
On the First Run page, wait for the commands to finish (shown in the figure), and then click Continue.
On the Installation Success page (shown in the figure), click Finish.
At this point the CDH installation is complete. Services, roles, and hosts are mapped as shown in the following table.
| Service | Role | Host(s) |
|---|---|---|
| HDFS | DataNode | cdh1, cdh3, cdh4 |
| HDFS | NameNode | cdh2 |
| HDFS | SecondaryNameNode | cdh2 |
| Hive | Hive Metastore Server | cdh2 |
| Hive | HiveServer2 | cdh2 |
| Hue | Hue Server | cdh2 |
| Impala | Impala Catalog Server | cdh2 |
| Impala | Impala Daemon | cdh1, cdh3, cdh4 |
| Impala | Impala StateStore | cdh2 |
| Oozie | Oozie Server | cdh2 |
| Sqoop 2 | Sqoop 2 Server | cdh2 |
| YARN | JobHistory Server | cdh2 |
| YARN | NodeManager | cdh1, cdh3, cdh4 |
| YARN | ResourceManager | cdh2 |
Cloudera's official CDH installation documentation is at:
http://www.cloudera.com/documentation/enterprise/latest/topics/installation.html
Data Warehouse Practice Based on the Hadoop Ecosystem: Environment Setup (II)