I. Software preparation and planning
1. Related software and download addresses:
Cloudera Manager: http://archive-primary.cloudera.com/cm5/cm/5/
CDH installation package Address: http://archive.cloudera.com/cdh5/parcels/latest/
Java Official Download (login required): http://www.oracle.com/technetwork/java/archive-139210.html
Java versions archive Download (no login required): https://www.reucon.com/cdn/java/
MySQL JDBC driver jar package: http://dev.mysql.com/downloads/connector/j/
2. After downloading, the files are as follows:
cloudera-manager-el6-cm5.5.3_x86_64.tar.gz
CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel
CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel.sha1
manifest.json
mysql-connector-java-5.1.38.tar.gz (the jar package is inside after decompression)
Java 1.7 or later is recommended.
CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel is relatively large. In the offline installation below, the contents of this package are unpacked and distributed to each node. The CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel.sha1 file must be renamed before it is used later; take note of this.
3. Host planning
I use 5 hosts for this installation; for testing, more than 2 hosts are generally recommended. The 5 hosts are named as follows:
nn1.hadoop.com 192.168.0.10
nn2.hadoop.com 192.168.0.11
dn1.hadoop.com 192.168.0.12
dn2.hadoop.com 192.168.0.13
dn3.hadoop.com 192.168.0.14
II. Host environment configuration
Host configuration covers the IP address, host name, Java environment, and file handle limits, as follows:
1. IP configuration
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
IPADDR=192.168.0.10
PREFIX=24
GATEWAY=192.168.0.1
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
2. Host name configuration
# cat /etc/sysconfig/network
HOSTNAME=nn1.hadoop.com
Also add entries to /etc/hosts that map each host name to its IP address. If an internal DNS server is available, the /etc/hosts entries can be omitted.
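The host name to IP mapping can be added to the hosts file in a repeatable way. The following is a minimal sketch, assuming the five hosts and addresses from the host planning section; the function names print_hosts_entries and append_missing_hosts are illustrative helpers, not part of any tool.

```shell
#!/bin/sh
# Print /etc/hosts entries for the five planned hosts.
# (print_hosts_entries is a hypothetical helper name.)
print_hosts_entries() {
    cat <<'EOF'
192.168.0.10 nn1.hadoop.com
192.168.0.11 nn2.hadoop.com
192.168.0.12 dn1.hadoop.com
192.168.0.13 dn2.hadoop.com
192.168.0.14 dn3.hadoop.com
EOF
}

# Append to the given hosts file only the entries whose host name
# is not already mentioned in it, so the function can be re-run safely.
# $1 is the target file (normally /etc/hosts, which requires root).
append_missing_hosts() {
    target=$1
    print_hosts_entries | while read -r ip name; do
        grep -qwF -- "$name" "$target" 2>/dev/null ||
            printf '%s %s\n' "$ip" "$name" >> "$target"
    done
}

# Example (as root): append_missing_hosts /etc/hosts
```

Running it against a scratch file first is an easy way to review the entries before touching /etc/hosts.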
3. Java environment variable configuration
This is covered in another blog post, Installation of JDK.
4. File handle limit optimization
Modify the /etc/security/limits.conf file to add the following:
* hard nofile 65535
* soft nofile 65535
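The two limits can also be appended by a small script so that the same change is applied identically on every host. This is only a sketch: add_nofile_limits is a hypothetical helper name, and the file path is passed in so the function can be tried on a copy first.

```shell
#!/bin/sh
# Append the hard and soft nofile limits for all users ('*') to a limits
# file, skipping any line that is already present.
# $1 is the limits file (normally /etc/security/limits.conf, as root).
add_nofile_limits() {
    limits_file=$1
    for kind in hard soft; do
        grep -Eq "^\*[[:space:]]+$kind[[:space:]]+nofile[[:space:]]" "$limits_file" 2>/dev/null ||
            printf '* %s nofile 65535\n' "$kind" >> "$limits_file"
    done
}

# Example (as root): add_nofile_limits /etc/security/limits.conf
# After logging in again, "ulimit -n" shows the limit in effect for the session.
```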
5. SSH key authentication
This is not required. When Cloudera Manager installs HDFS, Flume, Hive, and other applications on each node, it does not rely on SSH; communication and the transfer of installation files go through the agent. If you want to set up SSH key authentication anyway, see my other blog post, Linux Configure SSH public key authentication.
6. iptables and SELinux
service iptables stop (temporary shutdown)
chkconfig iptables off (effective after reboot)
setenforce 0 (takes effect immediately, until the next reboot)
Set SELINUX=disabled in /etc/selinux/config (effective after reboot).
7. NTP time synchronization configuration
Select one node to synchronize its time with an external time source, and have the other hosts synchronize with that node. Because NTP configuration is fairly simple, it is skipped here.
Apply the changes above to all five hosts in the same way.
III. Cloudera Manager installation
First upload the downloaded Cloudera Manager package to one of the servers and unpack it under /opt. Unpacking to /opt is strongly recommended because it is the default path; if you unpack to another path, a number of configuration files will need to be changed.
1. MySQL service configuration
Download, install, and start MySQL
[root@nn1 opt]# yum -y install mysql-server
[root@nn1 opt]# /etc/init.d/mysqld start
[root@nn1 opt]# /usr/bin/mysqladmin -u root password 'Hadoop'
[root@nn1 opt]# chkconfig mysqld on
Create the MySQL databases for Hive and the monitoring services
mysql> CREATE DATABASE hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
mysql> CREATE DATABASE monitor DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
mysql> GRANT ALL ON *.* TO 'root'@'%' IDENTIFIED BY 'Hadoop';
a. In a standard installation the hive database is required and the monitor database is optional. If you do not install the Hive service, you do not need to create the hive database.
b. The root user is allowed to log in from all hosts because Hive and the monitoring services consist of multiple services. When these are not installed on the same host as MySQL, they have to connect from other hosts to the databases created above, and without the grant the connection fails. For better security, you can instead authorize a dedicated MySQL user and restrict it to the network segment where Hadoop is located.
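A grant restricted to the Hadoop network segment could look like the output of the sketch below. The user name cdh_user, the password, and the helper print_restricted_grants are all hypothetical; the segment 192.168.0.% is assumed from the host plan, and the GRANT ... IDENTIFIED BY form matches the MySQL 5.x syntax used above.

```shell
#!/bin/sh
# Print GRANT statements that limit a MySQL user to one network segment.
# Arguments: user, password, host pattern, then one or more database names.
print_restricted_grants() {
    user=$1
    password=$2
    segment=$3
    shift 3
    for db in "$@"; do
        printf "GRANT ALL ON %s.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
            "$db" "$user" "$segment" "$password"
    done
}

# Example: review the statements, then pipe them into the mysql client:
#   print_restricted_grants cdh_user 'secret' '192.168.0.%' hive monitor | mysql -uroot -p
print_restricted_grants cdh_user 'secret' '192.168.0.%' hive monitor
```

Printing the statements before executing them keeps the privilege change easy to review.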
2. Import the database
Place the MySQL jar package in the lib directory
Extract the jar file from the MySQL JDBC package downloaded earlier and place it in /opt/cm-x.x.x/share/cmf/lib/, where x.x.x is the Cloudera Manager version number. Because MySQL is now an Oracle product, its driver is not bundled for licensing reasons, so any product that supports MySQL requires you to download the driver from the official MySQL site and add it to the lib directory yourself. The PostgreSQL and Oracle databases that Cloudera Manager also supports do not have this problem (I am a little confused about why Oracle does not have this problem).
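Extracting the driver and copying it into the lib directory can be sketched as follows. The jar name pattern mysql-connector-java-*-bin.jar is an assumption about what the downloaded archive contains, and install_mysql_jar is a hypothetical helper name; adjust both to the actual archive.

```shell
#!/bin/sh
# Unpack the connector tarball into a scratch directory and copy the
# driver jar into the given lib directory.
# $1 is the tarball, $2 is the target lib directory.
install_mysql_jar() {
    tarball=$1
    lib_dir=$2
    work_dir=$(mktemp -d) || return 1
    tar -xzf "$tarball" -C "$work_dir" || { rm -rf "$work_dir"; return 1; }
    # Assumed jar name pattern; the archive also holds docs and sources.
    find "$work_dir" -name 'mysql-connector-java-*-bin.jar' -exec cp {} "$lib_dir"/ \;
    rm -rf "$work_dir"
}

# Example:
#   install_mysql_jar mysql-connector-java-5.1.38.tar.gz /opt/cm-5.5.3/share/cmf/lib
```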
Create a user
Create the cloudera-scm user
useradd --system --home=/opt/cm-5.5.3/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
This user must be created on all five hosts; otherwise the host check during the later web-based installation reports an error saying that the cloudera-scm user does not exist.
Import data
/opt/cm-5.5.3/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -pxxxx --scm-host localhost scm scm scm
3. Agent configuration
In /opt/cm-5.5.3/etc/cloudera-scm-agent/config.ini, set server_host to the host name of the master node. You can also use an IP address, but a host name is recommended so that when the IP changes you only need to update the DNS record or the hosts file. The configuration file contains a number of other settings; have a look if you are interested.
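The server_host change can be scripted so that it is identical before the directory is copied to the other hosts. A minimal sketch, assuming config.ini contains a line starting with server_host= and that the value contains no slash characters; set_server_host is a hypothetical helper name.

```shell
#!/bin/sh
# Point the agent at the Cloudera Manager server by rewriting the
# server_host line of an agent config.ini.
# $1 is the config file, $2 is the server host name.
set_server_host() {
    config_file=$1
    server=$2
    tmp=$(mktemp) || return 1
    sed "s/^server_host=.*/server_host=$server/" "$config_file" > "$tmp" &&
        cat "$tmp" > "$config_file"   # write back in place
    status=$?
    rm -f "$tmp"
    return $status
}

# Example:
#   set_server_host /opt/cm-5.5.3/etc/cloudera-scm-agent/config.ini nn1.hadoop.com
```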
Copy the /opt/cm-5.5.3 directory to /opt on the other agent hosts using scp or another tool. The copy must be made before the agent is started: once started, the agent generates a UUID, and if the directory is copied to another host afterwards and started there, a UUID-related error is reported.
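The copy step might be sketched as a loop over the other four hosts. The host names come from the host plan; logging in as root over SSH is an assumption, and copy_cm_to_agents is a hypothetical helper. The first argument makes it a dry run, so the commands can be reviewed before anything is copied.

```shell
#!/bin/sh
# Copy the Cloudera Manager directory to each listed agent host.
# $1 is a command prefix: "echo" prints the scp commands (dry run),
# an empty string "" executes them. Remaining arguments are host names.
copy_cm_to_agents() {
    run=$1
    shift
    for host in "$@"; do
        # $run is intentionally unquoted so that an empty prefix disappears.
        $run scp -r /opt/cm-5.5.3 "root@$host:/opt/"
    done
}

# Dry run: only prints the commands that would be executed.
copy_cm_to_agents echo nn2.hadoop.com dn1.hadoop.com dn2.hadoop.com dn3.hadoop.com
```

Replacing echo with "" performs the real copy; it should be run before any agent has been started on the source host.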
4. Parcels offline package settings
Place the CDH5 parcel files in the /opt/cloudera/parcel-repo/ directory on the master node (create parcel-repo manually if it does not exist). Copy the following three previously downloaded files into the directory:
CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel
CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel.sha1
manifest.json
The CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel.sha1 file must be renamed to CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel.sha; otherwise the system downloads the CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel package again. The package is more than 1 GB and includes the commonly used components of the Hadoop ecosystem.
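The rename can be combined with a checksum check, which catches a corrupted download before Cloudera Manager rejects the parcel. This is a sketch that assumes the .sha1 file holds the SHA-1 hash as its first field and that sha1sum is available; prepare_parcel_sha is a hypothetical helper name.

```shell
#!/bin/sh
# Rename <parcel>.sha1 to <parcel>.sha and verify the parcel against it.
# $1 is the parcel repository directory, $2 is the parcel file name.
# Returns 0 when the recorded hash matches the actual hash.
prepare_parcel_sha() {
    repo_dir=$1
    parcel=$2
    if [ -f "$repo_dir/$parcel.sha1" ]; then
        mv "$repo_dir/$parcel.sha1" "$repo_dir/$parcel.sha"
    fi
    # First field of the .sha file is taken as the expected hash (assumption).
    expected=$(awk '{print $1}' "$repo_dir/$parcel.sha")
    actual=$(sha1sum "$repo_dir/$parcel" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Example:
#   prepare_parcel_sha /opt/cloudera/parcel-repo CDH-5.3.9-1.cdh5.3.9.p0.8-el6.parcel \
#       && echo "parcel checksum ok"
```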
5. Start the Manager and Agent services
Start the server with /opt/cm-5.5.3/etc/init.d/cloudera-scm-server start;
Start the agent with /opt/cm-5.5.3/etc/init.d/cloudera-scm-agent start.
The same scripts also accept stop and restart.
IV. Install CDH5 via Cloudera Manager
Cloudera Manager uses two ports by default: 7180 for the web interface and 7182 for agent communication. Open the admin interface in a browser at http://<Cloudera Manager IP>:7180; the default username and password are both admin.
After logging in, you are asked to select an edition. We choose the free Cloudera Express; early versions of Express allowed only 50 nodes, but there is no such restriction now. After the selection, a brief introduction to what the Express edition includes is shown:
Because the manager host was specified in the agents' configuration file before they were started, the manager has already taken over the 5 hosts.
Additional hosts can also be found automatically by entering their IP in the new host search.
Because this is an offline installation, select the Parcel installation method.
There are some further settings here; /opt is chosen because it is the default. Scrolling down reveals more configuration options; for layout reasons, I captured only part of them.
If you have not created the cloudera-scm user or have not set swappiness to 0, a warning is shown; you can run the check again once you have fixed the problem.
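Both conditions can be checked on each host before re-running the inspection. The sketch below reads the current swappiness value and tests whether a user exists; check_swappiness and user_exists are hypothetical helper names. The comments show the standard commands for setting the value to 0.

```shell
#!/bin/sh
# Report whether vm.swappiness is 0.
# $1 optionally overrides the file to read (default: /proc/sys/vm/swappiness).
check_swappiness() {
    file=${1:-/proc/sys/vm/swappiness}
    value=$(cat "$file")
    if [ "$value" -eq 0 ]; then
        echo "vm.swappiness is 0"
    else
        echo "vm.swappiness is $value, expected 0"
        return 1
    fi
}

# Return 0 if the named user exists on this host.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# To change the value (as root):
#   sysctl -w vm.swappiness=0                      # takes effect immediately
#   echo 'vm.swappiness = 0' >> /etc/sysctl.conf   # persists across reboots
# Example checks:
#   check_swappiness
#   user_exists cloudera-scm || echo "cloudera-scm user is missing"
```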
Next, choose the services to install. I chose Core Hadoop; you can choose according to your own needs.
The following figure is the reason this article did not start with another installation method: it gives a detailed overview of the main applications in the Hadoop ecosystem and what they do.
By default, the services installed on each host are assigned automatically according to the configuration. As the following figure shows, you can also specify the host for each component's functional modules.
In the database setup section you can also use PostgreSQL or Oracle. If you choose MySQL, the MySQL JDBC jar is also needed in /opt/cloudera/parcels/CDH-5.3.9-1.cdh5.3.9.p0.8-el6/lib/hive/lib/. Since it is easy to make mistakes here, it is recommended to first test the databases created above from the other hosts by connecting with mysql -h -u.
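The connection test from another host can be wrapped in a small function. This is a sketch: can_reach_db is a hypothetical helper, and the MYSQL_BIN variable is an invented override that exists only so the function can be exercised without a real server; by default the standard mysql client is used.

```shell
#!/bin/sh
# Return 0 if the given database on the given host accepts a connection
# with the supplied account.
# Arguments: host, user, password, database.
can_reach_db() {
    host=$1
    user=$2
    password=$3
    db=$4
    # MYSQL_BIN is a hypothetical override; it defaults to the mysql client.
    "${MYSQL_BIN:-mysql}" -h "$host" -u "$user" "-p$password" \
        -e 'SELECT 1;' "$db" >/dev/null 2>&1
}

# Example, run on each host that will carry a Hive or monitoring role:
#   can_reach_db nn1.hadoop.com root 'Hadoop' hive || echo "cannot reach hive database"
```

Passing the password on the command line exposes it to other users of the host, so this form suits a test cluster only.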
The installation progress is skipped here; it is just a matter of waiting until the installation-complete screen appears. After installation, looking through the functional modules above shows that Cloudera Manager is very powerful even in its Express edition.
In the manager we can add new nodes, configure Kerberos or LDAP authentication, and use general monitoring, data query and display, and so on.
This article ends here. A later article will cover installing each component and module step by step using yum packages.