Or download the Word document: http://download.csdn.net/download/xfg0218/9747346
About CDH and Cloudera Manager
CDH (Cloudera's Distribution including Apache Hadoop) is one of the many Hadoop distributions. It is built and maintained by Cloudera on top of a stable release of Apache Hadoop and integrates many patches, so it can be used directly in production environments.
Cloudera Manager simplifies the installation and configuration management of the hosts and of the Hadoop, Hive, and Spark services in a cluster, making it easy to install and monitor the Hadoop-related big data components.
System Environment
· Operating system: CentOS 6.5 x64 (at least 2 GB of memory; if you do not have that much, I recommend improving the virtual machine or physical machine configuration first, because installing all of the CDH components takes a lot of memory. I initially gave the virtual machine 1 GB, and the installation simply hung.)
· Cloudera Manager: 5.1.3
· CDH: 5.1.3
Installation Instructions
Official reference document:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-latest/cloudera-manager-installation-guide/cm5ig_install_path_c.html
Cloudera officially provides three installation methods:
The first method requires that every machine can reach the Internet. With foreign sites being blocked so heavily lately, I hit time-out errors several times; the latency was huge, and once it failed, reinstalling was very painful.
The second method requires downloading a large number of packages.
The third method is the least intrusive to the system, and its biggest advantage is that it can be done fully offline; reinstalling anything is very convenient, and later unified package upgrades of the cluster also work well. This is why I chose the offline installation.
Download addresses for the related packages
Cloudera Manager Download Address:
http://archive.cloudera.com/cm5/cm/5/cloudera-manager-el6-cm5.1.3_x86_64.tar.gz
Download information:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-latest/cloudera-manager-version-and-download-information/cloudera-manager-version-and-download-information.html#cmvd_topic_1
CDH installation package address: http://archive.cloudera.com/cdh5/parcels/latest/. Because our operating system is CentOS 6.5, the following files need to be downloaded:
· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel
· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel.sha1
· manifest.json
Note: Unlike CDH4, where Impala and Cloudera Search (Solr) had to be downloaded separately, CDH5 bundles them in, so you only need to download the single CDH5 parcel.
Preparatory work: building the system environment
All of the following operations are performed as the root user.
1. Network configuration (all nodes)
vi /etc/sysconfig/network
Modify the hostname:
NETWORKING=yes
HOSTNAME=n1
Then restart the network service for the change to take effect:
service network restart
vi /etc/hosts
Modify the mapping between IP addresses and hostnames:
192.168.1.106 n1
192.168.1.107 n2
192.168.1.108 n3
Note: the IP-to-hostname mapping of every machine, including the machine itself, must be written here; otherwise starting the Agent will report a hostname resolution error.
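To confirm the mappings before moving on, you can resolve each hostname from every node. A quick sanity check, assuming the three hosts in the example above:
for h in n1 n2 n3; do
    # getent prints the /etc/hosts entry the resolver actually sees
    getent hosts "$h"
done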
2. Configure passwordless SSH login (all nodes)
Execute on the master node:
ssh-keygen -t rsa
Press Enter through all the prompts to generate a key pair with an empty passphrase.
Add the public key to the authentication file:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then set the access permissions on authorized_keys:
chmod 600 ~/.ssh/authorized_keys
Copy the file to all DataNode nodes with scp:
scp ~/.ssh/authorized_keys root@n2:~/.ssh/
Test: run ssh n2 on the master node; normally you should now be able to log in directly without a password.
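If you have several DataNodes, a small loop saves repeating the scp by hand. A minimal sketch, assuming the node list is n2 and n3 (each host still asks for root's password once):
for h in n2 n3; do
    # push the authentication file to every DataNode
    scp ~/.ssh/authorized_keys root@"$h":~/.ssh/
done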
3. Install Oracle's Java (all nodes)
CentOS ships with OpenJDK, but running CDH5 requires Oracle's JDK, and Java 7 support is needed.
First uninstall the bundled OpenJDK. Use
rpm -qa | grep java
to list the Java-related packages, then use
rpm -e --nodeps <package-name>
to remove each of them.
Go to Oracle's website, download the JDK RPM package, and install it with rpm -ivh <package-name>.
Since the RPM package does not require us to configure environment variables, we only need to set a global JAVA_HOME variable by executing:
echo "JAVA_HOME=/usr/java/latest/" >> /etc/environment
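A quick check that the JDK landed where expected (assuming the RPM installed under /usr/java/latest, which the Oracle packages do by default):
# confirm the JDK version that CDH will use
/usr/java/latest/bin/java -version
# confirm the JAVA_HOME line was appended
grep JAVA_HOME /etc/environment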
4. Install and configure MySQL (master node)
Install the MySQL server via
yum install mysql-server
Set it to start on boot:
chkconfig mysqld on
Then start the MySQL service:
service mysqld start
and set the root password as prompted:
mysqladmin -u root password 'xxxx'
Enter the MySQL command line with
mysql -uroot -pxxxx
and create the following databases:
#hive
create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
#activity monitor
create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
Grant root access to all of the above databases:
# grant the root user access to all databases on the master node
grant all privileges on *.* to 'root'@'n1' identified by 'xxxx' with grant option;
flush privileges;
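To confirm the databases and the grant took effect, you can list the databases over the network as root. A sketch, using the hostname n1 and the placeholder password xxxx from above:
mysql -uroot -pxxxx -h n1 -e "show databases;"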
Official MySQL configuration document: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-latest/cloudera-manager-installation-guide/cm5ig_mysql.html#cmig_topic_5_5
5. Turn off the firewall and SELinux
Note: this must be done on all nodes. Because so many ports are involved, temporarily shutting down the firewall makes installation much easier; after installation you can set up firewall policies as needed to keep the cluster secure.
Turn off the firewall:
service iptables stop (temporary shutdown)
chkconfig iptables off (takes effect after reboot)
Turn off SELinux (during the actual installation I found it can also work without turning this off; I do not know whether problems would follow, so this needs further verification):
setenforce 0 (takes effect immediately)
set SELINUX=disabled in /etc/selinux/config (permanent, after reboot)
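To verify both are actually off, getenforce and the iptables service status are the standard checks on CentOS 6:
# should print Permissive or Disabled
getenforce
# should report that the firewall is not running
service iptables status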
6. Configure the NTP service on all nodes
All hosts in the cluster must keep their time synchronized; a large time difference can cause all kinds of problems. The approach is as follows:
The master node synchronizes its time with an external NTP server and then provides the time synchronization service for all DataNode nodes.
All DataNode nodes synchronize their time against the master node.
Install the component on all nodes:
yum install ntp
When that finishes, configure it to start on boot:
chkconfig ntpd on
and check whether the setting succeeded:
chkconfig --list ntpd
If runlevels 2-5 show "on", the setting succeeded.
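The output should look roughly like this (standard chkconfig formatting on CentOS 6):
ntpd            0:off   1:off   2:on    3:on    4:on    5:on    6:off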
Master node configuration
Before configuring, use ntpdate to synchronize the time manually once, in case the gap between the local time and the time center is too large for ntpd to synchronize properly. Here 65.55.56.206 is chosen as the time center:
ntpdate -u 65.55.56.206
The NTP service has only one configuration file, and that is all there is to configure. Only the useful settings are given here; the unneeded ones are commented out with # and omitted:
driftfile /var/lib/ntp/drift
restrict 127.0.0.1
restrict -6 ::1
restrict default nomodify notrap
server 65.55.56.206 prefer
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
When the configuration file is complete, save and exit, then start the service by executing:
service ntpd start
To check whether it worked, use the ntpstat command to view the synchronization status; output like the following indicates a successful start:
synchronised to NTP server () at stratum 2
time correct to within
polling server every
If it reports an exception, wait a few minutes; it typically takes 5-10 minutes before the sync succeeds.
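ntpq can also show which peer the daemon picked; in its output a * at the start of a line marks the currently selected sync source:
# list NTP peers; the line starting with * is the active sync source
ntpq -p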
Configure the NTP clients (all DataNode nodes)
driftfile /var/lib/ntp/drift
restrict 127.0.0.1
restrict -6 ::1
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# here is the master node's hostname or IP
server n1
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
OK, save and exit. Before requesting the server, synchronize the time manually with ntpdate: ntpdate -u n1 (the master node's NTP server).
Synchronization may fail at first; do not worry, it usually means the local NTP server has not started up fully yet, and it generally takes 5-10 minutes before it synchronizes normally. Start the service:
service ntpd start
Because the connection is over the intranet, the startup wait will be shorter than on the master node, but you still need to be patient for a while.
Officially start installing Cloudera Manager Server and Agent
Extract and install on the master node
Cloudera Manager's default directory is under /opt. Extract:
tar xzvf cloudera-manager*.tar.gz
Place the extracted cm-5.1.3 and cloudera directories under /opt.
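If the tarball is not already sitting in /opt, one way to get both directories into place in a single step is to extract with -C (assuming, as the step above implies, that the archive's top level contains cm-5.1.3 and cloudera):
# extract directly into /opt so cm-5.1.3 and cloudera land in place
tar xzvf cloudera-manager-el6-cm5.1.3_x86_64.tar.gz -C /opt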
Establish a database for Cloudera Manager 5
First download the JDBC driver from MySQL's website (http://dev.mysql.com/downloads/connector/j/), unpack it, find mysql-connector-java-5.1.33-bin.jar, and put it into /opt/cm-5.1.3/share/cmf/lib/.
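As a sketch (the unpacked directory name mysql-connector-java-5.1.33 is an assumption based on the archive's version):
# copy the JDBC driver where Cloudera Manager can load it
cp mysql-connector-java-5.1.33/mysql-connector-java-5.1.33-bin.jar /opt/cm-5.1.3/share/cmf/lib/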
Initialize the CM5 database on the master node:
/opt/cm-5.1.3/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -pxxxx --scm-host localhost scm scm scm
Agent configuration
In /opt/cm-5.1.3/etc/cloudera-scm-agent/config.ini, change server_host to the hostname of the master node.
Synchronize the Agent to the other nodes
scp -r /opt/cm-5.1.3 root@n2:/opt/
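With more than one DataNode, the same loop pattern as in the SSH step applies. A sketch assuming nodes n2 and n3:
for h in n2 n3; do
    # push the whole cm-5.1.3 tree to each node
    scp -r /opt/cm-5.1.3 root@"$h":/opt/
done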
Create the cloudera-scm user on all nodes
useradd --system --home=/opt/cm-5.1.3/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
Prepare parcels to install CDH5
Place the CDH5-related parcel files into the /opt/cloudera/parcel-repo/ directory on the master node (parcel-repo needs to be created manually).
The relevant files are as follows:
· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel
· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel.sha1
· manifest.json
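Before going further it is worth checking that the parcel downloaded intact. A minimal sketch that compares the parcel's SHA-1 hash with the value shipped in the .sha1 file:
cd /opt/cloudera/parcel-repo
# compute the parcel's hash and compare it with the expected value by eye
sha1sum CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel
cat CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel.sha1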