CDH5 Installation Guide for Use in China

Source: Internet
Author: User

0. Cluster Planning

Note: CDH allows you to conveniently add and delete hosts and dynamically change the services on the hosts. Therefore, you can allocate the services that run on each machine later.

Three machines in total

Operating system: CentOS 6.5

Machine names: work01, work02, and work03

work03 runs Cloudera Manager

1. Disable firewall and SELinux

Note: If the firewall and SELinux are left enabled, communication between the cluster nodes may fail and services may not start. If the cluster will run as an online service in a production environment, configure firewall rules for the required ports instead of disabling the firewall outright.

1.1 Disable the firewall:

service iptables stop (takes effect immediately, until the next restart)

chkconfig iptables off (takes effect permanently, after a restart)

1.2 Disable SELinux:

setenforce 0 (takes effect immediately but only until the next restart; in this test the command did not run successfully)

Set SELINUX=disabled in /etc/selinux/config (takes effect permanently, after a restart)

View the SELinux status: /usr/sbin/sestatus -v

Note: All three machines must perform the same operation.
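
The permanent SELinux change in 1.2 can also be scripted instead of edited by hand. The following is a minimal sketch and not part of the original procedure: the function name set_selinux_disabled is made up for illustration, and it assumes the config file contains a line that starts with SELINUX=, as the stock /etc/selinux/config on CentOS 6 does.

```shell
# Hypothetical helper (not from the original post): set SELINUX=disabled
# in an SELinux config file. Defaults to /etc/selinux/config.
set_selinux_disabled() {
    conf="${1:-/etc/selinux/config}"
    tmp=$(mktemp) || return 1
    # Rewrite only the line that starts with "SELINUX="; SELINUXTYPE= is untouched.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Run set_selinux_disabled as root on each of the three machines, then confirm with /usr/sbin/sestatus -v after a restart.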

2. Change the host name with FQDN

Note:

A. All three machines must perform the same operation.

B. Configure the machine's own host name in /etc/sysconfig/network.

C. /etc/hosts has the same content on all three machines, so that they can reach each other by host name.

D. If there are many machines, you can set up a DNS server to resolve the host names instead.

1) Modify the /etc/sysconfig/network file (use work02 and work03 on the other two machines)

NETWORKING=yes

HOSTNAME=work01

2) Modify the /etc/hosts file

192.168.1.185 work01 work01
192.168.1.141 work02 work02
192.168.1.198 work03 work03

3) Restart the network service for the change to take effect: service network restart

During the test, restarting the network service dropped the network connection and it did not reconnect automatically; the connection icon had to be clicked to reconnect. Proceed with caution.
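
Since /etc/hosts must list all three machines on every node, a quick check for missing entries can save a failed installation later. This is a sketch only; the helper name missing_hosts is hypothetical, and it simply looks for each name among the host-name fields of the file.

```shell
# Hypothetical helper: print every name from the argument list that has no
# entry in the given hosts file. Prints nothing when all names are present.
missing_hosts() {
    hosts_file="$1"
    shift
    for name in "$@"; do
        # Skip comment lines; fields 2..NF of a hosts line are the host names.
        awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}
```

For example, missing_hosts /etc/hosts work01 work02 work03 prints nothing when the file is complete and prints the missing names otherwise.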

3. Password-less SSH login between the cluster machines

Note:

A. Files are copied between the machines over SSH, and service startup commands are sent the same way. With password-less SSH login set up between the machines, you do not have to type passwords every time a service is started.

B. Cloudera Manager appears to manage the login passwords itself, so this step can probably be skipped. Try it if you are interested.

C. Password-less SSH login works by generating a public/private key pair and giving the public key to other machines. A machine that holds your public key lets you log in without a password. For example, if A gives its public key to B, then A can log in to B without a password.

D. The generated public key is id_rsa.pub. A machine stores the public keys of the accounts allowed to log in to it in its authorized_keys file.

E. To store the public keys of several machines, append them to authorized_keys.

1) Switch to the root account on work01

su

2) Generate the private key and public key of the root account on work01

ssh-keygen -t rsa

Press Enter at each prompt to generate the public key id_rsa.pub and the private key id_rsa.

3) Generate the private key and public key of the root account on work02 and work03 in the same way

4) Copy the public key files on work02 and work03 to work01

[root@work02 ~]# scp ~/.ssh/id_rsa.pub root@work01:~/.ssh/work02.pub

[root@work03 ~]# scp ~/.ssh/id_rsa.pub root@work01:~/.ssh/work03.pub

Give the copies different file names so that they do not overwrite each other.

5) Append the public keys of work01, work02, and work03 to the authorized_keys file on work01 (run in ~/.ssh; use >> so that each key is appended instead of overwriting the file)

cat id_rsa.pub >> authorized_keys

cat work02.pub >> authorized_keys

cat work03.pub >> authorized_keys

6) Copy the authorized_keys file on work01 to work02 and work03

[root@work01 ~]# scp ~/.ssh/authorized_keys root@work02:~/.ssh/

[root@work01 ~]# scp ~/.ssh/authorized_keys root@work03:~/.ssh/

Note: Password-less login only works for the account whose key was generated, so generate the keys with the same account that will start the services remotely.
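
Collecting the keys in step 5) is easy to get wrong when it is repeated, because the same key can end up in authorized_keys several times. The sketch below appends a key only if it is not already listed and tightens the file permissions. The function name add_authorized_key is hypothetical, and the default path assumes the usual ~/.ssh/authorized_keys location.

```shell
# Hypothetical helper: append a public key file to authorized_keys,
# skipping keys that are already listed.
add_authorized_key() {
    pub_file="$1"
    auth_file="${2:-$HOME/.ssh/authorized_keys}"
    ssh_dir=$(dirname "$auth_file")
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth_file"
    # -F fixed string, -x whole line, -f read the pattern from the key file.
    if ! grep -qxFf "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi
    # With StrictModes (the sshd default), key files writable by others are rejected.
    chmod 600 "$auth_file"
}
```

On work01 this would be called once per key file, for example add_authorized_key ~/.ssh/work02.pub.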

4. yum source configuration

Note: The yum source that ships with the system is hosted abroad, so software installation is slow. Configuring a yum source in China speeds up installation.

1) Go to the yum source configuration directory

cd /etc/yum.repos.d

2) Back up the yum source that ships with the system

mv CentOS-Base.repo CentOS-Base.repo.bk

3) Download the 163 yum source and put it in place

wget http://mirrors.163.com/.help/CentOS6-Base-163.repo

mv CentOS6-Base-163.repo CentOS-Base.repo

4) After replacing the yum source, run the following commands to rebuild the yum cache so that the change takes effect immediately

yum clean all

yum makecache
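
If the repo swap is scripted, running it a second time should not overwrite the backup of the original file with the mirror's file. The following sketch shows one way to guard against that. The function name install_base_repo is hypothetical, and the directory argument defaults to /etc/yum.repos.d.

```shell
# Hypothetical helper: put a downloaded .repo file in place as
# CentOS-Base.repo, keeping one backup of the original.
install_base_repo() {
    new_repo="$1"
    repo_dir="${2:-/etc/yum.repos.d}"
    base="$repo_dir/CentOS-Base.repo"
    # Back up only once, so a second run does not overwrite the
    # original backup with the mirror's file.
    if [ -f "$base" ] && [ ! -f "$base.bk" ]; then
        mv "$base" "$base.bk"
    fi
    cp "$new_repo" "$base"
}
```

After calling it with the downloaded repo file, rebuild the yum cache as described above.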

5. Download the CDH parcels installation package

Note:
A. CentOS 6.x uses the CDH-xxxx-el6.parcel version of CDH; CentOS 5.x uses CDH-xxxx-el5.parcel.
B. Cloudera Manager can download the parcel automatically, but because of network speed the download is slow and may take several hours, and if an error occurs the download starts over from the beginning. Downloading the file in advance speeds up the installation. Step 7 describes how to configure it.

Download link: http://archive.cloudera.com/cdh5/parcels/latest/

Download CDH-5.1.0-1.cdh5.1.0.p0.53-el6.parcel

and manifest.json

6. Install Cloudera Manager

Note:

A. The Cloudera Manager installation file downloads the rpm files it needs automatically. Because the yum source for these files is hosted abroad, this is slow, so you can download the rpm files manually to speed things up.

B. Run the Cloudera Manager installation file once to obtain the address of the required rpm files.

6.1 Download the Cloudera Manager installation file

http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

6.2 Run the Cloudera Manager installation file

chmod u+x cloudera-manager-installer.bin
./cloudera-manager-installer.bin

6.3 Obtain the address of the rpm files to be installed

1) Enter the yum source directory

cd /etc/yum.repos.d

2) Check whether the cloudera-manager yum source file has been downloaded

There should be an additional cloudera-manager.repo file.

3) Get the rpm address

cat cloudera-manager.repo

The rpm address is the baseurl entry: baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/

6.4 Close the Cloudera Manager installation wizard

1) Close cloudera-manager-installer.bin

2) Kill the yum process started by the Cloudera Manager installation wizard

ps aux | grep yum (obtain the process ID of the yum process started by the installation wizard)

kill xxxx (kill that process by its process ID)

6.5 Manually download the corresponding rpm files (7 files in total)

Download them from the address obtained in 6.3: http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/RPMS/x86_64/

Name                                                              Last modified  Size
cloudera-manager-agent-5.0.2-1.cm502.p0.297.el6.x86_64.rpm        11-Jun-2014    3.7 M
cloudera-manager-daemons-5.0.2-1.cm502.p0.297.el6.x86_64.rpm      11-Jun-2014    315 M
cloudera-manager-server-5.0.2-1.cm502.p0.297.el6.x86_64.rpm       11-Jun-2014    8.0 K
cloudera-manager-server-db-2-5.0.2-1.cm502.p0.297.el6.x86_64.rpm  11-Jun-2014    9.6 K
enterprise-debuginfo-5.0.2-1.cm502.p0.297.el6.x86_64.rpm          11-Jun-2014    669 K
jdk-6u31-linux-amd64.rpm                                          11-Jun-2014    68 M
oracle-j2sdk1.7-1.7.0+update45-1.x86_64.rpm                       11-Jun-2014    131 M

6.6 Manually install the downloaded rpm files

yum localinstall --nogpgcheck *.rpm

6.7 Run the Cloudera Manager installation file again

Two errors occurred during this run:

1) Problem description: fatal error

Solution: rm -rf /usr/share/cmf/

2) Problem description: Installation failed. Failed to start Embedded Service and Configuration Database. See /var/log/cloudera-manager-installer/5.start-embedded-db.log for details.

bash: /usr/share/cmf/bin/initialize_embedded_db.sh: No such file or directory

Solution: restart and run the installation wizard again; the error did not recur.

7. Configure the CDH parcels package

Note:

A. Cloudera Manager can install CDH in two ways: from rpm packages or from parcels. This test uses parcels.

B. Cloudera Manager can download the required parcels automatically, but the download is slow because it connects to a site abroad.

C. Here we set up the CDH parcel file downloaded in step 5 so that Cloudera Manager reads the local parcel file directly.

7.1 Put the previously downloaded CDH parcel file in the /opt/cloudera/parcel-repo directory

7.2 Generate the corresponding sha file

1) In the manifest.json file downloaded in step 5, find the hash value for the parcel "CDH-5.1.0-1.cdh5.1.0.p0.53-el6.parcel"

"hash": "67fc4c86b260eeba15c339f1ec6be3b59b4ebe30"

2) Store the hash value in the sha file (in /opt/cloudera/parcel-repo, next to the parcel)

echo '67fc4c86b260eeba15c339f1ec6be3b59b4ebe30' > CDH-5.1.0-1.cdh5.1.0.p0.53-el6.parcel.sha
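
Because a parcel download of this size can be interrupted, it may be worth checking the file before writing the sha file. The sketch below assumes that the hash in manifest.json is the SHA-1 digest of the parcel file (its 40 hex digits suggest this, but the original post does not say so). The function name write_parcel_sha is hypothetical.

```shell
# Hypothetical helper: verify a parcel against the hash from manifest.json
# (assumed to be SHA-1) and write the .sha file only if it matches.
write_parcel_sha() {
    parcel="$1"
    expected="$2"
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    if [ "$actual" != "$expected" ]; then
        echo "hash mismatch for $parcel: expected $expected, got $actual" >&2
        return 1
    fi
    echo "$expected" > "$parcel.sha"
}
```

It would be run in /opt/cloudera/parcel-repo with the parcel file name and the hash copied from manifest.json as arguments. A mismatch leaves no sha file behind.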

8. Start Cloudera Manager

Note:

A. Follow the prompts of the Cloudera Manager installation wizard to open Cloudera Manager.

B. The first time Cloudera Manager is opened, it starts the CDH installation wizard. Configure the cluster by following the wizard.

The following problem occurred during installation. For the solution, see Problem 1 in the "Problem list":
python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect(("localhost", int(7182))); s.close();'

9. Add a service

Note:

A. Only HDFS and HBase were installed in this test.

B. Cloudera Manager lets you add and remove services quickly.

C. When you add a service, the wizard reports whether the services it depends on have been installed.

Problem list:

Problem 1: PTR localhost

Description:

DNS reverse resolution error. The Cloudera Manager Server host name cannot be resolved correctly.
Logs:
Detecting Cloudera Manager Server...
Detecting Cloudera Manager Server...

BEGIN host -t PTR 192.168.1.198

198.1.168.192.in-addr.arpa domain name pointer localhost.

END (0)

using localhost as scm server hostname

BEGIN which python

/usr/bin/python

END (0)

BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' localhost 7182

Traceback (most recent call last):

File "<string>", line 1, in <module>

File "<string>", line 1, in connect

socket.error: [Errno 111] Connection refused

END (1)

could not contact scm server at localhost:7182, giving up

waiting for rollback request

Inelegant workaround:

On the host that cannot connect, delete the /usr/bin/host file. The log after the change:

BEGIN host -t PTR 192.168.1.198

/tmp/scm_prepare_node.8OX5y7is/scm_prepare_node.sh: line 100: /usr/bin/host: Permission denied

END (126)

BEGIN which python

/usr/bin/python

END (0)

BEGIN python -c 'import socket; import sys; s = socket.socket(socket.AF_INET); s.settimeout(5.0); s.connect((sys.argv[1], int(sys.argv[2]))); s.close();' 192.168.1.198 7182

END (0)

BEGIN which wget

/usr/bin/wget

END (0)

BEGIN wget -qO- -T 1 -t 1 http://169.254.169.254/latest/meta-data/public-hostname && /bin/echo

END (4)

Note:

It is not clear why Cloudera does it this way: the script already has the IP address of the Cloudera Manager Server, yet it resolves that address back to a host name before connecting.

Because DNS reverse resolution is not configured correctly, resolving the IP address of the Cloudera Manager Server returns localhost, and the subsequent connection fails.

The workaround is to delete /usr/bin/host, so that the script connects using the IP address directly and the error does not occur.
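
The condition that triggers this problem can be checked before running the wizard by looking at what the reverse lookup returns. The sketch below only inspects the output of host -t PTR; the function name ptr_points_to_localhost is hypothetical, and matching localhost.localdomain as well is an assumption about how some systems name the loopback host.

```shell
# Hypothetical helper: read the output of `host -t PTR <ip>` on stdin and
# succeed if the reverse lookup points at localhost.
ptr_points_to_localhost() {
    grep -Eq 'domain name pointer localhost(\.localdomain)?\.$'
}
```

For example: host -t PTR 192.168.1.198 | ptr_points_to_localhost && echo "reverse lookup returns localhost". If the message is printed, the node would hit the error shown in the logs above.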

Reference:

Cloudera Manager 4.8

http://www.reader8.cn/jiaocheng/20140419/2307406.html
 

 

Problem 2: NTP

Problem 2.1

Problem description:

Bad Health -- Clock Offset

The host's NTP service did not respond to a request for the clock offset.

Solution:

Configure the NTP service

Reference steps:

Configure an NTP server on CentOS:

http://www.hailiangchen.com/centos-ntp/

Common NTP server addresses and IP addresses in China:

http://www.douban.com/note/171309770/

Modify the configuration file:

[root@work03 ~]# vim /etc/ntp.conf

# Use public servers from the pool.ntp.org project.

# Please consider joining the pool (http://www.pool.ntp.org/join.html).

server s1a.time.edu.cn prefer

server s1b.time.edu.cn

server s1c.time.edu.cn

restrict 172.16.1.0 mask 255.255.255.0 nomodify <=== allow access from LAN hosts (use your own subnet here)

Start ntp

# service ntpd restart <=== start the ntp service

Synchronize the time on the clients (work02, work03):

ntpdate work01

Note: The NTP service takes about five minutes to become ready after it starts. If a client tries to synchronize before that, the error "no server suitable for synchronization found" appears.
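
Because the server needs a few minutes before it answers, the client-side synchronization can be wrapped in a simple retry loop instead of being rerun by hand. This is a generic sketch; the function name retry and the attempt count and delay in the usage line are illustrative choices, not values from the original post.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between failed attempts. Returns 0 on the first success.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

On work02 and work03 this could be used as retry 10 60 ntpdate work01, which tries once a minute for up to ten attempts.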

Scheduled time synchronization:

Configure a crontab entry on work02 and work03 to synchronize the time on a schedule (daily at 12:00 here)

crontab -e

00 12 * * * /usr/sbin/ntpdate work01 > /root/ntpdate.log 2>&1

Problem 2.2

Description:

Clock Offset

Ensure that the host's hostname is configured properly. Ensure that port 7182 is accessible on the Cloudera Manager Server (check firewall rules). Ensure that ports 9000 and 9001 are free on the host being added. Check agent logs in /var/log/cloudera-scm-agent/ on the host being added (some of the logs can be found in the installation details).
Locating the problem:

Run 'ntpdc -c loopinfo' on the affected hosts (work02, work03)

[root@work03 work]# ntpdc -c loopinfo

ntpdc: read: Connection refused

Solution:

Enable the ntp service:

Start the ntp service on all three machines and enable it at boot

service ntpd start

chkconfig ntpd on

Problem 3: heartbeat

Error message:

Installation failed. Failed to receive heartbeat from agent.

Solution: Disable the firewall (see step 1)

Problem 4: Unknown Health

Unknown Health
After a restart: Request to Host Monitor failed.
service --status-all | grep clo
Checking the status of scm-agent on the machine shows: cloudera-scm-agent dead but pid file exists
Solution: restart the services.
service cloudera-scm-agent restart

service cloudera-scm-server restart

 

Problem 5: hostname and canonical name are not consistent

Bad Health

The hostname and canonical name for this host are not consistent when checked from a Java process.

Canonical name:

4092 Monitor-HostMonitor throttling_logger WARNING (29 skipped) hostname work02 differs from the canonical name work02.xinzhitang.com

Solution: Modify /etc/hosts so that the FQDN and the hostname are the same.

PS: The host name and the host alias must be the same.

/etc/hosts

192.168.1.185 work01 work01

192.168.1.141 work02 work02

192.168.1.198 work03 work03
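
A rough way to check this from the shell is to compare the output of hostname with that of hostname -f. This only approximates the check Cloudera Manager performs from a Java process, so treat it as a sketch; the function name hostname_matches_fqdn is hypothetical.

```shell
# Hypothetical helper: compare the short host name with the fully qualified
# name. With no arguments it asks the system via `hostname` and `hostname -f`.
hostname_matches_fqdn() {
    short="${1:-$(hostname)}"
    fqdn="${2:-$(hostname -f)}"
    if [ "$short" = "$fqdn" ]; then
        echo "consistent: $short"
    else
        echo "mismatch: hostname $short, canonical name $fqdn"
        return 1
    fi
}
```

Run it without arguments on each machine after editing /etc/hosts. The two arguments exist only so that the comparison can be tried with explicit names.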

Problem 6: Concerning Health

Concerning Health Issue

-- Network Interface Speed --

Description: The host has 2 network interface(s) that appear to be operating at less than full speed. Warning threshold: any.

Details:

This is a host health test that checks for network interfaces that appear to be operating at less than full speed.

A failure of this health test may indicate that network interface(s) may be configured incorrectly and may be causing performance problems. Use the ethtool command to check and configure the host's network interfaces to use the fastest available link speed and duplex mode.

Solution:

In this test the warning was dealt with by changing the Cloudera Manager configuration, which is not a real fix.
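
To see what the health test is complaining about, the link speed and duplex mode can be read from the ethtool output, as the message suggests. The sketch below only filters that output; the function name link_speed_and_duplex is hypothetical, and the interface name eth0 in the usage line is an assumption about the host.

```shell
# Hypothetical helper: read `ethtool <interface>` output on stdin and print
# the negotiated speed and duplex mode as NAME=value lines.
link_speed_and_duplex() {
    awk -F': ' '/^[ \t]*(Speed|Duplex):/ {
        sub(/^[ \t]+/, "", $1)   # strip the leading indentation
        print $1 "=" $2
    }'
}
```

For example: ethtool eth0 | link_speed_and_duplex. A speed below what the card and switch support points to a negotiation or cabling issue and not to Cloudera Manager itself.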
