About CDH and Cloudera Manager


This guide is also available as a Word document: http://download.csdn.net/download/xfg0218/9747346



About CDH and Cloudera Manager

CDH (Cloudera's Distribution Including Apache Hadoop) is one of the many Hadoop distributions. It is built and maintained by Cloudera, is based on stable releases of Apache Hadoop, integrates many patches, and can be used directly in production environments.

Cloudera Manager simplifies installing, configuring, and monitoring the hosts and the Hadoop, Hive, and Spark services in a cluster, which makes it easier to deploy and manage the Hadoop-related big data components.


System Environment

· Operating system: CentOS 6.5 x64 (at least 2 GB of memory. If you are short on memory, consider using physical machines with more RAM, because installing all CDH components consumes a lot of memory. I initially gave the virtual machine 1 GB and the installation simply hung.)

· Cloudera Manager: 5.1.3

· CDH: 5.1.3


Installation Instructions

Official reference documentation:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-latest/Cloudera-manager-installation-guide/cm5ig_install_path_c.html

The official documentation describes three installation methods:

The first method requires every machine to have Internet access. Because access to many foreign sites has recently been heavily blocked, my attempts ran into timeout errors several times. Besides the long delays, a failed installation is painful to redo.

The second method requires downloading many packages.

The third method is the least intrusive to the system. Its biggest advantage is that it can be done completely offline, and reinstalling is very convenient. Upgrading packages uniformly across the cluster later on is also easy. This is why I chose the offline installation.



Download addresses for related packages

Cloudera Manager download address:
http://archive.cloudera.com/cm5/cm/5/cloudera-manager-el6-cm5.1.3_x86_64.tar.gz
Download information:
http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-latest/Cloudera-manager-version-and-download-information/cloudera-manager-version-and-download-information.html#cmvd_Topic_1

CDH installation package address: http://archive.cloudera.com/cdh5/parcels/latest/. Because our operating system is CentOS 6.5, the following files need to be downloaded:

· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel

· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel.sha1

· manifest.json

Note: With CDH4, Impala and Cloudera Search (Solr) had to be downloaded separately. CDH5 bundles them, so only the single CDH5 parcel is needed.

Preparation: setting up the system environment

All of the following operations are performed as the root user.


1. Network configuration (all nodes)

Open the network configuration file:

    vi /etc/sysconfig/network

and set the hostname:

    NETWORKING=yes
    HOSTNAME=n1

Then restart the network service for the change to take effect:

    service network restart

Open the hosts file:

    vi /etc/hosts

and add the mapping between IP addresses and hostnames:

    192.168.1.106 n1
    192.168.1.107 n2
    192.168.1.108 n3

Note: The IP-to-hostname mapping of every machine must be written here, including the local machine's own entry. Otherwise the agent reports a hostname resolution error when it starts.
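Since the same entries have to be added on every node, a small helper can keep the edit repeatable. The following is only a sketch: add_host_entry is a hypothetical function name, and it assumes entries are written exactly as "IP hostname" separated by a single space.

```shell
# Sketch: append a host entry to a hosts file unless the exact line already exists.
# add_host_entry is a hypothetical helper name, not part of any tool.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF "$ip $name" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Usage on each node (as root), with the addresses used in this guide:
#   add_host_entry /etc/hosts 192.168.1.106 n1
#   add_host_entry /etc/hosts 192.168.1.107 n2
#   add_host_entry /etc/hosts 192.168.1.108 n3
```

Running it a second time leaves the file unchanged, so it is safe to re-run after adding a node.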



2. Set up passwordless SSH login (all nodes)

Run the following on the master node:

    ssh-keygen -t rsa

Press Enter at every prompt to generate a key pair without a passphrase.

Add the public key to the authorized keys file:

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

and set the access permissions of authorized_keys:

    chmod 600 ~/.ssh/authorized_keys

Copy the file to all DataNode nodes with scp:

    scp ~/.ssh/authorized_keys root@n2:~/.ssh/

Test: run ssh n2 on the master node. Under normal circumstances you are logged in directly without being asked for a password.
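With more than one worker node, the copy step can be wrapped in a loop. This is a minimal sketch, assuming the ~/.ssh directory already exists on every target node; push_authorized_keys is a hypothetical helper name, and n2 and n3 are the hostnames from the /etc/hosts example.

```shell
# Sketch: copy authorized_keys to every node named on the command line.
# push_authorized_keys is a hypothetical helper name.
push_authorized_keys() {
    for node in "$@"; do
        # scp asks for root's password on each node this one time
        scp ~/.ssh/authorized_keys "root@${node}:~/.ssh/" || return 1
    done
}

# Usage on the master node:
#   push_authorized_keys n2 n3
```

The function stops at the first node that fails, so a typo in a hostname is noticed immediately.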



3. Install Oracle's Java (all nodes)

CentOS ships with OpenJDK, but running CDH5 requires Oracle's JDK, with Java 7 support.

First uninstall the bundled OpenJDK. Use

    rpm -qa | grep java

to list the Java-related packages, and use

    rpm -e --nodeps <package name>

to uninstall each of them.

Download the JDK RPM package from Oracle's website and install it with rpm -ivh <package name>.

Because the RPM package does not require us to configure environment variables ourselves, we only need to set a global JAVA_HOME variable by running:

    echo "JAVA_HOME=/usr/java/latest/" >> /etc/environment





4. Install and configure MySQL (master node)

Run

    yum install mysql-server

to install the MySQL server. Use

    chkconfig mysqld on

to start it at boot, and

    service mysqld start

to start the MySQL service. Then follow the prompt to set the root password:

    mysqladmin -u root password 'xxxx'

Log in with

    mysql -uroot -pxxxx

to enter the MySQL command line, and create the following databases:

    # hive
    create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
    # activity monitor
    create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

Grant the root user access to all of the above databases:

    # grant the root user access to all databases from the master node
    grant all privileges on *.* to 'root'@'n1' identified by 'xxxx' with grant option;
    flush privileges;



Official MySQL configuration document: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v5-latest/Cloudera-manager-installation-guide/cm5ig_mysql.html#cmig_topic_5_5


5. Turn off the firewall and SELinux

Note: This must be done on all nodes. Because many ports are involved, temporarily turning off the firewall makes the installation easier. After the installation, set up firewall policies as needed to keep the cluster secure.

To turn off the firewall:

    service iptables stop       # stops the firewall now (temporary)
    chkconfig iptables off      # keeps it off after a reboot

Turn off SELinux (in practice the installation also worked without turning it off; I do not know whether that causes problems later, so this needs further verification):

    setenforce 0                # takes effect immediately (temporary)

Then set SELINUX=disabled in /etc/selinux/config (permanent, takes effect after a reboot).
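The edit to /etc/selinux/config can also be scripted, which helps when it has to be repeated on every node. The following is a sketch only: disable_selinux_config is a hypothetical helper name, and it assumes the file contains a line that starts with "SELINUX=".

```shell
# Sketch: set SELINUX=disabled in an SELinux config file, keeping a .bak copy.
# disable_selinux_config is a hypothetical helper name.
disable_selinux_config() {
    config_file=${1:-/etc/selinux/config}
    cp "$config_file" "$config_file.bak" || return 1
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are left alone
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$config_file.bak" > "$config_file"
}

# Usage on each node (as root):
#   disable_selinux_config
```

The backup copy makes it easy to restore the previous mode if SELinux turns out to be needed later.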





6. Configure NTP service for all nodes

All hosts in the cluster must keep their clocks synchronized; a large time difference causes all kinds of problems. The approach is as follows:

The master node acts as the NTP server and synchronizes its time with an external source, then provides time synchronization for all DataNode nodes.

All DataNode nodes synchronize their time against the master node.

Install the required component on all nodes:

    yum install ntp

When it finishes, configure the service to start at boot:

    chkconfig ntpd on

and check whether the setting succeeded:

    chkconfig --list ntpd

If runlevels 2-5 show "on", the setting succeeded.

Master node configuration

Before configuring, synchronize the time manually with ntpdate, so that the gap between the local clock and the reference time is not too large for ntpd to sync properly. Here 65.55.56.206 is used as the reference time source:

    ntpdate -u 65.55.56.206

The NTP service has a single configuration file, /etc/ntp.conf; once it is set up correctly, nothing else is needed. Only the settings that matter are shown here. The unneeded settings are commented out with # and are not listed:

    driftfile /var/lib/ntp/drift
    restrict 127.0.0.1
    restrict -6 ::1
    restrict default nomodify notrap
    server 65.55.56.206 prefer
    includefile /etc/ntp/crypto/pw
    keys /etc/ntp/keys

After editing the configuration file, save and exit, then start the service with:

    service ntpd start

To check for success, use the ntpstat command to view the synchronization status. Output like the following indicates that startup succeeded:

    synchronised to NTP server () at stratum 2
    time correct to within
    polling server every

If the status is abnormal, wait a while; it typically takes 5-10 minutes to synchronize.
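Rather than re-running ntpstat by hand during those minutes, the check can be polled. This is a minimal sketch: wait_for_sync is a hypothetical helper name, and it relies on ntpstat exiting with status 0 once the clock is synchronised and a non-zero status otherwise.

```shell
# Sketch: run a status command repeatedly until it succeeds or the attempts run out.
# wait_for_sync is a hypothetical helper name.
# Arguments: number of attempts, delay in seconds, then the command to run.
wait_for_sync() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage: check every 30 seconds for up to 10 minutes
#   wait_for_sync 20 30 ntpstat && echo "clock synchronised"
```

The same helper can be used on the DataNode nodes after their ntpd service has been started.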

Configuring NTP clients (all DataNode nodes)

    driftfile /var/lib/ntp/drift
    restrict 127.0.0.1
    restrict -6 ::1
    restrict default kod nomodify notrap nopeer noquery
    restrict -6 default kod nomodify notrap nopeer noquery
    # the hostname or IP of the master node goes here
    server n1
    includefile /etc/ntp/crypto/pw
    keys /etc/ntp/keys

Save and exit. Before requesting time from the server, synchronize manually with ntpdate: ntpdate -u n1 (the master node's NTP server).

The synchronization may fail at first; do not worry. Usually the NTP server on the master node has not finished starting up, and it generally takes 5-10 minutes before synchronization works. Start the service:

    service ntpd start

Because the connection goes over the intranet, the wait after startup is shorter than on the master node, but you still need to be patient for a while.


Installing Cloudera Manager Server and Agent

Extract and install on the master node

Cloudera Manager's default directory is under /opt. Extract the archive:

    tar xzvf cloudera-manager*.tar.gz

Place the extracted cm-5.1.3 and cloudera directories in the /opt directory.

Creating the database for Cloudera Manager 5

First download the JDBC driver from the MySQL website, http://dev.mysql.com/downloads/connector/j/. Extract it, find mysql-connector-java-5.1.33-bin.jar, and put it in /opt/cm-5.1.3/share/cmf/lib/.

Initialize the CM5 database on the master node:

    /opt/cm-5.1.3/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -pxxxx --scm-host localhost scm scm scm

Agent configuration

In /opt/cm-5.1.3/etc/cloudera-scm-agent/config.ini, change server_host to the hostname of the master node.
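The change can be made in an editor, or scripted as in the sketch below. set_server_host is a hypothetical helper name, and the sketch assumes config.ini contains a line that starts with "server_host=" (no spaces around the equals sign); check the file first if yours is formatted differently.

```shell
# Sketch: rewrite the server_host line of an agent config.ini, keeping a .bak copy.
# set_server_host is a hypothetical helper name.
set_server_host() {
    config_file=$1
    master=$2
    cp "$config_file" "$config_file.bak" || return 1
    sed "s/^server_host=.*/server_host=$master/" "$config_file.bak" > "$config_file"
}

# Usage on the master node, before copying cm-5.1.3 to the other nodes:
#   set_server_host /opt/cm-5.1.3/etc/cloudera-scm-agent/config.ini n1
```

Doing this before the directory is copied means every node receives the already-corrected file.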

Synchronizing the agent to other nodes

    scp -r /opt/cm-5.1.3 root@n2:/opt/

Create the cloudera-scm user on all nodes

    useradd --system --home=/opt/cm-5.1.3/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

Preparing parcels to install CDH5

Place the CDH5-related parcel files in the /opt/cloudera/parcel-repo/ directory on the master node (parcel-repo must be created manually).

The relevant files are as follows:

· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel

· CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel.sha1

· manifest.json
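Because the parcel is a large download, it may be worth confirming that it arrived intact before going further. The following is a sketch, not part of the official procedure: verify_parcel is a hypothetical helper name, and it assumes the .sha1 file begins with the hexadecimal SHA-1 digest of the parcel.

```shell
# Sketch: compare a parcel's SHA-1 digest with the digest stored in its .sha1 file.
# verify_parcel is a hypothetical helper name.
verify_parcel() {
    parcel=$1
    # First whitespace-separated field of the first line is taken as the expected digest
    expected=$(awk '{print $1; exit}' "$parcel.sha1") || return 1
    actual=$(sha1sum "$parcel" | awk '{print $1}') || return 1
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage:
#   cd /opt/cloudera/parcel-repo
#   verify_parcel CDH-5.1.3-1.cdh5.1.3.p0.12-el6.parcel && echo "checksum OK"
```

A mismatch usually means the download was truncated, in which case the parcel should be fetched again.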
