Hadoop Series (III): Deploying and managing Hadoop clusters with Cloudera

Source: Internet
Author: User

1. Cloudera Introduction

Hadoop is an open source project. Cloudera packages Hadoop, simplifies the installation process, and wraps Hadoop with some additional tooling.

A Hadoop cluster usually needs many components, depending on its requirements. Installing and configuring them one by one is difficult, and you also have to consider high availability (HA), monitoring, and so on.

With Cloudera, you can easily deploy clusters, install the components you need, and monitor and manage your clusters.

CDH is Cloudera's distribution. It includes Hadoop, Spark, Hive, HBase, and some tools.

There are two editions of Cloudera:

Cloudera Express: free of charge
Cloudera Enterprise (60-day trial): requires purchasing a license key

2. Install Cloudera Manager and deploy the Hadoop cluster

2.1 Installation Method

Install Cloudera Manager first, and then use Cloudera Manager to install the Cloudera Manager agent, CDH, and the management tools on each node.

Official documents:

https://www.cloudera.com/documentation/manager/5-1-x.html

Environment requirements:

1. Turn off SELinux

2. Every node can be logged in to over SSH

3. Add the host name of each node to /etc/hosts
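The three requirements above could be prepared along the following lines. This is a minimal sketch, not the installer's own procedure: the 192.168.1.x addresses are placeholders, hosts_entries and add_hosts_entries are hypothetical helper names, and the SELinux and SSH commands are left as comments because they have to be run as root on each node.

```shell
#!/usr/bin/env bash
# Hypothetical addresses: replace them with the real IPs of your nodes.
NODES="192.168.1.165:test165 192.168.1.166:test166 192.168.1.167:test167"

# Print one hosts-file line ("IP hostname") per node.
hosts_entries() {
    local entry
    for entry in $NODES; do
        printf '%s %s\n' "${entry%%:*}" "${entry##*:}"
    done
}

# Append only the entries that are missing from the target file
# (defaults to /etc/hosts), so the function can be run repeatedly.
add_hosts_entries() {
    local target="${1:-/etc/hosts}" line
    hosts_entries | while IFS= read -r line; do
        grep -qxF "$line" "$target" || printf '%s\n' "$line" >> "$target"
    done
}

# Run as root on each node:
#   setenforce 0                                                  # permissive mode until the next reboot
#   sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config  # keep SELinux off after a reboot
#   add_hosts_entries                                             # writes to /etc/hosts
#   ssh test166 hostname                                          # confirm that SSH login works
```

Passing a file name to add_hosts_entries, for example a scratch copy of /etc/hosts, lets you inspect the result before touching the real file.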

2.2 Installing Cloudera Manager

You can install it with the official one-click installer, or with Yum or RPM packages.

Installation with the official one-click installer is described below.

The environment for this installation is CentOS 7 on 3 machines:

test165 (Cloudera Manager server)

test166 (Cloudera Manager Agent)

test167 (Cloudera Manager Agent)

2.2.1 Download one-click installation package

http://archive.cloudera.com/cm5/installer/latest/

Download the latest version: cloudera-manager-installer.bin

2.2.2 Installing Cloudera Manager

To install Cloudera Manager Server on test165, start the installation wizard:

# chmod a+x cloudera-manager-installer.bin
# ./cloudera-manager-installer.bin

The following screen appears

Select <Next> and <Yes> to start the installation.

The installer downloads Java and Cloudera Manager, more than 600 MB in total, so this takes some time depending on your network.

When the following page appears, the installation is complete.

After the installation is complete, open the Cloudera Manager page. The user name and password are both admin.

http://<IP or host name>:7180/

2.2.3 Installing the Cloudera Manager agent

Log in to the Cloudera Manager page and select the edition you want to install. Cloudera Express is installed here.

Select the hosts on which to install CDH, searching by host name or IP. This time CDH is installed on three nodes.

Choose to install using parcels, and select the CDH version

Choose to install the JDK

Provide SSH login information

Start installing the JDK and Cloudera Manager agent

If downloading or installing the JDK or cloudera-manager-agent fails, you can install the package manually on the node and then continue the installation in Cloudera Manager:

# yum -y install jdk
# yum -y install oracle-j2sdk1.7
# yum -y install cloudera-manager-agent

Download the parcel and distribute it to each node

The parcel is about 1.5 GB, so the download takes a while. To speed up the installation, you can first download the parcel to the Cloudera Manager server and configure it as a local source.

Parcel download location:

http://archive.cloudera.com/cdh5/parcels/5.5.1/

Copy the following files to the /opt/cloudera/parcel-repo/ folder:

CDH-5.5.1-1.cdh5.5.1.p0.11-el7.parcel

CDH-5.5.1-1.cdh5.5.1.p0.11-el7.parcel.sha

manifest.json
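Because the parcel is large, it can be worth checking that the copy is intact before Cloudera Manager picks it up. The sketch below assumes that the .sha file holds the parcel's SHA-1 hex digest as its first field. verify_parcel is a hypothetical helper name, not a Cloudera tool.

```shell
#!/usr/bin/env bash
# Compare a parcel's SHA-1 digest with the hash stored next to it in "<parcel>.sha".
# Assumption: the .sha file contains the hex digest as its first field.
verify_parcel() {
    local parcel="$1" expected actual
    expected=$(awk '{print $1; exit}' "${parcel}.sha")
    actual=$(sha1sum "$parcel" | awk '{print $1}')
    [ "$expected" = "$actual" ]
}

# Usage on the Cloudera Manager server:
#   cd /opt/cloudera/parcel-repo
#   verify_parcel CDH-5.5.1-1.cdh5.5.1.p0.11-el7.parcel && echo "checksum OK"
```

The function returns a non-zero status on a mismatch, so it can gate the rest of a copy script with `&&`.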

After the installation is complete, click Continue to go to the page where the hosts are checked

If the host check shows the warning "Cloudera recommends setting /proc/sys/vm/swappiness to 0. The current setting is 30.", make the following settings:

# vi /etc/sysctl.conf
vm.swappiness = 0
# sysctl -p

If the host check shows the warning "Transparent huge pages are enabled, which can cause significant performance problems.", make the following settings. The lines added to /etc/rc.local reapply them after a reboot:

# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# vi /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
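Before re-running the host check, the two kernel settings can be read back on each node. The following is a small read-only sketch. active_thp_mode and report_host_settings are hypothetical helper names, and the optional file arguments exist only so the function can be tried against sample files.

```shell
#!/usr/bin/env bash
# The kernel reports the transparent huge page mode as e.g. "always madvise [never]",
# with the active value in brackets. Extract that value.
active_thp_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print the current swappiness and transparent huge page mode.
# Both paths default to the real kernel files.
report_host_settings() {
    local swap_file="${1:-/proc/sys/vm/swappiness}"
    local thp_file="${2:-/sys/kernel/mm/transparent_hugepage/enabled}"
    echo "vm.swappiness = $(cat "$swap_file")"
    echo "transparent_hugepage = $(active_thp_mode "$(cat "$thp_file")")"
}

# Usage on a node:
#   report_host_settings
```

After the settings above have been applied, the report should show 0 for swappiness and never for transparent huge pages.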

2.3 Installing the cluster, including Hadoop, YARN, Hive, etc.

After checking the correctness of the hosts, click Finish to enter the cluster configuration

Select the services you want to install. You can choose a predefined combination or customize the selection

Configure the role assignment for each node

Note: HDFS needs at least 3 DataNodes.

Test database connection

Start installation

3. Confirm and Test

Confirm that the cluster is healthy and operating normally:

1. On the cluster page, confirm that the status of every service is OK

2. On the hosts page, confirm that the heartbeat of each node is normal and that the last heartbeat was less than 15 seconds ago

3. Run a test job

Log in to any host in the cluster and run the following job, which uses Hadoop to estimate the value of pi.

The two numeric parameters mean the following: 10 is the number of map tasks to run, and 10000 is the number of samples (dart throws) per map task. The product of the two parameters is the total number of samples.

# sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10000
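To make the arithmetic behind the two parameters concrete, here is a single-process sketch of the same dart-throwing idea. It is only an illustration: it uses awk's pseudo-random generator with a fixed seed, and the bundled Hadoop example may sample points differently and distributes the work across map tasks.

```shell
#!/usr/bin/env bash
maps=10          # first parameter: number of map tasks
samples=10000    # second parameter: samples per map task
total=$((maps * samples))
echo "total samples: $total"    # 10 * 10000 = 100000

# Throw $total random points into the unit square and count how many land
# inside the quarter circle of radius 1. That fraction approximates pi/4.
estimate=$(awk -v n="$total" 'BEGIN {
    srand(1)
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x * x + y * y <= 1) hits++
    }
    printf "%.4f\n", 4 * hits / n
}')
echo "estimated pi: $estimate"
```

Increasing either parameter raises the total number of samples and therefore, on average, the accuracy of the estimate.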

The results of the run are as follows:

You can confirm the execution of the job on the YARN page:

Clusters -> Cluster 1 -> YARN -> Applications

4. Other

On the Cloudera Manager page, you can add hosts to or remove hosts from the cluster, add services to the cluster, and so on.

The Cloudera Manager page loads google-analytics. If access to it is slow from your location, you can turn google-analytics off:

Deselect "Allow Usage Data Collection"

5. Postscript

To do a good job, one must first sharpen one's tools. For managing Hadoop clusters, Cloudera is a good choice.
