Hadoop cluster installation Based on Cloudera Manager 5 and CDH5 (version 5.3.3)

Source: Internet
Author: User

I. Cloudera Manager/CDH5

1. I will not go into detail about what Cloudera Manager and CDH are; the official website and encyclopedia entries cover that.

Link to the official website: cloudera manager

2. Installation guide on the official website

The official documentation describes three installation methods: automatic online installation, manual installation from packages, and manual installation managed through Cloudera Manager.

This guide uses the third method to install the Hadoop cluster.

II. Environment Planning

1. System: CentOS 6.4, x86_64

  • Master: 4 GB memory, as much disk space as possible
  • Slave1: 2 GB memory, as much disk space as possible
  • Slave2: 2 GB memory, as much disk space as possible

2. Cloudera Manager 5.3.3

3. CDH 5.3.3

Download locations:

  • Cloudera Manager 5.3.3: http://archive-primary.cloudera.com/cm5/cm/5/
  • CDH 5.3.3: http://archive-primary.cloudera.com/cdh5/parcels/5.3.3/

Download the following parcel files:

  • CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel
  • CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha1
  • manifest.json

III. Prepare the system environment

  • Run all commands as the root user.
  • Set up passwordless SSH trust between all machines.
  • Set each host name and make it resolvable through the hosts file or a DNS server.
  • Disable iptables and SELinux.
  • Uninstall the OpenJDK that ships with the system and install the Oracle JDK.
  • Install MySQL on the master node.
  • Synchronize the time on all nodes (with an NTP server or another method).
  • Modify the kernel parameters on all nodes:

    i.  echo 0 > /proc/sys/vm/swappiness

    ii. echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
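The two kernel settings above can be checked on each node before continuing. The following is a minimal sketch, not part of the original procedure: the helper names active_value and check_param are hypothetical, and it assumes the hugepage file shows its active choice in square brackets (for example "always madvise [never]").

```shell
#!/bin/sh
# Print the active value stored in a kernel tunable file.
# Assumption: files such as .../redhat_transparent_hugepage/defrag mark the
# active choice with brackets; other files hold a plain value.
active_value() {
    value=$(cat "$1")
    case "$value" in
        *\[*\]*) printf '%s\n' "$value" | sed 's/.*\[\(.*\)\].*/\1/' ;;
        *)       printf '%s\n' "$value" ;;
    esac
}

# Compare the active value of file $1 with the expected value $2.
check_param() {
    actual=$(active_value "$1")
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 = $actual"
    else
        echo "WARN $1 = $actual (expected $2)"
        return 1
    fi
}

# Only check files that exist on this machine; never abort the script.
[ -r /proc/sys/vm/swappiness ] && check_param /proc/sys/vm/swappiness 0 || true
[ -r /sys/kernel/mm/redhat_transparent_hugepage/defrag ] && \
    check_param /sys/kernel/mm/redhat_transparent_hugepage/defrag never || true
```

Values written with echo do not survive a reboot, so a check like this is worth repeating after restarting a node.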

IV. Start Installation
1. Cloudera Manager lives under /opt by default, so extract the tarball into the /opt directory.
# tar xzf cloudera-manager-el6-cm5.3.3_x86_64.tar.gz -C /opt/
# ls /opt/
cloudera cm-5.3.3
#
# Install mysql-connector-java
# yum -y install mysql-connector-java

2. Initialize the database
# /opt/cm-5.3.3/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -p123456 --scm-host localhost scm
# mysql -uroot -p123456 -e "show databases;"    (check whether the cm database was created successfully)

3. Copy the files to the other nodes
# First set server_host in the agent configuration file to the host name of the master node.
# grep server_host /opt/cm-5.3.3/etc/cloudera-scm-agent/config.ini
server_host=master
# scp -rp /opt/cm-5.3.3 slave1:/opt/
# scp -rp /opt/cm-5.3.3 slave2:/opt/
#
# Create the system user on every node
# useradd --system --home=/opt/cm-5.3.3/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "cloudera scm user" cloudera-scm
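With more than a couple of slaves, the copy and user-creation commands above are easier to run in a loop. The sketch below is one way to do that under stated assumptions: the names NODES, DRY_RUN, run and distribute_cm are hypothetical, the node list is taken from this guide's environment, and it relies on the passwordless SSH trust set up earlier. By default it only prints the commands it would run.

```shell
#!/bin/sh
CM_HOME=/opt/cm-5.3.3
NODES="slave1 slave2"        # hosts from this guide; adjust for your cluster
DRY_RUN=${DRY_RUN:-1}        # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Copy the Cloudera Manager directory to each node and create the
# cloudera-scm system user there.
distribute_cm() {
    for node in $NODES; do
        run scp -rp "$CM_HOME" "$node:/opt/"
        run ssh "$node" useradd --system \
            --home="$CM_HOME/run/cloudera-scm-server" \
            --no-create-home --shell=/bin/false \
            --comment "'cloudera scm user'" cloudera-scm
    done
}

distribute_cm
```

Run it with DRY_RUN=0 to execute the commands. The loop covers only the slave nodes, so the useradd command still has to be run locally on the master.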

4. Database configuration
# Hive
# create database hive default charset utf8 collate utf8_general_ci;
# Activity Monitor
# create database amon default charset utf8 collate utf8_general_ci;
# Grant access from the master host
# grant all on *.* to 'root'@'master' identified by 'passwd';

5. Place the parcel files in /opt/cloudera/parcel-repo/
# ls /opt/cloudera/parcel-repo/
CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha manifest.json
# Note: CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha is the downloaded .sha1 file, renamed.
#
# Start the agent and server scripts on the master node
# /opt/cm-5.3.3/etc/init.d/cloudera-scm-agent start
# /opt/cm-5.3.3/etc/init.d/cloudera-scm-server start
# The server port takes a while to come up
#
# Start the agent script on all other nodes
# /opt/cm-5.3.3/etc/init.d/cloudera-scm-agent start
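The rename of the .sha1 file in step 5 can be combined with a checksum check, so a corrupted download is caught before the server starts distributing the parcel. This is a minimal sketch rather than an official procedure: the helper name prepare_parcel is hypothetical, and it assumes the .sha1 file contains the hex SHA-1 digest as its first field and that sha1sum is installed.

```shell
#!/bin/sh
# Rename "<parcel>.sha1" to "<parcel>.sha" if needed, then compare the
# stored digest with the digest of the parcel file itself.
prepare_parcel() {
    p=$1
    if [ -f "$p.sha1" ] && [ ! -f "$p.sha" ]; then
        mv "$p.sha1" "$p.sha"
    fi
    expected=$(awk '{print $1; exit}' "$p.sha")
    actual=$(sha1sum "$p" | awk '{print $1}')
    if [ "$expected" = "$actual" ]; then
        echo "checksum OK: $p"
    else
        echo "checksum MISMATCH: $p" >&2
        return 1
    fi
}

# Only runs when the parcel from this guide is actually present.
parcel=/opt/cloudera/parcel-repo/CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel
if [ -f "$parcel" ]; then
    prepare_parcel "$parcel"
fi
```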

V. Install and configure CDH 5

1. Open http://master-ip:7180 (port 7180) in a browser to start the installation.

Login name: admin Password: admin

2. Select an edition to install. Here, choose the first one, the free edition, then click Next twice.

3. Select all hosts:

4. If the configuration is correct, the following interface appears:

5. Proceed to the next step. Because the offline parcel has already been downloaded, this step completes quickly.

6. Host inspection: checks whether the hosts meet the installation requirements. If they do, every check passes; otherwise, adjust the settings as prompted.

7. Select the software to install. You can install everything, pick individual components, or choose one of the predefined bundles that groups the components for a particular purpose.

8. Assign roles. Keep the defaults or adjust them as needed.

9. Database test:

10. Review and modify the parameters. You can keep the default values or change them as needed.

11. The installation and configuration now run. Wait until they finish, then visit the home page again.

12. The login page after completion

Because the hosts are underpowered and latency is high, query results often fail to display. There are also many warnings caused by insufficient disk space. The installation is now complete.

VI. Other problems

Enable Oozie's web interface:

The Cloudera documentation describes how to configure Oozie:

Perform the following steps:

# mv ext-2.2.zip /var/lib/oozie/

# cd /var/lib/oozie

# unzip ext-2.2.zip

Refresh the page:

Hadoop 2.x adds a number of new features, including High Availability for HDFS. These can be configured directly from the Cloudera Manager management interface, which is very convenient.

In the upper-right corner of the HDFS page for the cluster there is an actions menu:

Click it and follow the prompts to complete the configuration. Adding hosts to and removing hosts from the cluster is just as easy in the management interface. The individual operations are not demonstrated here.

VII. Hadoop Testing Program

# Calculate the value of pi

# sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100

...

Job Finished in 126.439 seconds

Estimated value of Pi is 3.14800000000000000000

# The result is printed when the job finishes.
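In "pi 10 100", the two arguments are the number of map tasks and the number of samples per map, so the job evaluates 10 × 100 = 1000 sample points. An estimate of 3.148 corresponds to 787 of those 1000 points landing inside the circle (4 × 787 / 1000). The following is a rough single-machine sketch of that kind of estimate, not the Hadoop example's actual code: the function name estimate_pi is hypothetical, and the Halton-style point sequence is an assumption made here for illustration, so the printed value need not match the cluster's output.

```shell
#!/bin/sh
# estimate_pi <maps> <samples-per-map>
# Places maps*samples points in the unit square and counts how many fall
# inside the inscribed circle of radius 0.5; pi is about 4 * inside / total.
estimate_pi() {
    awk -v maps="$1" -v samples="$2" '
    # Radical inverse of index i in the given base (one Halton coordinate).
    function halton(i, base,    f, r) {
        f = 1; r = 0
        while (i > 0) {
            f = f / base
            r = r + f * (i % base)
            i = int(i / base)
        }
        return r
    }
    BEGIN {
        total = maps * samples
        inside = 0
        for (i = 1; i <= total; i++) {
            x = halton(i, 2) - 0.5
            y = halton(i, 3) - 0.5
            if (x * x + y * y <= 0.25) inside++
        }
        printf "%.4f\n", 4 * inside / total
    }'
}

estimate_pi 10 100
```

Raising either argument increases the total number of points and generally tightens the estimate, at the cost of a longer run.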

You can view detailed job information on the YARN web interface.

Many other test programs are available as well; they are not demonstrated here.

This installation method is fast and convenient, but it does not help you understand the system as a whole. For a deeper understanding, it is worth downloading the installation packages and writing all the configuration files by hand.
