Hadoop cluster installation based on Cloudera Manager 5 and CDH 5 (version 5.3.3)
I. Cloudera Manager/CDH 5
1. I will not explain what Cloudera Manager and CDH are; the official website and online encyclopedias cover that.
Link to the official website: Cloudera Manager
2. Installation guide on the official website
The official documentation describes three installation methods: automatic online installation, manual installation from packages, and manual installation managed through Cloudera Manager.
This article uses the third method to install the Hadoop cluster.
II. Environment Planning
1. System: CentOS 6.4 x86
- Master: 4 GB memory, as much disk capacity as possible
- Slave1: 2 GB memory, as much disk capacity as possible
- Slave2: 2 GB memory, as much disk capacity as possible
2. Cloudera Manager 5.3.3
3. CDH 5.3.3
Download locations:
- Cloudera Manager 5.3.3: http://archive-primary.cloudera.com/cm5/cm/5/
- CDH 5.3.3: http://archive-primary.cloudera.com/cdh5/parcels/5.3.3/
Files to download for CDH:
- CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel
- CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha1
- manifest.json
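Before the parcel is copied into place, it can be worth checking the download against the published checksum. The following is a minimal sketch, assuming the .sha1 file holds the parcel's hex SHA-1 digest as its first whitespace-separated field and that sha1sum (GNU coreutils) is available. verify_parcel is a hypothetical helper name, not part of Cloudera Manager.

```shell
#!/bin/sh
# verify_parcel PARCEL SHA_FILE
# Returns 0 when the SHA-1 digest of PARCEL matches the digest stored in
# SHA_FILE (assumption: the digest is the first field on the first line).
verify_parcel() {
    parcel=$1
    sha_file=$2
    expected=$(awk 'NR == 1 { print $1 }' "$sha_file")
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Intended use, in the download directory:
#   verify_parcel CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel \
#                 CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha1 && echo "checksum OK"
```

A mismatch usually means a truncated download, so the simplest remedy is to fetch the parcel again.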
III. Prepare the system environment
- Run all commands as the root user.
- All machines must trust each other over SSH (passwordless, key-based login).
- Make the host names resolvable, using either the hosts file or a DNS server.
- Disable iptables and SELinux.
- Uninstall the OpenJDK that ships with the system and install the Oracle JDK.
- Install MySQL on the master node.
- The time on all nodes must be synchronized (NTP server or another method).
- Modify the kernel parameters on all nodes:
i. echo 0 > /proc/sys/vm/swappiness
ii. echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
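Values written with echo into /proc and /sys take effect immediately but are lost on reboot. One way to make them persistent, sketched under the assumption that /etc/sysctl.conf and /etc/rc.local are honored at boot (as on a stock CentOS 6 system), is to append each setting exactly once. ensure_line is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line is already present,
# so the function is safe to run repeatedly.
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Intended use (as root, on every node):
#   ensure_line /etc/sysctl.conf 'vm.swappiness = 0'
#   ensure_line /etc/rc.local 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag'
```

Because the helper only appends when the exact line is missing, it can be run again on all nodes without duplicating entries.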
IV. Start the installation
1. Cloudera Manager expects to live under /opt by default, so extract the archive into /opt.
# tar -xzf cloudera-manager-el6-cm5.3.3_x86_64.tar.gz -C /opt/
# ls /opt/
cloudera cm-5.3.3
#
# Install mysql-connector-java
# yum -y install mysql-connector-java
2. Initialize the database
# /opt/cm-5.3.3/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -p123456 --scm-host localhost scm
# Check whether the cm database was created successfully
# mysql -uroot -p123456 -e "show databases;"
3. Copy the installation to the other nodes
# Set server_host in the agent configuration file to the host name of the master node.
# grep server_host /opt/cm-5.3.3/etc/cloudera-scm-agent/config.ini
server_host=master
# scp -rp /opt/cm-5.3.3 slave1:/opt/
# scp -rp /opt/cm-5.3.3 slave2:/opt/
#
# Create a system user on every node
# useradd --system --home=/opt/cm-5.3.3/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "cloudera scm user" cloudera-scm
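Editing server_host by hand is easy to get wrong, so the change can be scripted before the directory is copied to the slaves. This is a minimal sketch, assuming the agent config.ini contains a line that begins with server_host= and that GNU sed (with -i) is available. set_server_host is a hypothetical helper.

```shell
#!/bin/sh
# set_server_host CONFIG HOST
# Rewrites the server_host entry of a cloudera-scm-agent config.ini in place.
# Other lines of the file are left untouched.
set_server_host() {
    config=$1
    host=$2
    sed -i "s/^server_host[[:space:]]*=.*/server_host=$host/" "$config"
}

# Intended use on the master, before running scp:
#   set_server_host /opt/cm-5.3.3/etc/cloudera-scm-agent/config.ini master
```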
4. Database configuration (run these statements in the mysql client)
# Hive
# create database hive default charset utf8 collate utf8_general_ci;
# Activity Monitor
# create database amon default charset utf8 collate utf8_general_ci;
# Grant access to the master host
# grant all on *.* to 'root'@'master' identified by 'passwd';
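If further service databases are needed beyond hive and amon, the statements can be generated rather than typed. The sketch below uses a hypothetical helper, make_db_sql, that only prints SQL; piping its output into the mysql client is shown as a comment and assumes the root credentials configured on the master.

```shell
#!/bin/sh
# make_db_sql NAME...
# Prints one CREATE DATABASE statement per name, with the same character
# set and collation as the statements used for hive and amon.
make_db_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE %s DEFAULT CHARSET utf8 COLLATE utf8_general_ci;\n' "$db"
    done
}

# Intended use on the master node:
#   make_db_sql hive amon | mysql -uroot -p
```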
5. Place the parcel files in /opt/cloudera/parcel-repo/
# ls /opt/cloudera/parcel-repo/
CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha manifest.json
# Note: CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha is the downloaded CDH-5.3.3-1.cdh5.3.3.p0.5-el6.parcel.sha1 file, renamed
#
# Start the server and agent scripts on the master node
# /opt/cm-5.3.3/etc/init.d/cloudera-scm-agent start
# /opt/cm-5.3.3/etc/init.d/cloudera-scm-server start
# The server port is slow to come up
#
# Start the agent script on all other nodes
# /opt/cm-5.3.3/etc/init.d/cloudera-scm-agent start
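Because the server port is slow to come up, polling is more reliable than guessing when the web interface is ready. Below is a minimal sketch of a generic retry loop. retry is a hypothetical helper, and the curl call against port 7180 in the usage comment assumes curl is installed on the machine running the check.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on success, 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Intended use: wait up to 5 minutes for the Cloudera Manager web port.
#   retry 60 5 curl -s -o /dev/null http://master:7180/ && echo "server is up"
```

curl exits with a non-zero status while the connection is refused, which is what drives the loop.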
V. Install and configure CDH 5
1. Open http://master-ip:7180 in a browser to start the installation.
Login name: admin, password: admin
2. Select an edition to install. Here, select the first (free) edition, then click Next through the following pages.
3. Select all hosts:
4. If the configuration is correct, the following interface will appear:
5. Proceed to the next step. Because the offline parcel has already been downloaded, the download step completes quickly.
6. Host inspection: this checks whether each host meets the installation requirements. If it does, every check passes; otherwise, fix the settings as indicated.
7. Select the software to install. You can select everything, make a custom selection, or choose a predefined combination of components.
8. Assign roles. Keep the defaults or change them as needed.
9. Database test:
10. Review and modify the parameters. You can keep the defaults or change them as needed.
11. The installation and configuration now runs. Wait until it completes, then visit the home page again.
12. The home page after completion:
Because the hosts are underpowered and data latency is high, query results are often not displayed. There are also many warnings caused by insufficient disk space. The installation itself is now complete.
VI. Other problems
Enable Oozie's web interface:
The Cloudera documentation describes how to configure Oozie:
Perform the following steps:
# mv ext-2.2.zip /var/lib/oozie/
# cd /var/lib/oozie
# unzip ext-2.2.zip
Refresh the page:
Hadoop 2.x added new features, including High Availability for HDFS. These features can be operated directly from the Cloudera Manager management interface, which is very convenient.
In the upper-right corner of the cluster's HDFS page there is an operations menu:
Click it and follow the prompts to complete the configuration. Adding hosts to and removing hosts from the cluster is just as convenient in the management interface. The individual operations are not demonstrated here.
VII. Hadoop test programs
# Calculate the value of pi
# sudo -u hdfs hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 100
...
Job Finished in 126.439 seconds
Estimated value of Pi is 3.14800000000000000000
# The execution result is displayed.
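For intuition about what the pi 10 100 job computes: the bundled example is a quasi-Monte Carlo estimator that places sample points in a unit square and counts how many fall inside the inscribed circle (10 map tasks with 100 samples each, so 1000 points in total). The single-machine sketch below applies the same idea with a Halton low-discrepancy sequence in awk. It illustrates the technique only and is not the Hadoop implementation; estimate_pi is a hypothetical name.

```shell
#!/bin/sh
# estimate_pi N
# Estimates pi from N Halton points (bases 2 and 3) in the unit square:
# pi is approximately 4 * (points inside the inscribed circle) / N.
estimate_pi() {
    awk -v n="$1" '
    # Radical inverse of index i in base b (one coordinate of a Halton point).
    function halton(i, b,    f, r) {
        f = 1; r = 0
        while (i > 0) {
            f = f / b
            r = r + f * (i % b)
            i = int(i / b)
        }
        return r
    }
    BEGIN {
        inside = 0
        for (k = 1; k <= n; k++) {
            # Shift so the circle of radius 0.5 is centered at the origin.
            x = halton(k, 2) - 0.5
            y = halton(k, 3) - 0.5
            if (x * x + y * y <= 0.25) inside++
        }
        printf "%.4f\n", 4 * inside / n
    }'
}

# Intended use:
#   estimate_pi 1000
```

Raising the number of points tightens the estimate, which is also why larger arguments to the Hadoop job give a value closer to pi at the cost of a longer run.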
You can view detailed job information on the YARN web interface.
Many other test programs are available; they are not demonstrated here one by one.
This installation method is fast and convenient, but it does little to build an overall understanding of Hadoop. For a deeper understanding, it is recommended to download the installation packages and write all the configuration files by hand.