1. Cloudera Introduction
Hadoop is an open source project that Cloudera Hadoop, simplifies the installation process, and provides some encapsulation of Hadoop.
Depending on the needs of the Hadoop cluster to install a lot of components, one installation is more difficult to configure, but also consider ha, monitoring and so on.
With Cloudera, you can easily deploy clusters, install the components you need, and monitor and manage your clusters.
CDH is a distribution of Cloudera company, including Hadoop,spark,hive,hbase and some tools.
There are two versions of Cloudera:
Cloudera Express version is free of charge
Cloudera Enterprise (60 days trial) need to purchase registration code
2. Install Cloudrea Manager, deployHadoop cluster2.1 Installation Method
Install the Cloudrea Manager first, and then install the Cloudrea Manager client, CDH, and administrative tools on the node through Cloudrea Manager.
Official documents:
Https://www.cloudera.com/documentation/manager/5-1-x.html
Environmental requirements:
1. Turn off SELinux
2. Each node can SSH login
3. Add the host name of each node in the/etc/hosts
2.2 Installing Cloudrea Manager
You can install the package via the official one-click button or by Yum or RPM.
Installation of the official one-click installation package is described below.
This installation environment is Cnetos 7, installed on 3 machines
test165 (Cloudera Manager server)
test166 (Cloudera Manager Agent)
test167 (Cloudera Manager Agent)
2.2.1 Download one-click installation package
http://archive.cloudera.com/cm5/installer/latest/
Download the latest version: Cloudera-manager-installer.bin
2.2.2 Installing Cloudera Manager
To install Cloudera Manager Server on test165, start the Installation Wizard
# chmod a+x cloudera-manager-installer.bin#./cloudera-manager-installer.bin
The following screen appears
Select < Next > and < Yes to start the installation.
Need to download Java and Cloudrea Manager, a total of more than 600 MB, depending on the network situation, will take some time.
The following page appears and the installation is complete.
After the installation is complete, access the Cloudrea Manager's page, the user name password is admin
Http://IP or host name: 7180/
2.2.3 Installing the Cloudera Manager agent
Log on to the Cloudrea Manager page and select the version you want to install, Cloudera Express is installed
Select the host to install CDH, with hostname or IP search, this time is installed on three nodes CDH
Select Use Parcel to install, select CDH version
Choose to install the JDK
Provide SSH login information
Start installing the JDK and Cloudera Manager agent
If the download installs JDK or cloudera-manager-agent fails during installation, you can install it manually on the node and then continue installing on the Cloudrea Manager
# yum-y Install jdk# yum-y install oracle-j2sdk1.7# yum-y Install Cloudera-manager-agent
Download parcel and assign parcel to each node
Parcel package 1.5G, need a period of time, in order to improve the installation speed, you can first download the package to Cloudrea Manager Local, configure the local source
Parcel
Http://archive.cloudera.com/cdh5/parcels/5.5.1/
Copy the following files to the/opt/cloudera/parcel-repo/folder
Cdh-5.5.1-1.cdh5.5.1.p0.11-el7.parcel
Cdh-5.5.1-1.cdh5.5.1.p0.11-el7.parcel.sha
Manifest.json
After the installation is complete, point continues to the page where the results are checked
"Cloudera recommended setting/proc/sys/vm/swappiness to 0 when checking host correctness." The current setting is 30. "Warning, make the following settings
# vi/etc/sysctl.confvm.swappiness = 0# sysctl–p
When checking host correctness, the "enabled" transparent large page appears, which can cause significant performance issues. "Warning, make the following settings
echo never >/sys/kernel/mm/transparent_hugepage/enabledecho never >/sys/kernel/mm/transparent_hugepage/ defrag# Vi/etc/rc.localecho never >/sys/kernel/mm/transparent_hugepage/enabledecho never >/sys/kernel/mm/ Transparent_hugepage/defrag
2.3 Installation cluster, including hadoop,yarn,hive, etc.
After checking the correctness of the host, click Finish to enter the cluster configuration
Select the services you want to install, and you can choose to combine or customize
Configure how each node is allocated
Note: HDFs has a minimum of 3 data Node.
Test database connection
Start installation
3. Confirm, Test
Confirm that the cluster is in good condition and normal operation
1. On the cluster page confirm that all service status is OK
2. On the host page confirm that the heartbeat state of each node is normal, and the time is less than 15 seconds
3. Run the task to test
Log in to any host in the cluster and perform the following tasks (Calculate PI value with Hadoop, pi)
The meaning of the following 2 numeric parameters: 10 means to run 10 map tasks, 10000 refers to each map task, how many times to throw, 2 parameters of the product is the total number of throws.
# sudo-u HDFs Hadoop jar/opt/cloudera/parcels/cdh/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar Pi 10 10000
The results of the implementation are as follows:
The execution of the task can be confirmed on the yarn page
Applications, YARN-and Cluster 1, cluster
4. Other
On the Cloudrea Manager page, you can add/remove hosts to the cluster, add services to the cluster, and so on.
The Cloudrea Manager page opens google-analytics because the local access is slow and can be turned off google-analytics
Allow usage data collection not selected
5. PostScript
工欲善其事 its prerequisite, managing Hadoop clusters, Cloudrea is a good choice.
Hadoop Series (iii): Managing Hadoop clusters with Cloudera deployment