Cloudera Manager 5 and CDH 5.7.0 local (offline) installation


I recently installed Cloudera Manager and ran into a lot of frustrations along the way. This post sums up what worked.

I also referred to a number of other people's posts, such as:

http://blog.csdn.net/a921122/article/details/51939692

http://www.aboutyun.com/forum.php?mod=viewthread&tid=9086&extra=page%3D1

http://www.aboutyun.com/forum.php?mod=viewthread&tid=10852&highlight=%C0%EB%CF%DF%B0%B2%D7%B0Cloudera%2BManager

The general approach described in those posts is workable.
System environment

4 virtual machines: the master node has 4 cores and 8 GB of RAM; the other nodes have 4 cores and 16 GB each. Network card: 1000 Mbit/s. Total disk space: 8.7 TB. Network environment: intranet. OS: CentOS 7 x64. (When installing the system, select as many of the development packages as possible. The master node also needs MySQL, which can be ticked during system installation.)

Preparation

Uninstall the bundled OpenJDK (all nodes)

CentOS installations sometimes include OpenJDK automatically. Check with the command java -version:
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)

If you see output like the above, OpenJDK is already on the system. Run the following command to see which Java-related packages are installed:
rpm -qa | grep java

The following packages must be uninstalled (the exact package version numbers depend on the system version):
java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
java-1.6.0-openjdk-devel-1.6.0.0-1.66.1.13.0.el6.x86_64

Remove them with the following commands:
rpm -e --nodeps java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-devel-1.6.0.0-1.66.1.13.0.el6.x86_64
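
When several bundled Java packages are present, typing each name by hand is error-prone. The following is a minimal sketch of one way to collect them, assuming the package names follow the java-<version>-openjdk / java-<version>-gcj pattern shown above. select_bundled_java is a hypothetical helper name introduced here, not part of rpm.

```shell
# Hypothetical helper: reads an `rpm -qa` listing on stdin and prints only
# the bundled OpenJDK / GCJ packages (names assumed to match the pattern above).
select_bundled_java() {
    grep -E '^java-[0-9.]+-(openjdk|gcj)' || true
}

# Example with a canned listing, so it can be tried without touching the system:
printf '%s\n' \
    'bash-4.2.46-19.el7.x86_64' \
    'java-1.5.0-gcj-1.5.0.0-29.1.el6.x86_64' \
    'java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64' \
    | select_bundled_java

# On a real node, as root, review the list first and then remove the packages:
#   rpm -qa | select_bundled_java
#   rpm -qa | select_bundled_java | xargs -r rpm -e --nodeps
```

Reviewing the filtered list before piping it into rpm -e is worthwhile, because --nodeps removes packages without checking what depends on them.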


Install the JDK (all nodes)

Download the RPM package from the official website. This guide uses version 1.7.0_55-b13 (CDH5 may support versions earlier than 1.7, but that has not been tested). Install it with:
rpm -ivh jdk-7u55-linux-x64.rpm

With the RPM package we do not have to configure the usual environment variables ourselves; we only need to set a global JAVA_HOME variable:
echo "JAVA_HOME=/usr/java/latest/" >> /etc/environment

Verify that the JDK is installed correctly:
java -version
javac -version

Modify the host name

Edit the /etc/sysconfig/network file:
NETWORKING=yes
HOSTNAME=master.hadoop

HOSTNAME here must match the machine's actual host name. If it differs from the name chosen at system installation, run the hostname command so that the change takes effect immediately; otherwise the nodes will have trouble reaching each other. (On CentOS 7 the persistent host name is stored in /etc/hostname and can be set with hostnamectl set-hostname.) Then edit the /etc/hosts file and add:
188.188.2.170 master
188.188.2.171 datanode1
188.188.2.173 datanode2
188.188.2.174 datanode3

Then run:
service network restart

Set up passwordless SSH (all nodes)

The master node and the DataNode nodes are handled slightly differently. First, run the following command on every node, pressing Enter at each prompt:
ssh-keygen -t rsa

Then run the following command on the master node:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Copy the file to every DataNode node with scp:
scp ~/.ssh/authorized_keys root@datanode1:~/.ssh/

Enter the password when prompted. After that, logging in from the master to the other machines no longer requires a password.
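
With more than one DataNode, the copy step above can be looped. This is a minimal sketch, assuming the host names from the /etc/hosts entries in this guide. push_authorized_keys and the SCP_CMD override are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: copy the master's authorized_keys to each host given
# as an argument. SCP_CMD is an illustrative override; set SCP_CMD=echo to
# print the commands instead of running them.
push_authorized_keys() {
    scp_cmd="${SCP_CMD:-scp}"
    for host in "$@"; do
        $scp_cmd "$HOME/.ssh/authorized_keys" "root@${host}:~/.ssh/" || return 1
    done
}

# Dry run: prints one scp invocation per DataNode without copying anything.
SCP_CMD=echo push_authorized_keys datanode1 datanode2 datanode3
```

Dropping SCP_CMD=echo performs the real copy; each host then asks for the root password once.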
Disable the firewall (all nodes)

A firewall can cause all sorts of communication failures between Hadoop-related components.
Firewall:
service firewalld stop (stops it until the next reboot)
systemctl disable firewalld (takes effect after reboot)

SELinux:
setenforce 0 (takes effect immediately, until the next reboot)
Set SELINUX=disabled in /etc/selinux/config (takes effect after reboot).


Install the NTP service (all nodes)

All hosts in the cluster must keep their clocks synchronized; a large time difference causes all kinds of problems. The approach is as follows: the master node acts as the NTP server and synchronizes with an external time source, then provides time synchronization to all DataNode nodes, which sync against the master.
Install the relevant packages, ntp and ntpdate, on all nodes. Once they are installed, enable the service at boot:
chkconfig ntpd on

Check that the setting took effect:
chkconfig --list ntpd

If run levels 2-5 show "on", it worked.
Configure the intranet NTP server (master node)

Before configuring, first sync the time manually with ntpdate, so that the gap between the local clock and the time source is not so large that ntpd cannot synchronize normally. Here 65.55.56.206 is used as the time source:
ntpdate -u 65.55.56.206

The NTP service has a single configuration file, /etc/ntp.conf; once that is set up, you are done. Only the settings that matter are listed here; the unneeded ones are commented out in the file and are not shown:
driftfile /var/lib/ntp/drift
restrict 127.0.0.1
restrict -6 ::1
restrict default nomodify notrap
server 65.55.56.206 prefer
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys

Save the file and exit, then start the service:
service ntpd start
chkconfig ntpd on (start at boot)

To check that it worked, view the sync status with the ntpstat command. Output like the following means the service started successfully:
synchronised to NTP server () at stratum 2
time correct to within ms
polling server every 128 s

If the status looks wrong, wait a few minutes; synchronization usually takes 5-10 minutes.

Configure the NTP client (all DataNode nodes)

driftfile /var/lib/ntp/drift
restrict 127.0.0.1
restrict -6 ::1
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
server 188.188.2.170
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys

The server line points at the intranet NTP server, that is, the master node. Save and exit. Before starting the service, sync the time manually with ntpdate:
ntpdate -u 188.188.2.170 (intranet NTP server)

The sync may fail at first. Do not worry: usually the local NTP server has not finished starting up, and it generally takes 5-10 minutes before synchronization works. Start the service:
service ntpd start
chkconfig ntpd on

Because the clients connect over the intranet, this startup wait is shorter than on the master node, but it still takes some patience.
MySQL configuration (master node)

MySQL only needs to be configured on the master node.
rpm -qa | grep mariadb -- check whether MariaDB is installed; it must be removed, otherwise installing MySQL server reports a conflict
rpm -qa | grep mariadb | xargs rpm -e --nodeps -- if MariaDB packages exist, remove them (rpm -e needs the full package names, so a bare mariadb* wildcard does not work)
yum install perl-Module-Install.noarch -- typical virtual machines do not have this Perl module, so install it with yum
Then install MySQL server:
rpm -ivh mysql-server-5.6.30-1.el7.x86_64.rpm

rpm -ivh mysql-client-5.6.30-1.el7.x86_64.rpm
If service mysql start fails after installation with error 2002 or a message that no *.pid file can be found, start MySQL with the mysqld_safe & command instead, then confirm that it is running.

Enable MySQL at boot: chkconfig mysqld on

This installation requires the following databases (not including Cloudera Manager's own database, which is created by a dedicated script described later):
-- Hive database
create database hive DEFAULT CHARSET utf8;
-- Cluster monitoring database
create database amon DEFAULT CHARSET utf8;
-- Hue database
create database hue DEFAULT CHARSET utf8;
-- Oozie database
create database oozie DEFAULT CHARSET utf8;

The databases needed may differ slightly depending on the components you install. Grant privileges to the user (the password here is set to hadoop):
grant all on *.* to root@"%" identified by "hadoop";
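
The database list changes with the components being installed, so it can be convenient to generate the statements from a list of names. This is a minimal sketch; emit_create_db_sql is a hypothetical helper name, and the IF NOT EXISTS clause is an addition of this example so that the statements can be re-run safely.

```shell
# Hypothetical helper: print one CREATE DATABASE statement per name given.
emit_create_db_sql() {
    for db in "$@"; do
        printf 'CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARACTER SET utf8;\n' "$db"
    done
}

# Preview the statements for the four databases used in this guide:
emit_create_db_sql hive amon hue oozie

# To apply them, pipe the output into the mysql client (prompts for the password):
#   emit_create_db_sql hive amon hue oozie | mysql -uroot -p
```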

Installing Cloudera Manager 5 (CM5)

Download from http://archive-primary.cloudera.com/cm5/cm/5/ and pick the version that matches your system. This installation uses cloudera-manager-centos7-cm5.7.0_x86_64.tar. After downloading, upload it to the master node only, then extract it into the /opt directory. It must not be extracted anywhere else, because the CDH5 local repository is looked up in /opt/cloudera/parcel-repo by default; how to build the local CDH5 repository is described later.

Add the cloudera-scm user on all nodes:
useradd --system --home=/opt/cm-5.7.0/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

Edit /opt/cm-5.7.0/etc/cloudera-scm-agent/config.ini and set server_host:
server_host=master
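
The same edit can be scripted, which helps when the directory is later copied to several nodes. This is a minimal sketch assuming GNU sed (as shipped with CentOS) and a config.ini that already contains a server_host= line; set_server_host is a hypothetical helper name, and the scratch file contents are illustrative only.

```shell
# Hypothetical helper: point an agent config.ini at the given server host.
# Usage: set_server_host <path-to-config.ini> <host>
set_server_host() {
    sed -i "s/^server_host=.*/server_host=$2/" "$1"
}

# Try it on a scratch copy rather than the live file:
tmp_ini=$(mktemp)
printf '[General]\nserver_host=localhost\nserver_port=7182\n' > "$tmp_ini"
set_server_host "$tmp_ini" master
grep '^server_host=' "$tmp_ini"
rm -f "$tmp_ini"
```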

Create the database for Cloudera Manager 5:
/opt/cm-5.7.0/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -phadoop --scm-host localhost scm scm scm

The format is: scm_prepare_database.sh, database type, database, database server, user name, password, then --scm-host with the machine that runs the Cloudera Manager server. I do not know what the last three arguments stand for; they were copied directly from the official website.

Because we are using MySQL, we need MySQL's JDBC driver. This installation uses the latest stable version downloaded from the official site, mysql-connector-java-5.1.30.tar.gz. After unpacking it, find mysql-connector-java-5.1.30-bin.jar and put it in the /opt/cm-5.7.0/share/cmf/lib/ directory. Components installed later look for the driver in /usr/share/java and copy it automatically, so to be safe it is also recommended to copy the driver jar to /usr/share/java on the agent nodes (preferably with the version number and "bin" removed from the name, that is, mysql-connector-java.jar).

Start the Cloudera Manager 5 server:
/opt/cm-5.7.0/etc/init.d/cloudera-scm-server start

Note: do not stop or restart the server right after starting it, because the first start automatically creates the tables and data it needs. If it did exit for some reason, delete all the tables and data before starting it again, otherwise startup will fail.

Start the Cloudera Manager 5 agents. First scp /opt/cm-5.7.0 to all DataNode nodes, then start the agent on each machine:
scp -r /opt/cm-5.7.0 root@datanode1:/opt/cm-5.7.0
Once the copy has finished, start the agent on all DataNode nodes (it must be started with administrator privileges):
sudo /opt/cm-5.7.0/etc/init.d/cloudera-scm-agent start

A note on /opt/cm-5.7.0/run/: after the server has been started, this directory contains a server folder and the server's PID file. You need to create an agent folder there by hand, or the agent will fail to start.

Open the Cloudera Manager 5 console in a browser (the default port is 7180). If startup succeeded, you will see the login page.

Install CDH5
First download the files from http://archive-primary.cloudera.com/cdh5/parcels/5.7.0/ to your local machine. Two things are needed: the parcel package for your system version, and the manifest.json file. When the downloads finish, place both files in /opt/cloudera/parcel-repo on the master node (this directory was created when Cloudera Manager 5 was installed). The directory name must be exactly right. Next, open manifest.json, which is a JSON configuration file. What we need is the hash for the parcel that matches our system version, so find that parcel's entry:


Find the value of "hash" at the bottom of that entry's curly braces.



Copy the value of "hash", then create a file with the same name as your parcel package plus a .sha suffix:


Your directory now holds these 3 files. Paste the "hash" value into the new .sha file and save it. That completes the local repository. The groundwork is now basically done; what remains is following the steps in the console.
Open http://188.188.2.170:7180 and log in to the console; the default user name and password are both admin. Choose the free edition. CM5 has very good Chinese-language support, so just follow the prompts. If there is any problem with the system configuration, the installer reports it along the way; follow the prompts and install the components you need.
If you chose to install Hive, the installation may fail. Checking the log shows that Hive needs the JDBC driver at install time, so copy the MySQL driver jar into the /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hive/lib/ directory (the parcel directory name depends on your CDH version), then continue; the installation should then proceed without problems.
Additional points:
If you create a new .sha file by hand and paste in the hash value from the last line of manifest.json, validation can fail outright, in which case CM downloads the parcel from the remote repository instead of using the locally downloaded one.
The solution is to copy the *.sha1 file directly and then put in the hash value from manifest.json, so that the locally downloaded parcel passes hash validation.
Note that the *.sha1 file should be downloaded together with the parcel package.
It is best to run the MySQL database on the same node as the CM master; otherwise you will run into remote-access permission problems.
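
One way to avoid copy-and-paste mistakes in the .sha file is to generate it from the parcel itself. This is a minimal sketch, assuming the "hash" value in manifest.json is the SHA-1 digest of the parcel file; write_parcel_sha is a hypothetical helper name, and the scratch file stands in for a real parcel. Compare the result with the manifest value before relying on it.

```shell
# Hypothetical helper: write <parcel>.sha containing the parcel's SHA-1 digest.
# Assumption: the "hash" in manifest.json is the SHA-1 of the parcel file.
write_parcel_sha() {
    sha1sum "$1" | awk '{print $1}' > "$1.sha"
}

# Demonstration on a scratch file standing in for a real parcel:
demo_dir=$(mktemp -d)
printf 'not a real parcel\n' > "$demo_dir/demo.parcel"
write_parcel_sha "$demo_dir/demo.parcel"
cat "$demo_dir/demo.parcel.sha"
rm -rf "$demo_dir"

# A digest that does not appear in manifest.json would mean the download is
# corrupt or the assumption above does not hold for your parcel:
#   grep -c "$(cat /opt/cloudera/parcel-repo/<parcel>.sha)" /opt/cloudera/parcel-repo/manifest.json
```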


Installation configuration for CDH5
Once the Cloudera Manager server and agents are running, you can proceed with the CDH5 installation and configuration.
At this point you can open port 7180 on the master node in a browser to test it (the CM server takes some time to start, so you may have to wait a while before it is reachable). The default user name and password are both admin:



Once every agent node has started normally, the nodes appear in the list of currently managed hosts. Select the nodes to install on and click Continue.


Next, the following parcel name appears, which shows that the local parcel is configured correctly. Just click Continue.


If the local parcel is not found here, restart the agent service on all nodes and the server service on the master.


Click Continue. If the local parcel is configured correctly, the download step shown in the figure below should complete instantly. After that, wait patiently for the distribution step; its speed depends on the transfer rate between the nodes.



Currently Managed

If something goes wrong during installation, such as a network interruption or a machine crash, then when you resume the installation and search for the machines by IP, the "Currently Managed" column may show "Yes" for them, and the machines on which installation failed can no longer be selected.

To recover, first stop all services, then clear the database.

1> Delete the UUID on the agent nodes

# rm -rf /opt/cm-5.7.0/lib/cloudera-scm-agent/*

2> Clear the CM database on the master node

Log in to MySQL on the master node, then run: drop database cm;

3> Recreate the CM database on the master node (use your own MySQL root password after -p)

# /opt/cm-5.7.0/share/cmf/schema/scm_prepare_database.sh mysql cm -hlocalhost -uroot -p123456 --scm-host localhost scm scm scm

Wait a moment, then connect to the web console on port 7180 again.

All nodes have been deployed.



Next comes the host check, where you may see the following issues:


Cloudera recommends setting /proc/sys/vm/swappiness to 0. The current setting is 60. Use the sysctl command to change the setting at run time, and edit /etc/sysctl.conf so that the setting persists after a reboot. You can continue with the installation, but you may run into problems, with Cloudera Manager reporting that your hosts are unhealthy because of swapping. The following hosts are affected:
This can be fixed with echo 0 > /proc/sys/vm/swappiness (switch to root first).

Transparent huge page compaction is enabled, which can cause significant performance problems. Run echo never > /sys/kernel/mm/transparent_hugepage/defrag to disable it, and add the same command to an init script such as /etc/rc.local so that it is applied again when the system restarts. The following hosts are affected:
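
The echo into /proc only lasts until the next reboot. The following is a minimal sketch of making the swappiness setting persistent, assuming GNU sed and a sysctl-style key = value file; persist_swappiness is a hypothetical helper name, and it is exercised on a scratch file so that nothing on the system changes.

```shell
# Hypothetical helper: make sure a sysctl-style file sets vm.swappiness to 0.
# Usage: persist_swappiness <file>   (on a real node: /etc/sysctl.conf, as root)
persist_swappiness() {
    if grep -q '^vm\.swappiness' "$1" 2>/dev/null; then
        sed -i 's/^vm\.swappiness.*/vm.swappiness = 0/' "$1"
    else
        echo 'vm.swappiness = 0' >> "$1"
    fi
}

# Try it on a scratch file first:
scratch=$(mktemp)
echo 'vm.swappiness = 60' > "$scratch"
persist_swappiness "$scratch"
cat "$scratch"
rm -f "$scratch"

# On a real node, as root, the run-time and persistent changes would be:
#   sysctl -w vm.swappiness=0
#   persist_swappiness /etc/sysctl.conf
#   echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

Rewriting an existing line instead of always appending keeps the file free of duplicate entries if the helper is run more than once.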


The next step is choosing which services to install:

Service configuration: the defaults are generally fine (Cloudera Manager configures things automatically based on the machines' specifications; adjust the settings yourself if you have special requirements):

The next step is the database setup. Once the connection check passes, you can continue:


You may need to create a new Oozie database here.


Next is the review page for the cluster settings; everything here keeps its default configuration:

Finally the services are installed. Note that installing Hive or Oozie may fail at this point: we use MySQL as Hive's metadata store, and Hive does not ship with the MySQL driver by default. Copying one over with the following command fixes it:

cp /opt/cm-5.7.1/share/cmf/lib/mysql-connector-java-*-bin.jar /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1
