Notes on installing Apache Hadoop (Cloudera CDH4)

Source: Internet
Author: User

Cloudera CDH4 can be installed in three ways:

1. Automatic installation through Cloudera Manager (only 64-bit Linux operating systems are supported);

2. Manual installation of the packages with the yum command;

3. Manual installation from the tarball.

I recommend trying method 1 or method 2. Before you start, you should have a clear understanding of the Hadoop architecture, its built-in components, and its configuration. For the installation itself, refer to the official documentation (the CDH4 Installation Guide and CM-4.0-free-installation-guide). Here I cover the points to watch for with method 1 (some also apply to method 2), for reference only:

A) Deploy the cluster on 64-bit Linux systems. Prepare 3 to 5 machines, or 3 to 5 systems in virtual machines. Each machine must have its own IP address and host name. (Virtual machines are the most convenient: install one and copy it to create the others.)

B) Install Java on each machine beforehand, set JAVA_HOME, and add Java to PATH. This saves the installer the download and installation time. Java SE 1.6 or above is required: http://www.oracle.com/technetwork/java/javase/downloads/index.html
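A quick way to confirm the Java setup on each machine before starting the installer is sketched below. It is only a minimal check under assumptions: the function name java_env_problems is made up for these notes, and it assumes the usual JDK layout in which the launcher sits at JAVA_HOME/bin/java.

```python
import os
import shutil


def java_env_problems(env=None):
    """Return a list of problems found in the Java environment (empty if none).

    `env` is a mapping of environment variables; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isfile(os.path.join(java_home, "bin", "java")):
        # Assumption: a standard JDK layout with the launcher under bin/.
        problems.append("JAVA_HOME/bin/java not found")

    # Look for a `java` executable on the PATH taken from the same mapping.
    if shutil.which("java", path=env.get("PATH", "")) is None:
        problems.append("java is not on PATH")

    return problems
```

Calling java_env_problems() with no arguments checks the current shell environment; an empty list means both settings look right.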

C) On the machine where the Cloudera Manager server is installed, SELinux must be disabled. Install PostgreSQL (used as the database); downloads are at http://www.postgresql.org/download/linux/. Some systems ship with it, so check first; on my RedHat I used the command yum list postgresql. Open port 7180 in the firewall, or simply stop the firewall with service iptables stop.
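To confirm that port 7180 is really reachable after the firewall change, a plain TCP connection attempt from another machine is enough. The following is a minimal sketch, not part of any Cloudera tooling: port_is_open is a hypothetical helper name and the 3-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection performs the connect; the socket is closed
        # again as soon as the with-block ends.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Pass the address of the Cloudera Manager server machine and 7180. A False result after the server has started usually means the firewall is still blocking the port.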

D) On the machines where the Cloudera Manager agent will be installed:

D1) Set the yum timeout to a large enough value, or to none (the system default is 30). On my RedHat this is configured in /etc/yum.conf by adding timeout = none. This matters a great deal. Some of you may have a fast or stable network, but it was a painful lesson for me: the installation failed several times with a socket timeout error, and worse, Cloudera Manager rolls back when it hits an error during installation, so everything has to be installed again. With the timeout set to unlimited, yum keeps trying to connect to the server. The installation may sometimes appear stuck at this point, possibly because of network congestion or a congested yum package cache, but as long as it has not rolled back there is nothing to worry about. If it stays stuck: click "Abort installation" on the installation page, go back to the installation machine, and kill the Cloudera Manager server. You can run skill -9 -t pty/1 (pty/1 being the terminal name), then clear the yum cache (yum clean all) and return to the installation page to install again. The installer skips any package that is already present.
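The yum.conf change in D1 can be scripted so that it is applied identically on every machine. The sketch below only does text processing on the file's contents. The function name set_yum_timeout is invented for these notes, and it assumes the setting belongs in the [main] section. It writes whatever value it is given, so either a large number of seconds or the none used above can be passed.

```python
def set_yum_timeout(conf_text, value):
    """Return conf_text with timeout=<value> set in the [main] section.

    An existing timeout line in [main] is replaced; otherwise one is added
    at the end of [main] (the section is created if it is missing).
    """
    out = []
    in_main = False
    done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [main] without having seen a timeout line: add one.
            if in_main and not done:
                out.append(f"timeout={value}")
                done = True
            in_main = stripped == "[main]"
        elif in_main and stripped.split("=")[0].strip() == "timeout":
            line = f"timeout={value}"
            done = True
        out.append(line)
    if not done:
        if not in_main:
            out.append("[main]")
        out.append(f"timeout={value}")
    return "\n".join(out) + "\n"
```

A typical use would read /etc/yum.conf, pass its text through set_yum_timeout, and write the result back as root after keeping a backup copy.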

D2) Make sure the root directory (/) has enough space. On Linux I use df -h to check that more than 1 GB is free. This matters especially for virtual machines, where the installer often does not offer a manual disk partitioning step. You can find instructions online, or refer to my blog, for enlarging the root directory.
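The same free-space check can be done from a script instead of reading df -h output by eye. This is a small sketch using only the standard library; the helper names free_gib and has_enough_space are illustrative, and the 1 GiB default mirrors the threshold mentioned above.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path="/"):
    """Return the free space, in GiB, of the filesystem that contains path."""
    return shutil.disk_usage(path).free / GIB


def has_enough_space(path="/", min_gib=1.0):
    """Return True if more than min_gib GiB is free on path's filesystem."""
    return free_gib(path) > min_gib
```

Running has_enough_space("/") on each agent machine before the installation gives a yes/no answer for the root directory.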

D3) Make sure cyrus-sasl-gssapi is installed. It is available at http://asg.web.cmu.edu/sasl/sasl-library.html or http://rpmfind.net/linux/rpm2html/search.php?query=cyrus-sasl-gssapi

D4) Disable the firewall. The NameNode machine and the other Hadoop machines run many components and services and therefore use many ports, so disabling the firewall entirely ensures that everything works.

E) In a Hadoop environment most access is by domain name. For name resolution, add a mapping to the hosts file on each Hadoop machine and on every external machine that accesses the cluster. On Windows the file is in C:\Windows\System32\drivers\etc (where C: is the system drive); on Linux it is /etc/hosts.
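Because the same mappings have to be added on several machines, it can help to generate the hosts lines rather than type them. The sketch below is one possible way to do that. The helper names hosts_entry and add_entries are made up here, and the three-column line format (IP address, full domain name, short host name) is the one used in step 7 of these notes.

```python
def hosts_entry(ip, fqdn):
    """Format one hosts line: IP address, full domain name, short host name."""
    short = fqdn.split(".", 1)[0]
    return f"{ip} {fqdn} {short}"


def add_entries(hosts_text, entries):
    """Append a line for each IP in `entries` that hosts_text does not map yet.

    `entries` maps IP address -> full domain name. Existing lines, including
    comments, are kept unchanged, so applying the function twice is harmless.
    """
    lines = hosts_text.splitlines()
    known_ips = {
        line.split()[0]
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    for ip, fqdn in entries.items():
        if ip not in known_ips:
            lines.append(hosts_entry(ip, fqdn))
    return "\n".join(lines) + "\n"
```

For example, hosts_entry("192.168.0.115", "scm-node1.myhadoop.com") gives the line 192.168.0.115 scm-node1.myhadoop.com scm-node1.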

 

For method 1, my installation steps were:

1) Preparation: VMware 7.1, RedHat 5.7 (64-bit; some machines do not support 64-bit virtual machines, so check your CPU), JDK 1.6, cloudera-manager-installer.bin, PostgreSQL 8.4, cyrus-sasl-gssapi.

2) Install RedHat in a virtual machine, storing its files under G:\hadoop\SCM-manager. In that system, install Java, configure the environment variables, set the IP address (192.168.0.113), set the host name to SCM-manager with the full domain name scm-manager.myhadoop.com, disable the firewall, and install PostgreSQL 8.4.

3) With that system shut down, copy the virtual machine folder: copy SCM-manager under G:\hadoop and rename the copy to SCM-name. In the SCM-name folder, open scm-manager.vmx and change the last three digits of ethernet0.generatedAddress and of uuid.bios. Change both to the same digits; this is what changes the physical address.
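The .vmx edit in step 3 can be sketched as a small text transformation, which avoids changing one key and forgetting the other. This is only an illustration under stated assumptions: the function names are invented, the values are assumed to be quoted strings of hexadecimal digits with separators (colons, spaces, or a dash), and "last three digits" is taken to mean the last three hexadecimal characters.

```python
import re

# The two keys named in step 3.
KEYS = {"ethernet0.generatedaddress", "uuid.bios"}
# Matches lines of the form:  key = "value"
LINE_RE = re.compile(r'^(\s*([\w.]+)\s*=\s*")([^"]*)(".*)$')
HEX_DIGITS = "0123456789abcdefABCDEF"


def replace_hex_tail(value, new_hex):
    """Overwrite the last len(new_hex) hex digits of value, keeping separators."""
    chars = list(value)
    digits = list(new_hex.lower())
    for i in range(len(chars) - 1, -1, -1):
        if not digits:
            break
        if chars[i] in HEX_DIGITS:
            chars[i] = digits.pop()  # fill from the right-hand end
    if digits:
        raise ValueError("value has fewer hex digits than new_hex")
    return "".join(chars)


def patch_vmx(text, new_hex):
    """Apply the same new trailing digits to both keys in a .vmx file's text."""
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2).lower() in KEYS:
            line = m.group(1) + replace_hex_tail(m.group(3), new_hex) + m.group(4)
        out.append(line)
    return "\n".join(out) + "\n"
```

With an invented sample value, patch_vmx turns "00:0c:29:4f:8e:35" into "00:0c:29:4f:8a:1b" when called with new_hex "a1b". Run it only on the copied folder's file while the virtual machine is shut down, and keep a backup.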

4) Start SCM-name in the virtual machine, set the IP address (192.168.0.114), set the host name to SCM-name with the full domain name scm-name.myhadoop.com, and restart the system. Note: make sure the machine can reach the Internet.

5) Start SCM-manager in the virtual machine and install the Cloudera Manager server following the official procedure (mostly clicking Next). Note: make sure the machine can reach the Internet.

6) To install Hadoop, open http://192.168.0.113:7180/ in any browser to reach the Hadoop installation page and choose to install Hadoop on 192.168.0.114. Once all components have installed successfully, log out without continuing to the next step, which is "host detection".

7) Shut down 192.168.0.114. Then, for each additional node, copy the virtual machine folder, rename the copy, modify the physical address, set the IP address, set the host name, and restart the system (exactly as in step 3):

Node 1 system) Folder under G:\hadoop: scm-node1. IP address: 192.168.0.115. Host name: scm-node1. Mapping to add in /etc/hosts: 192.168.0.115 scm-node1.myhadoop.com scm-node1

Node 2 system) Folder under G:\hadoop: scm-node2. IP address: 192.168.0.116. Host name: scm-node2. Mapping to add in /etc/hosts: 192.168.0.116 scm-node2.myhadoop.com scm-node2

Backup node system) Folder under G:\hadoop: SCM-second. IP address: 192.168.0.118. Host name: SCM-second. Mapping to add in /etc/hosts: 192.168.0.118 scm-second.myhadoop.com SCM-second

8) Return to the installation page (as in step 6), add the four machines 114, 115, 116, and 118, and complete the installation step by step.

9) Add the following mappings to the hosts file of the Windows 7 machine used for access:

192.168.0.114 scm-name.myhadoop.com
192.168.0.115 scm-node1.myhadoop.com
192.168.0.116 scm-node2.myhadoop.com
192.168.0.118 scm-second.myhadoop.com
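Once the hosts file has been edited, it is worth confirming that the names resolve to the intended addresses from the access machine. The following is a minimal sketch: check_resolution is a hypothetical helper, and the resolver is passed in as a parameter (defaulting to the standard socket.gethostbyname) so the logic can also be exercised without touching the real hosts file.

```python
import socket


def check_resolution(expected, resolve=socket.gethostbyname):
    """Compare expected name -> IP mappings with what the resolver returns.

    Returns a dict of the names that did NOT resolve as expected, mapped to
    the address actually returned (or None if the name did not resolve).
    """
    mismatches = {}
    for name, ip in expected.items():
        try:
            actual = resolve(name)
        except OSError:
            # socket.gaierror (unknown host) is a subclass of OSError.
            actual = None
        if actual != ip:
            mismatches[name] = actual
    return mismatches
```

Calling check_resolution({"scm-name.myhadoop.com": "192.168.0.114"}) on the Windows machine should return an empty dict once the mapping is in place.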

At this point the Hadoop installation is complete.

The component pages, including the Hue interface, are reached through the Cloudera Manager management console.

If you need help with Hadoop installation, VM root directory expansion, VM copying, or IP address configuration, leave a message or email me at zzhua2007 # hotmail.com (replace # with @). I also look forward to hearing from anyone interested in Hadoop.
