I. What is cdh?
CDH is Cloudera's 100% open source Hadoop distribution, built specifically to meet enterprise demands.
In short, it is an open source platform for distributed storage and processing.
II. What software and functions does cdh4 contain?
First, hbase, hadoop, and zookeeper are essential.
Second, hive, oozie, and Map/Reduce can also be integrated into it.
HBase is a distributed, column-oriented open source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. It lets you develop distributed programs without understanding the details of the underlying distributed layer, and makes full use of a cluster's power for high-speed computing and storage.
ZooKeeper originated as a Hadoop subproject and is now a top-level Apache project. It is a reliable coordination system for large-scale distributed systems, providing functions such as configuration maintenance, naming service, distributed synchronization, and group services.
Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables and provides SQL-style querying. It converts SQL statements into MapReduce jobs and runs them.
Oozie is a workflow framework that lets us combine multiple Map/Reduce jobs into one logical unit of work.
MapReduce is a programming model for parallel operations on large-scale datasets (larger than 1 TB). Its core concepts, Map and Reduce, are borrowed from functional programming languages, and it also borrows features from vector programming languages. It makes it much easier for programmers with no experience in distributed parallel programming to run their programs on a distributed system.
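To make the Map and Reduce concepts concrete, here is a minimal single-process sketch of the classic word-count example in Python. It only illustrates the programming model; it is not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on different nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.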
III. cdh4 installation
The common way to install cdh4 is to go to the official website, http://www.cloudera.com/blog/2012/02/introducing-cdh4/,
download the required rpm packages, and then install and configure them with yum according to the official documentation.
Here I want to introduce how to install cdh4 through cloudera-manager.
Cloudera-manager is a Cloudera product, not an Apache Foundation project. It currently comes in two editions: a free edition and a commercial edition. The free edition supports at most 50 nodes; the commercial edition has no limit.
Of course, 50 nodes are generally enough, so here we use the free edition of cloudera-manager.
Download: https://ccp.cloudera.com/display/SUPPORT/Downloads
1. installation environment
Node1: 192.168.1.124, CentOS 6.2
Node2: 192.168.1.163, CentOS 6.2
iptables: stopped
SELinux: disabled
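As a quick way to confirm the SELinux part of this environment on each node, the following Python sketch reads the configured mode. It assumes the standard CentOS location /etc/selinux/config; the function names are hypothetical. It reports the configured mode, not the mode currently enforced at runtime.

```python
def selinux_mode(config_text):
    """Return the value of the SELINUX= setting in the given config text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and lines that are not key=value settings.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None


def read_selinux_mode(path="/etc/selinux/config"):
    """Read the config file (path is an assumption) and return the configured mode."""
    try:
        with open(path) as handle:
            return selinux_mode(handle.read())
    except OSError:
        return None
```

Calling read_selinux_mode() on each node should return "disabled" if the environment matches the one described here.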
2. install cloudera-manager
Node1:
Downloading from the official site gives you an executable file, cloudera-manager-installer.bin.
The X Window System package group must be installed beforehand, for a simple reason: the installer has a graphical interface.
During installation, the installer uses yum to automatically download and install the packages it needs, about 100 MB in total. Because the repository is hosted overseas, and given the company's bandwidth limit and national network restrictions, the download often stalls, and the installation can keep failing for a whole day.
My approach is to interrupt the graphical installer, that is, simply kill it. By that point, the yum repository it needs has already been added to the system.
Following the URL configured in that yum repository, http://archive.cloudera.com/cm4/redhat/6/x86_64/cm/4.0.4/,
manually download the required rpm packages from that location.
After the download is complete, install the packages locally with yum:
yum localinstall --nogpgcheck *.rpm
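If you need to repeat the local install on several hosts, the step can be scripted. The Python sketch below only wraps the same yum localinstall --nogpgcheck command; the helper names and the dry_run parameter are hypothetical. It defaults to a dry run, so nothing is installed unless you ask for it.

```python
import glob
import os
import subprocess


def build_localinstall_command(rpm_dir):
    """Build the yum command that installs every .rpm file found in rpm_dir."""
    rpms = sorted(glob.glob(os.path.join(rpm_dir, "*.rpm")))
    if not rpms:
        raise FileNotFoundError("no .rpm files found in " + rpm_dir)
    return ["yum", "localinstall", "--nogpgcheck"] + rpms


def install_local_rpms(rpm_dir, dry_run=True):
    """Return the command; run it (as root) only when dry_run is False."""
    command = build_localinstall_command(rpm_dir)
    if not dry_run:
        # check=True raises CalledProcessError if yum exits with a non-zero status.
        subprocess.run(command, check=True)
    return command
```

For example, install_local_rpms("/root/cm-rpms") returns the command it would run, and install_local_rpms("/root/cm-rpms", dry_run=False) actually runs it; the directory name is only a placeholder.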
After the yum installation is complete, re-run cloudera-manager-installer.bin to finish the installation. (If it fails and reports that the product is already installed, go to the /usr/share/cmf directory and delete the uninstall-cloudera-manager.sh file.)
Note 1: the packages must be installed on both hosts, but only the host that serves as the console needs to run the graphical installer; nothing more needs to be done on the other. Here I use node1 as the console.
Note 2: install the jdk on both hosts in advance; otherwise the jdk is downloaded and installed automatically. I recommend installing the jdk from an rpm package.
3. install cdh4
①. After cloudera-manager is installed, it starts automatically. Use netstat -tnlp to confirm that ports such as 7180 and 7182 are listening.
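The same check can be made from another machine, which also confirms that no firewall is blocking the ports. This is a minimal Python sketch using only the standard library; the function names are hypothetical, and the default ports are the two mentioned above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(7180, 7182)):
    """Map each port to whether it currently accepts TCP connections."""
    return {port: port_is_open(host, port) for port in ports}
```

For the console host in this article, check_ports("192.168.1.124") should report both ports as open once cloudera-manager has started.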
Open http://192.168.1.124:7180 in a browser to enter the cloudera-manager web management portal. The default administrator user is admin, with password admin.
After login, a dialog box asks whether to use the free edition or the commercial edition.
②. The rest of the installation is done in the cloudera-manager web console and is very simple.
First, search for the hosts: enter the IP addresses of the two hosts, run the search, and then select the hosts to install.
Choose cdh4 as the version to install and continue until the installation page with the progress bar appears. As with cloudera-manager, once the yum repository file has been written, interrupt the installation: kill the yum process and close the page.
Then look up the download location in the repository file, http://archive.cloudera.com/cdh4/redhat/6/x86_64/cdh/4/, and download the rpm packages from there.
Then, as above, run yum localinstall --nogpgcheck *.rpm
Finally, reopen http://192.168.1.124:7180 and run the host installation again.
Note 1: these steps are performed in the cloudera-manager console.
Note 2: if your network speed is good, you do not need to interrupt anything; just wait for the installation to finish in the graphical interface. However, if the installation fails, do not click retry: a retry uninstalls what has already been installed and starts over from scratch. With an overseas repository, you can guess how slow that is.
③. Once the above is installed, a host inspection runs. It is slow when there are many hosts, so the time needed varies from setup to setup. After the inspection, choose the services you want; here I choose hbase, hadoop, and zookeeper, and then start the services.
The console shows the status of each service in real time.
It also shows the status of each host in real time.
Log in to a host and open hbase shell to run a test.
Now the cdh4 framework can be used.
Note: services that are not selected are not started by default. Do not worry about this. If you need to use hive, you can manually execute