Objective
I have been doing big-data backend development for more than a year, and as the Hadoop community keeps evolving I keep trying new things. This article focuses on Ambari, a new Apache project designed to let you quickly configure and deploy the components of the Hadoop ecosystem, and to provide maintenance and monitoring capabilities.
As a novice, let me talk about my own learning experience. At the beginning I did the simplest thing: Googled Hadoop, downloaded the packages, installed a standalone version on my own virtual machine (CentOS 6.3), wrote a few test classes, ran some CRUD tests, and ran Map/Reduce jobs. At that point I did not understand Hadoop very well, so I kept reading other people's articles to grasp the overall architecture; what they all did was modify the few configuration files under conf so that Hadoop would run normally. After that stage I moved on to HBase, another product in the Hadoop ecosystem: again modify the configuration, start the services with start-all.sh and start-hbase.sh, then modify my own programs and test. Along with HBase I learned ZooKeeper and Hive, and after this hands-on stage I began to study Hadoop 2.0 and the many articles by experts on CSDN, gaining some overall understanding of the ecosystem. The technology involved in my company's development is just this. But as someone who likes to explore, don't you want to know more? How does it perform? How does it actually work? Look at the slides from the big companies (Taobao and others): they run dozens, hundreds, even thousands of nodes. How do they manage them, and what is the performance like? Looking at the performance curves in those slides, can you learn something for tuning your own project? I seem to have found the answer, and it is Ambari, a Hadoop-related project developed by Hortonworks.
Getting to know the Hadoop ecosystem
The keywords we now often see include HDFS, MapReduce, HBase, Hive, ZooKeeper, Pig, Sqoop, Oozie, Ganglia, Nagios, CDH3, CDH4, Flume, Scribe, Fluentd, HttpFS and so on; in fact there are even more. The Hadoop ecosystem is flourishing, and who is driving all this prosperity? Friends who have read the history of Hadoop may know that Hadoop started at Yahoo, but it is now maintained mainly by two companies, Hortonworks and Cloudera, to which most of the committers belong. So there are now two major lines of releases: the CDH series and the community edition. I first used the community edition, later switched to CDH3, and have now come back to the community edition, because of Ambari. Of course, whichever you use, as long as you master the technology yourself you can modify it and keep it running normally; there is not much more to say here. After so much rambling, let's start talking about installing Ambari.
Start deployment
First, get to know Ambari. The project page is at: http://incubator.apache.org/ambari/
The installation documentation is at: http://incubator.apache.org/ambari/1.2.2/installing-hadoop-using-ambari/content/index.html
When installing, please read the installation document carefully: you must match it against your current system version and configure the appropriate repositories, and the installation process takes quite a long time, so take every step of the document seriously. Below I describe some of the problems I ran into. (You can also refer to the article on installing Ambari, the Hadoop cluster monitoring tool.)
Here's my own installation process.
Machine Preparation:
My test environment uses 9 HP machines, cloud100 through cloud108, with cloud108 as the management node.
Paths in the Ambari-installed environment:
Installation directories on each machine:
/usr/lib/hadoop
/usr/lib/hbase
/usr/lib/zookeeper
/usr/lib/hcatalog
/usr/lib/hive
Log paths; when you need to look at error information, you can find the relevant logs in these directories:
/var/log/hadoop
/var/log/hbase
Configuration file paths:
/etc/hadoop
/etc/hbase
/etc/hive
HDFS storage path:
/hadoop/hdfs
Points to note in the installation process:
1. Before installing, set up passwordless SSH login on every machine (you can refer to the illustrated guide to bidirectional SSH login on CentOS 6.4). Once this is done, the management node can log in to every cluster node without a password; a sketch follows.
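A minimal sketch of the key steps, assuming root logins and using my hostnames (repeat the copy step for cloud100 through cloud107):

# On the management node: generate a key pair (accept the default prompts)
ssh-keygen -t rsa
# Copy the public key to each cluster node
ssh-copy-id root@cloud100
# Verify that login now works without a password
ssh root@cloud100 hostname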
2. If Hadoop-related services were installed on the machines before, and especially if the HBASE_HOME environment variable was set for HBase, you need to unset it, because this environment variable will interfere. In my case I had put these paths in /etc/profile, which broke HBase: the paths Ambari installs to may not be the same as the ones you installed to earlier. The cleanup looks like the sketch below.
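A sketch of the cleanup on each affected machine (the export line shown is only an example of what to look for):

# Check whether a stale value is set
echo $HBASE_HOME
# Unset it in the current shell
unset HBASE_HOME
# Also delete any line like the following from /etc/profile so it does not come back:
#   export HBASE_HOME=/usr/lib/hbase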
3. On the service selection page, the NameNode and SNameNode (secondary NameNode) need to be deployed together. I tried to do HA and separate them, but the SNameNode never came up, causing the whole startup to fail; I will have to spend more time on HA next time.
4. Similarly, where the JobTracker and NameNode are placed can also cause startup to fail.
5. The number of DataNode nodes cannot be less than the block replication factor, which basically needs to be >= 3. You can check the configured factor as sketched below.
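A quick way to check this, assuming the configuration path listed above; the HDFS path in the second command is only a placeholder:

# Show the configured replication factor (defaults to 3 if the property is absent)
grep -A1 dfs.replication /etc/hadoop/hdfs-site.xml
# Or check the actual replication of a file already in HDFS (replace the path)
hadoop fs -stat %r /path/to/some/file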
6. On the Confirm Hosts step, pay attention to the warning information and clear up all the warnings; some warnings will cause installation errors.
7. Remember the users created during the installation; you will need them afterwards.
8. I deployed Hive and the HBase Master on the same node; of course you can also separate them. Once everything is set, start the installation.
9. If the installation fails, here is how to reinstall.
First, remove the directories and files already installed on the system:
SH file_cp.sh cmd "rm-rf/usr/lib/hadoop && rm-rf/usr/lib/hbase && rm-rf/usr/lib/zookeeper"
SH file_cp.sh cmd "rm-rf/etc/hadoop && rm-rf/etc/hbase && rm-rf/hadoop && rm-rf/var/log/hadoop"
SH file_cp.sh cmd "rm-rf/etc/ganglia && rm-rf/etc/hcatalog && rm-rf/etc/hive && rm-rf/etc/ Nagios && rm-rf/etc/sqoop && rm-rf/var/log/hbase && rm-rf/var/log/nagios && rm-rf/var/ Log/hive && rm-rf/var/log/zookeeper && rm-rf/var/run/hadoop && rm-rf/var/run/hbase && Rm-rf/var/run/zookeeper "
Then remove the installed packages via yum:
SH file_cp.sh cmd "yum-y remove ambari-log4j hadoop hadoop-lzo hbase hive libconfuse nagios sqoop Zookeeper"
I use my own shell script to execute commands across multiple machines:
https://github.com/xinqiyang/opshell/tree/master/hadoop
Then reset the Ambari server:
ambari-server stop
ambari-server reset
ambari-server start
10. Pay attention to time synchronization; clock skew will cause RegionServer failures. A minimal NTP sketch follows.
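A sketch for every node (CentOS 6; pool.ntp.org is just one choice of time server):

# Install NTP and keep it running across reboots
yum -y install ntp
service ntpd start
chkconfig ntpd on
# One-off sync if clocks have already drifted (ntpd must be stopped while ntpdate runs)
service ntpd stop && ntpdate pool.ntp.org && service ntpd start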
iptables needs to be shut down. Sometimes the machines get rebooted, so it is not enough to stop the service; you also need to disable it with chkconfig.
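On CentOS 6 that means, on every node:

# Stop the firewall now, and keep it off after reboots
service iptables stop
chkconfig iptables off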
After the installation finally completes, log in at the following address to view the service status:
http://<management node IP>:8080, in my case http://192.168.1.108:8080/login. Log in with the account and password you entered when setting up ambari-server before the installation.
View Ganglia monitoring
View Nagios Monitoring
Test
After the installation completes and everything looks normal, do you still need to verify it? The installer basically ran its smoke tests and everything came up normal, but we should still exercise the services ourselves.
Verifying HDFS
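A command-line sketch of such a check; /tmp/ambari_test is an arbitrary example path:

# Create a directory in HDFS, upload a local file, and read it back
hadoop fs -mkdir /tmp/ambari_test
hadoop fs -put /etc/hosts /tmp/ambari_test/
hadoop fs -ls /tmp/ambari_test
hadoop fs -cat /tmp/ambari_test/hosts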
Verifying Map/Reduce
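A sketch using the bundled examples jar; the /usr/lib/hadoop/hadoop-examples.jar location is my assumption for the Hortonworks RPM layout, and the input is the directory uploaded in the HDFS check:

# Run the wordcount example and inspect the output
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount /tmp/ambari_test /tmp/ambari_test_out
hadoop fs -cat /tmp/ambari_test_out/part-*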
Verifying HBase
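A sketch via the HBase shell; the table name and column family are arbitrary examples:

# Create a table, write a cell, read it back, then clean up
hbase shell <<'EOF'
create 'ambari_test', 'cf'
put 'ambari_test', 'row1', 'cf:a', 'value1'
scan 'ambari_test'
disable 'ambari_test'
drop 'ambari_test'
EOF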
Verifying Hive
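A sketch via the Hive CLI; the table name is an arbitrary example, and the SELECT also exercises Map/Reduce since it compiles to a job:

# Create a table, load a local file, query it, then clean up
hive -e "CREATE TABLE ambari_test (line STRING);
LOAD DATA LOCAL INPATH '/etc/hosts' INTO TABLE ambari_test;
SELECT COUNT(*) FROM ambari_test;
DROP TABLE ambari_test;"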
Summary
At this point the relevant configuration of Hadoop, HBase and Hive is all done, and some stress tests will follow. One more note on testing: Ambari uses the Hortonworks-packaged RPM builds of the Hadoop-related sources, so there may be some differences from other versions, but as a development environment this has no great impact. I have not used it in production, so I cannot say how stable it is; as I run into bugs during development projects I will list them. Generally speaking, Ambari is worth using: after all, it cuts out a lot of unnecessary configuration time, and compared with a standalone setup, a cluster environment closer to production lets you do performance testing and tuning. The Ganglia and Nagios monitoring it configures also lets us see the cluster's data. On the whole I recommend it; new things inevitably have bugs, but they will keep improving as we use them. Next, if I have time, I will extend the Ambari server's functionality to add monitoring options for commonly used high-performance modules such as Redis and Nginx. In short, welcome to use Ambari.
Original link: http://www.uml.org.cn/sjjm/201305244.asp