Learning Hadoop: Rapidly Deploying a Hadoop Big Data Environment with Ambari


Objective

I have been doing big-data backend development for a little over a year, and as the Hadoop community evolves I keep trying new things. This article focuses on Ambari, a new Apache project designed to let you quickly configure and deploy the components of the Hadoop ecosystem, and to provide maintenance and monitoring capabilities.

As a newcomer, let me share my own learning path. At the very beginning I did the obvious thing: Googled Hadoop, downloaded the relevant packages, installed a standalone version on my own virtual machine (CentOS 6.3), wrote a few test classes, ran CRUD tests, and ran a Map/Reduce job. At that stage I did not really understand Hadoop; I kept reading other people's articles to grasp the overall architecture, and what I actually did was edit the handful of configuration files under conf/ until Hadoop ran normally. After that stage I moved on to HBase, another product in the Hadoop ecosystem: again modify the configuration, start the services with start-all.sh and start-hbase.sh, then adjust my own programs and test. Along with HBase I learned ZooKeeper and Hive, and after that hands-on stage I began studying Hadoop 2.0, plus many articles by experts on CSDN, until I had a general picture of the ecosystem. The technology involved in the company's development work is just about this much. But as someone who likes to explore, I wanted to know more: how does it perform? How does it work in detail? In the slide decks from big companies (Taobao and the like) you see clusters of dozens, hundreds, even thousands of nodes. How do they manage them, and what is the performance like? Looking at the performance curves in those slides, could I learn more about tuning my own projects? I seem to have found the answer, and it is Ambari, a Hadoop-related project developed by Hortonworks.

Getting to know the Hadoop ecosystem

Keywords we now see all the time include HDFS, MapReduce, HBase, Hive, ZooKeeper, Pig, Sqoop, Oozie, Ganglia, Nagios, CDH3, CDH4, Flume, Scribe, Fluentd, HttpFS, and there are surely more. The Hadoop ecosystem is flourishing, and who is driving that prosperity? Anyone who has read Hadoop's history knows that it started at Yahoo, but today it is mainly maintained by two companies, Hortonworks and Cloudera, to which most of the committers belong. So there are now two major lines: the CDH series and the community edition. I first used the community edition, later switched to CDH3, and have now come back to the community edition because of Ambari. Of course, whichever you use, as long as you know the technology you can patch things up and keep them running normally; there is not much more to say about that. Having gotten all that chatter out of the way, let's start talking about installing Ambari.

Start deployment

First, get to know Ambari. The project address is: http://incubator.apache.org/ambari/

The installation documentation is at: http://incubator.apache.org/ambari/1.2.2/installing-hadoop-using-ambari/content/index.html

When installing, read the installation document, and read it carefully: match it against your system version and configure the appropriate repositories. The installation takes a relatively long time, so take every step of the document seriously. Below I describe some of the problems I ran into. (You can also refer to the post "Hadoop cluster monitoring tool Ambari installation".)

Here's my own installation process.

Machine Preparation:

My test environment uses 9 HP machines, cloud100 through cloud108, with cloud108 as the management node.

Paths used by the Ambari-installed environment:

Installation directory on each machine:

/usr/lib/hadoop

/usr/lib/hbase

/usr/lib/zookeeper

/usr/lib/hcatalog

/usr/lib/hive

Log paths; when you need to investigate errors, look for the relevant logs in these directories:

/var/log/hadoop

/var/log/hbase

Configuration file paths:

/etc/hadoop

/etc/hbase

/etc/hive

HDFS storage path:

/hadoop/hdfs

Points to note in the installation process:

1. Before installing, set up passwordless SSH login from the management node to every machine (you can refer to the illustrated guide to bidirectional SSH login configuration on CentOS 6.4). Once that is done, the management node can log in to every cluster node this way.
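As a rough sketch (the cloud100-cloud107 host names and the root user are taken from this article's environment; adjust them to yours), the passwordless-login setup looks like this:

```shell
# Run on the management node (cloud108): generate a key pair with no
# passphrase, then push the public key to every cluster node.
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for i in $(seq 100 107); do
  ssh-copy-id root@cloud$i
done
# Verify: this should print the remote host name without asking for a password.
ssh root@cloud100 hostname
```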

2. If a machine previously had Hadoop-related services installed, especially if HBase's HBASE_HOME environment variable is set, you need to unset it. This variable will interfere: I had put these paths into /etc/profile, which affected HBase, because the paths Ambari installs to may differ from the ones you installed to earlier.
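For example (a minimal sketch; the sed line is only illustrative and should be adapted to how your /etc/profile is actually written):

```shell
# Clear leftover variables from an earlier manual install in the current shell,
# so they do not override the paths Ambari uses.
unset HBASE_HOME
unset HADOOP_HOME
# If they were exported in /etc/profile, also delete those lines there
# (back the file up first), e.g.:
#   sed -i.bak '/HBASE_HOME/d; /HADOOP_HOME/d' /etc/profile
```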

3. On the service-selection page, the NameNode and the SNameNode (secondary NameNode) need to be deployed together. I tried to do HA and separate them, but the SNameNode never came up, which caused the whole startup to fail; I will need to spend more time on HA later.

  

(Screenshot: 3.png)

4. The same applies to the JobTracker and the NameNode: separating them can also cause the startup to fail.

5. The number of DataNodes cannot be less than the block replication factor; you basically need >= 3.

  

(Screenshot: 4.png)

6. On the Confirm Hosts step, pay attention to the Warning messages and resolve all of them; some Warnings will cause installation errors.

7. Make a note of the users created during the installation; you will need them later.

  

(Screenshot: 5.png)

8. I deployed Hive and the HBase Master on the same node; of course you can also separate them. Once everything is set, start the installation.

  

(Screenshot: 6.png)

9. If the installation fails, here is how to reinstall.

First, remove the directories and files already installed on the system:

sh file_cp.sh cmd "rm -rf /usr/lib/hadoop && rm -rf /usr/lib/hbase && rm -rf /usr/lib/zookeeper"

sh file_cp.sh cmd "rm -rf /etc/hadoop && rm -rf /etc/hbase && rm -rf /hadoop && rm -rf /var/log/hadoop"

sh file_cp.sh cmd "rm -rf /etc/ganglia && rm -rf /etc/hcatalog && rm -rf /etc/hive && rm -rf /etc/nagios && rm -rf /etc/sqoop && rm -rf /var/log/hbase && rm -rf /var/log/nagios && rm -rf /var/log/hive && rm -rf /var/log/zookeeper && rm -rf /var/run/hadoop && rm -rf /var/run/hbase && rm -rf /var/run/zookeeper"

Then remove the installed packages via yum:

sh file_cp.sh cmd "yum -y remove ambari-log4j hadoop hadoop-lzo hbase hive libconfuse nagios sqoop zookeeper"

I use my own shell script to execute commands across multiple machines:

Https://github.com/xinqiyang/opshell/tree/master/hadoop

Then reset the Ambari server:

ambari-server stop

ambari-server reset

ambari-server start

10. Pay attention to time synchronization; clock skew between nodes will cause the HBase RegionServer to fail.

  

(Screenshot: 7.png)
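A minimal sketch of keeping the clocks in sync on CentOS 6 (pool.ntp.org below is just a placeholder; point it at your own NTP server if you have one):

```shell
service ntpd stop
ntpdate pool.ntp.org     # one-off synchronization while ntpd is stopped
service ntpd start
chkconfig ntpd on        # keep ntpd enabled across reboots
```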

iptables needs to be shut down. Since the machines will sometimes reboot, it is not enough to stop the service; you also need to disable it with chkconfig.
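Concretely, on CentOS 6:

```shell
service iptables stop      # takes effect immediately
chkconfig iptables off     # keeps the firewall off after a reboot
```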

After the installation finally completes, log in to check the state of the services at:

http://<management node ip>:8080 — in my case http://192.168.1.108:8080/login — using the account and password you entered when setting up ambari-server before the installation.

  

(Screenshot: 8.png)

View Ganglia monitoring

  

(Screenshot: 9.png)

View Nagios Monitoring

  

(Screenshot: 10.png)

Test

With the installation complete and everything looking normal, do you still need to verify it? The installer already ran a smoke test after each service came up and everything passed, but we should still exercise the cluster ourselves.

Verifying HDFS

  

(Screenshot: 11.png)
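Beyond the UI check, a quick command-line smoke test looks roughly like this (the paths are illustrative; on the Hadoop 1.x line deployed here, recursive delete is -rmr):

```shell
echo "hello ambari" > /tmp/test.txt
hadoop fs -mkdir /tmp/ambari-test            # create a test directory in HDFS
hadoop fs -put /tmp/test.txt /tmp/ambari-test/
hadoop fs -cat /tmp/ambari-test/test.txt     # read the file back
hadoop fs -rmr /tmp/ambari-test              # clean up
```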

Verifying Map/Reduce

  

(Screenshot: 12.png)
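For Map/Reduce, the bundled word-count example works as a smoke test. The examples-jar path below is an assumption based on the /usr/lib/hadoop install directory listed earlier; check the exact file name on your machines:

```shell
hadoop fs -put /etc/hosts /tmp/wc-input      # any small text file will do
hadoop jar /usr/lib/hadoop/hadoop-examples.jar wordcount /tmp/wc-input /tmp/wc-output
hadoop fs -cat /tmp/wc-output/part-r-00000   # word counts for the input file
```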

Verifying HBase

  

(Screenshot: 13.png)
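For HBase, a create/put/scan/drop round trip in the shell is enough (the table and column-family names here are made up for the test):

```shell
hbase shell <<'EOF'
create 'ambari_test', 'cf'
put 'ambari_test', 'row1', 'cf:greeting', 'hello'
scan 'ambari_test'
disable 'ambari_test'
drop 'ambari_test'
EOF
```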

Verifying Hive

  

(Screenshot: 14.png)
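For Hive, creating, listing, and dropping a throwaway table exercises the metastore (the table name is illustrative):

```shell
hive -e "CREATE TABLE ambari_test (id INT, name STRING);"
hive -e "SHOW TABLES;"
hive -e "DROP TABLE ambari_test;"
```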

Summary

At this point the configuration of Hadoop, HBase, and Hive is all in place, to be followed by stress tests and other testing. Ambari uses the RPM builds of the Hadoop-related sources packaged by Hortonworks, so there may be some differences from other versions, but as a development environment that has little impact. I have not used it in production, so I cannot say how stable it is; as I develop projects on it I will list the bugs I run into. On the whole, Ambari is worth using: it saves a lot of unnecessary configuration time, and compared with a standalone setup, a cluster environment closer to production lets you do performance testing and tuning. The Ganglia and Nagios monitoring it configures also lets us see cluster metrics at a glance. Overall I recommend it; new things inevitably have bugs, but they will keep improving as we use them. Next, time permitting, I plan to extend the Ambari server to add monitoring options for commonly used high-performance components such as Redis and Nginx. In short, welcome to Ambari.

Original link: http://www.uml.org.cn/sjjm/201305244.asp
