Apache BigTop trial

Tags: hadoop, ecosystem

Bigtop is a tool launched last year by the Apache Foundation to package, distribute, and test Hadoop and its surrounding ecosystem. It has not been out for long, and the official documentation is very thin: it only tells you how to use Bigtop to install Hadoop. In my experience Bigtop is an interesting toy of little practical value. For companies and individuals focused on Hadoop itself it is a very pretty thing to look at, but whether to actually deploy with it is highly debatable.


The name Bigtop refers to the big tent of a circus, since the entire Apache Hadoop ecosystem is made up of projects with animal logos. Cloudera's intent in submitting this software to Apache is therefore clear: it wants to use it to unify the Hadoop ecosystem. In short, Cloudera is ambitious.


I did two things before playing with it: I read the official documentation and searched for Chinese articles. Unfortunately, the official documentation only describes how to add the official Apache Bigtop repo as a Hadoop installation source for your operating system; it does not explain how the packaging and distribution actually work. I only found a few articles introducing how to install Hadoop through Bigtop, with essentially no reference value, so I had to study it myself.


If you only read the official documentation, you will think this is just an official Apache source (yum and apt repos) for installing the Hadoop ecosystem. In fact the software has more to it than that: its most important function is not handing you a repo file, but letting you package the Apache Hadoop ecosystem into installation packages and build a repository from them. That way you can turn the official Apache Hadoop ecosystem into your own packages for distribution, with the compatibility between those ecosystem projects and the Hadoop release resolved automatically by Bigtop. Cloudera's ambition lies precisely in this automatic resolution.
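The repository-building half of that workflow is plain RPM tooling. As a sketch of what "package and create a repository source" means in practice (all paths and hostnames below are illustrative assumptions, not Bigtop defaults):

```shell
# Collect the RPMs Bigtop produced and index them as a yum repository.
mkdir -p /var/www/html/bigtop-repo
cp output/hadoop/*.rpm /var/www/html/bigtop-repo/
createrepo /var/www/html/bigtop-repo

# Client machines then point a .repo file at it:
cat > /etc/yum.repos.d/my-bigtop.repo <<'EOF'
[my-bigtop]
name=My Bigtop-built Hadoop packages
baseurl=http://repo.example.com/bigtop-repo
enabled=1
gpgcheck=0
EOF
```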


Can you build your own Hadoop release and start a company to become the next Cloudera? I don't think you should even consider it. During Bigtop's packaging process, besides downloading jar files from Maven repositories, it crucially also downloads many jar packages from Cloudera; what you end up with is an RPM package mixing Cloudera and Apache artifacts. This, I think, is where Cloudera's ambition shows; Hortonworks and MapR get no such mention. And although the project is open source, some of what it pulls in is closed source: God knows what those closed-source jars are doing, and nobody has verified their performance or stability. So I consider this a toy. Play with it, sure; but what happens if you put its output into a production environment? Only God knows. Of course, it is open source, so you can modify the pom files to replace the Cloudera artifacts with your own, but I suspect that is harder than patching the Hadoop source code directly.


I will describe how to use it on CentOS. According to the official documentation, you must resolve the following dependencies yourself:

Building Bigtop requires the following tools:

  • Java JDK 1.6

  • Apache Ant

  • Apache Maven

  • wget

  • tar

  • git

  • subversion

  • gcc

  • gcc-c++

  • make

  • fuse

  • protobuf-compiler

  • autoconf

  • automake

  • libtool

  • sharutils

  • asciidoc

  • xmlto

On RPM-based systems one also needs

  • lzo-devel

  • zlib-devel

  • fuse-devel

  • openssl-devel

  • python-devel

  • libxml2-devel

  • libxslt-devel

  • cyrus-sasl-devel

  • sqlite-devel

  • mysql-devel

  • openldap-devel

  • rpm-build

  • createrepo

  • redhat-rpm-config (RedHat/CentOS only)
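Most of the list above can be pulled in with one yum command. The package names below are the CentOS 6 ones I would expect, so treat them as assumptions and adjust for your repos; Maven and a working Protocol Buffers compiler are not in the stock repos and must be installed by hand:

```shell
# Install the build dependencies available from the stock CentOS 6 repos.
# Maven and protobuf are NOT here and need a manual install; cmake is an
# undocumented dependency discovered during the build.
sudo yum install -y java-1.6.0-openjdk-devel ant wget tar git subversion \
    gcc gcc-c++ make fuse autoconf automake libtool sharutils asciidoc \
    xmlto lzo-devel zlib-devel fuse-devel openssl-devel python-devel \
    libxml2-devel libxslt-devel cyrus-sasl-devel sqlite-devel mysql-devel \
    openldap-devel rpm-build createrepo redhat-rpm-config cmake
```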


I played on CentOS 6, and then found that installing these packages with yum was only the first step of a long march. Maven and Protocol Buffers are not in yum and must be installed by hand. The documentation also omits a dependency on cmake. And you still need to export PATH, JAVA_HOME, LD_LIBRARY_PATH and so on yourself.
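For reference, the environment I ended up exporting looked roughly like this. Every path here is an assumption from my own machine; substitute wherever you actually installed the JDK, Maven, and protobuf:

```shell
# Hypothetical install locations -- substitute your own paths.
export JAVA_HOME=/usr/java/jdk1.6.0_45
export MAVEN_HOME=/opt/apache-maven-3.0.5
export PATH="$JAVA_HOME/bin:$MAVEN_HOME/bin:$PATH"
# Let the linker find the locally built protobuf shared libraries.
export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
```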


After installing the various dependencies, you can start building Hadoop. Bigtop 0.6.0 compiles and packages against the then-latest apache-hadoop-2.0.5-alpha source code.
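The build is driven by make targets from the top of the Bigtop source tree. The target naming follows the pattern described in the Bigtop README (`<component>-rpm`); the download URL is assumed from the Apache archive layout, and exact names may differ between releases:

```shell
# Fetch the Bigtop 0.6.0 source and build the Hadoop RPMs.
wget https://archive.apache.org/dist/bigtop/bigtop-0.6.0/bigtop-0.6.0.tar.gz
tar xzf bigtop-0.6.0.tar.gz
cd bigtop-0.6.0
# Each component has a <name>-rpm target; this downloads the
# hadoop-2.0.5-alpha sources and produces RPMs under the output/ tree.
make hadoop-rpm
```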


Normally you will hit errors during compilation: Maven will complain that the com.google.protobuf classes cannot be found. You then need to edit Hadoop's pom.xml and point the protobuf entry in the build configuration at the Protocol Buffers jar you compiled yourself; the protobuf jar that Maven downloads automatically appears to be faulty and unusable.
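One way to make the locally built jar visible to the build is to install it into the local Maven repository under the coordinates the Hadoop pom expects. The version 2.4.1 and the jar path below are assumptions for illustration; match whatever your pom.xml declares:

```shell
# Install the self-compiled protobuf-java jar into ~/.m2 so Maven
# resolves com.google.protobuf from it instead of the broken download.
mvn install:install-file \
    -DgroupId=com.google.protobuf \
    -DartifactId=protobuf-java \
    -Dversion=2.4.1 \
    -Dpackaging=jar \
    -Dfile=/path/to/protobuf-java-2.4.1.jar
```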


The build process itself is simple, but troubleshooting compilation failures is very tedious. Once the build finishes, the output looks familiar:

[root@localhost hadoop]# ll -h
total 77M
-rw-r--r-- 1 root root  12M Jul  6 16:33 hadoop-2.0.5-1.el6.src.rpm
-rw-r--r-- 1 root root  15M Jul  6 16:53 hadoop-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 7.6K Jul  6 16:53 hadoop-client-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  15K Jul  6 16:53 hadoop-conf-pseudo-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 139K Jul  6 16:53 hadoop-debuginfo-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.0M Jul  6 16:53 hadoop-doc-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  12M Jul  6 16:53 hadoop-hdfs-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.6K Jul  6 16:53 hadoop-hdfs-datanode-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  21K Jul  6 16:53 hadoop-hdfs-fuse-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.5K Jul  6 16:53 hadoop-hdfs-journalnode-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.6K Jul  6 16:53 hadoop-hdfs-namenode-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.6K Jul  6 16:53 hadoop-hdfs-secondarynamenode-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.6K Jul  6 16:53 hadoop-hdfs-zkfc-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  17M Jul  6 16:53 hadoop-httpfs-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  26K Jul  6 16:53 hadoop-libhdfs-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root  11M Jul  6 16:53 hadoop-mapreduce-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.6K Jul  6 16:53 hadoop-mapreduce-historyserver-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 8.8M Jul  6 16:53 hadoop-yarn-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.5K Jul  6 16:53 hadoop-yarn-nodemanager-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.4K Jul  6 16:53 hadoop-yarn-proxyserver-2.0.5-1.el6.x86_64.rpm
-rw-r--r-- 1 root root 4.5K Jul  6 16:53 hadoop-yarn-resourcemanager-2.0.5-1.el6.x86_64.rpm


The only difference from Cloudera's packages is that there is no cdh marker in the names. Besides 2.x I also tried compiling 1.x; by default that failed because the remote source file could not be found.


I also tried compiling other components. Building the official Mahout code worked: 70-odd MB of source produced over 100 MB of packages. Oozie's source is less than 1 MB, yet its output came to more than 60 MB.


Hive and Pig failed to compile outright and brought my 2 GB virtual machine to its knees, so again: this is basically a toy, and it is far too dependent on Cloudera. When installing the compiled packages with rpm, you need yum to resolve the dependencies, which are much larger than the packages themselves. When I tried rpm -ivh hadoop-2.0.5-1.el6.x86_64.rpm it told me dependencies were missing, and when I checked with yum, those dependencies plus their own dependencies added up to more than 150 MB of downloads, on top of the roughly 100 MB of dependencies I had already downloaded before compiling. If this were used in production, every server would need to download hundreds of megabytes of operating-system dependencies before deployment, which is awful.
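For what it's worth, letting yum chase that dependency chain instead of bare rpm saves the manual lookup, though not the download; a sketch on CentOS 6:

```shell
# yum localinstall accepts a local rpm file and pulls the missing
# dependency chain from the configured repos automatically.
sudo yum localinstall -y hadoop-2.0.5-1.el6.x86_64.rpm
```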


So my conclusion: this thing is fine to have fun with, and fine to study for its principles, but for a production environment I would rather install the Hadoop ecosystem from tar packages, painful as that is.

This article was first posted on the "Practice Tests Truth" blog; reproduction is declined.
