Managing and Using an Integrated BigInsights Cluster Based on Cloudera


This article first briefly introduces the background of integrating BigInsights with Cloudera, then describes the system architecture of a BigInsights cluster built on Cloudera, and then presents two ways to perform the integration on Cloudera. Finally, it explains how to manage and use the integrated system.

Cloudera and IBM are leading providers of big data platform software and services. In April 2012, the two companies announced a strategic partnership in this field. Cloudera provides a complete Hadoop distribution and enhances its scalability, stability, and performance. InfoSphere BigInsights, built on Hadoop, offers a rich set of big data analysis solutions, tools, and software. Deploying BigInsights on a CDH cluster lets users combine the strengths of both products and get the most value from them.

Introduction to BigInsights on CDH3

Background

Cloudera is a company that provides Hadoop software and services. It distributes CDH (Cloudera's Distribution including Apache Hadoop), a package containing Hadoop and related open-source software. CDH keeps Hadoop's core features, distributed computing and highly scalable storage, and adds enterprise-class features such as security and high availability. Cloudera also offers Cloudera Manager, software that automates the deployment of Hadoop clusters and manages cluster services and configuration.

InfoSphere BigInsights is IBM's big data management and analysis platform, built on Hadoop. BigInsights maintains IBM's own Hadoop version and improves on it in job scheduling, the MapReduce computing framework, and the distributed file system. BigInsights also provides a wide range of software and technologies, including visual data querying, text analysis, and cluster management. BigInsights resembles CDH3 in many respects, but there are also many differences. Compared with the Apache Hadoop ecosystem and its cluster management software, BigInsights offers a large number of industry-leading data analysis tools, extends existing open-source technology, and is more oriented toward enterprise applications than Cloudera. Table 1 gives a detailed comparison.

Table 1. Comparison of CDH3 and BigInsights features

Features compared (BigInsights vs. CDH3): cluster management, file management, Eclipse development environment, cluster monitoring, text analysis tools, visual data analysis tools, and integration tools.

Some customers have already deployed a Cloudera Hadoop system, stored their data in HDFS, and deployed applications and higher-level software on it. Deploying BigInsights on the CDH cluster so that it runs there, without disrupting these existing systems, lets customers take full advantage of BigInsights' data analysis capabilities, so that the combination is worth more than the sum of its parts. BigInsights has supported CDH3u3 since BigInsights 1.4 Enterprise Edition, and BigInsights 2.0 later added support for CDH3u4 and CDH3u5. Cloudera has since released CDH4, but because that release is still in beta, its stability and reliability do not yet meet the requirements of enterprise applications, so BigInsights does not support it yet.
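Before deploying, it can help to check the target cluster's CDH release against the combinations named above. The following Python sketch encodes only the support statements in this section; the table and function names are hypothetical, not part of any BigInsights tool, and because the text does not say whether BigInsights 2.0 still accepts CDH3u3, that pairing is deliberately left out.

```python
# Hypothetical pre-flight check. It records only the support statements in
# this article; whether BigInsights 2.0 also keeps CDH3u3 support is not
# stated, so that pairing is intentionally omitted.
SUPPORTED_CDH_RELEASES = {
    "1.4": {"cdh3u3"},            # BigInsights 1.4 Enterprise Edition
    "2.0": {"cdh3u4", "cdh3u5"},  # support announced with BigInsights 2.0
}


def is_supported(biginsights_version: str, cdh_release: str) -> bool:
    """Return True if the article lists this CDH release as supported."""
    supported = SUPPORTED_CDH_RELEASES.get(biginsights_version, set())
    return cdh_release.lower() in supported


if __name__ == "__main__":
    print(is_supported("1.4", "CDH3u3"))  # True
    print(is_supported("2.0", "cdh4"))    # False: CDH4 was still in beta
```

Matching is case-insensitive, so "CDH3u5" and "cdh3u5" are treated alike; an unknown BigInsights version simply reports no supported releases.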

System architecture

InfoSphere BigInsights and Cloudera CDH3 each contain a large amount of software and tools, including the core Hadoop system and Hadoop-based software for data management and analysis. Table 2 lists the components included in the BigInsights and CDH3 distributions.

Table 2. BigInsights and CDH3 component list

Component | BigInsights 2.0 | CDH3u3 | CDH3u4 | CDH3u5 | Overview
MapReduce | 1.0 | 0.20.2 | 0.20.2 | 0.20.2 | MapReduce computing framework
HDFS | 1.0.3 | 0.20.2 | 0.20.2 | 0.20.2 | Hadoop Distributed File System
HBase | 1.0.3 | 0.90.4 | 0.90.6 | 0.90.6 | Distributed column-oriented database
ZooKeeper | 3.4.3 | 3.3.4 | 3.3.5 | 3.3.5 | Distributed configuration and coordination service
Flume | 0.9.4 | 0.9.4 | 0.9.4/1.1.0 | 0.9.4/1.2.0 | Distributed log collection service
Hive | 0.9.0 | 0.7.1 | 0.7.1 | 0.7.1 | SQL-like data warehouse
Oozie | 3.2.0 | 2.3.2 | 2.3.2 | 2.3.2 | Job workflow management/coordination system
Pig | 0.10.0 | 0.8.1 | 0.8.1 | 0.8.1 | Hadoop-based data query language
Lucene | 3.3.0 | - | - | - | Java full-text search engine library
BigSheets | 2.0 | - | - | - | Web-based visual data query/analysis tool
Orchestrator | 2.0 | - | - | - | MapReduce job workflow management/coordination system
Jaql | 2.0 | - | - | - | JSON-based distributed data query language
Jaql Server | 2.0 | - | - | - | REST service for processing Jaql queries
Eclipse Tooling | 2.0 | - | - | - | Eclipse development plug-in (covering MapReduce, Hive, HBase, Pig, etc.)
Text Analytics (SystemT) | 2.0 | - | - | - | Text analysis tool
Sqoop | 1.4.1 | 1.3.0 | 1.3.0 | 1.3.0 | Data transfer tool
Mahout | - | 0.5 | 0.5 | 0.5 | Hadoop-based machine learning library
Whirr | - | 0.5.0 | 0.5.0 | 0.5.0 | Cluster service management

As Table 2 shows, many components exist in both products. When the two are integrated, the Hadoop (HDFS and MapReduce), HBase, ZooKeeper, and Flume components of CDH3 replace the corresponding BigInsights components. For other open-source components, such as Hive, Oozie, and Pig, BigInsights still installs IBM's versions; these run on CDH3 Hadoop without causing any conflict. IBM-specific components, such as the Web Console, Eclipse Tooling, and SystemT, are likewise installed and run on the CDH3 Hadoop cluster. In this way BigInsights ensures good platform compatibility and interoperability with CDH3, letting users benefit from BigInsights features without migrating data or services.

The integration of BigInsights and CDH3 follows these principles:
1. The BigInsights and CDH3 deployments remain relatively independent; the integration does not affect the use of any existing CDH3 software or services.
2. BigInsights does not modify any existing CDH3 configuration.
3. All BigInsights operations are submitted to the CDH3 Hadoop system for execution.
4. Apart from a small number of disabled management features, all features work normally.
5. Integration is supported both with CDH3 configured manually from packages and with CDH3 deployed by Cloudera Manager.
6. Compatibility with Oracle Java is ensured.
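Principle 2 can be checked empirically by fingerprinting the CDH3 configuration directory before and after installing BigInsights and comparing the results. The following Python sketch is a hypothetical helper, not a BigInsights or Cloudera tool; the configuration directory is passed in by the caller, because its location depends on how CDH3 was installed.

```python
import hashlib
from pathlib import Path


def snapshot(conf_dir):
    """Map each file under conf_dir (path relative to conf_dir) to its SHA-256 digest."""
    root = Path(conf_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(before, after):
    """Return sorted lists of (added, removed, changed) files between two snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(name for name in before.keys() & after.keys()
                     if before[name] != after[name])
    return added, removed, changed
```

In use, one would call `snapshot()` on the Hadoop configuration directory before the installation, call it again afterward, and expect `diff_snapshots()` to return three empty lists if the existing configuration was left untouched.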

When BigInsights is deployed on an existing CDH3 cluster, the software stack is organized as shown in Figure 1.

Figure 1. BigInsights and CDH3 components

As Figure 1 shows, BigInsights incorporates the existing CDH3 components, such as HDFS, MapReduce, and ZooKeeper, into the BigInsights software stack, so that they work on the same platform as the other BigInsights components.
