HBase is an open source implementation of BigTable. As its use has spread, more and more enterprises are applying it in systems that handle massive amounts of data. This article briefly introduces the basics of Apache HBase and then expands on IBM's enhancements and extensions to HBase, multi-node high-availability support for the HBase Master, and how to use IBM BigInsights to monitor and manage HBase services and job submission in an IBM Hadoop cluster. It is intended to help readers who run big data and cloud computing applications on Hadoop clusters use HBase to store, query, and optimize massive data more efficiently, intuitively, and easily.
The basics of Apache HBase
In November 2006, Google published a paper entitled "BigTable". In February 2007, Hadoop developers implemented it and named the result HBase. HBase is a column-oriented data storage architecture built on Hadoop. It addresses big data problems and serves as Hadoop's distributed database.
HBase is now quite mature, and the latest stable version at the time of writing is 0.94.x. HBase has been adopted by many large companies, such as Facebook, Twitter, Adobe, Cloudera, and IBM. HBase is not built on a traditional RDBMS. It is a data store with its own on-disk storage format, and its advantage is fast access by specific columns and by sequential ranges of row keys.
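The fast access by sequential key range described above comes from keeping rows sorted by key. The following is a minimal sketch of that idea in plain Python, not HBase code: the class name SortedTable, its methods, and the sample row keys are all hypothetical and exist only for illustration.

```python
import bisect


class SortedTable:
    """Toy in-memory table that keeps rows sorted by row key (illustration only)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        # Insert the key at its sorted position the first time we see it.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key, column):
        # Access a single column of a single row.
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start_key, stop_key):
        """Yield (row_key, columns) for start_key <= row_key < stop_key."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


table = SortedTable()
table.put("user#003", "info:name", "Carol")
table.put("user#001", "info:name", "Alice")
table.put("user#002", "info:name", "Bob")
table.put("video#001", "info:title", "Intro")

# "$" sorts immediately after "#", so this range covers every "user#..." key.
user_rows = [key for key, _ in table.scan("user#", "user$")]
```

Because the keys are sorted, a range scan only needs two binary searches and a contiguous slice, which is why choosing row keys so that related rows sort next to each other matters.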
HBase has three important components: a client library, a master server (which can be configured with multiple standby masters, as described later in this article), and multiple region servers. The master is responsible for assigning regions to the region servers, and the region servers are responsible for storing the actual data. HBase also relies on ZooKeeper, a reliable, highly available, consistent distributed coordination service, to help these components carry out their tasks. HBase cluster administrators can adjust to the workload by adding and removing region server nodes while the system is running. HBase uses HFile as the basic format for storing data, and the underlying file system defaults to HDFS.
Figure 1. HBase basic architecture
Figure 1 shows how different components, such as HDFS and ZooKeeper, work in coordination with HBase. The master server balances the load of regions across the region servers: it relieves busy region servers by moving regions to less loaded ones.
The HBase Master is not responsible for actual data storage. It coordinates load balancing, maintains cluster state, and handles schema changes and metadata operations, such as creating tables and column families, but it never serves data itself.
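The balancing role of the master can be pictured with a small sketch. This is a simplified model that only counts regions per server and is not the balancer HBase actually ships. The function name rebalance and the server and region names are hypothetical.

```python
def rebalance(assignments):
    """Move regions from the busiest server to the idlest one until the
    region counts differ by at most one.

    assignments: dict mapping server name -> list of region names
                 (modified in place).
    Returns the list of (region, source, target) moves that were made.
    """
    moves = []
    while True:
        busiest = max(assignments, key=lambda s: len(assignments[s]))
        idlest = min(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[busiest]) - len(assignments[idlest]) <= 1:
            return moves
        region = assignments[busiest].pop()
        assignments[idlest].append(region)
        moves.append((region, busiest, idlest))


# Hypothetical cluster: rs1 is overloaded, rs3 has just been added.
cluster = {
    "rs1": ["r1", "r2", "r3", "r4", "r5"],
    "rs2": ["r6"],
    "rs3": [],
}
moves = rebalance(cluster)
```

Note that only region assignments change hands in this model. That matches the description above: the master decides where regions live, while the region servers do the actual data serving.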
Region servers are responsible for loading and maintaining regions. This includes processing all read and write requests to the regions they manage, and splitting a region when it grows beyond the configured size threshold.
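Region splitting can be sketched as follows. This is a toy model under stated assumptions: a region is represented as a sorted list of row keys and the threshold is a row count, whereas HBase measures the size of the stored files. The function name split_if_needed and the parameter max_rows are hypothetical.

```python
def split_if_needed(region, max_rows):
    """Split a region at its middle key once it holds more than max_rows rows.

    region: sorted list of row keys held by the region.
    Returns a list with one region (no split) or two daughter regions.
    """
    if len(region) <= max_rows:
        return [region]
    mid = len(region) // 2
    # The key at index mid becomes the start key of the second daughter.
    return [region[:mid], region[mid:]]


small = split_if_needed(["a", "b", "c"], max_rows=4)
large = split_if_needed(["a", "b", "c", "d", "e", "f"], max_rows=4)
```

Each daughter covers a contiguous key range, so the sorted order of rows is preserved across the split and the daughters can later be assigned to different region servers.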
By communicating with ZooKeeper, the client learns which region server hosts the region it needs to read or write. It then communicates directly with that region server, which processes all the related requests.
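Once a client knows the start key of each region and the server hosting it, finding the right server for a row key is a lookup in a sorted list. The sketch below illustrates only that lookup step. It does not reproduce the real client protocol, and the class name RegionLocator, the region boundaries, and the server addresses are hypothetical.

```python
import bisect


class RegionLocator:
    """Maps a row key to the server hosting the region that contains it."""

    def __init__(self, regions):
        # regions: list of (start_key, server) pairs. The first region is
        # assumed to start at the empty string so that every key is covered.
        self._regions = sorted(regions)
        self._starts = [start for start, _ in self._regions]

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        index = bisect.bisect_right(self._starts, row_key) - 1
        return self._regions[index][1]


locator = RegionLocator([
    ("", "rs1.example.com:60020"),
    ("m", "rs2.example.com:60020"),
    ("t", "rs3.example.com:60020"),
])
server_for_apple = locator.locate("apple")
server_for_m = locator.locate("m")
server_for_zebra = locator.locate("zebra")
```

After this lookup the client talks to the region server directly, which is why the master stays out of the read and write path.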
HBase in the IBM BigInsights architecture
IBM's big data product, InfoSphere BigInsights, is a big data management and analysis platform. Its underlying architecture uses Hadoop and HBase to store and query both structured and unstructured data.
HBase in the BigInsights cluster software hierarchy
BigInsights integrates many existing open source Hadoop components, such as HDFS, MapReduce, HBase, and ZooKeeper, into its software stack, where they work together with other BigInsights components on the same platform. HBase serves as the BigInsights storage database, and ZooKeeper serves as the BigInsights coordination service. To use HBase, you must also install Hadoop and ZooKeeper, because HBase uses Hadoop as its file system and ZooKeeper for service coordination.
When BigInsights is deployed to a cluster, the software hierarchy is as shown in Figure 2:
Figure 2. BigInsights Hadoop open source component list
HBase Installation and Configuration
The BigInsights product integrates a version of HBase that is compiled with the IBM JDK and includes some improvements. During BigInsights installation, you can select and configure HBase through the installation interface: you specify the HBase installation path, the log directory, the HBase master and region server nodes in the cluster, and the service ports. By default, the HBase binary packages are installed and configured on all nodes of the BigInsights cluster, so every node can act as an HBase client once BigInsights is installed.
To install BigInsights, first run the start.sh script from the root directory of the extracted package, and then open http://your-server:8300/Install/ in a browser to start the setup wizard. The wizard leads you through the installation steps: installation type (select the multi-node distributed cluster installation), file system selection, group, user name, and SSH configuration, cluster node specification, component installation, security type selection, and so on. The specific HBase and ZooKeeper installation settings are given below.
Configuration item              Configuration information
HBase master servers            Node name of the HBase master, either IP or hostname
HBase region servers            Node name of the HBase region server, either IP or hostname
ZooKeeper mode                  Optional; shared or separate ZooKeeper installation mode
HBase root directory            Advanced setting; defaults to /hbase (configurable)
HBase Master port               60000 (configurable)
HBase Master UI port            60010 (configurable)
HBase Master JMX port           10101 (configurable)
HBase Region Server port        60020 (configurable)
HBase Region Server UI port     60030 (configurable)
HBase Region Server JMX port    10102 (configurable)
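Because every port listed above is configurable, it can help to check a planned set of values for clashes before entering them in the wizard, especially when the master and a region server run on the same node. The following is a small sketch of such a check. It is not part of BigInsights, and the dictionary keys and the function name find_port_conflicts are hypothetical. Only the default port numbers come from the list above.

```python
# Default ports as listed above; every value is configurable at install time.
DEFAULT_PORTS = {
    "master": 60000,
    "master_ui": 60010,
    "master_jmx": 10101,
    "region_server": 60020,
    "region_server_ui": 60030,
    "region_server_jmx": 10102,
}


def find_port_conflicts(ports):
    """Return {port: [service names]} for every port used by more than one service."""
    by_port = {}
    for service, port in ports.items():
        by_port.setdefault(port, []).append(service)
    return {port: names for port, names in by_port.items() if len(names) > 1}


# The defaults do not clash with each other.
default_conflicts = find_port_conflicts(DEFAULT_PORTS)

# A hypothetical override that reuses the Master UI port by mistake.
custom_ports = dict(DEFAULT_PORTS, region_server_ui=60010)
custom_conflicts = find_port_conflicts(custom_ports)
```

The check only compares the planned values with each other. It does not test whether a port is already in use by another process on the node.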
Figure 3. HBase Installation and Configuration
Figure 4. ZooKeeper Installation and Configuration
Finally, when the installation is complete, click Finish or run "start.sh shutdown" on the command line to close the installation wizard.
IBM's improvements and extensions to HBase
BigInsights provides a broad set of unified, IBM-specific HBase management features, available both through a user interface and on the command line. Users can start, stop, and view HBase clusters with simple interface operations or commands, without concern for the implementation details.
IBM also provides a unified user interface and commands for adding and removing nodes, which support the scalability of the HBase cluster.
In addition, the multi-node HBase Master feature ensures the high availability of HBase in BigInsights. These enhancements and extensions are described below.
HBase cluster management and monitoring in IBM BigInsights
A BigInsights cluster provides complete management functions for HBase: a unified HBase management user interface; command-line management; HBase service status monitoring; checking, synchronizing, adding, removing, starting, stopping, and viewing HBase services; a reverse proxy UI; viewing HBase data in HDFS; and HBase application submission.
Managing HBase service status through the web interface
Open the BigInsights web management console at http://<master node hostname or IP>:8080/data/html/index.html. The examples here assume that BigInsights was installed with all modules, including Hadoop, HBase, ZooKeeper, Oozie, and Flume. (Note: If you are using the BigInsights Basic edition, open the console at http://<master node hostname or IP>:8080/biginsights instead. The screenshots below are all based on the Enterprise edition, and the Basic edition differs slightly.)