Hypertable Application Practice: On Par with HBase


Hypertable is an open source, high-performance, scalable database whose data model is similar to Google's BigTable. BigTable lets users organize massive amounts of data by primary key and query it efficiently. Hypertable and HBase are the two open source implementations of BigTable: HBase is written mainly in Java, while Hypertable is written in C++ with Boost, and the two also differ in some design details.
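To make the data model concrete, here is a toy sketch (ours, not Hypertable's actual API) of the sparse, sorted, multi-dimensional map that BigTable-style systems expose, keyed by row, column, and timestamp:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// (row : string, column : string, timestamp : int64) -> value : string
using Timestamp = std::int64_t;
using Cell  = std::map<Timestamp, std::string, std::greater<Timestamp>>; // newest version first
using Row   = std::map<std::string, Cell>;  // column qualifier -> versions
using Table = std::map<std::string, Row>;   // row key -> row, kept sorted by key

int main() {
    Table t;
    t["com.example.www"]["anchor:home"][1234567890] = "Example home page";
    // Because rows are sorted by key, contiguous key intervals ("ranges" in
    // Hypertable, "tablets" in BigTable) are a natural unit of distribution.
    return 0;
}
```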

The Hypertable system consists mainly of the Hyperspace, Master, and Range Server components (shown in Figure 1). Hyperspace is a lock service, the equivalent of Google's Chubby, used mainly for synchronization, node failure detection, and storage of top-level location information. The Master handles task assignment, load balancing, and reconstruction after failures (for example, when a range server dies). The Range Server is Hypertable's real worker: it serves the data in its ranges, and it also takes part in recovery by replaying its local logs to restore the state it held before failing. Finally, there is a client-side component for accessing Hypertable.

Figure 1: Schematic diagram of Hypertable's original architecture
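The following interface sketch summarizes this division of labor; the type and method names are ours for illustration and do not correspond to Hypertable's real classes:

```cpp
#include <string>

struct Hyperspace {                  // lock service, comparable to Google's Chubby
    bool try_lock(const std::string& path);      // synchronization primitives
    bool is_alive(const std::string& server);    // node failure detection
    std::string root_location();                 // top-level location information
};

struct Master {                      // assigns ranges, balances load, drives recovery
    void assign_range(const std::string& range, const std::string& server);
    void on_server_failure(const std::string& server);
};

struct RangeServer {                 // the worker: serves reads/writes for its ranges
    void serve(const std::string& range);
    void replay_local_logs();        // restore pre-failure state after a crash
};
```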

Business Applications

At the SIGMOD 2011 conference, Facebook presented three applications built on Hadoop/HBase: Titan (Facebook Messages), Puma (Facebook Insights), and ODS (Facebook Internal Metrics). Titan is used mainly for storing user messages, Puma for MapReduce-style distributed computation, and ODS for storing the company's internal monitoring data. Facebook's HBase-based applications are similar to those at several of the largest internet companies in China.

Similar to ODS, we save hardware and software operating data to a database for software engineers or operations engineers to query. Queries may be large batches or individual records, deferred or immediate. The requirements for this type of business are summarized below:

  • Very large storage capacity, often reaching 10-100TB, on the order of billions to tens of billions of records.

  • Automatic scale-out, because the data growth pattern is hard to estimate and short bursts of explosive growth are possible.

  • High write throughput, more than 10,000 inserts per second.

  • Recently imported data must be quickly retrievable (see the row-key sketch below).

  • Scans over large amounts of historical data must be supported, for example for periodic checks or rollbacks.
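For a workload like this, the row-key design largely determines how cheap the last two requirements are. A hypothetical key layout (our assumption, not a schema from the original system) that keeps each time series contiguous might look like:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Prefixing the key with metric and host keeps one time series contiguous,
// so both "latest samples" lookups and large historical scans become
// ordered range scans over the sorted row space.
std::string make_row_key(const std::string& metric,
                         const std::string& host,
                         std::int64_t ts_seconds) {
    char buf[32];
    // Zero-padded timestamp keeps lexicographic order == chronological order.
    std::snprintf(buf, sizeof(buf), "%012lld", static_cast<long long>(ts_seconds));
    return metric + ":" + host + ":" + buf;
}
// e.g. make_row_key("cpu.idle", "host-042", 1300000000)
//   -> "cpu.idle:host-042:001300000000"
```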

One option is a traditional DBMS such as MySQL, but it has the following drawbacks. First, a single MySQL store has an upper limit: once it grows past roughly 1.5GB, performance starts to fluctuate. And although MySQL supports sharding, it is not truly distributed; because of the table-size limit, sharded MySQL copes poorly with irregular data growth, and if growth is bursty, considerable manual data migration is needed. MySQL also does not support dynamic schema changes for its tables. Another option is Hadoop itself, but MapReduce is not real-time computation, HDFS does not support random writes, and its random-read performance is poor.

In summary, we chose a BigTable-style system to support these business requirements, namely Hypertable on Hadoop (as shown in Figure 2).

Figure 2: Schematic diagram of monitoring data collection and query

High-Availability Improvements

Metadata Centralization

Challenge: In Hypertable and other BigTable-like systems, metadata is generally organized as a two-level B+ tree, largely for the sake of scale: this structure can theoretically store and index 2EB of user data. Indexing that much user data requires as much as 16TB of metadata, more than one machine can hold, so BigTable-like systems spread the metadata across different nodes for management; any node in the cluster may hold both user ranges and metadata ranges.

While this approach solves the scale problem, it brings management challenges. In particular, recovering a user range requires reading the metadata, so the ranges of the metadata table must be restored before the ranges of the user table. If multiple range servers fail simultaneously, this cross-node dependency is hard to handle, and other maintenance operations run into similar problems. In addition, because a single metadata range of about 200MB covers the location information of a very large amount of user data, the failure of any range server holding metadata can make all the user data covered by that metadata inaccessible. Spreading the metadata across several range servers therefore amounts to adding many single points of failure to the system and reduces its reliability.
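The dependency can be stated as a simple ordering rule, sketched below with illustrative names: root and metadata ranges must come back before any user range, because locating a user range requires reading its metadata entry.

```cpp
#include <algorithm>
#include <vector>

enum class RangeKind { Root, Metadata, User };
struct RangeInfo { RangeKind kind; /* location, log paths, ... */ };

void recover(std::vector<RangeInfo>& lost) {
    // Phase order: root, then metadata, then user ranges. When several range
    // servers fail at once, every phase may span multiple nodes, which is
    // exactly the cross-node dependency described above.
    std::stable_sort(lost.begin(), lost.end(),
                     [](const RangeInfo& a, const RangeInfo& b) {
                         return static_cast<int>(a.kind) < static_cast<int>(b.kind);
                     });
    // for (auto& r : lost) replay_and_reassign(r);  // hypothetical per-range recovery
}
```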

Solution: Following the principle of simplicity, we believe that separating the metadata from the user data and keeping it on a dedicated Meta Range Server is easier to operate. The only disadvantage of centralizing the metadata is the Meta Range Server's memory limit: the metadata that fits in 32GB of physical memory can theoretically index only petabytes of user data. However, given the number of machines an ordinary server room can hold, PB-scale capacity meets the needs of most companies.
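A back-of-the-envelope check of these figures, assuming about 200MB of user data per range and a per-range metadata entry of roughly 1.6KB (our inference from the 2EB/16TB ratio quoted earlier, not an official constant):

```cpp
#include <cstdio>

int main() {
    const double range_bytes = 200e6;                         // user data covered by one range
    const double meta_entry  = 16e12 / (2e18 / range_bytes);  // 16TB metadata over 2EB of data
    const double mem_bytes   = 32e9;                          // Meta Range Server memory
    const double user_bytes  = mem_bytes / meta_entry * range_bytes;
    std::printf("metadata per range:     %.0f bytes\n", meta_entry);        // ~1600
    std::printf("addressable user data:  %.1f PB\n", user_bytes / 1e15);    // ~4.0
    return 0;
}
```

The result, around 4PB, is consistent with the "petabytes of user data" figure above.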

Figure 3: Hypertable high-availability improvement architecture

Figure 3 shows the overall structure of Hypertable with centralized metadata management. The current implementation divides Hypertable's data servers (range servers) into two categories: Meta Range Servers and User Range Servers. A Meta Range Server manages only the ranges of the root table and the metadata table, while a User Range Server manages only the ranges of user tables. Because the Master carries a light load, the Meta Range Server is typically placed on the same node as the Master.

When the system starts, each range server learns its own type from its configuration file and reports that type when registering; the Master records the information for each range server. When the Master needs to assign a range to a range server (for example, on table creation or range splitting), it selects an appropriate server according to the type of the table the range belongs to: metadata ranges are assigned to the Meta Range Server, and user ranges are assigned to User Range Servers.
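A minimal sketch of this type-aware assignment (the identifiers are ours):

```cpp
#include <string>
#include <vector>

enum class ServerType { MetaRange, UserRange };
enum class TableType  { Root, Metadata, User };

struct Server { std::string name; ServerType type; };

// Master-side selection: root/metadata ranges may only land on the
// Meta Range Server; user ranges only on User Range Servers.
const Server* pick_server(TableType table, const std::vector<Server>& servers) {
    ServerType wanted = (table == TableType::User) ? ServerType::UserRange
                                                   : ServerType::MetaRange;
    for (const auto& s : servers)
        if (s.type == wanted) return &s;   // real code would also balance load
    return nullptr;                        // no eligible server registered
}
```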

Separation of Data from Log Storage

Challenge: When a range server in the Hypertable cluster fails (its process exits unexpectedly), the range server must be restarted and its service restored, which depends on the operation logs the range server recorded (commit log, split log, and so on). One of the most important features of a BigTable-style system (Hypertable/HBase) is automatic recovery, and automatic recovery relies on the operation log (commit log) actually being written to HDFS (sync); after a failure occurs, the system rebuilds a consistent state by replaying the log.
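Conceptually, recovery is a replay loop like the following sketch (the record format and names are ours; the real commit log is binary and more involved):

```cpp
#include <fstream>
#include <string>

struct MemTable {
    void apply(const std::string& record) { (void)record; /* insert into in-memory map */ }
};

void replay_commit_log(const std::string& path, MemTable& mem) {
    std::ifstream log(path);
    std::string record;
    while (std::getline(log, record))   // one durable mutation per line, say
        mem.apply(record);              // re-applying in order rebuilds the pre-failure state
}
```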

When we first ran Hypertable on Hadoop, Hadoop 0.18 did not yet support the append/sync feature. Even though current versions of Hadoop do support it, frequent syncs still hurt the system's write throughput. In addition, Hadoop's stability was not guaranteed at the time, and write failures occurred. If Hadoop has a problem, data that Hypertable has just written may be lost, and if that data is the log, the system's state cannot be restored on restart.

Solution: In general, all of the Hypertable system's storage, both table data and logs, sits on HDFS. Our improvement is to separate the two: table data stays on HDFS, while the operation logs are written to the range server's local disk, so that log durability no longer depends on HDFS append/sync.
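A minimal sketch of the idea, with assumed details: the commit log is appended to a local file and fsync'd before a write is acknowledged, so the log is durable even when HDFS misbehaves.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>

class LocalCommitLog {
    int fd_;
public:
    explicit LocalCommitLog(const char* path)
        : fd_(::open(path, O_WRONLY | O_CREAT | O_APPEND, 0644)) {}
    ~LocalCommitLog() { if (fd_ >= 0) ::close(fd_); }

    bool append(const std::string& record) {
        if (fd_ < 0) return false;
        if (::write(fd_, record.data(), record.size()) < 0) return false;
        return ::fsync(fd_) == 0;   // durable on local disk before the write is acknowledged
    }
};
```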
