from: http://m.csdn.net/article_pt.html?arcid=2823943
Apache HBase is a Hadoop-native database for online applications, which makes it a natural choice for applications that rely on Hadoop's scalability and flexibility for data processing.
In Hortonworks Data Platform (HDP, http://zh.hortonworks.com/hdp/) 2.2, HBase high availability has advanced to the point where applications running on HBase can achieve up to 99.99% uptime.
This article reviews the work of the past 12 months, shows how developers can benefit from the improved high availability of HBase, and discusses plans for future improvements.
A historical perspective on HBase high availability
High availability (HA) is a key property of any database and a prerequisite for any business-critical application.
Historically, HBase has relied on two mechanisms to keep data available:
First, HBase automatically partitions data into regions and distributes them across the nodes of the cluster. When a node goes offline or fails, only the data on that node is affected; data on other nodes remains available.
Second, all HBase data is physically stored in HDFS, which keeps 3 copies of the data on different nodes, so the data can be accessed from any node in the cluster.
This allows HBase to automatically reassign the data hosted by a failed node to a healthy node, preserving data availability.
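As a rough illustration of these two mechanisms, the sketch below models a cluster in which each region (key range) is hosted by exactly one server, and a failed server's regions are handed to the surviving servers. The round-robin policy and the names RegionAssignment, assign and reassign are hypothetical simplifications, not HBase's actual load balancer; the sketch also assumes, as described above, that region data lives in shared HDFS storage so that any live server can open it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each region (key range) is hosted by exactly one server.
public class RegionAssignment {

    // Spread regions over servers round-robin (a stand-in for a real balancer).
    static Map<String, List<String>> assign(List<String> regions, List<String> servers) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (String server : servers) plan.put(server, new ArrayList<>());
        for (int i = 0; i < regions.size(); i++) {
            plan.get(servers.get(i % servers.size())).add(regions.get(i));
        }
        return plan;
    }

    // On failure, only the failed server's regions move; every other assignment is kept.
    // Assumes region data sits in shared storage (HDFS), so any live server can open it.
    static Map<String, List<String>> reassign(Map<String, List<String>> plan, String failed) {
        Map<String, List<String>> next = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : plan.entrySet()) {
            if (!e.getKey().equals(failed)) next.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        if (next.isEmpty()) throw new IllegalStateException("no live servers left");
        List<String> live = new ArrayList<>(next.keySet());
        List<String> orphaned = plan.getOrDefault(failed, List.of());
        for (int i = 0; i < orphaned.size(); i++) {
            next.get(live.get(i % live.size())).add(orphaned.get(i));
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> regions = List.of("[a,f)", "[f,m)", "[m,s)", "[s,z]");
        Map<String, List<String>> plan = assign(regions, List.of("rs1", "rs2", "rs3"));
        System.out.println("before: " + plan);
        System.out.println("after rs2 fails: " + reassign(plan, "rs2"));
    }
}
```

Only the failed server's regions change hands; regions on healthy servers keep their assignment, which is the isolation property described in the first point.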
These built-in HA features, combined with Hadoop best practices, allow HBase-based applications to reach up to 99.9% availability, that is, less than 9 hours of total downtime per year.
This is sufficient for most applications, but business-critical applications require stronger availability guarantees.
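The downtime figures follow from simple arithmetic over a 365-day year (8,760 hours): 99.9% availability allows 0.1% of that, about 8.76 hours; 99.99% allows about 53 minutes; and 99.999% about 5 minutes. A small helper (DowntimeBudget is a hypothetical name used only for illustration) makes the conversion explicit:

```java
// Converts an availability target into the downtime it allows per year.
public class DowntimeBudget {

    // 365 days x 24 hours = 8,760 hours (leap years ignored).
    static final double HOURS_PER_YEAR = 365 * 24;

    static double downtimeHoursPerYear(double availabilityPercent) {
        return (1.0 - availabilityPercent / 100.0) * HOURS_PER_YEAR;
    }

    public static void main(String[] args) {
        for (double target : new double[] {99.9, 99.99, 99.999}) {
            double hours = downtimeHoursPerYear(target);
            System.out.printf("%s%% -> %.2f hours (%.1f minutes) per year%n", target, hours, hours * 60);
        }
    }
}
```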
The need for higher availability
We are in the early stages of a shift in which big data applications are being re-platformed onto Hadoop. As Hadoop's adoption and influence grow, it has become the ideal choice for applications that emphasize scalability or flexible data processing.
For online applications that want to benefit from Hadoop's ubiquity and rapid pace of innovation, HBase, as a member of the Hadoop ecosystem, is the natural choice of database.
When we talk with customers who want to move business-critical applications to HBase, we often hear the same feedback: they need HBase to provide consistent data, but they cannot tolerate even a brief period of downtime while the system recovers from a failure. For Hadoop to support business-critical online applications, HBase's high-availability features needed significant improvement.
Hortonworks, working with the HBase community, has greatly improved HBase high availability by introducing timeline-consistent region replicas (also known as HBase read high availability; see HBASE-10070, https://issues.apache.org/jira/browse/HBASE-10070).
At a high level, this feature keeps multiple copies of the same data, a primary replica and one or more standby replicas, on different RegionServers of the HBase cluster. With read high availability, if one RegionServer fails, users can still read the data it hosted from other RegionServers.
In other words, while the system recovers automatically, users lose only write availability for the failed node's data; they can still read it. HBase read high availability is therefore a good fit for applications that require continuous read availability with well-defined (timeline) read consistency.
Combined with best practices such as keeping two replicas and using rack-aware placement, HBase read high availability can raise the availability of business-critical applications built on HBase to 99.99%.
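To see why rack-aware placement matters, the following hypothetical sketch places each region's replicas on servers in different racks and then checks that losing any single rack still leaves a readable copy of every region. The server and rack names, the class RackAwarePlacement and its placement rule are illustrative assumptions; HBase's own balancer applies its own logic when spreading region replicas.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: put each region's replicas on servers in distinct racks.
public class RackAwarePlacement {

    record Server(String name, String rack) {}

    // Returns, for each region, its hosting servers: index 0 is the primary, 1.. are replicas.
    static Map<String, List<Server>> place(List<String> regions, List<Server> servers, int replication) {
        // Group servers by rack so consecutive replicas can be taken from different racks.
        Map<String, List<Server>> byRack = new TreeMap<>();
        for (Server s : servers) byRack.computeIfAbsent(s.rack(), r -> new ArrayList<>()).add(s);
        List<String> racks = new ArrayList<>(byRack.keySet());
        if (replication > racks.size()) throw new IllegalArgumentException("need one rack per replica");

        Map<String, List<Server>> placement = new LinkedHashMap<>();
        for (int i = 0; i < regions.size(); i++) {
            List<Server> hosts = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                // (i + r) % racks.size() yields a different rack for each replica of region i.
                List<Server> rackServers = byRack.get(racks.get((i + r) % racks.size()));
                hosts.add(rackServers.get(i % rackServers.size()));
            }
            placement.put(regions.get(i), hosts);
        }
        return placement;
    }

    // True if every region keeps at least one replica outside the failed rack.
    static boolean survivesRackFailure(Map<String, List<Server>> placement, String failedRack) {
        for (List<Server> hosts : placement.values()) {
            if (hosts.stream().noneMatch(s -> !s.rack().equals(failedRack))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Server> servers = List.of(
            new Server("rs1", "rackA"), new Server("rs2", "rackA"),
            new Server("rs3", "rackB"), new Server("rs4", "rackB"));
        Map<String, List<Server>> placement = place(List.of("r1", "r2", "r3"), servers, 2);
        for (String rack : List.of("rackA", "rackB")) {
            System.out.println(rack + " down -> still readable: " + survivesRackFailure(placement, rack));
        }
    }
}
```

With two replicas on two racks, either rack can fail and every region still has one readable copy; putting both replicas in the same rack would lose that guarantee.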
What is timeline consistency?
Traditionally, HBase assigns each key range (region) to exactly one RegionServer, which serves all reads and writes for it. On the plus side, this single-owner design makes consistency simple to implement: there is no split brain, no need for last-write-wins conflict resolution, and important features such as counters are fast and easy to implement.
On the downside, if a RegionServer goes down, all the key ranges it hosts are unavailable until recovery is complete.
In HBase 0.96, recovery was optimized to take less than a minute, but some availability is still sacrificed to guarantee strong consistency. According to the CAP theorem, a distributed system must trade off consistency against availability when a network partition occurs; no system can always guarantee both.
Many modern databases optimize for availability by adopting a pure AP model, giving up consistency in exchange for availability. Giving up consistency, however, pushes the hard problems of distributed systems onto the users of these databases: users of an eventually consistent database often end up acting more like database developers than database consumers.
In practice, network partitions are not a constant condition, so there is no need to sacrifice consistency all the time just to guard against occasional failures. For more on this discussion and on timeline consistency, see Daniel Abadi's blog post (http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html).
HBase read high availability implements timeline consistency, which lets developers choose, per query, between strict (strong) consistency and relaxed (timeline) consistency.
With HBase read high availability:
- Data is held by a primary region and one or more secondary region replicas.
- Any replica, primary or secondary, can serve read requests for that data.
- Only the primary region can accept writes.
- Data in a secondary replica may lag behind the primary, but all replicas receive updates in exactly the same order.
From the client's point of view:
- The client specifies the consistency level for each request: strong (Consistency.STRONG) or timeline (Consistency.TIMELINE).
- Each returned result indicates explicitly whether it is up to date (served by the primary region) or possibly stale (served by a secondary replica).
The client can act on this flag as needed.
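The rules above can be illustrated with a small in-memory model: a single primary accepts writes and appends them to an ordered log, secondaries replay that log in the same order (possibly lagging), and reads either require the primary (STRONG) or fall back to a secondary and are flagged as stale (TIMELINE). Everything here (TimelineRegion, replicateNext, failPrimary, ReadResult) is hypothetical and mimics the semantics only. In the real HBase client, the consistency level is set with Get.setConsistency(Consistency.TIMELINE) and the flag is read with Result.isStale(). One more simplification: in this sketch a STRONG read fails immediately while the primary is down, whereas a real client would typically keep retrying until the region is back online or the operation times out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of timeline-consistent region replicas.
// It mimics the semantics described above; it is not the HBase client API.
public class TimelineRegion {

    enum Consistency { STRONG, TIMELINE }

    record Edit(String key, String value) {}

    // The value read, plus whether it came from a secondary (possibly stale) replica.
    record ReadResult(String value, boolean stale) {}

    private final List<Edit> log = new ArrayList<>();          // edits in the order the primary accepted them
    private final Map<String, String> primary = new HashMap<>();
    private final List<Map<String, String>> secondaries = new ArrayList<>();
    private final int[] applied;                               // log entries replayed by each secondary
    private boolean primaryUp = true;

    TimelineRegion(int secondaryCount) {
        for (int i = 0; i < secondaryCount; i++) secondaries.add(new HashMap<>());
        applied = new int[secondaryCount];
    }

    // Only the primary accepts writes.
    void put(String key, String value) {
        if (!primaryUp) throw new IllegalStateException("primary down: write rejected");
        log.add(new Edit(key, value));
        primary.put(key, value);
    }

    // Replays the next pending edit on secondary i; every secondary follows the same log order.
    void replicateNext(int i) {
        if (applied[i] < log.size()) {
            Edit e = log.get(applied[i]++);
            secondaries.get(i).put(e.key(), e.value());
        }
    }

    void failPrimary() { primaryUp = false; }

    ReadResult get(String key, Consistency consistency) {
        if (primaryUp) return new ReadResult(primary.get(key), false);
        // Simplification: fail fast instead of retrying until the region recovers.
        if (consistency == Consistency.STRONG) throw new IllegalStateException("primary down: strong read unavailable");
        // Timeline read: served by a secondary (here always the first) and flagged as possibly stale.
        return new ReadResult(secondaries.get(0).get(key), true);
    }

    public static void main(String[] args) {
        TimelineRegion region = new TimelineRegion(1);
        region.put("row1", "v1");
        region.replicateNext(0);   // the secondary has caught up to v1
        region.put("row1", "v2");  // v2 has not been shipped to the secondary yet
        region.failPrimary();
        ReadResult r = region.get("row1", Consistency.TIMELINE);
        System.out.println(r.value() + " stale=" + r.stale()); // prints "v1 stale=true"
    }
}
```

The timeline read during the outage returns the older value v1 and says so, which is exactly the trade-off described above: the data stays readable, and the client knows it may be behind.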
This model has several advantages:
- Writes remain strongly consistent, since only the primary accepts them.
- Data remains readable during failures. With two replicas and appropriate rack-aware placement, HBase can keep data readable even if an entire rack fails.
- Latency: a consistent read still requires only a single network round trip.
- Latency: the client can issue a read to all replicas and use the first response returned.
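The second latency point, sending a read to every replica and keeping whichever answer comes back first, is a hedged-request pattern. A minimal sketch using CompletableFuture.anyOf is shown below; each replica is simulated by a hypothetical Supplier that sleeps for a fixed delay, and the names and delays are made up for illustration. It simplifies the real behaviour: it contacts all replicas at once, whereas a real client may give the primary a short head start before contacting secondaries, and anyOf completes exceptionally if the first future to finish fails.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: fan a read out to all replicas and keep the first answer.
public class FirstResponseRead {

    // Starts one read per replica on the pool and returns whichever completes first.
    // Slower responses are simply ignored.
    static String readFirst(List<Supplier<String>> replicas, ExecutorService pool) {
        CompletableFuture<?>[] futures = replicas.stream()
            .map(replica -> CompletableFuture.supplyAsync(replica, pool))
            .toArray(CompletableFuture[]::new);
        return (String) CompletableFuture.anyOf(futures).join();
    }

    // Simulated replica that answers with its own name after a fixed delay.
    static Supplier<String> replica(String name, long delayMs) {
        return () -> {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return name;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            String winner = readFirst(List.of(
                replica("primary", 300), replica("replica-1", 20), replica("replica-2", 150)), pool);
            System.out.println("first response from " + winner);
        } finally {
            pool.shutdownNow(); // interrupts replicas that are still sleeping
        }
    }
}
```

With these delays the sketch should normally report replica-1: tail latency is bounded by the fastest replica rather than by whichever one happens to be slow.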
Timeline consistency combines strong consistency in normal operation with graceful degradation during failures, delivering higher availability without burdening developers with the complexity of an eventually consistent system.
HBase Read High Availability: Phase 1 and Phase 2
HBase read high availability was developed in two phases. Phase 1 focused mainly on validating the prototype and the API semantics, while Phase 2 delivers a production-ready implementation. If you looked at the read high availability offered in HDP 2.1 and decided against it because it did not support operations such as region splits and merges, HDP 2.2 provides high availability for all the HBase operations you would expect.
Try it yourself
HBase read high availability is included in HDP 2.2. If you want to improve the availability of your applications, we recommend trying it.
HBase High Availability Outlook
HBase high availability has improved greatly over the past year, but there is still work to do. Two important problems remain unsolved:
- Write availability in case of failure
- Read and write consistency across data centers
We are delighted that the HBase community has begun to address these issues and is working to merge HydraBase, developed at Facebook, into HBase. In the future, HBase will offer up to five nines (99.999%) availability while still meeting the strong consistency requirements of business-critical systems.
Original link: http://zh.hortonworks.com/blog/apache-hbase-high-availability-next-level/