Transaction design strategies for MongoDB, Cassandra, and HBase

Source: Internet
Author: User

NoSQL databases such as MongoDB, Cassandra, HBase, DynamoDB, and Riak make application development easier. They offer flexible data models and rich data types, and they are easier to install and configure than many traditional database systems. However, their lack of support for atomic transactions is a major step backwards. Daniel Abadi, an Associate Professor at Yale University whose research focuses on database system architecture and implementation, recently published an article analyzing why NoSQL databases do not support atomic transactions and describing two approaches to building NoSQL databases that are both scalable and transactional.

Atomic transactions allow a group of writes to different data items in the database to be applied together: either all of the writes are performed or none of them are. In addition, combined with an appropriate concurrency control mechanism, atomicity ensures that concurrent and subsequent transactions see either all of the writes of an atomic transaction or none of them. Without atomic transactions, application developers must handle the case where only part of a set of writes succeeds.
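The difference between a partially applied set of writes and an all-or-nothing one can be illustrated with a minimal, single-threaded sketch. The account names, the `InsufficientFunds` exception, and both transfer functions are hypothetical and exist only for illustration; they are not taken from any of the databases discussed here.

```python
import copy


class InsufficientFunds(Exception):
    """Raised when a debit would drive a balance below zero."""


def transfer_non_atomic(store, src, dst, amount):
    # Two independent writes: if the second step fails, the first stays applied.
    store[dst] = store[dst] + amount          # write 1 succeeds
    if store[src] < amount:
        raise InsufficientFunds(src)          # write 2 never happens
    store[src] = store[src] - amount


def transfer_atomic(store, src, dst, amount):
    # Apply both writes to a private copy, then publish them together.
    staged = copy.deepcopy(store)
    staged[dst] += amount
    if staged[src] < amount:
        raise InsufficientFunds(src)          # nothing was published
    staged[src] -= amount
    store.clear()
    store.update(staged)                      # both writes become visible together


accounts = {"alice": 50, "bob": 0}
try:
    transfer_non_atomic(accounts, "alice", "bob", 80)
except InsufficientFunds:
    pass
partial_state = dict(accounts)  # bob was credited but alice was never debited

accounts = {"alice": 50, "bob": 0}
try:
    transfer_atomic(accounts, "alice", "bob", 80)
except InsufficientFunds:
    pass
atomic_state = dict(accounts)   # unchanged: the failed transfer left no trace
```

In the non-atomic version the application is left to detect and repair the inconsistent balance itself, which is exactly the burden described above.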

Some people may think that NoSQL databases are relatively new and simply have not had time to implement atomic transactions. Indeed, Cassandra's "batch update" feature can be seen as a small step in this direction. However, NoSQL databases have been around for nearly ten years, and there is clearly a deeper reason for their lack of transaction support: the concern for scalability. Most NoSQL systems are designed to scale out across many machines, with the data in the database partitioned among them. The writes in a transaction may therefore touch data in multiple partitions on multiple machines, which makes it a "distributed transaction". A distributed transaction requires the machines involved to coordinate with each other. Each machine must verify that the transaction can commit successfully on the other machines, and a protocol is required to ensure that no machine involved in the transaction's writes fails before those writes reach stable storage. This coordination not only consumes a lot of resources but also increases the latency of database requests. The bigger problem is that other operations cannot read the data written by the transaction until coordination has completed. The resulting delay causes concurrent transactions that touch the same data to pile up behind one another, and the system can eventually clog (Abadi's term is "cloggage"). The coordination required by distributed transactions seriously hurts database performance, in both transaction throughput and transaction latency. For this reason, most NoSQL systems choose not to support transactions.
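The text above does not name a specific protocol. As an illustration only, the following sketch assumes a simplified two-phase commit, one common way to perform this kind of cross-machine agreement. The `Participant` class and `run_distributed_transaction` function are hypothetical, and the sketch omits durable logging, locking, and failure recovery.

```python
class Participant:
    """One machine (partition) taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = {}        # committed, visible state
        self.pending = None   # writes prepared but not yet committed

    def prepare(self, writes):
        # Phase 1: vote. A real system would also log the writes durably
        # and hold locks on the keys until the outcome is known.
        if not self.can_commit:
            return False
        self.pending = dict(writes)
        return True

    def commit(self):
        # Phase 2, commit path: make the prepared writes visible.
        if self.pending is not None:
            self.data.update(self.pending)
            self.pending = None

    def abort(self):
        # Phase 2, abort path: discard the prepared writes.
        self.pending = None


def run_distributed_transaction(writes_by_participant):
    """writes_by_participant: list of (Participant, {key: value}) pairs."""
    votes = [p.prepare(w) for p, w in writes_by_participant]
    decision = all(votes)   # commit only if every participant voted yes
    for p, _ in writes_by_participant:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


a, b = Participant("a"), Participant("b")
committed = run_distributed_transaction([(a, {"x": 1}), (b, {"y": 2})])

c = Participant("c", can_commit=False)
aborted = run_distributed_transaction([(a, {"x": 99}), (c, {"z": 3})])
```

Between `prepare` and the final decision, the prepared writes are not visible to anyone, and each of those steps is a network round trip in a real deployment. That window is where the added latency and the blocking of other transactions come from.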

MongoDB, Riak, HBase, and Cassandra all support transactions on a single key. This works because all the information for a single key is stored on a single machine, so single-key transactions do not involve the complex distributed coordination described above. Because multi-key distributed transactions do require such coordination, it seems that a trade-off between scalable performance and distributed transaction support is unavoidable. In fact, many NoSQL database vendors build on exactly this assumption: when building scalable systems, they abandon support for distributed atomic transactions in order to avoid degrading performance.
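Why a single-key operation avoids coordination can be sketched as follows, assuming a simple hash-partitioning scheme. The partition count and function names are hypothetical; each real system uses its own placement strategy.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size


def partition_for(key: str) -> int:
    # Stable hash so the same key always maps to the same partition.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def partitions_touched(keys):
    # The set of partitions (machines) a transaction's keys live on.
    return {partition_for(k) for k in keys}


def needs_distributed_coordination(keys) -> bool:
    # Only transactions spanning more than one partition need coordination.
    return len(partitions_touched(keys)) > 1


single_key = needs_distributed_coordination(["user:1"])
multi_key_partitions = partitions_touched(["user:1", "user:2", "order:7"])
```

A transaction on one key always touches exactly one partition, so `single_key` is `False`. A multi-key transaction may or may not span partitions, depending on where its keys happen to hash.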

Daniel points out that this assumption is wrong: a scalable system can support high-performance distributed atomic transactions. He and his co-authors recently published a paper proposing a new way of framing the trade-off for scalable systems that support atomic transactions, namely a choice among fairness, isolation, and throughput (FIT). Fairness means that the execution of a transaction is not deliberately delayed for the benefit of other transactions. Isolation ensures that, of any two conflicting transactions, one always sees the writes of the other. A scalable database that supports distributed atomic transactions can achieve two of these three properties, but not all three. The FIT trade-off therefore yields three kinds of system that support distributed atomic transactions:

  • A system that ensures fairness and isolation at the expense of throughput;
  • A system that ensures fairness and throughput at the expense of isolation;
  • A system that ensures isolation and throughput at the cost of fairness.

In other words, there are two ways to build a scalable system with high distributed transaction throughput: give up isolation or give up fairness.

Discard isolation

As mentioned above, the root cause of clogging in a database system is distributed coordination. More specifically, while a transaction is executing, other transactions that need to access the same data must wait until its distributed coordination completes. This waiting is imposed by strong isolation, which guarantees that a transaction sees the writes of the transactions that conflict with it. If isolation is given up, transactions no longer need to see each other's writes, so they can execute and commit without waiting for distributed coordination to finish. In addition, there is a class of database constraints whose correctness can still be guaranteed in a distributed database under weak transaction isolation. For more information, see Peter Bailis's work on Read Atomic Multi-Partition (RAMP) transactions.

Give up fairness

Distributed coordination and the isolation mechanism overlap in time. Therefore, distributed coordination can be rescheduled to minimize this overlap and so reduce the interference between the two. A system built this way gives up fairness in exchange for the freedom to choose the most suitable time for distributed coordination. Daniel calls such a system an "isolation-throughput" system. For example, coordination can be performed outside the transaction, so that the time it takes does not add to the execution time of concurrent conflicting transactions.

G-Store is a good example of an "isolation-throughput" system. It supports multi-key transactions but limits the scope of a transaction to a set of keys defined dynamically by the application, called a KeyGroup. KeyGroups can be created and destroyed as needed. When an application defines a KeyGroup, G-Store copies all the corresponding key-value pairs to a leader node, and all transactions on that key set are executed on the leader node. As a result, G-Store does not need to run a distributed commit protocol while a transaction executes. The important point is that G-Store still has to perform distributed coordination, but it does so before the transaction executes, that is, before isolation has to be enforced. Once coordination is finished, the transaction completes quickly, and concurrent transactions that share data do not have to wait for distributed coordination. In this way, G-Store achieves both high throughput and strong isolation.
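The KeyGroup idea described above can be sketched roughly as follows. This is a toy, single-process illustration and not G-Store's actual API: the `Node` and `KeyGroup` classes and their methods are hypothetical, and the sketch assumes that keys are moved to the leader when the group is created and handed back when it is destroyed.

```python
class Node:
    """A machine holding some key-value pairs."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class KeyGroup:
    """Hypothetical sketch of a G-Store-style key group."""

    def __init__(self, leader, owners_by_key):
        # Coordination happens here, once, before any transaction runs:
        # each key-value pair is moved from its owning node to the leader.
        self.leader = leader
        self.owners_by_key = owners_by_key
        for key, owner in owners_by_key.items():
            leader.store[key] = owner.store.pop(key)

    def transact(self, update_fn):
        # Runs entirely on the leader, so no distributed commit is needed.
        staged = {k: self.leader.store[k] for k in self.owners_by_key}
        update_fn(staged)                 # may raise; nothing is applied then
        self.leader.store.update(staged)

    def dissolve(self):
        # Return each key to its original owner when the group is destroyed.
        for key, owner in self.owners_by_key.items():
            owner.store[key] = self.leader.store.pop(key)


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
n1.store["alice"] = 50
n2.store["bob"] = 0

group = KeyGroup(leader=n3, owners_by_key={"alice": n1, "bob": n2})


def pay(state):
    state["alice"] -= 20
    state["bob"] += 20


group.transact(pay)     # multi-key transaction, local to the leader
group.dissolve()
```

All cross-machine traffic is confined to `KeyGroup(...)` and `dissolve()`. Every call to `transact` in between touches only the leader, which is why conflicting transactions never wait on coordination.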

Therefore, the key to high-throughput distributed transactions is to separate distributed coordination from the isolation mechanism in time, as described above.
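A back-of-the-envelope calculation shows why this separation matters. The model below is an assumption made for illustration: transactions that conflict on the same data run one at a time, so their maximum rate is the inverse of how long each one holds the contended data. The timing figures are hypothetical, not measurements.

```python
def max_conflicting_tps(isolation_window_ms: float) -> float:
    # Conflicting transactions are serialized, so at most one of them can
    # hold the contended data during each isolation window.
    return 1000.0 / isolation_window_ms


EXECUTION_MS = 1.0       # hypothetical local execution time
COORDINATION_MS = 9.0    # hypothetical cross-machine coordination time

# Coordination happens while isolation is being enforced.
tps_coordination_inside = max_conflicting_tps(EXECUTION_MS + COORDINATION_MS)

# Coordination has been moved outside the isolation window.
tps_coordination_outside = max_conflicting_tps(EXECUTION_MS)
```

Under these assumed numbers the ceiling rises from 100 to 1000 conflicting transactions per second, even though the total amount of coordination work is unchanged.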
