Distributed system Consistency, availability


Last Update:2018-08-22 Source: Internet

Author: User

Tags message queue


BASE: An ACID Alternative

In partitioned databases, trading some consistency for availability can lead to dramatic improvements in scalability.

Dan Pritchett, eBay

Web applications have grown in popularity over the past decade. Whether you are building an application for end users or application developers (i.e., services), your hope is most likely that your application will find broad adoption, and with broad adoption will come transactional growth. If your application relies upon persistence, then data storage will probably become your bottleneck.

There are two strategies for scaling any application. The first, and by far the easiest, is vertical scaling: moving the application to larger computers. Vertical scaling works reasonably well for data but has several limitations. The most obvious limitation is outgrowing the capacity of the largest system available. Vertical scaling is also expensive, as adding transactional capacity usually requires purchasing the next larger system. Vertical scaling often creates vendor lock-in, further adding to costs.

Horizontal scaling offers more flexibility but is also considerably more complex. Horizontal data scaling can be performed along two vectors. Functional scaling involves grouping data by function and spreading functional groups across databases. Splitting data within functional areas across multiple databases, or sharding [1], adds the second dimension to horizontal scaling. The diagram in Figure 1 illustrates horizontal data-scaling strategies.
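
The two vectors can be pictured as a two-step routing decision: first pick the functional group, then pick a shard inside that group. The following is only a minimal sketch. The group names, host names, shard counts, and the choice of CRC32 as the hash are all assumptions made for illustration and do not come from the article.

```python
import zlib

# Hypothetical layout: each functional group owns its own list of databases.
# Names and shard counts are illustrative assumptions.
SHARDS = {
    "users": ["users-db-0", "users-db-1", "users-db-2"],
    "products": ["products-db-0"],
    "transactions": ["txn-db-0", "txn-db-1"],
}


def route(group: str, key: str) -> str:
    """Pick a database for a row: first by function, then by shard key."""
    shards = SHARDS[group]                           # functional partitioning
    index = zlib.crc32(key.encode()) % len(shards)   # sharding within the group
    return shards[index]


print(route("users", "alice"))
print(route("products", "sku-42"))
```

Because each group has its own shard list, a group can be given more databases without touching the others, which is the independence the diagram describes.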

As Figure 1 illustrates, both approaches to horizontal scaling can be applied at once. Users, products, and transactions can be in separate databases. Additionally, each functional area can be split across multiple databases for transactional capacity. As shown in the diagram, functional areas can be scaled independently of one another.

Functional partitioning

Functional partitioning is important for achieving high degrees of scalability. Any good database architecture will decompose the schema into tables grouped by functionality. Users, products, transactions, and communication are examples of functional areas. Leveraging database concepts such as foreign keys is a common approach for maintaining consistency across these functional areas.

Relying on database constraints to ensure consistency across functional groups creates a coupling of the schema to a database deployment strategy. For constraints to be applied, the tables must reside on a single database server, precluding horizontal scaling as transaction rates grow. In many cases, the easiest scale-out opportunity is moving functional groups of data onto discrete database servers.

Schemas that can scale to very high transaction volumes will place functionally distinct data on different database servers. This requires moving data constraints out of the database and into the application. This also introduces several challenges that are addressed later in this article.

CAP theorem

Eric Brewer, a professor at the University of California, Berkeley, and cofounder and chief scientist at Inktomi, made the conjecture that Web services cannot ensure all three of the following properties at once (signified by the acronym CAP) [2]:

Consistency. The client perceives that a set of operations has occurred all at once.

Availability. Every operation must terminate in an intended response.

Partition tolerance. Operations will complete, even if individual components are unavailable.

Specifically, a Web application can support, at most, only two of these properties with any database design. Obviously, any horizontal scaling strategy is based on data partitioning; therefore, designers are forced to decide between consistency and availability.

ACID solutions

ACID database transactions greatly simplify the job of the application developer. As signified by the acronym, ACID transactions provide the following guarantees:

Atomicity. All of the operations in the transaction will complete, or none will.

Consistency. The database will be in a consistent state when the transaction begins and ends.

Isolation. The transaction will behave as if it is the only operation being performed upon the database.

Durability. Upon completion of the transaction, the operation will not be reversed.

Database vendors long ago recognized the need for partitioning databases and introduced a technique known as 2PC (two-phase commit) for providing ACID guarantees across multiple database instances. The protocol is broken into two phases. First, the transaction coordinator asks each database involved to precommit the operation and indicate whether commit is possible. If all databases agree the commit can proceed, then phase 2 begins: the transaction coordinator asks each database to commit the data.

If any database vetoes the commit, then all databases are asked to roll back their portions of the transaction. What is the shortcoming? We are getting consistency across partitions. If Brewer is correct, then we must be impacting availability, but how can that be?
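
The control flow of the two phases can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: the `Participant` class and its methods are hypothetical stand-ins for real databases, and the sketch ignores timeouts, crashes, and recovery logging, which are what make real 2PC implementations hard.

```python
class Participant:
    """A hypothetical stand-in for one database taking part in 2PC."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def precommit(self):
        # Phase 1: vote on whether this participant is able to commit.
        self.state = "prepared" if self.can_commit else "vetoed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    votes = [p.precommit() for p in participants]   # phase 1: collect every vote
    if all(votes):
        for p in participants:                      # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                          # any veto: roll back everywhere
        p.rollback()
    return False
```

Note that the coordinator cannot finish unless every participant answers, which is where the availability cost discussed next comes from.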

The availability of any system is the product of the availability of the components required for operation. The last part of that statement is the most important. Components that may be used by the system but are not required do not reduce system availability. A transaction involving two databases in a 2PC commit will have the availability of the product of the availability of each database. For example, if we assume each database has 99.9 percent availability, then the availability of the transaction becomes 99.8 percent, or an additional downtime of roughly 43 minutes per month.

An ACID alternative

If ACID provides the consistency choice for partitioned databases, then how do you achieve availability instead? One answer is BASE (basically available, soft state, eventually consistent).

BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.

The availability of BASE is achieved through supporting partial failures without total system failure. Here is a simple example: if users are partitioned across five database servers, BASE design encourages crafting operations in such a way that a user database failure impacts only the 20 percent of the users on that particular host. There is no magic involved, but this does lead to higher perceived availability of the system.
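
Both pieces of arithmetic, the availability product from the 2PC discussion and the partial-failure fraction above, are easy to check. The sketch below assumes a 30-day month and users spread evenly across the five hosts; neither assumption is stated in the article.

```python
import math


def required_availability(components):
    """Availability of an operation that needs every listed component."""
    return math.prod(components)


MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month

single = required_availability([0.999])         # one database
two_pc = required_availability([0.999, 0.999])  # both databases required
extra_downtime = (single - two_pc) * MINUTES_PER_MONTH

# Assumption: users are spread evenly, so one failed host out of five
# affects one fifth of the users rather than all of them.
affected_fraction = 1 / 5

print(f"2PC availability: {two_pc:.4%}")
print(f"extra downtime: {extra_downtime:.1f} minutes per month")
print(f"users affected by one host failure: {affected_fraction:.0%}")
```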

So, now that you have decomposed your data into functional groups and partitioned the busiest groups across multiple databases, how do you incorporate BASE into your application? BASE requires a more in-depth analysis of the operations within a logical transaction than is typically applied to ACID. What should you be looking for? The following sections provide some direction.

Consistency patterns

Following Brewer's conjecture, if BASE allows for availability in a partitioned database, then opportunities to relax consistency have to be identified. This is often difficult because the tendency of both business stakeholders and developers is to assert that consistency is paramount to the success of the application. Temporal inconsistency cannot be hidden from the end user, so both engineering and product owners must be involved in picking the opportunities for relaxing consistency.

Figure 2 is a simple schema that illustrates consistency considerations for BASE. The user table holds user information including the total amount sold and bought. These are running totals. The transaction table holds each transaction, relating the seller and buyer and the amount of the transaction. These are gross oversimplifications of real tables but contain the necessary elements for illustrating several aspects of consistency.

In general, consistency across functional groups is easier to relax than within functional groups. The example schema has two functional groups: users and transactions. Each time an item is sold, a row is added to the transaction table and the counters for the buyer and seller are updated. Using an ACID-style transaction, the SQL would be as shown in Figure 3.
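
As a minimal sketch of the ACID-style approach described above (not the article's own figure), the three statements can be wrapped in a single transaction using Python's `sqlite3` module. The table name `txn` (chosen because `transaction` is an SQL keyword) and the column names `xid`, `seller_id`, `buyer_id`, `amount`, `amt_sold`, and `amt_bought` are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY,
                       amt_sold INTEGER DEFAULT 0,
                       amt_bought INTEGER DEFAULT 0);
    CREATE TABLE txn (xid INTEGER PRIMARY KEY, seller_id INTEGER,
                      buyer_id INTEGER, amount INTEGER);
    INSERT INTO user (id) VALUES (1), (2);
""")


def record_sale(xid, seller_id, buyer_id, amount):
    # "with db" wraps all three statements in one transaction:
    # commit if every statement succeeds, roll back if any raises.
    with db:
        db.execute("INSERT INTO txn VALUES (?, ?, ?, ?)",
                   (xid, seller_id, buyer_id, amount))
        db.execute("UPDATE user SET amt_sold = amt_sold + ? WHERE id = ?",
                   (amount, seller_id))
        db.execute("UPDATE user SET amt_bought = amt_bought + ? WHERE id = ?",
                   (amount, buyer_id))


record_sale(xid=100, seller_id=1, buyer_id=2, amount=25)
```

This only works as a single local transaction because both tables live in the same database; once they are on separate hosts, the same guarantee requires 2PC.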

The total bought and sold columns in the user table can be considered a cache of the transaction table. It is present for efficiency of the system. Given this, the constraint on consistency could be relaxed. The buyer and seller expectations can be set so their running balances do not reflect the result of a transaction immediately. This is not uncommon, and in fact people encounter this delay between a transaction and their running balance regularly (e.g., ATM withdrawals and cellphone calls).

How the SQL statements are modified to relax consistency depends upon how the running balances are defined. If they are simply estimates, meaning that some transactions can be missed, the changes are quite simple, as shown in Figure 4.

We've now decoupled the updates to the user and transaction tables. Consistency between the tables is not guaranteed. In fact, a failure between the first and second transaction will result in the user table being permanently inconsistent, but if the contract stipulates that the running totals are estimates, this may be adequate.
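
A rough sketch of the decoupled version, again using `sqlite3` with the same assumed table and column names (`txn`, `amt_sold`, `amt_bought`, and so on are illustrative, not taken from the article). The hypothetical `crash_after_insert` flag simulates a failure between the two transactions so the resulting drift is visible.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY,
                       amt_sold INTEGER DEFAULT 0,
                       amt_bought INTEGER DEFAULT 0);
    CREATE TABLE txn (xid INTEGER PRIMARY KEY, seller_id INTEGER,
                      buyer_id INTEGER, amount INTEGER);
    INSERT INTO user (id) VALUES (1), (2);
""")


def record_sale_relaxed(xid, seller_id, buyer_id, amount,
                        crash_after_insert=False):
    with db:  # transaction 1: the sale itself
        db.execute("INSERT INTO txn VALUES (?, ?, ?, ?)",
                   (xid, seller_id, buyer_id, amount))
    if crash_after_insert:
        # Simulated failure between the two transactions.
        raise RuntimeError("crashed before updating running totals")
    with db:  # transaction 2: the running totals, treated as estimates
        db.execute("UPDATE user SET amt_sold = amt_sold + ? WHERE id = ?",
                   (amount, seller_id))
        db.execute("UPDATE user SET amt_bought = amt_bought + ? WHERE id = ?",
                   (amount, buyer_id))
```

After a simulated crash, the sale is recorded but the running totals never catch up, which is exactly the permanent inconsistency the text warns about.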

What if estimates are not acceptable, though? How can you still decouple the user and transaction updates? Introducing a persistent message queue solves the problem. There are several choices for implementing persistent messages. The most critical factor in implementing the queue, however, is ensuring that the backing persistence is on the same resource as the database. This is necessary to allow the queue to be transactionally committed without involving a 2PC. Now the SQL operations look a bit different, as shown in Figure 5.

This example takes some liberties with syntax and oversimplifies the logic to illustrate the concept. By queuing a persistent message within the same transaction as the insert, the information needed to update the running balances on the user has been captured. The transaction is contained on a single database instance and therefore will not impact system availability.
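
One way to keep the queue on the same resource as the database is to store messages in a table of that same database. The sketch below assumes this design; the `message_queue` table, its columns, and the `txn` table name are hypothetical choices for illustration, and a real system might use a queueing product backed by the same database instead.

```python
import sqlite3

txn_db = sqlite3.connect(":memory:")  # stands in for the transaction host
txn_db.executescript("""
    CREATE TABLE txn (xid INTEGER PRIMARY KEY, seller_id INTEGER,
                      buyer_id INTEGER, amount INTEGER);
    CREATE TABLE message_queue (msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
                                trans_id INTEGER, balance TEXT,
                                user_id INTEGER, amount INTEGER);
""")


def record_sale_queued(xid, seller_id, buyer_id, amount):
    # The sale and both balance-update messages commit or roll back together,
    # using one local transaction and no 2PC.
    with txn_db:
        txn_db.execute("INSERT INTO txn VALUES (?, ?, ?, ?)",
                       (xid, seller_id, buyer_id, amount))
        txn_db.executemany(
            "INSERT INTO message_queue (trans_id, balance, user_id, amount) "
            "VALUES (?, ?, ?, ?)",
            [(xid, "seller", seller_id, amount),
             (xid, "buyer", buyer_id, amount)])


record_sale_queued(xid=7, seller_id=1, buyer_id=2, amount=30)
```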

A separate message-processing component will dequeue each message and apply the information to the user table. The example appears to solve all of the issues, but there is a problem. The message persistence is on the transaction host to avoid a 2PC during queuing. If the message is dequeued inside a transaction involving the user host, we still have a 2PC situation.

One solution to the 2PC in the message-processing component is to do nothing. By decoupling the update into a separate back-end component, you preserve the availability of your customer-facing component. The lower availability of the message processor may be acceptable for business requirements.

Suppose, however, that 2PC is simply never acceptable in your system. How can this problem be solved? First, you need to understand the concept of idempotence. An operation is considered idempotent if it can be applied one time or multiple times with the same result. Idempotent operations are useful in that they permit partial failures, as applying them repeatedly does not change the final state of the system.

The selected example is problematic when looking for idempotence. Update operations are rarely idempotent. The example increments balance columns in place. Applying this operation more than once obviously will result in an incorrect balance. Even update operations that simply set a value, however, are not idempotent with regard to order of operations. If the system cannot guarantee that updates will be applied in the order they are received, the final state of the system will be incorrect. More on this later.
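
The two failure modes can be seen with plain values. This is a toy illustration only; the function names are hypothetical and no database is involved.

```python
def add_to_balance(balance, amount):
    # Not idempotent: replaying the same message changes the result again.
    return balance + amount


def set_value(_current, new_value):
    # Safe to replay, but the final state depends on arrival order.
    return new_value


once = add_to_balance(100, 25)
twice = add_to_balance(once, 25)   # the same message delivered twice

in_order = set_value(set_value(None, "first"), "second")
reordered = set_value(set_value(None, "second"), "first")

print(once, twice)          # the duplicate delivery inflates the balance
print(in_order, reordered)  # the last writer wins, whichever it is
```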

In the case of balance updates, you need a way to track which updates have been applied successfully and which are still outstanding. One technique is to use a table that records the transaction identifiers that have been applied.

The table shown in Figure 6 tracks the transaction ID, which balance has been updated, and the user ID where the balance was applied. Now our sample pseudocode is as shown in Figure 7.

This example depends upon being able to peek a message in the queue and remove it once successfully processed. This can be done with two independent transactions if necessary: one on the message queue and one on the user database. Queue operations are not committed unless database operations successfully commit. The algorithm now supports partial failures and still provides transactional guarantees without resorting to 2PC.
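
The peek, apply, then remove loop can be sketched with two independent `sqlite3` connections standing in for the two hosts. This is an illustration under assumptions, not the article's pseudocode: the table names `message_queue` and `updates_applied`, their columns, and the `crash_before_remove` flag (which simulates a failure after the user-side commit) are all hypothetical.

```python
import sqlite3

queue_db = sqlite3.connect(":memory:")  # message queue on the transaction host
user_db = sqlite3.connect(":memory:")   # separate user host
queue_db.executescript("""
    CREATE TABLE message_queue (msg_id INTEGER PRIMARY KEY, trans_id INTEGER,
                                balance TEXT, user_id INTEGER, amount INTEGER);
    INSERT INTO message_queue VALUES (1, 100, 'seller', 1, 25),
                                     (2, 100, 'buyer', 2, 25);
""")
user_db.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY,
                       amt_sold INTEGER DEFAULT 0,
                       amt_bought INTEGER DEFAULT 0);
    CREATE TABLE updates_applied (trans_id INTEGER, balance TEXT,
                                  user_id INTEGER,
                                  PRIMARY KEY (trans_id, balance, user_id));
    INSERT INTO user (id) VALUES (1), (2);
""")


def apply_message(msg):
    """Apply one balance update at most once; safe to call repeatedly."""
    _msg_id, trans_id, balance, user_id, amount = msg
    column = "amt_sold" if balance == "seller" else "amt_bought"
    with user_db:  # one local transaction on the user host
        seen = user_db.execute(
            "SELECT COUNT(*) FROM updates_applied "
            "WHERE trans_id = ? AND balance = ? AND user_id = ?",
            (trans_id, balance, user_id)).fetchone()[0]
        if seen == 0:
            user_db.execute(
                f"UPDATE user SET {column} = {column} + ? WHERE id = ?",
                (amount, user_id))
            user_db.execute("INSERT INTO updates_applied VALUES (?, ?, ?)",
                            (trans_id, balance, user_id))


def process_queue(crash_before_remove=False):
    # Peek: read the messages without deleting them.
    pending = queue_db.execute(
        "SELECT * FROM message_queue ORDER BY msg_id").fetchall()
    for msg in pending:
        apply_message(msg)
        if crash_before_remove:
            return  # simulated failure: the message stays queued for a retry
        with queue_db:  # remove only after the user-side commit succeeded
            queue_db.execute("DELETE FROM message_queue WHERE msg_id = ?",
                             (msg[0],))
```

If the processor fails after applying a message but before removing it, the retry finds the row in `updates_applied` and skips the balance change, so the message is effectively applied exactly once.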

There is a simpler technique for assuring idempotent updates if the only concern is ordering. Let's change our sample schema just a bit to illustrate the challenge and the solution (see Figure 8). Suppose you also want to track the last date of sale and purchase for the user. You can rely on a similar scheme of updating the date with a message, but there is one problem.

Suppose two purchases occur within a short time window, and our message system doesn't ensure ordered operations. You now have a situation where, depending upon which order the messages are processed in, you will have an incorrect value for last_purchase. Fortunately, this kind of update can be handled with a minor modification to the SQL, as illustrated in Figure 9.
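
A minimal sketch of such a guarded update, using `sqlite3`. Only the `last_purchase` column name comes from the text; the rest of the schema, the ISO-8601 string timestamps (which compare correctly as text), and the non-NULL starting value are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, last_purchase TEXT);
    INSERT INTO user VALUES (1, '2008-01-01T00:00:00');
""")


def record_purchase_time(buyer_id, trans_date):
    # The extra predicate turns older messages into no-ops,
    # whatever order they arrive in.
    with db:
        db.execute(
            "UPDATE user SET last_purchase = ? "
            "WHERE id = ? AND last_purchase < ?",
            (trans_date, buyer_id, trans_date))


# Two purchases delivered out of order: the later one arrives first.
record_purchase_time(1, "2008-05-02T10:00:05")
record_purchase_time(1, "2008-05-02T10:00:01")
```

Replaying either message, in any order, leaves the column at the latest purchase time, so the update is both idempotent and order independent.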

By simply not allowing the last_purchase time to go backward in time, you have made the update operations order independent. You can also use this approach to protect any update from out-of-order updates. As an alternative to using time, you can also try a monotonically increasing transaction ID.

Ordering of message queues

A short side note on ordered message delivery is relevant. Message systems offer the ability to ensure that messages are delivered in the order they are received. This can be expensive to support and is often unnecessary, and, in fact, at times gives a false sense of security.

The examples provided here illustrate how message ordering can be relaxed and still provide a consistent view of the database, eventually. The overhead required to relax the ordering is nominal and in most cases is significantly less than enforcing ordering in the message system.

Further, a Web application is semantically an event-driven system regardless of the style of interaction. The client requests arrive to the system in arbitrary order. Processing time required per request varies. Request scheduling throughout the components of the systems is nondeterministic, resulting in nondeterministic queuing of messages. Requiring the order to be preserved gives a false sense of security. The simple reality is that nondeterministic inputs will lead to nondeterministic outputs.

Soft state/eventually consistent

Up to this point, the focus has been on trading consistency for availability. The other side of the coin is understanding the influence that soft state and eventual consistency have on application design.

As software engineers we tend to look at our systems as closed loops. We think about the predictability of their behavior in terms of predictable inputs producing predictable outputs. This is a necessity for creating correct software systems. The good news in many cases is that using BASE doesn't change the predictability of a system as a closed loop, but it does require looking at the behavior in total.

A simple example can help illustrate the point. Consider a system where users can transfer assets to other users. The type of asset is irrelevant: it could be money or objects in a game. For this example, we will assume that we have decoupled the two operations of taking the asset from one user and giving it to the other, with a message queue used to provide the decoupling.

Immediately, this system feels nondeterministic and problematic. There is a period of time where the asset has left one user and has not arrived at the other. The size of this time window can be determined by the messaging system design. Regardless, there is a lag between the begin and end states where neither user appears to have the asset.

If we consider this from the user's perspective, however, this lag may not be relevant or even known. Neither the receiving user nor the sending user may know when the asset arrived. If the lag between sending and receiving is a few seconds, it will be invisible or certainly tolerable to users who are directly communicating about the asset transfer. In this situation the system behavior is considered consistent and acceptable to the users, even though we are relying upon soft state and eventual consistency in the implementation.

Event-driven architecture

What if you do need to know when the state has become consistent? You may have algorithms that need to be applied to the state but only when it has reached a consistent state relevant to an incoming request. The simple approach is to rely on the events that are generated as state becomes consistent.

Continuing with the previous example, what if you need to notify the user that the asset has arrived? Creating an event within the transaction that commits the asset to the receiving user provides a mechanism for performing further processing once a known state has been reached. EDA (event-driven architecture) can provide dramatic improvements in scalability and architectural decoupling. Further discussion about the application of EDA is beyond the scope of this article.

Conclusion

Scaling systems to dramatic transaction rates requires a new way of thinking about managing resources. The traditional transactional models are problematic when loads need to be spread across a large number of components. Decoupling the operations and performing them in turn provides for improved availability and scale at the cost of consistency. BASE provides a model for thinking about this decoupling.

References

[1] http://highscalability.com/unorthodox-approach-database-design-coming-shard
[2] http://citeseer.ist.psu.edu/544596.html

DAN PRITCHETT is a technical fellow at eBay, where he has been a member of the architecture team for the past four years. In this role, he interfaces with the strategy, business, product, and technology teams across eBay marketplaces, PayPal, and Skype. With many years of experience at technology companies such as Sun Microsystems, Hewlett-Packard, and Silicon Graphics, Pritchett has a depth of technical experience, ranging from network-level protocols and operating systems to systems design and software patterns. He has a B.S. in computer science from the University of Missouri, Rolla.

from:http://queue.acm.org/detail.cfm?id=1394128
