[Translation] Life beyond Distributed Transactions: An Apostate's Opinion

Source: Internet
Author: User

Pat Helland
Amazon.com
705 Fifth Ave South, Seattle, WA 98104, USA
PHelland at Amazon.com
Translator: Richie (http://www.cnblogs.com/riccc)
Note: This article is published under a Creative Commons License (http://creativecommons.org/licenses/by/2.5).
You may copy, distribute, display, and perform the work, make derivative works, and make commercial use of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR)
January 7-10, 2007, Asilomar, California, USA.

The opinions expressed in this article are mine alone and do not in any way reflect the position of my employer, Amazon.com.

ABSTRACT
Many decades of work have gone into distributed transactions, including protocols such as 2PC (two-phase commit) and Paxos and their various implementations. These protocols present the application programmer with the appearance of global serializability. For a significant part of my career, I strongly advocated implementing and using platforms that provide this guarantee of global serializability.
My experience over the past decade has led me to liken such platforms to the Maginot Line. In general, application developers simply do not implement highly scalable applications on top of distributed transactions. When they try, the projects founder because the performance cost and fragility make distributed transactions impractical; natural selection kicks in. Moreover, applications built with different techniques do not share a uniform transactional guarantee, yet they all manage to meet the needs of their businesses.
This article explores and names some of the practical approaches used to implement highly scalable, mission-critical applications without distributed transactions. It discusses the management of fine-grained pieces of application data, which may be repartitioned over time as the application grows, and it discusses the design patterns used in messaging between these repartitionable pieces of data.
The goal of raising this discussion of new design patterns is twofold. First, I believe these notions can ease the task of those struggling to build highly scalable applications. Second, by understanding these patterns, the industry may be able to build platforms that simplify the construction of such very large applications.

1. INTRODUCTION
Let's consider the purpose of this article, the assumptions made in the discussion, and the implications that follow from those assumptions. Although I am also deeply interested in high availability, this article ignores that topic and focuses on scalability alone, in particular on scenarios where large-scale applications cannot use distributed transactions.

Goals
This article has three main purposes:
• Discuss Scalable Applications
Designers building large systems already know a great deal about making them scale. The problem is that the concepts, problems, and patterns of the interaction between transactions and scalable systems are not crisply understood, and careless use of them sometimes bites us. One goal of this article is to start a discussion that deepens the understanding of these concepts, in the hope of driving toward a consensus and consistent solutions.
This article attempts to name and formalize years of experience in building scalable systems.
• Think about Almost-Infinite Scaling of Applications
To frame this discussion of scaling, the article proposes an informal thought experiment about almost-infinite scaling. I assume that the number of customers, purchasable entities, orders, shipments, health-care patients, taxpayers, bank accounts, and every other business concept the application manipulates grows rapidly over time, while the data associated with any individual item does not get especially large; we simply get more and more of the items. It does not really matter which computing resource saturates first: growing demand simply pushes us from a small number of machines to a very large number of them. This is the idea of almost-infinite scaling discussed throughout the article.
Almost-infinite scaling is a deliberately loose, vague, and general statement. What do you do when you cannot know whether any given number of machines will be enough? Phrasing the requirement this way makes it vivid (linear scaling is the technical term, but it does not sound as intuitive as almost-infinite scaling). To be precise, we want the load (both data and computation) to scale almost linearly across machines.
• Describe a Few Common Patterns for Scalable Apps
What does almost-infinite scaling imply for business logic? I argue that scaling requires a new programming concept called an "entity". An entity lives on a single machine at a time, and an application can only atomically manipulate one entity at a time. A consequence of almost-infinite scaling is that this programming notion must be exposed to the developer of the application's business logic.
I propose and discuss this as-yet-unnamed concept in the hope that we can reach agreement on how to program against it, and a shared understanding of the issues involved in building scalable systems.
Furthermore, the use of entities implies a messaging pattern for connecting them. As application developers build scalable solutions to business problems, they must cope with the vagaries of message delivery; we discuss state machines that handle these realities.

Assumptions
Starting from the three assumptions, they are just unproven assumptions, and we think they are correct based on experience.
• Layers of the Application and Scale-Agnosticism
We assume that each scalable application has at least two layers, which differ in their awareness of the scaling mechanism. They may differ in other ways too, but those are irrelevant to this discussion.
The lower layer of the application knows that many machines are bound together to make the system scale. Among its other jobs, it manages the mapping of upper-layer code to specific machines and locations. The lower layer is scale-aware: it understands this mapping. We assume it presents a scale-agnostic programming abstraction to the upper layer, so that the upper-layer code of the application can be written with no attention to scaling issues. By adhering to the scale-agnostic abstraction, application code can be written without worrying that the application may later be deployed under unprecedented load growth.
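The two-layer split above can be sketched as follows. This is a toy illustration under stated assumptions, not an API from the paper: the class names (`ScaleAwareLayer`, `ScaleAgnosticApp`) and the hash-based placement policy are invented for the example. The point is only that the upper layer names entity keys while the lower layer owns the key-to-machine mapping.

```python
class ScaleAwareLayer:
    """Lower layer: maps entity keys onto machines and may remap them."""
    def __init__(self, machines):
        self.machines = machines          # e.g. ["m1", "m2"]

    def locate(self, entity_key):
        # The placement policy (here: hashing) is invisible to the upper layer.
        return self.machines[hash(entity_key) % len(self.machines)]

    def repartition(self, machines):
        # Deployment changes; the upper layer never notices.
        self.machines = machines

class ScaleAgnosticApp:
    """Upper layer: written purely in terms of entity keys."""
    def __init__(self, lower):
        self.lower = lower

    def send(self, entity_key, message):
        # The upper layer never asks *where* the entity is; the lower
        # layer resolves the location beneath the API.
        machine = self.lower.locate(entity_key)
        return (machine, entity_key, message)

lower = ScaleAwareLayer(["m1", "m2"])
app = ScaleAgnosticApp(lower)
machine, key, msg = app.send("order-17", "reserve")
```

After `lower.repartition([...])` with more machines, the same upper-layer call still works unchanged, which is the whole point of scale-agnosticism.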

Over time, these lower layers may evolve into new platforms or middleware that simplify the creation of scale-agnostic applications, much as CICS and other TP monitors once simplified the creation of applications for block-mode terminals.
The focus of this discussion is the possibility of such new scale-agnostic APIs.
• Scopes of Transactional Serializability
A great deal of theoretical work has gone into providing transactional serializability in distributed systems. For example, 2PC (two-phase commit) can block when a node becomes unavailable, while other protocols, such as the Paxos algorithm, do not block on node failure.
These algorithms provide global transactional serializability: their goal is to allow strictly atomic updates to data spread across a set of machines, uniting those machines in a single scope of serializability.
Consider what happens in the absence of distributed transactions. Real system developers, and real systems as deployed, rarely use transactional serializability across machines; the exception is the simple case of a tightly coupled cluster that can be treated as a single machine. Outside that case, we do not see transactions spanning machines.
Instead, we see multiple disjoint scopes of transactional serializability. Think of each machine (or cluster) as a separate scope of serializability. Each datum lives in exactly one machine or cluster. An atomic transaction may involve data within one scope of serializability (a single machine or cluster), but it cannot span these disjoint scopes; that is precisely what makes them disjoint.
• Most Applications Use "At-Least-Once" Messaging
TCP/IP works wonderfully for ephemeral, Unix-style processes (connection setup is a matter of request and response). But consider the plight of an application developer whose incoming message must modify durable data on disk (in an SQL database or some other durable store). The message cannot be acknowledged the moment it is received; the acknowledgement must wait until the database work completes. If the process fails, it must start over and process the message again.
Note: in this scenario, the sender concludes that processing failed and must retry. The recipient could, of course, report a failure back to the sender, but timeouts and occasional network packet loss mean the sender cannot always tell.
The root of the problem is that message delivery is not directly coupled to the update of the durable data; the application sits in between. While it is possible to couple the consumption of a message to the update of the durable data, in practice this usually is not done. The absence of that coupling leaves windows in which a message may be delivered repeatedly, or, with other plumbing, occasionally lost ("at most once", at-most-once).
The consequence of this messaging plumbing is that the application must cope with message retries and with out-of-order message arrival. This article discusses patterns that business-logic developers can use to shoulder that burden in applications facing almost-infinite scaling.
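Why at-least-once delivery produces duplicates can be shown in a few lines. This is a toy illustration (the function and class names are invented for the example): the receiver does its durable work before acknowledging, so a lost acknowledgement forces the sender to retry, and the durable work happens twice.

```python
def deliver_at_least_once(message, receiver, ack_lost_first_time=True):
    """Retry delivery until an acknowledgement is observed."""
    attempts = 0
    while True:
        attempts += 1
        receiver.process(message)              # durable work happens first
        # Simulate the ack being lost on the first attempt only.
        ack = not (ack_lost_first_time and attempts == 1)
        if ack:                                # ack seen only on 2nd try
            return attempts

class Receiver:
    def __init__(self):
        self.log = []                          # stands in for durable state
    def process(self, message):
        self.log.append(message)               # duplicates are visible here

r = Receiver()
attempts = deliver_at_least_once("debit $10", r)
# The message was processed more than once; the application must cope.
```

The sender cannot distinguish "message lost" from "ack lost", so retrying is its only safe choice; the duplicate then becomes the receiver's problem, which is the subject of Section 5.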

Opinions to Be Justified
The advantage of writing a position paper is that you can express crazy ideas, which will be further discussed in the following sections.
• Scalable Apps Use Uniquely Identified "Entities"
This article argues that the upper-layer code of each application must manipulate a single collection of data, called an "entity", at a time. There is no constraint on an entity's size except that it must live within a single scope of serializability (for example, one machine or one cluster).
Each entity has a unique identifier, the entity key. A key may take any form, but it must uniquely identify exactly one entity and the data contained in that entity.

There is no restriction on an entity's representation. It may be a set of SQL records, an XML document, files and the data contained in a file system, binary large objects (blobs), or any other representation convenient for the application. One possible representation is a set of SQL records (possibly spread across many tables) whose primary keys all begin with the entity key.
Entities represent disjoint sets of data. Each datum lives in exactly one entity; the data of one entity never overlaps the data of another.
An application comprises many entities. An "order processing" application, for example, holds many orders, each identified by a unique order ID. For the order-processing application to be scalable, the data of one order must be disjoint from the data of all other orders.
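The uniquely keyed, disjoint entities described above can be sketched as follows. This is an illustrative sketch, not the paper's code; the names (`Entity`, `EntityStore`, the `order-42` key) are invented for the example.

```python
class Entity:
    def __init__(self, key):
        self.key = key        # the unique entity key
        self.data = {}        # all data for this entity lives inside it

class EntityStore:
    """Holds many entities; each datum belongs to exactly one entity."""
    def __init__(self):
        self.entities = {}

    def get(self, key):
        # Looking up a key always yields the one entity with that key.
        return self.entities.setdefault(key, Entity(key))

store = EntityStore()
order = store.get("order-42")
order.data["status"] = "placed"

# The same key yields the same entity; a different key yields a
# disjoint entity whose data never overlaps the first one's.
same = store.get("order-42")
other = store.get("order-43")
```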
• Atomic Transactions Cannot Span Entities
We will see below why atomic transactions cannot span entities. The programmer must understand that each transaction touches only the data inside a single entity. This restriction holds both for different entities within the same application and for entities in different applications.
From the programmer's perspective, the uniquely keyed entity is the scope of serializability, and this has profound implications for the behavior of applications designed for scaling. One implication, discussed later, is that a design for almost-infinite scaling cannot guarantee transactional consistency for alternate (secondary) indices.
• Messages Are Addressed to Entities
Most messaging systems do not consider the partition key of data, but send messages to a queue for processing by a stateless process.
It is common practice to include some data in the message, namely the entity key described above, that tells the stateless application where to fetch the data it needs; the entity's data is then read from a database or other durable store.
Some interesting trends are visible in industry. First, the set of entities managed by an application has grown too large to fit in a single data store. Each entity lives in one store, but the collection of entities as a whole need not; the stateless application locates an entity according to some partitioning scheme. Second, knowledge of that partitioning is being pushed down into the lower layer of the application, separated from the upper layer responsible for business logic.
This makes it natural to name the destination of a message by entity key. The Unix-style stateless processes and the lower layers of the application together implement a scale-agnostic API for the business logic; the scale-agnostic upper layer simply sends messages addressed by entity key, and the entity key identifies the durable state, that is, the entity.
• Entities Manage Per-Partner State ("Activities")
Scale-agnostic messages are messages from one entity to another. The sending entity, which embodies durable state and is identified by its entity key, addresses a message to another entity; the receiving entity combines the upper-layer (scale-agnostic) business logic with the durable state reached through its own entity key.
We assume messages are delivered at least once, which means the receiving entity must be able to disregard superfluous and stale messages. In practice, messages fall into two classes: those that affect the recipient entity's state and those that do not. Messages that do not change the entity's state are easy to handle; they are naturally idempotent.
To ensure idempotence (that is, to ensure retried messages have no further side effects), receiving entities are typically designed to remember which messages they have already processed. Once that is done, a retried message merely generates a new reply (a response message) identical to the result of the earlier processing.
The state built up from the messages received is organized per partner, and each partner is itself an entity. The key idea is to encapsulate this state partner by partner.
We use the term "activity" for the state that manages the messages of one two-party relationship. Each activity lives inside one entity, and an entity holds one activity for each partner entity that sends it messages.
Beyond coping with messy messages, activities manage loosely coupled agreement. In a world without atomic transactions across entities, agreement is usually reached as the negotiated outcome of tentative operations, managed between the entities by their activities.
This article does not claim that activities solve all of the well-known difficulties of reaching agreement, as discussed in the workflow literature. We simply observe that almost-infinite scaling leads to fine-grained, workflow-style solutions whose participants are entities, each entity managing its own workflow using specific knowledge about its partners; that per-partner knowledge, kept inside the entity, is what we call an activity.
Examples of activities are sometimes subtle. An order application sends a message to a shipping application containing a shipping ID and an order ID; the message type triggers a state change in the shipping application, recording the order as ready to ship. In most cases, the implementer does not design for message retries until a bug forces the issue; only rarely does the application designer think about activities deliberately.
The remainder of this article examines these assertions in depth and elaborates on the ideas behind them.

2. ENTITIES
This section explores the nature of entities in depth. First we establish that an atomic transaction lives within a single entity. We then discuss how an entity is accessed by its unique key and how the lower (scale-aware) layer of the application may relocate entities during repartitioning; we consider what may be accessed within a single atomic transaction; and finally we explore some implications of almost-infinite scaling for alternate indices.

Disjoint Scopes of Serializability
An entity is defined as a collection of data with a unique key that lives within a single scope of serializability. Because the entity is inside one scope of serializability, we know that atomic transactions can always be applied within an entity.
This is why we use the name "entity" rather than "object". Objects may share a transactional scope; entities never share one with other entities, because repartitioning may place them on different machines.

Uniquely Keyed Entities
The upper-layer code of an application is usually designed around collections of data with unique keys. We see customer IDs, social security numbers, SKUs (stock-keeping units, the unique identifiers of products in an inventory system), and other unique identifiers all over applications, used as keys to locate the data the application works on. This is the normal case: in practice, the disjoint scopes of serializability (i.e., entities) are invariably identified by unique keys.

Repartitioning and Entities
One of our assumptions is that the growing upper layer is scale-agnostic: as scaling demands change, the lower layer decides where things are deployed. This means an entity's location may change as deployments evolve, and the upper layer of the application can make no assumptions about entity location; otherwise, it would not be scale-agnostic.

Atomic Transactions and Entities
In a scalable system, you cannot update data across two entities in one transaction. Each entity has a unique key, and each entity fits easily within one scope of serializability. But how could you be sure that two distinct entities reside in the same scope of serializability (so that they could be updated atomically)? Only if they shared a common key, in which case they would already be one entity!
If entity keys are partitioned by hashing, nothing says that entities with different keys will land in the same hash bucket. If entity keys are partitioned by key range, most of the time adjacent keys will reside on the same machine, but occasionally, and unluckily, adjacent keys will fall on different machines. A naive test of atomic updates across adjacent keys under range partitioning may pass in the test deployment, yet a later repartitioning can move the entities into different scopes of serializability, leaving a latent bug: the update (now crossing scopes of serializability) is no longer atomic. Never assume that entities with different keys reside in the same place.
In short, the lower layer of the application ensures that each entity key (and hence each entity) resides on a single machine (or cluster), while different entities may be spread anywhere.
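The range-partitioning trap above can be demonstrated concretely. This is a toy sketch (the split points and order keys are invented for the example): before repartitioning, two adjacent keys share a machine; after a new split point lands between them, they end up in different scopes of serializability.

```python
import bisect

def machine_for(key, split_points):
    # Range partitioning: machine i owns the keys below split_points[i];
    # the last machine owns everything at or above the final split point.
    return bisect.bisect_right(split_points, key)

# Before repartitioning: one split point, so the two adjacent orders
# happen to live on the same machine, and a naive atomicity test passes.
before = [machine_for(k, ["p"]) for k in ("order-1", "order-2")]

# After repartitioning: a new split point lands between the adjacent
# keys, so they now live on different machines.
after = [machine_for(k, ["order-15", "p"]) for k in ("order-1", "order-2")]
```

An update spanning `order-1` and `order-2` that looked atomic before the repartition silently stops being atomic afterwards, which is exactly the latent bug described in the text.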
A scale-agnostic design must treat the entity as the boundary of atomicity. The recognition of entities as a design abstraction, the use of entity keys, and the explicit lack of atomicity across entities are the keys to providing a scale-agnostic upper layer for the application.
This is, without question, how industry builds highly scalable applications today; we simply have not had a formal name for the concept of an entity. The upper layer of the application must understand that the entity is the scope of serializability, and that any further assumption will be broken as the deployment changes.

Considering Alternate Indices
We often look data up by more than one key or index. For example, we sometimes refer to a customer by social security number, sometimes by credit card number, and sometimes by street address. Under extreme scaling, these indices cannot be guaranteed to live on the same machine, or even the same cluster. The data about one customer (including its index data) cannot, then, be kept within a single scope of serializability. The entity itself is within a single scope of serializability; the trouble is that the copies of its data used for alternate indices must be assumed to live in different scopes of serializability!
Consider what it would take to keep an alternate index in the same scope of serializability as the primary index. Under almost-infinite scaling, with the set of entities spread across a huge number of machines, the primary and alternate index data would have to reside together. The only way to guarantee that is to locate the alternate index by way of the primary key (finding it dynamically), so that the two land in the same scope of serializability. But the whole point of an alternate index is to search when you do not have the primary key; a lookup would then have to probe an almost infinite number of scopes of serializability for a match on the alternate key. That is simply infeasible.

The only workable alternative is a two-step lookup: first read the alternate index to obtain the entity key, then use the entity key to access the entity (that is, use the alternate index to find the primary key). This resembles the way a relational database follows a secondary index to the record in two steps, except that under almost-infinite scaling the two indices (primary and alternate) are not in the same scope of serializability.
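The two-step lookup can be sketched as follows. This is an illustrative sketch with invented names and data (`cust-7`, the SSN values): the alternate index lives in its own scope of serializability and yields only the entity key; a second step fetches the entity by that key.

```python
# Scope A: entities keyed by customer ID (the primary/entity key).
entities = {
    "cust-7": {"name": "Ann", "ssn": "123-45-6789"},
}

# Scope B: an alternate index from SSN to entity key. Because it lives
# in a different scope of serializability, it cannot be updated in the
# same transaction as the entity and may lag behind it.
ssn_index = {
    "123-45-6789": "cust-7",
}

def lookup_by_ssn(ssn):
    key = ssn_index.get(ssn)      # step 1: alternate index -> entity key
    if key is None:
        return None
    return entities.get(key)      # step 2: entity key -> entity

found = lookup_by_ssn("123-45-6789")
missing = lookup_by_ssn("000-00-0000")
```

A robust caller must also handle the case where step 1 succeeds but step 2 finds a changed or deleted entity, since the two scopes are updated in separate transactions.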

A scale-agnostic application cannot atomically update an entity and its alternate indices! The scale-agnostic upper layer must be designed to cope with alternate indices that are out of sync with the entity as accessed through its primary index (the entity key).

Alternate indices that used to be maintained automatically must now be maintained by the application, typically through workflow-style asynchronous updates. Applications that must scale almost infinitely have to manage their own indices. Readers of an alternate index must understand that it may be stale with respect to the authoritative representation of the entity, so functionality that leans on alternate indices is now harder to build. Such is real life in the harsh world of large systems.

3. MESSAGING ACROSS ENTITIES
This section discusses the use of messages to connect entities: how transactions relate to messages, the semantics of message delivery, and the effect of repartitioning entity locations on message delivery.

Messages to Communicate across Entities
If you cannot update the data of two entities in the same transaction, you need a mechanism to perform the updates in different transactions. Messages are the glue connecting the entities.

Asynchronous with Respect to Sending Transactions
A message crosses entities: its source is one entity and its destination is another. From the definition of entities, it follows that the send and the receive cannot be performed atomically.
If messages left the system while the sending transaction was still open, the application would face great complexity: the message might escape and the transaction might then abort. You may wish this never happened, but it can. For these reasons, messages must be enqueued transactionally within the sending transaction.

Because the destination cannot see the message until after the sending transaction commits, the message is asynchronous with respect to the sending transaction. The entity advances to a new state in one transaction, and the message is the stimulus: it flows from that transaction to another entity, where it triggers a new transaction.
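The transactional enqueueing described above can be sketched as follows. This is an illustrative sketch, not the paper's mechanism as implemented anywhere in particular; the names (`OrderWithOutbox`, `run_transaction`, `drain`) are invented. The key property: the state change and the outgoing message commit together, and the message becomes visible for delivery only after commit.

```python
class OrderWithOutbox:
    def __init__(self, key):
        self.key = key
        self.state = "initial"
        self.outbox = []              # durable, part of the entity's data

def run_transaction(entity, new_state, message, abort=False):
    """Apply a state change and enqueue a message atomically."""
    staged_state = new_state
    staged_outbox = entity.outbox + [message]
    if abort:
        return []                     # nothing changed, nothing sent
    entity.state = staged_state      # commit point: both become durable
    entity.outbox = staged_outbox
    return drain(entity)             # only now may delivery begin

def drain(entity):
    # Hand the committed messages to the delivery plumbing.
    sent, entity.outbox = entity.outbox, []
    return sent

order = OrderWithOutbox("order-9")
aborted = run_transaction(order, "shipped", "notify-billing", abort=True)
sent = run_transaction(order, "shipped", "notify-billing")
```

If the transaction aborts, neither the state change nor the message survives; there is no window in which a message for an aborted transaction escapes.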

Naming the Destination of Messages
When writing the scale-agnostic part of an application, one entity needs to send a message to another. The scale-agnostic code knows only the destination entity's key, not its location; the scale-aware part of the application handles that, associating the entity key with the entity's current location.

Repartitioning and Message Delivery
When the scale-agnostic part of the application sends a message, the scale-aware lower layer takes the destination (the entity key), locates the entity, and delivers the message at least once.
As the system scales, entities are moved; this is usually called repartitioning. The location of an entity's data, and hence the destination of its messages, may change. Sometimes a message arrives at the old address only to find that the entity has moved elsewhere, in which case the message must be forwarded.
Moving an entity occasionally breaks the first-in-first-out path between sender and destination: a retried message may arrive after messages sent later, and the world gets messier.
For these reasons, we see that scale-agnostic applications must support idempotent processing of messages, which implies tolerating the reordered delivery that becomes visible to the application.
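A receiver that tolerates both retries and reordering can be sketched as follows. This is a toy illustration with invented names (`Receiver`, the message IDs): the receiver remembers, as durable state kept with the entity, which message IDs it has already processed, so duplicates and late retries cause no further substantive change.

```python
class Receiver:
    def __init__(self):
        self.processed_ids = set()    # durable state, kept with the entity
        self.balance = 0

    def handle(self, msg_id, amount):
        if msg_id in self.processed_ids:
            return "duplicate"        # substantive work happens only once
        self.processed_ids.add(msg_id)
        self.balance += amount
        return "applied"

r = Receiver()
# Messages arrive retried and out of order: m2, then m1, then m2 again.
results = [r.handle("m2", 5), r.handle("m1", 3), r.handle("m2", 5)]
```

Because deduplication is keyed on message ID rather than on arrival order, the final state is the same regardless of how delivery shuffles or repeats the messages.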

4. ENTITIES, SOA, AND OBJECTS
This section compares the viewpoint of this article with the object-oriented and service-oriented viewpoint.

Entities and Object Instances
Some may ask: "How is an entity different from an object instance?" The answer is not black and white. Objects come in many flavors; some of them are entities and some are not. Two prerequisites must hold for an object to be an entity.
First, the data encapsulated by the object must be disjoint from all other data; that disjoint data cannot be atomically updated together with any other data.
Some object systems wrap database data in an ambiguous encapsulation. In some usages this is neither fragile nor inadvisable, but such objects are not entities as defined in this article. The same sometimes goes for materialized views and alternate indices: when the system must scale and your objects are not entities, these no longer work.
Many object systems allow transactions to span objects. That ease of development sidesteps many of the difficulties raised in this article. Unfortunately, it does not survive almost-infinite scaling, unless the transactionally coupled objects are deployed together. Give them a common key to guarantee co-deployment, and the two coupled objects have simply become parts of the same entity!
Objects are wonderful, but they are a different concept.

Messages versus Methods
A method call is usually synchronous with the calling thread, and hence with the calling object's transaction. Yet the calling object and the called object may not be atomically tied together, and a normal method call does not durably record the pending invocation, so the call does not carry at-least-once semantics. Some systems wrap messages inside method calls; I regard those as messages, not methods.
We are not being crisp here about marshaling and binding, which are the usual way to distinguish messages from method calls. We simply point out the need for asynchrony across the transactional boundary, which is uncommon in method calls.

Entities and Service-Oriented Architectures
Everything discussed in this article is supportive of SOA. Most SOA implementations already assume independent transactional scopes across services.
The main refinement to SOA offered here is the observation that an individual service may itself need to scale almost infinitely, and this article shows how that can be done. The ideas apply both to the design across SOA services and to the design of a single service that must scale independently.

5. ACTIVITIES: COPING WITH MESSY MESSAGES
This section addresses the difficulties of retries and reordering and introduces the notion of an activity: the local information an entity keeps in order to manage its relationship with each collaborating partner.

Retries and Idempotence
Because any message ever sent may be delivered more than once, the application needs a mechanism for handling duplicates. One could imagine a lower layer that eliminates duplicates, but under almost-infinite scaling such support would have to understand entities: messages addressed to an entity must follow it as repartitioning moves it around. In practice, lower layers rarely manage this, so messages may be delivered multiple times.
Typically, then, some mechanism must be built to ensure that incoming messages are processed idempotently. This is not essential to the nature of the problem; duplicate elimination could, in principle, be built into the scale-aware part of the application. But no such plumbing exists in today's applications, so we consider what the poor developer of the scale-agnostic application must do.

Defining Idempotence of Substantive Behavior
The processing of a message is idempotent if a subsequent execution of that processing causes no substantive change to the entity. This is not a rigorous definition; what counts as substantive is left to the application.
If a message does not change the target entity but only reads its information, its processing is idempotent, even if a log record describing the read is written, since the log record does not substantively affect the entity's behavior. Again, the definition of substantive is application-specific.

Natural Idempotence
Whether a message's processing causes substantive side effects is the crux of idempotence. Some messages never cause substantive changes, no matter when or how often they arrive; these are naturally idempotent.
Messages that only read an entity's data are naturally idempotent. If processing a message does change the entity, but not substantively, it too is naturally idempotent.
Things get more troublesome from here. Some messages cause substantive changes and so are not naturally idempotent; the application must introduce mechanisms to make their processing idempotent. This means recording, in some fashion, that a message has been processed, so that subsequent duplicate deliveries cause no substantive change.
Such non-naturally-idempotent message processing is what we discuss next.

Remembering Messages as State
To ensure idempotent processing of messages that are not naturally idempotent, the entity must remember which messages have already been processed. This is state, and it accumulates durably as messages are processed.
Beyond recording that a message has been processed, if the message requires a reply, the identical reply contents must be returned, because we cannot know whether the original sender received the earlier reply.
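A minimal sketch of this mechanism (the names are hypothetical, not the paper's): the entity records, as part of its state, each processed message id together with the reply it generated, and replays the identical reply when a duplicate arrives:

```python
class BankEntity:
    def __init__(self):
        self.balance = 0          # substantive state
        self.processed = {}       # message id -> recorded reply

    def handle_deposit(self, msg_id: str, amount: int) -> dict:
        # Duplicate delivery: return the same reply we sent before,
        # since we cannot know if the sender received the original.
        if msg_id in self.processed:
            return self.processed[msg_id]
        self.balance += amount                       # substantive change
        reply = {"ok": True, "balance": self.balance}
        self.processed[msg_id] = reply               # durably, in practice
        return reply

entity = BankEntity()
r1 = entity.handle_deposit("m-1", 50)
r2 = entity.handle_deposit("m-1", 50)   # redelivered duplicate
assert r1 == r2 == {"ok": True, "balance": 50}
assert entity.balance == 50             # deposited exactly once
```

In a real system the `processed` map would be written atomically with the balance, inside the single-entity transaction the paper permits.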

Activities: Managing State for Each Partner
To track relationships and incoming messages, each entity in a scale-agnostic application must somehow record state about its partners, kept separately for each partner. We name this state an activity. An entity that interacts with several other entities has several activities; each activity tracks the entity's relationship with one partner.

Each entity may contain many activities, and some of its data may span multiple activities.

Consider processing an order comprising many purchased items. The reserved inventory for each shipment of each item would be a separate activity. The warehouse maintains separate entities for the order and for each item; we cannot assume transactions can span those entities.
Each inventory item in the order is managed independently, so the messaging protocol must be managed independently as well; the per-item data within the order entity is an activity. Although rarely called by that name, this pattern is widely used in highly scalable applications.
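A sketch of per-partner activities (the entity layout is illustrative, not prescribed by the paper): an order entity keeps one activity per partner, each holding the message history and status for that relationship alone:

```python
class OrderEntity:
    def __init__(self, order_key: str):
        self.key = order_key
        self.activities = {}   # partner entity key -> activity state

    def _activity_for(self, partner_key: str) -> dict:
        # One activity per partner, created on first contact.
        return self.activities.setdefault(
            partner_key, {"seen_msgs": set(), "status": "new"})

    def on_message(self, partner_key: str, msg_id: str, status: str):
        act = self._activity_for(partner_key)
        if msg_id in act["seen_msgs"]:
            return                    # duplicate from this partner
        act["seen_msgs"].add(msg_id)
        act["status"] = status

order = OrderEntity("order-42")
order.on_message("warehouse-7", "w7-m1", "reserved")
order.on_message("warehouse-9", "w9-m1", "backordered")
order.on_message("warehouse-7", "w7-m1", "shipped")  # duplicate: ignored
assert order.activities["warehouse-7"]["status"] == "reserved"
assert len(order.activities) == 2
```

Each activity is keyed by the partner's entity key, the same knitting mechanism the paper describes for two-party relationships.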

In an almost-infinite scaling application, you must be very clear about these relationships; you cannot simply look at the data and see how it is connected. Everything must be knit together out of two-party relationships, and the knitting element is the entity key. Because partners are far away, you must manage what you know about each partner as new knowledge arrives. The activity is where you keep your local understanding of a distant partner.

Ensuring At-Most-Once Acceptance via Activities
The processing of messages that are not naturally idempotent must happen at most once (that is, their substantive impact must occur only once). To achieve this, some unique mechanism must ensure a message is never processed twice.
The entity must durably transform the processing of each such message into its state, so that a repeated delivery has no substantive impact.
It is essential that the entity manage this state in per-partner activities, because an entity often has many different partners and interacts with each through particular patterns of messages.
The programmer can then focus on the interaction with each partner, using the ensemble of state kept for that partner.
The conclusion is that, absent a platform implementing idempotent message processing, scalable applications are most easily built by concentrating on the information kept for each partner.

6. ACTIVITIES: COPING WITHOUT ATOMICITY
This section examines how scalable systems reach arbitrary decisions without distributed transactions.
Managing distributed agreement is hard work, and it is the focus of this section. In an environment of almost-infinite scaling, uncertainty must be handled with a fine-grained design centered on each two-party relationship, and the data for it is managed within the entity using the concept of activities.

Uncertainty at a Distance
The absence of distributed transactions means that decisions spanning different entities must cope with uncertainty. Decisions across distributed systems cannot avoid it; when distributed transactions are used, the uncertainty is manifested as locks on data and is managed by the transaction manager.
Systems that cannot use distributed transactions must manage uncertainty in the business logic: business semantics, rather than record locks, bound the effects of uncertainty. This is, in a nutshell, workflow. Nothing magic, but without distributed transactions, workflow is what you must use.
These considerations lead us to entities and messages, and to the understanding that if a scale-agnostic application needs agreement across multiple entities, it must manage the uncertainty itself using workflow.

For insight, look at how ordinary business activities are negotiated. Commercial contracts include commitment dates, termination clauses, reserved resources, and more. In the same way, the semantics of uncertainty (the mechanisms for resolving it in code) are woven into the behavior of the business functions themselves. This is harder than simply using distributed transactions, but it is how the real world works.
Again, this is simply a brief argument for workflow.

Activities and the Management of Uncertainty
An entity may experience uncertainty when interacting with other entities. That uncertainty must be managed per partner, in the activity state kept for that specific partner.
Most of the time the uncertainty is tied to the relationship between entities; as each partner advances to a new state, the activity tracking that partner must follow along.

If an ordering system reserves inventory from a warehouse, the warehouse allocates the inventory without knowing whether it will actually be used (what if the order is cancelled or changed?). This is uncertainty. The warehouse will learn the answer later, and the uncertainty will then be resolved.
The inventory manager must maintain, within each item, data associated with every order that touches it, organized by item. Each item holds the portion of order information relevant to that item, and each activity within the item (one activity per order) manages the uncertainty of the related order.
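A sketch of this arrangement (the field names are illustrative): an inventory-item entity holds one activity per order that has reserved it, and the activity carries the unresolved uncertainty until a confirmation or cancellation arrives:

```python
class InventoryItem:
    def __init__(self, sku: str, on_hand: int):
        self.sku = sku
        self.on_hand = on_hand
        self.activities = {}   # order key -> {"reserved": qty}

    def reserve(self, order_key: str, qty: int) -> bool:
        if qty > self.on_hand:
            return False
        self.on_hand -= qty
        # The warehouse does not yet know whether this order will
        # ship or be cancelled; the uncertainty lives here.
        self.activities[order_key] = {"reserved": qty}
        return True

    def resolve(self, order_key: str, confirmed: bool):
        act = self.activities.pop(order_key)   # uncertainty resolved
        if not confirmed:
            self.on_hand += act["reserved"]    # cancelled: stock returns

item = InventoryItem("sku-1", on_hand=10)
assert item.reserve("order-42", 3)          # tentative: outcome unknown
assert item.on_hand == 7
item.resolve("order-42", confirmed=False)   # order was cancelled
assert item.on_hand == 10                   # cancellation undid the reserve
```

Note that both the reservation and its resolution touch only this one entity, so each step fits within a single-entity transaction.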

Performing Tentative Business Operations
To reach agreements across entities, an entity must be able to ask other entities to accept uncertainty. This is done by sending a message that requests a commitment while allowing for later cancellation. This is called a tentative operation. Each tentative operation is eventually confirmed or cancelled.
By accepting a tentative operation, an entity agrees to let another entity decide the outcome. This increases the entity's exposure while the uncertainty lasts; the arrival of a cancellation or confirmation decreases it. Uncertainty rises and falls: as old questions are resolved, new ones arrive. That is quite normal in life, too.
This, again, is workflow, but workflow built on a fine-grained, entity-based design.
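The lifecycle of a tentative operation can be sketched as a tiny state machine (a hypothetical shape, not an API from the paper): it begins tentative and is resolved exactly once, to confirmed or cancelled:

```python
class TentativeOp:
    """A tentative operation: starts 'tentative' and ends in
    exactly one of 'confirmed' or 'cancelled'."""
    def __init__(self, op_id: str):
        self.op_id = op_id
        self.state = "tentative"

    def _resolve(self, outcome: str):
        # Resolving twice is a protocol error: the deciding entity
        # gets to settle the uncertainty exactly once.
        if self.state != "tentative":
            raise RuntimeError(f"{self.op_id} already {self.state}")
        self.state = outcome

    def confirm(self):
        self._resolve("confirmed")

    def cancel(self):
        self._resolve("cancelled")

op = TentativeOp("reserve-sku-1-for-order-42")
op.confirm()
assert op.state == "confirmed"
```

The inventory reservation above is one instance of this lifecycle: `reserve` opens the tentative operation, `resolve` confirms or cancels it.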

Uncertainty and Almost-Infinite Scaling
An interesting aspect of this almost-infinite scaling environment is that uncertainty is managed through two-party agreements, and there are frequently many of them at once. With entity keys as connectors, and activities tracking the latest known state of each distant partner, these two-party agreements knit together into a fine-grained network of agreement.

Consider buying a house through an escrow company. The buyer, the seller, the mortgage company, and every other participant in the transaction enters into an agreement with the escrow company.
When you sign the purchase agreement, you do not know the final outcome. You remain uncertain until the escrow company closes the deal. The single deciding party is the escrow company.
The escrow company is the hub of a collection of two-party relationships, used to reach agreement among many participants without distributed transactions.

It is interesting to consider two-party relationships under almost-infinite scaling. By building a framework of tentative, cancelling, and confirming operations over two-party relationships (much like traditional workflow), we can see how distributed agreement is reached. Just as with an escrow company, many entities can participate in an agreement through a central organizing entity.
Because relationships are two-party, an activity has a simple meaning: "the stuff I remember about that partner." This is the basis for managing very large systems. Even if a partner's data happens to be stored nearby, you do not know where it lives; you must assume it is far away, so the application can be written in a scale-agnostic fashion.
Real-world almost-infinite scaling applications would love to enjoy the convenience of a global serialization scope provided by two-phase commit or similar algorithms. Unfortunately, that imposes unacceptable pressure on availability and performance. So the developers of scale-agnostic applications manage uncertainty themselves with tentative operations: reserved inventory, allocated portions of credit limits, and other application-specific concepts.

7. CONCLUSIONS
The computer industry is evolving. One trend in application development is the use of scaling to cope with workloads that no longer fit on one machine or one tightly coupled cluster of machines. We repeatedly see specific solutions applied to an application first, and general patterns extracted from them later; toolsets built on those patterns make constructing application logic easier.
In the 1970s, many large applications struggled with multiplexing online terminals while delivering business solutions. Patterns for terminal control emerged and were refined in high-end applications, then captured in TP-monitors, and applications were rewritten atop them. These platforms let business-logic developers focus on what they do best: developing business logic.
Today, we see new design pressures imposed on programmers who simply want to solve business problems. Reality pushes them into a world of almost-infinite scaling and forces them to wrestle with design problems unrelated to the real business at hand.
Unfortunately, programmers pursuing business goals such as e-commerce, supply chain management, financial, and health-care applications must increasingly think about scaling without distributed transactions. They do so because attempts at distributed transactions are fragile and perform poorly.
We are at a similar juncture: patterns for building scalable applications exist, but they are not applied consistently. This paper discusses how these emerging patterns can be used more uniformly in the development of almost-infinite scaling applications. Over the next few years, we are likely to see the evolution of middleware and platforms that automate the management of these applications and offer standard mechanisms, insulating applications from scaling concerns much as TP-monitors did in the 1970s.
In this paper, we have discussed and named some patterns emerging in highly scalable applications:
• An entity is a named (keyed) collection of data that can be atomically updated within itself but never atomically updated across entities.
• An activity is the collection of state, within an entity, used to manage the messaging relationship with a single partner entity.
Workflow, as discussed for many years, is used for decision making within the activities of entities. Viewed through the lens of almost-infinite scaling, it turns out, perhaps surprisingly, to be fine-grained workflow by nature.
We have argued that many applications today implicitly design with entities and activities, but without consistent or standardized usage. By discussing and consistently applying these patterns, we can build better highly scalable applications. As an industry, we can then build solutions atop them that let business-logic developers focus on business problems rather than scaling problems.
