Amazon Aurora interpretation (SIGMOD 2017)


Amazon published a paper at SIGMOD 2017, "Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases", which for the first time publicly described Aurora's design philosophy and internal implementation. The following is my reading of the paper; if any part of my understanding is inaccurate, corrections are welcome.

>Summary

Aurora is a relational database service in Amazon Web Services (AWS), aimed mainly at OLTP scenarios. This article introduces Aurora's architecture and design concepts in detail. Aurora's basic design premise is that in a cloud environment the biggest bottleneck of a database is no longer compute or storage resources but the network. Therefore, on top of a storage/compute separation architecture, Aurora pushes log processing down to the distributed storage layer and resolves the network bottleneck through architectural optimization. The following sections describe how Aurora not only reduces network resource consumption but also recovers from failures quickly without losing data, then how Aurora achieves consistency across distributed storage nodes in an asynchronous fashion, and finally Aurora's experience in production.

>1. Overview

In the cloud environment, storage/compute separation is an increasingly common approach to system elasticity and scalability. Broadly speaking, any database whose file system sits on distributed storage can be considered storage/compute separated. With storage and compute separated, you can transparently add storage nodes, remove faulty nodes, perform failover, and expand storage capacity. Against this background, I/O is no longer the bottleneck of the database, because I/O pressure can be spread across multiple storage nodes; instead, the network becomes the bottleneck, because every interaction between the database instance and the storage nodes goes over the network. In particular, to improve database performance, the instance may interact with many storage nodes in parallel, which further increases network pressure. In traditional databases, I/O is executed synchronously: when I/O is required, this often leads to thread context switches and hurts database performance. For example, on a read, if the requested data page misses the buffer pool, a disk I/O is needed and the reading thread must wait for it to complete before doing anything else; the miss may also force a dirty page to be flushed out first. Another familiar scenario is transaction commit (a write I/O): before a commit can return successfully, the corresponding log must be flushed to disk, and because commits are serialized, other transactions must wait for it as well. Two-phase commit is also ill-suited to a distributed cloud environment: the protocol places high demands on the participating nodes and the network and tolerates faults poorly, which conflicts with the frequent software and hardware failures of a large-scale cloud environment.

Aurora, described in this paper, is a new database service for the cloud that addresses the problems traditional databases face above. Based on a storage/compute separation architecture, it pushes log replay down to the distributed storage layer. Storage nodes are loosely coupled with the database instances (compute nodes) and include some compute functionality. Database instances in Aurora still contain most of the core database functions, such as query processing, transactions, locking, cache management, access interfaces, and undo log management; the redo-log-related functions, however, are pushed down to the storage layer, including log processing, crash recovery, and backup/restore. Aurora has three major advantages over traditional databases. First, the underlying storage is a fault-tolerant distributed storage service that copes with failures easily. Second, the database instance writes only redo logs to the storage layer, which greatly reduces network traffic between instances and storage nodes and underpins the performance gains. Third, some core functions (crash recovery and backup/restore) are pushed down to the storage layer and can run asynchronously in the background without affecting foreground user requests. The rest of the article describes in detail how Aurora implements these functions, including:
1. How to ensure consistency of underlying storage based on the Quorum Model
2. How to push the redo log-related functions down to the storage layer
3. How to eliminate synchronization points, and how checkpoints and crash recovery are handled in distributed storage

>2. Scalable and highly available storage

2.1 Replication and fault tolerance

Replication at Aurora's storage layer is based on the Quorum protocol. Suppose there are V nodes in the replication group and each node has one vote; a read must gather Vr votes and a write must gather Vw votes before it can return. To ensure consistency, two conditions must hold. First, Vr + Vw > V, so that every read quorum overlaps every write quorum and a read always sees a node with the latest data. Second, Vw > V/2, so that any two write quorums overlap, each write can see the data of the previous write, and write conflicts are avoided. For example, with V = 3, both conditions are satisfied by Vr = 2 and Vw = 2. To keep the system highly available under all kinds of failures, Aurora deploys data across three Availability Zones (AZs), with two copies per AZ, for six copies in total; each AZ is effectively an independent data center and fault-isolation unit, with its own power, network, and software deployment. Combining the Quorum model with the two rules above, Aurora uses V = 6, Vw = 4, Vr = 3, and can therefore tolerate the loss of an entire AZ without affecting writes, and the loss of an AZ plus one more node in another AZ without affecting reads or losing data.
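
As a concrete illustration, here is a minimal Python sketch (names and structure are my own, not from the paper) of the two quorum conditions and of what Aurora's 6/4/3 configuration tolerates:

```python
# Minimal sketch of the quorum arithmetic described above (illustrative only).
# V: total replicas, Vw: write quorum, Vr: read quorum.

def quorum_is_consistent(v: int, vw: int, vr: int) -> bool:
    """Check the two conditions:
    1) Vr + Vw > V  -> every read quorum overlaps every write quorum
    2) Vw > V / 2   -> any two write quorums overlap, so a write sees the previous write
    """
    return (vr + vw > v) and (2 * vw > v)

def tolerates(v: int, vw: int, vr: int, failed: int) -> dict:
    """With `failed` replicas unavailable, can we still assemble quorums?"""
    alive = v - failed
    return {"can_write": alive >= vw, "can_read": alive >= vr}

if __name__ == "__main__":
    V, VW, VR = 6, 4, 3            # Aurora: 6 copies across 3 AZs, write 4/6, read 3/6
    assert quorum_is_consistent(V, VW, VR)

    # Lose an entire AZ (2 copies): writes and reads both survive.
    print(tolerates(V, VW, VR, failed=2))   # {'can_write': True, 'can_read': True}

    # Lose an AZ plus one more node (3 copies): reads survive, writes stall until repair.
    print(tolerates(V, VW, VR, failed=3))   # {'can_write': False, 'can_read': True}
```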

2.2 partition management

With the Quorum protocol, Aurora can tolerate AZ-level failures (fire, flood, network outages) and node-level failures (disk failure, power loss, hardware damage) as long as they do not coincide; the protocol itself is not violated, and database availability and correctness are preserved. To make the database effectively "always available", the question becomes how to reduce the probability of two such failures occurring at the same time. Since the Mean Time To Failure (MTTF) of a given component is essentially fixed, the way to reduce the probability of overlapping failures is to shorten the Mean Time To Repair (MTTR). Aurora therefore manages storage in segments: each segment is 10 GB, and six 10 GB copies form a Protection Group (PG). An Aurora storage volume is composed of a number of PGs, which physically live on storage nodes built from EC2 (Amazon Elastic Compute Cloud) servers with local SSDs. Aurora currently supports volumes of up to 64 TB. After segmentation, each segment is a unit of failure: on a 10 Gbps network, a 10 GB segment can be repaired within about 10 seconds, so database availability is affected only if two or more copies of the same segment fail within the same 10-second window, which in practice essentially does not happen. Through segment management, database availability is thus cleverly improved.
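
To see why the 10 GB segment size matters, here is a back-of-the-envelope sketch that plugs the figures quoted above (10 GB segments, a 10 Gbps network, a 64 TB volume) into the repair-time argument; the function names are mine:

```python
# Back-of-the-envelope sketch of the segment sizing argument (my own helper
# functions around the figures quoted from the paper; not Aurora code).

SEGMENT_BYTES = 10 * 1024**3          # each segment is 10 GB
NETWORK_GBPS  = 10                    # 10 Gbps links between storage nodes
MAX_VOLUME    = 64 * 1024**4          # Aurora volume limit: 64 TB

def repair_seconds(segment_bytes: int = SEGMENT_BYTES, gbps: float = NETWORK_GBPS) -> float:
    """Time to re-replicate one lost segment copy over the network."""
    return segment_bytes * 8 / (gbps * 1e9)

def protection_groups(volume_bytes: int) -> int:
    """A volume is striped into 10 GB protection groups (6 copies each)."""
    return -(-volume_bytes // SEGMENT_BYTES)   # ceiling division

if __name__ == "__main__":
    print(f"repair one segment: ~{repair_seconds():.1f}s")                  # ~8.6s, on the order of 10s
    print(f"PGs in a full 64 TB volume: {protection_groups(MAX_VOLUME)}")   # 6554 PGs
```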

2.3 Lightweight O&M

Based on segment management, the system can handle failures and operational tasks flexibly. For example, if disk I/O pressure on a storage node is high, the node can be marked down and a new node quickly added to the system. Storage nodes can also be taken out of service temporarily for software upgrades and rejoined afterwards. All of this failure handling and maintenance is done in a rolling fashion at segment granularity and is completely transparent to users.

>3. Storage and computing separation

3.1 Write amplification in traditional databases

Consider the write path of a traditional database, taking single-node MySQL as an example. A write modifies pages in the buffer pool and causes redo logs to be flushed to disk, while background threads also flush dirty pages asynchronously; in addition, to avoid torn pages, each dirty page is first written to the double-write area before being written in place. Now add the master/standby replication typically used in production, as shown in Figure 2: AZ1 and AZ2 each run a MySQL instance in synchronous mirrored replication, the underlying storage is Elastic Block Store (EBS), and each EBS volume has its own mirror; Simple Storage Service (S3) is additionally used to archive redo logs and binlogs for point-in-time recovery. From the flow perspective, each write path carries five kinds of data: redo log, binlog, data pages, double-write pages, and frm metadata. Because the replication is mirror-based and synchronous (I understand it here as something like Distributed Replicated Block Device, DRBD), steps 1, 3, and 5 in the figure are sequential. The response time of this model is very poor: it takes four network I/Os, three of which are synchronous and serial. From the storage perspective, four copies of the data are kept on EBS, and all four must be written before the operation can return. In this architecture, both the sheer I/O volume and the serialized model lead to very bad performance.
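
To make the contrast concrete, the following illustrative tally (my own framing, not measured data) lists the payload types the mirrored MySQL-on-EBS write path of Figure 2 pushes over the network versus what Aurora ships:

```python
# Illustrative tally of the payloads on each write path (my own framing of the
# paper's argument, not measured data).

MIRRORED_MYSQL_WRITES = [
    # (payload, where it goes)
    ("redo log",     "local EBS + EBS mirror"),
    ("binlog",       "local EBS + EBS mirror, archived to S3"),
    ("data page",    "local EBS + EBS mirror"),
    ("double-write", "local EBS + EBS mirror"),
    ("frm metadata", "local EBS + EBS mirror"),
    # ...and the whole set is repeated on the synchronously mirrored standby in the other AZ.
]

AURORA_WRITES = [
    ("redo log", "6 storage nodes (ack on 4/6); everything else is derived at the storage layer"),
]

if __name__ == "__main__":
    print(f"mirrored MySQL payload types: {len(MIRRORED_MYSQL_WRITES)}")
    print(f"Aurora payload types:         {len(AURORA_WRITES)}")
```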

 

3.2 Pushing log processing down to the storage layer

In a traditional database, when a data page is modified, the corresponding redo log is generated at the same time; replaying the redo log against the pre-image of the page yields its post-image, and at commit time the transaction's redo log must be flushed to disk before the commit returns. In Aurora there is only one kind of write that crosses the network, the redo log: data pages are never written by the instance. A storage node receives the redo log and can produce a new version of a page from an older version; to avoid regenerating a page from scratch each time, storage nodes periodically materialize page versions. As shown in Figure 3, Aurora consists of one primary instance and multiple replica instances across AZs; only redo logs and metadata flow between the primary, the replicas, and the storage nodes. The primary sends logs to the six storage nodes and to the replica instances concurrently and considers the log durable once 4 of the 6 storage nodes have acknowledged it; responses from replica instances are not on this path. In a sysbench test (GB-scale data set, write-only workload, 30-minute stress run), Aurora achieved 35 times the throughput of mirrored MySQL, with a per-transaction log volume 7.7 times smaller. Now consider recovery speed. After a traditional database crashes and restarts, recovery starts from the latest checkpoint and replays all redo logs after it to bring the pages of committed transactions up to date. In Aurora, the redo-related work has been pushed down to the storage layer, where log replay happens continuously in the background; any read that finds a page that is not at the required version triggers the storage node to replay logs and produce it. What a traditional database does during crash recovery is therefore happening all the time in Aurora's background, very little remains to be done at actual recovery time, and recovery is consequently very fast.
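
The on-demand materialization idea can be sketched as follows; all types and names here are hypothetical, since Aurora's actual storage format is not public. Each page keeps a materialized image plus the redo records that arrived after it, and a read applies only the tail of that chain:

```python
# Toy sketch of on-demand page materialization at a storage node (hypothetical
# types and names). The instance never writes pages; it only ships redo.

from dataclasses import dataclass, field

@dataclass
class RedoRecord:
    lsn: int
    apply: callable              # function old_image -> new_image

@dataclass
class StoredPage:
    image: bytes                 # last materialized version of the page
    image_lsn: int               # LSN up to which `image` is current
    pending: list = field(default_factory=list)   # redo records with lsn > image_lsn

    def add_redo(self, rec: RedoRecord) -> None:
        self.pending.append(rec)

    def read(self, read_point: int) -> bytes:
        """Materialize the page as of `read_point` by replaying pending redo."""
        image = self.image
        for rec in sorted(self.pending, key=lambda r: r.lsn):
            if rec.lsn <= read_point:
                image = rec.apply(image)
        # A background task would periodically fold `pending` into `image`
        # so that later reads start from a newer materialized baseline.
        return image

if __name__ == "__main__":
    page = StoredPage(image=b"v0", image_lsn=100)
    page.add_redo(RedoRecord(lsn=110, apply=lambda img: b"v1"))
    page.add_redo(RedoRecord(lsn=120, apply=lambda img: b"v2"))
    print(page.read(read_point=115))   # b'v1' -- only redo up to the read point is applied
```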

3.3 Key Points of storage service design

One of the key design principles of Aurora's storage service is to reduce the response time of foreground user writes. The storage node therefore moves as much work as possible into the background for asynchronous execution and allocates resources among tasks according to foreground pressure; for example, when foreground requests are heavy, the storage node slows down garbage collection of old page versions. In a traditional database, background threads must keep advancing the checkpoint to keep crash recovery time bounded, but doing so competes with foreground request processing. In Aurora, the separated storage layer lets the checkpoint-like work advance without affecting the database instance at all, and the faster it advances, the better for foreground read I/O (less log replay is needed on a miss). Aurora writes follow the Quorum model, and with the volume segmented, a write returns once the write quorum of segments has acknowledged it; because segments are widely scattered across disks, a few slow disks do not affect overall write performance. Figure 4 describes the main write flow at a storage node in detail:
1) The storage node receives logs from the database instance and appends them to an in-memory queue.
2) After the logs are persisted locally, the node acknowledges the instance.
3) Logs are organized by segment, and missing logs are identified.
4) The node gossips with other storage nodes to fill in the missing logs.
5) Logs are replayed to generate new data pages.
6) Data pages and logs are periodically backed up to S3.
7) Stale page versions are periodically garbage-collected.
8) CRC checks are periodically run on the data pages.
Of all these write-related operations, only steps 1 and 2 are synchronous and on the foreground path, directly affecting request response time; everything else is asynchronous.
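
A rough sketch of that split between the foreground path (steps 1 and 2) and the background work (steps 3 to 8), with hypothetical structure and method names:

```python
# Sketch of the storage-node write path described above (hypothetical structure):
# only enqueue + local persist + ACK sit on the foreground path; gossip, replay,
# backup, GC and CRC checks are background work.

import queue
import threading

class StorageNode:
    def __init__(self):
        self.incoming = queue.Queue()       # step 1: in-memory log queue
        self.durable_log = []               # stand-in for the local on-disk log

    def handle_log(self, records) -> str:
        self.incoming.put(records)          # step 1: append to the memory queue
        self.durable_log.extend(records)    # step 2: persist locally (fsync elided)
        return "ACK"                        # respond to the instance immediately

    def background_loop(self):
        while True:
            records = self.incoming.get()
            self._sort_and_find_gaps(records)   # step 3
            self._gossip_fill_gaps()            # step 4
            self._coalesce_pages(records)       # step 5
            self._backup_to_s3()                # step 6
            self._gc_old_page_versions()        # step 7
            self._crc_scrub()                   # step 8

    # Placeholders for the asynchronous steps.
    def _sort_and_find_gaps(self, records): pass
    def _gossip_fill_gaps(self): pass
    def _coalesce_pages(self, records): pass
    def _backup_to_s3(self): pass
    def _gc_old_page_versions(self): pass
    def _crc_scrub(self): pass

if __name__ == "__main__":
    node = StorageNode()
    threading.Thread(target=node.background_loop, daemon=True).start()
    print(node.handle_log([("lsn", 101), ("lsn", 102)]))   # foreground path returns right away
```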

 

>4. Consistency principles

This section describes how Aurora guarantees consistency without the 2PC protocol, by shipping redo logs from the database instance to the storage nodes and read replicas. First, we explain how Aurora avoids redo log replay during crash recovery; second, we cover the normal operations: writes, reads, and transaction commits; third, we explain how Aurora ensures that reads from replica instances are consistent; finally, we describe the crash recovery process in detail.

4.1 log processing

Almost all databases on the market today use the WAL (Write-Ahead Logging) model: before any data page is changed, the redo log describing the change must be written first, and Aurora's MySQL-based engine is no exception. In the implementation, every redo log record has a globally unique Log Sequence Number (LSN). To keep data consistent across nodes, Aurora does not use the 2PC protocol, whose fault tolerance is too weak; instead, consistency across storage nodes is maintained with the Quorum protocol. Because individual nodes may miss some log records in production, storage nodes use a gossip protocol to fill the holes in their local redo logs. In normal operation the database instance is in a consistent state, and a disk read only needs to go to a storage node that has the complete redo log; during crash recovery, however, the instance must perform Quorum-based reads to rebuild the consistent runtime state of the database. A database instance has many active transactions, started and committed in different orders; when the database crashes, the instance must decide, for each of them, whether to commit or roll back. Several key concepts govern how the storage service layer handles redo logs in Aurora:

Volume Complete LSN (VCL): the storage service has every log record up to and including the VCL. During recovery, all logs with an LSN greater than the VCL are truncated.

Consistency Point LSNs (CPLs): in MySQL (InnoDB), each transaction physically consists of multiple mini-transactions, and each mini-transaction is the smallest atomic unit of change. For example, a B-tree split may modify several data pages, and the group of log records for those page changes must be applied atomically, so redo is also replayed in mini-transaction units. A CPL is the LSN of the last log record in such a group; a transaction has multiple of them, hence CPLs.

Volume Durable LSN (VDL): the largest persistent LSN, i.e. the largest CPL that is less than or equal to the VCL (so VDL <= VCL). To keep mini-transaction atomicity intact, all logs with an LSN greater than the VDL must be truncated. For example, if the VCL is 1007 but the CPLs declared by the database are 900, 1000, and 1100, then the VDL is 1000, and logs above 1000 must be truncated even though the volume is complete up to 1007. The VDL marks the latest point at which the database is in a consistent state; during crash recovery, the database instance determines the VDL by querying each PG and truncates all logs beyond it.
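
A minimal sketch of this bookkeeping, using the example above (VCL = 1007, CPLs at 900, 1000, 1100); the function names are mine:

```python
# Minimal sketch of the VCL / CPL / VDL relationship described above.

def compute_vdl(vcl: int, cpls: list[int]) -> int:
    """The VDL is the highest CPL that is <= VCL: the newest point at which
    every mini-transaction is both complete and durable."""
    eligible = [lsn for lsn in cpls if lsn <= vcl]
    if not eligible:
        raise ValueError("no consistency point is durable yet")
    return max(eligible)

def truncate_on_recovery(log_lsns: list[int], vdl: int) -> list[int]:
    """During recovery everything above the VDL is discarded so that no
    mini-transaction is left half-applied."""
    return [lsn for lsn in log_lsns if lsn <= vdl]

if __name__ == "__main__":
    vcl, cpls = 1007, [900, 1000, 1100]
    vdl = compute_vdl(vcl, cpls)
    print(vdl)                                                      # 1000
    print(truncate_on_recovery([900, 950, 1000, 1003, 1007], vdl))  # [900, 950, 1000]
```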

4.2 basic operations

1). Writes

In Aurora, the database instance ships redo logs to the storage nodes; once the write quorum is reached the transaction can be marked committed, and the VDL is advanced, moving the database to a new consistent state. At any moment the database runs a large number of transactions concurrently, and each redo log record is assigned a unique LSN that must be greater than the current VDL. To prevent the front end from running too far ahead of the storage service, Aurora defines an LSN Allocation Limit (LAL), currently 10,000,000: the maximum allowed gap between a newly allocated LSN and the VDL. This keeps the storage service from becoming a backlog bottleneck that stalls subsequent writes. Because the underlying storage is sharded into segments and each segment manages a subset of pages, a transaction that modifies data across multiple segments has its log records scattered, and each segment sees only part of the transaction's log. To let each segment verify the completeness of its own log, every log record carries a backlink to the previous log record on the same segment; by following these backlinks a segment can establish its Segment Complete LSN (SCL), the point up to which its log is complete. Storage nodes use the SCL when they gossip with each other to fill in missing records and advance it.
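
The two mechanisms above, LAL back-pressure and per-segment completeness via backlinks, can be sketched like this (a simplification with hypothetical names; in particular the backlink walk assumes each record links to the immediately preceding record held on the same segment):

```python
# Sketch of the per-write bookkeeping rules above (hypothetical names).

LAL = 10_000_000   # max allowed gap between a newly allocated LSN and the current VDL

def allocate_lsn(next_lsn: int, current_vdl: int) -> int:
    """Refuse to allocate an LSN that would run too far ahead of the VDL."""
    if next_lsn - current_vdl > LAL:
        raise RuntimeError("writer throttled: storage is too far behind (LAL exceeded)")
    return next_lsn

def segment_complete_lsn(backlinks: dict[int, int]) -> int:
    """backlinks maps each locally held LSN to the previous LSN on the same
    segment (0 for the first record). The SCL is the last LSN whose whole
    backlink chain is present locally, i.e. the point before the first hole."""
    scl = 0
    for lsn in sorted(backlinks):
        if backlinks[lsn] != scl:      # the record this one links back to is missing
            break
        scl = lsn
    return scl

if __name__ == "__main__":
    print(allocate_lsn(next_lsn=5_000_000, current_vdl=1_000_000))   # ok, gap < LAL
    # Backlinks: 10 -> 0, 20 -> 10, 40 -> 30 (30 never arrived), so SCL = 20.
    print(segment_complete_lsn({10: 0, 20: 10, 40: 30}))             # 20
```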

2). Commits

In Aurora, transaction commit is fully asynchronous. Each transaction consists of several log records and has a unique "commit LSN". When a worker thread handles a commit request, it adds the transaction's commit LSN to a list of pending commits, sets the transaction aside, and moves on to other requests. When the VDL advances past a transaction's commit LSN, that transaction's redo log is known to be durable, and an acknowledgement can be returned to the client. A dedicated thread in Aurora handles these acknowledgements, so over the whole commit path no worker thread ever blocks waiting for a commit's log to become durable; workers simply keep processing new requests. This asynchronous commit scheme greatly improves system throughput; it is widely used, and AliSQL adopts a similar approach.
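
A minimal sketch of this asynchronous commit path, with my own structure and names: workers park commits, and a separate acknowledgement step releases everything whose commit LSN the VDL has passed:

```python
# Sketch of asynchronous commit: park the transaction, ack it later when the
# VDL has advanced past its commit LSN (structure and names are mine).

import heapq

class CommitTracker:
    def __init__(self):
        self.vdl = 0
        self.pending = []                      # min-heap of (commit_lsn, txn_id)

    def submit_commit(self, txn_id: str, commit_lsn: int) -> None:
        """Called by a worker thread: park the transaction and return immediately."""
        heapq.heappush(self.pending, (commit_lsn, txn_id))

    def on_vdl_advance(self, new_vdl: int) -> list[str]:
        """Called by the dedicated ack thread: every transaction whose commit LSN
        is <= the VDL is durable and can be acknowledged to its client."""
        self.vdl = new_vdl
        acked = []
        while self.pending and self.pending[0][0] <= self.vdl:
            _, txn_id = heapq.heappop(self.pending)
            acked.append(txn_id)
        return acked

if __name__ == "__main__":
    tracker = CommitTracker()
    tracker.submit_commit("txn-A", commit_lsn=120)
    tracker.submit_commit("txn-B", commit_lsn=90)
    print(tracker.on_vdl_advance(100))   # ['txn-B']  (txn-A still waiting)
    print(tracker.on_vdl_advance(150))   # ['txn-A']
```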

3). Reads

In Aurora, as in most databases, a data page is normally served from the buffer pool and read from storage only on a miss. When the buffer pool is full, a victim page is chosen by an eviction algorithm such as LRU; in a traditional engine, if the victim is dirty it must first be flushed to disk so that the next read of the page sees the latest data. Aurora does not do this: evicted pages are simply discarded, never written back. This requires that any page in the Aurora buffer pool is always the latest version, which is ensured by only evicting pages whose page-LSN is less than or equal to the VDL. (Note: the paper states the eviction condition as page-LSN greater than or equal to the VDL; I believe the intended rule is that a page may be discarded only when its page-LSN is less than or equal to the VDL.) This constraint guarantees two things: (1) every modification to the page has already been made durable in the log, and (2) on a cache miss, the latest version of the page can always be reconstructed from the stored page plus the VDL.

In the normal case, reads do not require a quorum. When the database instance needs to read from storage, it takes the current VDL as the consistent read point and picks a storage node that has all the logs up to that read point; a single node is then enough to return the latest version of the page. Concretely, since all pages are managed per segment, the database instance tracks which segments each storage node holds and each node's SCL, so for any I/O it knows from this metadata which storage nodes hold the page and which of them have caught up to the read point. The database instance also computes, per PG, the minimum read point among the requests it is serving; when read replica instances exist, each of them computes the same thing, and the instances exchange these values via gossip to obtain a global per-PG Minimum Read Point LSN, called the PGMRPL. The PGMRPL is a low-water mark for read points in that PG: storage nodes can advance page versions up to it and garbage-collect log records that no reader will ever need again.
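
A sketch of the quorum-free read path and the PGMRPL low-water mark, with hypothetical structure and names:

```python
# Sketch of quorum-free reads: take the VDL as the read point, then ask any
# single storage node whose SCL has already reached that point (names are mine).

from dataclasses import dataclass

@dataclass
class NodeState:
    node_id: str
    scl: int          # segment complete LSN the instance has recorded for this node

def choose_read_node(nodes: list[NodeState], read_point: int) -> str:
    """Any node that already holds every log up to the read point can serve the
    page; no read quorum is needed in the common case."""
    for n in nodes:
        if n.scl >= read_point:
            return n.node_id
    raise RuntimeError("no node is caught up; must wait or fall back to a quorum read")

def pg_min_read_point(per_instance_read_points: list[int]) -> int:
    """PGMRPL: the low-water mark below which storage nodes may garbage-collect
    old page versions and log records for this protection group."""
    return min(per_instance_read_points)

if __name__ == "__main__":
    nodes = [NodeState("sn-1", scl=990), NodeState("sn-2", scl=1005), NodeState("sn-3", scl=1010)]
    print(choose_read_node(nodes, read_point=1000))   # 'sn-2'
    print(pg_min_read_point([1000, 987, 1004]))       # 987
```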

4). Replicas

In Aurora, the writer instance and up to 15 read replica instances share the same distributed storage volume, so adding read replicas consumes no extra disk write I/O and no extra storage space; this is the advantage of shared storage: new read replicas come at zero storage cost. Writer and readers are synchronized through the log: when the writer sends log records to the storage nodes, it also sends them to the read replicas, which apply them in log order. During replay, if the page a record refers to is not in the replica's buffer pool, the record is simply discarded; this is safe because the storage nodes have the full log and can materialize the required page version for any future read at the appropriate read point. Note that the writer ships logs to the replicas asynchronously, so the writer's execution and commits are never held up by the replicas. Replicas must follow two rules when applying the log: 1) only log records with an LSN less than or equal to the VDL may be applied, and 2) records are applied in MTR units, so that the replica always sees a consistent view. In practice, the lag between the writer and a read replica stays within about 20 ms.
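
The replica apply loop and its two rules can be sketched as follows (hypothetical structure; real replay works on InnoDB pages, not strings):

```python
# Sketch of the read-replica apply loop: redo is applied MTR by MTR, only up to
# the VDL, and only to pages already cached; everything else is dropped because
# storage can materialize it later (structure is mine).

def apply_on_replica(mtr_groups, buffer_pool: dict, vdl: int) -> None:
    """mtr_groups: list of MTRs, each a list of (lsn, page_id, apply_fn)."""
    for mtr in mtr_groups:
        last_lsn = mtr[-1][0]
        if last_lsn > vdl:
            break                                 # rule 1: never apply beyond the VDL
        for lsn, page_id, apply_fn in mtr:        # rule 2: apply the MTR atomically
            if page_id in buffer_pool:            # cached page: keep it current
                buffer_pool[page_id] = apply_fn(buffer_pool[page_id])
            # not cached: drop the record; the storage node can rebuild the page

if __name__ == "__main__":
    pool = {"page-7": "v0"}
    mtrs = [
        [(101, "page-7", lambda p: "v1"), (102, "page-9", lambda p: "v1")],  # page-9 not cached
        [(103, "page-7", lambda p: "v2")],                                   # beyond the VDL, skipped
    ]
    apply_on_replica(mtrs, pool, vdl=102)
    print(pool)    # {'page-7': 'v1'}
```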

4.3 fault recovery

Most databases base crash recovery on the classic ARIES protocol, using WAL to ensure that committed transactions survive a crash and uncommitted transactions are rolled back. Such systems take checkpoints periodically and record checkpoint information in the log. After a crash, a data page may contain both committed and uncommitted data, so recovery first replays the log from the last checkpoint to bring the pages back to their state at the time of the crash, and then rolls back uncommitted transactions using the undo log. Crash recovery is therefore time-consuming and closely tied to checkpoint frequency: checkpointing more often shortens recovery time but directly reduces foreground throughput, so traditional systems must balance the two. In Aurora, no such trade-off is needed.

In a traditional database, crash recovery advances the database state by replaying the log, and the database is offline while redo is replayed. Aurora takes a similar approach in principle, but the log replay logic is pushed down to the storage nodes and runs continuously in the background while the database is online serving requests. When the instance restarts after a failure, the storage service therefore recovers very quickly, within about 10 seconds even at around 100,000 (10w) TPS. The restarted instance still has to re-establish its consistent runtime state: it communicates with a read quorum of storage nodes to ensure it sees the latest durable data and recomputes the new VDL, and all logs beyond the VDL are truncated and discarded. Aurora also bounds how far newly allocated LSNs may run ahead of the VDL (the gap cannot exceed 10,000,000), mainly to keep the instance from accumulating too many uncommitted transactions, since after redo the database must still roll back the uncommitted ones. In Aurora, service can resume as soon as the list of active transactions has been gathered; the undo recovery then proceeds in the background while the database is online.
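
A very small sketch of this recovery sequencing, with made-up types and method names; the real quorum bookkeeping behind establish_durable_point is considerably more involved:

```python
# Sketch of instance recovery sequencing: no redo replay on the instance, only
# re-establishing the durable point, truncating, opening for service, and
# rolling back in-flight transactions in the background (all names are mine).

import threading
from dataclasses import dataclass
from typing import Optional

@dataclass
class Txn:
    txn_id: str
    commit_lsn: Optional[int]    # None => still uncommitted at crash time

class StorageVolume:
    def establish_durable_point(self) -> int:
        # In Aurora this is a read-quorum exchange with the storage nodes that
        # recomputes the VDL; here it just returns a canned value.
        return 1000

    def truncate_above(self, vdl: int) -> None:
        print(f"truncating log records above LSN {vdl}")

def recover_and_open(storage: StorageVolume, txn_table: list) -> None:
    vdl = storage.establish_durable_point()    # no redo replay happens on the instance
    storage.truncate_above(vdl)
    print("database is open for service")      # service resumes before undo finishes
    uncommitted = [t.txn_id for t in txn_table if t.commit_lsn is None]
    # undo of in-flight transactions runs in the background, after coming online
    threading.Thread(target=lambda: print(f"rolling back {uncommitted} in the background")).start()

if __name__ == "__main__":
    recover_and_open(StorageVolume(), [Txn("t1", 990), Txn("t2", None)])
```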

>5. Aurora system on the cloud

In community InnoDB, a write modifies page contents in the buffer pool and appends the corresponding redo records to the WAL in order; when a transaction commits, the WAL protocol requires the transaction's log to be durable before the commit returns. To guard against torn pages, modified pages in the buffer pool are also written to the double-write area before being written in place; page writes happen in the background, typically during page eviction or at checkpoints. Besides the I/O subsystem, InnoDB also contains the transaction subsystem, lock manager, B+ tree implementation, and the MTR (mini-transaction) mechanism; an MTR is the smallest unit of change whose log records must be applied atomically (for example, a B+ tree split or merge that touches several pages).
Aurora's database engine is based on community InnoDB, modified so that disk reads and writes go to the storage service layer. Redo logs are grouped by PG, and the last log record of each MTR is a consistency point. Like community MySQL, Aurora supports the standard isolation levels and snapshot reads; replica instances continuously receive transaction start and commit information from the writer and use it to support snapshot reads locally. The database instance and the storage service layer version and operate independently of each other: the storage layer presents a uniform data view to the instance, which fetches pages from it just as it would from local storage. The paper also shows Aurora's deployment architecture on the cloud. Aurora uses Amazon Relational Database Service (RDS) to manage metadata; RDS deploys an agent, the Host Manager (HM), on each instance, which monitors cluster health and decides whether a failover is needed or whether an instance must be rebuilt. Each cluster has one writer and zero or more read replicas, all deployed within one physical Region (for example, US East or US West), normally spread across AZs, with the distributed storage service in the same Region. For security, the database layer, application layer, and storage layer are isolated from one another; a database instance communicates over three kinds of Amazon Virtual Private Cloud (VPC) networks: the application VPC, through which applications reach the database; the RDS VPC, through which the database talks to the control plane; and the storage VPC, through which it talks to the storage service nodes.
The storage service itself is a fleet of EC2 instances spanning at least three AZs, providing storage, read/write I/O, and backup/restore for many customers. Storage nodes manage local SSDs and interact with database instances and with each other; the backup/restore service continuously backs up new data to S3 and restores from S3 when necessary. The storage control plane uses Amazon DynamoDB as its durable store for configuration, metadata, and information about data backed up to S3. To keep the storage service highly available, the system must discover problems proactively, before they affect users; all key operations are monitored, and an alarm fires immediately when performance or availability degrades.

 

>6. Performance Data

I will not discuss the performance data here. For details, refer to the original paper.

>7. Practical Experience

We found that more and more applications are migrating to Aurora clusters, and they share some similarities; this section abstracts a few typical scenarios and summarizes the lessons learned.

7.1 multi-tenant

Many Aurora customers run Software-as-a-Service (SaaS) businesses. Their underlying data models are usually fairly stable, and multiple tenants share a single database instance to reduce cost. In this mode an instance ends up with a very large number of tables, which inflates the metadata and increases the burden on dictionary management. Such users typically face three requirements: 1) sustaining high throughput and high concurrency on the instance; 2) handling disk space flexibly, without having to size storage in advance, and growing it quickly; and 3) keeping interference between tenants to a minimum. Aurora addresses all three well.

7.2 high concurrency processing capability

Internet applications often see sudden spikes in load for all sorts of reasons, which requires good scalability and high concurrency handling. In Aurora, because both the underlying storage service and the upper-layer compute nodes scale easily and automatically, the system can scale out quickly; in practice, some Aurora deployments have sustained more than 8,000 user connections for long periods.

7.3 table structure upgrade

Table structure changes are often accompanied by table locking and table copying and can take a long time, yet DDL is a routine operation, so an efficient online DDL mechanism is needed. Aurora's approach has two main parts: 1) schemas are versioned, each data page records the schema version it was written with, and rows on the page are parsed using that version; 2) pages are upgraded lazily with a modify-on-write mechanism, which minimizes the impact of the DDL.
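
A toy sketch of the versioned-schema, modify-on-write idea (all names and structures are my own): each page remembers the schema version it was written under, reads parse it with that version, and the page is upgraded only when it is next modified:

```python
# Toy sketch of schema versioning with lazy (modify-on-write) page upgrades.

SCHEMAS = {
    1: ["id", "name"],
    2: ["id", "name", "email"],      # hypothetical ALTER TABLE ... ADD COLUMN email
}

class Page:
    def __init__(self, schema_version: int, rows: list):
        self.schema_version = schema_version
        self.rows = rows

    def read(self) -> list:
        cols = SCHEMAS[self.schema_version]            # parse with the page's own schema
        return [dict(zip(cols, row)) for row in self.rows]

    def modify(self, new_row: tuple, current_version: int) -> None:
        if self.schema_version != current_version:     # modify-on-write upgrade
            old = self.read()
            cols = SCHEMAS[current_version]
            self.rows = [tuple(r.get(c) for c in cols) for r in old]
            self.schema_version = current_version
        self.rows.append(new_row)

if __name__ == "__main__":
    p = Page(schema_version=1, rows=[(1, "ann")])
    print(p.read())                                    # old pages stay readable after the DDL
    p.modify((2, "bob", "bob@example.com"), current_version=2)
    print(p.schema_version, p.read())                  # page upgraded only on this write
```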

7.4 software upgrade

Since an Aurora cluster usually has only one writer instance, any downtime on it is serious. In Aurora, all durable state lives in the storage layer, and an instance's state can be reconstructed from the storage layer plus metadata, so a replacement database instance is easy to build; to make software upgrades more efficient, Aurora also uses Zero-Downtime Patch (ZDP) to roll out patches without interrupting service.

>8. Summary

Aurora was created because traditional high-throughput OLTP databases cannot provide both availability and durability in an elastically scaling cloud environment. Aurora's key idea is to separate the storage and compute of a traditional database: specifically, to push log processing down into an independent distributed storage service layer. In such an architecture all I/O goes over the network, and the network becomes the biggest bottleneck, so Aurora concentrates on optimizing network usage to raise system throughput. Aurora relies on the Quorum model to ride out the many failure modes of the cloud environment with a controlled impact on performance; its log-processing technique reduces I/O write amplification; its asynchronous commit protocol avoids synchronous waits; and the separated storage service layer removes offline crash recovery and checkpointing from the database instance. Aurora's storage/compute separation also simplifies the overall system architecture and leaves room for future evolution.

Q & A

1. We generally know that the Quorum algorithm only needs to satisfy Vr + Vw > N. Why does Aurora require a second condition?
From the paper, the Quorum algorithm in Aurora must satisfy two conditions:
a) Vr + Vw > N (the NWR condition), which guarantees that read and write sets intersect, so a read can always see the most recently written data.
b) Vw > N/2, which avoids conflicting updates.
The basic point of the Quorum algorithm is that the read set and the write set intersect, so reads see the latest write. Aurora's second condition additionally guarantees that every write set intersects the previous write set, so each write sees the last write; logs can then be appended in strictly increasing order and no update is lost. Viewed another way, requiring Vw > N/2 also indirectly balances read and write performance: with N = 5, W = 1, R = 5, the first condition is satisfied and a write returns after reaching a single node, but every read then has to contact all five replicas to be sure of getting the latest data.

2. Each segment is 10 GB. How is a transaction that spans multiple segments handled?
Each transaction actually consists of several MTRs (mini-transactions); an MTR is InnoDB's smallest atomic unit of physical modification, and its redo records are written to the global redo log as a whole. In Aurora, storage is managed in segments and logs are also shipped per segment, so an MTR that touches multiple segments has its log records scattered across them. In essence this is not a problem: when a transaction commits, if it spans several segments, each affected PG must independently satisfy the Quorum protocol before the VDL can advance. So once the VDL has moved past the transaction's commit LSN, all of the transaction's log records are durable, no segment's portion of the log can be lost, and the transaction's durability is guaranteed.

3. How does Aurora implement MVCC on read replicas?
In Aurora, when the writer instance sends log records to the storage nodes it also ships them to the read replicas, which apply them in MTR units. The writer additionally ships transaction start and commit information, so each read replica can maintain its own view of active transactions; when a read runs on the replica, this active-transaction view is used to judge record visibility and read the appropriate version of each row. Of course, replication between the writer and the readers is asynchronous, with up to roughly 20 ms of lag, so a read replica may not see the most recently committed data.

4. Why does Aurora have a cost advantage?
In Aurora, multiple database instances (one writer plus several read replicas) share one distributed storage layer. To the database engine the storage service is one large resource pool: you allocate space as needed, in increments as small as 10 GB, so compared with per-instance local storage the utilization of storage space is much higher and expansion is much easier. On top of that, all engines share a single copy of the data, so adding another engine (a read replica) adds zero storage cost, which significantly reduces cost overall.

5. What are Aurora's advantages and defects?
Advantages:
1) Storage and compute nodes scale elastically and can be provisioned independently as needed.
2) One copy of storage serves multiple compute nodes as a multi-tenant storage service, at low cost.
3) Only logs are shipped, which cleverly solves the write amplification problem.
4) Instance failures recover quickly.
5) The architecture is simple, read capacity can be expanded quickly by adding replica instances, and a single writer neatly avoids complex machinery such as distributed transactions.

Defects:
1) Best suited to read-heavy, write-light applications; horizontal write scaling still relies on middleware solutions.
2) The SQL layer is the same as community MySQL, so complex query capability is weak (e.g. OLAP scenarios).
3) A single writer, with no partitioned multi-point write capability (e.g. by user dimension plus time dimension).
4) Total capacity is limited to 64 TB.

6. What are the similarities and differences between Aurora and Spanner?

Comparing the two, Aurora and Spanner take different routes. Aurora is market-driven: it is fully compatible with MySQL/PostgreSQL, so users can migrate seamlessly, and because it is built by modifying MySQL on top of shared storage, it avoids complex machinery such as two-phase commit and distributed transactions, which shortens the development cycle and delivers results sooner. Spanner, by contrast, is a database redesigned from scratch: implementing Paxos-based strong synchronization, distributed transactions, and globally consistent reads via the TrueTime mechanism is complicated, and while these capabilities are very powerful, Spanner does less well on SQL compatibility. That also helps explain why Aurora is widely used as a cloud service while Spanner is mostly used inside Google; even now that a cloud version of Spanner is available, SQL compatibility remains a big challenge for it. Aurora's engine, for its part, is MySQL, and its weakness in complex queries still needs continuous improvement and optimization.

 

Note: This article has also been published on our team's WeChat public account; follow it if you are interested.

Https://mp.weixin.qq.com? _ Biz = MzIxNTQ0MDQxNg ==& mid = 2247484007 & idx = 1 & sn = 5b85170bf46c5e54edc1e8b8a1c211e1 & chksm = fingerprint # rd
