Microsoft Azure Storage Architecture Design

Source: Internet
Author: User

SQL Azure Introduction

SQL Azure is the logical database of the Azure storage platform; the physical database underneath is still SQL Server. A physical SQL Server is divided into multiple partitions, and each partition is a SQL Azure instance, which distributed systems often call a sub-table (tablet). Like most distributed storage systems, SQL Azure stores three copies of the data. At any given time one copy is the Primary and the others are Secondaries, which can serve eventually consistent reads. The maximum data size of a SQL Azure instance is 1 GB or 5 GB (Web Edition), or 10 GB, 20 GB, 30 GB, 40 GB, or 50 GB (Business Edition). Because the maximum size of a sub-table is capped, the Azure storage platform does not support sub-table splitting.

Like most Web system architectures, the Azure storage platform can be roughly divided into four layers, from top to bottom:

  • Client Layer: converts a user's request into the TDS stream used internally by Azure;
  • Services Layer: acts as a gateway, equivalent to the logic layer of a common Web system;
  • Platform Layer: the storage node cluster, equivalent to the database layer of a common Web system;
  • Infrastructure Layer: hardware and operating system. Azure runs on commodity PC hardware. The typical configuration given in the paper is 8 cores, 32 GB of memory, and 12 disks, at an approximate price of USD 3,500.

Services Layer

The Services Layer is equivalent to the logic layer of a common Web system. Its functions include routing, billing, and permission verification. In addition, the SQL Azure Services Layer monitors the storage nodes in the Platform Layer and handles downtime detection and recovery, load balancing, and other cluster-wide control work. The architecture of the Services Layer is as follows:

The Services Layer contains four types of components:

1. Front-end cluster: handles routing and includes the anti-attack module. It is equivalent to a Web server in a Web architecture, such as Apache or Nginx;

2. Utility Layer: provides functions such as server validity verification and billing;

3. Service Platform: monitors the health of the machines in the storage node cluster and handles downtime detection and recovery, load balancing, and similar functions;

4. Master Cluster: the configuration servers, which record the physical storage nodes that hold the replicas of each SQL Azure instance.

The Master Cluster is generally configured with seven machines and uses the "Quorum Commit" technique: any Master operation must be synchronized to at least four replicas to succeed, so the failure of up to three Master machines does not affect service. The other types of machines are stateless and homogeneous. The request process is as follows:

1. The client establishes a connection with a Front-end machine, and the Front-end verifies whether the client's operation is supported. Operations such as CREATE DATABASE can only be performed through the Azure utility;

2. The Front-end gateway performs the SSL handshake with the client. If the client refuses to use SSL, the connection is closed. Anti-attack protection is also applied at this stage, such as rejecting overly frequent access from a single IP address or a range of IP addresses;

3. The Front-end gateway asks the Utility Layer to perform the necessary verification, such as checking the server's address whitelist;

4. The Front-end gateway asks the Master for the replica information of the physical storage nodes that hold the data shard requested by the user;

5. The Front-end gateway asks the physical storage node in the Platform Layer to verify the user's database permissions;

6. If all of the above checks succeed, the client establishes a new connection with the storage node in the Platform Layer;

7 ~ 8. All subsequent client requests go to the physical storage node in the Platform Layer. The Front-end gateway only forwards requests and replies, acting as an intermediate proxy.
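The order of checks in steps 1 to 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Request` and `Gateway` classes, their fields, and the error types are all hypothetical stand-ins, and the Utility Layer, Master, and storage node are reduced to in-memory lookups.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    client_ip: str
    operation: str   # e.g. "SELECT" or "CREATE DATABASE"
    uses_ssl: bool
    database: str
    user: str


@dataclass
class Gateway:
    # Every field is a hypothetical stand-in for a real service.
    unsupported_ops: set = field(default_factory=lambda: {"CREATE DATABASE"})
    blocked_ips: set = field(default_factory=set)         # anti-attack list
    whitelist: set = field(default_factory=set)           # Utility Layer check
    shard_locations: dict = field(default_factory=dict)   # Master lookup
    grants: set = field(default_factory=set)              # (user, database)

    def connect(self, req: Request) -> str:
        """Run steps 1-6 in order and return the storage node to proxy to."""
        if req.operation in self.unsupported_ops:              # step 1
            raise PermissionError("operation not supported here")
        if not req.uses_ssl or req.client_ip in self.blocked_ips:  # step 2
            raise ConnectionError("connection closed")
        if req.client_ip not in self.whitelist:                 # step 3
            raise PermissionError("address not whitelisted")
        node = self.shard_locations.get(req.database)           # step 4
        if node is None:
            raise LookupError("unknown database")
        if (req.user, req.database) not in self.grants:         # step 5
            raise PermissionError("no database permission")
        return node                                             # step 6
```

Once `connect` returns a node, the gateway would simply relay traffic between the client and that node, which corresponds to steps 7 ~ 8.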

Platform Layer

The Platform Layer is the storage node cluster, which runs the physical SQL Server instances. Client requests are forwarded to the data nodes in the Platform Layer through the Front-end gateway. Each SQL Azure instance is a data shard of SQL Server, and each data shard is stored as three replicas on different SQL Server data nodes. At any given time only one replica is the Primary, and the others are Secondaries. Writes use the "Quorum Commit" policy: the client is acknowledged only after at least two replicas have been written successfully, so the failure of a single data node does not affect normal service. The architecture of the Platform Layer is as follows:

[Figure: Platform Layer architecture]

Each SQL Server data node serves at most 650 data shards. The write operations of all data shards on a data node are recorded in a single operation log file, which improves the aggregation of writes. The replicas of each shard are kept in sync by shipping and replaying the operation log. Because the replicas of each shard may live on different machines, a SQL Server storage node may need to synchronize data with up to 650 other storage nodes, so the network traffic aggregates poorly. This is also why a single storage node serves at most 650 shards.
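A minimal Python sketch of the idea of one operation log shared by all shards on a node may help show why writes aggregate well on disk but poorly on the network. The `NodeLog` class, its method names, and the record layout are assumptions made for illustration, not the actual SQL Server log format.

```python
from collections import defaultdict


class NodeLog:
    """One append-only operation log shared by every shard on a node."""

    def __init__(self):
        self.records = []                  # (lsn, shard_id, operation)
        self.replicas = defaultdict(set)   # shard_id -> peer node names

    def add_shard(self, shard_id, peers):
        """Register the other nodes that hold replicas of this shard."""
        self.replicas[shard_id].update(peers)

    def append(self, shard_id, operation):
        """Append one write to the single log; returns its sequence number."""
        lsn = len(self.records) + 1
        self.records.append((lsn, shard_id, operation))
        return lsn

    def batches_by_peer(self):
        """Group log records by the peer node that must replay them."""
        out = defaultdict(list)
        for lsn, shard_id, op in self.records:
            for peer in sorted(self.replicas[shard_id]):
                out[peer].append((lsn, shard_id, op))
        return dict(out)

    def fan_out(self):
        """Number of distinct peers this node ships log records to."""
        return len(set().union(*self.replicas.values()))
```

All writes land in one sequential log locally, but `batches_by_peer` shows that the same log must be cut into as many streams as there are distinct peers, and `fan_out` grows with the number of shards hosted on the node.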

Each physical storage node runs a set of utility daemons (called the fabric), roughly as follows:

1. Failure Detection: detects data node failures and triggers the Reconfiguration process;

2. Reconfiguration Agent: after a node fails, regenerates the Primary or Secondary data shards on a data node;

3. PM (Partition Manager) Location Resolution: resolves the Master's address and sends messages to the Master's Partition Manager for processing;

4. Engine Throttling: limits the share of resources used by each logical SQL Azure instance so that it does not exceed its capacity limit;

5. Ring Topology: all data nodes form a ring, so each node has two neighboring nodes that can check whether it is down.
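The ring topology in item 5 can be illustrated with a few lines of Python. This is a sketch assuming the ring is simply an ordered list of node names; the function name `ring_neighbors` is hypothetical, and how the real fabric orders nodes is not described in the source.

```python
def ring_neighbors(nodes, name):
    """Return the (previous, next) neighbors of `name` on the ring.

    `nodes` is the ring order. With fewer than three nodes the two
    neighbors are not distinct.
    """
    i = nodes.index(name)
    # Index -1 wraps to the last node; the modulo wraps to the first.
    return nodes[i - 1], nodes[(i + 1) % len(nodes)]


ring = ["A", "B", "C", "D"]
watchers_of_a = ring_neighbors(ring, "A")   # the two nodes that monitor A
```

When a node leaves the ring, removing it from the list is enough for its former neighbors to become neighbors of each other.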

Distributed Problems

1. Data replication

SQL Azure uses the "Quorum Commit" policy. Ordinary data is stored as three replicas, and a write succeeds once at least two replicas have been written. The Master stores seven replicas, of which at least four must be written successfully. Update operations on each SQL Server node are written to an operation log file and sent over the network to the other two replicas. Because the replicas of different data shards may be on different SQL Server machines, the operation log of one storage node may have to be sent to as many as 650 machines, so log synchronization aggregates poorly on the network. To address the same problem, Yahoo's PNUTS uses message-oriented middleware to distribute operation logs, which aggregates them.
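The quorum numbers above follow from simple majority arithmetic. The sketch below assumes that "Quorum Commit" means a strict majority of replicas, which matches the 2-of-3 and 4-of-7 figures in the text; the function names are illustrative.

```python
def quorum_size(replicas: int) -> int:
    """Smallest strict majority of `replicas` copies."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority can still acknowledge."""
    return replicas - quorum_size(replicas)


def write_succeeds(acks: int, replicas: int) -> bool:
    """A write commits once a majority of replicas has acknowledged it."""
    return acks >= quorum_size(replicas)
```

For a data shard, `quorum_size(3)` is 2 and one replica may fail; for the Master, `quorum_size(7)` is 4 and three machines may fail.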

2. Downtime detection and recovery

The SQL Azure paper does not describe downtime detection in much detail. Roughly, each data node is monitored by some peer data nodes, which report to the master node when they detect a failure so that the recovery process can start. If it cannot be determined whether a data node is down, for example because the monitored node is hung and has stopped replying to commands, an arbitration node must arbitrate. Deciding whether a machine is down requires protocol-level control, which later articles will cover in detail.

If a data node fails, the downtime recovery process starts. The failed data node serves up to 650 logical SQL Azure instances (sub-tables), each of which may be a Primary or a Secondary. The master node schedules recovery centrally, selecting one data shard at a time for Reconfiguration, that is, the sub-table replication process. For a Secondary shard, it is enough to copy data from the Primary to add a replica. For a Primary shard, one of the other two replicas must first be chosen as the new Primary, and then the same process as for a Secondary shard is executed. Priority control is also required. For example, a data shard with only one remaining replica must be replicated first, and when the Primary of a data shard is unavailable, a Secondary among the remaining replicas must first be promoted to Primary. Some policies are also configurable, such as how long a shard may stay at two replicas before the third replica is re-created; SQL Azure currently configures this as two hours.
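The priority rules for recovery can be sketched as a small planning function. This is a hedged illustration, not the Master's real scheduler: the `Shard` class, the `plan_recovery` function, and the action names are hypothetical, any surviving Secondary is assumed to be eligible for promotion, and the two-hour delay before re-creating a third replica is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    shard_id: str
    primary: str        # node holding the Primary
    secondaries: list   # nodes holding the Secondaries


def plan_recovery(shards, failed_node):
    """Return ordered recovery steps for the shards hit by one node failure."""
    promotions, copies = [], []
    for s in shards:
        replicas = [s.primary, *s.secondaries]
        if failed_node not in replicas:
            continue
        alive = [n for n in replicas if n != failed_node]
        if not alive:
            continue  # no surviving replica; outside the scope of this sketch
        primary = s.primary
        if primary == failed_node:
            # Assumption: the first surviving Secondary becomes the Primary.
            primary = alive[0]
            promotions.append(("promote", s.shard_id, primary))
        copies.append((len(alive), ("copy_from", s.shard_id, primary)))
    # Promotions come first, then copies for the shards with the
    # fewest surviving replicas.
    copies.sort(key=lambda c: c[0])
    return promotions + [step for _, step in copies]
```

Promotions restore write availability and are cheap, so they are placed ahead of the data copies; the copies are then ordered so that shards closest to data loss are repaired first.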

3. Load balancing

When a new data node is added or the load on a node becomes too high, the master node starts the load balancing process. Factors that affect data node load include the number of reads and writes and disk, memory, CPU, and I/O usage. When a new machine is added, the pace of sub-table migration must be controlled; otherwise a large number of sub-tables migrate to the new machine at once and slow down the whole system.

Because SQL Azure can control the size of each logical SQL Azure instance, that is, of each sub-table, it never needs to split sub-tables, which simplifies the system considerably.

4. Transactions

SQL Azure supports database transactions. Transaction-related SQL statements produce operation log records for BEGIN TRANSACTION, ROLLBACK TRANSACTION, and COMMIT TRANSACTION, and SQL Azure only needs to synchronize these log records to the other replicas. Because a data shard has at most one Primary serving writes at any time, no distributed transactions are involved. The transaction isolation level supported by SQL Azure is READ_COMMITTED.

5. Multi-tenant interference

In a cloud computing system, the operations of multiple tenants interfere with each other. The system resources used by each logical SQL Azure instance therefore need to be restricted:

1. Operating system resource limits, such as CPU and memory. When a limit is exceeded, the client is told to wait 10 s and then retry;

2. Logical SQL Azure database capacity limit. Each logical database has a maximum capacity configured in advance. When the limit is exceeded, update requests are rejected, but deletes are still allowed;

3. Physical SQL Server database size limit. When this limit is exceeded, a system error is returned to the client and manual intervention is required.
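The first two limits can be sketched in Python as follows. The class and function names are hypothetical, sizes are tracked as plain byte counts, and the sketch rejects a write that would push the database past its cap, which is one possible reading of "when the limit is exceeded".

```python
RETRY_AFTER_SECONDS = 10  # from the text: throttled clients retry after 10 s


class CapacityExceeded(Exception):
    """Raised when a write would exceed the logical database's capacity."""


class LogicalDatabase:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0

    def insert(self, size):
        """Reject growth beyond the configured maximum capacity."""
        if self.used_bytes + size > self.max_bytes:
            raise CapacityExceeded("capacity limit reached; delete data first")
        self.used_bytes += size

    def delete(self, size):
        """Deletes are always allowed, so a full database can recover."""
        self.used_bytes = max(0, self.used_bytes - size)


def admit(resource_share, resource_limit):
    """Return 0 if the request may run now, else seconds to wait and retry."""
    return 0 if resource_share <= resource_limit else RETRY_AFTER_SECONDS
```

Allowing deletes while rejecting updates gives a tenant a way back under the cap without operator involvement, which is what separates limit 2 from limit 3.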

Differences from SQL Server

1. Unsupported operations: as a platform for enterprise-level applications, Microsoft Azure supports as many SQL features as it can, but some remain unsupported. One example is USE: SQL Server can switch databases with USE, but SQL Azure does not support it, because different logical databases may be located on different physical machines. For more information, see SQL Azure vs. SQL Server.

2. Concept change: developers need to write programs with a distributed-systems mindset. For example, in addition to success and failure, there is a third, uncertain state: if the cloud returns no result for an operation, we do not know whether the operation succeeded. As another example, there is no free lunch, even with something like SQL. For DBAs, routine database maintenance such as upgrades and data backups is handed over to Microsoft, so more effort can go into the architecture of the business system.
