Scalable Database Clustering Technology in the Big Data Age

Last Update:2014-12-04 Source: Internet

Author: User

Keywords: big data, cluster technology


(Tenkine Server channel, June 3) In an information system, the database is where the final results are stored and processed. The database system is therefore critical: if the database runs into problems, the entire application system is affected, which can lead to serious losses. In the era of big data, databases face the following challenges:

When database performance becomes a problem, can the database be scaled horizontally, achieving higher throughput by adding servers and making the best use of existing hardware for a better ROI?

Is there a replica synchronized in real time, so that when disaster strikes the database, failover can restore availability within a short time? In addition, when data is lost or corrupted, a so-called live replica (hot standby) can make zero data loss achievable.

Is horizontal scaling of the database transparent to the application? If scaling requires extensive changes on the application side, the result is not only high development cost but also many potential, unforeseen risks.

An obvious way to meet these challenges is to cluster multiple servers, so that each server's resources are fully used and client load is distributed across the servers. As application load grows, new servers are simply added to the cluster.

Clustering and scaling a database is not as easy as scaling an application, because clustering a database usually involves synchronization at the database level. Depending on whether data redundancy exists, database clusters can be divided into the following two forms:

Share-disk Architecture

The Share-disk architecture implements a database cluster by having multiple server nodes share a single storage system. Within this category, Share-disk clusters are divided into single-live and dual-live. In a dual-live cluster, every node can provide service at the same time. In a single-live cluster, only one node provides service, while the other servers act as redundancy and take over as the serving node if the live node fails.

The drawbacks of the single-live approach are obvious:

Hardware resources are seriously wasted: only one server in the cluster is live at any time, and the others serve only as redundant servers.

Clustering does not improve performance, because only one server provides service at a time.

The shared storage, typically an expensive SAN, remains a single point of failure unless high availability is guaranteed at the storage level.

This kind of scheme therefore provides high availability only at the server level. It brings no performance improvement and does not solve the storage single point of failure. Unless it is combined with other high-availability or load-balancing technologies, its value is limited.
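The single-live behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real cluster manager: the `Node` and `SingleLiveCluster` classes, the `healthy` flag, and the node names are all hypothetical, and a real system would detect failures through heartbeats against the shared storage rather than a boolean.

```python
class Node:
    """A database server attached to the shared storage (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class SingleLiveCluster:
    """Share-disk, single-live: exactly one node serves at any time."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.active = self.nodes[0]  # the others are idle standbys

    def check_and_failover(self):
        """Promote the first healthy standby if the active node has failed."""
        if self.active.healthy:
            return self.active
        for node in self.nodes:
            if node.healthy:
                self.active = node
                return node
        raise RuntimeError("no healthy node left in the cluster")

    def execute(self, query):
        # Every request goes to the single active node, so adding
        # standbys adds availability but no throughput.
        node = self.check_and_failover()
        return f"{node.name} ran: {query}"


cluster = SingleLiveCluster([Node("db1"), Node("db2")])
first = cluster.execute("SELECT 1")
cluster.nodes[0].healthy = False      # simulate a failure of the live node
second = cluster.execute("SELECT 1")  # the standby takes over
```

Note that the sketch has no model of the storage itself, which mirrors the limitation in the text: failover protects against a server failure, not against a failure of the shared disk.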

The other Share-disk technology is dual-live. Unlike single-live, all nodes in a dual-live cluster can provide service, although the disk is still shared. The typical product is Oracle's RAC. RAC is technically demanding and requires highly skilled staff to run. RAC is designed not for performance but for high availability and scalability: if an application was not designed and developed for the RAC architecture, migrating it to RAC can cause a sharp drop in performance, and the more nodes there are, the greater the degradation.

Share-nothing Architecture

Share-nothing architectures come in two kinds. The first is the distributed architecture, which spreads the data of one database across multiple machines and queries or inserts into the corresponding partition according to some criterion. In the second, each node is completely independent, and the nodes are connected over a network, usually a dedicated fiber-optic network.
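As a rough illustration of the distributed kind, the sketch below routes each row to one partition by hashing its key. The article does not say which partitioning criterion is used, so hash partitioning with `zlib.crc32` is an assumption here, and `PartitionedStore` and the node names are hypothetical.

```python
import zlib


class PartitionedStore:
    """Spreads rows over several nodes; each row lives on exactly one node."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        # One in-memory dict stands in for each machine's local storage.
        self.data = {name: {} for name in self.node_names}

    def node_for(self, key):
        """Pick the partition for a key with a stable hash (assumed criterion)."""
        index = zlib.crc32(str(key).encode("utf-8")) % len(self.node_names)
        return self.node_names[index]

    def insert(self, key, row):
        self.data[self.node_for(key)][key] = row

    def query(self, key):
        # Only the owning partition is consulted.
        return self.data[self.node_for(key)].get(key)


store = PartitionedStore(["node-a", "node-b", "node-c"])
for order_id in range(100):
    store.insert(order_id, {"order_id": order_id})
```

A stable hash is used instead of Python's built-in `hash` so that the same key maps to the same node across processes.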

In this second kind of share-nothing architecture, each node has its own memory and storage and holds a full copy of the data. It can in turn be divided into two types: clusters that are load balanced and clusters that are not.

First, clusters without load balancing. Here the nodes are divided into a primary node and secondary nodes. The primary node serves clients, while a secondary node acts as a hot standby (synchronized through two-phase commit) or a warm standby (no transactional synchronization required). A secondary node may also provide read-only service.
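The hot-standby case can be sketched as a two-phase commit between the primary and one secondary. This is a simplified in-memory model under stated assumptions: `Replica`, `two_phase_write`, and the `available` flag are hypothetical names, and the sketch leaves out the durable logging and coordinator recovery that a real two-phase commit needs.

```python
class Replica:
    """One node with its own local storage (hypothetical in-memory model)."""

    def __init__(self, name):
        self.name = name
        self.committed = {}
        self.pending = None
        self.available = True

    def prepare(self, key, value):
        """Phase 1: stage the write and vote on whether it can commit."""
        if not self.available:
            return False
        self.pending = (key, value)
        return True

    def commit(self):
        """Phase 2: make the staged write visible."""
        key, value = self.pending
        self.committed[key] = value
        self.pending = None

    def abort(self):
        """Phase 2 (failure path): discard the staged write."""
        self.pending = None


def two_phase_write(primary, secondary, key, value):
    """Commit on both nodes or on neither, keeping the standby 'hot'."""
    participants = [primary, secondary]
    if all(node.prepare(key, value) for node in participants):
        for node in participants:
            node.commit()
        return True
    for node in participants:
        node.abort()
    return False


primary, secondary = Replica("primary"), Replica("secondary")
ok = two_phase_write(primary, secondary, "balance", 100)
secondary.available = False  # simulate an unreachable standby
rejected = two_phase_write(primary, secondary, "balance", 200)
```

The second write is refused because the standby cannot vote, which shows the trade-off: the two nodes never diverge, but every write pays for the extra round trip. A warm standby would instead commit on the primary alone and ship the change later.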

The benefits of this architecture include:

Data on the secondary node is synchronized or nearly synchronized with the primary node, so combined with third-party arbitration, automatic failover is possible, which yields high availability.

Because the secondary node is completely independent of the primary node and its data is synchronized or nearly synchronized, data corrupted on the primary node can be recovered from the secondary node, automatically or manually.

Because the share-nothing architecture uses local storage (or a SAN), it has a significant performance advantage over the Share-disk architecture, particularly when the network is slow.

The drawbacks are equally clear. Because the secondary nodes either cannot serve clients or can only provide read-only service, such clusters have the following drawbacks:

Scalability is very limited.

Performance is not improved. Because data must be synchronized to each node, performance may even degrade.

If the secondary node is readable, performance improves, but the front-end application must be modified, so the cluster is not transparent to the application.
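The last drawback is easier to see in code. In the hypothetical sketch below, the application itself must decide which statements may go to a read-only secondary. The `ReadWriteRouter` class, the node names, and the rule "statements starting with SELECT are reads" are all assumptions made for illustration.

```python
class ReadWriteRouter:
    """Application-side routing: reads to secondaries, writes to the primary."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)
        self._next = 0  # round-robin position among the secondaries

    def route(self, sql):
        words = sql.split()
        is_read = bool(words) and words[0].upper() == "SELECT"
        if is_read and self.secondaries:
            node = self.secondaries[self._next % len(self.secondaries)]
            self._next += 1
            return node
        # Anything that may modify data must go to the primary.
        return self.primary


router = ReadWriteRouter("primary", ["replica1", "replica2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("select count(*) from orders"),
    router.route("UPDATE orders SET paid = 1"),
    router.route("SELECT 1"),
]
```

Every data-access path in the application has to go through logic like this, which is the application-side modification the text refers to.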

The other type of share-nothing architecture allows load balancing. Load balancing distributes the database load across multiple nodes in the cluster. Every node can serve clients, which yields higher throughput, better resource utilization, and lower response time. Requests from the front end are scheduled through a proxy. Because every node in such a cluster serves clients, the architecture has the following benefits:

Because every node can serve clients, performance improves.

Scalability is better: the cluster scales out simply by adding nodes.

Because the front-end application connects to the cluster through a proxy, and every node maintains a complete dataset, the cluster is completely transparent to the application.

The drawback of this approach is also obvious: each node needs a complete dataset, which requires more storage space.
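A load-balanced share-nothing cluster behind a proxy might be modeled as follows. This is a sketch under assumptions the article does not state: reads are scheduled round-robin, writes are applied to every node synchronously, and a newly added node starts from a full copy of an existing node. `LoadBalancingProxy` and the node names are hypothetical.

```python
import itertools


class LoadBalancingProxy:
    """Single entry point in front of nodes that each hold the full dataset."""

    def __init__(self, node_names):
        # One dict per node stands in for that node's local storage.
        self.replicas = {name: {} for name in node_names}
        self._cycle = itertools.cycle(list(self.replicas))

    def write(self, key, value):
        # Assumed policy: apply every write to all nodes so each stays complete.
        for dataset in self.replicas.values():
            dataset[key] = value

    def read(self, key):
        # Round-robin scheduling: any node can answer, so reads spread out.
        name = next(self._cycle)
        return name, self.replicas[name].get(key)

    def add_node(self, name):
        """Scale out: the new node receives a full copy of the data."""
        source = next(iter(self.replicas.values()))
        self.replicas[name] = dict(source)
        self._cycle = itertools.cycle(list(self.replicas))


proxy = LoadBalancingProxy(["n1", "n2"])
proxy.write("sku-1", 9.99)
before = [proxy.read("sku-1") for _ in range(2)]
proxy.add_node("n3")
after = [proxy.read("sku-1") for _ in range(3)]
copies = sum("sku-1" in dataset for dataset in proxy.replicas.values())
```

The application only ever calls `read` and `write` on the proxy, which is where the transparency comes from. The `copies` count grows with the number of nodes, which is the storage cost noted above.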

(Author: Li to; Executive editor: Li Xiangjing)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
