Greenplum Learning (share-nothing) architecture

Source: Internet
Author: User
Tags failover

Today's world runs on information. Daily life, work, and study all depend on information systems, and the database is where those systems store and process their final results. That makes the database especially important: if the database has a problem, the whole application system is affected, and the result can be serious losses.

The term "big data" is now very popular, although it is still unclear how the concept will be put into practice. What is certain is that with the rise of the Internet of Things and mobile applications, data volumes will grow geometrically compared with the past. A database therefore no longer only has to record the results of processing correctly; it also has to address the following challenges:

When database performance becomes a problem, can the database scale out by adding servers to achieve higher throughput, making full use of existing hardware for a better ROI?

Is there a replica synchronized in real time, so that when the database suffers a disaster, failover can restore availability within a short time? And when data is lost or corrupted, can a real-time replica (hot standby) achieve zero data loss?

Is scaling out the database transparent to the application? If it requires extensive changes on the application side, the result is not only high development cost but also many risks, both hidden and visible.

One obvious answer to these challenges is to combine multiple servers into a cluster. The cluster can use the resources of every server and distribute the client load across them, and as the application load grows, you only need to add new servers to the cluster.

This article discusses the concept and forms of clusters and the current mainstream database clustering technologies.

Forms of a database cluster

Clustering and scaling a database is not as easy as scaling an application, because a database cluster usually involves synchronization at the database level. Depending on whether the data is stored redundantly, database clusters can be broadly divided into the following two forms:

Share-disk Architecture

The Share-disk architecture is a database cluster in which multiple server nodes share a single store. The simplest case, with two machines, is shown in Figure 1.


▲ Figure 1. Simple Share-disk Architecture

On this basis, Share-disk architectures are divided into single-live (active-passive) and dual-live (active-active). In a dual-live cluster, every node can provide services at the same time. In a single-live cluster, only one node provides services; the other servers are kept as redundancy, and when the "live" node fails, one of them takes over as the node that serves clients. Typical products of the single-live type are SQL Server Failover Cluster, NEC ExpressCluster, and Rose HA. The drawbacks of this approach are obvious:

Hardware resources are seriously wasted: only one server in the cluster is live at any time, and the others serve only as redundant servers.

The cluster cannot improve performance, because only one server is available at a time.

Storage is a single point of failure unless high availability is guaranteed at the storage level, which often requires expensive SAN storage.

This kind of solution therefore provides high availability only at the server level. It does not improve performance, and it does not solve the storage single point of failure. Unless it is paired with other high-availability or load-balancing technologies, it offers limited value.
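The single-live behavior described above can be sketched in a few lines. This is a minimal illustration, not how any of the named products is implemented; the class names `Node` and `SingleLiveCluster`, and the use of a plain dictionary as the shared store, are assumptions made for the example.

```python
class Node:
    """A hypothetical server node attached to the shared storage."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class SingleLiveCluster:
    """Single-live cluster: one node serves, the others only wait."""

    def __init__(self, nodes, shared_storage):
        self.nodes = list(nodes)
        self.storage = shared_storage  # one store shared by every node
        self.active = self.nodes[0]

    def failover(self):
        """Promote the first healthy node to be the live node."""
        for node in self.nodes:
            if node.healthy:
                self.active = node
                return node
        raise RuntimeError("no healthy node left in the cluster")

    def write(self, key, value):
        """Serve a write through the live node, failing over if needed."""
        if not self.active.healthy:
            self.failover()
        self.storage[key] = value
        return self.active.name


storage = {}
cluster = SingleLiveCluster([Node("node-a"), Node("node-b")], storage)
first = cluster.write("order:1", "paid")
cluster.nodes[0].healthy = False  # simulate failure of the live node
second = cluster.write("order:2", "shipped")
print(first, second, storage)
```

Only one node handles requests at any moment, so adding nodes adds no throughput, and every node depends on the same `storage` object, which is the single point of failure mentioned above.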

The other Share-disk technology is dual-live. It also uses a shared disk, but unlike single-live, every node in the cluster can provide services. The typical product is Oracle RAC. RAC is technically demanding and therefore needs more highly skilled staff to maintain it. RAC is designed for high availability and scalability rather than for raw performance. If an application is not designed and developed for the RAC architecture, migrating it to RAC can cause a dramatic drop in performance due to block contention (block busy waits), and the more nodes there are, the more noticeable the degradation.

Share-nothing Architecture

The share-nothing architecture comes in two types. The first is the distributed architecture: the data in the database is distributed across multiple machines according to some criterion, and each query or insert is routed to the corresponding partition by the same criterion.
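As a rough sketch of the distributed type, assume the criterion is a hash of a distribution key. The class `ShardedTable` and the choice of MD5 as the hash are assumptions for illustration only and are not taken from any particular product.

```python
import hashlib


class ShardedTable:
    """Rows are spread over nodes by hashing a distribution key."""

    def __init__(self, node_count):
        # Each dict stands in for the private storage of one node.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def insert(self, key, row):
        self.nodes[self._node_for(key)][key] = row

    def get(self, key):
        # A lookup by key touches exactly one node.
        return self.nodes[self._node_for(key)].get(key)


table = ShardedTable(node_count=4)
for customer_id in range(1000):
    table.insert(customer_id, {"customer_id": customer_id})

rows_per_node = [len(node) for node in table.nodes]
print(rows_per_node)
print(table.get(42))
```

Each row lives on exactly one node, so total storage does not grow with the number of nodes, and a lookup by the distribution key needs to contact only the node that owns it.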

In the second type, each node is completely independent, and the nodes are connected over the network, usually a dedicated network such as optical fiber, as shown in Figure 2.


▲ Figure 2. Share-nothing Redundant architecture

In this redundant form of the share-nothing architecture, each node has its own memory and storage and keeps a complete copy of the data. Such clusters fall into two types: those that can be load balanced and those that cannot.

Consider first the clusters that are not load balanced. Here the nodes are divided into a primary node and secondary nodes. The primary node provides services, while a secondary node acts as a hot standby (two-phase transaction commit) or a warm standby (transaction synchronization is not guaranteed). In some cases the secondary node can also be made readable. Technologies that use this architecture include SQL Server AlwaysOn, SQL Server Mirroring, and Oracle Data Guard. The benefits of this architecture include:

The secondary node's data is synchronized or quasi-synchronized with the primary node, and when paired with a third-party quorum, automatic failover can be achieved, providing high availability.

Because the secondary node is completely independent of the primary node and its data is synchronized or quasi-synchronized, data can be recovered from the secondary node (automatically or manually) when corruption occurs on the primary.

Because the share-nothing architecture uses local storage (or a SAN), it has a significant performance advantage over the Share-disk architecture, which reaches its shared storage over a slower network.

The drawbacks are just as obvious. Because the secondary node cannot serve clients, or can provide only read-only services, the drawbacks of such clusters include:

Very limited ability to scale.

Performance is not improved, and may even degrade, because data has to be synchronized between nodes.

A readable secondary node can improve performance, but the front-end application must be modified to use it, so the change is not transparent to the application.
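The difference between a hot standby and a warm standby can be illustrated with a small simulation. This is a simplified sketch: the classes `Primary` and `Standby` are hypothetical, and the synchronous path applies the change to the standby directly instead of modeling a real two-phase commit.

```python
class Standby:
    """Secondary node with its own independent copy of the data."""

    def __init__(self):
        self.data = {}


class Primary:
    """Primary node that replicates each commit to one standby."""

    def __init__(self, standby, synchronous):
        self.data = {}
        self.standby = standby
        self.synchronous = synchronous
        self.pending = []  # changes not yet shipped to the standby

    def commit(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Hot standby: the commit completes only once the standby has it.
            self.standby.data[key] = value
        else:
            # Warm standby: the change is shipped some time later.
            self.pending.append((key, value))

    def ship_pending(self):
        for key, value in self.pending:
            self.standby.data[key] = value
        self.pending.clear()


def lost_on_failover(primary):
    """Keys committed on the primary that the standby never received."""
    return sorted(set(primary.data) - set(primary.standby.data))


hot = Primary(Standby(), synchronous=True)
warm = Primary(Standby(), synchronous=False)
for primary in (hot, warm):
    primary.commit("t1", "a")
    primary.ship_pending()
    primary.commit("t2", "b")  # the primary fails right after this commit

print(lost_on_failover(hot))
print(lost_on_failover(warm))
```

In this sketch the hot standby loses nothing on failover, while the warm standby is missing the last commit. The price of the synchronous path is that every commit waits for the standby, which is one reason synchronization can reduce performance.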

The other class of share-nothing architecture allows load balancing. Load balancing means that the database load is distributed across multiple nodes in the cluster, and every node can provide services, which yields higher throughput, better resource utilization, and lower response time. Requests from the front end are dispatched through a proxy. Technologies that use this class of architecture include Amoeba on MySQL (architecture shown in Figure 3, excerpted from MySQL Master Chen Chang: http://www.cnblogs.com/gaizai/archive/2012/06/12/2546755.html), HAProxy on MySQL (shown in Figure 4), and the Moebius cluster for SQL Server from Grey Trend (www.grqsh.com) (shown in Figure 5).


▲ Figure 3. Amoeba


▲ Figure 4. HAProxy


▲ Figure 5. Moebius Cluster

The benefit of a load-balanced share-nothing architecture is that every server can provide services, so existing resources are fully used to achieve higher throughput. Amoeba may involve data sharding. Sharding makes it more efficient to process large amounts of data, but it also introduces other problems: the application has to be adjusted for sharding, queries that span shard nodes are hard to process, and each shard node must be able to withstand the peak load of its own part of the business. Implementing this type of architecture requires highly skilled staff and changes at the application level, which makes it better suited to Internet companies.
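The cross-shard query problem can be made concrete with a scatter-gather sketch: an aggregate over sharded data has to be computed on every shard and then merged. The sample rows and the function names `local_totals` and `scatter_gather` are invented for illustration.

```python
# Hypothetical order rows, already split across three shard nodes.
shards = [
    [{"region": "north", "amount": 120}, {"region": "south", "amount": 80}],
    [{"region": "north", "amount": 50}],
    [{"region": "south", "amount": 30}, {"region": "east", "amount": 10}],
]


def local_totals(rows):
    """Partial aggregate computed on a single shard."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


def scatter_gather(all_shards):
    """Query every shard, then merge the partial results."""
    merged = {}
    for shard_rows in all_shards:  # scatter: each shard is queried
        for region, subtotal in local_totals(shard_rows).items():
            merged[region] = merged.get(region, 0) + subtotal  # gather
    return merged


totals = scatter_gather(shards)
print(totals)  # {'north': 170, 'south': 110, 'east': 10}
```

No single shard can answer the question on its own, so something, either the proxy or the application, has to know about the shards and perform the merge. That is the kind of application-level adjustment the paragraph above refers to.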

The other type of load-balanced architecture does not involve data sharding. One option is a combined scheme, such as Oracle RAC + F5. Another is a solution from a single vendor, such as Moebius on SQL Server. Every node in such a cluster provides services, which brings the following benefits:

Performance improves, because every node can provide services.

Scalability improves, because the cluster can be scaled out by adding nodes to it directly.

The cluster is completely transparent to the application: the front-end application connects through a proxy, and every node keeps a complete set of data, so there is no performance degradation caused by sharding.

Compared with MySQL data sharding, however, this kind of scheme has an obvious drawback: every node needs a complete data set, so it takes up more storage space.
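A minimal sketch of this trade-off follows, assuming a proxy that applies every write to all nodes and rotates reads across them. The class `ReplicatedCluster` is hypothetical; how a real product such as Moebius keeps its nodes in sync is not described in this article.

```python
import itertools


class ReplicatedCluster:
    """Proxy in front of nodes that each hold a complete data set."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]
        self._next_reader = itertools.cycle(range(node_count))

    def write(self, key, value):
        # Assumption: a write is applied to every node.
        for node in self.nodes:
            node[key] = value

    def read(self, key):
        # Round-robin: any node can answer because all hold the full data.
        index = next(self._next_reader)
        return index, self.nodes[index].get(key)

    def stored_rows(self):
        return sum(len(node) for node in self.nodes)


replicated = ReplicatedCluster(node_count=3)
for i in range(100):
    replicated.write(i, f"row-{i}")

served_by = [replicated.read(7)[0] for _ in range(6)]
print(served_by)                 # [0, 1, 2, 0, 1, 2]
print(replicated.stored_rows())  # 300
```

The application only ever talks to the proxy, so no application change is needed, but 100 logical rows occupy 300 stored rows on three nodes, where a sharded layout would store each row once.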

Summary

This article has discussed database clustering technology at a relatively high level, from the Share-disk cluster up to the most advanced form, a cluster that provides load balancing, and has listed some mainstream commercial products. Clusters exist to ensure high availability, data safety, scalability, and load balancing. If a cluster product does not provide all of these features and the business scenario requires them, it has to be combined with other existing technologies. Not everyone is a database expert, though: even with a pile of tools and materials, you could not build an iPhone. Taking the database into account when the system is first designed will therefore save a lot of trouble later.
