A Discussion of Database Technology in the Big Data Age

Source: Internet
Author: User

Introduction

Today's world is an information world in the age of big data: daily life, work, and study are all inseparable from information systems. The database is where an information system stores and processes its final results, which makes the database system particularly important. If the database runs into problems, the entire application system is affected as well, which can lead to serious losses and consequences.

The term "big data age" has become very popular, although it is unclear how the concept came about. What is certain is that with the rise of the Internet of Things and mobile applications, data volumes are growing geometrically compared with the past. The database therefore has to do more than correctly record the results of a program's processing; it must also address the following challenges:

When database performance becomes a problem, can the database scale horizontally, gaining throughput by adding servers, and make the best use of existing hardware for a better return on investment (ROI)?

Is there a replica synchronized in real time, so that when the database suffers a disaster, availability can be restored within a short time through failover? In addition, when data is lost or corrupted, can zero data loss be achieved through a so-called hot standby replica?

Is horizontal scaling of the database transparent to the application? If scaling out requires extensive changes on the application side, the result is not only high development cost but also many potential, unforeseen risks.

One obvious way to meet these challenges is to combine multiple servers into a cluster. This makes full use of each server's resources and distributes the client load across servers; as the application load grows, new servers are simply added to the cluster.

This article discusses the concept of clustering, the forms a cluster can take, and the current mainstream database cluster technologies.

Forms of Database Clusters

Clustering and scaling a database is not as easy as scaling an application tier, because once a database cluster is involved, data usually has to be synchronized at the database level. Depending on whether the data is stored redundantly, database clusters can be divided into the following two forms:

Shared-disk Architecture

The shared-disk architecture implements a database cluster by having multiple server nodes share a single storage system. The simplest shared-disk architecture, with two machines, is shown in Figure 1.

Figure 1. Simple Shared-disk Architecture

On this basis, the shared-disk architecture is divided into single-active and dual-active forms. In a dual-active cluster every node can provide services externally; in a single-active cluster only one node provides service, and the other servers act as redundancy, taking over external service when the active node fails. The most typical single-active products are SQL Server Failover Cluster, NEC ExpressCluster, and Rose HA. The drawbacks of the single-active approach are obvious:

Hardware resources are seriously wasted: only one server in the cluster is active at any time, and the other servers serve only as redundancy.

Clustering does not improve performance, because only one server is serving requests.

Storage is a single point of failure unless high availability is guaranteed at the storage level, which typically requires expensive SAN storage.

Therefore, this kind of scheme provides high availability only at the server level; it brings no performance improvement and does not solve the storage single point of failure. Unless it is combined with other high-availability or load-balancing technologies, its value is limited.
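The single-active (active-passive) behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not the mechanism of any product named in this article: the class names `SharedDisk` and `ActivePassiveCluster`, the node names, and the failover rule (promote the first healthy node) are all hypothetical.

```python
class SharedDisk:
    """Stands in for the single shared storage (for example a SAN volume)."""

    def __init__(self):
        self.rows = {}


class ActivePassiveCluster:
    """Toy single-active cluster: one node serves, the others only wait."""

    def __init__(self, node_names, disk):
        self.nodes = {name: True for name in node_names}  # name -> healthy?
        self.disk = disk
        self.active = node_names[0]

    def fail(self, name):
        """Mark a node as failed and fail over if it was the active one."""
        self.nodes[name] = False
        if name == self.active:
            self._failover()

    def _failover(self):
        # Assumed rule: promote the first healthy node in insertion order.
        for name, healthy in self.nodes.items():
            if healthy:
                self.active = name
                return
        raise RuntimeError("no healthy node left")

    def write(self, key, value):
        # Every write goes through the one active node to the one shared disk,
        # so adding standby nodes does not add throughput.
        self.disk.rows[key] = value
        return self.active


cluster = ActivePassiveCluster(["node-a", "node-b"], SharedDisk())
first = cluster.write("order:1", "paid")   # served by node-a
cluster.fail("node-a")                     # node-b takes over
second = cluster.write("order:2", "paid")  # served by node-b, same disk
```

Both writes end up on the same `SharedDisk` object, which is why the storage remains a single point of failure even though the server has been made redundant.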

The other kind of shared-disk technology is dual-active. Unlike single-active, all nodes in a dual-active cluster can provide services while still sharing the disk; the typical product is Oracle RAC. RAC is technically demanding and requires highly skilled staff to run. RAC was designed for high availability and scalability rather than raw performance: if an application that was not designed and developed for the RAC architecture is migrated to RAC, block contention (block busy waits) can cause a sharp drop in performance, and the more nodes there are, the more pronounced the degradation.

Shared-nothing Architecture

The shared-nothing architecture comes in two kinds. The first is the distributed architecture, which spreads the data of one database across multiple machines; queries and inserts are routed to the corresponding partition according to some criterion.
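As an illustration of routing by a criterion, the sketch below picks a partition from a hash of the key. Hash partitioning is only one possible criterion (range or list partitioning are others), and the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

# Hypothetical partition names; in practice these would be separate servers.
SHARDS = ["db-0", "db-1", "db-2"]

# One in-memory dict per shard stands in for each machine's local storage.
storage = {name: {} for name in SHARDS}


def shard_for(key, shards=SHARDS):
    """Map a key to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def insert(key, value):
    storage[shard_for(key)][key] = value


def query(key):
    # A lookup by key touches exactly one shard.
    return storage[shard_for(key)].get(key)


insert("user:42", {"name": "example"})
found = query("user:42")
```

A simple modulo scheme like this remaps most keys when the number of shards changes, which is one reason resharding is costly; consistent hashing is a common alternative.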

The second kind is one in which each node is completely independent and the nodes are connected over a network, usually a dedicated fiber-optic network, as shown in Figure 2.

Figure 2. Shared-nothing Redundancy Architecture

In this shared-nothing architecture, each node has its own memory and storage and keeps a full copy of the data. In general, such clusters fall into two types: those that support load balancing and those that do not.

First, clusters without load balancing. Here the nodes are divided into a primary and secondaries: the primary node serves external requests, while a secondary node acts as a hot standby (kept in sync through two-phase transaction commit) or a warm standby (no synchronous transaction commit required); the secondary can also be made to provide read-only service. Technologies that use this architecture include SQL Server AlwaysOn, SQL Server Database Mirroring, and Oracle Data Guard. The benefits of this architecture include:

The secondary node's data is synchronized or nearly synchronized with the primary node; with a third-party arbiter (witness), automatic failover can be implemented, thus achieving high availability.

When data corruption occurs on the primary node, the data can be recovered from the secondary node, either automatically or manually, because the secondary is completely independent of the primary and its data is synchronized or nearly synchronized.

Because each node in a shared-nothing architecture uses its own local storage (or its own SAN), it has a significant performance advantage over the shared-disk architecture when the network is slow.
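The difference between a synchronized (hot) and a nearly synchronized (warm) secondary shows up at failover time. The sketch below is a simplified model, not the protocol of any product named here: `ReplicatedPair` and its methods are hypothetical, and a real synchronous commit involves network round trips and log hardening that are omitted.

```python
class ReplicatedPair:
    """Toy primary/standby pair with synchronous or asynchronous shipping."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []   # transactions committed on the primary
        self.standby = []   # transactions the standby has received
        self.pending = []   # committed on the primary, not yet shipped

    def commit(self, txn):
        self.primary.append(txn)
        if self.synchronous:
            # Acknowledge only after the standby also has the transaction.
            self.standby.append(txn)
        else:
            # Acknowledge immediately; the transaction is shipped later.
            self.pending.append(txn)

    def ship(self):
        """Deliver queued transactions to the standby (asynchronous mode)."""
        self.standby.extend(self.pending)
        self.pending.clear()

    def fail_over(self):
        """The primary is lost: return what the standby never received."""
        lost = list(self.pending)
        self.pending.clear()
        return lost


hot = ReplicatedPair(synchronous=True)
warm = ReplicatedPair(synchronous=False)
for pair in (hot, warm):
    pair.commit("t1")
    pair.commit("t2")
warm.ship()                  # t1 and t2 reach the warm standby
for pair in (hot, warm):
    pair.commit("t3")        # the primary fails right after this commit

hot_lost = hot.fail_over()   # nothing lost
warm_lost = warm.fail_over() # t3 never reached the standby
```

The hot standby pays for zero loss with extra commit latency on every transaction; the warm standby commits faster but can lose whatever was still in flight.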

Of course, the drawbacks are just as obvious. Because the secondary nodes either cannot serve external requests or can serve only read-only requests, the drawbacks of such clusters include:

Very limited scalability

Performance is not improved, and may even degrade, because data must be synchronized across nodes.

If secondary nodes are readable, performance improves, but the front-end application must be modified, so the cluster is not transparent to the application.
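The last drawback is easier to see with an example of the change an application has to make to use readable secondaries. The sketch below is a deliberately naive read/write splitter; `ReadWriteRouter`, the node names, and the "starts with SELECT" rule are assumptions for illustration only.

```python
import itertools


class ReadWriteRouter:
    """Naive application-side routing: reads to replicas, the rest to primary."""

    def __init__(self, primary, read_replicas):
        self.primary = primary
        self._replicas = itertools.cycle(read_replicas)

    def target_for(self, sql):
        # Assumed rule: anything that starts with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.target_for("SELECT * FROM orders"),
    router.target_for("UPDATE orders SET status = 'paid'"),
    router.target_for("  select count(*) from orders"),
]
```

Even this small rule leaks into application code, and it is not sufficient on its own: a locking read such as `SELECT ... FOR UPDATE` belongs on the primary, and a read issued right after a write may see stale data on a nearly synchronized secondary.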

The other type of shared-nothing architecture allows load balancing. Load balancing distributes the database load across multiple nodes in the cluster; every node can provide services externally, which yields higher throughput, better resource utilization, and lower response times. Front-end requests are dispatched through a proxy. Technologies that use this type of architecture include Amoeba on MySQL (architecture shown in Figure 3, taken from the blog of MySQL expert Chen Yu), HAProxy on MySQL (Figure 4), and the Moebius cluster on SQL Server from Grey Trend (Figure 5).

Figure 3. Amoeba

Figure 4. HAProxy

  

Figure 5. Moebius Cluster
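The idea of a proxy in front of full-copy nodes can be sketched as follows. This is a toy model and not how Amoeba, HAProxy, or Moebius are implemented: `Node`, `ProxyCluster`, the round-robin read policy, and the "apply every write to every node" rule are all assumptions made for the example.

```python
class Node:
    """One database server holding a complete copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reads_served = 0


class ProxyCluster:
    """Single entry point: the client never chooses a node itself."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def add_node(self, node):
        # Assumed rule: a new node receives a full copy before taking traffic.
        node.data = dict(self.nodes[0].data)
        self.nodes.append(node)

    def put(self, key, value):
        # Every node keeps the complete dataset, so a write goes to all nodes.
        for node in self.nodes:
            node.data[key] = value

    def get(self, key):
        # Reads are spread round-robin across all nodes.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        node.reads_served += 1
        return node.data.get(key)


cluster = ProxyCluster([Node("n1"), Node("n2")])
cluster.put("k", "v")
cluster.add_node(Node("n3"))            # scale out without client changes
results = [cluster.get("k") for _ in range(6)]
reads = [node.reads_served for node in cluster.nodes]
```

The client only ever calls `put` and `get` on the cluster object, which is what makes this style of scale-out transparent to the application; the cost is that each write is applied on every node.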

The benefit of a load-balanced shared-nothing architecture is that every server provides service, so existing resources are fully used to achieve higher throughput. Amoeba may involve data sharding. Sharding makes the processing of massive data more efficient, but it also introduces other problems: the application must be adjusted to match the sharding scheme, queries that span shard nodes must be handled, and each shard node must be able to withstand the peak load of its own share of the business. This type of architecture requires highly skilled staff to implement and adjustments at the application level, which makes it better suited to Internet companies.
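The cross-shard query problem mentioned above is commonly handled with a scatter-gather step: each shard computes a partial answer and a coordinator merges them. The sketch below shows this for a "top N by amount" query; the shard contents and function names are made up for illustration.

```python
import heapq

# Hypothetical data: (customer, amount) rows spread over three shard nodes.
shards = {
    "shard-0": [("alice", 120), ("bob", 75)],
    "shard-1": [("carol", 300), ("dave", 40)],
    "shard-2": [("erin", 95)],
}


def top_n_local(rows, n):
    """Work done on a single shard: its own top-n rows by amount."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


def top_n_global(all_shards, n):
    """Scatter the query to every shard, then merge the partial results."""
    partials = []
    for rows in all_shards.values():
        partials.extend(top_n_local(rows, n))
    return heapq.nlargest(n, partials, key=lambda row: row[1])


result = top_n_global(shards, 2)
```

A query like this has to visit every shard and wait for the slowest one, which is why cross-shard queries cost more than single-shard lookups and why the sharding key is usually chosen to keep common queries on one node.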

Other architectures of this type do not involve data sharding. Some use a combination of products, such as Oracle RAC + F5; others use a single-vendor solution, such as Moebius on SQL Server. Every node in such a cluster provides service externally, which brings the following benefits:

Because each node can provide services externally, it can improve performance

Scalability is improved, because scaling out only requires adding nodes directly to the cluster

Because the front-end application connects to the cluster through a proxy and every node in the cluster maintains a complete dataset, the problems caused by sharding, such as performance degradation on cross-shard queries, do not arise, so the cluster is completely transparent to the application

Compared with MySQL data sharding, however, the drawback of this kind of scheme is also obvious: because every node needs the complete dataset, it takes up more storage space.

Summary

This article has discussed database cluster technology from a relatively high level, moving from shared-disk clusters up to the most capable form of clustering, clusters that provide load balancing, and listing some mainstream commercial products. Clustering is meant to ensure high availability, data safety, scalability, and load balancing. If no single cluster product covers all the features a business scenario needs, it has to be combined with other existing technologies. But not everyone is a database expert: having a pile of tools and materials does not mean you can build an iPhone. Considering the database architecture at the very beginning of system design will therefore save a lot of trouble.
