In-depth discussion on mainstream database cluster technologies

The database that stores computed results is an important part of any information system, and the technology is relatively mature. Yet every database, besides recording correct results, faces the same challenges: how to increase processing speed, data availability, data security, and dataset scalability. Connecting multiple databases together to form a database cluster is the natural idea.

Cluster technology combines relatively low-cost hardware through specific interconnects to deliver high-performance task processing. This article analyzes and comments on the specific technologies and mainstream products used by today's major database clusters, to give readers a reference for evaluating a database cluster.

The database cluster technologies discussed below fall into two types of system: cluster technology based on the database engine, and cluster technology based on a database gateway (middleware).

Database Engine-based cluster technology (shared disk or non-shared disk)

Cluster technology based on database Gateway (middleware) (without disk sharing)

Key Technologies

Comparing complex database cluster technologies really means comparing the performance of their constituent sub-technologies and how well those sub-technologies coordinate. The following text introduces the core technologies that deserve the most attention in a database cluster, along with some technical details.

Four methods to increase processing speed

Increasing disk speed: the main idea is to increase disk concurrency. Although the implementations differ, their ultimate goal is to present a single logical storage image of the database.

[Comment] To speed up disk access, the system creates a virtual "big" database covering all data, regardless of where the data physically resides on disk.

Distributing data across servers: multiple physical servers store different parts of a dataset, making parallel computation possible across servers.

Oracle RAC uses a shared-disk architecture: simply add a server node and RAC automatically joins it to the cluster service, allocates data to it, and routes subsequent database access to the appropriate physical server, all without modifying the application. UDB uses a non-shared-disk architecture, so the data partitioning must be modified manually; the same is true of MSC and ASE. ICX is a middleware-based database cluster technology that is transparent to both clients and database servers and can even be used to cluster several database clusters.

[Comment] The system spreads tables across multiple servers, or makes each server responsible for a different set of tables, so that parallel operation across the servers increases access speed.
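As an illustration only (the node names and routing rule are hypothetical, not any vendor's actual scheme), partitioning data across servers can be sketched as a stable hash from row keys to nodes:

```python
import zlib

# Hypothetical node names, for illustration only.
SERVERS = ["db-node-1", "db-node-2", "db-node-3"]

def partition_for(key: str, servers=SERVERS) -> str:
    """Route a row key to one node via a stable hash (shared-nothing style)."""
    return servers[zlib.crc32(key.encode()) % len(servers)]

def scatter(rows, key_field, servers=SERVERS):
    """Group rows by their target node so each node can work in parallel."""
    placement = {s: [] for s in servers}
    for row in rows:
        placement[partition_for(row[key_field], servers)].append(row)
    return placement
```

Because the hash is stable, any gateway or client that knows the server list can compute the same placement, which is what lets different servers answer for different slices of the data in parallel.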

Symmetric multiprocessing: improves database processing speed through multi-processor hardware. All database-engine-based clusters support this technology.

[Comment] Multiple CPUs are scheduled to handle different access requests simultaneously, but the actual benefit of this technology in database applications is quite limited.

Transaction load balancing: when dataset contents are synchronized, read-only operations can be distributed across multiple independent servers. Because the vast majority of database operations are browsing and querying, keeping multiple database servers content-synchronized gives transaction load balancing the greatest potential to improve database processing speed (far greater than the symmetric multiprocessor systems of up to four processors described above), while also providing high data availability.

All database-engine-based cluster systems support only one logical database image with one logical or physical backup, whose main purpose is disaster protection. The backup can therefore only be updated through the replication mechanism; applications cannot update it directly. Transaction load balancing using backup data is applicable only to a narrow set of uses, such as report generation, data mining, and other non-critical applications.

[Comment] Load balancing is an old technology, yet maximizing performance remains the ultimate goal of cluster design. Traditionally, transaction load balancing on backup data has been applicable only to a few limited applications.
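Assuming a set of content-synchronized replicas, the read/write transaction load balancing described above can be sketched as a toy router that keeps writes on the primary and rotates reads across replicas (all names are illustrative):

```python
import itertools

class ReadWriteSplitter:
    """Toy load balancer: writes go to the primary, reads rotate
    round-robin across synchronized replicas. Node names are made up."""

    WRITE_VERBS = ("insert", "update", "delete", "create", "alter")

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in self.WRITE_VERBS:
            return self.primary          # all writes to the primary
        return next(self._replicas)      # reads spread across replicas
```

Since most database traffic is reads, adding read replicas scales the common case almost linearly, which is why the text above calls this the technique with the greatest potential.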

All of the above technologies can be used together in actual system deployment to achieve the best results.

Four ways to improve availability

Hardware redundancy: lets multiple processors execute the same task at the same time, masking transient and permanent hardware errors. There are two implementations: building a dedicated redundant processor, or using multiple independent database servers.

Database-engine-based cluster systems use multiple independent database servers to implement one logical database. Each processor runs a different task at any given moment, so this type of system can mask the failure of one or more servers, but because there is no redundant processing, each recovery takes a long time.

[Comment] Conventionally, more expensive hardware means higher performance, but the result is often counterproductive. Improving hardware redundancy by adding or upgrading devices calls for detailed requirements analysis and justification.

Communication-link redundancy: redundant communication links can mask transient and permanent link-level errors.

Database-engine-based cluster systems come in two structures: shared disk and independent disk. RAC and MSC can be considered shared-disk cluster systems; UDB and ASE are independent-disk cluster systems. Shared-disk cluster systems have the least communication redundancy.

[Comment] Communication-link redundancy provides fault tolerance.

Software-level redundancy: because modern operating systems and database engines are highly concurrent, errors caused by race conditions, deadlocks, and timing account for the vast majority of abnormal service outages. Multiple redundant database processes can mask transient and permanent software errors. Database-engine-based cluster systems use multiple processors to implement one logical database, so they can provide only partial software redundancy, since each processor executes a different task at any moment.

[Comment] Improving software design to raise redundancy and mask software-level errors is every developer's dream; traditional cluster systems can provide only partial software redundancy.

Data redundancy:

1. Passive dataset update: all current data replication technologies (synchronous or asynchronous), such as disk mirroring, database file replication, and the backup tools supplied by database vendors, can generate only passive dataset replicas, generally used only for disaster recovery.

[Comment] Most applications use passive dataset updates. The method has poor disaster tolerance and high resource consumption, and is ripe for replacement.

2. Active dataset update: this type of dataset must be managed by one or more backup database servers. It can be used for report generation, data mining, disaster recovery, or even low-grade load balancing, and comes in two flavors: synchronous and asynchronous.

Asynchronous active dataset replication: the transaction is first completed on the master server, then serialized and shipped to the backup server, which performs the same operations to keep the data consistent. All commercial databases support asynchronous active replication.
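A minimal in-memory sketch of this pattern, with the serialized transaction log modeled as a queue (not any vendor's replication format):

```python
from collections import deque

class AsyncReplicatedStore:
    """Sketch of asynchronous active replication: the transaction commits
    on the primary first, then the serialized change is replayed in order
    on the backup."""

    def __init__(self):
        self.primary, self.backup = {}, {}
        self.log = deque()             # serialized transaction log

    def commit(self, key, value):
        self.primary[key] = value      # 1. complete on the master server
        self.log.append((key, value))  # 2. queue the change for the backup

    def drain(self):
        """Replay queued transactions on the backup. The time between
        commit() and drain() is the replication lag, during which the
        primary and backup differ."""
        while self.log:
            key, value = self.log.popleft()
            self.backup[key] = value
```

The key property shown here is that the primary never waits for the backup, which is why asynchronous replication is fast but leaves a window where the backup is stale.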

Synchronous active data replication: requires all concurrent transactions to be processed on all database servers at the same time. The direct benefit is solving the queue-management problem and achieving higher performance and availability through load balancing. RAC, UDB, MSC, and ASE fully serialize transactions in combination with the two-phase commit protocol; the design goal is a dataset usable for rapid disaster recovery.
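The two-phase commit protocol mentioned above can be sketched as follows: every participant first votes in a prepare phase, and the change is applied everywhere only if all votes are yes (a simplified model that ignores coordinator failure and recovery):

```python
class Participant:
    """One database server in a toy two-phase commit."""

    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.data = name, healthy, {}
        self._staged = None

    def prepare(self, key, value):       # phase 1: stage the change and vote
        if not self.healthy:
            return False
        self._staged = (key, value)
        return True

    def commit(self):                    # phase 2a: make the change durable
        key, value = self._staged
        self.data[key] = value

    def abort(self):                     # phase 2b: discard the staged change
        self._staged = None

def two_phase_commit(participants, key, value):
    """Commit on every server or on none: all must vote yes in phase 1."""
    if all(p.prepare(key, value) for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

The all-or-nothing outcome is what keeps every server's copy identical, at the price of every transaction waiting for the slowest participant.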

[Comment] Active dataset updates are currently the more advanced form of data redundancy. Professionals can also compare the underlying technical details, since differences there directly affect important indicators.

Technologies for improving security and dataset scalability

There is little room for innovation in improving database security and dataset scalability. The most common database security method is password protection, whether distributed or centralized. A firewall placed in front of the database adds extra latency, so although many security violations come from inside the company, database firewalls are rarely used. If the database cluster is implemented with middleware, however, the firewall function can be placed on the data path without adding latency. Dataset scalability can only be achieved by distributing data across multiple independent physical servers.
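A middleware firewall on the data path amounts to inspecting each statement before it is forwarded, rather than adding an extra network hop. A toy sketch, with a made-up deny policy (real products would use far richer rules):

```python
import re

# Hypothetical deny rules for a middleware-level database "firewall".
# Statements are checked in-line before being forwarded to the database
# tier, so no additional hop (and thus no extra latency) is introduced.
DENIED = [
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r";\s*\S"),   # crude check for stacked statements
]

def firewall_check(sql: str) -> bool:
    """Return True if the statement may be forwarded to the database tier."""
    return not any(rule.search(sql) for rule in DENIED)
```

Because the middleware already sits between every client and every server, this check also applies to access from inside the enterprise, which a perimeter firewall would not see.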

Mainstream Products

Database cluster products mainly comprise database-engine-based cluster technologies such as Oracle RAC, Microsoft MSC, IBM DB2 UDB, and Sybase ASE, and database gateway (middleware) cluster technology such as ICX-UDS.

Oracle RAC

Oracle RAC supports all the mainstream commercial applications that Oracle databases run on clusters, including popular packaged products such as SAP, PeopleSoft, and Oracle E-Business Suite, as well as in-house applications, both OLTP and DSS; Oracle also has the distinctive ability to support hybrid OLTP/DSS environments effectively, and is the only vendor offering open-system databases with this capability. Oracle RAC gives Oracle databases running on clusters the highest levels of availability and scalability at low cost. If a node in the cluster fails, Oracle continues running on the remaining nodes; if more processing power is needed, new nodes are easily added to the cluster. To keep costs down, even the highest-end systems can be built up gradually from a small, low-cost cluster of standardized commercial components.

Oracle's major innovation is a technology called Cache Fusion, originally developed for Oracle9i Real Application Clusters. Cache Fusion lets the nodes of a cluster synchronize their in-memory caches efficiently over the high-speed cluster interconnect, minimizing disk I/O. Its most important advantage is that every node in the cluster shares access to all data on disk; data does not need to be partitioned between nodes. Oracle RAC supports enterprise grids, and its Cache Fusion technology provides the highest levels of availability and scalability. Oracle RAC significantly reduces operating costs, increases flexibility, and gives the system superior adaptability and foresight: dynamically provisioning nodes, storage, CPUs, and memory achieves the required service levels while steadily cutting costs through better utilization.

Oracle RAC adopts a "shared-everything" design: CPU and storage sharing turn multiple nodes into a seamless cluster, and each task a user submits is automatically spread across the machines in the cluster. No redundant hardware is needed to meet high-reliability requirements, and because RAC shares CPU capacity, even a cluster of ordinary servers can achieve the performance that only mainframe-class hosts could provide in the past.

Microsoft MSC

Microsoft has spent several years extending the scalability, availability, and reliability of its server solutions. Initially code-named Wolfpack, and known as both Microsoft Cluster Server and Microsoft Cluster Service, MSC was Microsoft's first major foray into NT cluster technology and is recognized as its best cluster solution. In an MSC cluster, the MSC software connects up to four physical computers over a high-speed network. The computers in a cluster typically share the same storage subsystem and operate in "active-active" mode, meaning that all cluster nodes actively do work by sharing the load, and take over the work of a node that fails. MSC's main purpose is to improve application availability through failover: moving a failed application from one node to a healthy node in the cluster. Once the failed node recovers, the cluster can "fail back" the application to its original node. MSC recovers the applications running on the cluster and manages failback without losing data belonging to the failed application, and it preserves user and application state during recovery; this is called stateful clustering. Applications can also keep running during upgrades: with a rolling upgrade (for example, upgrading the application on one cluster node at a time while it remains available on the other nodes), there is no need to take the application offline.
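The failover/failback behavior described above can be modeled in a few lines (a simplified sketch that ignores heartbeats, quorum, and shared storage; node names are illustrative):

```python
class FailoverCluster:
    """Toy model of MSC-style failover: an application runs on one node;
    when that node dies it moves to a healthy node, and failback returns
    it once its preferred node recovers."""

    def __init__(self, nodes):
        self.up = {n: True for n in nodes}
        self.home = self.active = nodes[0]   # preferred and current owner

    def fail(self, node):
        self.up[node] = False
        if node == self.active:              # failover to a healthy node
            self.active = next(n for n, ok in self.up.items() if ok)

    def recover(self, node):
        self.up[node] = True
        if node == self.home:                # failback to the preferred node
            self.active = node
```

The point of the sketch is the ownership transfer: clients keep talking to "the cluster," while which physical node answers changes underneath them.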

SQL Server 2005 is Microsoft's next-generation data management and analysis solution. It brings better security, stability, and reliability to enterprise-class data and analysis applications and makes them easier to create, deploy, and manage. Its failover clustering support strengthens multi-instance support and the analysis, backup, and recovery of service objects and data, improving the availability of Analysis Services. Advanced scalability features such as table partitioning, snapshot isolation, and 64-bit support make it easy to build and deploy critical applications; partitioning of tables and indexes markedly improves query performance on large databases.

4-node cluster implemented by Windows 2000 MSC

Performance indicators

This section lists the detailed technical indicators of the cluster systems. During system planning you can drop the indicators that matter less for your application, or weight them for a rigorous performance comparison, and so select the database cluster system that suits you best.

Processing speed

Disk technology: all cluster systems make good use of disk technology, although the negative impact that DM and FM have on the disk subsystem leaves them relatively weak here.

Data partitioning: all database-engine-based cluster systems partition data well.

SMP: all database-engine-based cluster systems show similar SMP performance.

Load balancing: database-engine-based cluster systems generally support only limited load balancing, because it relies on the backup dataset; this indicator varies from product to product.

Data availability

Processor and software redundancy: only some cluster systems support this.

Communication-link redundancy: shared-disk cluster systems generally score low on this indicator; independent-disk cluster systems score high.

Data redundancy:

Active asynchronous replication: all cluster systems support this except disk and file mirroring.

Active synchronous replication: all cluster systems support this, with slightly different detailed indicators.

Passive asynchronous replication: all cluster systems perform similarly.

Passive synchronous update: all cluster systems perform similarly.

WAN replication technology:

Remote active asynchronous replication: all cluster systems support this, but their queue-management capabilities differ. DM, FM, and RAID score relatively low; RAID does not support remote replication at all.

Remote active synchronous replication: ICX performs better in this respect.

Remote passive asynchronous replication: DM and FM support this type of replication. Because they are transparent to the cluster and operate at the layer beneath it, every cluster system can use the functions they provide.

Remote passive synchronous replication: DM and FM support this type of replication, which works only over short distances (dual-mode fiber within a radius of about five miles). Again, because DM and FM are transparent to the cluster, every cluster system can use the functions they provide; once deployed, all cluster systems behave similarly.

Security

Password protection: a basic capability of every cluster system; distributed or centralized password protection safeguards the data.

Database firewall: most database cluster systems rarely use a database firewall, whereas ICX implements the firewall function directly on the path the data travels.

Scalability of a dataset

Data partitioning: all database-engine-based cluster systems use data partitioning to ensure dataset scalability.

Availability of data partitions: all cluster systems perform fairly similarly.

Cluster Management

Shared-disk cluster systems such as RAC and MSC are comparatively easy to manage, RAC offering the most services. Each server in such a system still requires special handling, and initialization and configuration changes are not simple, but compared with independent-disk cluster systems they are much easier to administer. Both kinds, however, require the cluster to be visible to the application, and configuration and changes remain troublesome.

Independent-disk cluster systems such as UDB and ASE score lower here: because they use non-shared disks, management is relatively cumbersome.

ICX matches the independent-disk cluster systems in ease of management (initial configuration and later modification), and handles the complexity of the underlying data management better; low-level recovery of the database engines and data, however, must still be performed directly on each database processor.

Disk tools such as DM, FM, and RAID are transparent to the cluster, so their management is much simpler.

Application transparency

Because error handling and data partitioning are not transparent to the application, and each system places special demands on it, the engine-based RAC, MSC, UDB, and ASE, as well as ICX, all leave room for improvement here. DM, FM, and RAID are completely transparent to applications.

IBM DB2 UDB

DB2 UDB's many automatic and self-managing features let administrators spend more time on issues that drive business value, and can even eliminate the need for a full-time administrator in smaller deployments.

UDB's strengths reflect DB2's openness: it supports mainstream operating systems such as Unix, Linux, and Windows, supports a wide range of development languages and access interfaces, and offers good data security and stability. The high-availability disaster recovery technology in DB2 V8.2 can restore critical applications in a very short time. Horizontal scaling with the DB2 Data Partitioning Feature (DPF) supports database clusters of up to 1,000 servers, a solid technical foundation for building enterprise-class data warehouses. Combining DPF with DB2 Information Integrator (DB2 II) lets database operations harness the computing power of every server in a grid, achieving grid computing in the true sense.

Among its more innovative technologies, the Design Advisor helps DBAs make comprehensive database design decisions, including complex functional partitioning and materialized query tables, which greatly shortens deployment time. Automatically generated statistics summaries represent the first deployment from the IBM LEO research project. Policy-based management and maintenance automates autonomous object-maintenance tasks such as table reorganization, statistics collection, and database backup. High-availability disaster recovery and client rerouting deliver the 24x7 information availability and recovery that on-demand enterprises require. In addition, DB2 UDB provides deep integration with, or plug-ins for, Java/Eclipse and Microsoft .NET IDEs.

DB2 UDB Structure Topology

SYBASE ASE

ASE's performance improvements rest on the Virtual Server Architecture (VSA), which is unique to Sybase; the current version is ASE 15. Because VSA is independent of the operating system and related software, ASE 15 can tune itself more intelligently, and because VSA needs only a small amount of memory and internal switching overhead, ASE 15 can manage large numbers of online users. The most important reasons ASE improves performance while controlling costs are its patented self-tuning optimizer and query engine, which intelligently adjust complex queries and skip partitions containing no relevant data. ASE 15 also reduces operating costs through a series of new features for managing and diagnosing the database server.

ASE 15 offers high reliability with very low operational risk. The security of personal data receives special attention in ASE: a distinctive encryption system requires no application changes, and when applications connect through the security software, implementation costs fall and no new security holes are opened. ASE 15 also provides a simple, direct, programmable scripting language for encryption and decryption. Against unexpected downtime, ASE 15 builds on its proven reliability and high system utilization with many significant features that strengthen availability and disaster recovery. The new storage engine supports four data-partitioning methods for placing different partitions on different physical devices, helping database administrators quickly set up redundant disaster-recovery nodes and synchronize databases across heterogeneous data platforms.

The new query and storage engines in ASE 15 were designed to support the next generation of grid computing and cluster technology, combining a query-processing mechanism that fully exploits data partitioning with optimizer technology suited to cluster problems. At the same time, ASE 15 is an excellent database platform for event-driven enterprises: its web-services and XML architecture reduces inter-system dependencies and gives application development more flexibility.

ICX-UDS

Unlike the database-engine-based cluster technologies, ICX-UDS is not tied to a single engine and can support different databases.

ICX resembles an ordinary proxy server: placed on the critical network path, it listens to the database system's traffic. The ICX gateway automatically identifies stateless query accesses and spreads that load across all servers. The gateway acts like an online "dispatcher": it sends every database update operation to all databases for execution, while each stateless query goes to just one of the database servers.

For statistical reporting and data mining, replication plus read-only access yields faster processing, and additional read-only instances can be designated for load balancing. Fault tolerance for the ICX gateway itself is provided by a backup gateway, and attaching an asynchronously updated database creates a near-real-time data source without affecting the main service cluster.
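The routing rule described above (updates fan out to every server, each stateless query goes to exactly one) can be sketched as follows; the server names are illustrative, and real gateways parse SQL far more carefully:

```python
import itertools

class GatewayRouter:
    """Sketch of the gateway behavior described above: every update is
    sent to all database servers, each stateless query to exactly one."""

    WRITE_VERBS = ("insert", "update", "delete", "create", "drop", "alter")

    def __init__(self, servers):
        self.servers = list(servers)
        self._next_reader = itertools.cycle(self.servers)

    def route(self, sql: str):
        """Return the list of servers that should execute this statement."""
        verb = sql.lstrip().split(None, 1)[0].lower()
        if verb in self.WRITE_VERBS:
            return self.servers           # fan out writes to every server
        return [next(self._next_reader)]  # one server answers the query
```

Fanning writes out to every server is what keeps the replicas identical without any engine-level replication, which is why the approach can span different database products.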

Configuration of the ICX gateway and load balancing

Application comments

The new management enhancements in Oracle RAC and the Oracle database enable enterprise grids, and enterprises of any size can use Oracle RAC to support a wide range of applications.

Enterprise grids are built from large numbers of standardized commercial components: processors, networks, and storage. With Oracle RAC's Cache Fusion technology, Oracle databases achieve maximum availability and scalability. Using Oracle databases with Oracle RAC significantly reduces operating costs and further increases flexibility: dynamically provisioning nodes, storage, CPUs, and memory makes service levels easier and cheaper to maintain, and better utilization cuts costs further. The enterprise grid is the data center of the future, giving enterprises greater adaptability, foresight, and agility.

With the development of server hardware systems and network operating systems, cluster technology will gradually improve in terms of availability, high reliability, and system redundancy. We have collected mainstream products on the market and made a brief evaluation of the products from the perspective of analyzing performance indicators.

Sybase ASE is a popular high-performance database with an open, scalable architecture, an easy-to-use transaction processing system, and low maintenance costs.

ASE supports traditional mission-critical OLTP and DSS applications and meets the needs of Internet application development. Sybase serves mission-critical enterprise business applications well, providing database reliability, integration, and high performance. ASE's efficient multithreaded architecture, internal parallelism, and effective query optimization deliver excellent performance and scalability. It also provides advanced enterprise integration, robust data access, and data-movement technology, supporting distributed transactions and queries across remote Sybase and non-Sybase databases. ASE extends these capabilities further, supporting personalized user access to business systems through enterprise information portals by distributing information and managing business transactions.

MSC is a good fit for workloads such as e-mail servers and database applications.

Suppose you decide to run Microsoft Exchange 2000 Server on a four-node MSC cluster. After installing the MSC software and the cluster-aware edition of Exchange 2000, you can configure the cluster so that when the primary node fails, Exchange 2000 fails over to a backup node. Even if user sessions are active on the primary server when the fault occurs, MSC restores service quickly and automatically without losing any data: the backup node takes over the workload and associated data from the failed node and continues serving users.

ICX's biggest advantage is its fresh exploration of the challenges facing database cluster technology: this middleware-based approach offers a practical way to build highly scalable, high-performance databases while adapting flexibly to future technological change.

The middleware replication technology can be deployed on the critical network path to listen to all traffic entering and leaving the database system, making it easy to add a firewall and other security services and to protect the physical database servers. Processing latency is readily hidden by concurrent processing across multiple servers. And once the technical obstacle of real-time parallel transaction replication is overcome, users can obtain high performance, high availability, and high security from a cluster of multiple database servers.

DB2 UDB is a database that grows with the enterprise: it responds quickly when a site's transaction demand peaks, and it can be expanded to accommodate the growing volume of information spread across many different databases.

It scales as the information infrastructure grows from one processor to many, and on to highly parallel multi-node clusters. Integrating partitioning and clustering into the new DB2 UDB Enterprise Server Edition makes this edition flexible. DB2 UDB also adds autonomic database technology, letting administrators opt into enhanced automation for configuring, tuning, and managing their databases; autonomic management means less time spent on routine tasks. Multi-dimensional clustering of tables reduces the DBA's index-creation workload and clusters data for fast queries. DB2's built-in planned and unplanned availability features keep business applications available at all times: online utilities for index rebuild, index creation, and table load, together with configuration parameters that can be changed without stopping the database, all translate into better performance and high availability.

[Related] Characteristics of ideal database Clusters

Speed: processing speed can be increased simply by adding database servers.

Data synchronization: multiple real-time synchronized copies of the data are available at all times, ideally including several remote copies.

Security: beyond password protection, illegal access to the database from inside the enterprise should also be controlled.

Scalability: the dataset can grow without hurting availability.

Generally, the technologies related to database clusters are very complex. What is more challenging is that the actual application requires that indicators in terms of speed improvement, data synchronization, security assurance, and scalability can be improved at the same time, rather than simply improving a certain indicator at the cost of other indicators. Comprehensively improving these technical indicators is a major topic for database cluster technologies.

[Term]

Cluster: a group of independent computers that run the same application and cooperate to present a single system image to clients and applications. The purpose of cluster technology is to further improve scalability, availability, and reliability through a multi-tier network structure.

Scalability: the ability of a system to handle an ever-growing workload while keeping response acceptable.

Availability: service quality, backup capability, and accessibility.

Reliability: the robustness of the system.
