Introduction to Windows 2008 Failover Clustering

Last Update: 2015-01-09 Source: Internet

Author: User



Reprint: http://dufei.blog.51cto.com/382644/902026

Today a customer asked about Windows clustering. Because the cluster technologies supported by Windows Server 2008 and Windows Server 2003 differ somewhat, I collected the following material for reference. The content below was gathered from the Internet.

Windows Server 2008 provides two clustering technologies: failover clusters and Network Load Balancing (NLB) clusters. Failover clusters primarily provide high availability, while NLB clusters primarily provide scalability, improving the availability of web-based services while allowing them to scale out. Some sources also mention a Component Load Balancing cluster type that supports high-performance computing.

Whether to use a failover cluster or a Network Load Balancing cluster depends mainly on whether the application keeps long-running in-memory state.

Failover clusters are designed for applications that have long-running in-memory state or large, frequently updated data. These are called stateful applications, and they include database and messaging applications. Typical uses of a failover cluster include file servers, print servers, database servers, and messaging servers.

Network Load Balancing suits applications that do not keep long-running state in memory. These are stateless applications: each client request is treated as an independent operation, so each request can be load balanced independently. Stateless applications typically have read-only or infrequently changed data. Front-end web servers, FTP servers, and proxy servers typically use Network Load Balancing, and NLB can also support other TCP- or UDP-based services and applications.

This article mainly describes the configuration of failover clusters. A failover cluster can be set up in several different configurations. The servers that make up the cluster can be either active or passive (inactive), and servers can be configured to take over the appropriate resources immediately when an active server fails. Failover generally takes only a few minutes; the exact time depends mainly on the cluster configuration and the application. While a node is active, all of its resources are available on that node. When a server fails, the resource groups that the failover cluster had placed on it are taken over by another server. When the failed server comes back online, the Cluster service can be configured either to fail the resources back to the original server or to let the current server continue handling new client requests.
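The failover and failback behavior described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not how the Cluster service is implemented: the `Cluster` and `ResourceGroup` classes, the node names, and the "first surviving node" takeover rule are all hypothetical. The `preferred_owner` and `allow_failback` fields loosely mirror the kind of per-group settings a failover cluster lets you configure.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Hypothetical model of a clustered resource group (not a Windows API)."""
    name: str
    preferred_owner: str  # node the group should normally run on
    owner: str            # node currently hosting the group
    allow_failback: bool = False


@dataclass
class Cluster:
    """Toy simulation of failover and failback decisions."""
    nodes: dict[str, bool] = field(default_factory=dict)  # node -> online?
    groups: list[ResourceGroup] = field(default_factory=list)

    def node_failed(self, node: str) -> None:
        """Move every group hosted on `node` to the first surviving node."""
        self.nodes[node] = False
        survivors = [n for n, online in self.nodes.items() if online]
        for group in self.groups:
            if group.owner == node:
                if not survivors:
                    raise RuntimeError(f"no node left to host {group.name}")
                group.owner = survivors[0]

    def node_recovered(self, node: str) -> None:
        """Bring `node` back online; fail back only groups that allow it."""
        self.nodes[node] = True
        for group in self.groups:
            if group.allow_failback and group.preferred_owner == node:
                group.owner = node


cluster = Cluster(
    nodes={"NODE1": True, "NODE2": True},
    groups=[
        ResourceGroup("SQL", "NODE1", "NODE1", allow_failback=True),
        ResourceGroup("FileShare", "NODE1", "NODE1"),
    ],
)
cluster.node_failed("NODE1")     # both groups move to NODE2
cluster.node_recovered("NODE1")  # SQL fails back; FileShare stays on NODE2
print({g.name: g.owner for g in cluster.groups})
# {'SQL': 'NODE1', 'FileShare': 'NODE2'}
```

The two groups show both options from the text: one is configured to fail back to its original server, the other keeps running on the server that took it over.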
Windows Server supports three basic types of failover clusters: single-node clusters, single quorum device multi-node clusters, and majority node set clusters.

Single-node clusters: A single-node cluster cannot fail over and is typically used for simple shared resources and network storage management. Its main advantage is that it monitors applications and their dependent resources and restarts them automatically when they fail or stop responding. A single-node cluster can serve file, print, or web shares; its purpose is mainly to make the relevant resources easier for users to access without providing additional complex functionality. Such deployments are relatively rare in practice.

Multi-node clusters: Multi-node clusters are more common. The multi-node model includes "active/active" and "active/passive" modes. In active/passive mode, one or more servers are active and handle client requests while the others stand idle. In active/active mode, all nodes are active and handle client requests; if one active node fails, another node takes over its work until the failed node returns to normal. A multi-node configuration has a quorum device, and the cluster configuration data shared by all nodes is stored on that quorum device.

Majority node set clusters: In this failover cluster type, nodes can have their own storage devices instead of connecting to a shared storage device. The cluster configuration data is kept on multiple disks across the cluster, so each node can have its own quorum device. Majority node set configurations are typically used for clusters whose nodes are spread across different locations. This works mainly because each node can have its own storage device as well as its own local copy of the cluster configuration data.
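The difference between active/passive and active/active takeover described above can be sketched as a placement function. The policy here is hypothetical and chosen only to make the two modes concrete: with an idle standby node, that node takes over everything the failed node hosted; without one, each orphaned group goes to the least-loaded surviving active node. Node and group names are illustrative.

```python
def place_after_failure(
    placement: dict[str, str],
    failed: str,
    active_nodes: list[str],
    standby_nodes: list[str],
) -> dict[str, str]:
    """Return a new {group: node} placement after node `failed` goes down.

    Hypothetical policy for illustration only:
    - active/passive: an idle standby node takes over every orphaned group;
    - active/active: each orphaned group goes to the surviving active node
      that currently hosts the fewest groups (ties broken by name).
    """
    new_placement = dict(placement)
    orphans = sorted(g for g, node in placement.items() if node == failed)

    if standby_nodes:  # active/passive takeover
        for group in orphans:
            new_placement[group] = standby_nodes[0]
        return new_placement

    survivors = [n for n in active_nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving node can take over")

    # Count how many groups each survivor already hosts.
    load = {n: 0 for n in survivors}
    for node in placement.values():
        if node in load:
            load[node] += 1

    for group in orphans:  # active/active takeover
        target = min(survivors, key=lambda n: (load[n], n))
        new_placement[group] = target
        load[target] += 1
    return new_placement


# Active/active: three busy nodes, node A fails and its work is spread out.
active_active = place_after_failure(
    {"SQL": "A", "Files": "A", "Print": "B", "Web": "C"}, "A", ["A", "B", "C"], []
)
# Active/passive: node A fails and idle standby node S takes over.
active_passive = place_after_failure({"SQL": "A", "Files": "A"}, "A", ["A"], ["S"])
print(active_active)
print(active_passive)
```

Spreading the load is one reasonable choice for active/active mode because every surviving node is already doing work; real clusters offer their own settings (such as preferred owners) to control where groups land.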
Failover cluster resources: Resources are the basis of failover in a cluster. Related or interdependent resources can be treated as a resource group, and all resources in the same group must reside on the same node. If one resource fails, the other resources in the group may fail as well. When you add an application as a highly available resource, you need to determine whether it can run in a clustered environment. The ability to run in a clustered environment and respond to cluster events is called cluster awareness. Cluster-aware applications, such as DFS, DHCP, Exchange Server, file servers, iSNS (Internet Storage Name Service) servers, MSDTC, and SQL Server, can register with the failover cluster to receive status and notification information. Applications that do not support cluster events are called cluster-unaware; some cluster-unaware applications can still be configured as highly available resources and fail over.

Hardware requirements: The hardware configuration should be tuned to maximize overall throughput and optimize performance for demanding applications and services. Different servers have different optimization needs. For example, a web server that serves static HTML pages may need faster disks and more memory to cache pages in memory but usually does not need a fast CPU, while a typical database server might need a high-end CPU, fast disks, and more memory. Administrators should carefully optimize each server among the cluster nodes. The paging file is where optimization usually pays off most, and the key rules are as follows. The paging file should have a fixed size, so that it does not grow dynamically, and it should not reside on a shared cluster storage device. Its size should follow the hardware manufacturer's recommendations. If a server has more than one local disk, consider placing the paging file on a separate disk to improve performance.
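The resource-group rules above (interdependent resources stay together, and a failure can spread to whatever depends on the failed resource) can be sketched as a small dependency graph. This is an illustrative model, not the Cluster service's own logic: the `SQL_GROUP` contents and the helper functions are hypothetical. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to derive an order in which dependencies come online before the resources that need them.

```python
from graphlib import TopologicalSorter

# Hypothetical resource group: each resource maps to the resources it depends on.
SQL_GROUP = {
    "Cluster Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "SQL Server": {"Cluster Disk", "Network Name"},
}


def online_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Dependencies first, so each resource starts after what it needs."""
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Reverse of the online order: dependents stop before their dependencies."""
    return online_order(depends_on)[::-1]


def affected_by(failed: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All resources that directly or indirectly depend on `failed`."""
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for resource, deps in depends_on.items():
            if resource not in affected and deps & affected:
                affected.add(resource)
                changed = True
    return affected


def same_node(group: dict[str, set[str]], owner_of: dict[str, str]) -> bool:
    """A resource group is valid only if all of its resources share one owner."""
    return len({owner_of[r] for r in group}) == 1


print(online_order(SQL_GROUP))
print(sorted(affected_by("IP Address", SQL_GROUP)))
# ['IP Address', 'Network Name', 'SQL Server']
```

The example shows why the text warns that one failure can affect the rest of the group: losing the IP address also takes down the network name and the SQL Server resource that depends on it, while the disk is unaffected.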
It is also important to note that all servers in the same cluster must run the same hardware architecture version of Windows Server 2008, for example all x64 or all Itanium. The servers in the cluster must be members of the same Active Directory domain, and DNS is required for name resolution.

Cluster objects: The Cluster service manages all failover cluster functionality, including the physical and logical units and objects used in the cluster. Each cluster object has properties that define its behavior within the cluster. The Cluster API provides the control codes and management functions needed to manage objects through the Cluster service. Each node in the cluster runs an instance of the Cluster service (Clussvc.exe), the cluster network driver (Clusnet.sys), and the cluster disk driver (Clusdisk.sys).

Cluster nodes exchange heartbeats with one another over the private network adapter: the network driver on each node periodically sends UDP packets to the other nodes to check network and routing status. If a node stops responding, the cluster network driver notifies the Cluster service, which initiates failover. The cluster disk driver on each node is mainly responsible for maintaining exclusive ownership of shared disks: only the node that owns a physical disk resource can access that disk, and no other node can.

Cluster database: During normal operation, the nodes exchange management data in addition to heartbeats. This data is stored in the cluster database, which holds the cluster configuration and resource usage information (the cluster objects), and the Cluster service regularly keeps this information up to date.
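The heartbeat mechanism described above (periodic packets, and a node treated as failed once it stops responding) can be sketched as a simple timeout tracker. This is not the Clusnet.sys algorithm: the `HeartbeatMonitor` class is hypothetical, and the one-second interval and five-missed-heartbeat threshold are assumed values for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy heartbeat tracker; not the actual cluster network driver logic.

    `interval` and `threshold` are assumed values chosen for illustration.
    """
    interval: float = 1.0  # seconds between expected heartbeats (assumed)
    threshold: int = 5     # missed heartbeats tolerated before a node is declared down (assumed)
    last_seen: dict[str, float] = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat packet from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def unresponsive(self, now: float) -> list[str]:
        """Nodes that have been silent longer than interval * threshold seconds."""
        limit = self.interval * self.threshold
        return sorted(n for n, t in self.last_seen.items() if now - t > limit)


monitor = HeartbeatMonitor()
monitor.record("NODE1", 0.0)
monitor.record("NODE2", 0.0)
monitor.record("NODE1", 4.0)      # NODE2 goes quiet after t=0
print(monitor.unresponsive(6.0))  # ['NODE2'] -> would trigger failover
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic; a real monitor would use a monotonic clock and a network socket.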
Cluster quorum resource: The quorum resource is mainly used to maintain the recovery log. All changes to the cluster are written to the recovery log so that the cluster configuration and state can be recovered. The quorum resource comes into play mainly when failures occur. Full cluster functionality depends on quorum. When the cluster is configured, the Cluster service automatically chooses the necessary quorum settings, and these settings determine the maximum number of failures the cluster can withstand; if one more failure occurs beyond that number, the cluster stops working. For example, in a four-node cluster, if two nodes fail, the remaining half cannot continue; in a five-node cluster, if two nodes fail, the cluster can fail over and keep working.

Cluster interface and network status: The Cluster service monitors the state of the private network adapter interfaces and of the cluster networks between nodes in order to determine whether another node has failed. You can use the cluster netinterface command or the Failover Cluster Management tool to view the current state of a network interface. The possible states are:
Up: The interface is working normally and can communicate with the other interfaces on the network.
Unknown: The state cannot be determined at this time.
Unavailable: The interface is blocked for cluster use, or the interface failed to connect the node.
Unreachable: The node cannot communicate through this interface.
Failed: The node associated with the interface is active but cannot communicate through the interface.
You can also use the cluster network command or the Failover Cluster Management tool to view the state of the networks themselves:
Up: The network is working normally.
Unknown: The state cannot be determined at this time.
Unavailable: The network is disabled for cluster use, or all nodes connected to the network are inactive.
Partitioned: The network has partly failed; some active nodes cannot communicate over it.
Down: The network has failed, and no active node can communicate over it.

If a network interface is in the Failed state, the Cluster service fails over all IP address resources that use that interface. If the interface state is Unreachable, however, the Cluster service does not fail them over, and if the interface state is Unavailable, the Cluster service assumes the node is down. Normally a cluster network should be Up, meaning it is working properly and all active nodes can communicate. A network in the Partitioned state means one or more nodes have trouble communicating or have recently failed. The Down state means the network has failed and is not functioning; no node can communicate over it.
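Two rules from this section can be made concrete in code: the majority arithmetic behind the four-node and five-node quorum example, and the Cluster service's reaction to each network interface state. This is a sketch under simplifying assumptions: each node has exactly one vote and there is no witness disk, and the helper names and returned strings are hypothetical summaries of the text rather than any real cluster API.

```python
from enum import Enum


def votes_needed(nodes: int) -> int:
    """Smallest strict majority, assuming one vote per node and no witness."""
    return nodes // 2 + 1


def max_failures(nodes: int) -> int:
    """How many nodes can fail while the survivors still hold a majority."""
    return nodes - votes_needed(nodes)


class InterfaceState(Enum):
    UP = "Up"
    UNKNOWN = "Unknown"
    UNAVAILABLE = "Unavailable"
    UNREACHABLE = "Unreachable"
    FAILED = "Failed"


def reaction(state: InterfaceState) -> str:
    """Cluster service response to an interface state, as described in the text."""
    if state is InterfaceState.FAILED:
        return "fail over IP address resources on this interface"
    if state is InterfaceState.UNAVAILABLE:
        return "treat the node as down"
    return "no failover"  # Up, Unknown, Unreachable


for n in (4, 5):
    print(n, "nodes: need", votes_needed(n), "votes, tolerate", max_failures(n), "failure(s)")
```

With four nodes a majority is three votes, so only one failure is tolerated and losing two nodes stops the cluster; with five nodes a majority is still three votes, so two failures are tolerated, matching the example in the quorum paragraph.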

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
