As an organization adds applications and services, centralizing authentication and password services can increase security and reduce both management overhead and the burden on developers. However, concentrating all services on a single server raises reliability concerns. High availability is especially critical for enterprise authentication services because, in many cases, when authentication stops, the entire enterprise comes to a halt.
We use a Lightweight Directory Access Protocol (LDAP) server to provide authentication services, to which various applications can subscribe. To make the LDAP servers highly available, we use the heartbeat software package from the Linux-HA (www.linux-ha.org) initiative. We also provide an example of setting up an Apache Web server to use LDAP authentication.
Some background knowledge about LDAP
We use the OpenLDAP software package (www.openldap.org), which is included in several Linux distributions. It ships with Red Hat 7.1, and the current download version is 2.0.11.
The LDAP standard is defined in RFCs 2251 and 2253. There are several commercial implementations of LDAP, including those from the University of Michigan and Netscape. The OpenLDAP Foundation (www.openldap.org) was created to "develop a robust, commercial-grade, fully featured, and open source LDAP suite of applications and development tools". OpenLDAP version 1.0 was released in August 1998. The current major version, 2.0, was released in 2000 and adds support for LDAPv3.
Like all good network services, LDAP is designed to run across multiple servers. This article uses two LDAP features: replication and referral.
The referral mechanism allows you to split the LDAP namespace across multiple servers and to arrange LDAP servers hierarchically. LDAP allows only one master server for a given directory namespace.
Replication is driven by the OpenLDAP replication daemon, slurpd. Slurpd wakes up periodically and checks a log file on the master server for updates. The updates are then pushed to the slave servers. Read requests can be answered by any server, but updates can be performed only on the master. An update request sent to a slave generates a referral message that gives the address of the master server. Following the referral and retrying the update are the client's responsibility. OpenLDAP has no built-in way to distribute queries across replicated servers, so you must use an IP sprayer/fanout program such as Balance.
To meet the reliability goal, we join a pair of servers in a cluster. We could use shared storage between the servers to maintain a single copy of the data, but for simplicity we chose a shared-nothing implementation. LDAP databases are usually small and are updated infrequently. (Tip: if your LDAP data set is large, consider dividing the namespace into smaller parts with referrals.) When restarting a failed node, note that any changes made in the meantime must be added to the database on the failed node before it is restarted. We demonstrate this later.
Cluster software and configuration
Before getting started, let's clear up a subtle source of confusion. Most high-availability (HA) clusters have a mechanism called a "heartbeat" that provides a keepalive function; the HA software uses the heartbeat to monitor the health of the nodes in the cluster. The Linux-HA (www.linux-ha.org) group provides open source cluster software, and their package is named Heartbeat (currently Heartbeat-0.4.9). This leads to some understandable confusion (and sometimes confuses me, too). In this article, we refer to the Linux-HA package as "Heartbeat" and to the general concept as "heartbeat".
The Linux-HA project began in 1998 as an outgrowth of the Linux-HA HOWTO written by Harald Milz. The project is currently led by Alan Robertson with many other volunteer developers. Version 0.4.9 was released in early 2001.
Heartbeat monitors node availability over communication media, usually a serial line or Ethernet. We use both a serial line and an Ethernet link. Each node runs a daemon process named "heartbeat". The master daemon forks child processes to read and write each heartbeat medium, along with a status process. When a node is detected as dead, Heartbeat runs shell scripts to start or stop services on the secondary node. By design, these scripts use the same syntax as the system init scripts (normally located in /etc/init.d). Default scripts are supplied for file systems, Web servers, and virtual IP failover.
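To make the init-script convention concrete, here is a minimal sketch of a resource script that accepts the same start, stop, and restart arguments. The service name "myservice" and the echo placeholders are hypothetical; a real script would launch and stop an actual daemon.

```shell
#!/bin/sh
# Minimal init-style resource script sketch (hypothetical service "myservice").
# Heartbeat calls such scripts with one argument: start, stop, or restart.

start_service() {
    echo "myservice: starting"    # placeholder for the real start command
}

stop_service() {
    echo "myservice: stopping"    # placeholder for the real stop command
}

# Dispatch on the action argument, as system init scripts do.
run_action() {
    case "$1" in
        start)   start_service ;;
        stop)    stop_service ;;
        restart) stop_service; start_service ;;
        *)       echo "Usage: $0 {start|stop|restart}" >&2; return 1 ;;
    esac
}

# Run only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    run_action "$@"
fi
```

Keeping the dispatch in a function makes the actions easy to exercise by hand before handing the script to the cluster software.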
Suppose we have two matched LDAP servers. Several configurations are possible. First, we could implement a "cold standby". The primary node holds the virtual IP address and runs the server; the secondary node sits idle. When the primary node fails, the server instance and IP address move to the "cold" node. This is easy to implement, but keeping data synchronized between the primary and secondary servers is a problem. To solve it, we can instead configure the cluster with a running server on both nodes. The primary node runs the master LDAP server, and the secondary node runs a slave instance. Updates to the master are propagated immediately to the slave by slurpd.
If the primary node fails, the secondary node can still answer queries, but updates are no longer possible. To allow updates, we restart the secondary server during failover and promote it to master.
This gives us full LDAP service but adds a complication: if updates are made on the secondary server, the primary server's database must be brought up to date before the primary can be restarted. Heartbeat supports a 'nice failback' option, which prevents a failed node from reacquiring resources after a failover, and this would be preferable. In this article, we demonstrate the restart manually. Our sample configuration uses the virtual IP facility provided by Heartbeat. If you need to support a heavy query load, the virtual IP could be replaced by an IP sprayer that distributes queries to both the master and slave servers. In that case, an update request sent to the slave generates a referral. Referrals are not followed automatically; that capability must be built into the client application. Apart from the replication directives, the master and slave nodes are configured identically. The master's configuration file specifies the location of the replication log file (line 16) and lists the slave servers that are the replication targets, along with credential information (lines 34-36).
34 replica host=slave5:389
35 binddn="cn=Manager,dc=lcc,dc=ibm,dc=com";
36 bindmethod=simple credentials=secret
The slave server's configuration file does not specify the master server; instead, it lists the credentials required for replication (line 33).
33 updatedn "cn=Manager,dc=lcc,dc=ibm,dc=com"
General Heartbeat preparation
Several good examples of basic Heartbeat configuration are available (see the end of this article). What follows is specific to our configuration, which is simple, so there is not much of it. By default, all configuration files are kept in /etc/ha.d/.
ha.cf contains the global definitions for the cluster. We use the default values for all timeouts.
# Timeout intervals
keepalive 2
# keepalive could be set to 1 second here
deadtime 10
initdead 120
# define our communications
# serial serialportname ...
serial /dev/ttyS0
baud 19200
# Ethernet information
udpport 694
udp eth1
# and finally, our node id's
# node nodename ... -- must match uname -n
node slave5
node slave6
haresources configures failover. The interesting part is at the bottom of the file.
slave6 192.168.10.51 slapd
Three things are specified here. The primary owner of the resource is the node 'slave6' (the name must match the output of 'uname -n' on the primary node). The virtual IP address of our service is '192.168.10.51' (this example was set up on a private lab network, hence the 192.168 address). The service script is named 'slapd'. Heartbeat searches for scripts in /etc/ha.d/resource.d and /etc/init.d.
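As a small illustration of how the three fields break down, the following sketch splits such an entry into owner, IP address, and service. The function name parse_haresources_line is hypothetical, and the sketch assumes the simple single-IP, single-service layout used here; it is not how Heartbeat itself parses the file.

```shell
#!/bin/sh
# Split a simple haresources entry into its three fields.
# Assumes the "owner ip service" layout used in this article.
parse_haresources_line() {
    # read splits the line on whitespace into the three variables.
    read -r owner ip service <<EOF
$1
EOF
    echo "owner=$owner ip=$ip service=$service"
}

parse_haresources_line "slave6 192.168.10.51 slapd"
# prints: owner=slave6 ip=192.168.10.51 service=slapd
```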
Service script
For the simple "cold standby" case, we could use the standard /etc/init.d/slapd script without modification. We want to do something special, so we created our own slapd script, stored in /etc/ha.d/resource.d/. Heartbeat places this directory first in its search path, so we need not worry about the /etc/init.d/slapd script being run instead. However, you should make sure that slapd is no longer started at boot (remove any S*slapd files from the /etc/rc.d tree). First, lines 17 and 18 specify the startup configuration files for the master and slave servers.
The script follows standard init.d syntax, so the startup logic lives in the test_start() function beginning on line 21. First, we stop any slapd instances that are currently running. In line 3, we start the master server using the master configuration file. Our design follows this rule: if both the primary and secondary nodes are up, slapd is started as the master on the primary node and as a slave on the secondary node, and the replication daemon is started. If only one node is up, slapd is started as the master. The virtual IP address is tied to the slapd master. To accomplish this, we must know which node is executing the script and, if it is the primary node, the status of the secondary node.
The important part is in the 'start' branch of the script. Because we designated the primary node in the Heartbeat configuration, we know that when test_start() runs, it is running on Heartbeat's primary node. (Because Heartbeat uses /etc/init.d-style scripts, every script is called with a start|stop|restart argument.) Heartbeat sets a number of environment variables when it calls a script. The one we are interested in is:
HA_CURHOST=slave6
The value of HA_CURHOST tells us when we are executing on the primary node (slave6) and when we are in failover (HA_CURHOST would then be 'slave5'). Now we need to know the status of the other node. To find out, we can "ask" Heartbeat. We start from the api_test.c file and build a simple client that asks for the node status. (api_test.c contains a lot that a client does not need; we simply remove the unnecessary parts and add an output statement.) Note that the program performs the query on line 31.
Heartbeat
After compilation, we install the program in /etc/ha.d/resource.d/ under the name 'other_state'. Below is a link to the complete failover script. We started with the sample script supplied with Heartbeat and added some modifications:
Start script
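The role decision described above can be sketched as a small shell function. This is an illustration under stated assumptions, not the failover script itself: the function name choose_role, the output labels, and the status string "active" are assumptions, and in the real script the second argument would come from a helper such as other_state.

```shell
#!/bin/sh
# Sketch of the role decision described above; not the article's actual script.
# PRIMARY_NODE matches the owner named in haresources. The status string
# "active" (anything else is treated as down) is an assumption.
PRIMARY_NODE="slave6"

# $1 = node running the script (Heartbeat exports this as HA_CURHOST)
# $2 = status of the other node, as reported by a helper such as other_state
choose_role() {
    cur_host="$1"
    other_status="$2"
    if [ "$other_status" != "active" ]; then
        # Only one node is up: it must accept updates, so run as master.
        echo "master"
    elif [ "$cur_host" = "$PRIMARY_NODE" ]; then
        # Both nodes are up and this is the primary: master plus slurpd.
        echo "master+slurpd"
    else
        # Both nodes are up and this is the secondary: run as slave.
        echo "slave"
    fi
}

choose_role slave6 active   # prints: master+slurpd
choose_role slave5 dead     # prints: master
```

Passing the host and status as arguments, rather than reading them inside the function, keeps the decision easy to check without a running cluster.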
Test
Now we can start Heartbeat on both servers. The Heartbeat documentation covers testing the basic setup, so we do not repeat it here. With both heartbeat media connected, you should see six heartbeat processes running. To verify failover, we ran several tests. To provide a client for the tests, we wrote a simple KDE application that queries the servers and displays the connection status. A real client would query only the virtual IP address, but for demonstration we query all three IP addresses. For this test, we send 10,000 queries per hour.
S6 is the master LDAP server, and S5 is the active backup server. The box below them represents the virtual IP address. Under normal conditions, both S5 and S6 are green, indicating successful queries.
First, we stop the heartbeat process on the primary node. The secondary node acquires the resources after the 10-second node timeout expires, as shown in the log excerpt. The takeover includes an additional 2-second delay in the startup script.
Sep 7 10:28:21 slave5 heartbeat: info: Running /etc/ha.d/rc.d/shutdone shutdone
Sep 7 10:28:32 slave5 heartbeat[3381]: WARN: node slave6: is dead
Sep 7 10:28:32 slave5 heartbeat[3381]: info: Link slave6:/dev/ttyS0 dead.
Sep 7 10:28:32 slave5 heartbeat[3381]: info: Link slave6:eth1 dead.
Sep 7 10:28:32 slave5 heartbeat: info: Running /etc/ha.d/rc.d/status status
Sep 7 10:28:32 slave5 heartbeat: info: Running /etc/ha.d/rc.d/ifstat ifstat
Sep 7 10:28:32 slave5 heartbeat: info: Running /etc/ha.d/rc.d/ifstat ifstat
Sep 7 10:28:32 slave5 heartbeat: info: Taking over resource group 192.168.10.51
Sep 7 10:28:32 slave5 heartbeat: info: Acquiring resource group: slave6 192.168.10.51 slapd
Sep 7 10:28:32 slave5 heartbeat: info: Running /etc/ha.d/resource.d/IPaddr 192.168.10.51 start
Sep 7 10:28:32 slave5 heartbeat: info: ifconfig eth0:0 192.168.10.51 netmask 255.255.255.0 \
    broadcast 192.168.10.255
Sep 7 10:28:32 slave5 heartbeat: info: Sending Gratuitous Arp for 192.168.10.51 on eth0:0 [eth0]
Sep 7 10:28:32 slave5 heartbeat: info: Running /etc/ha.d/resource.d/slapd start
Sep 7 10:28:32 slave5 heartbeat: info: /etc/ha.d/resource.d/slapd: Starting
The following is the query flow of the application:
The primary node is down, and the secondary node now holds the virtual IP address. S5 and the virtual IP are shown in green; server S6 is unavailable, and its indicator is red.
After restarting the cluster, we caused a failure by cutting power to the primary node. After the 10-second timeout, the secondary node again acquired the resources. Finally, we unplugged both the serial and Ethernet interfaces to simulate a complete loss of communication between the two nodes. Loss of inter-node communication causes both machines to try to act as the primary node, a condition known as "split brain". Heartbeat's default behavior in this case shows why Heartbeat needs multiple interconnects over different media. In a shared-storage setup, the storage interconnect can also serve as a heartbeat medium, which reduces the chance of split brain. The following excerpt from ha-log shows the shutdown process:
heartbeat: 2001/09/07_14:49:46 info: mach_down takeover complete.
heartbeat: 2001/09/07_14:50:36 ERROR: TTY write timeout on [/dev/ttyS0] (no connection?)
heartbeat: 2001/09/07_14:52:53 WARN: Cluster node slave6 returning after partition
heartbeat: 2001/09/07_14:52:53 info: Heartbeat shutdown in progress.
heartbeat: 2001/09/07_14:52:53 ERROR: 105 lost packet(s) for [slave6] [191:297]
heartbeat: 2001/09/07_14:52:53 ERROR: lost a lot of packets!
heartbeat: 2001/09/07_14:52:53 info: Link slave6:eth1 up.
heartbeat: 2001/09/07_14:52:53 WARN: Late heartbeat: Node slave6: interval 211920 ms
heartbeat: 2001/09/07_14:52:53 info: Node slave6: status active
heartbeat: 2001/09/07_14:52:53 info: Giving up all HA resources.
heartbeat: 2001/09/07_14:52:53 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2001/09/07_14:52:53 info: Running /etc/ha.d/rc.d/ifstat ifstat
heartbeat: 2001/09/07_14:52:53 info: Running /etc/ha.d/rc.d/shutdone shutdone
heartbeat: 2001/09/07_14:52:53 info: Releasing resource group: slave6 192.168.10.51 slapd
heartbeat: 2001/09/07_14:52:53 info: Running /etc/ha.d/resource.d/slapd stop
heartbeat: 2001/09/07_14:52:53 info: /etc/ha.d/resource.d/slapd: Shutting down
heartbeat: 2001/09/07_14:52:53 info: Running /etc/ha.d/resource.d/IPaddr 192.168.10.51 stop
heartbeat: 2001/09/07_14:52:53 info: IP Address 192.168.10.51 released
heartbeat: 2001/09/07_14:52:54 info: All HA resources relinquished.
heartbeat: 2001/09/07_14:52:54 info: Heartbeat shutdown in progress.
heartbeat: 2001/09/07_14:52:54 info: Giving up all HA resources.
heartbeat: 2001/09/07_14:52:54 info: All HA resources relinquished.
heartbeat: 2001/09/07_14:52:55 info: Heartbeat shutdown complete.
Keep this in mind when choosing timeout values. If the timeout is too short, a heavily loaded system can trigger a false takeover, resulting in an apparent split-brain shutdown. For more information, see the Linux-HA FAQ.
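To get a feel for the numbers involved, the sketch below works through the arithmetic for our settings: a 2-second keepalive, a 10-second deadtime, and the 2-second delay in the startup script. The function names are hypothetical, and the results are rough estimates that ignore script run time and network latency.

```shell
#!/bin/sh
# Values from our ha.cf and from the 2-second delay in the startup script.
KEEPALIVE=2      # seconds between heartbeats
DEADTIME=10      # seconds of silence before a node is declared dead
SCRIPT_DELAY=2   # extra delay in the slapd startup script

# Approximate number of consecutive heartbeats missed before takeover begins.
missed_beats() { echo $(( $1 / $2 )); }

# Rough estimate of failover time: detection time plus script delay.
failover_estimate() { echo $(( $1 + $2 )); }

echo "missed heartbeats before takeover: $(missed_beats "$DEADTIME" "$KEEPALIVE")"
echo "estimated failover time: $(failover_estimate "$DEADTIME" "$SCRIPT_DELAY") s"
```

With these values the node is declared dead after about five missed heartbeats. Shortening deadtime speeds up failover but leaves fewer missed heartbeats as a margin on a busy system.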
Recovery after failover
If the LDAP namespace was updated while the master LDAP server was down, you must resynchronize the LDAP databases before restarting the master server. There are two ways to do this. If the service can be interrupted, you can copy the database manually after stopping the LDAP server (by default, the data files are stored in /usr/local/var). Alternatively, you can use OpenLDAP replication to restore the database without interrupting service. First, start the LDAP server as a slave on the former primary node. Then start the slurpd daemon on the current master node. The changes received while the former primary was out of service are then "pushed" to it. Finally, stop the LDAP server on the former primary node and start Heartbeat. This causes a failback to the original configuration.
LDAP configuration for Apache
Here is an example of an application that subscribes to the LDAP server. The application is the Apache Web server, using the mod_auth_ldap package.
Conclusion
This article is a very simple example of using open source software to make a basic network service highly available. Network services, including LDAP, rarely require large servers. The additional reliability provided by clustering and by replicating servers and data files increases service availability. The system passed all tests and performed failover within 15 seconds in every case. With a good understanding of your system's load and utilization, you could reduce the failover time below this figure.