DHCP Failure Recovery (hot standby)

Last Update: 2018-12-06 Source: Internet

Author: User

Tags: failover

This version of the ISC DHCP server supports the DHCP failover protocol, as documented in draft-ietf-dhc-failover-07.txt. That draft is not a final protocol document, and this implementation has not been interoperability-tested against other vendors' implementations, so you must not assume that it conforms to the standard. If you want to use the failover protocol, make sure that both failover peers run the same version of the ISC DHCP server.
The failover protocol allows two DHCP servers (and no more than two) to share a common address pool. At any given time, each server holds about half of the pool's available addresses for allocation. If one server fails, the other continues to renew existing leases from the pool, and allocates new addresses only from the roughly half that it held when communication between the two servers was lost. During a prolonged failure, you can tell the remaining server that its partner is down; it will then, over time, reclaim all of the addresses the partner had available for allocation and begin to reuse them. This is called putting the server into the partner-down state.
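
To make the "about half each" idea concrete, here is a small Python sketch. The alternating split of a free range and the function names are hypothetical illustrations only; the real servers divide the pool between themselves through the failover protocol, and the waiting rules that govern reclaiming the partner's share are left out.

```python
from __future__ import annotations

from ipaddress import IPv4Address


def split_free_pool(first: str, last: str) -> tuple[list[str], list[str]]:
    """Hypothetically divide a free range between primary and secondary.

    Alternating addresses only illustrates "about half each"; the real
    servers assign ownership through the failover protocol.
    """
    start, end = int(IPv4Address(first)), int(IPv4Address(last))
    addrs = [str(IPv4Address(n)) for n in range(start, end + 1)]
    return addrs[0::2], addrs[1::2]


def allocatable(own: list[str], partner: list[str], partner_down: bool) -> list[str]:
    """Addresses a surviving server may hand out to *new* clients."""
    if partner_down:
        # In partner-down the survivor reclaims the partner's share
        # (over time; the waiting rules are omitted from this sketch).
        return own + partner
    # Merely out of contact: allocate only from its own share.
    return list(own)
```

For example, `split_free_pool("10.0.0.10", "10.0.0.13")` gives the primary `10.0.0.10` and `10.0.0.12` and the secondary the other two addresses.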
You can put a server into the partner-down state with the omshell(1) command. Alternatively, stop the server, edit the last failover state declaration in its lease file, and restart the server. If you use this method, set "my state" to partner-down and leave the date and time at which that state started blank (no "at" clause), as follows:

failover peer name state {
  my state partner-down;
  peer state state at date;
}
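
The lease-file edit described above can be scripted. The following Python sketch rewrites only the last "my state" line in the text of a lease file to "my state partner-down;" with no date. The sample lease text, the state names in it, and the timestamp format are illustrative assumptions, as is the single-peer restriction; run anything like this only while the server is stopped, and keep a copy of the original lease file.

```python
import re

# Illustrative lease-file text; real peer names, states and timestamps differ.
SAMPLE = """failover peer "foo" state {
  my state communications-interrupted at 4 2018/12/06 10:00:00;
  peer state normal at 4 2018/12/06 09:00:00;
}
"""

# Matches a "my state ...;" line, keeping its leading indentation.
MY_STATE = re.compile(r"^([ \t]*)my state [^;]*;", re.MULTILINE)


def force_partner_down(lease_text: str) -> str:
    """Set the last 'my state' line to partner-down with no date.

    Assumes a single failover peer. Only edit a lease file whose server
    is stopped, and make sure the partner cannot restart automatically.
    """
    matches = list(MY_STATE.finditer(lease_text))
    if not matches:
        raise ValueError("no failover state declaration found")
    last = matches[-1]
    new_line = f"{last.group(1)}my state partner-down;"
    return lease_text[: last.start()] + new_line + lease_text[last.end():]
```

Only the last declaration is touched because lease files are appended to over time, and the text above says to edit the last failover state declaration.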

When the failed server comes back online, it should automatically detect that it has been offline and request a complete update from the server that was running in the partner-down state; the two servers then resume processing together.
It is possible to get into a dangerous situation: suppose you put server A into the partner-down state, and then A itself goes down while server B comes back up. B does not know that A was in the partner-down state, so it may allocate addresses that A had already given to other clients. The same address can then be assigned to two different clients, causing IP address conflicts. Therefore, before you put a server into the partner-down state, make sure that the other server will not restart automatically.
The failover protocol defines a primary server role and a secondary server role. The two roles differ somewhat, but most of the differences simply give each peer a way to behave in the opposite way from the other. So one server must be configured as the primary and the other as the secondary; it does not matter much which machine is which.

Failover Startup
When a server starts and has not previously communicated with its failover peer, it must first establish communication with the peer and synchronize with it before it can serve clients. This happens either when you have just configured your DHCP servers to perform failover for the first time, or when one of the failover servers has failed catastrophically and lost its database. The initial recovery process is designed so that when one failover peer loses its database and then resynchronizes, any leases the failed server gave out before it failed are still honored.
When the failed server starts, it notices that it has no saved failover state and tries to contact its peer. Once contact is established, it asks the peer for a complete copy of the peer's lease database. The peer sends the complete database, followed by a message indicating that the transfer is done. The failed server then waits until the MCLT (maximum client lead time, set by the mclt statement described below) has passed, and once it has, both servers switch to normal operation. This wait ensures that any leases the failed server may have given out while out of contact with its partner have expired.
While the failed server is recovering, its partner remains in the partner-down state, which means it serves all clients; the failed server provides no service to DHCP clients until it has made the transition to normal operation.
If both servers detect that they have never before communicated with their partner, they both start in this recovery state and follow the procedure described above. In that case, no DHCP service is available to clients until the MCLT has expired.
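
The startup sequence above can be modeled as a tiny state machine. This Python sketch is a toy under stated assumptions: the phase names, method names, and integer clock are hypothetical, and the protocol's real state names and messages are not reproduced. It only shows the ordering: no service while recovering, a full database copy, then an MCLT-long wait before normal operation.

```python
from __future__ import annotations

from enum import Enum, auto


class Phase(Enum):
    # Simplified phases; not the protocol's exact state names.
    RECOVERING = auto()    # no saved failover state, receiving the database
    WAITING_MCLT = auto()  # database received, waiting for MCLT to pass
    NORMAL = auto()        # both peers serve clients again


class RecoveringServer:
    """Toy model of the failover startup sequence (names hypothetical)."""

    def __init__(self, mclt: int) -> None:
        self.mclt = mclt
        self.phase = Phase.RECOVERING
        self.done_at: int | None = None
        self.leases: dict[str, str] = {}

    def serves_clients(self) -> bool:
        # The recovering server gives no service until it reaches NORMAL.
        return self.phase is Phase.NORMAL

    def receive_lease(self, ip: str, client: str) -> None:
        # Every lease copied from the partner is honored as-is.
        self.leases[ip] = client

    def receive_done(self, now: int) -> None:
        # The partner has sent its complete database; start the MCLT wait.
        self.phase = Phase.WAITING_MCLT
        self.done_at = now

    def tick(self, now: int) -> None:
        if (self.phase is Phase.WAITING_MCLT and self.done_at is not None
                and now - self.done_at >= self.mclt):
            self.phase = Phase.NORMAL
```

With `mclt=3600` and the transfer finishing at time 100, the server stays silent at time 3000 and starts serving at time 3700.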

Configuring Failover
To configure failover, you need to write a failover peer declaration that configures the failover protocol, and add a reference to that peer in each pool declaration for which you want failover. You do not have to use failover for every pool on a given network segment. However, you must not tell one server that it is doing failover on a particular address pool and tell the other server that it is not: for any given pool, both servers must agree on whether failover is used. A pool declaration that uses failover looks like this:

pool {
  failover peer "foo";
  deny dynamic bootp clients;
  # pool-specific parameters
};

Dynamic BOOTP leases are not compatible with failover, so, as shown above, you must deny dynamic BOOTP clients in any pool that uses failover.
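
Because both servers must agree on which pools use failover, and failover pools must deny dynamic BOOTP clients, it can help to check the two configurations mechanically before deploying them. The `Pool` record below is a hypothetical, already-parsed view of each pool declaration; parsing dhcpd.conf itself is outside the scope of this sketch. A pool missing on one side is treated as "no failover" on that side.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """Hypothetical, already-parsed view of one pool declaration."""
    name: str                   # label used only in error messages
    failover_peer: str | None   # value of 'failover peer', or None
    denies_dynamic_bootp: bool  # 'deny dynamic bootp clients;' present


def check_pools(primary: list[Pool], secondary: list[Pool]) -> list[str]:
    errors: list[str] = []
    prim = {p.name: p for p in primary}
    sec = {p.name: p for p in secondary}
    for name in sorted(prim.keys() | sec.keys()):
        a, b = prim.get(name), sec.get(name)
        peer_a = a.failover_peer if a else None
        peer_b = b.failover_peer if b else None
        if peer_a != peer_b:
            errors.append(f"{name}: servers disagree about failover")
    for pool in primary + secondary:
        if pool.failover_peer and not pool.denies_dynamic_bootp:
            errors.append(f"{pool.name}: failover pool allows dynamic BOOTP")
    return errors
```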
The server currently does very little sanity checking of its configuration, so if you configure it incorrectly it will simply fail in odd ways. I therefore recommend that you either use failover or not use it, but do not mix pools with and without failover. Also, use the same master configuration file on both servers, and give each server a separate file that contains its peer declaration and includes the master file. This helps avoid configuration mismatches.
As the implementation matures, this should become less of a problem. A basic dhcpd.conf file for a primary server might look like this:

failover peer "foo" {
  primary;
  address anthrax.rc.vix.com;
  port 519;
  peer address trantor.rc.vix.com;
  peer port 520;
  max-response-delay 60;
  max-unacked-updates 10;
  mclt 3600;
  split 128;
  load balance max seconds 3;
}

include "/etc/dhcpd.master";
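
Following the advice to keep one shared master file and a small per-server peer file, the secondary's peer declaration can be derived from the primary's. This Python helper is hypothetical: it swaps the address and port pairs, drops mclt and split (which, as described below, belong only on the primary), and assumes the remaining timing settings should be identical on both peers.

```python
from __future__ import annotations


def secondary_peer_decl(name: str, primary: dict[str, str]) -> str:
    """Derive a matching secondary 'failover peer' block (hypothetical helper).

    Addresses and ports are swapped; mclt and split are left out because
    they may only be configured on the primary.
    """
    lines = [
        f'failover peer "{name}" {{',
        "  secondary;",
        f"  address {primary['peer address']};",
        f"  port {primary['peer port']};",
        f"  peer address {primary['address']};",
        f"  peer port {primary['port']};",
    ]
    # Assumption: these timing/flow settings are kept identical on both peers.
    for key in ("max-response-delay", "max-unacked-updates",
                "load balance max seconds"):
        if key in primary:
            lines.append(f"  {key} {primary[key]};")
    lines.append("}")
    return "\n".join(lines)
```

Fed the primary's values from the example above, this yields a block with `address trantor.rc.vix.com;`, `port 520;`, `peer address anthrax.rc.vix.com;` and `peer port 519;`; the secondary's file would then also include "/etc/dhcpd.master".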

The statements in the peer declaration have the following meanings:

The primary and secondary statements

[ primary | secondary ];

This determines whether the server is the primary or the secondary, as described in the DHCP failover section above.

The address statement

address address;

The address statement declares the IP address or DNS name on which this server should listen for connections from its failover peer. It is also used as this server's identifier in the DHCP failover protocol, and because it serves as an identifier, it may not be omitted. (In other words, this is the local machine's own address or name, while the peer address statement below gives the partner's; in the example above, the primary declares address anthrax.rc.vix.com and peer address trantor.rc.vix.com.)

The peer address statement

peer address address;

The peer address statement declares the IP address or DNS name to which this server should connect to reach its failover peer.

The port statement

port port-number;

The port statement declares the TCP port on which this server listens for connections from its failover peer. This statement cannot currently be omitted, because the failover protocol does not yet have a reserved TCP port number.

The peer port statement

peer port port-number;

The peer port statement declares the TCP port to which this server connects to reach its failover peer. It cannot be omitted either, because the failover protocol does not yet have a reserved TCP port number. The port number given here may be the same as the one given in the port statement.

The max-response-delay statement

max-response-delay seconds;

The max-response-delay statement tells the DHCP server how many seconds may pass without receiving a message from its failover peer before it assumes that the connection has failed. This value should be small enough that a transient network failure that breaks the connection does not leave the servers out of communication for long, but large enough that the server is not constantly making and breaking connections. This parameter must be specified. (The value in the example above is a reasonable starting point; the best value depends on your environment.)
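
The max-response-delay rule amounts to a silence timer. This Python sketch is illustrative only: the class and method names are hypothetical, time is passed in explicitly so the logic is easy to test, and whether the real server compares with "greater than" or "greater than or equal" is an assumption.

```python
class PeerLiveness:
    """Sketch of the max-response-delay check (times in seconds)."""

    def __init__(self, max_response_delay: float, connected_at: float) -> None:
        self.max_response_delay = max_response_delay
        self.last_heard = connected_at

    def message_received(self, now: float) -> None:
        # Any message from the peer counts as a sign of life.
        self.last_heard = now

    def connection_failed(self, now: float) -> bool:
        # Failed once more than max-response-delay seconds pass in silence.
        return now - self.last_heard > self.max_response_delay
```

With a delay of 60 and a last message at time 50, the connection is still considered up at time 100 and failed at time 111.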

The max-unacked-updates statement

max-unacked-updates count;

The max-unacked-updates statement tells the remote DHCP server how many BNDUPD (binding update) messages it may send before it receives a BNDACK (binding acknowledgement) from the local server. We do not yet have enough operational experience to say what a good value is, but 10 seems to work. This parameter must be specified.
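
In effect, max-unacked-updates is a flow-control window on binding updates. The Python sketch below shows that idea with hypothetical names; it simplifies by assuming BNDACKs arrive in the same order as the BNDUPDs they answer, which real protocol messages do not rely on.

```python
from __future__ import annotations

from collections import deque


class UpdateWindow:
    """Sketch of max-unacked-updates flow control (names hypothetical)."""

    def __init__(self, max_unacked: int) -> None:
        self.max_unacked = max_unacked
        self.in_flight: deque[str] = deque()  # BNDUPDs sent, not yet acked
        self.backlog: deque[str] = deque()    # updates waiting for room

    def queue_update(self, ip: str) -> list[str]:
        """Queue a binding update; return the updates sent right now."""
        self.backlog.append(ip)
        return self._pump()

    def ack(self) -> list[str]:
        """Handle one BNDACK (assumed in order); return newly sent updates."""
        if not self.in_flight:
            raise ValueError("BNDACK received with no update outstanding")
        self.in_flight.popleft()
        return self._pump()

    def _pump(self) -> list[str]:
        sent = []
        while self.backlog and len(self.in_flight) < self.max_unacked:
            ip = self.backlog.popleft()
            self.in_flight.append(ip)
            sent.append(ip)
        return sent
```

With a window of 2, a third update waits in the backlog until the first acknowledgement arrives.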

The mclt statement

mclt seconds;

The mclt statement defines the maximum client lead time (MCLT). It must be specified on the primary and must not be specified on the secondary. This is the length of time for which a lease may be renewed by either failover peer without contacting the other. The longer you set it, the longer it takes the running server to recover IP addresses after moving into the partner-down state; the shorter you set it, the more load your servers experience when they are not communicating. A value of around 3600 is probably reasonable, but bear in mind that we have no real operational experience with this.
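
One way to picture the MCLT is as a cap on how far a server may promise a lease beyond what its partner already knows about. The Python function below is a simplified, hypothetical sketch of that rule, not the protocol's full set of conditions: `acked_end` stands for the expiry time the partner has acknowledged for the binding, or `None` if it has acknowledged none.

```python
from __future__ import annotations


def max_lease_end(now: int, desired: int, mclt: int, acked_end: int | None) -> int:
    """Latest lease end time a server may grant (simplified MCLT rule).

    The server may promise the client at most MCLT seconds beyond the
    expiry its partner has acknowledged (or beyond now, if none).
    """
    base = now if acked_end is None else max(acked_end, now)
    return min(now + desired, base + mclt)
```

Under this sketch, with `mclt 3600` a brand-new client asking for a day gets only an hour at first, while a client whose one-day expiry the partner has already acknowledged can be extended to that expiry plus one hour.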

The split statement

split index;

The split statement specifies how clients are divided between the primary and the secondary for load balancing. Whenever a client makes a DHCP request, the request is hashed into one of 256 buckets. If the bucket number is less than the split value, the primary answers; if it is greater than or equal to the split value, the secondary answers. The only meaningful value is 128, and it can only be configured on the primary.
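
The bucket rule can be sketched in a few lines of Python. The hash here is a stand-in: it takes the first byte of a SHA-256 digest of the client identifier purely to get a reproducible value from 0 to 255, and is not the load-balancing hash dhcpd actually uses (which follows the RFC 3074 style of hashing).

```python
import hashlib


def client_bucket(client_id: bytes) -> int:
    # Stand-in hash for illustration only; dhcpd uses its own
    # load-balancing hash, not SHA-256.
    return hashlib.sha256(client_id).digest()[0]


def responder(bucket: int, split: int = 128) -> str:
    """Which peer answers a client in this bucket under 'split <split>'."""
    if not 0 <= bucket <= 255:
        raise ValueError("bucket must be in 0..255")
    return "primary" if bucket < split else "secondary"
```

With the default split of 128, buckets 0 through 127 go to the primary and 128 through 255 to the secondary.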

The hba statement

hba colon-separated-hex-list;

The hba statement specifies the split between the primary and the secondary as a bitmap rather than a cutoff, which in theory allows finer-grained control. In practice, such fine-grained control is probably not needed. An example hba statement:

hba ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
    00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00;

This is equivalent to "split 128;". Define only one of split or hba, never both. In most cases the finer control that hba offers is unnecessary, and split should be used.
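
The equivalence between "split 128" and the 32-byte bitmap can be checked with a short Python sketch. It assumes a set bit means "the primary answers this bucket" and that the lowest bucket in each byte is its least significant bit; that bit order is an assumption, though it makes no difference when split is a multiple of 8, as with 128. The function names are hypothetical.

```python
def split_to_hba(split: int) -> bytes:
    """Build the 256-bit (32-byte) bitmap equivalent to 'split <split>'.

    Bit b set means bucket b belongs to the primary. LSB-first order
    within each byte is an assumption of this sketch.
    """
    if not 0 <= split <= 256:
        raise ValueError("split must be between 0 and 256")
    bitmap = bytearray(32)
    for bucket in range(split):
        bitmap[bucket >> 3] |= 1 << (bucket & 7)
    return bytes(bitmap)


def hba_text(bitmap: bytes) -> str:
    """Format a bitmap as the colon-separated hex list hba expects."""
    return ":".join(f"{b:02x}" for b in bitmap)


def primary_answers(bitmap: bytes, bucket: int) -> bool:
    return bool(bitmap[bucket >> 3] & (1 << (bucket & 7)))
```

`hba_text(split_to_hba(128))` produces sixteen `ff` bytes followed by sixteen `00` bytes, matching the example hba statement.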

The load balance max seconds statement

load balance max seconds seconds;

This statement lets you configure a cutoff after which load balancing is disabled. The cutoff is based on the number of seconds since the client sent its first DHCPDISCOVER or DHCPREQUEST message, and it only works with clients that correctly fill in the secs field; fortunately, most clients do. We recommend setting it to something like 3 or 5. The effect is that if one failover peer gets into a state where it responds to failover messages but not to client requests, the other peer automatically takes over its client load as the clients retry.
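
The interaction between load balancing and this cutoff can be sketched as a single decision. In this Python sketch the function name and parameters are hypothetical, `is_my_bucket` stands for the result of the split or hba test, and treating "past the cutoff" as strictly greater than the configured value is an assumption.

```python
def should_answer(is_my_bucket: bool, secs: int, load_balance_max_seconds: int) -> bool:
    """Decide whether this server answers a client (simplified).

    secs is the DHCP header field in which the client reports how long it
    has been trying. Past the cutoff, load balancing is ignored so that a
    peer that is up but not answering clients gets covered.
    """
    if secs > load_balance_max_seconds:
        return True
    return is_my_bucket
```

With a cutoff of 3, a client hashed to the other peer is ignored on its first attempt but answered once it reports 4 seconds of retrying.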

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
