Distributed Theory (4): Leases, an efficient fault-tolerant mechanism for distributed file cache consistency (repost)


Cary G. Gray and David R. Cheriton, 1989

Translator: [Email protected] 2011-5-7

Source: http://duanple.blog.163.com/blog/static/70971767201141111440789/

[

Preface: A lease is, in essence, a contract: the server grants the client, for a limited period of time, control over modifications to a piece of data. If the server wants to modify that data, it must first consult the client that holds the lease on it before the modification can proceed. When a client reads data from the server, it usually obtains a lease along with it; as long as it receives no modification request from the server, the content of its cache is guaranteed to be up to date for the duration of the lease term. If the client receives a modification request during the lease term and agrees to it, it must clear its cache. After the lease expires, a client that still wants to read the data from its cache must first re-acquire the lease, which is called a "renewal" {! see reference [1], <<Introduction to the lease mechanism>>}. "Lease" is a very apt term. Although leases were originally used to solve distributed file cache consistency, over time the mechanism has gradually been applied to more scenarios; its use can be seen in GFS and Chubby, for example, which is also an important reason for translating this paper: for an in-depth understanding of GFS and Chubby, leases are a foundation. A broader introduction to the lease mechanism and its extensions can be found in reference [1], "Introduction to the lease mechanism". This article covers only the content of the original paper that proposed the lease mechanism.

]

Abstract

Caching introduces the overhead and complexity of maintaining consistency, which reduces some of the performance gains it provides. In a distributed system, a caching mechanism must also cope with the additional complications of communication and host failures.

Leases, a time-based mechanism, provide efficient, consistent access to cached data in distributed systems. With leases, non-Byzantine failures affect performance but not correctness, and their impact can be minimized by using short leases. Our analytic model and an evaluation based on file access in the V system show that short leases offer good performance. The benefits of leases become even more pronounced as systems grow in scale and processor performance increases.

1. Introduction

Caching introduces a consistency problem: the cached data must be kept consistent with the primary copy. Consistency here means that, apart from the performance improvement, the behavior with caching is equivalent to there being only a single copy of the data. For large caches, the overhead of maintaining consistency is a major factor in cache performance.

Cache consistency has been studied in depth for shared-memory multiprocessor architectures, but that work relies on the reliable, synchronous broadcast communication provided by a system bus. In a distributed system, partial failures can occur: a host may crash and messages may be lost. Existing consistency strategies for file caches fall into two categories: those that assume broadcast is reliable, and therefore cannot tolerate communication failures, and those that require a consistency check on every read, and therefore perform poorly.

In this paper we propose leases, a consistency protocol that handles host and communication failures using physical clocks. An analytic model and an evaluation of file access in the V system show that short leases provide near-optimal performance for large systems while tolerating faults. As the gap between processor speed and network latency widens, and as the overall failure rate grows with system size, the benefits of leases become more pronounced.

The next section describes the lease mechanism and how it is used to achieve cache consistency. Section 3 introduces a simple analytic model for choosing the lease term and applies it to the V system. Section 4 describes some options in lease management. Section 5 examines the fault tolerance of the mechanism. Section 6 compares leases with other work on distributed cache consistency and discusses related issues. The final section summarizes some possible applications of leases and directions for future work.

2. Leases and cache consistency

A lease is, in effect, a contract that grants its holder certain rights for a limited period of time. In the context of caching, a lease grants its holder control over writes to the covered data during the lease term: the server must obtain approval from the lease holder before the data may be written. When a lease holder approves a write, it also invalidates its own local copy of the data.

A cache using leases obtains a lease for a datum (in addition to the datum itself) before it returns the datum in response to a read or performs a write. When a datum is fetched from the server, which holds the primary copy, the server also returns a lease guaranteeing that the datum will not be modified by any client during the lease term unless the server first obtains approval from the lease holder. If the datum is read again within the lease term (and it is still in the cache), the cache can serve the read directly, without communicating with the server. After the lease expires, a read of the datum requires the cache to first extend its lease, updating the cached copy if the datum was modified after the lease expired. When a client writes a datum, the server must defer the write until every lease holder has either approved it or had its lease expire.
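To make this flow concrete, here is a minimal, single-process Python sketch of the protocol just described (the class names LeaseServer and ClientCache, the 10-second term, and the in-memory data structures are illustrative assumptions, not the paper's implementation): reads are served from the cache while the lease is valid, and a write waits for every other holder's approval or for its lease to run out.

```python
import time

LEASE_TERM = 10.0  # seconds; an illustrative term, not a value from the paper

class LeaseServer:
    """Holds the primary copy of each datum and tracks the leases it has granted."""
    def __init__(self):
        self.data = {}      # name -> value (primary copy)
        self.leases = {}    # name -> {client: expiry time on the server's clock}

    def read(self, name, client):
        """Return the datum together with a lease covering it for LEASE_TERM seconds."""
        expiry = time.time() + LEASE_TERM
        self.leases.setdefault(name, {})[client] = expiry
        return self.data.get(name), expiry

    def write(self, name, value, writer):
        """Apply a write only after every other lease holder approves or its lease expires."""
        now = time.time()
        for client, expiry in list(self.leases.get(name, {}).items()):
            if client is writer or expiry <= now:
                continue  # the writer approves implicitly; expired leases need no approval
            if not client.approve_write(name):
                # Unreachable or unwilling holder: wait out the remainder of its lease.
                time.sleep(max(0.0, expiry - time.time()))
        self.leases[name] = {}
        self.data[name] = value  # write-through to the primary copy

class ClientCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> (value, lease expiry)

    def read(self, name):
        entry = self.cache.get(name)
        if entry and entry[1] > time.time():          # lease still valid: no server traffic
            return entry[0]
        value, expiry = self.server.read(name, self)  # fetch or renew from the server
        # A real client would discount the lease transfer time and clock error here
        # (the t_C formula in Section 3.1); this single-process sketch skips that.
        self.cache[name] = (value, expiry)
        return value

    def write(self, name, value):
        self.server.write(name, value, self)          # write-through
        self.cache.pop(name, None)

    def approve_write(self, name):
        self.cache.pop(name, None)                    # approving a write invalidates the local copy
        return True
```

In this sketch a write by one client completes only after the other caches have dropped their copies or their leases have run out, which is the guarantee the rest of the paper builds on.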

For ease of exposition, we consider only write-through caches; the extension to non-write-through caches is straightforward. Write-through provides clean failure semantics: no client ever observes the loss of a write. Although some studies have suggested that the cost of write-through in file caches should be avoided where possible, this cost can be greatly reduced by special handling of the hot-spot files that account for most writes.

To illustrate how a file cache using leases operates, consider a diskless workstation used for document preparation. When the workstation first runs latex, it obtains a lease on the latex binary for some term, say 10 seconds. If the file is accessed again 5 seconds later, the cached copy can be used directly, with no communication with the file server. Once the 10-second lease term has expired, an access to the file requires the cache to contact the server to check whether the file has changed. When a new version of latex is installed, the write is deferred until all lease holders have approved it; if some hosts holding leases are unreachable, the write must wait until their leases expire.

In the example above, the objects of the reads and writes need not be restricted to file contents. To support repeated open operations, the cache must hold the binding from file name to file, together with permission information, and it needs a lease on that information in order to perform an open using it. Likewise, a change to this information, such as renaming a file, constitutes a write.

A short lease (that is, one with a short term) has several advantages. One is that it minimizes the delay caused by client and server failures (and by failures due to network partition). When the server cannot communicate with a client, it must defer writes to a file until the unreachable client's lease expires. When a server recovers from a crash, it must honor the leases it granted before the crash. The simplest approach is for the server to remember the maximum term of any lease it has granted; it then only needs to delay all writes until that term has elapsed, so the recovery time it adds is bounded by the maximum lease term. Alternatively, the server can keep more detailed lease information in persistent storage, but this incurs extra I/O overhead and is worthwhile only if the lease term is much longer than the recovery time.
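As a hedged illustration of this recovery rule (assuming the only value that survives the crash is the maximum lease term ever granted, perhaps kept on disk), a restarted server can simply refuse writes until that much time has passed:

```python
import time

class RecoveringLeaseServer:
    """Sketch: after a crash, honor old leases by waiting out the longest term ever granted."""
    def __init__(self, max_granted_term):
        # max_granted_term is assumed to be the one durable piece of lease state.
        self.writes_allowed_after = time.time() + max_granted_term

    def write_allowed(self):
        # Any lease granted before the crash has certainly expired once this point is reached.
        return time.time() >= self.writes_allowed_after
```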

Short leases also minimize "false sharing". False sharing occurs when a client that holds a lease on a file no longer reads it, yet any write to the file must still obtain its approval before the lease expires; in particular, one client may be writing a file whose lease is held by another client that is not accessing the file at all. False sharing imposes a callback cost on the lease holder (which in turn increases the writing client's latency and the load on both the lease holder and the server) in a situation where, without leases, there would be no conflict at all. In the extreme case where a client never accesses the file again before another client modifies it, the best lease term would be 0. Short leases, however, mean higher renewal costs, so long leases are more effective for data that is read repeatedly and rarely modified. The choice of lease term therefore trades off failure delay, false-sharing cost, and renewal cost, and the server can set terms flexibly based on the access characteristics of the data and of the clients. {! In fact, a lease term of zero is equivalent to polling: every read contacts the server, and modifications can proceed at any time. A lease term of infinity is equivalent to callbacks: every write must notify the clients to invalidate their caches. }

Finally, short leases reduce the storage requirements on the server, since the records for expired leases can be reclaimed. The storage the server needs to track outstanding leases is not large in any case: it records an identifier for each lease holder and, for each holder, a list of the leases it holds, with each lease requiring only about a pair of pointers. If each client holds roughly 100 leases, that is about 1 KB per client. Even if storage were a problem, it could be reduced by coarsening the granularity at which leases are recorded, so that each client holds fewer leases, at the cost of increased contention. Later, we also describe how this overhead can be reduced for the most widely shared files.

Long leases are significantly more efficient, for both the client and the server, for files that are accessed repeatedly and have little write sharing (write-sharing). This can be seen in the Andrew file system project: the original prototype in effect used leases with a term of 0, while the revised version in effect uses leases with an infinite term. In the next section we describe a performance model for leases and use parameters measured from the V distributed system to determine an appropriate lease term.

3. Choice of lease term

Choosing the lease term requires a tradeoff between minimizing renewal overhead and minimizing false sharing, with the ultimate goal of minimizing server load and client response delay. Storage is a secondary consideration and is ignored here. We also assume that the failure rate is low enough that failures have no noticeable effect on average response time, especially with short leases. Finally, we consider only on-demand renewal, rather than the periodic renewal and other options discussed in Section 4.

3.1 A simple analytical model

Consider a system consisting of a single server, with the performance parameters shown in Table 1:

That is, the server holds a single file, N clients access the file, and each client's reads and writes follow Poisson processes with rates r and w respectively. The file is shared among S caches. Each client holds at most one lease on the file.

We assume that each message takes m_proc of processing time at the sender and at the receiver, and that the propagation delay between any two hosts is m_prop. Sending a message and having it received therefore takes m_prop + 2*m_proc, so a unicast request and its response take 2*m_prop + 4*m_proc. A multicast message is sent only once and delivered to all receivers by the multicast mechanism, so sending a multicast request and receiving N responses takes 2*m_prop + (N+3)*m_proc. {! The multicast cost can be derived by viewing the exchange as 1->N followed by N->1. The sender sends one message to the N receivers, costing m_prop + 2*m_proc; each receiver then returns a response, which takes m_proc of processing and m_prop of propagation, and the sender must process the N responses, costing N*m_proc. The total is m_prop + 2*m_proc + m_proc + m_prop + N*m_proc = 2*m_prop + (N+3)*m_proc. }
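These costs are easy to restate as code; the helpers below (hypothetical names chosen for this note) encode the unicast and multicast message costs and cross-check them against the step-by-step breakdown above.

```python
def unicast_exchange(m_prop, m_proc):
    """One request and its response between two hosts: 2*m_prop + 4*m_proc."""
    return 2 * m_prop + 4 * m_proc

def multicast_exchange(n, m_prop, m_proc):
    """One multicast request and N responses: 2*m_prop + (N + 3)*m_proc."""
    return 2 * m_prop + (n + 3) * m_proc

# Cross-check against the stepwise breakdown: send (m_prop + 2*m_proc), one reply adds
# m_proc + m_prop, and the sender then processes all N responses (n*m_proc).
n, m_prop, m_proc = 5, 1.0, 0.5
stepwise = (m_prop + 2 * m_proc) + (m_proc + m_prop) + n * m_proc
assert stepwise == multicast_exchange(n, m_prop, m_proc)
```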

For a lease with term t_S, the effective validity period t_C at the cache is actually:

t_C = max(0, t_S - (m_prop + 2*m_proc) - e)

That is, the cache must subtract from t_S the lease transfer time m_prop + 2*m_proc and the clock error e. {! For example, suppose the server issues a lease with a 10-second term, the lease takes 2 seconds to reach the cache, and the cache's clock is 1 second slower than the server's. The lease is then really valid at the cache for only 7 seconds. If the cache also treated the lease as lasting 10 seconds, there would be a window in which the server already considers the lease expired while the cache still considers it valid: a write arriving in that window would not need the cache's approval, yet a read arriving at the cache would still be served from the cache, returning stale data, which must not happen. } Therefore, if the system has large transmission delays or clock skew, the server must grant a lease term t_S large enough that the client's effective validity period is still greater than 0.
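The example in the note above is easy to check; the helper below simply evaluates the formula (the specific numbers are only those of the example).

```python
def effective_term(t_s, m_prop, m_proc, e):
    """t_C = max(0, t_S - (m_prop + 2*m_proc) - e)."""
    return max(0.0, t_s - (m_prop + 2 * m_proc) - e)

# Example from the note: a 10 s lease that takes 2 s to reach the cache
# (m_prop + 2*m_proc = 2 s) with a 1 s clock error is really valid for only 7 s.
print(effective_term(10.0, 1.0, 0.5, 1.0))  # -> 7.0
```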

While a lease is valid for t_C, an expected r*t_C reads are served from the cache; together with the read that triggered the lease request, the cost of one lease request is amortized over 1 + r*t_C reads. The rate of renewal-related messages handled by the server is then:

2*N*r/(1 + r*t_C)

{! There are N clients in total, each with read rate r; counting both request and response, the server would handle messages at a rate of 2*N*r if it served every read. With the cache and leases, only one renewal exchange is needed per 1 + r*t_C reads, so the load becomes 2*N*r/(1 + r*t_C). }

and the average delay added to each read is:

2*(m_prop + 2*m_proc)/(1 + r*t_C)

When a write request arrives, the server sends a multicast request to all lease holders and then processes their responses. If the writer is itself one of the lease holders and the write request implicitly carries the approval of its own cache, one approval message can be saved {! that is, the server does not need to obtain the writer's approval, since the writer is the one requesting the write }. Only one multicast request plus S-1 approval messages are needed, S messages in total. The time to obtain approval is:

t_A = 2*m_prop + (S+2)*m_proc

{! Compare with the multicast cost above: this is just the case N = S-1. } The delay added to each write is therefore t_A, and the corresponding server load is N*S*w messages per unit time.

For file caching, we normally assume that lease terms are measured in seconds while message times (including t_A) are measured in milliseconds, so we do not consider the case where t_A is close to t_S. One special case is t_S = 0: it is worth noting that a zero-term lease is preferable to a very short one, since a nonzero t_S with a zero t_C penalizes writes without benefiting reads. {! That is, when t_S is so small that t_C = 0, the lease is pointless: the cached copy is always expired, so reads gain nothing, yet writes must still obtain the lease holders' approval. It is better to have no lease at all. }

When the lease term is 0, or when there is no sharing, load and delay depend entirely on the renewal cost. For S > 1 and t_S > 0, the server sends and receives, per unit time,

2*N*r/(1 + r*t_C) + N*S*w

consistency-related messages {! the renewal traffic plus the write-approval traffic derived above }, and the average delay added to reads and writes follows from the per-read renewal delay 2*(m_prop + 2*m_proc)/(1 + r*t_C) and the per-write approval delay t_A given above.

For a zero-term lease, the load is 2*N*r {! every read contacts the server, while writes need no lease approval }. A lease term longer than t_A produces a smaller load if the following condition holds:

2*N*r > 2*N*r/(1 + r*t_C) + N*S*w

Define the lease benefit factor as:

alpha = 2*r/(S*w)

If alpha > 1 and t_C > 1/(r*(alpha - 1)),

then the condition above holds.

{! The condition can be derived as follows:

2*N*r > 2*N*r/(1 + r*t_C) + N*S*w
2*r > 2*r/(1 + r*t_C) + S*w
2*r - S*w > 2*r/(1 + r*t_C)
1 + r*t_C > 2*r/(2*r - S*w)
r*t_C > 2*r/(2*r - S*w) - 1
r*t_C > S*w/(2*r - S*w)
t_C > S*w/(2*r*r - r*S*w)
t_C > 1/(2*r*r/(S*w) - r)
t_C > 1/(r*alpha - r)
t_C > 1/(r*(alpha - 1))

}
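A quick numeric check of this condition, using made-up parameter values rather than the paper's Table 2, shows the break-even behavior: at t_C = 1/(r*(alpha - 1)) the leased load equals the zero-lease load, and beyond it the leased load is lower.

```python
def zero_lease_load(n, r):
    return 2 * n * r                                # every read contacts the server

def leased_load(n, r, w, s, t_c):
    return 2 * n * r / (1 + r * t_c) + n * s * w    # renewal traffic plus write approvals

# Illustrative parameters only: N=100 clients, r=1 read/s, w=0.05 writes/s, S=4 sharers.
n, r, w, s = 100, 1.0, 0.05, 4
alpha = 2 * r / (s * w)                             # = 10 > 1, so a long enough lease helps
t_c_break_even = 1 / (r * (alpha - 1))              # about 0.11 s
for t_c in (0.0, t_c_break_even, 1.0, 10.0):
    print(t_c, leased_load(n, r, w, s, t_c), zero_lease_load(n, r))
```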

Thus, provided alpha > 1, the server load can always be reduced simply by using a sufficiently long lease. The larger alpha or r is, the better short leases perform. Intuitively, alpha is the ratio of the read/write ratio to the extra cost introduced by sharing {! that is, alpha = (2*r/w) : S }.

When the analysis is extended to multiple files, the load is essentially that of the single-lease case scaled up. A cache can batch its renewal requests, so a single request may cover several leases. The rates r and w then correspond to the aggregate over all the files covered, so their values are larger; the higher aggregate read rate raises alpha, so the effect is even better. Typically, a cache should renew all the leases it holds at once.

The server load due to consistency is basically determined by the number of messages it processes. If we know what fraction of the server's load goes to maintaining consistency with zero-term leases, we can compute the overall load from the consistency load. Likewise, the actual response time adds the other processing time to the time spent ensuring consistency.

Given appropriate parameters, the formulas above can be used to predict the performance of a system. That is the subject of the next section.

3.2 Performance prediction for the V system

We use the formulas above to predict the performance of leases for the V file caching mechanism; the system's performance parameters were obtained by measurement and are shown in Table 2.

Figure 1 shows, using Equation 1 of Section 3.1, the server load due to consistency (relative to the zero-lease case) as a function of the lease term.

The curve labeled "trace" is the result of driving a simulation of the caches and the server with an access trace; it is very close to the S=1 curve of our analytic model, which confirms the model for this case. The trace curve does drop more sharply at very short terms, which is to be expected, since real file accesses certainly do not follow the Poisson distribution we assumed. This discrepancy suggests that short leases perform even better than the model predicts.

Figure 1 shows that most of the benefit of a nonzero lease is obtained within the first several seconds of lease term. For example, with S=1, at a 10-second term the consistency overhead has already dropped to 10% of that of a zero-term lease. One must also consider how much of the total server load is due to consistency. With zero-term leases, consistency accounts for 30% of the total server cost, so the real saving is 27% of the total cost {! that is, 90% of 30%: the consistency overhead falls to 10% of its original value, but consistency is only 30% of the total cost, so the total cost falls by 27% }, only about 4.5 percentage points less than with an infinite lease term. With S=10, a 10-second lease reduces the total server load by about 20% relative to zero-term leases, within a few percentage points of the infinite-term case. As the lease term grows further, the additional load reduction is relatively small, while the disadvantages of long leases appear. A short lease (for example, 10 seconds) therefore looks like the better tradeoff: it retains the advantages of short leases described earlier while still capturing most of the reduction in server load.

Figure 2 shows the average delay added to each read and write by consistency, as a function of lease term. Because writes are only a small fraction of all operations, the delay that writes incur due to sharing has little effect on the overall average; from S=1 to S=40 the curves barely change. Again, most of the benefit of leases is obtained by about the 10-second mark. Since many programs spend some compute time between file accesses, longer lease terms are not needed to improve response time.

We expect these conclusions to carry over to UNIX-like systems, although our measured access rates differ from published UNIX measurements. For example, our read-to-write ratio is almost an order of magnitude larger than theirs. Several factors contribute to this difference. First, we see no operations on intermediate files, because the V file caching system handles them in a manner similar to using a local disk to cache files. Second, unlike many measurements, ours include program loading and accesses to file information (such as directory lookups), which count as reads. Finally, reads and writes are counted as the number of opens, and of closes following a write, rather than the number of data blocks read and written, and directory operations account for a large share of the read and write counts.

Taking these factors into account, the results are roughly what one would expect for a UNIX system. Under UNIX semantics, where reads and writes correspond to block-level operations, the absolute read rate would be higher but the read-to-write ratio lower. The lease mechanism would perform similarly in such a system: a higher read rate makes the curve drop even more sharply, favoring short leases, while a higher write rate makes the system more sensitive to sharing.

3.3 Application in future distributed systems

Future distributed systems show several trends: systems span wider networks, so communication latency increases; processor speeds keep increasing; and a single system contains ever more hosts, both clients and servers.

Greater transmission delay between client and server lengthens both lease renewal and the wait for leases to expire. Figure 3 shows the added delay when the network round-trip time grows to 100 ms, with all other parameters unchanged from their earlier values. In this case, a 10-second lease reduces the added delay to 10.1%, while a 30-second lease reduces it to 3.1%. Thus, when transmission delays grow significantly, somewhat longer leases become appropriate, but terms of 10 to 30 seconds are basically sufficient.

Faster client processors shorten the compute time between read and write requests, which increases the number of reads and writes that occur within a lease term. The higher these rates, the more sharply the load curve falls.

Increasing the number of clients per server has no significant effect unless it increases the level of write sharing (write-sharing), which we do not expect. In fact, measurements of Multics over the past 12 years show no significant increase in the level of write sharing.

4. Lease management options

Lease management allows several options that may improve performance. The server controls the terms of the leases it grants, and it is also free to wait for a lease to expire rather than request approval for a write. A client is free to decide when to request a renewal, when to relinquish a lease, and whether to approve a write. These options combine to give different tradeoffs between load and response time.

For example, a client can anticipate that a lease is about to expire and renew it before the corresponding file is accessed. This improves response time by removing the renewal delay from reads, but it increases server load. In particular, an idle client may keep renewing leases even though its files are not being accessed, and because the cache continues to hold the leases, this can increase the contention caused by false sharing.

The server can use these options to optimize the handling of installed files, which typically have a large amount of shared access. Installed files are things like command binaries, header files, and standard libraries, which are usually a standard part of the system: widely shared, frequently read, and rarely written. Measurements of the V system show that they account for almost half of all reads but essentially no writes. Such files can be handled by covering them with a very small number of leases, for instance a single lease on a directory, and by having the server periodically multicast renewals of these leases to all clients, eliminating the clients' renewal requests. If a file covered by such a lease is to be modified, the server simply drops the lease from the periodic multicast; once the lease has expired, the write can proceed. In this way, even when an installed file is updated, the server need not communicate with a large number of clients and does not trigger a flood of responses. This option also avoids unnecessary delay when the server would otherwise have a high probability of having to wait out the lease of an unreachable client. It further reduces the information the server must keep about the holders of leases on installed files. Finally, it reduces the delay of client cache reads of these files, since their leases never expire as long as the files are unchanged.
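A minimal sketch of this option follows (the class and method names, and the dictionary message format, are assumptions made for illustration): the server re-multicasts renewals for a set of installed-file leases on a period shorter than the lease term, and before modifying such a file it stops renewing the lease and waits it out.

```python
import time

class InstalledFileLeases:
    """Sketch: server-driven periodic multicast renewal for widely shared, rarely written files."""
    def __init__(self, lease_term, multicast):
        self.lease_term = lease_term
        self.multicast = multicast   # callable that delivers one message to all clients
        self.active = set()          # leases currently being renewed by multicast
        self.last_renewal = {}       # lease name -> time of the most recent renewal

    def renew_all(self):
        """Called periodically (with a period shorter than lease_term)."""
        if self.active:
            self.multicast({"renew": sorted(self.active), "term": self.lease_term})
            now = time.time()
            for name in self.active:
                self.last_renewal[name] = now

    def prepare_write(self, name):
        """Stop renewing the lease; return the earliest time the write can safely proceed."""
        self.active.discard(name)
        return self.last_renewal.get(name, 0.0) + self.lease_term
```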

Finally, the server can also set the lease term according to a file's access characteristics, in addition to the transmission delay to the client. In particular, a file with heavy write sharing is best given a lease term of 0. For a distant client, the lease term should be extended to compensate for the reduction in effective validity caused by the extra transmission delay and the client's renewal cost. More generally, if all the necessary performance parameters can be monitored, the server can dynamically choose an appropriate lease term for each file and each client cache based on the analytic model.

5. Fault tolerance

Provided that hosts and the network suffer neither Byzantine failures nor clock failures, the lease mechanism guarantees consistency even when messages are lost (including network partition) or when a client or the server crashes (assuming writes are performed so that the server is still in a consistent state after a crash). Availability, moreover, is not significantly degraded by caching, since an unreachable client can only briefly delay other clients' writes.

Leases depend on well-behaved clocks. In particular, a server clock that runs too fast can cause an error, because the server may allow a write before a client-held lease has actually expired; similarly, a client clock that runs too slowly may cause the client to go on using a lease the server considers expired. The opposite errors, a slow server clock or a fast client clock, do not cause inconsistency; they only add overhead, because the client may conclude too early that a lease has expired. Compared with process crashes and communication failures, such clock errors are relatively rare, and they can be detected quickly by a clock synchronization protocol or by including explicit timestamps in lease-related messages.

We believe that in a distributed system the clock error of a node can usually be bounded by a value e that is small relative to lease terms measured in seconds. Clocks synchronized to this degree are also required by other aspects of file access, such as the UNIX make tool. At a minimum, for the lease mechanism to be correct, the clock error must have a known upper bound, and the lease term can then be adjusted by that bound.
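One hedged way to build in such a bound is sketched below (epsilon is an assumed known bound on clock error, not a value measured in the paper): the server keeps treating a lease as live for epsilon past its nominal end, while the client discounts the same epsilon as in the t_C formula of Section 3.1, so the dangerous combinations (fast server clock, slow client clock) cannot lead to a stale read.

```python
import time

CLOCK_ERROR_BOUND = 0.5  # seconds; an assumed upper bound e on clock error

def server_considers_expired(grant_time, term, now=None):
    """Server side: pessimistically honor the lease a little past its nominal end."""
    now = time.time() if now is None else now
    return now >= grant_time + term + CLOCK_ERROR_BOUND

def client_considers_valid(receive_time, term, transfer_delay, now=None):
    """Client side: pessimistic in the other direction, as in t_C = t_S - transfer - e."""
    now = time.time() if now is None else now
    return now < receive_time + term - transfer_delay - CLOCK_ERROR_BOUND
```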

6. Related Research

Omitted in this translation.

7. Conclusion

Leases are an efficient, fault-tolerant method for maintaining cache consistency in a distributed system. In this paper we have analyzed their performance, evaluated them in a real system environment, examined their fault tolerance, and considered their application to other distributed systems, especially future large-scale, high-performance systems.

The analytic model estimates the server's consistency load and the cache request delay due to consistency in terms of the lease term, the read/write rates, the degree of sharing, and the message transmission times. The model provides a basis for choosing lease terms dynamically according to file access characteristics.

Short leases have several advantages over long ones: they reduce the write delay caused by a crashed client, shorten recovery time after a server crash, and reduce false sharing. The lease mechanism also scales well to large distributed systems. Its improvement in response time is greatest with fast processors and high-latency networks, where the round trip to the client becomes a major cost and influences the choice of lease term. The overhead of managing leases for very large numbers of clients can be reduced by classifying files by their access characteristics. For example, installed files, which are widely shared and heavily read but rarely written, can have their leases renewed by periodic multicast from the server, with updates deferred until lease expiry to avoid the cost of explicit lease invalidation.

Leases provide strict consistency under non-Byzantine failures. The failure of a component degrades only performance, and this effect can be minimized by using short leases. The other key assumption is that clocks are reasonably accurate, or at least that their drift stays within a known bound even without a time synchronization mechanism. We consider synchronized physical clocks essential to systems that use this lease mechanism.

The current work still has some limitations. First, we used a simplified model of file sharing, focused on low degrees of sharing; while sharing is relatively low in most systems, there may be special cases. Second, our analytic model is only approximate and ignores important factors such as queueing delays. Finally, practical experience with leases in real systems is still limited. We are currently using leases to extend the file caching service of the V system, and more data will become available from that service. We also plan to explore adaptive mechanisms that vary the lease term and lease coverage per file, replacing the current static scheme.

Leases have applications beyond file consistency. In particular, they could be applied to large-scale shared-memory multiprocessors, although the benefit depends on memory and cache overheads as well as fault-tolerance requirements.

As a mechanism for coordination and decision making over communication, leases depend on timing, on measured bounds for message transmission, and on the ability to reach a conclusion after communication is cut off. We are applying the mechanism to other areas, such as distributed transaction management protocols and transport protocols. More broadly, we expect leases to serve as a building block in many more parts of distributed systems.

Translator's references:

1. Introduction to the lease mechanism: http://blog.csdn.net/kevinfankai/archive/2009/03/25/4024937.aspx

References:

Omitted in this translation.

