Java Interview 6-10


6. The database and cache inconsistency problem

This article mainly discusses a few questions:

(1) The origin of the "cache + database" requirement

(2) "Retire the cache" vs "update the cache"

(3) The ordering of cache and database operations

(4) Analysis and optimization of the cache + database architecture

First, the origin of the requirement

Scenario introduction

Caching is a common technique for improving a system's read performance, and we often use caching to optimize applications with many reads and few writes.

For example, for the user balance table account(uid, money), the business requirements are:

(1) Query the user's balance: SELECT money FROM account WHERE uid=xxx, which accounts for 99% of requests

(2) Change the user's balance: UPDATE account SET money=xxx WHERE uid=xxx, which accounts for 1% of requests


Since most requests are queries, we store uid->money key-value pairs in the cache, which greatly reduces the pressure on the database.

Read operation flow

With the data (uid->money) stored in two places, the database and the cache, the flow whenever the data (money) needs to be read is generally:

(1) Read the cache to see whether the data uid->money is there

(2) If money is in the cache, return it directly; this is called a cache "hit"

(3) If money is not in the cache, read it from the database (this is called a "miss"), put uid->money into the cache, and then return it

Cache hit rate = number of requests that hit the cache / total number of cache requests = hit / (hit + miss)

In the balance scenario above, with 99% reads and 1% writes, the cache hit rate is very high, above 95%.
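To make the read flow concrete, here is a minimal sketch of this cache-aside read path in Java. The in-memory map stands in for a real cache (such as Redis), and loadMoneyFromDb() is a hypothetical placeholder for the SELECT against the account table; neither name comes from the original text.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the cache-aside read flow described above.
public class BalanceReader {

    // stands in for a real cache such as Redis
    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    public long readMoney(long uid) {
        Long money = cache.get(uid);       // (1) read the cache first
        if (money != null) {
            return money;                  // (2) hit: return the cached value
        }
        long fresh = loadMoneyFromDb(uid); // (3) miss: fall back to the database
        cache.put(uid, fresh);             //     put uid->money into the cache
        return fresh;
    }

    // hypothetical placeholder for: SELECT money FROM account WHERE uid=xxx
    private long loadMoneyFromDb(long uid) {
        return 100L; // stubbed value for this sketch
    }
}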

So here's the problem.

When the data (money) changes:

(1) Should we update the data in the cache, or retire (evict) the data from the cache?

(2) Should we operate on the database first and then the cache, or the cache first and then the database?

(3) Is there room to optimize the architecture of the cache and database operations?

These are the three core questions this article focuses on.

Second, update the cache vs retire the cache

What "update the cache" means: data is written not only to the database but also to the cache

What "retire the cache" means: data is written only to the database; the cached entry is not updated but evicted

Advantage of updating the cache: no extra miss is introduced, so the hit rate stays high

Advantage of retiring the cache: simplicity ("hold on," you may say, "updating the cache looks just as simple; aren't you being a bit perfunctory?")

Whether to update or retire the cache depends primarily on how complex updating the cache is.

For example, if the scenario above simply sets the balance money to a value, then:

(1) The retire-cache operation is deleteCache(uid)

(2) The update-cache operation is setCache(uid, money)

Here the cost of updating the cache is minimal, and we should lean toward updating the cache to keep the hit rate high.

If instead the balance is computed from complex data, for example, besides the account table the business also has a product table and a discount table:

account(uid, money)

product(pid, type, price, pinfo)

discount(type, zhekou)

The business scenario: a user buys a product priced price that belongs to category type, and that category is on promotion with discount zhekou. After the purchase, computing the new balance is complex and requires:

(1) Fetch the product's category and price: SELECT type, price FROM product WHERE pid=xxx

(2) Fetch the discount for that category: SELECT zhekou FROM discount WHERE type=xxx

(3) Fetch the old balance from the cache: money = getCache(uid)

(4) Write the new balance back to the cache: setCache(uid, money - price*zhekou)

Here the cost of updating the cache is high, and we should lean toward retiring the cache.

Moreover, retiring the cache is simple, and its only side effect is one extra cache miss, so it is recommended as the general approach.
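To make the contrast concrete, here is a sketch of the purchase scenario. The DAO classes and their methods are hypothetical stand-ins for the two SELECTs in steps (1) and (2), and the in-memory map stands in for the cache; none of these names are prescribed by the original text.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PurchaseService {

    static class Product { long type; long price; }

    // stands in for: SELECT type, price FROM product WHERE pid=xxx
    static class ProductDao {
        Product selectByPid(long pid) {
            Product p = new Product();
            p.type = 1;
            p.price = 100;
            return p;
        }
    }

    // stands in for: SELECT zhekou FROM discount WHERE type=xxx
    static class DiscountDao {
        double selectZhekou(long type) { return 0.9; }
    }

    private final ProductDao productDao = new ProductDao();
    private final DiscountDao discountDao = new DiscountDao();
    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    // "Update the cache": the writer must re-run the whole balance computation.
    public void purchaseAndUpdateCache(long uid, long pid) {
        Product p = productDao.selectByPid(pid);           // (1) category and price
        double zhekou = discountDao.selectZhekou(p.type);  // (2) discount for the category
        long money = cache.getOrDefault(uid, 0L);          // (3) old balance
        cache.put(uid, (long) (money - p.price * zhekou)); // (4) new balance written back
    }

    // "Retire the cache": a single eviction; the next read recomputes lazily.
    public void purchaseAndRetireCache(long uid) {
        cache.remove(uid); // deleteCache(uid)
    }
}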

Third, write the database first vs retire the cache first

OK. Assuming we retire the cache as the general way of handling writes, a write operation faces two choices:

(1) Write the database first, then retire the cache

(2) Retire the cache first, then write the database

Which order should be used?

Do you remember the conclusion about whether to write the main table or the inverse table first, from the article "How to ensure data consistency in redundant tables"?

For two operations that cannot be guaranteed to run in one transaction, the question of "which task runs first, which runs second" always arises, and the direction for solving it is:

If an inconsistency can occur, the operation whose failure has the smaller business impact should run first.

Since writing the database and retiring the cache cannot be made atomic together, their ordering should follow the same principle.

Suppose we write the database first and then retire the cache: if the first step (writing the database) succeeds and the second step (retiring the cache) fails, the DB holds new data while the cache holds old data, and the data is inconsistent.

Suppose we retire the cache first and then write the database: if the first step (retiring the cache) succeeds and the second step (writing the database) fails, only one extra cache miss is incurred.

Conclusion: the ordering of database and cache operations is clear: retire the cache first, then write the database.
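A minimal sketch of the recommended ordering, with the same kind of in-memory stand-ins as before (the cache map and updateDb() are hypothetical placeholders):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BalanceWriter {

    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    public void writeMoney(long uid, long money) {
        cache.remove(uid);    // (1) retire the cache first: if the next step fails,
                              //     the only cost is one extra cache miss
        updateDb(uid, money); // (2) then write the database
    }

    // hypothetical placeholder for: UPDATE account SET money=xxx WHERE uid=xxx
    private void updateDb(long uid, long money) {
        // the database write would happen here
    }
}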

Fourth, cache architecture optimization

This caching architecture has one drawback: the business layer has to deal with both the cache and the DB. Is there room for further optimization? There are two common schemes: a mainstream one and a non-mainstream one (personal opinion, don't shoot).

The mainstream optimization is service encapsulation: add a service layer that provides a clean data access interface upstream and shields callers from the details of the underlying storage, so the business line does not need to care whether data comes from the cache or the DB.

The non-mainstream scheme is asynchronous cache updating: all writes from the business line go to the database, all reads go to the cache, and an asynchronous tool synchronizes data from the database to the cache. Specifically:

(1) A cache-init process writes the full set of data that needs to be cached into the cache

(2) Whenever the DB has a write operation, an asynchronous updater reads the binlog and updates the cache

With (1) and (2) combined, the cache holds all the data, so that:

(a) When the business line reads the cache, it is guaranteed to hit (within a short window there may be dirty data), and it never needs to touch the database

(b) The business line only writes the DB; the cache is updated asynchronously, so writers need not care about the cache

This greatly simplifies the business line's call logic. The disadvantage is that if the business logic behind the cached data is complex, the async-update logic may be complex as well.
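Here is a rough sketch of what the asynchronous updater might look like. Real binlog consumption requires a CDC tool (for example Alibaba's Canal); in this sketch a BlockingQueue of hypothetical BinlogEvent objects stands in for the binlog stream, and all names are illustrative assumptions rather than a prescribed implementation.

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncCacheUpdater {

    // hypothetical stand-in for one row change parsed out of the binlog
    static class BinlogEvent { long uid; long money; }

    // stands in for the binlog stream that a CDC tool would deliver
    private final BlockingQueue<BinlogEvent> binlog = new LinkedBlockingQueue<>();

    // stands in for the real cache
    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    // step (2): consume binlog events and refresh the cache asynchronously
    public void start() {
        Thread updater = new Thread(() -> {
            try {
                while (true) {
                    BinlogEvent e = binlog.take(); // block until a DB write is observed
                    cache.put(e.uid, e.money);     // update uid->money in the cache
                }
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        updater.setDaemon(true);
        updater.start();
    }
}

With cache.remove(e.uid) in place of cache.put(...), the same skeleton gives the read-binlog asynchronous elimination mentioned later.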

Fifth, matters not yet covered

This article only discusses a few details that cache architecture design needs to pay attention to. If the database uses a one-master-many-slaves, read-write-separated architecture, certain timing sequences can still make the database and the cache inconsistent. How to optimize away that inconsistency is discussed in the follow-up article below.

Sixth, conclusions

(1) Retiring the cache is the common way of handling the cache

(2) The ordering, retire the cache first and then write the database, is beyond doubt

(3) A service layer is a common way to shield the business side from the complexity of the underlying database and cache

The follow-up article mainly discusses a few questions:

(1) Why database master-slave delay causes the cached data to be inconsistent

(2) Optimization ideas and solutions

First, the origin of the requirement

The previous article, "Cache architecture design details two or three", included a small optimization point: when there is only a master database, the "serialization" idea can resolve inconsistency between the cache and the database. The point that sparked heated discussion was: "under master-slave synchronization with a read-write-separated database architecture, dirty data can still get into the cache, and the serialization scheme no longer applies." That is the topic discussed here.

Second, why the data becomes inconsistent

Why would a read return dirty data? There are several situations:

(1) With a single database, concurrent reads and writes at the service layer interleave the cache and database operations

Although there is only one DB, the following unusual sequence can still put dirty data into the cache:

1) Request A initiates a write; the first step retires the cache, and then the request gets stuck at the service layer for various reasons (e.g. heavy business-logic computation taking 1 second), as in step 1

2) Request B initiates a read; it reads the cache and gets a cache miss, as in step 2

3) Request B goes on to read the DB, reads out data that is about to become stale, and puts this dirty data into the cache, as in step 3

4) Request A, after being stuck for a long time, finally writes the database with the latest data, as in step 4

Although this situation is rare, it exists in theory: the later request B completes in the middle of the earlier request A.

(2) With master-slave synchronization and read-write separation, a read from a slave returns old data

When the database architecture uses master-slave replication with read-write separation, dirty data more commonly gets into the cache like this:

1) Request A initiates a write; the first step retires the cache, as in step 1

2) Request A writes the database (the master) with the latest data, as in step 2

3) Request B initiates a read; it reads the cache and gets a cache miss, as in step 3

4) Request B goes on to read the DB from a slave; master-slave synchronization has not yet completed, so it reads out dirty data and puts it into the cache, as in step 4

5) Master-slave synchronization finally completes, as in step 5

In this case the timing of requests A and B is entirely normal; the inconsistency is caused by the master-slave synchronization delay (assume a delay of 1 second), during which a read request hits the slave and fetches dirty data.

So how do we optimize?

Third, ideas for optimizing the inconsistency

Some readers suggest, "then operate the database first and retire the cache afterwards." That does not work, as explained above in the discussion of which to operate first, the cache or the database.

The root cause of the inconsistency:

(1) With a single database, during the service layer's 1s of logic computation, old data may be read into the cache

(2) With master-slave replication and read-write separation, during the 1s master-slave synchronization delay, old data may be read into the cache

Since the old data enters the cache within that 1s window, could the write request, after finishing, sleep for 1s and then retire the cache once more, so that any dirty data cached during that 1s is evicted again?

The answer is yes.

The write request is upgraded from 2 steps to 3 steps:

(1) Retire the cache first

(2) Then write the database (these two steps are the same as before)

(3) Sleep for 1 second, then retire the cache again

This way, any dirty data that entered the cache during that 1 second is evicted again, but there is a problem:

(1) Every write request is blocked for 1 second, which greatly reduces write throughput and increases processing time; the business cannot accept this (see the sketch below)
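A sketch of the naive 3-step write makes the problem visible: the sleep sits on the write path itself. The cache map and updateDb() are hypothetical placeholders, as before.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlockingDoubleEliminationWriter {

    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    public void writeMoney(long uid, long money) throws InterruptedException {
        cache.remove(uid);    // (1) retire the cache first
        updateDb(uid, money); // (2) then write the database
        Thread.sleep(1000);   // (3) wait out the 1s inconsistency window...
        cache.remove(uid);    //     ...and retire the cache again
        // problem: every writer is blocked for the full second
    }

    // hypothetical placeholder for the UPDATE statement
    private void updateDb(long uid, long money) {
    }
}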

Analyzing again: the second cache elimination is done "to ensure cache consistency", not because "the business requires it", so there is no need to wait synchronously; a timer can do it asynchronously, or a message bus can be used to do it asynchronously:

The write request is upgraded from 2 steps to 2.5 steps:

(1) Retire the cache first

(2) Then write the database (these two steps are the same as before)

(2.5) Instead of sleeping 1s, send a message to the message bus (ESB) and return immediately once the message is sent

This way the write request's processing time barely increases. Because the method eliminates the cache twice, it is called the "cache double elimination" method. Its cost is one extra cache miss, which is almost negligible.

Downstream, a consumer asynchronously eliminates the cache: after receiving the message, it expires the cache 1s later. Even if dirty data entered the cache during that 1s, it still gets a chance to be evicted.
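The article's 2.5-step version hands the second eviction to a message-bus consumer. The sketch below shows the equivalent timer variant mentioned in the summary, using a ScheduledExecutorService so the writer returns immediately; the names and the 1s delay are illustrative assumptions.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncDoubleEliminationWriter {

    private final Map<Long, Long> cache = new ConcurrentHashMap<>();

    // plays the role of the downstream consumer that evicts again after 1s
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void writeMoney(long uid, long money) {
        cache.remove(uid);    // (1) retire the cache first
        updateDb(uid, money); // (2) then write the database
        // (2.5) schedule the second eviction and return immediately;
        // dirty data cached within the window is evicted again
        scheduler.schedule(() -> { cache.remove(uid); }, 1, TimeUnit.SECONDS);
    }

    // hypothetical placeholder for the UPDATE statement
    private void updateDb(long uid, long money) {
    }
}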

The scheme above has a drawback: the business line's write path needs one extra step. Is there a solution with no intrusion into business-line code at all? Yes: a scheme also mentioned in "Chat redundant table data consistency", which analyzes the binlog to eliminate the cache asynchronously.

Business-line code does not need to change at all; just add a read-binlog asynchronous elimination module that reads the binlog data and evicts the cache asynchronously.

Question: why does the text keep saying 1s? Where does this 1s come from?

Answer: 1s is just an example. You need to set this value according to the business's data volume and concurrency, by observing the master-slave synchronization delay. For example, if the master-slave synchronization delay is 200ms, setting the asynchronous elimination delay to 258ms is fine.

Fourth, summary

When "abnormal timing" or "reading from a slave" puts dirty data into the cache, the "cache double elimination" method, an asynchronous second elimination, can resolve the inconsistency between the cache and the database. There are at least three ways to implement it:

(1) Timer-based asynchronous elimination (not detailed above; in essence, start a thread dedicated to performing the second, asynchronous cache elimination, as sketched earlier)

(2) Message-bus-based asynchronous elimination

(3) Read-binlog-based asynchronous elimination

7. Applications of LinkedHashMap

LinkedHashMap maintains a doubly linked list running through all of its entries. This linked list defines the iteration order, which can be either insertion order or access order; the default is insertion order, in which elements are traversed in the order they were inserted. Based on LinkedHashMap's access-order mode, a simple LRU (Least Recently Used) cache can be constructed. Some open-source cache products, such as Ehcache, build their LRU eviction strategy by extending LinkedHashMap.

import java.util.LinkedHashMap;

public class LRUCache<K, V> extends LinkedHashMap<K, V> {

    /** maximum capacity */
    private int maxCapacity;

    public LRUCache(int maxCapacity) {
        // initial capacity 16, load factor 0.75, accessOrder=true (access-order iteration)
        super(16, 0.75f, true);
        this.maxCapacity = maxCapacity;
    }

    public int getMaxCapacity() {
        return this.maxCapacity;
    }

    public void setMaxCapacity(int maxCapacity) {
        this.maxCapacity = maxCapacity;
    }

    /**
     * Returns true when the number of entries exceeds the specified maximum
     * capacity, so that the eldest (least recently accessed) entry is removed.
     */
    @Override
    protected boolean removeEldestEntry(java.util.Map.Entry<K, V> eldest) {
        return super.size() > maxCapacity;
    }
}

  

import java.util.Iterator;
import java.util.Map.Entry;

public class LRUCacheTest {

    public static void main(String[] args) {
        LRUCache<String, Object> cache = new LRUCache<String, Object>(10);

        for (int i = 1; i <= 15; i++) {
            cache.put(i + "", i);
        }

        // access the element with key "10"; with accessOrder=true this
        // moves it to the most-recently-used end of the iteration order
        cache.get("10");

        Iterator<Entry<String, Object>> iterator = cache.entrySet().iterator();
        while (iterator.hasNext()) {
            Entry<String, Object> entry = iterator.next();
            System.out.println("key=" + entry.getKey() + ",value=" + entry.getValue());
        }
    }
}

The output is as follows:

key=6,value=6
key=7,value=7
key=8,value=8
key=9,value=9
key=11,value=11
key=12,value=12
key=13,value=13
key=14,value=14
key=15,value=15
key=10,value=10

  

 

8. How to resolve conflicts generated by Git

 
