A thread blocking problem caused by using a local cache

Last Update:2017-02-07 Source: Internet

Author: User

Symptom

A colleague's Java system started blocking requests (returning 504) after running for a period of time. Judging from the only dump file available, most threads were blocked inside a local cache (jodd cache), in ReentrantReadWriteLock$ReadLock.lock.

Troubleshooting Phase 1

My first instinct was that this can happen only while the write lock is held. So I searched the dump for the keyword "WriteLock.lock" to find the write lock, but found nothing. That is actually expected: the thread that holds the write lock has already acquired it, so its stack cannot be sitting in WriteLock.lock.
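
To make this concrete, here is a small sketch of how the JVM itself reports who owns the lock a reader is parked on. The class and method names (LockOwnerLookup, ownerSeenByBlockedReader) are made up for illustration. It relies on ThreadInfo.getLockOwnerName() from java.lang.management, and I am assuming a HotSpot-style JVM that tracks the exclusive owner of a ReentrantReadWriteLock; other JVMs may report no owner.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOwnerLookup {
    /**
     * Holds the write lock on the calling thread, lets a reader park on the
     * read lock, and returns the name of the thread the JVM reports as the
     * owner of the lock the reader is waiting for (null if none is reported).
     */
    static String ownerSeenByBlockedReader() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.writeLock().lock();
        Thread reader = new Thread(() -> {
            lock.readLock().lock();   // parks here while the write lock is held
            lock.readLock().unlock();
        }, "blocked-reader");
        reader.setDaemon(true);
        reader.start();
        try {
            // Wait (up to about 5 s) until the reader is actually parked.
            for (int i = 0; i < 500 && reader.getState() != Thread.State.WAITING; i++) {
                Thread.sleep(10);
            }
            ThreadInfo info = ManagementFactory.getThreadMXBean().getThreadInfo(reader.getId());
            return info == null ? null : info.getLockOwnerName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            lock.writeLock().unlock(); // lets the reader finish
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().setName("writer");
        System.out.println("reader is waiting on a lock owned by: " + ownerSeenByBlockedReader());
    }
}
```

The point is the same as in the dump: the readers are the ones stuck in a lock call, while the owner is simply busy somewhere else in its own code, so it has to be found by what it is doing rather than by the lock method name.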

I then read through the jodd LRUCache code and found that it is implemented on top of LinkedHashMap. Searching the dump for code that writes to the LinkedHashMap, I found one thread executing the put method of LRUCache. Its stack was inside the pruneCache method of LRUCache (called by put when the cache is full, to reclaim some slots):

protected int pruneCache() {
    if (isPruneExpiredActive() == false) {
        return 0;
    }
    int count = 0;
    // cacheMap is an instance of LinkedHashMap
    Iterator<CacheObject<K, V>> values = cacheMap.values().iterator();
    while (values.hasNext()) {
        CacheObject<K, V> co = values.next();
        if (co.isExpired() == true) {
            values.remove();
            count++;
        }
    }
    return count;
}

This confirms the original conjecture: so many reader threads can be blocked only while the write lock is held.

So jodd's LRUCache, built on LinkedHashMap + ReentrantReadWriteLock, has a performance problem: a write operation locks the entire cache and blocks all read operations. This is the first problem.
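
The blocking behavior itself is easy to see in isolation. The following is a minimal sketch using only java.util.concurrent; the class and helper names (WriteLockBlocksReaders, readerCanEnter) are invented for the example, and a timed tryLock stands in for the readers so that the demo terminates instead of hanging the way the real system did.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class WriteLockBlocksReaders {
    /** Tries to take the read lock from a fresh thread; true if it got in before the timeout. */
    static boolean readerCanEnter(ReentrantReadWriteLock lock, long timeoutMillis) {
        final boolean[] acquired = {false};
        Thread reader = new Thread(() -> {
            try {
                if (lock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    acquired[0] = true;
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        try {
            reader.join(); // join also makes acquired[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        lock.writeLock().lock(); // stands in for a put that never finishes
        boolean duringWrite = readerCanEnter(lock, 200);
        lock.writeLock().unlock();

        lock.readLock().lock(); // another reader is active
        boolean duringRead = readerCanEnter(lock, 200);
        lock.readLock().unlock();

        System.out.println("reader gets in while write lock is held: " + duringWrite); // false
        System.out.println("reader gets in while read lock is held: " + duringRead);   // true
    }
}
```

A held write lock keeps every reader out, whereas readers do not exclude each other. With a plain lock() instead of the timed tryLock, the first reader would simply wait for as long as the writer keeps the lock.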

Troubleshooting Phase 2

Obviously, we cannot stop here. To dig deeper, I continued analyzing the implementation of LRUCache. The main logic is that put takes the write lock and get takes the read lock, and a LinkedHashMap with accessOrder enabled is used internally for storage.

At first glance this looks perfectly normal. In fact, calling get from multiple threads on a LinkedHashMap with accessOrder enabled is a concurrency problem, because get relinks the accessed entry in the doubly linked list (it is moved to the most recently used position). See the get method of LinkedHashMap:

public V get(Object key) {
    Entry<K,V> e = (Entry<K,V>)getEntry(key);
    if (e == null)
        return null;
    e.recordAccess(this);
    return e.value;
}

// Called on the entry that get just found
void recordAccess(HashMap<K,V> m) {
    LinkedHashMap<K,V> lm = (LinkedHashMap<K,V>)m;
    if (lm.accessOrder) {
        lm.modCount++;
        remove();                // unlink the entry from the list
        addBefore(lm.header);    // relink it next to the header
    }
}

As you can see, nothing here guards the changes to the linked list structure, so concurrent get calls on such a LinkedHashMap are not safe. jodd takes only the read lock in get, and a read lock can be held by many threads at the same time, so the concurrency problem remains (if this is unclear, please study how ReentrantReadWriteLock works). This is the second problem.
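
That get is a write in disguise can be shown without any threads at all. The sketch below uses a plain java.util.LinkedHashMap in access order; the class and method names (AccessOrderGetMutates and its helpers) are only for illustration. It checks two things: a single get changes the iteration order, and a get in the middle of an iteration trips the fail-fast check, just as a put or remove would.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccessOrderGetMutates {
    /** A small map with accessOrder = true (third constructor argument). */
    static Map<String, Integer> newAccessOrderedMap() {
        Map<String, Integer> map = new LinkedHashMap<>(16, 0.75f, true);
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Key order after a single get("a"). */
    static List<String> orderAfterGet() {
        Map<String, Integer> map = newAccessOrderedMap();
        map.get("a"); // relinks "a" as the most recently used entry
        return new ArrayList<>(map.keySet());
    }

    /** True if calling get during an iteration trips the fail-fast check. */
    static boolean getDuringIterationFails() {
        Map<String, Integer> map = newAccessOrderedMap();
        try {
            for (String key : map.keySet()) {
                map.get("a"); // counted as a structural modification
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(orderAfterGet());           // [b, c, a]
        System.out.println(getDuringIterationFails()); // true
    }
}
```

If a single thread can already disturb an iterator just by reading, several threads doing the same relinking at once under a shared read lock have nothing to keep the list consistent.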

Under high concurrency the linked list can be corrupted in all sorts of strange ways (I will not enumerate them here). It is entirely possible for values.hasNext() in the pruneCache() method above to stay true forever. This time the hang happened in LRUCache#pruneCache; next time it might be in LinkedHashMap#transfer. Once the code hangs while holding the write lock, all reader threads are blocked. How likely the problem is to occur varies, and it is hard to simulate and reproduce.
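
For comparison, here is a minimal sketch of an LRU cache on the same LinkedHashMap that at least cannot corrupt its list. It is not what jodd does and not what any particular library does; ExclusiveLruCache is a hypothetical name, and the technique is simply to make every operation exclusive, including get, using synchronized and LinkedHashMap's removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal LRU cache; every operation, including get, is exclusive. */
class ExclusiveLruCache<K, V> {
    private final Map<K, V> map;

    ExclusiveLruCache(final int maxSize) {
        // accessOrder = true: iteration goes from least to most recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
    }

    // get relinks the entry, so it needs the same exclusive lock as put.
    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }

    synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        ExclusiveLruCache<String, Integer> cache = new ExclusiveLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b"
        System.out.println(cache.get("a") + " " + cache.get("b") + " " + cache.get("c")); // 1 null 3
    }
}
```

This removes the second problem but makes the first one worse, since readers now block each other as well. It trades throughput for correctness and is only reasonable for a small, lightly contended cache.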

JUC Bug

Some bugs in earlier JDK versions are also worth mentioning.

A ReentrantReadWriteLock may hang even though no thread holds the lock:
http://bugs.sun.com/view_bug.do?bug_id=6822370
http://bugs.sun.com/view_bug.do?bug_id=6903249

Summary

Do not use Jodd cache.
Guava cache is recommended.
It is based on ConcurrentLinkedHashMap, which has been integrated into Guava.
Do not blindly trust open-source components. Study them thoroughly before using them.
