Storm uses a data structure called TimeCacheMap to keep recently active objects in memory. It is efficient and automatically removes objects that have expired because they were not updated in time.
TimeCacheMap splits its data across multiple buckets, so the cleanup thread only needs to hold the lock for a short, constant-time step when it expires data. This keeps concurrent reads and writes fast. Next, let's take a look at how TimeCacheMap is implemented internally.
Bucket linked list: each element of the linked list is a HashMap that stores the data as key/value pairs.
private LinkedList<HashMap<K, V>> _buckets;
Lock object: guards the get/put operations on TimeCacheMap so that they are atomic.
private final Object _lock = new Object();
Background cleanup thread: removes data once it has timed out.
private Thread _cleaner;
Timeout callback interface: called for each entry after it times out, so that the caller can do further processing.
public static interface ExpiredCallback<K, V> {
    public void expire(K key, V val);
}
private ExpiredCallback _callback;
With these fields in place, let's look at how the constructor is implemented:
1. First, it creates the specified number of buckets and stores them in a linked list. Each bucket is an empty HashMap.
2. Then it sets up the cleanup thread, which repeats the following steps:
A) Sleep for expirationMillis / (numBuckets - 1) milliseconds (i.e. expirationSecs / (numBuckets - 1) seconds);
B) Acquire the lock on _lock and remove the last bucket from the _buckets linked list;
C) Add an empty HashMap bucket to the head of the _buckets linked list, then release the lock on _lock;
D) If a callback has been set, invoke it for every entry in the removed bucket.
public TimeCacheMap(int expirationSecs, int numBuckets, ExpiredCallback<K, V> callback) {
    if (numBuckets < 2) {
        throw new IllegalArgumentException("numBuckets must be >= 2");
    }
    // Start with numBuckets empty buckets.
    _buckets = new LinkedList<HashMap<K, V>>();
    for (int i = 0; i < numBuckets; i++) {
        _buckets.add(new HashMap<K, V>());
    }
    _callback = callback;
    final long expirationMillis = expirationSecs * 1000L;
    final long sleepTime = expirationMillis / (numBuckets - 1);
    _cleaner = new Thread(new Runnable() {
        public void run() {
            try {
                while (true) {
                    Map<K, V> dead = null;
                    Time.sleep(sleepTime);
                    // Rotate: drop the oldest bucket, add a fresh one at the head.
                    synchronized (_lock) {
                        dead = _buckets.removeLast();
                        _buckets.addFirst(new HashMap<K, V>());
                    }
                    // Callbacks run outside the lock.
                    if (_callback != null) {
                        for (Entry<K, V> entry : dead.entrySet()) {
                            _callback.expire(entry.getKey(), entry.getValue());
                        }
                    }
                }
            } catch (InterruptedException ex) {
            }
        }
    });
    _cleaner.setDaemon(true);
    _cleaner.start();
}
The constructor takes three parameters: expirationSecs, the timeout in seconds; numBuckets, the number of buckets; and callback, the function called when an entry times out.
For ease of use, three more overloaded constructors are provided, which can be chosen as needed:
// This default ensures things expire at most 50% past the expiration time
private static final int DEFAULT_NUM_BUCKETS = 3;

public TimeCacheMap(int expirationSecs, ExpiredCallback<K, V> callback) {
    this(expirationSecs, DEFAULT_NUM_BUCKETS, callback);
}

public TimeCacheMap(int expirationSecs, int numBuckets) {
    this(expirationSecs, numBuckets, null);
}

public TimeCacheMap(int expirationSecs) {
    this(expirationSecs, DEFAULT_NUM_BUCKETS);
}
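Steps B to D of the cleanup loop, and the point at which the callback fires, can be tried out without a background thread. The following is a minimal sketch under stated assumptions, not Storm's code: the class RotationSketch, its simplified put, and the helper expiredAfter are hypothetical names introduced for illustration, and rotate() is called by hand in place of the timed loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not Storm's TimeCacheMap.
class RotationSketch<K, V> {
    interface ExpiredCallback<K, V> {
        void expire(K key, V val);
    }

    private final LinkedList<HashMap<K, V>> buckets = new LinkedList<>();
    private final Object lock = new Object();
    private final ExpiredCallback<K, V> callback;

    RotationSketch(int numBuckets, ExpiredCallback<K, V> callback) {
        if (numBuckets < 2) {
            throw new IllegalArgumentException("numBuckets must be >= 2");
        }
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<K, V>());
        }
        this.callback = callback;
    }

    // Simplified put: writes to the newest (head) bucket only.
    void put(K key, V value) {
        synchronized (lock) {
            buckets.getFirst().put(key, value);
        }
    }

    // One pass of the cleanup loop without the sleep (steps B to D).
    void rotate() {
        Map<K, V> dead;
        synchronized (lock) {
            dead = buckets.removeLast();
            buckets.addFirst(new HashMap<K, V>());
        }
        // The callback runs after the lock has been released.
        if (callback != null) {
            for (Map.Entry<K, V> entry : dead.entrySet()) {
                callback.expire(entry.getKey(), entry.getValue());
            }
        }
    }

    // Puts one key, rotates `times` times, and returns the keys that the
    // callback reported as expired.
    static List<String> expiredAfter(int numBuckets, int times) {
        List<String> expired = new ArrayList<>();
        RotationSketch<String, Integer> map =
            new RotationSketch<>(numBuckets, (key, val) -> expired.add(key));
        map.put("task-1", 1);
        for (int i = 0; i < times; i++) {
            map.rotate();
        }
        return expired;
    }
}
```

With three buckets, RotationSketch.expiredAfter(3, 2) returns an empty list and RotationSketch.expiredAfter(3, 3) returns a list containing "task-1": an entry is handed to the callback on the rotation that drops the bucket it lives in.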
2. Performance Analysis
Get operation: traverses the buckets and returns the value from the first bucket that contains the key, or null if no bucket contains it. The time complexity is O(numBuckets).
public V get(K key) {
    synchronized (_lock) {
        for (HashMap<K, V> bucket : _buckets) {
            if (bucket.containsKey(key)) {
                return bucket.get(key);
            }
        }
        return null;
    }
}
Put operation: puts the key and value into the first bucket of _buckets, then traverses the other numBuckets - 1 buckets and removes any entry with the same key from each HashMap. The time complexity is O(numBuckets).
public void put(K key, V value) {
    synchronized (_lock) {
        Iterator<HashMap<K, V>> it = _buckets.iterator();
        HashMap<K, V> bucket = it.next();
        bucket.put(key, value);
        while (it.hasNext()) {
            bucket = it.next();
            bucket.remove(key);
        }
    }
}
Remove operation: traverses the buckets and removes the entry from the first bucket that contains the key, returning its value, or null if no bucket contains it. The time complexity is O(numBuckets).
public Object remove(K key) {
    synchronized (_lock) {
        for (HashMap<K, V> bucket : _buckets) {
            if (bucket.containsKey(key)) {
                return bucket.remove(key);
            }
        }
        return null;
    }
}
ContainsKey operation: traverses the buckets and returns true if any bucket contains the key; otherwise it returns false. The time complexity is O(numBuckets).
public boolean containsKey(K key) {
    synchronized (_lock) {
        for (HashMap<K, V> bucket : _buckets) {
            if (bucket.containsKey(key)) {
                return true;
            }
        }
        return false;
    }
}
Size operation: traverses the buckets and adds up the size of each bucket's HashMap. The time complexity is O(numBuckets).
public int size() {
    synchronized (_lock) {
        int size = 0;
        for (HashMap<K, V> bucket : _buckets) {
            size += bucket.size();
        }
        return size;
    }
}
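Why put has to visit the older buckets is easier to see in a small experiment: if a stale copy stayed behind, size would count the key twice and the stale copy would later be reported as expired while the key is still live. The following is a minimal single-threaded sketch under stated assumptions; the class PutRefreshSketch and its helpers newBuckets, rotate, and bucketIndexOf are hypothetical names for illustration, and locking is left out.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical helper class for illustration; locking is omitted because
// the sketch is single-threaded.
class PutRefreshSketch {
    static LinkedList<HashMap<String, Integer>> newBuckets(int numBuckets) {
        LinkedList<HashMap<String, Integer>> buckets = new LinkedList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new HashMap<String, Integer>());
        }
        return buckets;
    }

    // Same shape as the put operation: write to the head bucket, then
    // remove the key from every older bucket.
    static void put(LinkedList<HashMap<String, Integer>> buckets, String key, Integer value) {
        Iterator<HashMap<String, Integer>> it = buckets.iterator();
        it.next().put(key, value);
        while (it.hasNext()) {
            it.next().remove(key);
        }
    }

    // Stand-in for one cleaner pass: drop the oldest bucket, add a new head.
    static void rotate(LinkedList<HashMap<String, Integer>> buckets) {
        buckets.removeLast();
        buckets.addFirst(new HashMap<String, Integer>());
    }

    // Index of the first bucket holding the key (0 is the newest), or -1.
    static int bucketIndexOf(LinkedList<HashMap<String, Integer>> buckets, String key) {
        int index = 0;
        for (HashMap<String, Integer> bucket : buckets) {
            if (bucket.containsKey(key)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    static int size(LinkedList<HashMap<String, Integer>> buckets) {
        int size = 0;
        for (HashMap<String, Integer> bucket : buckets) {
            size += bucket.size();
        }
        return size;
    }
}
```

With three buckets, putting "a", rotating once, and putting "a" again moves the key from bucket index 1 back to index 0, and size still reports 1. Each put therefore restarts the key's timeout.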
3. Timeout
From the analysis of the put operation and the _cleaner thread, we know that:
A) The put operation places the data in the first bucket of _buckets, then traverses the other numBuckets - 1 buckets and removes any entry with the same key from each HashMap;
B) Every expirationSecs / (numBuckets - 1) seconds, the _cleaner thread removes the data in the last bucket of _buckets from TimeCacheMap.
Therefore, if put is called right after the _cleaner thread has finished a cleanup, the key's bucket must go through numBuckets full cleanup intervals before it is removed, so the timeout for that piece of data is:
expirationSecs / (numBuckets - 1) * numBuckets = expirationSecs * (1 + 1 / (numBuckets - 1))
However, if put completes right before the _cleaner thread starts a cleanup, the first interval is almost entirely used up already, so the timeout for that piece of data is:
expirationSecs / (numBuckets - 1) * numBuckets - expirationSecs / (numBuckets - 1) = expirationSecs
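The two bounds can be checked with concrete numbers. The following is a minimal sketch of the arithmetic only; the class TimeoutBounds and its method names are hypothetical, and the results are limiting values that ignore thread scheduling delays.

```java
// Hypothetical helper for illustration; it reproduces the arithmetic above.
class TimeoutBounds {
    // Interval between two cleanup passes, computed as in the constructor.
    static long sleepMillis(int expirationSecs, int numBuckets) {
        return expirationSecs * 1000L / (numBuckets - 1);
    }

    // put completes right before a cleanup: numBuckets - 1 full intervals remain.
    static long minLifetimeMillis(int expirationSecs, int numBuckets) {
        return sleepMillis(expirationSecs, numBuckets) * (numBuckets - 1);
    }

    // put happens right after a cleanup: numBuckets full intervals remain.
    static long maxLifetimeMillis(int expirationSecs, int numBuckets) {
        return sleepMillis(expirationSecs, numBuckets) * numBuckets;
    }
}
```

For expirationSecs = 30 and numBuckets = 3, the cleanup interval is 15000 ms, the lower bound is 30000 ms, and the upper bound is 45000 ms, which is 50% past the expiration time. Raising numBuckets narrows this window, at the cost of more buckets to traverse on each operation.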
4. Summary
1. TimeCacheMap is efficient because the lock is held only briefly: the cleanup thread completes its locked step, swapping out one bucket, in O(1) time. As a result, get and put operations are rarely blocked.
2. The get, put, remove, containsKey, and size operations each complete in O(numBuckets) time, where numBuckets is the number of buckets. The default value is 3.
3. The timeout for data that is not updated is between expirationSecs and expirationSecs * (1 + 1 / (numBuckets - 1)).