Recently, while analyzing a potential memory leak, a jmap dump turned up a large number of FastThreadLocalThread instances, so I looked at the Javadoc for FastThreadLocal, which reads as follows:
A special variant of ThreadLocal that yields higher access performance when accessed from a FastThreadLocalThread.
Internally, a FastThreadLocal uses a constant index in an array, instead of using hash code and hash table, to look for a variable. Although seemingly very subtle, it yields slight performance advantage over using a hash table, and it is useful when accessed frequently.
To take advantage of this thread-local variable, your thread must be a FastThreadLocalThread or its subtype. By default, all threads created by DefaultThreadFactory are FastThreadLocalThread due to this reason.
Note that the fast path is only possible on threads that extend FastThreadLocalThread, because it requires a special field to store the necessary state. An access by any other kind of thread falls back to a regular ThreadLocal.
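To put it simply, it is a ThreadLocal implementation that is faster when accessed from within a FastThreadLocalThread. It looks variables up by a constant index rather than by a hash value. From what I remember of earlier tests comparing Java with the various C++ map and unordered_map implementations, in general the more entries a map holds, the larger the gap between implementations (because potential collisions increase, and because the underlying structure may be a B*-tree, a linked list, linear probing, and so on).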
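To make the "constant index instead of hash" idea concrete, here is a minimal sketch in plain Java. It is not Netty's code: SlotThread, IndexedLocal and IndexedLocalDemo are hypothetical names I made up to stand in for FastThreadLocalThread and FastThreadLocal, and the growth policy of the slot array is my own assumption. It only illustrates the two paths the Javadoc describes: a direct array read on the special thread type, and a fallback to a regular ThreadLocal on any other thread.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for FastThreadLocalThread: a thread that carries its own slot array. */
class SlotThread extends Thread {
    Object[] slots = new Object[8]; // the "special field" that stores per-thread state

    SlotThread(Runnable task) {
        super(task);
    }
}

/** Hypothetical stand-in for FastThreadLocal: every instance owns one constant array index. */
class IndexedLocal<T> {
    private static final AtomicInteger NEXT_INDEX = new AtomicInteger();

    private final int index = NEXT_INDEX.getAndIncrement(); // assigned once, never changes
    private final ThreadLocal<T> fallback = new ThreadLocal<>(); // slow path for ordinary threads

    @SuppressWarnings("unchecked")
    T get() {
        Thread t = Thread.currentThread();
        if (t instanceof SlotThread) {
            Object[] slots = ((SlotThread) t).slots;
            // Fast path: plain array read, no hashing and no collision handling.
            return index < slots.length ? (T) slots[index] : null;
        }
        return fallback.get(); // regular ThreadLocal lookup
    }

    void set(T value) {
        Thread t = Thread.currentThread();
        if (t instanceof SlotThread) {
            SlotThread st = (SlotThread) t;
            if (index >= st.slots.length) {
                // Grow the array so that this instance's index fits (assumed policy).
                st.slots = Arrays.copyOf(st.slots, Math.max(index + 1, st.slots.length * 2));
            }
            st.slots[index] = value;
        } else {
            fallback.set(value);
        }
    }
}

class IndexedLocalDemo {
    /** Sets and reads a value on a fresh SlotThread and returns what that thread saw. */
    static String roundTripOnSlotThread(IndexedLocal<String> local, String value) {
        String[] seen = new String[1];
        SlotThread t = new SlotThread(() -> {
            local.set(value);
            seen[0] = local.get();
        });
        t.start();
        try {
            t.join(); // join makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        IndexedLocal<String> local = new IndexedLocal<>();
        local.set("main-thread"); // main is an ordinary thread, so this uses the fallback
        System.out.println(roundTripOnSlotThread(local, "slot-thread"));
        System.out.println(local.get());
    }
}
```

The trade-off in this sketch is that every thread's array must be large enough for the highest index in use, so it spends memory to avoid hashing.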
To get a sense of how large the gap is, I searched and found a post (https://my.oschina.net/andylucc/blog/614359) that tested it. Its results are as follows:
1000 ThreadLocals on one thread object, 1,000,000 timed read operations:
ThreadLocal: 3767ms | 3636ms | 3595ms | 3610ms | 3719ms
FastThreadLocal: 15ms | 14ms | 13ms | 14ms | 14ms
1000 ThreadLocals on one thread object, 100,000 timed read operations:
ThreadLocal: 384ms | 378ms | 366ms | 647ms | 372ms
FastThreadLocal: 14ms | 13ms | 13ms | 17ms | 13ms
1000 ThreadLocals on one thread object, 10,000 timed read operations:
ThreadLocal: 43ms | 42ms | 42ms | 56ms | 45ms
FastThreadLocal: 15ms | 13ms | 11ms | 15ms | 11ms
100 ThreadLocals on one thread object, 10,000 timed read operations:
ThreadLocal: 16ms | 21ms | 18ms | 16ms | 18ms
FastThreadLocal: 15ms | 15ms | 15ms | 17ms | 18ms
The data above shows that when both the number of ThreadLocals and the frequency of ThreadLocal reads and writes are high, the performance of the traditional ThreadLocal degrades quickly, while Netty's FastThreadLocal stays relatively stable. The simulated scenario is not very specific, but to some extent we can conclude that FastThreadLocal performs better than the traditional ThreadLocal in high-concurrency, high-load environments.
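For anyone who wants to try a measurement of this kind, here is a rough sketch of the JDK ThreadLocal side only. The original post's test code is not reproduced in this article, so the loop structure, the class name ThreadLocalReadTimer and the method timeReads are my own assumptions, and the numbers it prints will not match the ones quoted above. With Netty on the classpath, the same loop could presumably be repeated with FastThreadLocal on a FastThreadLocalThread for comparison.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical timing harness: a naive loop, not a rigorous benchmark. */
class ThreadLocalReadTimer {

    /** Elapsed time plus a checksum that keeps the reads from being optimized away. */
    static final class Result {
        final long elapsedMs;
        final long checksum;

        Result(long elapsedMs, long checksum) {
            this.elapsedMs = elapsedMs;
            this.checksum = checksum;
        }
    }

    /**
     * Creates localCount JDK ThreadLocals on the current thread, then reads
     * each of them readsPerLocal times and reports how long the reads took.
     */
    static Result timeReads(int localCount, int readsPerLocal) {
        List<ThreadLocal<Integer>> locals = new ArrayList<>(localCount);
        for (int i = 0; i < localCount; i++) {
            ThreadLocal<Integer> tl = new ThreadLocal<>();
            tl.set(i); // ThreadLocal number i holds the value i
            locals.add(tl);
        }

        long checksum = 0;
        long start = System.nanoTime();
        for (ThreadLocal<Integer> tl : locals) {
            for (int r = 0; r < readsPerLocal; r++) {
                checksum += tl.get();
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // Remove the entries so the current thread does not keep them alive.
        locals.forEach(ThreadLocal::remove);
        return new Result(elapsedMs, checksum);
    }

    public static void main(String[] args) {
        // 1000 ThreadLocals x 1000 reads each = 1,000,000 reads in total.
        Result result = timeReads(1000, 1000);
        System.out.println("elapsed: " + result.elapsedMs + "ms, checksum: " + result.checksum);
    }
}
```

A single timed loop like this ignores JIT warm-up and GC noise, so for numbers worth trusting a harness such as JMH would be the safer choice.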
In summary, based on my experience, I believe 99% of applications never use anything like tens of thousands of thread-local variables. So unless the application is very special, for the sake of later maintenance costs it is better to stick with the traditional ThreadLocal; there is no need to use FastThreadLocal.
PS: I won't go over the usage scenarios of ThreadLocal again here; you can refer to the following posts:
https://my.oschina.net/clopopo/blog/149368
http://blog.csdn.net/lufeng20/article/details/24314381
http://lavasoft.blog.51cto.com/62575/51926/