In Catena (a time-series storage engine), one function is a hot spot for lock contention: given a name, it fetches a *memorySource pointer from a map. This function is called at least once for every insert operation, and in a real-world scenario the calls are frequent and span multiple goroutines, so we have to consider synchronization.
The function looks up a pointer in a map[string]*memorySource by the specified name; if none is found, it creates one, inserts it, and returns it. The key point to note is that we only ever insert into this map.
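The snippets below use p.lock, p.sources, memorySource, and memoryMetric without showing their definitions. Here is a minimal sketch of what the surrounding types might look like; the real Catena definitions may differ, and the memoryMetric stub in particular is just an assumption:

package main

import "sync"

// memoryMetric is referenced but never defined in the snippets below;
// this empty stub is an assumption, purely to make the sketch compile.
type memoryMetric struct{}

// memorySource holds the metrics for one named source.
type memorySource struct {
	name    string
	metrics map[string]*memoryMetric
}

// partition owns the map. A sync.RWMutex also provides the plain
// Lock/Unlock used by the first three versions, and the read-locking
// used by the final one.
type partition struct {
	lock    sync.RWMutex
	sources map[string]*memorySource
}

func newPartition() *partition {
	return &partition{sources: map[string]*memorySource{}}
}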
The simple implementation is as follows (to save space, the function header and return are omitted; only the important part is shown):
var source *memorySource
var present bool

p.lock.Lock()         // Lock the mutex.
defer p.lock.Unlock() // Unlock the mutex at the end.

if source, present = p.sources[name]; !present {
	// The source wasn't found, so we'll create it.
	source = &memorySource{
		name:    name,
		metrics: map[string]*memoryMetric{},
	}
	// Insert the newly created *memorySource.
	p.sources[name] = source
}
After testing, this implementation reaches approximately 1,400,000 inserts/sec (called concurrently, with GOMAXPROCS set to 4). That looks fast, but it is actually slower than running with a single goroutine, because of lock contention between the goroutines.
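The article doesn't show the test harness; here is a sketch of how such a throughput measurement might look, continuing the file from the earlier sketch. The getOrCreateSource method is a hypothetical wrapper around the snippet's body, since the article omits the real header and return:

// Also needs "fmt", "runtime", "strconv", and "time" in the imports.

// getOrCreateSource wraps the first version's body; the real
// signature in Catena is not shown in the article.
func (p *partition) getOrCreateSource(name string) *memorySource {
	p.lock.Lock()
	defer p.lock.Unlock()
	source, present := p.sources[name]
	if !present {
		source = &memorySource{name: name, metrics: map[string]*memoryMetric{}}
		p.sources[name] = source
	}
	return source
}

func main() {
	runtime.GOMAXPROCS(4)
	p := newPartition()
	const goroutines = 4
	const ops = 1000000

	var wg sync.WaitGroup
	start := time.Now()
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < ops; i++ {
				// Reuse a small set of names so most calls find an
				// existing source, which is the contended case.
				p.getOrCreateSource("source-" + strconv.Itoa(i%128))
			}
		}()
	}
	wg.Wait()
	fmt.Printf("%.0f inserts/sec\n", float64(goroutines*ops)/time.Since(start).Seconds())
}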
Let's simplify the situation to illustrate the problem: suppose two goroutines are fetching "a" and "b", and both "a" and "b" are already present in the map. At runtime, one goroutine acquires the lock, grabs its pointer, unlocks, and carries on, while the other goroutine is stuck waiting to acquire the lock. Waiting for a lock to be released is expensive, and the more goroutines there are, the worse the performance gets.
One way to make it faster is to remove the lock entirely and make sure that only one goroutine ever accesses the map, as sketched below. This method is simple, but it doesn't scale.
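The article doesn't show code for this variant; one common way to realize it (not necessarily what the author had in mind) is to funnel every lookup through a single owner goroutine over a channel. A minimal sketch, reusing the types from the earlier sketch; the names sourceRequest and ownerLoop are hypothetical:

type sourceRequest struct {
	name  string
	reply chan *memorySource
}

// ownerLoop is the only goroutine that ever touches the map, so no
// lock is needed at all. Callers send a name and wait for the pointer.
func ownerLoop(requests <-chan sourceRequest) {
	sources := map[string]*memorySource{}
	for req := range requests {
		source, present := sources[req.name]
		if !present {
			source = &memorySource{name: req.name, metrics: map[string]*memoryMetric{}}
			sources[req.name] = source
		}
		req.reply <- source
	}
}

Every caller now pays a channel round trip through that one goroutine, which serializes all lookups; that is why this method is simple but not scalable. Let's look at another simple approach that keeps both thread safety and scalability: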
var source *memorySource
var present bool

if source, present = p.sources[name]; !present { // Added this line.
	// The source wasn't found, so we'll create it.
	p.lock.Lock()         // Lock the mutex.
	defer p.lock.Unlock() // Unlock at the end.

	if source, present = p.sources[name]; !present {
		source = &memorySource{
			name:    name,
			metrics: map[string]*memoryMetric{},
		}
		// Insert the newly created *memorySource.
		p.sources[name] = source
	}
	// If present is true, then another goroutine has already inserted
	// the element we want, and source is set to what we want.
} // Added this line.
// Note that if the source is present, we avoid the lock completely!
This implementation reaches 5,500,000 inserts/sec, 3.93 times as fast as the first version. The test ran with 4 goroutines, so the result is basically in line with what we'd expect.
This implementation is OK because we never delete from or modify the map; a pointer read from it stays valid, so we can safely use even a cached copy of it. But note that we still need the lock around creation: without it, one goroutine might be inserting a source while another goroutine is creating the same one, and they would race with each other. In this version we only take the lock in rare cases, so performance improves a lot.
John Potocny suggested removing the defer, because a deferred unlock is delayed until the entire function returns, and offered the following "ultimate" version:
var source *memorySource
var present bool

if source, present = p.sources[name]; !present {
	// The source wasn't found, so we'll create it.
	p.lock.Lock() // Lock the mutex.
	if source, present = p.sources[name]; !present {
		source = &memorySource{
			name:    name,
			metrics: map[string]*memoryMetric{},
		}
		// Insert the newly created *memorySource.
		p.sources[name] = source
	}
	p.lock.Unlock() // Unlock the mutex.
}
// Note that if the source is present, we avoid the lock completely!
9,800,000 inserts/sec! Four changed lines for a 7x speedup!! How about that!!!
Update (the original author is very conscientious and kept improving it):
Is the above implementation correct? No! With the Go Data Race Detector we can easily uncover the race condition: the integrity of the map cannot be guaranteed while it is read and written concurrently.
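The race detector is built into the Go toolchain: running the tests or the program itself with the -race flag (supported by go test, go run, and go build) makes it report unsynchronized map accesses like the one above at runtime:

$ go test -race
$ go run -race main.go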
Below is a race-free, thread-safe version that should be considered "correct". It uses an RWMutex, so reads only take a shared read lock (readers don't block each other) while writes are fully synchronized.
var source *memorySource
var present bool

p.lock.RLock()
if source, present = p.sources[name]; !present {
	// The source wasn't found, so we'll create it.
	p.lock.RUnlock()
	p.lock.Lock()
	if source, present = p.sources[name]; !present {
		source = &memorySource{
			name:    name,
			metrics: map[string]*memoryMetric{},
		}
		// Insert the newly created *memorySource.
		p.sources[name] = source
	}
	p.lock.Unlock()
} else {
	p.lock.RUnlock()
}
After testing, this version achieves 93.8% of the previous version's performance, which is very good given that correctness now comes first. Arguably the two aren't even comparable, because the previous version was wrong.
This article is translated from: Optimizing Concurrent Map Access in Go