English original: Distributed Read-Write Mutex in Go
Go's standard sync.RWMutex does not perform well on multi-core machines, because all readers contend on the same memory address when they atomically increment the reader count. This article explores an n-way RWMutex, also known as a "big reader" lock, which allocates an independent RWMutex for each CPU core. A reader only takes the read lock belonging to its own core, while a writer must take every per-core lock in turn.
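To make the scheme concrete, here is a minimal sketch of such an n-way lock. The names DRWMutex, New, and the explicit core parameter are illustrative choices for this sketch; the benchmark code later in the article uses an equivalent slice type called RWMutex2.

// Illustrative sketch of an n-way ("big reader") lock: one sync.RWMutex per core.
// A production version would likely pad each element to its own cache line to
// avoid false sharing, which is not shown here.
package drwmutex

import "sync"

// DRWMutex holds one RWMutex per CPU core.
type DRWMutex []sync.RWMutex

// New allocates one lock per core.
func New(cores int) DRWMutex {
	return make(DRWMutex, cores)
}

// RLock takes only the read lock belonging to the caller's core.
func (mx DRWMutex) RLock(core int) { mx[core].RLock() }

// RUnlock releases the read lock that was taken on the given core.
func (mx DRWMutex) RUnlock(core int) { mx[core].RUnlock() }

// Lock takes every per-core lock in turn, so a writer excludes all readers.
func (mx DRWMutex) Lock() {
	for core := range mx {
		mx[core].Lock()
	}
}

// Unlock releases every per-core lock.
func (mx DRWMutex) Unlock() {
	for core := range mx {
		mx[core].Unlock()
	}
}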
Finding the Current CPU
A reader uses the CPUID instruction to decide which lock to take. CPUID returns the APIC ID of the currently active CPU without issuing a system call or modifying the runtime. This works on Intel and AMD processors; ARM processors would need to use the CPU ID register instead. On systems with more than 256 processors, x2APIC must be used, and the APIC ID has to be read from the EDX register after CPUID with EAX=0xb. When the program starts, a mapping from APIC ID to CPU index is built (using the CPU affinity system calls), and this mapping is assumed to stay static for the lifetime of the process. Because the CPUID instruction can be fairly expensive, each goroutine only periodically updates its estimate of which core it is running on. More frequent updates reduce contention on the per-core locks, but also increase the share of time spent executing CPUID relative to the actual locking.
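As a sketch of how that periodic update might look, building on the DRWMutex sketch above: the reader function, its checkEvery parameter, and the currentCore callback are hypothetical names for this illustration, with currentCore assumed to wrap the cpus[cpu()] lookup from the benchmark code below.

// Sketch: amortizing the cost of the CPUID-based core lookup by refreshing a
// goroutine's cached core index only every checkEvery lock acquisitions.
func reader(mx DRWMutex, iterations, checkEvery uint64, currentCore func() int) {
	core := currentCore() // initial estimate of which core we run on
	for n := uint64(0); n < iterations; n++ {
		if checkEvery != 0 && n%checkEvery == 0 {
			core = currentCore() // periodically refresh the estimate
		}
		mx.RLock(core)
		// ... read-side critical section ...
		mx.RUnlock(core)
	}
}

This would be called, for example, as reader(mx, 10000, 100, func() int { return cpus[cpu()] }).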
Stale CPU information. The CPU information a goroutine uses when taking a lock may be out of date (the goroutine may have been migrated to another core). Because the reader remembers which lock it took, this only affects performance, not correctness. Such migrations are also unlikely in practice, since the operating system kernel tries to keep threads on the same core to improve cache hit rates.
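One way to make that "remember which lock was taken" guarantee explicit (a design sketch building on the DRWMutex type above, not the article's prescribed interface) is to have the read-lock call return the specific per-core lock, so the matching unlock always hits the same mutex even if the goroutine has since migrated.

// Sketch: RLockCore returns the exact per-core read lock that was taken, so
// the caller releases the same lock regardless of where the goroutine runs
// afterwards.
func (mx DRWMutex) RLockCore(core int) sync.Locker {
	l := mx[core].RLocker() // sync.RWMutex.RLocker: Lock/Unlock map to RLock/RUnlock
	l.Lock()                // takes the read lock on this core's mutex
	return l
}

// Usage:
//   l := mx.RLockCore(cpus[cpu()])
//   ... read-side critical section ...
//   l.Unlock() // always releases the read lock that was actually taken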
Performance
The performance characteristics of this scheme depend on a large number of parameters. In particular, the frequency of CPUID checks, the number of readers, the ratio of readers to writers, and how long readers hold the lock are all important factors. When only one writer is active at a time, the time that writer holds the lock does not affect the performance difference between sync.RWMutex and DRWMutex.
Experiments show that DRWMutex performs better on multi-core systems, especially when writers make up less than 1% of the load and CPUID is called at most once per 10 lock acquisitions (this varies with how long locks are held). Even with few cores, DRWMutex outperforms sync.RWMutex for the kinds of applications where sync.RWMutex would normally be chosen over sync.Mutex.
The plot below shows the average performance over 10 runs as the number of cores increases:
drwmutex -i 5000 -p 0.0001 -w 1 -r 100 -c 100
Error bars indicate the 25th and 75th percentiles. Note the drop at every 10th core: 10 cores make up one NUMA node on the benchmark machine, so as soon as another NUMA node comes into play, cross-core traffic becomes more expensive. Performance still keeps increasing for DRWMutex because, compared to sync.RWMutex, more readers can work in parallel.
See the go-nuts thread for further discussion.
cpu_amd64.s
#include "textflag.h" // func cpu() uint64TEXT 路cpu(SB),NOSPLIT,$0-8 MOVL $0x01, AX // version information MOVL $0x00, BX // any leaf will do MOVL $0x00, CX // any subleaf will do // call CPUID BYTE $0x0f BYTE $0xa2 SHRQ $24, BX // logical cpu id is put in EBX[31-24] MOVQ BX, ret+0(FP) RET
main.go
package main

import (
	"flag"
	"fmt"
	"math/rand"
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"syscall"
	"time"
	"unsafe"
)

func cpu() uint64 // implemented in cpu_amd64.s

var cpus map[uint64]int

// Determine the mapping from APIC ID to CPU index by pinning the entire
// process to one core at a time and seeing what its APIC ID is.
func init() {
	cpus = make(map[uint64]int)

	var aff uint64
	syscall.Syscall(syscall.SYS_SCHED_GETAFFINITY, uintptr(0), unsafe.Sizeof(aff), uintptr(unsafe.Pointer(&aff)))

	n := 0
	start := time.Now()
	var mask uint64 = 1
outer:
	for {
		for (aff & mask) == 0 {
			mask <<= 1
			if mask == 0 || mask > aff {
				break outer
			}
		}

		ret, _, err := syscall.Syscall(syscall.SYS_SCHED_SETAFFINITY, uintptr(0), unsafe.Sizeof(mask), uintptr(unsafe.Pointer(&mask)))
		if ret != 0 {
			panic(err.Error())
		}

		// what CPU do we have?
		<-time.After(1 * time.Millisecond)
		c := cpu()

		if oldn, ok := cpus[c]; ok {
			fmt.Println("cpu", n, "==", oldn, "-- both have CPUID", c)
		}

		cpus[c] = n
		mask <<= 1
		n++
	}

	fmt.Printf("%d/%d cpus found in %v: %v\n", len(cpus), runtime.NumCPU(), time.Now().Sub(start), cpus)

	// restore the original affinity mask
	ret, _, err := syscall.Syscall(syscall.SYS_SCHED_SETAFFINITY, uintptr(0), unsafe.Sizeof(aff), uintptr(unsafe.Pointer(&aff)))
	if ret != 0 {
		panic(err.Error())
	}
}

// RWMutex2 is the n-way lock: one sync.RWMutex per CPU core.
type RWMutex2 []sync.RWMutex

// Lock takes every per-core lock in turn.
func (mx RWMutex2) Lock() {
	for core := range mx {
		mx[core].Lock()
	}
}

// Unlock releases every per-core lock.
func (mx RWMutex2) Unlock() {
	for core := range mx {
		mx[core].Unlock()
	}
}

func main() {
	cpuprofile := flag.Bool("cpuprofile", false, "enable CPU profiling")
	locks := flag.Uint64("i", 10000, "Number of iterations to perform")
	write := flag.Float64("p", 0.0001, "Probability of write locks")
	wwork := flag.Int("w", 1, "Amount of work for each writer")
	rwork := flag.Int("r", 100, "Amount of work for each reader")
	readers := flag.Int("n", runtime.GOMAXPROCS(0), "Total number of readers")
	checkcpu := flag.Uint64("c", 100, "Update CPU estimate every n iterations")
	flag.Parse()

	var o *os.File
	if *cpuprofile {
		o, _ = os.Create("rw.out")
		pprof.StartCPUProfile(o)
	}

	readers_per_core := *readers / runtime.GOMAXPROCS(0)

	var wg sync.WaitGroup

	// benchmark 1: a single sync.RWMutex shared by all readers and writers
	var mx1 sync.RWMutex
	start1 := time.Now()
	for n := 0; n < runtime.GOMAXPROCS(0); n++ {
		for r := 0; r < readers_per_core; r++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				r := rand.New(rand.NewSource(rand.Int63()))
				for n := uint64(0); n < *locks; n++ {
					if r.Float64() < *write {
						mx1.Lock()
						x := 0
						for i := 0; i < *wwork; i++ {
							x++
						}
						_ = x
						mx1.Unlock()
					} else {
						mx1.RLock()
						x := 0
						for i := 0; i < *rwork; i++ {
							x++
						}
						_ = x
						mx1.RUnlock()
					}
				}
			}()
		}
	}
	wg.Wait()
	end1 := time.Now()

	t1 := end1.Sub(start1)
	fmt.Println("mx1", runtime.GOMAXPROCS(0), *readers, *locks, *write, *wwork, *rwork, *checkcpu, t1.Seconds(), t1)

	if *cpuprofile {
		pprof.StopCPUProfile()
		o.Close()
		o, _ = os.Create("rw2.out")
		pprof.StartCPUProfile(o)
	}

	// benchmark 2: the n-way RWMutex2 with one lock per core
	mx2 := make(RWMutex2, len(cpus))
	start2 := time.Now()
	for n := 0; n < runtime.GOMAXPROCS(0); n++ {
		for r := 0; r < readers_per_core; r++ {
			wg.Add(1)
			go func() {
				defer wg.Done()
				c := cpus[cpu()]
				r := rand.New(rand.NewSource(rand.Int63()))
				for n := uint64(0); n < *locks; n++ {
					if *checkcpu != 0 && n%*checkcpu == 0 {
						c = cpus[cpu()]
					}

					if r.Float64() < *write {
						mx2.Lock()
						x := 0
						for i := 0; i < *wwork; i++ {
							x++
						}
						_ = x
						mx2.Unlock()
					} else {
						mx2[c].RLock()
						x := 0
						for i := 0; i < *rwork; i++ {
							x++
						}
						_ = x
						mx2[c].RUnlock()
					}
				}
			}()
		}
	}
	wg.Wait()
	end2 := time.Now()

	if *cpuprofile {
		pprof.StopCPUProfile()
		o.Close()
	}

	t2 := end2.Sub(start2)
	fmt.Println("mx2", runtime.GOMAXPROCS(0), *readers, *locks, *write, *wwork, *rwork, *checkcpu, t2.Seconds(), t2)
}