Contents
- The most common scenario
- BYTE Substitution Rune
- Use remainder
- Mask
- Mask-enhanced version
- Source
- Benchmark Code
- Other promotion
How do you generate a random string efficiently? It looks like a simple question, but in his answer icza works through a series of examples and optimizes them step by step to arrive at a much faster algorithm. The material comes from the StackOverflow question "How to generate a random string of a fixed length in Go?", where many people contributed good solutions and feedback, icza's answer in particular. This article is translated and collated from that question and its answers.
The question is:
I want a Go implementation that generates a random string of fixed length, containing only uppercase and lowercase letters and no digits. What is the fastest and simplest way to do it?
The optimizations start from the solution proposed by Paul Hankin (the first one below), which is the most basic and the easiest to understand. icza's improvements all build on it.
The most common scenario
The most common approach is to pick each character at random, so the whole string is random as well. The advantage is that you control exactly which characters may appear.
    func init() {
        // Needed before Go 1.20; newer versions seed the global generator automatically.
        rand.Seed(time.Now().UnixNano())
    }

    var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

    func RandStringRunes(n int) string {
        b := make([]rune, n)
        for i := range b {
            b[i] = letterRunes[rand.Intn(len(letterRunes))]
        }
        return string(b)
    }
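To illustrate the point about controlling the character set, here is a minimal sketch that takes the alphabet as a parameter. The name RandStringFrom and the sample alphabets are made up for this example and are not part of the original answer.

```go
package main

import (
	"fmt"
	"math/rand"
)

// RandStringFrom returns n characters drawn at random from alphabet.
// Hypothetical helper for illustration; it is not part of the original answer.
func RandStringFrom(alphabet string, n int) string {
	letters := []rune(alphabet) // runes, so multi-byte characters stay intact
	if len(letters) == 0 || n <= 0 {
		return ""
	}
	b := make([]rune, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

func main() {
	fmt.Println(RandStringFrom("abcdef0123456789", 8)) // hex-like characters only
	fmt.Println(RandStringFrom("αβγδε", 8))            // non-ASCII alphabet
}
```

Working on runes is what makes the second call safe: each Greek letter takes two bytes in UTF-8, so indexing the string byte by byte would cut characters in half.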
BYTE Substitution Rune
If only English letters (upper and lower case) are required, we can replace rune with byte, because in UTF-8 every English letter is encoded as exactly one byte.
    const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    func RandStringBytes(n int) string {
        b := make([]byte, n)
        for i := range b {
            b[i] = letterBytes[rand.Intn(len(letterBytes))]
        }
        return string(b)
    }
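The one-byte-per-letter property is easy to check. The following sketch compares the byte length of a string with its rune count; the helper name allSingleByte is an assumption made for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// allSingleByte reports whether every character of s is encoded
// as a single byte in UTF-8 (byte length equals rune count).
func allSingleByte(s string) bool {
	return len(s) == utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(allSingleByte(letterBytes)) // true: indexing by byte is safe
	fmt.Println(allSingleByte("äöü"))       // false: byte indexing would split characters
}
```

If the alphabet ever contains characters outside ASCII, the byte-based versions in this article no longer apply and the rune-based version should be used instead.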
Use remainder
In the previous step we used rand.Intn to pick a character. rand.Intn delegates to Rand.Intn, which in turn delegates to Rand.Int31n. That chain is slower than calling rand.Int63 directly, which returns an integer with 63 random bits.
We can call rand.Int63 and take the remainder after dividing by len(letterBytes) to select a character:
    func RandStringBytesRmndr(n int) string {
        b := make([]byte, n)
        for i := range b {
            b[i] = letterBytes[rand.Int63()%int64(len(letterBytes))]
        }
        return string(b)
    }
This implementation is noticeably faster than the ones above, but it has a small flaw: the characters are not chosen with exactly equal probability. The distortion is extremely small, because the number of characters (52) is far smaller than 1<<63 - 1, so the difference is only theoretical and can be ignored in practice.
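How small the distortion is can be worked out directly. rand.Int63 returns one of 1<<63 possible values; when these are reduced modulo 52, some residues receive one value more than the others. The sketch below computes how many; the function name extraResidues is chosen for this example only.

```go
package main

import "fmt"

// extraResidues returns how many residues in 0..m-1 receive one more
// 63-bit value than the rest when rand.Int63() is reduced modulo m.
func extraResidues(m uint64) uint64 {
	const total = uint64(1) << 63 // rand.Int63 returns values in [0, 1<<63)
	return total % m
}

func main() {
	const total = uint64(1) << 63
	m := uint64(52)
	fmt.Println("values per letter (at least):", total/m)
	fmt.Println("letters that get one extra value:", extraResidues(m))
}
```

For 52 letters, the first 8 letters each get a single extra value on top of roughly 1.77e17 values per letter, a relative difference below 1e-17. With an alphabet whose size is a power of two (for example 64) the remainder is zero and there is no bias at all.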
Mask
From the previous solution we can see that we do not need many bits to pick a letter with a uniform distribution; the lowest few bits of a random integer are enough. For 52 letters, 6 bits suffice (52 = 110100b), so we use only the lowest 6 bits of the value returned by rand.Int63. To keep the distribution uniform, we accept the 6-bit value only if it falls in the range 0..len(letterBytes)-1; otherwise we discard it and draw again. The lowest 6 bits of an integer can be extracted with a mask.
    const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    const (
        letterIdxBits = 6                    // 6 bits to represent a letter index
        letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
    )

    func RandStringBytesMask(n int) string {
        b := make([]byte, n)
        for i := 0; i < n; {
            if idx := int(rand.Int63() & letterIdxMask); idx < len(letterBytes) {
                b[i] = letterBytes[idx]
                i++
            }
        }
        return string(b)
    }
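The number of index bits and the share of draws that survive the range check both follow from the alphabet size. As a small sketch (the helper names idxBits and acceptRate are assumptions for this example):

```go
package main

import (
	"fmt"
	"math/bits"
)

// idxBits returns the number of bits needed to represent indices 0..n-1 (n >= 1).
func idxBits(n int) int {
	return bits.Len(uint(n - 1))
}

// acceptRate returns the fraction of masked values that are valid indices,
// i.e. n divided by the number of values the mask can produce.
func acceptRate(n int) float64 {
	return float64(n) / float64(uint(1)<<uint(idxBits(n)))
}

func main() {
	fmt.Println(idxBits(52))    // 6
	fmt.Println(acceptRate(52)) // 0.8125
}
```

With 52 letters, 52 of the 64 possible 6-bit values are accepted, so about 19% of the draws are thrown away and each output character costs 64/52, roughly 1.23, calls to rand.Int63 on average.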
Mask-enhanced version
The weakness of the previous version is waste: every call to rand.Int63 yields 63 random bits, yet we use only the lowest 6 of them, and when that value is out of range we throw it away and call again. If we cut the 63 bits into 6-bit groups instead, a single call gives us 10 candidate indices (63/6 = 10). This greatly reduces the waste.
    const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    const (
        letterIdxBits = 6                    // 6 bits to represent a letter index
        letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
        letterIdxMax  = 63 / letterIdxBits   // # of letter indices fitting in 63 bits
    )

    func RandStringBytesMaskImpr(n int) string {
        b := make([]byte, n)
        // A rand.Int63() generates 63 random bits, enough for letterIdxMax letters!
        for i, cache, remain := n-1, rand.Int63(), letterIdxMax; i >= 0; {
            if remain == 0 {
                cache, remain = rand.Int63(), letterIdxMax
            }
            if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
                b[i] = letterBytes[idx]
                i--
            }
            cache >>= letterIdxBits
            remain--
        }
        return string(b)
    }
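The mask-and-shift step is easier to follow on a fixed input. The sketch below cuts one 63-bit value into its 10 six-bit groups, lowest bits first, which is the order in which the loop above consumes them. The helper splitIndices and the sample value are made up for this illustration.

```go
package main

import "fmt"

const (
	letterIdxBits = 6
	letterIdxMask = 1<<letterIdxBits - 1
	letterIdxMax  = 63 / letterIdxBits
)

// splitIndices cuts a 63-bit value into letterIdxMax candidate indices,
// lowest bits first. Hypothetical helper, for illustration only.
func splitIndices(v int64) []int {
	out := make([]int, 0, letterIdxMax)
	for i := 0; i < letterIdxMax; i++ {
		out = append(out, int(v&letterIdxMask))
		v >>= letterIdxBits
	}
	return out
}

func main() {
	v := int64(1 | 2<<6 | 60<<12) // three hand-picked 6-bit groups, the rest zero
	fmt.Println(splitIndices(v))  // [1 2 60 0 0 0 0 0 0 0]
}
```

In the real loop the group with value 60 would be skipped, because 60 is not a valid index into the 52 letters; the other groups each produce one character.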
Source
The code above is already quite good. There is little left to improve, and any further gain would cost considerable complexity.
We can optimize from another angle instead: speeding up the source of the random numbers.
The crypto/rand package provides a Read(b []byte) function that fills a slice with as many random bytes as we need, but because it is designed to be cryptographically secure, it generates random numbers more slowly.
So we go back to math/rand. rand.Rand uses a rand.Source to generate its random bits. rand.Source is an interface that provides an Int63() int64 method, which is exactly what we need.
We can therefore use a rand.Source directly instead of the global or a shared random source.
    var src = rand.NewSource(time.Now().UnixNano())

    func RandStringBytesMaskImprSrc(n int) string {
        b := make([]byte, n)
        // A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
        for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
            if remain == 0 {
                cache, remain = src.Int63(), letterIdxMax
            }
            if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
                b[i] = letterBytes[idx]
                i--
            }
            cache >>= letterIdxBits
            remain--
        }
        return string(b)
    }
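The global (default) random source is safe for concurrent use, which it achieves with a lock, so it is slower than using a rand.Source directly. Note that a source returned by rand.NewSource is not safe for concurrent use by multiple goroutines.
The following code from the math/rand package shows the global random source and its use of Lock/Unlock.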
    func Int63() int64 { return globalRand.Int63() }

    var globalRand = New(&lockedSource{src: NewSource(1).(Source64)})

    type lockedSource struct {
        lk  sync.Mutex
        src Source64
    }

    func (r *lockedSource) Int63() (n int64) {
        r.lk.Lock()
        n = r.src.Int63()
        r.lk.Unlock()
        return
    }
Go 1.7 added the rand.Read() function and the Rand.Read() method, so we can try using them to fetch a whole batch of random bytes in one call for higher performance.
One small question is how many random bytes we should read. A first answer is: as many as there are output characters. That is an upper-bound estimate, because a character index needs fewer than 8 bits (one byte).
On the other hand, to keep the character distribution uniform we have to discard some of the random values, which may force us to fetch more random data later. So we can only estimate the number of random bytes needed, at roughly n * letterIdxBits / 8.0.
Of course, the best way to verify any of this is to write a benchmark. The benchmark code is listed in the Benchmark Code section; these are the results:
    BenchmarkRunes                   1000000              1703 ns/op
    BenchmarkBytes                   1000000              1328 ns/op
    BenchmarkBytesRmndr              1000000              1012 ns/op
    BenchmarkBytesMask               1000000              1214 ns/op
    BenchmarkBytesMaskImpr           5000000               395 ns/op
    BenchmarkBytesMaskImprSrc        5000000               303 ns/op
Benchmark Code
Benchmarkrandomstring_test.go
    package main

    import (
        "math/rand"
        "testing"
        "time"
    )

    // Implementations

    func init() {
        rand.Seed(time.Now().UnixNano())
    }

    var letterRunes = []rune("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

    func RandStringRunes(n int) string {
        b := make([]rune, n)
        for i := range b {
            b[i] = letterRunes[rand.Intn(len(letterRunes))]
        }
        return string(b)
    }

    const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    const (
        letterIdxBits = 6                    // 6 bits to represent a letter index
        letterIdxMask = 1<<letterIdxBits - 1 // All 1-bits, as many as letterIdxBits
        letterIdxMax  = 63 / letterIdxBits   // # of letter indices fitting in 63 bits
    )

    func RandStringBytes(n int) string {
        b := make([]byte, n)
        for i := range b {
            b[i] = letterBytes[rand.Intn(len(letterBytes))]
        }
        return string(b)
    }

    func RandStringBytesRmndr(n int) string {
        b := make([]byte, n)
        for i := range b {
            b[i] = letterBytes[rand.Int63()%int64(len(letterBytes))]
        }
        return string(b)
    }

    func RandStringBytesMask(n int) string {
        b := make([]byte, n)
        for i := 0; i < n; {
            if idx := int(rand.Int63() & letterIdxMask); idx < len(letterBytes) {
                b[i] = letterBytes[idx]
                i++
            }
        }
        return string(b)
    }

    func RandStringBytesMaskImpr(n int) string {
        b := make([]byte, n)
        // A rand.Int63() generates 63 random bits, enough for letterIdxMax letters!
        for i, cache, remain := n-1, rand.Int63(), letterIdxMax; i >= 0; {
            if remain == 0 {
                cache, remain = rand.Int63(), letterIdxMax
            }
            if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
                b[i] = letterBytes[idx]
                i--
            }
            cache >>= letterIdxBits
            remain--
        }
        return string(b)
    }

    var src = rand.NewSource(time.Now().UnixNano())

    func RandStringBytesMaskImprSrc(n int) string {
        b := make([]byte, n)
        // A src.Int63() generates 63 random bits, enough for letterIdxMax characters!
        for i, cache, remain := n-1, src.Int63(), letterIdxMax; i >= 0; {
            if remain == 0 {
                cache, remain = src.Int63(), letterIdxMax
            }
            if idx := int(cache & letterIdxMask); idx < len(letterBytes) {
                b[i] = letterBytes[idx]
                i--
            }
            cache >>= letterIdxBits
            remain--
        }
        return string(b)
    }

    // Benchmark functions

    // Length of the generated strings. The value was lost in this copy of the
    // article; 16 is an assumption.
    const n = 16

    func BenchmarkRunes(b *testing.B) {
        for i := 0; i < b.N; i++ {
            RandStringRunes(n)
        }
    }

    func BenchmarkBytes(b *testing.B) {
        for i := 0; i < b.N; i++ {
            RandStringBytes(n)
        }
    }

    func BenchmarkBytesRmndr(b *testing.B) {
        for i := 0; i < b.N; i++ {
            RandStringBytesRmndr(n)
        }
    }

    func BenchmarkBytesMask(b *testing.B) {
        for i := 0; i < b.N; i++ {
            RandStringBytesMask(n)
        }
    }

    func BenchmarkBytesMaskImpr(b *testing.B) {
        for i := 0; i < b.N; i++ {
            RandStringBytesMaskImpr(n)
        }
    }

    func BenchmarkBytesMaskImprSrc(b *testing.B) {
        for i := 0; i < b.N; i++ {
            RandStringBytesMaskImprSrc(n)
        }
    }
Other promotion
In fact, swapping in a faster random number generation algorithm may improve performance further. I implemented a fast random number generator based on the xorshift algorithm and compared it with the previous implementations; it turned out to be faster still.
    BenchmarkRunes-4                         1000000              1396 ns/op
    BenchmarkBytes-4                            2000               799 ns/op
    BenchmarkBytesRmndr-4                    3000000               627 ns/op
    BenchmarkBytesMask-4                     2000000               719 ns/op
    BenchmarkBytesMaskImpr-4                10000000               260 ns/op
    BenchmarkBytesMaskImprSrc-4             10000000               227 ns/op
    BenchmarkBytesMaskImprXorshiftSrc-4     10000000               205 ns/op
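The generator behind the last benchmark row is not shown in the article. As a rough idea of what such a source could look like, here is a minimal sketch of a plain 64-bit xorshift generator (shift triple 13, 7, 17) that satisfies the rand.Source interface. The type name xorshiftSource and the fallback seed constant are assumptions of this example; this is not the author's implementation, and it is not safe for concurrent use or for anything security related.

```go
package main

import (
	"fmt"
	"math/rand"
)

// xorshiftSource is a hypothetical rand.Source built on 64-bit xorshift.
type xorshiftSource struct {
	state uint64
}

// Compile-time check that the type implements rand.Source.
var _ rand.Source = (*xorshiftSource)(nil)

// Seed sets the generator state. Xorshift must never run with a zero
// state, so a fixed non-zero constant is substituted in that case.
func (s *xorshiftSource) Seed(seed int64) {
	s.state = uint64(seed)
	if s.state == 0 {
		s.state = 0x9E3779B97F4A7C15
	}
}

// next advances the state with three shift-and-xor steps.
func (s *xorshiftSource) next() uint64 {
	x := s.state
	x ^= x << 13
	x ^= x >> 7
	x ^= x << 17
	s.state = x
	return x
}

// Int63 returns a non-negative 63-bit value, as rand.Source requires.
func (s *xorshiftSource) Int63() int64 {
	return int64(s.next() >> 1)
}

func main() {
	src := &xorshiftSource{}
	src.Seed(42)
	r := rand.New(src)       // usable anywhere a rand.Source is accepted
	fmt.Println(r.Intn(52))  // a letter index
	fmt.Println(src.Int63()) // or call the source directly
}
```

Such a source can be passed to the masked, cached loop in place of the one returned by rand.NewSource; any speed difference would have to be measured with the benchmark above.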