Source
https://blogs.unity3d.com/cn/2015/01/07/aprimeronrepeatablerandomnumbers/(English version)
Http://www.manew.com/thread3714411.html
No matter what kind of procedural content you create, you can hardly do without random numbers. And if you want to produce the same result more than once, those random numbers need to be repeatable.
In this article we use level or world generation as the example, but the principles apply equally to other things such as procedural textures, models, music, and so on. They are not, however, meant for applications with very strict requirements, such as cryptography.
Why do you want to produce the same results multiple times?
 To be able to revisit the same level or world. For example, a level or world can be generated deterministically from a specific seed. Reusing the same seed produces the same level or world again. Minecraft, for example, works this way.
 To generate a persistent world on the fly. If the world is generated dynamically as the player moves around, you will want a location to look the same when the player comes back to it as it did the first time (as in Minecraft, No Man's Sky, and others), rather than coming out different each time.
 To have the same world for all players. You may want the world to be the same for everyone who plays the game, just as if it had not been procedurally generated. No Man's Sky is an example. This is essentially the same as being able to revisit the same level or world, except that the same seed is always used.
We have used the word "seed" several times now. A seed can be a number, a text string, or some other piece of data that is used as input to obtain a random output. The defining trait of a seed is that the same seed always produces the same output, while a slight change in the seed can produce a completely different output.
In this article we look at two ways of generating random numbers, random number generators and random hash functions, and at the reasons for choosing one or the other. As far as I know this knowledge is not widespread and is not collected anywhere else, which is why I am writing it down here.
Random number generator
The most common way to generate random numbers is with a random number generator (RNG). Many programming languages include RNG classes or functions, and they have "random" in their names, so they are the obvious first thing to reach for when you start using random numbers.
A random number generator produces a sequence of random numbers from an initial seed. In object-oriented languages, the generator is typically an object that is initialized with a seed. A method on that object is then called repeatedly to produce random numbers.
In C#, generating random numbers looks like this:
    Random randomSequence = new Random(12345);
    int randomNumber1 = randomSequence.Next();
    int randomNumber2 = randomSequence.Next();
    int randomNumber3 = randomSequence.Next();

In this case we get random integers between 0 and 2147483647 (the maximum value of an int; Next() itself never returns the maximum). We could also ask for integers in a specified range, or for floating-point numbers between 0 and 1, and so on. Most implementations provide methods for this.
The image here shows the first 65,536 random numbers generated by the Random class in C# from the seed 0. Each random number is represented by a pixel with a brightness between 0 (black) and 1 (white).
It is important to understand that you cannot get the third random number without first getting the first and the second. This is not just an oversight in the implementation. By its very nature, an RNG uses each number it generates as part of the calculation of the next one. That is why we speak of a random sequence.
This means that an RNG serves you well if you want a run of random numbers one after another, but if you want one specific random number (say, the 26th number of the sequence), you are out of luck. You could of course call Next() 26 times and use the last result, but that is a very bad idea.
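To make the sequential nature concrete, here is a minimal sketch using System.Random. The helper name NthNumber is hypothetical and not part of any library; it reaches the n-th number the only way an RNG allows, by stepping through every number before it.

```csharp
using System;

public static class SequenceDemo
{
    // Hypothetical helper: returns the n-th number (1-based, n >= 1) of the
    // sequence for a given seed by generating all the numbers before it.
    public static int NthNumber(int seed, int n)
    {
        var rng = new Random(seed);
        int value = 0;
        for (int i = 0; i < n; i++)
        {
            value = rng.Next(); // every call advances the internal state
        }
        return value;
    }

    public static void Main()
    {
        // Same seed and same position always give the same value.
        Console.WriteLine(NthNumber(12345, 26) == NthNumber(12345, 26)); // True
    }
}
```

The cost of NthNumber grows with n, which is why this approach does not scale to querying numbers far into a sequence.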
Why do you want to get a particular random number in a sequence?
If you generate everything in one go, you probably do not need specific numbers from a sequence, or at least I cannot think of a reason. If you generate things bit by bit on the fly, however, you do.
For example, say your world has three regions: A, B, and C. The player starts in region A, so region A is generated using 100 random numbers. The player then moves on to region B, which is generated using 100 different numbers. At the same time, the previously generated region A is destroyed to free memory. Then the player moves on to region C, which uses another 100 numbers, and region B is released.
However, if the player now goes back to region B, it has to be generated with the same 100 random numbers that were used the first time, so that the region looks the same as before.
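The problem in this scenario can be sketched in a few lines. GenerateRegion is a hypothetical stand-in for real region generation; it simply records the numbers it draws from a shared System.Random so the two visits to region B can be compared.

```csharp
using System;
using System.Linq;

public static class RegionDemo
{
    // Hypothetical stand-in for region generation: draws `count` numbers
    // from wherever the shared generator currently is in its sequence.
    public static int[] GenerateRegion(Random rng, int count)
    {
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = rng.Next();
        }
        return values;
    }

    public static void Main()
    {
        var rng = new Random(12345);
        int[] regionA = GenerateRegion(rng, 100);          // numbers 1 to 100
        int[] regionB = GenerateRegion(rng, 100);          // numbers 101 to 200
        int[] regionC = GenerateRegion(rng, 100);          // numbers 201 to 300
        int[] regionBRevisited = GenerateRegion(rng, 100); // numbers 301 to 400

        // The revisited region B is built from a later stretch of the
        // sequence, so it does not reproduce the original region B.
        Console.WriteLine(regionB.SequenceEqual(regionBRevisited));
    }
}
```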
Can't we just use random number generators with different seeds?
No. This is a very common misconception about RNGs. While the numbers within one sequence are random in relation to each other, the numbers at the same index in different sequences are not.
So if you take the first number from each of 100 sequences, those numbers are not random in relation to each other, and the same goes for the 10th, 100th, or 1000th number of each sequence.
Some people will be skeptical at this point, and that is fine. If you find it more trustworthy, you can also read the Stack Overflow discussion about RNGs for procedural content. But to make this article a little more interesting and useful, let's run some experiments and look at the results.
For reference we look at the numbers generated within a single sequence, and compare them with the numbers obtained by taking the first number from each of 65,536 sequences created from the seeds 0 to 65535.
Although the pattern is fairly evenly distributed, it is not quite random. In fact, I have included the output of a purely linear function for comparison, and it is apparent that numbers taken from consecutive seeds are barely better than simply using a linear function.
Still, is it almost random? Is it good enough?
At this point it is a good idea to introduce better ways of measuring randomness, since the naked eye is not very reliable. Why not? Isn't it enough that the output looks random?
Yes, in the end our goal is simply that things look sufficiently random. But random output can look very different depending on how it is used. Your generation algorithms may transform the random values in all kinds of ways, and those transformations can reveal clear patterns that stay hidden when you only inspect the values listed as a simple sequence.
Another way to inspect random output is to create 2D coordinates from pairs of random numbers and plot those coordinates into an image. The more coordinates land on the same pixel, the brighter that pixel becomes.
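A minimal sketch of such a coordinate plot, under some assumptions: consecutive values are paired as (x, y), values are non-negative ints below int.MaxValue as returned by Random.Next(), and each value is scaled linearly to the image size. The class and method names are hypothetical, and the way the original plots mapped values to pixels is not stated in the article.

```csharp
using System;

public static class CoordinatePlot
{
    // Counts how many coordinates land on each pixel of a size x size image.
    // Assumes every value is in [0, int.MaxValue).
    public static int[,] Plot(int[] values, int size)
    {
        var counts = new int[size, size];
        for (int i = 0; i + 1 < values.Length; i += 2)
        {
            int x = (int)((long)values[i] * size / int.MaxValue);
            int y = (int)((long)values[i + 1] * size / int.MaxValue);
            counts[x, y]++; // more hits means a brighter pixel
        }
        return counts;
    }

    // Values taken in order from one sequence with a single seed.
    public static int[] FromSingleSequence(int seed, int count)
    {
        var rng = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++) values[i] = rng.Next();
        return values;
    }

    // The first value of each of `count` sequences seeded 0, 1, 2, ...
    public static int[] FirstOfEachSeed(int count)
    {
        var values = new int[count];
        for (int seed = 0; seed < count; seed++) values[seed] = new Random(seed).Next();
        return values;
    }

    // Number of pixels that received at least one coordinate.
    public static int LitPixels(int[,] counts)
    {
        int lit = 0;
        foreach (int c in counts) if (c > 0) lit++;
        return lit;
    }

    public static void Main()
    {
        const int size = 256;
        const int count = 65536;
        Console.WriteLine("Single sequence: " + LitPixels(Plot(FromSingleSequence(0, count), size)));
        Console.WriteLine("One number per seed: " + LitPixels(Plot(FirstOfEachSeed(count), size)));
    }
}
```

Counting lit pixels is only a crude summary; writing the counts out as an image gives the kind of picture discussed in the text.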
Let's look at such a coordinate plot both for numbers from a single sequence and for numbers taken from different sequences with different seeds. A plot of a linear function is included for comparison.
Perhaps surprisingly, when coordinates are created from numbers of differently seeded sequences, they all land on a thin line instead of being anywhere near uniformly distributed. Again, this is just like a linear function.
Imagine you had created coordinates from such random numbers in order to plant trees on a terrain. All your trees would stand in one straight line, with the rest of the terrain left empty.
We can conclude that random number generators are only useful if you do not need to access the numbers in a specific order. If you do, you may want to look into random hash functions.
Random hash function
In general, a hash function is any function that maps data of arbitrary size to data of a fixed size, where a small change in the input produces a very different output.
For procedural generation, the typical use case is to provide one or more integers as input and get a random number as output. For example, in a large world where only part of it is generated at a time, a typical need is a random number associated with an input vector (such as a position in the world), and that random number must stay the same whenever the input is the same. Unlike a random number generator (RNG), there is no sequence: you can get the random numbers in any order you like.
Sample code in C# might look like this (note that the random numbers can be requested in any order):
    RandomHash randomHashObject = new RandomHash(12345);
    int randomNumber2 = randomHashObject.GetHash(2);
    int randomNumber3 = randomHashObject.GetHash(3);
    int randomNumber1 = randomHashObject.GetHash(1);

A hash function can also receive multiple inputs, which means you can get random numbers at a given 2D or 3D coordinate:
    RandomHash randomHashObject = new RandomHash(12345);
    randomNumberGrid[20, 40] = randomHashObject.GetHash(20, 40);
    randomNumberGrid[21, 40] = randomHashObject.GetHash(21, 40);
    randomNumberGrid[20, 41] = randomHashObject.GetHash(20, 41);
    randomNumberGrid[21, 41] = randomHashObject.GetHash(21, 41);
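The RandomHash class in these snippets is not a built-in .NET type. One possible shape for it is sketched here, with the constructor and GetHash signatures inferred from the snippets. The mixing steps are modeled on the short-input path of xxHash32, but the output has not been checked against a reference implementation, so treat it as an illustration and not as the article's actual code.

```csharp
using System;

// Hypothetical sketch of a seeded random hash with one and two int inputs.
public class RandomHash
{
    private const uint Prime2 = 2246822519u;
    private const uint Prime3 = 3266489917u;
    private const uint Prime4 = 668265263u;
    private const uint Prime5 = 374761393u;

    private readonly uint seed;

    public RandomHash(int seed)
    {
        this.seed = unchecked((uint)seed);
    }

    // Hash of a single integer. The result can be negative.
    public int GetHash(int x)
    {
        unchecked
        {
            uint h = seed + Prime5 + 4u; // 4 = input length in bytes
            h = MixIn(h, (uint)x);
            return (int)Finish(h);
        }
    }

    // Hash of two integers, for example a 2D grid coordinate.
    public int GetHash(int x, int y)
    {
        unchecked
        {
            uint h = seed + Prime5 + 8u; // 8 = input length in bytes
            h = MixIn(h, (uint)x);
            h = MixIn(h, (uint)y);
            return (int)Finish(h);
        }
    }

    private static uint RotateLeft(uint v, int bits)
    {
        return (v << bits) | (v >> (32 - bits));
    }

    // Folds one 32-bit input word into the running hash.
    private static uint MixIn(uint h, uint input)
    {
        unchecked
        {
            h += input * Prime3;
            return RotateLeft(h, 17) * Prime4;
        }
    }

    // Final avalanche so that every input bit affects every output bit.
    private static uint Finish(uint h)
    {
        unchecked
        {
            h ^= h >> 15; h *= Prime2;
            h ^= h >> 13; h *= Prime3;
            h ^= h >> 16;
            return h;
        }
    }
}

public static class RandomHashDemo
{
    public static void Main()
    {
        var hash = new RandomHash(12345);
        int first = hash.GetHash(2);
        hash.GetHash(3);
        hash.GetHash(1);
        // Order of queries does not matter: asking again gives the same value.
        Console.WriteLine(first == hash.GetHash(2)); // True
    }
}
```

Because the object holds no state other than the seed, a query never affects later queries, which is what makes arbitrary access order possible.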

Procedural generation is not the typical use of hash functions, and not all hash functions are suited for it, because they may not be sufficiently random or may be unnecessarily expensive.
One use of hash functions is in the implementation of data structures such as hash tables and dictionaries. These hash functions are usually fast but not sufficiently random, because they were designed to make the algorithm efficient, not to be random. In theory they should be random too, but in practice I have not found resources that compare their randomness, and the ones I tested turned out to have rather poor randomness (see Appendix C for details).
Another use of hash functions is cryptography. These are usually very random but slow, because the requirements for cryptographic strength go far beyond merely looking random.
Our goal for procedural generation is a hash function that looks random but is also efficient, meaning it is no slower than it needs to be. Chances are that your programming language does not have a suitable one built in, and that you will need to find one to include in your project.
I have tested several hash functions based on recommendations and information found online. I selected the following three for comparison.
 PcgHash: I got this hash function from Adam Smith in a discussion on the Google Groups forum on Procedural Content Generation. He suggested, with a few tips, that it is not that hard to create your own random hash function, and offered his PcgHash code snippet as an example.
 MD5: This is probably one of the best-known hash functions. It is of cryptographic strength and is more expensive than it needs to be for our purposes. On top of that, we usually only need a 32-bit integer as the return value, whereas MD5 returns a much larger hash, most of which we would simply throw away. It is nevertheless worth including for comparison.
 xxHash: This is a high-performance, non-cryptographic hash function that meets our needs for both good randomness and good performance.
Besides the generated noise sequence images and coordinate plots, I also tested with a randomness testing tool called ENT, a pseudorandom number sequence test program. I included selected ENT statistics in the images, as well as a statistic I devised myself which I call the Diagonals Deviation. The latter looks at the sums of the pixels along the diagonal lines of the coordinate plot and measures the standard deviation of those sums.
Here are the results of the comparison of the above 3 hash functions:
PcgHash stands out: although it looks very random in the noise image of sequential random values, the coordinate plot reveals clear patterns, which means it does not hold up under simple transformations. I conclude from this that writing your own random hash function is hard and is probably best left to the experts.
MD5 and xxHash appear to be on a par in terms of randomness, while xxHash is about 50 times faster. Another advantage of xxHash is that, although it is not an RNG, it still has the concept of a seed, which is not the case for all hash functions. Being able to specify a seed is a clear advantage for procedural generation, because you can use different seeds for different properties of entities, grid cells, or similar, and then simply use the entity index or cell coordinate as the input to the hash function. Crucially, with xxHash the numbers from differently seeded sequences are random in relation to each other (see Appendix B for details).
Optimizing hash functions for procedural generation
In my investigation of hash functions it became clear that, while it is good to choose a hash function that performs well in general-purpose hash benchmarks, it is crucial for performance to optimize it for the needs of procedural generation instead of using the hash function as is.
There are two important optimizations:
 Avoid conversions between integers and bytes. Most general-purpose hash functions take a byte array as input and return an integer or some bytes as the hash value. However, some of the high-performance ones convert the input bytes to integers because they operate on integers internally. Since the most common need in procedural generation is a hash value based on integer input, converting to bytes is completely pointless. Removing the dependency on bytes can roughly double performance while leaving the output exactly the same.
 Implement methods without loops that take only one or a few inputs. Most general-purpose hash functions take input data of variable length in the form of an array. That is useful for procedural generation too, but the most common case is probably a hash based on just 1, 2, or 3 integers. Specialized methods that take a fixed number of integers instead of an array need no loop, which can improve performance significantly (several times faster in my tests). I am no expert on low-level optimization, but the dramatic difference may be caused by implicit branching in the for loop or by the need to allocate an array.
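The two optimizations can be illustrated side by side. The sketch below contrasts a general-purpose shape (byte array in, loop over 4-byte words) with a specialized shape (one int in, no array, no loop). Both use the same xxHash32-style mixing steps, which is an assumption made for illustration; the names are hypothetical, and HashBytes only handles inputs shorter than 16 bytes whose length is a multiple of 4.

```csharp
using System;

public static class HashOptimizationDemo
{
    private const uint Prime2 = 2246822519u;
    private const uint Prime3 = 3266489917u;
    private const uint Prime4 = 668265263u;
    private const uint Prime5 = 374761393u;

    // General-purpose shape: the caller converts ints to bytes, and the
    // function converts them back to 32-bit words inside a loop.
    public static uint HashBytes(uint seed, byte[] data)
    {
        unchecked
        {
            uint h = seed + Prime5 + (uint)data.Length;
            for (int i = 0; i + 4 <= data.Length; i += 4)
            {
                uint word = BitConverter.ToUInt32(data, i);
                h += word * Prime3;
                h = RotateLeft(h, 17) * Prime4;
            }
            return Finish(h);
        }
    }

    // Specialized shape: a single int input, no byte conversion, no loop.
    public static uint HashInt(uint seed, int value)
    {
        unchecked
        {
            uint h = seed + Prime5 + 4u;
            h += (uint)value * Prime3;
            h = RotateLeft(h, 17) * Prime4;
            return Finish(h);
        }
    }

    private static uint RotateLeft(uint v, int bits)
    {
        return (v << bits) | (v >> (32 - bits));
    }

    private static uint Finish(uint h)
    {
        unchecked
        {
            h ^= h >> 15; h *= Prime2;
            h ^= h >> 13; h *= Prime3;
            h ^= h >> 16;
            return h;
        }
    }

    public static void Main()
    {
        // Both shapes compute the same hash; only the overhead differs.
        uint viaBytes = HashBytes(12345u, BitConverter.GetBytes(42));
        uint viaInt = HashInt(12345u, 42);
        Console.WriteLine(viaBytes == viaInt); // True
    }
}
```

The specialized method allocates nothing per call, whereas the byte-array path needs BitConverter.GetBytes to allocate a new array for every input.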
The hash function I currently recommend is xxHash in a version optimized for procedural generation; see Appendix C for more details.
You can get my implementations of xxHash and other hash functions on BitBucket. They are maintained by me privately in my spare time and do not belong to Unity Technologies.
Besides the optimizations, I also added extra methods for getting the output as an integer or as a floating-point number within a specified range, which are typical needs in procedural generation.
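What such range methods might look like is sketched here. These helpers are hypothetical and are not taken from the author's repository; they take an already computed 32-bit hash value and map it into a range by simple scaling.

```csharp
using System;

public static class HashRange
{
    // Maps a 32-bit hash to a float in [0, 1). Only the top 24 bits are
    // used, because a float can represent 24-bit integers exactly.
    public static float Value01(uint hash)
    {
        return (hash >> 8) * (1.0f / 16777216.0f);
    }

    // Maps a hash to an int in [min, max) by scaling. Assumes min < max.
    public static int RangeInt(uint hash, int min, int max)
    {
        ulong span = (ulong)((long)max - min);
        long offset = (long)(((ulong)hash * span) >> 32); // in [0, span)
        return (int)(min + offset);
    }

    // Maps a hash to a float between min and max by scaling Value01.
    public static float RangeFloat(uint hash, float min, float max)
    {
        return min + Value01(hash) * (max - min);
    }

    public static void Main()
    {
        uint hash = 0x9E3779B9u; // any 32-bit hash value
        Console.WriteLine(RangeInt(hash, 1, 7));       // a die roll from 1 to 6
        Console.WriteLine(RangeFloat(hash, -1f, 1f));
    }
}
```

Scaling with a multiply and shift uses the high bits of the hash instead of the low bits that a modulo would use; the distribution is only exactly uniform when the span divides 2^32.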
Note: At the time of writing I have only added the single-integer input optimization to xxHash and MurmurHash3. I will add optimized overloads for two and three integer inputs when I find the time.
Combining hash functions and RNGs
Random hash functions and random number generators can also be combined. A sensible approach is to use random number generators with different seeds, where the seeds have first been passed through a random hash function instead of being used directly.
Imagine a huge, practically infinite maze world. The world is a large grid, and each grid cell contains a maze. As the player moves around the world, the mazes in the grid cells near the player are generated on the fly.
In this case you want each maze to be generated the same way every time it is visited, so the random numbers it needs must be independent of any numbers that were generated earlier.
However, a maze is always generated as a whole in one go, so there is no need to control the order of the individual random numbers used for one maze.
The ideal approach here is to use a random hash function to create a seed from the coordinates of the maze's grid cell, and then use that seed for a random number generator that produces the sequence of random numbers for the maze.
The code in C# looks like this:
    RandomHash randomHashObject = new RandomHash(12345);
    int mazeSeed = randomHashObject.GetHash(cellCoord.x, cellCoord.y);

    Random randomSequence = new Random(mazeSeed);
    int randomNumber1 = randomSequence.Next();
    int randomNumber2 = randomSequence.Next();
    int randomNumber3 = randomSequence.Next();

Conclusion
If you need control over the order in which random numbers are queried, use a random hash function (such as xxHash) in an implementation optimized for procedural generation.
If you just need a bunch of random numbers and the order does not matter, the simplest way is to use a random number generator, such as the System.Random class in C#. For all the numbers to be random in relation to each other, either use only a single sequence (initialized with one seed), or, if you use multiple seeds, pass them through a random hash function (such as xxHash) first.
The source code for the randomness tests referred to in this article, as well as for a number of RNGs and hash functions, is available on BitBucket. I maintain it privately in my spare time, and it has nothing to do with Unity Technologies.
This article was originally published on the runevision blog, a blog about game development and the research I do in my spare time.
Appendix A: A note on continuous noise
For some purposes you want to query noise that is continuous, meaning that nearby input values produce nearby output values. Typical uses are terrain and textures.
These requirements are quite different from the ones discussed in this article. For continuous noise, look into Perlin noise or the more advanced Simplex noise.
Be aware, however, that these are only suitable for continuous noise. Querying a continuous noise function just to get random numbers that are unrelated to each other gives poor results, because that is not what these algorithms are optimized for. For example, I found that querying a Simplex noise function at integer positions returns 0 for every third input.
In addition, continuous noise functions usually compute with floating-point numbers, whose stability and precision get worse the farther you move from the origin.
Appendix B: More test results for seeds and input values
I have heard various misconceptions over the years, and I will try to address a few more of them here.
Isn't it best to use large numbers as seeds?
No, nothing indicates that. Looking at the test images throughout this article, there is no difference between the results for low and high seed values.
Don't random number generators need a few numbers to "warm up"?
No. Again, looking at the test images, the sequences of random values follow the same pattern from start to end (starting in the upper left corner and proceeding one line at a time).
The images here show the 0th number and the 100th number taken from each of the 65,535 sequences I tested. As you can see, there is no significant difference in the quality of the randomness between the two.
Aren't some RNGs, such as the one in Java, better at producing random numbers across sequences with different seeds?
Perhaps a tiny bit better, but not nearly enough. Unlike the Random class in C#, the Random class in Java does not use the provided seed as is, but scrambles its bits before storing it.
The numbers from different sequences may look slightly more random, and the test statistics show that the serial correlation is much better. However, the coordinate plot makes it clear that the numbers still collapse onto a single line when used as coordinates.
That said, there is no reason why an RNG could not apply a high-quality random hash function to the seed before using it. In fact it seems like a very good idea, with no downsides that I can think of. It is just that the popular RNG implementations I know of do not do it, so you have to do it yourself as described earlier.
Why is it fine to use different seeds with random hash functions?
There is no intrinsic reason, but hash functions such as xxHash and MurmurHash3 treat the seed much like the rest of the input, which means they essentially apply a high-quality random hash function to the seed. Because they are implemented this way, it is safe to use the Nth number from differently seeded random hash objects.
Appendix C: More comparisons of hash functions
In the first version of this article I compared PcgHash, MD5, and MurmurHash3, and recommended MurmurHash3.
MurmurHash3 has excellent randomness and speed. Its author also provides a framework called SMHasher for testing hash functions, which has become widely used.
There is also a good Stack Overflow question about hash functions for uniqueness and speed, in which many more hash functions are compared and which paints an equally favorable picture of MurmurHash.
After publishing the article, I received a recommendation from Aras Pranckevičius to look into xxHash, and one from Nathan Reed to look into Wang Hash.
xxHash beats MurmurHash on its own turf: it scores just as high on quality in the SMHasher tests while performing significantly better. For more information, see the xxHash page on Google Code.
In my initial implementation, with the byte conversions removed, it was slightly faster than MurmurHash3, though not by nearly as much as the SMHasher results suggest.
I also implemented WangHash. Its quality proved insufficient, since it shows obvious patterns in the coordinate plot, but it is 5 times faster than xxHash. I then tried a "WangDoubleHash", where the result of the hash is fed into the hash again; it showed good quality in my tests while still being 3 times faster than xxHash.
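For reference, here is a sketch of the widely circulated 32-bit integer hash attributed to Thomas Wang, together with a doubled variant that feeds the result back into the same function. Whether this matches the author's WangHash and WangDoubleHash implementations bit for bit is an assumption.

```csharp
using System;

public static class WangHashDemo
{
    // Commonly circulated 32-bit integer hash attributed to Thomas Wang.
    public static uint WangHash(uint key)
    {
        unchecked
        {
            key = (key ^ 61u) ^ (key >> 16);
            key *= 9u;
            key ^= key >> 4;
            key *= 0x27d4eb2du;
            key ^= key >> 15;
            return key;
        }
    }

    // Doubled variant: the output of one pass is the input of the next.
    public static uint WangDoubleHash(uint key)
    {
        return WangHash(WangHash(key));
    }

    public static void Main()
    {
        for (uint i = 0; i < 4; i++)
        {
            Console.WriteLine(i + " -> " + WangHash(i) + " / " + WangDoubleHash(i));
        }
    }
}
```

Note that the function takes exactly one integer and has no separate seed parameter.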
However, since WangHash (and WangDoubleHash) take only a single integer as input, I decided to implement single-input versions of xxHash and MurmurHash3 as well, to see how that would affect performance. It turned out to improve performance dramatically (several times faster). So much so that xxHash ended up faster than WangDoubleHash.
As for quality, my own test framework reveals fairly obvious flaws, but it is not nearly as sophisticated as the SMHasher test framework, so a high score there is a better seal of quality for a hash function's randomness than merely looking fine in my tests. In general I would say that passing the tests in my framework is sufficient for procedural generation, but since xxHash (in its optimized version) is the fastest hash function that passes my tests anyway, using it is a no-brainer.
There are a great many hash functions out there, and it would always be possible to include more in the comparison. I have focused on some of the most widely recognized ones in terms of randomness quality and performance, and then optimized them further for procedural generation.
I think the results with this version of xxHash are close to optimal, and further gains from finding and using something better are likely to be small. That said, feel free to extend the test framework with more implementations.