This article describes an efficient algorithm for generating a non-repeating random sequence, one that is much faster than the commonly used approach of deduplicating with a hashtable.
First, the problem statement:
Given a positive integer n, output an array of length n whose elements are random numbers in the range 0 to n-1, with no element repeated. For example, if n is 3, the result must be an array of length 3 whose elements lie in the range 0 to 2, such as 0, 2, 1.
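The requirement can be restated as a small check. The sketch below is only an illustration; the class `SequenceCheck` and the method `IsValidSequence` are hypothetical names that do not appear in the original article.

```csharp
using System;

public static class SequenceCheck
{
    // Hypothetical helper: returns true when values contains each of
    // 0 .. values.Length - 1 exactly once, in any order.
    public static bool IsValidSequence(int[] values)
    {
        bool[] seen = new bool[values.Length];
        foreach (int value in values)
        {
            // Reject out-of-range numbers and repeats.
            if (value < 0 || value >= values.Length || seen[value])
            {
                return false;
            }
            seen[value] = true;
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsValidSequence(new[] { 0, 2, 1 })); // True
        Console.WriteLine(IsValidSequence(new[] { 0, 2, 2 })); // False
    }
}
```

Any correct solution to the problem should produce arrays that pass this check; it says nothing about how random the order is.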
The usual solution is to keep a hashtable (here simply an array of flags indexed by the number) and draw random numbers in a loop, looking each one up in the hashtable. If the hashtable does not yet contain the number, it is output and recorded; otherwise another number is drawn. The code for this algorithm is given below:
public static int[] GetRandomSequence0(int total)
{
    // hashtable[k] > 0 means the number k has already been output.
    int[] hashtable = new int[total];
    int[] output = new int[total];

    Random random = new Random();
    for (int i = 0; i < total; i++)
    {
        // Keep drawing until we hit a number that has not been used yet.
        int num = random.Next(0, total);
        while (hashtable[num] > 0)
        {
            num = random.Next(0, total);
        }

        output[i] = num;
        hashtable[num] = 1;
    }

    return output;
}
The code is very simple: draw a random number from 0 to total-1 and look it up in hashtable. If the number is not there, output it and set its entry in hashtable to 1; otherwise keep drawing until a number not yet in hashtable turns up. The problem with this algorithm is the repeated drawing. The fuller hashtable becomes, the more likely each attempt is to fail: when only k numbers are still unused, a draw succeeds with probability k/total, so the last few numbers take on the order of total draws each.
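To see how the retries add up, the following sketch runs the same retry loop but only counts how many random numbers are drawn. It is an illustration under assumptions: `RetryCounter`, `CountDraws`, and the fixed seed are hypothetical and not part of the original code.

```csharp
using System;

public static class RetryCounter
{
    // Hypothetical helper: repeats the retry loop of GetRandomSequence0
    // and returns the total number of calls to random.Next.
    public static long CountDraws(int total, int seed)
    {
        bool[] used = new bool[total];
        Random random = new Random(seed); // seed chosen only for repeatability
        long draws = 0;

        for (int i = 0; i < total; i++)
        {
            int num;
            do
            {
                num = random.Next(0, total);
                draws++;
            } while (used[num]); // retry while the number was already output

            used[num] = true;
        }

        return draws;
    }

    public static void Main()
    {
        int total = 100000;
        long draws = CountDraws(total, 12345);
        Console.WriteLine("draws = " + draws
            + ", per element = " + (double)draws / total);
    }
}
```

On average the count is about total × ln(total), so for total = 100000 it should land somewhere around 1.2 million draws rather than 100000; the exact figure depends on the seed.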
Is there an algorithm that needs no retries? The answer is yes.
We start from an ordered array. Assume n = 4, so the array is 0, 1, 2, 3.
In the first round, we draw a random number between 0 and 3; assume it is 2. We output the number at position 2 of the array, which is 2, and delete it from the array, so the array becomes 0, 1, 3.
In the second round, we draw a random number between 0 and 2; assume it is 1. We output the number at that position, which is 1, delete it from the array, and so on until the length of the array is 0. The result is a random non-repeating sequence.
The advantage of this algorithm is that it needs no hashtable to record the numbers already drawn and never has to retry. The code is as follows:
public static int[] GetRandomSequence1(int total)
{
    // Ordered input: 0, 1, ..., total - 1.
    List<int> input = new List<int>();
    for (int i = 0; i < total; i++)
    {
        input.Add(i);
    }

    List<int> output = new List<int>();

    Random random = new Random();
    int end = total; // number of elements still left in input
    for (int i = 0; i < total; i++)
    {
        // Pick a random position among the remaining elements,
        // output that element, then delete it from input.
        int num = random.Next(0, end);
        output.Add(input[num]);
        input.RemoveAt(num);
        end--;
    }

    return output.ToArray();
}
This algorithm replaces the nested loops with a single loop, which looks like a large reduction in complexity, so it should be faster than the first algorithm. Reality says otherwise: in a test with total = 100000, the first algorithm took 44 ms and the second took 1038 ms, far slower! Why?
The key lies in input.RemoveAt. Deleting an element from an array means moving every element behind it forward by one position, which is very time-consuming. Each removal costs time proportional to the number of elements after the deleted one, so the single loop hides a second, inner loop. This is where the algorithm is slow.
Some may ask: why not use a linked list instead of an array, since deleting from a linked list is fast? It is true that a linked list solves the cost of deleting elements, but lookup becomes much slower, because an element can no longer be located directly by its index as in an array. Therefore a linked list cannot be used either.
It seems we have reached a dead end. Are repeated attempts with a hashtable the only option? Before reading on, please think about it for five minutes.
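The shifting cost of RemoveAt can be estimated without timing anything. The sketch below replays only the index choices of GetRandomSequence1 and adds up how many elements each removal would have to move. `ShiftCounter`, `CountShifts`, and the seed are hypothetical names introduced for illustration.

```csharp
using System;

public static class ShiftCounter
{
    // Hypothetical helper: simulates the random positions chosen by
    // GetRandomSequence1 and sums the elements RemoveAt would shift,
    // without building the list itself.
    public static long CountShifts(int total, int seed)
    {
        Random random = new Random(seed); // seed chosen only for repeatability
        long shifts = 0;

        for (int end = total; end > 0; end--)
        {
            int num = random.Next(0, end);
            // Removing position num from a list of length end moves
            // the end - num - 1 elements that sit behind it.
            shifts += end - num - 1;
        }

        return shifts;
    }

    public static void Main()
    {
        int total = 100000;
        Console.WriteLine("elements shifted = " + CountShifts(total, 12345));
    }
}
```

On average each removal shifts about half of the remaining elements, so the total is close to total × total / 4. For total = 100000 that is on the order of 2.5 billion element moves, compared with only 100000 random draws.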