The shuffle method below claims to shuffle its input array fairly. In other words, assuming that the underlying pseudorandom number generator is fair, it claims to generate every permutation of the array with equal probability. Does it really fulfill its promise? If not, how would you fix it?

`import java.util.Random;public class Shuffle { private static Random rnd = new Random(); public static void shuffle(Object[] a) { for(int i = 0; i < a.length; i++) swap(a, i, rnd.nextInt(a.length)); } private static void swap(Object[] a, int i, int j) { Object tmp = a[i]; a[i] = a[j]; a[j] = tmp; }}`

Looking at this shuffle method, there is nothing obviously wrong with it. It iterates over the entire array and swaps a randomly chosen element into each position. That should shuffle the array fairly, right? Wrong. There is a difference between saying "this code obviously has no errors" and "this code has no obvious errors." Here there is a serious error, but it is not obvious unless you specialize in algorithms.

If you call the shuffle method on an array of length n, the loop iterates n times. In each iteration, the method chooses one of the n integers from 0 to n-1. Therefore, there are n^n distinct executions of the method. We assumed that the random number generator is fair, so each execution occurs with equal probability. Each execution generates one permutation of the array. There is, however, a small problem: an array of length n has only n! distinct permutations. (The exclamation point after n indicates the factorial operation: n factorial is defined as n × (n-1) × (n-2) × ... × 1.) The problem is that for any n greater than 2, n^n is not divisible by n!, because n! is divisible by every prime from 2 to n, whereas n^n is divisible only by the primes that divide n. (For example, any prime factor of n-1 divides n! but cannot divide n.) This proves beyond a doubt that the shuffle method generates some permutations more often than others.

To make this concrete, consider an array of length 3 containing the strings "A", "B", and "C". In this case, the shuffle method has 3^3 = 27 distinct executions. They are all equally likely, and each one generates some permutation. The array has 3! = 6 distinct permutations: {"A", "B", "C"}, {"A", "C", "B"}, {"B", "A", "C"}, {"B", "C", "A"}, {"C", "A", "B"}, and {"C", "B", "A"}. Because 27 is not divisible by 6, some permutations must be generated by more executions than others, so the shuffle method is not fair.
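The counting argument can be checked by brute force. The sketch below is not part of the original puzzle: the class name `BrokenShuffleCount` and its methods are hypothetical, and the three loop indices stand in for the three values that `rnd.nextInt(3)` would return.

```java
import java.util.Map;
import java.util.TreeMap;

class BrokenShuffleCount {
    // Runs the broken shuffle on {"A", "B", "C"}, using the given swap
    // targets in place of the three calls to rnd.nextInt(3).
    static String run(int j0, int j1, int j2) {
        String[] a = {"A", "B", "C"};
        int[] targets = {j0, j1, j2};
        for (int i = 0; i < a.length; i++) {
            String tmp = a[i];
            a[i] = a[targets[i]];
            a[targets[i]] = tmp;
        }
        return String.join("", a);
    }

    // Counts how many of the 27 equally likely executions produce each permutation.
    static Map<String, Integer> counts() {
        Map<String, Integer> counts = new TreeMap<>();
        for (int j0 = 0; j0 < 3; j0++)
            for (int j1 = 0; j1 < 3; j1++)
                for (int j2 = 0; j2 < 3; j2++)
                    counts.merge(run(j0, j1, j2), 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(counts());
    }
}
```

Under these assumptions, three of the six permutations are produced by 4 executions each and the other three by 5 executions each, so the outcomes cannot be equally likely.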

One problem with this proof is that it shows only that the shuffle method is biased; it offers no intuition for the bias. Sometimes the best way to gain insight is to experiment. We ran the method repeatedly on the identity array (that is, an array a that satisfies a[i] = i) and had the test program compute the expected value of the element at each position. Loosely speaking, this expected value is the average of all the values you see at a given position when you run the shuffle method repeatedly. If the shuffle method were fair, the expected value would be the same at every position: (n-1)/2. Figure 10.1 shows the expected value of each element in an array of length 9. Note the distinctive shape of the graph: it starts relatively low, rises above the fair value (4), and then drops back to the fair value at the last element.
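The expected values can also be computed exactly rather than estimated by repeated runs. The following is a sketch of that alternative, not the test program described above. It relies on the swap target being chosen independently of the array contents, so the expected values can be updated one loop iteration at a time. The class and method names are hypothetical.

```java
class BrokenShuffleExpectation {
    // Returns the exact expected value at each position after the broken
    // shuffle is applied to the identity array of length n.
    static double[] expectedValues(int n) {
        double[] e = new double[n];
        for (int k = 0; k < n; k++)
            e[k] = k; // identity array: a[k] = k

        for (int i = 0; i < n; i++) {
            // Position i receives a uniformly chosen element, so its
            // expected value becomes the mean over all positions.
            double mean = 0;
            for (double v : e)
                mean += v / n;
            double oldAtI = e[i];
            for (int k = 0; k < n; k++) {
                // Any other position k receives the old a[i] with
                // probability 1/n and is otherwise left unchanged.
                e[k] = (k == i) ? mean : oldAtI / n + e[k] * (n - 1) / n;
            }
        }
        return e;
    }

    public static void main(String[] args) {
        for (double v : expectedValues(9))
            System.out.printf("%.3f%n", v);
    }
}
```

For n = 9 the computed values start below 4 at position 0, rise above 4 in the middle of the array, and come back to 4 at the last position, which agrees with the shape described for the figure.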

Why does the graph have this shape? We don't know all the details, but we can offer some intuition. Let's focus on the first element of the array. After the first iteration of the loop, it has the correct expected value, (n-1)/2. In the second iteration, however, there is a 1 in n chance that the random number generator returns 0 and the first element is set to 1 or 0. In other words, the second iteration systematically reduces the expected value of the first element. In the third iteration, there is again a 1 in n chance that the first element is set to 2, 1, or 0, and so on. During the first n/2 iterations of the loop, the expected value of the first element decreases. During the remaining n/2 iterations, its expected value increases, but it never catches up to its fair value. Note that the last element of the array is guaranteed to have the correct expected value, because the last step of the method is to choose a value for it at random from all the elements of the array.

OK, so our shuffle method is broken. How can we fix it? Use the shuffle method provided by the class library:

`import java.util.*;public static void shuffle(Object[] a) { Collections.shuffle(Arrays.asList(a));}`

If the library has a method that does what you need, use it [EJ Item 30]. Generally speaking, the libraries provide high-quality solutions with minimal effort on your part.

That said, after you have put up with all this math, it would be unfair not to tell you how to fix the broken shuffle method. The fix is straightforward. In the body of the loop, swap the current element with an element chosen at random from the range that runs from the current element to the end of the array. Do not touch an element once you have swapped a value into it. This is essentially the algorithm used by the library method:

`public static void shuffle(Object[] a) { for(int i = 0; i < a.length; i++) swap(a, i, i + rnd.nextInt(a.length - i));}`

It is easy to prove by induction that this method is fair. For the base case, observe that it is trivially fair for an array of length 0. For the induction step, if you apply the method to an array of length n > 0, it chooses the element for position 0 of the array uniformly at random. Then it iterates over the rest of the array: at each position, it chooses an element at random from the subarray that runs from the current position to the end of the original array. But this is exactly what the method would do if it were applied to the subarray of length n-1 that runs from position 1 to the end of the original array. This completes the proof. It also suggests a recursive formulation of the shuffle method, whose details are left as an exercise for the reader.
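As a sanity check on the induction argument, the repaired loop can be enumerated exhaustively for a small array. The sketch below uses the hypothetical names `FixedShuffleCheck`, `run`, and `enumerate`, and feeds every possible sequence of offsets (the values that `rnd.nextInt(a.length - i)` could return) through the repaired loop, recording the resulting permutations.

```java
import java.util.HashSet;
import java.util.Set;

class FixedShuffleCheck {
    // Runs the repaired shuffle on the identity array of length n, using
    // offsets[i] in place of rnd.nextInt(n - i).
    static String run(int n, int[] offsets) {
        int[] a = new int[n];
        for (int k = 0; k < n; k++)
            a[k] = k;
        for (int i = 0; i < n; i++) {
            int j = i + offsets[i];
            int tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
        StringBuilder sb = new StringBuilder();
        for (int v : a)
            sb.append(v).append(',');
        return sb.toString();
    }

    // Tries every legal offset sequence; returns the number of executions
    // and collects the permutations they produce in 'seen'.
    static int enumerate(int n, int[] offsets, int i, Set<String> seen) {
        if (i == n) {
            seen.add(run(n, offsets));
            return 1;
        }
        int executions = 0;
        for (int off = 0; off < n - i; off++) {
            offsets[i] = off;
            executions += enumerate(n, offsets, i + 1, seen);
        }
        return executions;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        int executions = enumerate(4, new int[4], 0, seen);
        System.out.println(executions + " executions, " + seen.size() + " permutations");
    }
}
```

For an array of length 4 there are 4 × 3 × 2 × 1 = 24 executions, and they produce 24 distinct permutations, so each permutation corresponds to exactly one equally likely execution.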

You might think that this is the end of the story, but there is one more chapter. Do you suppose that the repaired shuffle method generates all the permutations of a 52-element array, representing a deck of 52 cards, with equal probability? After all, we just proved that it is fair. By now it may not surprise you that the answer is an emphatic no. The problem is that at the start of the puzzle we assumed that "the pseudorandom number generator is fair." It isn't.

The random number generator, java.util.Random, takes a 64-bit seed, and the sequence of numbers it generates is fully determined by that seed. There are 52! permutations of a deck of cards, but only 2^64 seeds. What fraction of the permutations does that cover? Would you believe 2.3 × 10^-47 percent? That is a polite way of saying "practically none." If you use java.security.SecureRandom in place of java.util.Random, you get a 160-bit seed, but that buys you surprisingly little: for arrays with more than 40 elements, the shuffle method still fails to return some permutations (because 41! > 2^160). For a 52-element array, you still get only 1.8 × 10^-18 percent of the possible permutations.
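The figures above can be reproduced with arbitrary-precision arithmetic. This is a minimal sketch under the stated seed sizes; `SeedCoverage` and its methods are hypothetical names introduced for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

class SeedCoverage {
    // Computes n! exactly.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int k = 2; k <= n; k++)
            result = result.multiply(BigInteger.valueOf(k));
        return result;
    }

    // Smallest array length n for which n! exceeds the number of seeds,
    // 2^seedBits, so that some permutations can never be produced.
    static int smallestUncoverableLength(int seedBits) {
        BigInteger seeds = BigInteger.ONE.shiftLeft(seedBits);
        int n = 1;
        while (factorial(n).compareTo(seeds) <= 0)
            n++;
        return n;
    }

    // Upper bound, in percent, on the share of the n! permutations that
    // 2^seedBits distinct seeds can reach.
    static double percentCovered(int seedBits, int n) {
        BigDecimal seeds = new BigDecimal(BigInteger.ONE.shiftLeft(seedBits));
        return seeds.multiply(BigDecimal.valueOf(100))
                    .divide(new BigDecimal(factorial(n)), MathContext.DECIMAL64)
                    .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(smallestUncoverableLength(160));
        System.out.println(percentCovered(64, 52));
        System.out.println(percentCovered(160, 52));
    }
}
```

The percentages are upper bounds: they assume that every seed leads to a different permutation, which need not be the case.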

Does this mean that you cannot trust these pseudorandom number generators for shuffling? It depends. They can produce only a negligible fraction of the possible permutations, but they do not have the systematic bias we saw earlier. It is fair to say that these generators are good enough for casual use. If you need a state-of-the-art random number generator, you will have to look elsewhere. In summary, shuffling an array, like many algorithms, is tricky. It is easy to get wrong and hard to tell that you did. All other things being equal, you should use the class libraries in preference to handwritten code. If you want to learn more about the topic of this puzzle, see [Knuth98 3.4.2].

Puzzle 94: Lost in chaos