Randomness in Testing


This article is excerpted from MSDN
Author: James McCaffrey


Content on this page

Generating Uniform Random Numbers
Analyzing a Pattern for Randomness
Shuffling a List of Items
Generating Normal/Gaussian Numbers
Summary

Creating and using random test case data is a basic software testing skill. Although most test case data consists of specific inputs to the system under test and specific expected values or states, you almost always want to exercise the system with random test case input as well. Generally, you do this to see whether sending a large number of varied inputs to an application causes it to crash or throw an exception. In this month's column, I explain four common tasks for working with random test case data in the Microsoft .NET Framework environment:

Generating pseudo-random numbers (Knuth algorithm)

Analyzing a pattern for randomness (Wald-Wolfowitz test)

Shuffling a list of items (Fisher-Yates algorithm)

Generating Gaussian numbers (Box-Muller algorithm)

Let's take a look at the example in Figure 1. The first part of the output shows the result of using the .NET Framework's Random object to generate basic random numbers. Although you may be familiar with this technique, I still want to point out how to avoid common pitfalls. The second part of the output shows a very useful but little-known technique for analyzing whether a pattern composed of arbitrary symbols is random. This technique is broadly useful in software development, not just in testing. The third part of Figure 1 shows the result of shuffling a list of items, which is surprisingly tricky.

Figure 1 Randomness Techniques Demonstration

I will explain in detail why many shuffle implementations seem correct on the surface but are in fact quite wrong. The last part of the output in Figure 1 shows the result of generating a set of numbers that follow the normal, bell-shaped curve distribution. In addition to being a very useful technique, the implementation details of this algorithm are interesting in their own right and make a valuable addition to your personal coding toolkit.

Generating Uniform Random Numbers

The most basic task in random test case generation is to generate a random number (integer or floating point) in a specified range. This is usually done with the System.Random class. Consider the following code:

Random objRan = new Random(5);
int n = objRan.Next(7);
Console.WriteLine("A random integer in the range [0, 6] is " + n);

n = objRan.Next(3, 13);
Console.WriteLine("A random integer in the range [3, 12] is " + n);

I instantiate a Random object and pass in a seed value (5 in this example). The seed sets the starting point for a sequence of numbers that exhibits many characteristics of true random numbers. The sequence is deterministic (each number is produced by a mathematical formula from the seed or from the preceding numbers in the sequence), so the numbers generated by System.Random are technically pseudo-random numbers, although they are usually just called random numbers, as I do in this column, when the context is informal or the meaning is clear. The seed value I chose is arbitrary. If I had used the overloaded Random constructor that does not accept a seed, a value derived from the system clock would have been used. If you need to recreate a random number sequence in a later test run, you should supply a seed value. Choosing seed values for pseudo-random number generators is an important and complex topic that is, unfortunately, outside the scope of this column.
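
The reproducibility point above can be checked directly. The following minimal sketch is not part of the original column; the class and method names (SeedDemo, Sequence, SameSequence) are hypothetical. It only assumes that two System.Random objects built from the same seed yield the same sequence within a given runtime, and it deliberately makes no claim about which specific values appear.

```csharp
using System;

public static class SeedDemo
{
    // Returns the first `count` values of Next(maxExclusive) for a given seed.
    public static int[] Sequence(int seed, int count, int maxExclusive)
    {
        Random rng = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; ++i)
            values[i] = rng.Next(maxExclusive);
        return values;
    }

    // True when two arrays hold the same values in the same order.
    public static bool SameSequence(int[] a, int[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; ++i)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

Calling SeedDemo.SameSequence(SeedDemo.Sequence(5, 20, 100), SeedDemo.Sequence(5, 20, 100)) should return true, which is what makes a failing random test case repeatable.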

The simplest way to generate a random integer is to call the Random.Next method with a single integer argument. The return value is the next integer in the pseudo-random sequence, greater than or equal to 0 and strictly less than the argument. Therefore, the following call returns a number between 0 and 9 inclusive, not between 0 and 10 inclusive:

   int n = objRan.Next(10);

An overload of the Random.Next method accepts two integer arguments and returns an integer greater than or equal to the first argument and strictly less than the second. If you want test case data that simulates rolling an ordinary six-sided die, producing a random number between 1 and 6 inclusive, the call looks like this:

int roll = objRan.Next(1, 7);
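
One way to convince yourself that the upper bound is exclusive is to tally many rolls. This is a small illustrative sketch rather than code from the column, and the names DiceDemo and RollCounts are made up for the example. If Next(1, 7) ever returned 0 or 7, the tally would either fill slot 0 or throw an index exception.

```csharp
using System;

public static class DiceDemo
{
    // Tallies simulated die rolls; index 0 is unused, indexes 1..6 are the faces.
    public static int[] RollCounts(Random rng, int rolls)
    {
        int[] counts = new int[7];
        for (int i = 0; i < rolls; ++i)
        {
            int roll = rng.Next(1, 7); // lower bound inclusive, upper bound exclusive
            ++counts[roll];
        }
        return counts;
    }
}
```

For example, DiceDemo.RollCounts(new Random(5), 6000) returns seven counters whose slot 0 stays at zero and whose remaining six slots sum to 6000.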

It is easy to select a random item from an array:

string[] items = new string[] { "alpha", "beta", "gamma", "delta" };
Console.WriteLine("A random member of { 'alpha', 'beta', 'gamma', 'delta' } is " +
  items[objRan.Next(items.Length)]);

If the array has size n, the call objRan.Next(n) returns an integer in the range [0, n-1], which corresponds exactly to the valid indexes of the array. Note that this technique also works for ArrayList objects, and in fact for any collection with 0-based indexing.

Behind the scenes, the Random.Next overloads use Knuth's pseudo-random number generation algorithm, also known as the subtractive method. Knuth published this algorithm in The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Second Edition (Addison-Wesley, 1981). Generating uniform pseudo-random numbers is surprisingly difficult, but fortunately the .NET Framework implements the Knuth algorithm for you.

Around the time the .NET Framework was first being developed, Matsumoto and Nishimura discovered a new pseudo-random number generation algorithm. Their algorithm is generally called the Mersenne Twister, and it has rapidly displaced older pseudo-random number generation algorithms because of its excellent performance and mathematical properties. See "Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator," ACM Transactions on Modeling and Computer Simulation, Vol. 8, No. 1, January 1998.

Generating pseudo-random floating-point numbers is similar to generating pseudo-random integers. Consider the following code:

double x = (6.25 * objRan.NextDouble()) + 1.50;
Console.WriteLine("A random double in the range [1.50, 7.75) is " +
  x.ToString("0.00"));

A call to Random.NextDouble returns a number greater than or equal to 0.0 and strictly less than 1.0. To generate a floating-point number in the range [low, high), where the bracket and parenthesis mean greater than or equal to low and strictly less than high, you can use this small formula:

double result = ((high-low) * objRan.NextDouble()) + low;

Because most floating-point values other than 0.0 are approximations, you technically cannot generate a floating-point number in the closed range [low, high] that includes both endpoints; you must instead settle for the half-open range [low, high) that excludes the upper endpoint. This formula is nearly the same as the one used by the Next overload that accepts a range, except that the result is not converted to an integer.
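
The range formula is easy to wrap in a helper. The sketch below is illustrative only; RangeDemo and NextDoubleInRange are hypothetical names, and the argument check is an addition of this sketch. One caveat worth stating as an assumption: for some extreme combinations of low and high, floating-point rounding could in principle push the result up to high itself, so treat the exclusive upper bound as a property of the formula rather than a hard guarantee.

```csharp
using System;

public static class RangeDemo
{
    // Scales a uniform value from [0.0, 1.0) into [low, high).
    public static double NextDoubleInRange(Random rng, double low, double high)
    {
        if (!(low < high))
            throw new ArgumentException("low must be less than high");
        return ((high - low) * rng.NextDouble()) + low;
    }
}
```

With low = 1.50 and high = 7.75 this reproduces the expression (6.25 * objRan.NextDouble()) + 1.50 used earlier.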


Analyzing a Pattern for Randomness

Sometimes you need to examine a pattern of test case input or output for evidence that it was generated randomly (for example, to test whether a gambling simulation really produces random results). Several statistical techniques are available for this, but the simplest is the Wald-Wolfowitz test. This test is sometimes called the one-sample runs test, where a "run" is a series of identical digits or characters. The Wald-Wolfowitz test applies to a sequence made up of two symbols, for example:

A B B A A A B B B A B B

The idea is that, given the number of symbols of each of the two types in the pattern, you can compute the expected number of runs in the pattern if the symbols were generated randomly (in this example, meaning that A and B each have a 50% chance of appearing at any position). A run is a sequence of consecutive symbols of the same type, so the pattern just shown has six runs: A, BB, AAA, BBB, A, and BB. If the actual number of runs in a pattern is too large or too small, that is evidence that the pattern was not generated randomly.

Although the Wald-Wolfowitz test applies only to patterns with two symbol types, you can map many patterns to the Wald-Wolfowitz format. For example, if you have a sequence of integers such as { 7, 9, 3, 4, 1, 6 }, you can compute the average (5.0) and then map the sequence to the symbols H and L, where H represents a number greater than the average and L represents a number less than the average: H H L L L H. If you need to analyze more complex patterns, you can use other tests, such as the chi-square test or the Kolmogorov test.
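
The mapping from numbers to two symbols can be sketched in a few lines. This is an illustration and not code from the column; SymbolMapDemo and ToHighLow are hypothetical names. The text does not say what to do with a value exactly equal to the average, so skipping such values is an assumption of this sketch.

```csharp
using System;
using System.Text;

public static class SymbolMapDemo
{
    // Maps each value to 'H' (above the mean) or 'L' (below the mean).
    // Values exactly equal to the mean are skipped (an assumed convention).
    public static string ToHighLow(int[] values)
    {
        double sum = 0.0;
        foreach (int v in values) sum += v;
        double mean = sum / values.Length;

        StringBuilder sb = new StringBuilder();
        foreach (int v in values)
        {
            if (v > mean) sb.Append('H');
            else if (v < mean) sb.Append('L');
        }
        return sb.ToString();
    }
}
```

For the sequence { 7, 9, 3, 4, 1, 6 } the mean is 5.0 and ToHighLow returns "HHLLLH", a two-symbol pattern with three runs.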

Now suppose n1 is the number of symbols of the first type in a pattern and n2 is the number of symbols of the second type. For a randomly generated pattern, the expected number of runs is:

µ = 1 + ((2*n1*n2) / (n1 + n2))

The variance σ^2 is given by the following formula:

σ^2 = ((2*n1*n2) * (2*n1*n2 - n1 - n2)) / ((n1 + n2)^2 * (n1 + n2 - 1))

Both formulas come from probability theory. We can use the mean and variance to determine how likely it is that a particular pattern is the result of a random process. Suppose we start with a two-symbol pattern represented as the following string:

string s = "XOOOXXXXOXXXXXXXOOOOXXXXXXXXXX";

Although you could count the Xs and Os in this pattern by hand, it is easier to let the program do the work. First, I determine the two symbol types in the pattern string:

char kind1 = s[0], kind2 = s[0];
for (int i = 0; i < s.Length && kind1 == kind2; ++i)
{
if (s[i] != kind1) kind2 = s[i];
}

I assign the first character in the pattern string to the first type and then scan the pattern string until I find a second type. Next, I perform two simple error checks to ensure that the pattern contains at least two different character types and no more than two:

if (kind2 == kind1)
  throw new Exception("The string must have two different types");

for (int i = 0; i < s.Length; ++i)
  if (s[i] != kind1 && s[i] != kind2)
    throw new Exception("The string can have only two types");

If performance were a concern, you could restructure these multiple passes through the pattern string into a single pass, but at the cost of clarity.

Now I can count the number of symbols of each type and the number of runs in the pattern:

int n1 = 0, n2 = 0, runs = 1;
for (int i = 0; i < s.Length-1; ++i)
{
if (s[i] == kind1) ++n1;
else if (s[i] == kind2) ++n2;
if (s[i+1] != s[i]) ++runs;
}
if (s[s.Length-1] == kind1) ++n1;
else if (s[s.Length-1] == kind2) ++n2;

I scan the pattern from the first character through the next-to-last character. If the current character matches one of the symbol types determined earlier, I increment the corresponding counter. To count the runs, I use the fact that a run boundary occurs wherever the symbol type changes: if the current character differs from the next character, I know another run has started, so I increment the runs counter. Because I stop at the next-to-last character of the pattern string, I must check the last character separately at the end. I also start the runs counter at 1 rather than 0 because, by definition, every pattern has at least one run.

The Wald-Wolfowitz test is valid only when the count of each symbol type in the pattern being analyzed is 8 or greater. So I perform the following check:

if (n1 < 8 || n2 < 8)
  throw new Exception("n1 and n2 must both be 8 or greater " +
    "for this test to be meaningful");

At this point I have the count of each symbol type and the actual number of runs in the pattern. Now I compute the expected number of runs in the pattern if the two symbol types had been generated randomly:

double expectedRuns = 1 + ((2.0*n1*n2) / (n1 + n2));

Then I compute the variance of the number of runs for a randomly generated pattern, as follows:

int N = n1 + n2; // total number of symbols in the pattern
double varianceNumerator = (2.0*n1*n2) * (2.0*n1*n2 - N);
double varianceDenominator = (double)N * N * (N - 1); // cast first to avoid int overflow
double variance = varianceNumerator / varianceDenominator;

The next step in the analysis is to compute the standardized test statistic z:

double z = (runs - expectedRuns) / Math.Sqrt(variance);

The z statistic equals the actual number of runs in the pattern minus the expected number of runs, divided by the standard deviation of the number of runs (that is, the square root of the variance). A standardized number of runs is easier to interpret than the raw number of runs. The interpretation is simple to code but conceptually a bit subtle. It begins like this:

if (z < -2.580 || z > 2.580)
{
  Console.Write("Strong evidence (1%) that the pattern is not random: ");
  Console.WriteLine(z < -2.580 ? "Too few runs." : "Too many runs.");
}

Here I perform a so-called two-tailed test at the 1% significance level. If the absolute value of the standardized test statistic is greater than 2.580, a random process would produce a pattern with a run count this extreme less than 1% of the time. The value 2.580 comes from a table of statistics. If the test statistic is negative, the actual number of runs is smaller than the expected number, and vice versa. In the code in Figure 2, I also look for weaker evidence that the pattern was not randomly generated. Note that you can never conclude that a given pattern was generated randomly; you can only check whether there is statistical evidence that it was not.
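
The steps of this section can be pulled together into one function. The following is a sketch under stated assumptions rather than the column's own listing: the names RunsTestDemo and ZStatistic are hypothetical, and the symbol discovery, validation, and counting are folded into a single pass over the string, which is the restructuring the text mentions as possible at some cost in clarity.

```csharp
using System;

public static class RunsTestDemo
{
    // Returns the Wald-Wolfowitz z statistic for a two-symbol pattern.
    public static double ZStatistic(string s)
    {
        if (string.IsNullOrEmpty(s))
            throw new ArgumentException("The pattern is empty");

        char kind1 = s[0];
        char kind2 = s[0]; // stays equal to kind1 until a second symbol is seen
        int n1 = 0, n2 = 0, runs = 1;

        for (int i = 0; i < s.Length; ++i)
        {
            if (s[i] == kind1) ++n1;
            else if (kind2 == kind1 || s[i] == kind2) { kind2 = s[i]; ++n2; }
            else throw new ArgumentException("The string can have only two types");

            // A new run starts wherever the symbol changes.
            if (i > 0 && s[i] != s[i - 1]) ++runs;
        }

        if (n1 < 8 || n2 < 8)
            throw new ArgumentException("n1 and n2 must both be 8 or greater");

        double n = n1 + n2;
        double expectedRuns = 1.0 + (2.0 * n1 * n2) / n;
        double variance = (2.0 * n1 * n2) * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0));
        return (runs - expectedRuns) / Math.Sqrt(variance);
    }
}
```

As a hand-checkable case, a pattern of eight Xs followed by eight Os has n1 = n2 = 8 and only 2 runs against an expected 9, giving z of about -3.62 (too few runs). The strictly alternating pattern of the same length has 16 runs and z of about +3.62 (too many runs). Both fall outside the ±2.580 limits.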


Shuffling a List of Items

Now let's discuss shuffling a list of items. Shuffling is useful when you have a set of test case inputs and want to feed all of them to the system under test in random order. You can think of shuffling a list as generating a random permutation of its items. This surprisingly tricky problem has been the subject of a great deal of research. The best general-purpose shuffle algorithm is called the Fisher-Yates algorithm, also known as the Knuth shuffle. The algorithm is extremely simple. Suppose you have an array of items:

string[] animals = new string[] { 
"ant", "bat", "cow", "dog", "elk", "fox" };

If you shuffle these animals using the most common form of the Fisher-Yates algorithm, the code looks like this:

for (int i = 0; i < animals.Length; i++)
{
int r = objRan.Next(i, animals.Length);
string temp = animals[r];
animals[r] = animals[i];
animals[i] = temp;
}

I iterate over every index of the array to be shuffled. At each index, I select a random position between the current index and the end of the array, and then swap the items at the current index and the random index. Incorrect shuffle algorithms are very common, and the following example is especially insidious. Consider this attempt:

for (int i = 0; i < animals.Length; i++)
{
// int r = objRan.Next(i, animals.Length); // correct
int r = objRan.Next(0, animals.Length); // incorrect
string temp = animals[r];
animals[r] = animals[i];
animals[i] = temp;
}

This code compiles and runs, but the resulting rearranged list is biased toward certain orderings. For example, suppose the original list to be shuffled has only three items, {ABC}. The goal of a shuffle is to produce a random permutation of the three items, where every permutation is equally likely. The table in Figure 3 shows all 27 possible outcomes of the incorrect shuffle algorithm shown above.

The first row of the table in Figure 3 represents the case where, on the first pass through the shuffle loop, i is 0 and the random index r is 0. Because the initial list is {ABC}, the list after the swap is still {ABC}. On the second pass, i is 1 and the random index is 0; after the swap, the list is {BAC}. On the final pass, i is 2 and the random index is 0; after the swap, the final ordering is {CAB}. There are six possible final permutations of the three items. If you count how many times each appears in the table in Figure 3, you get the following:

{ABC} = 4 times
{ACB} = 5 times
{BAC} = 5 times
{BCA} = 5 times
{CAB} = 4 times
{CBA} = 4 times

In other words, not all permutations are equally likely. Notice that A ends up in the first position 9 times, B ends up in the first position 10 times, and C ends up in the first position 8 times. If this incorrect shuffle algorithm were used in a gambling game simulation, the consequences could be serious.
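
These tallies can be reproduced without any randomness by walking every possible sequence of random indexes exactly once. The sketch below is an illustration added for that purpose and is not from the original column; ShuffleBiasDemo, CountOutcomes, and Enumerate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleBiasDemo
{
    // Counts how often each final arrangement of "ABC" occurs when every
    // possible sequence of random indexes is tried exactly once.
    // naive == true:  r ranges over [0, 3) on every pass (the incorrect version).
    // naive == false: r ranges over [i, 3) (the Fisher-Yates version).
    public static Dictionary<string, int> CountOutcomes(bool naive)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        Enumerate("ABC".ToCharArray(), 0, naive, counts);
        return counts;
    }

    private static void Enumerate(char[] items, int i, bool naive,
        Dictionary<string, int> counts)
    {
        if (i == items.Length)
        {
            string key = new string(items);
            int seen;
            counts.TryGetValue(key, out seen); // seen is 0 when the key is new
            counts[key] = seen + 1;
            return;
        }
        int start = naive ? 0 : i;
        for (int r = start; r < items.Length; ++r)
        {
            char[] copy = (char[])items.Clone();
            char temp = copy[r];
            copy[r] = copy[i];
            copy[i] = temp;
            Enumerate(copy, i + 1, naive, counts);
        }
    }
}
```

CountOutcomes(true) visits 27 paths and yields ABC 4, ACB 5, BAC 5, BCA 5, CAB 4, CBA 4, matching the list above. CountOutcomes(false) visits only 3 * 2 * 1 = 6 paths and reaches each of the six permutations exactly once.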

If you use the correct shuffle algorithm, you get the results shown in Figure 4 (notice that the value of r is never less than the value of i). In this case, each of the six possible final permutations of the three items is equally likely, and each letter is equally likely to appear at any given position. Also notice that the third pass is unnecessary, because it only swaps the last value in the array with itself. The loop in the correct algorithm can therefore run to animals.Length-1 rather than animals.Length.
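
A reusable form of the correct algorithm, with the shortened loop, might look like the following. This is a minimal sketch rather than the column's code; ShuffleDemo and Shuffle are hypothetical names, and the choice of IList<T> as the parameter type is an assumption made so that both arrays and lists can be passed.

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleDemo
{
    // Fisher-Yates shuffle, in place.
    public static void Shuffle<T>(IList<T> items, Random rng)
    {
        // The final pass would only swap the last item with itself, so stop one early.
        for (int i = 0; i < items.Count - 1; ++i)
        {
            int r = rng.Next(i, items.Count); // r is never less than i
            T temp = items[r];
            items[r] = items[i];
            items[i] = temp;
        }
    }
}
```

Calling ShuffleDemo.Shuffle(animals, objRan) rearranges the array in place. Whatever order results, the array still contains exactly the original six animals.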


Generating Normal/Gaussian Numbers

The fourth technique I demonstrate in this month's column is generating numbers from a bell-shaped distribution, usually called the normal or Gaussian distribution.

Suppose you want to generate test case input data that corresponds to the heights of a group of people. A very clever technique called the Box-Muller algorithm can generate normally distributed pseudo-random numbers. The code that produced the output shown in Figure 1 begins like this:

Gaussian g = new Gaussian();
double ht;
int outliers = 0;
Console.WriteLine("Generating 100 random heights from a " +
  "normal/Gaussian distribution with " +
  "mean 68.0 in. and standard deviation 6.0 in. ->");

I instantiate a program-defined Gaussian object. This object does all the work, using the Box-Muller algorithm. The variable ht holds a normally distributed height value. I also initialize a counter to track outliers, that is, values that are far above or far below the average height. Tracking outliers lets me verify that my random heights really are normally distributed. Figure 5 lists the code that generates and displays the random heights.

I use the modulus (%) operator to print a newline after every 10 values, just to keep the output tidy. The normally distributed random heights, with an average of 68.0 inches and a standard deviation of 6.0 inches, are returned by calls to the Gaussian.NextGaussian2 method, which I describe in detail shortly. I track outliers by watching for values less than 56 inches or greater than 80 inches. These are the values more than two standard deviations (6.0 * 2 = 12.0 inches) above or below the average of 68.0 inches. According to statistics, the probability that a randomly generated value lies more than two standard deviations from the average is about 5%. So if I generate 100 random heights, as I do here, I can expect about 5 outliers. If the number of outliers is much greater or much less than 5, the code needs a careful look. Notice that in the sample run shown in Figure 1, I got exactly five outliers, which gives me more confidence that my randomly generated heights really are normally distributed.

The theory behind the Box-Muller algorithm is quite deep, but the result is rather simple. If you have two uniform random numbers U1 and U2 in the range [0, 1) (as described in the first section of this column), then you can compute a normally distributed random number Z using either of the following two equations:

Z = R * cos( θ )
or
Z = R * sin( θ )

Where,

R = sqrt(-2 * ln(U2))
and
θ = 2 * π * U1

The normal value Z has a mean of 0 and a standard deviation of 1. You can map Z to a value X with mean m and standard deviation sd using the following equation:

X = m + (Z * sd)

The simplest implementation of a Gaussian class with a NextGaussian method is shown in the code in Figure 6.
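
Figure 6 is not reproduced in this copy of the article. As a stand-in, here is a minimal sketch of what a direct translation of the equations above could look like. It is not the author's listing: the class name BasicGaussian and the seeded constructor are assumptions, and so is the use of 1.0 - NextDouble() for U2, which keeps U2 in (0, 1] so that Math.Log is never called with 0.

```csharp
using System;

public class BasicGaussian
{
    private readonly Random rng;

    public BasicGaussian(int seed)
    {
        rng = new Random(seed);
    }

    // Returns one normally distributed value with the given mean and standard deviation.
    public double NextGaussian(double mean, double sd)
    {
        double u1 = rng.NextDouble();        // [0, 1)
        double u2 = 1.0 - rng.NextDouble();  // (0, 1], avoids Math.Log(0)
        double r = Math.Sqrt(-2.0 * Math.Log(u2));
        double theta = 2.0 * Math.PI * u1;
        double z = r * Math.Cos(theta);      // standard normal: mean 0, sd 1
        return mean + (z * sd);
    }
}
```

For example, new BasicGaussian(5).NextGaussian(68.0, 6.0) returns one simulated height in inches; averaging many such values should land close to 68.0.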

I use Math.Cos, but I could just as easily have used Math.Sin. This implementation works but is inefficient. Because the Box-Muller algorithm can use either sin or cos to compute a normally distributed Z value, it makes sense to compute two Z values at once, save the second one, and then return the saved value on the second call to the NextGaussian method. This implementation is shown in Figure 7.
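
Figure 7 is likewise missing from this copy, so the following is only a sketch of the save-the-second-value idea, not the column's listing. The class name CachingGaussian and the field names are hypothetical, and drawing U2 as 1.0 - NextDouble() is an assumption made to keep the logarithm defined.

```csharp
using System;

public class CachingGaussian
{
    private readonly Random rng;
    private bool hasSaved = false; // true when savedZ holds an unused value
    private double savedZ;

    public CachingGaussian(int seed)
    {
        rng = new Random(seed);
    }

    public double NextGaussian(double mean, double sd)
    {
        if (hasSaved)
        {
            // Second call of a pair: no new random numbers or math calls needed.
            hasSaved = false;
            return mean + (savedZ * sd);
        }

        double u1 = rng.NextDouble();
        double u2 = 1.0 - rng.NextDouble(); // (0, 1], avoids Math.Log(0)
        double r = Math.Sqrt(-2.0 * Math.Log(u2));
        double theta = 2.0 * Math.PI * u1;

        savedZ = r * Math.Sin(theta); // keep the sin-based value for the next call
        hasSaved = true;
        return mean + (r * Math.Cos(theta) * sd);
    }
}
```

Every other call now skips the square root and logarithm entirely, which halves the number of uniform random numbers consumed per normal value.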

Although this approach works, it is still inefficient. Calls to the Math.Sin, Math.Cos, and Math.Log methods are computationally expensive. A clever way to improve efficiency is to use a mathematical trick. If you examine the definitions of R and θ, you will see that they correspond to the polar coordinates of a random point within the unit circle. The trick is to compute the rectangular coordinates of a random point in the square that bounds the unit circle (which avoids calling the Math.Sin and Math.Cos methods) and check whether that point lies within the unit circle. If it does, the coordinates can be used; if not, a new set of random coordinates is computed and checked again. About 78% of randomly generated coordinates fall within the unit circle, so this approach gives better performance, although it clearly hurts clarity.

The unit square technique is illustrated in Figure 8. The basic Box-Muller algorithm picks polar coordinates (R, θ) that are guaranteed to lie within the unit circle. Alternatively, you can pick rectangular coordinates within the unit square that bounds the unit circle. The point (x1, y1) lies within the unit circle, but the point (x2, y2) lies outside it. Figure 9 shows an implementation of the unit square technique.

Figure 8 Unit Square Technique
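
Figure 9 is not included in this copy. The rejection idea described above is commonly known as the polar form of Box-Muller (the Marsaglia polar method), and the sketch below follows that well-known formulation rather than the author's own listing, which may differ in detail. The class name PolarGaussian is hypothetical. Note that Math.Log is still called once per accepted point; only the Math.Sin and Math.Cos calls are avoided.

```csharp
using System;

public class PolarGaussian
{
    private readonly Random rng;
    private bool hasSaved = false; // true when savedZ holds an unused value
    private double savedZ;

    public PolarGaussian(int seed)
    {
        rng = new Random(seed);
    }

    public double NextGaussian(double mean, double sd)
    {
        if (hasSaved)
        {
            hasSaved = false;
            return mean + (savedZ * sd);
        }

        double x, y, s;
        do
        {
            // Random point in the square [-1, 1) x [-1, 1) that bounds the unit circle.
            x = 2.0 * rng.NextDouble() - 1.0;
            y = 2.0 * rng.NextDouble() - 1.0;
            s = x * x + y * y;
        } while (s >= 1.0 || s == 0.0); // retry if outside the circle or at the origin

        double factor = Math.Sqrt(-2.0 * Math.Log(s) / s);
        savedZ = y * factor; // second value, returned by the next call
        hasSaved = true;
        return mean + (x * factor * sd);
    }
}
```

Each accepted point yields two normal values, and roughly 78% of candidate points are accepted, so the loop usually runs only once or twice.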

In software testing scenarios, performance is usually not a major concern, so any of the three implementations I have described is suitable. In software development, however, and especially in simulations, performance can become a major issue. Although the Box-Muller algorithm is efficient and relatively simple to implement, there are alternative algorithms for generating normal/Gaussian pseudo-random numbers. One efficient alternative is called the Ziggurat method.


Summary

Let me briefly summarize. Generating random test case input data is a basic software testing skill. The .NET Framework contains a System.Random class that you can use to generate uniformly distributed integer or floating-point pseudo-random numbers in a specified range. Take care to specify the endpoints of the range correctly.

The Wald-Wolfowitz test can be used to analyze a pattern of two symbols for evidence that it was not randomly generated. You can use this test to analyze random test case input data or the output of the system under test.

The best general-purpose shuffle algorithm is the Fisher-Yates algorithm. It is so simple that there is almost never a reason to use anything else. However, even a slight deviation from the correct algorithm can yield code that looks correct but is seriously flawed.

The Box-Muller algorithm can be used to generate normally distributed pseudo-random numbers. The mathematics behind the Box-Muller algorithm is quite deep, but the implementation is very simple. There are several ways to implement the Box-Muller algorithm, each trading off clarity against efficiency.

Please send your questions and comments to James via testrun@microsoft.com.

Dr. James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for Microsoft software engineers. He has worked on a variety of Microsoft products, including Internet Explorer and MSN Search. James can be reached at jmccaffrey@volt.com or v-jammc@microsoft.com.
