In practice, grouped statistics are very common. For example, the People's Bank of China requires commercial banks to submit anti-money-laundering reports, one item of which is the monthly number and total amount of large transactions. Here a large transaction is defined as a customer's cumulative transactions in a single day reaching 200,000 yuan, or a foreign-currency equivalent of more than 10,000 U.S. dollars. Such a report is produced by grouping a large, chronologically ordered transaction log by transaction date and aggregating each group.
First, let's generate the data to aggregate, as follows:
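IEnumerable<Tuple<int, double>> GetTuples(int n)
{
    var tuples = new Tuple<int, double>[n];
    var rand = new Random();
    // k is the group key; it never decreases, so the output is sorted by key.
    for (int k = 1, i = 0; i < n; i++)
    {
        var r = rand.Next(n); // uniform in [0, n)
        // Occasionally advance the key by 2 or by 1; otherwise keep it unchanged.
        k += (r >= n - 3) ? 2 : ((r >= n - 9) ? 1 : 0);
        tuples[i] = new Tuple<int, double>(k, rand.NextDouble());
    }
    return tuples;
}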
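This method generates n items that are already sorted by key. The key never decreases: for n of at least 9, each step advances it by 2 with probability 3/n, by 1 with probability 6/n, and otherwise leaves it unchanged, so items with the same key are always adjacent.
Now, let's group the items by key and compute the count and the average value of each group.
First, use C#'s foreach loop, as follows: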
IEnumerable<Tuple<int, int, double>> ForEach(IEnumerable<Tuple<int, double>> tuples)
{
    var result = new List<Tuple<int, int, double>>();
    var count = 0;
    var sum = 0.0;
    int? key = null; // null until the first item has been seen
    foreach (var v in tuples)
    {
        if (key != v.Item1)
        {
            // The key changed: emit the finished group, then start a new one.
            if (key != null) result.Add(new Tuple<int, int, double>(key.Value, count, sum / count));
            sum = count = 0;
            key = v.Item1;
        }
        count++;
        sum += v.Item2;
    }
    // The last group is not followed by a key change, so it has to be emitted here.
    if (key != null) result.Add(new Tuple<int, int, double>(key.Value, count, sum / count));
    return result;
}
The biggest drawback of this approach is that the last group still has to be emitted after the foreach loop ends, which duplicates the statement already inside the loop. That duplication is a code smell.
So let's refactor, this time driving the loop with the enumerator directly:
IEnumerable<Tuple<int, int, double>> Iterate(IEnumerable<Tuple<int, double>> tuples)
{
    var result = new List<Tuple<int, int, double>>();
    var count = 0;
    var sum = 0.0;
    int? key = null; // null until the first item has been seen
    // The loop has no condition; the step clause accumulates the current item.
    for (var iter = tuples.GetEnumerator(); ; count++, sum += iter.Current.Item2)
    {
        var hasValue = iter.MoveNext();
        // A group ends when the input ends or the key changes.
        // Short-circuiting keeps iter.Current from being read after the end.
        if (!hasValue || key != iter.Current.Item1)
        {
            if (key != null) result.Add(new Tuple<int, int, double>(key.Value, count, sum / count));
            if (!hasValue) break;
            sum = count = 0;
            key = iter.Current.Item1;
        }
    }
    return result;
}
In this way, the code smell is eliminated: each group, including the last one, is emitted in exactly one place.
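To make the behavior concrete, here is a minimal sketch that runs the same single-exit loop on a small, hand-made input. The helper name GroupSorted and the sample values are assumptions made up for this illustration, not part of the original code. The sketch also wraps the enumerator in a using block so that it is disposed, as foreach would do automatically, and it assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (name is illustrative): the same single-exit loop as Iterate,
// written with while(true) and with the enumerator disposed via `using`.
static List<Tuple<int, int, double>> GroupSorted(IEnumerable<Tuple<int, double>> tuples)
{
    var result = new List<Tuple<int, int, double>>();
    var count = 0;
    var sum = 0.0;
    int? key = null; // null until the first item has been seen
    using (var iter = tuples.GetEnumerator())
    {
        while (true)
        {
            var hasValue = iter.MoveNext();
            // A group ends when the input ends or the key changes.
            if (!hasValue || key != iter.Current.Item1)
            {
                if (key != null) result.Add(Tuple.Create(key.Value, count, sum / count));
                if (!hasValue) break;
                count = 0;
                sum = 0.0;
                key = iter.Current.Item1;
            }
            count++;
            sum += iter.Current.Item2;
        }
    }
    return result;
}

// Fixed, already-sorted sample input (values made up for illustration).
var sample = new[]
{
    Tuple.Create(1, 0.5), Tuple.Create(1, 1.5),
    Tuple.Create(2, 4.0),
    Tuple.Create(4, 1.0), Tuple.Create(4, 2.0), Tuple.Create(4, 3.0),
};

var groups = GroupSorted(sample);
foreach (var g in groups)
    Console.WriteLine($"key={g.Item1} count={g.Item2} average={g.Item3}");
// Three groups are produced: key 1 (count 2, average 1.0),
// key 2 (count 1, average 4.0) and key 4 (count 3, average 2.0).
```

Like the versions above, this sketch relies on the input being sorted by key; on unsorted input, a key that reappears later would simply start a new group.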