Tutorial on implementing a greedy ranking algorithm in Python
In an earlier article, I mentioned that I had written a ranking algorithm of my own, and I think it is worth reviewing that algorithm through some code.
In my work, I verify and ensure the safety of microprocessors. One method of testing a very complex CPU is to create another model of the chip that can be used to generate pseudo-random instruction streams to run on the CPU. Such an ISG (instruction stream generator) can create thousands or even millions of tests in a short period of time, and it can be written so that you can subtly steer or manipulate the instruction stream that will be executed on the CPU.
When these instruction streams are simulated, we can extract, for each test, which parts of the CPU it exercises (this is called coverage) along with the time the test takes to run. Any one ISG test is likely to cover some of the same areas of the CPU as other tests. To increase the overall coverage of the CPU, we run what is called a regression: all tests are run, and their coverage and run time are stored. At the end of the regression, you may have thousands of tests, each of which covers only part of the CPU.
If you take the results of the regression and rank them, you will find a subset of the tests that gives all of the coverage. Typically, thousands of pseudo-random tests can be ranked to produce a sublist of only a few hundred tests that provide the same coverage when run. Next, we often look at which parts of the CPU are not covered and generate more tests, through the ISG or other methods, to try to fill the gaps. Then a new regression is run and the tests are ranked again, in a loop, to exercise the CPU fully and reach a coverage target.
Ranking tests is an important part of the regression flow, and when it works well you tend to forget about it. Unfortunately, when I sometimes want to rank other data, the stock ranking programs provided by CAD tool vendors are not suitable. So here is the essence of a ranking algorithm, one that needs to be able to process hundreds of tests and coverage points.
Input
Normally, I have to parse my input from text or HTML files generated by other CAD programs. This is a tedious job, so I will skip it here and assume that the ideal input is provided in the form of a Python dictionary. (Sometimes the code used to parse the input files can be as large as, or larger than, the ranking algorithm.)
Let's assume that each ISG test has a name, runs for a certain "time", and, when simulated, is shown to "cover" a set of numbered features of the design. After parsing, the collected input data is represented by the results dictionary in the program.
results = {
#    'TEST':   (  TIME, set([COVERED_POINT ...])),
  'test_00': (  2.08, set([2, 3, 5, 11, 12, 16, 19, 23, 25, 26, 29, 36, 38, 40])),
  'test_01': ( 58.04, set([0, 10, 13, 15, 17, 19, 20, 22, 27, 30, 31, 33, 34])),
  'test_02': ( 34.82, set([3, 4, 6, 12, 15, 21, 23, 25, 26, 33, 34, 40])),
  'test_03': ( 32.74, set([4, 5, 10, 16, 21, 22, 26, 39])),
  'test_04': (100.00, set([0, 1, 4, 6, 7, 8, 9, 11, 12, 18, 26, 27, 31, 36])),
  'test_05': (  4.46, set([1, 2, 6, 11, 14, 16, 17, 21, 22, 23, 30, 31])),
  'test_06': ( 69.57, set([10, 11, 15, 17, 19, 22, 26, 27, 30, 32, 38])),
  'test_07': ( 85.71, set([0, 2, 4, 5, 9, 10, 14, 17, 24, 34, 36, 39])),
  'test_08': (  5.73, set([0, 3, 8, 9, 13, 19, 23, 25, 28, 36, 38])),
  'test_09': ( 15.55, set([7, 15, 17, 25, 26, 30, 31, 33, 36, 38, 39])),
  'test_10': ( 12.05, set([0, 4, 13, 14, 15, 24, 31, 35, 39])),
  'test_11': ( 52.23, set([0, 3, 6, 10, 11, 13, 23, 34, 40])),
  'test_12': ( 26.79, set([0, 1, 4, 5, 7, 8, 10, 12, 13, 31, 32, 40])),
  'test_13': ( 16.07, set([2, 6, 9, 11, 13, 15, 17, 18, 34])),
  'test_14': ( 40.62, set([1, 2, 8, 15, 16, 19, 22, 26, 29, 31, 33, 34, 38])),
}
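Before ranking, it can help to check what the full test set achieves, since no ranking can cover more than the union of all the coverage sets. The following is a minimal sketch, assuming the same dictionary layout as results; the three-entry sample_results dictionary and the summarize helper are hypothetical and exist only for illustration.

```python
# Hypothetical three-test sample using the same layout as `results`:
# name -> (time, set of covered points)
sample_results = {
    'test_a': (2.0, {1, 2, 3}),
    'test_b': (5.0, {3, 4}),
    'test_c': (1.5, {2, 3}),
}


def summarize(results):
    """Return (all points covered by any test, total time of all tests)."""
    all_points = set()
    total_time = 0.0
    for cost, cover in results.values():
        all_points |= cover      # union of every test's coverage
        total_time += cost       # cost of running the whole regression
    return all_points, total_time


points, total_time = summarize(sample_results)
print('%i tests cover %i points in %g time units'
      % (len(sample_results), len(points), total_time))
```

The size of the returned set is the upper bound that any ranked subset can reach, and the total time is the cost the ranking is trying to improve on.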
The aim of the greedy ranking algorithm is to select and order a subset of the tests that will:
- Cover as many of the coverage points as possible, each by at least one test.
- After that, reduce the number of tests needed to reach this maximum coverage as far as possible.
- Rank the selected tests, so that an even smaller set of tests can be chosen if necessary.
- After that, preferably also reduce the total time taken by the ranked tests.
- Of course, it needs to be able to work on large sets of tests.
The greedy ranking algorithm works by first selecting the test that gives the most coverage in the current test set, then selecting the test that adds the most coverage to what has already been covered, and so on in sequence.
If more than one test gives the same additional coverage at any stage, the tests are compared on run time and the one that takes the least time is selected.
The following function implements the algorithm:
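The tie-breaking rule follows from how Python compares tuples: element by element, from left to right. The following small sketch uses a hypothetical set of already covered points and three made-up tests to show how a tuple of (additional coverage, negated cost, name) lets max() pick the largest contribution and, when there is a tie, the cheapest test.

```python
# Points assumed to be covered already (hypothetical)
covered_so_far = {1, 2}

# Hypothetical candidates: name -> (time, covered points)
candidates = {
    'slow': (9.0, {2, 3, 4}),   # adds points 3 and 4
    'fast': (1.0, {1, 3, 5}),   # adds points 3 and 5
    'tiny': (0.5, {2, 6}),      # adds only point 6
}

# The cost is negated so that max() prefers the smaller cost on a tie
contributions = [
    (len(cover - covered_so_far), -cost, name)
    for name, (cost, cover) in candidates.items()
]

best = max(contributions)
print(best)  # (2, -1.0, 'fast')
```

Here 'slow' and 'fast' both add two points, so the second field decides, and the cheaper 'fast' wins even though 'tiny' has the lowest cost overall.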
def greedyranker(results):
    results = results.copy()
    ranked, coveredsofar, costsofar, round = [], set(), 0, 0
    noncontributing = []
    while results:
        round += 1
        # What each test can contribute to the pool of what is covered so far
        contributions = [(len(cover - coveredsofar), -cost, test)
                         for test, (cost, cover) in sorted(results.items()) ]
        # Greedy ranking by taking the next greatest contributor
        delta_cover, benefit, test = max( contributions )
        if delta_cover > 0:
            ranked.append((test, delta_cover))
            cost, cover = results.pop(test)
            coveredsofar.update(cover)
            costsofar += cost
        for delta_cover, benefit, test in contributions:
            if delta_cover == 0:
                # this test cannot contribute anything
                noncontributing.append( (test, round) )
                results.pop(test)
    return coveredsofar, ranked, costsofar, noncontributing
On each pass through the while loop, the next best test is appended to the ranking, and any tests that can no longer contribute additional coverage are discarded (the final for loop in the loop body).
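To see the discard step in action, here is a condensed restatement of the loop above applied to a hypothetical three-test input. The greedy_rank name and the demo data are assumptions made for this sketch, and it tracks only the ranking and the discarded tests, not the total cost.

```python
def greedy_rank(results):
    """Condensed restatement of the greedy ranker, for illustration only."""
    remaining = dict(results)
    ranked, covered, discarded = [], set(), []
    round_no = 0
    while remaining:
        round_no += 1
        # Additional coverage each remaining test would give right now
        contributions = [(len(cover - covered), -cost, name)
                         for name, (cost, cover) in remaining.items()]
        delta, _, best = max(contributions)
        if delta > 0:
            ranked.append((best, delta))
            covered |= remaining.pop(best)[1]
        # Drop every test that had nothing new to offer this round
        for delta, _, name in contributions:
            if delta == 0:
                discarded.append((name, round_no))
                del remaining[name]
    return ranked, discarded


demo = {
    'wide':   (10.0, {1, 2, 3, 4}),
    'subset': (1.0, {2, 3}),        # fully contained in 'wide'
    'extra':  (2.0, {4, 5}),
}
ranked, discarded = greedy_rank(demo)
print(ranked)     # [('wide', 4), ('extra', 1)]
print(discarded)  # [('subset', 2)]
```

The 'subset' test is cheap, but once 'wide' has been ranked it adds nothing new, so it is dropped in the second round instead of being ranked.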
The function above is a little dry, so I spent some time annotating it with a tutor option that prints what it is doing as it runs.
The function (with tutor)
It does the same thing, but the code is larger and cluttered by the extra print statements:
def greedyranker(results, tutor=True):
    results = results.copy()
    ranked, coveredsofar, costsofar, round = [], set(), 0, 0
    noncontributing = []
    while results:
        round += 1
        # What each test can contribute to the pool of what is covered so far
        contributions = [(len(cover - coveredsofar), -cost, test)
                         for test, (cost, cover) in sorted(results.items()) ]
        if tutor:
            print('\n## Round %i' % round)
            print(' Covered so far: %2i points: ' % len(coveredsofar))
            print(' Ranked so far: ' + repr([t for t, d in ranked]))
            print(' What the remaining tests can contribute, largest contributors first:')
            print(' # DELTA, BENEFIT, TEST')
            deltas = sorted(contributions, reverse=True)
            for delta_cover, benefit, test in deltas:
                print(' %2i, %7.2f, %s' % (delta_cover, benefit, test))
            if len(deltas)>=2 and deltas[0][0] == deltas[1][0]:
                print(' Note: This time around, more than one test gives the same')
                print(' maximum delta contribution of %i to the coverage so far'
                      % deltas[0][0])
                if deltas[0][1] != deltas[1][1]:
                    print(' we order based on the next field of minimum cost')
                    print(' (equivalent to maximum negative cost).')
                else:
                    print(' the next field of minimum cost is the same so')
                    print(' we arbitrarily order by test name.')
            zeroes = [test for delta_cover, benefit, test in deltas
                      if delta_cover == 0]
            if zeroes:
                print(' The following test(s) cannot contribute more to coverage')
                print(' and will be dropped:')
                print(' ' + ', '.join(zeroes))

        # Greedy ranking by taking the next greatest contributor
        delta_cover, benefit, test = max( contributions )
        if delta_cover > 0:
            ranked.append((test, delta_cover))
            cost, cover = results.pop(test)
            if tutor:
                print(' Ranking %s in round %2i giving extra coverage of: %r'
                      % (test, round, sorted(cover - coveredsofar)))
            coveredsofar.update(cover)
            costsofar += cost

        for delta_cover, benefit, test in contributions:
            if delta_cover == 0:
                # this test cannot contribute anything
                noncontributing.append( (test, round) )
                results.pop(test)
    if tutor:
        print('\n## ALL TESTS NOW RANKED OR DISCARDED\n')
    return coveredsofar, ranked, costsofar, noncontributing
Each block above that starts with "if tutor:" is the added tutor code.
Sample output
The code that calls the ranker and prints the results is:
totalcoverage, ranking, totalcost, nonranked = greedyranker(results)
print('''A total of %i points were covered,
using only %i of the initial %i tests,
and should take %g time units to run.

The tests in order of coverage added:

 TEST DELTA-COVERAGE'''
      % (len(totalcoverage), len(ranking), len(results), totalcost))
print('\n'.join(' %6s %i' % r for r in ranking))
The output contains a lot of material from the tutor, followed by the results.
For this test case of 15 randomly generated tests, it seems that only seven are needed to produce the maximum total coverage. (And if you are willing to give up three tests, each of which covers only one additional point, then 4 of the 15 tests will provide 92.5% of the maximum possible coverage.)
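The trade-off described in the parentheses can be read off the ranking by accumulating the additional coverage of each test in rank order. The following is a minimal sketch, assuming a ranking in the same (test, delta coverage) form that the ranker returns and assuming Python 3 division; the cumulative_coverage helper and the example_ranking data are hypothetical and are not the output of the run above.

```python
def cumulative_coverage(ranking):
    """Yield (test, points covered so far, fraction of the ranked total)."""
    total = sum(delta for _, delta in ranking)
    running = 0
    for test, delta in ranking:
        running += delta
        yield test, running, running / total


# Hypothetical ranking in the (test, delta_coverage) form
example_ranking = [('t1', 6), ('t2', 3), ('t3', 1)]

for test, points, fraction in cumulative_coverage(example_ranking):
    print('%4s  %2i  %5.1f%%' % (test, points, 100 * fraction))
```

Cutting the list at the first row whose fraction meets a chosen threshold gives the smaller test set; in this made-up ranking, the first two tests already reach 90% of the total.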