Original article: http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization

Note: Because a complete, faithful translation is difficult, some sentences are translated freely. Corrections of any translation errors are welcome.

**Overview**

Some computational problems are solved by searching for good solutions in a space of candidate solutions. A search algorithm is a description of how to repeatedly select candidate solutions for evaluation. On a particular problem, different search algorithms may obtain different results, but averaged over all problems they are indistinguishable. It follows that if an algorithm achieves superior results on some problems, it must pay with inferior results on other problems. In this sense, there is no free lunch in search. Alternatively, following Schaffer, search performance is conserved. Since search is usually interpreted as optimization, there is likewise no free lunch in optimization.
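The "averaged over all problems" claim can be checked by brute force on a tiny example. The sketch below (a minimal Python illustration; the names `evals_to_best`, `order_a`, and `order_b` are my own, not from the article) enumerates every objective function on a four-point space and shows that two different fixed search orders need exactly the same average number of evaluations to first see the best value:

```python
from itertools import product

X = range(4)                     # tiny solution space
FUNCS = [dict(zip(X, vals))      # all 2^4 objective functions f: X -> {0, 1}
         for vals in product([0, 1], repeat=4)]

def evals_to_best(f, order):
    """Number of evaluations a fixed-order search needs to first
    observe the best (maximum) value of f."""
    best = max(f.values())
    for i, x in enumerate(order, start=1):
        if f[x] == best:
            return i

order_a = [0, 1, 2, 3]           # algorithm A: left-to-right sweep
order_b = [3, 1, 0, 2]           # algorithm B: an arbitrary other order

avg_a = sum(evals_to_best(f, order_a) for f in FUNCS) / len(FUNCS)
avg_b = sum(evals_to_best(f, order_b) for f in FUNCS) / len(FUNCS)
assert avg_a == avg_b            # identical averages over ALL functions
```

On any single function the two orders can differ sharply; only the average over the whole function set is forced to coincide.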

Wolpert and Macready's "no free lunch" theorems state, roughly, that any two algorithms have equivalent performance when averaged over all possible problems. The NFL results also indicate that matching algorithms to problems yields higher average performance than applying any fixed algorithm to all problems. Igel, Toussaint, and English have established general conditions under which there is no free lunch; while these conditions are theoretically possible, they do not hold precisely in practice. Droste, Jansen, and Wegener proved a theorem they interpret as showing that in practice there is "(almost) no free lunch".

To make matters concrete, suppose a practitioner faces an optimization problem P. Given some knowledge of how P arose, the practitioner may exploit that knowledge to select an algorithm specialized to P, which will naturally perform well. When the practitioner does not know how to exploit such knowledge, or has none at all, the question becomes whether some algorithm generally outperforms others on most problems. The authors of the "(almost) no free lunch" theorem say that the answer is essentially no, but they acknowledge some reservations about whether the theorem applies to practice.

**No free lunch (NFL)**

More formally, a "problem" is an objective function for which good solutions are sought. A search algorithm takes an objective function as input and evaluates candidate solutions one by one; its output is the sequence of observed objective values.

Wolpert and Macready stipulate that an algorithm never reevaluates a candidate solution, and that algorithm performance is measured on outputs. For simplicity, randomness in algorithms is disallowed. Under these conditions, when a search algorithm is run on every possible input, it generates each possible output exactly once. Because performance is measured on the outputs, the algorithms are indistinguishable in how often they achieve particular levels of performance.

A performance measure indicates how well a search algorithm does at optimizing the objective function. Indeed, there seems to be no interesting application of search algorithms of this kind other than to optimization problems. A common performance measure is the least index of the least value in the output sequence, that is, the number of evaluations required to minimize the objective function. For some algorithms, the time required to find the minimum is proportional to the number of evaluations.
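This performance measure is simple enough to state as code. The following minimal Python sketch (the helper name `evaluations_to_minimum` is hypothetical, not from the article) computes the 1-based index of the first occurrence of the smallest observed value in an algorithm's output sequence:

```python
def evaluations_to_minimum(trace):
    """Performance measure: 1-based index of the first occurrence of the
    smallest value in the sequence of observed objective values."""
    best = min(trace)
    return trace.index(best) + 1

# e.g. a search that observed these objective values, in order:
assert evaluations_to_minimum([5, 3, 9, 1, 1, 4]) == 4
```

A lower score means the algorithm located the minimum after fewer objective-function evaluations.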

The original NFL theorems assume that all objective functions are equally likely to be input to search algorithms. It has since been established that there is NFL if and only if, loosely speaking, "shuffling" objective functions has no impact on their probabilities. Although this condition for NFL is theoretically possible, it has been argued that it certainly does not hold exactly in practice.

The obvious reading of "not NFL" is "free lunch," but this is misleading. NFL is a matter of degree, not an all-or-nothing proposition. If the condition for NFL holds approximately, then all algorithms yield approximately the same results over all objective functions. Note that "not NFL" implies only that the algorithms are inequivalent overall by some performance measure; for a performance measure of interest, the algorithms may remain equivalent, or nearly so.

In the theoretical sense of "almost all," every algorithm does well on almost all objective functions: an algorithm can obtain good solutions for almost all objectives with relatively few evaluations. The reason is that almost all objective functions have a high degree of Kolmogorov randomness, which makes them extremely irregular and unpredictable. The value of each candidate solution is effectively independent of the others, and good solutions are scattered throughout the space of candidates. A search algorithm rarely evaluates more than a small fraction of the candidates before locating a very good solution.

In fact, almost all objective functions have such high Kolmogorov complexity that they cannot even be stored in a physical computer. There is more information in a typical objective function or algorithm than Seth Lloyd estimates the observable universe is capable of registering. For example, if each candidate solution is encoded as a sequence of 300 zeros and ones, and the goodness values are 0 and 1, then most objective functions have Kolmogorov complexity of at least 2^300 bits, which is greater than Lloyd's bound of 10^90 ≈ 2^299 bits. It follows that not all of "no free lunch" theory applies to physical reality. In a practical sense, algorithms "small enough" for application in physical reality may outperform others on practical problems.
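The arithmetic behind these bounds can be verified directly. A minimal Python check (my own illustration; the variable names are not from the article): a lookup table for an incompressible function on 300-bit inputs has 2^300 one-bit entries, which already exceeds Lloyd's estimate of roughly 10^90 bits for the observable universe:

```python
import math

# A lookup table for f: {0,1}^300 -> {0,1} has 2^300 entries of 1 bit each,
# so an incompressible f needs about 2^300 bits to describe.
table_bits = 2 ** 300

# Lloyd's estimate of the information capacity of the observable universe:
lloyd_bits = 10 ** 90

assert table_bits > lloyd_bits
assert round(math.log2(lloyd_bits)) == 299   # i.e. 10^90 ≈ 2^299
```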

**Formal presentation of NFL**

Let Y^X be the set of all objective functions f: X → Y, where X is a finite solution space and Y is a finite poset. Let J be the set of all permutations of X, and let F be a random variable distributed on Y^X. For every j in J, the composition F ∘ j (where ∘ is function composition) is a random variable distributed on Y^X, with P(F ∘ j = f) = P(F = f ∘ j⁻¹) for all f in Y^X.

Let a(F) denote the output of search algorithm a on input F. If a(F) and b(F) are identically distributed for all search algorithms a and b, then F has an NFL distribution. This condition holds if and only if F and F ∘ j are identically distributed for all j in J. In other words, there is no free lunch for search algorithms if and only if the distribution of objective functions is invariant under permutation of the solution space.
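On a small finite space, this permutation-invariance condition can be checked exhaustively. The Python sketch below (an illustration under my own naming; `is_nfl_distribution` and the example distributions are not from the article) tests whether a distribution over all functions on a three-point space assigns equal probability to every function and each of its permuted copies:

```python
from itertools import permutations, product
from fractions import Fraction

X = (0, 1, 2)
FUNCS = list(product([0, 1], repeat=len(X)))   # f as a tuple of values over X

def compose(f, j):
    """Return f ∘ j as a value tuple: (f ∘ j)(x) = f(j(x))."""
    return tuple(f[j[x]] for x in X)

def is_nfl_distribution(P):
    """True iff P is invariant under permutation of the solution space,
    i.e. P(F = f) == P(F = f ∘ j) for every permutation j."""
    return all(P[f] == P[compose(f, j)]
               for f in FUNCS for j in permutations(range(len(X))))

uniform = {f: Fraction(1, len(FUNCS)) for f in FUNCS}
assert is_nfl_distribution(uniform)            # uniform: NFL holds

biased = dict(uniform)
biased[(1, 0, 0)] += Fraction(1, 100)          # favor one function over
biased[(0, 0, 1)] -= Fraction(1, 100)          # a permuted copy of it
assert not is_nfl_distribution(biased)         # invariance broken: no NFL
```

The uniform distribution of the original theorems is one NFL distribution, but any distribution constant on permutation classes would also qualify.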

This if-and-only-if condition was first published by C. Schumacher in his Ph.D. dissertation, Black Box Search: Framework and Methods. The set-theoretic NFL theorems have recently been generalized to arbitrary sets X and Y.

**Original NFL Theorem**

Wolpert and Macready gave two principal NFL theorems: the first for objective functions that do not change while the search is in progress, and the second for objective functions that may change.

Theorem 1: For any pair of algorithms a1 and a2,

∑_f P(d_m^y | f, m, a1) = ∑_f P(d_m^y | f, m, a2),

where d_m^y denotes the ordered sequence of m objective values observed during the search, f is the objective function, and P(d_m^y | f, m, a) is the probability that algorithm a, run for m evaluations on f, observes the sequence d_m^y.

In essence, this says that when all functions f are equally likely, the probability of observing any particular sequence of m values in the course of the search does not depend on the search algorithm. Theorem 2 establishes a similar, but more subtle, NFL result for time-varying objective functions.
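Theorem 1 can also be verified exhaustively on a tiny space, even for an adaptive algorithm whose next query depends on the values seen so far. In this Python sketch (my own illustration; the algorithms `sweep` and `adaptive` and the helper `run` are hypothetical names), both algorithms produce exactly the same multiset of observed value sequences over all functions:

```python
from itertools import product
from collections import Counter

X = [0, 1, 2]
FUNCS = list(product([0, 1], repeat=len(X)))   # all f: X -> {0, 1}

def run(algorithm, f, m):
    """Observed value sequence d_m^y from a non-revisiting search."""
    visited, ys = [], []
    for _ in range(m):
        x = algorithm(visited, ys)
        visited.append(x)
        ys.append(f[x])
    return tuple(ys)

def sweep(visited, ys):
    """Algorithm a1: deterministic left-to-right sweep."""
    return next(x for x in X if x not in visited)

def adaptive(visited, ys):
    """Algorithm a2: picks the lowest unvisited x after seeing a 1,
    otherwise the highest unvisited x (an arbitrary adaptive rule)."""
    unvisited = [x for x in X if x not in visited]
    return min(unvisited) if ys and ys[-1] == 1 else max(unvisited)

hist1 = Counter(run(sweep, f, 3) for f in FUNCS)
hist2 = Counter(run(adaptive, f, 3) for f in FUNCS)
assert hist1 == hist2   # same distribution of observed sequences
```

With f uniform over FUNCS, each possible value sequence is observed equally often under either algorithm, which is exactly the equality of sums stated by Theorem 1.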

**Explanation of NFL results**

An easy-to-understand, though not very precise, interpretation of the NFL results is that "a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration." Some comments on this interpretation follow:

*An almost-universal general-purpose optimizer exists theoretically: each search algorithm performs well on almost all objective functions.*

*An algorithm may outperform another on a problem to which neither is specialized; indeed, the problem may be among the worst for both. Wolpert and Macready developed a measure of the degree of "alignment" between an algorithm and a problem. To say that one algorithm matches a problem better than another is not to say that either is specialized to the problem.*

*In practice, some algorithms do reevaluate candidate solutions. The superiority, on a particular problem, of an algorithm that never reevaluates candidates over one that does may have nothing to do with specialization to that problem.*

*For almost all objective functions, specialization is essentially accidental. Incompressible, or Kolmogorov random, objective functions have no regularity for an algorithm to exploit. Given an incompressible objective function, there is no basis for preferring one algorithm over another; if a chosen algorithm performs better than most, the result is happenstance.*

In practice, only highly compressible (far from random) objective functions fit in computer storage, and it is not the case that every algorithm performs well on almost all compressible functions. Incorporating prior knowledge of the problem into an algorithm generally does improve performance. Yet while the NFL results constitute, in a strict sense, "full employment theorems" for optimization professionals, it is important not to take them too literally. For one thing, humans often have little prior knowledge to work with. For another, incorporating prior knowledge yields little performance gain on some problems. Finally, human time is very expensive relative to computer time; there are many cases in which a company would opt for slow optimization of a function with an unmodified program rather than invest human effort in developing a faster one.

The NFL results do not indicate that it is futile to attack practical problems with unspecialized algorithms. No one has determined the fraction of practical problems for which a given algorithm rapidly yields good results. And there is a practical free lunch, not at all in conflict with theory: running an algorithm on a computer costs very little relative to the cost of human time and the benefit of a good solution. If an algorithm finds a satisfactory solution in an acceptable amount of time, a small investment has yielded a big payoff; if it fails, little is lost.

**Coevolutionary free lunches**

Wolpert and Macready have proved that there are free lunches in coevolutionary optimization. Their analysis covers "self-play" problems, in which a set of players works together to produce a champion, who then engages one or more antagonists in a subsequent multiplayer game. That is, the objective is to obtain a good player, but without an objective function: the goodness of each player (candidate solution) is assessed through its play against others. An algorithm attempts to use players and their quality of play to obtain better players, and the player it deems best of all is the champion. Wolpert and Macready demonstrated that some coevolutionary algorithms are generally superior to others at generating champions. Generating a champion through self-play is of interest in evolutionary computation and game theory. The results are inapplicable to the coevolution of biological species, which does not produce champions.