Given any D, the probability that D is a BAD sample for some h in H (i.e. Ein(h) and Eout(h) are not close for that h) is bounded by the union bound together with the Hoeffding inequality:
P[BAD D] ≤ P[BAD D for h1] + P[BAD D for h2] + ... + P[BAD D for hM] ≤ 2·M·exp(-2ε²N)
where M = |H| is the number of candidate hypotheses in H. The bound shrinks as the sample size N grows and grows with the number of hypotheses M. At an acceptable probability level, the learning algorithm A then only needs to pick the best-performing h as g.
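As a quick numerical check, here is a minimal sketch (my own illustration, not from the original notes; the function name bad_sample_bound is made up) that evaluates the bound 2·M·exp(-2ε²N) for a few values of M and N:

```python
# Minimal sketch: evaluate the union-Hoeffding bound 2 * M * exp(-2 * eps^2 * N).
import math

def bad_sample_bound(M, N, eps):
    """Upper bound on P[some h in H has |Ein(h) - Eout(h)| > eps]."""
    return 2 * M * math.exp(-2 * eps * eps * N)

# The bound shrinks exponentially as the sample size N grows ...
for N in (100, 500, 1000):
    print(f"M=100, N={N}: bound = {bad_sample_bound(100, N, 0.1):.6f}")

# ... and only grows linearly with the number of hypotheses M.
for M in (10, 100, 1000):
    print(f"M={M}, N=500: bound = {bad_sample_bound(M, 500, 0.1):.6f}")
```

With ε = 0.1, going from N = 100 to N = 1000 drives the bound from a vacuous value above 1 down to below 10^-6, while multiplying M by 10 only multiplies the bound by 10.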
Choosing the best g needs to meet two conditions: Eout(g) must be very close to Ein(g), and Ein(g) must be small enough.
The following is the relationship between BAD and M: with a small M, the bad-sample probability 2·M·exp(-2ε²N) is small and Eout(g) ≈ Ein(g) is likely, but there are few hypotheses to choose from, so Ein(g) may not be small enough; with a large M, there are more choices and a small Ein(g) becomes achievable, but the bad-sample probability grows.
So choosing a suitable M is very important, and for infinite hypothesis sets we need a finite quantity to replace the infinite M.
Idea: similar hypotheses overlap. For h1 ≈ h2, Ein(h1) ≈ Ein(h2) and Eout(h1) ≈ Eout(h2) (for example, two nearly coincident lines in PLA), so the union bound over-estimates the bad-sample probability.
To account for the overlap, we can group similar hypotheses by kind.
Dichotomies of H on D: each hypothesis h in the candidate set maps an input x to an output in {×, ○}: H = {hypothesis h : X → {×, ○}}. Call h(x1, x2, ..., xN) = (h(x1), h(x2), ..., h(xN)) ∈ {×, ○}^N a dichotomy, and let H(x1, x2, ..., xN) denote the set of all dichotomies that H can generate on D.
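To see this grouping in action, here is a minimal sketch (my own, not the author's code; the choice of 4 points and 100,000 random lines is arbitrary) that samples many random 2D perceptrons and groups them by the dichotomy they produce on a fixed set of points:

```python
# Minimal sketch: infinitely many lines collapse into a few dichotomy "kinds".
import random

def dichotomy(w, points):
    """Signs produced by the line w0 + w1*x + w2*y = 0 on the given points."""
    return tuple(1 if w[0] + w[1] * x + w[2] * y > 0 else -1 for x, y in points)

random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]

seen = set()
for _ in range(100000):                          # many "different" hypotheses
    w = [random.uniform(-1, 1) for _ in range(3)]
    seen.add(dichotomy(w, points))

print(f"distinct dichotomies on 4 points: {len(seen)} out of 2^4 = 16 labelings")
```

However many lines we sample, at most 14 of the 16 possible labelings can appear, because on 4 points in general position two labelings are not linearly separable; all the sampled lines fall into this small number of kinds.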
The difference between the hypotheses H and the dichotomies H(x1, x2, ..., xN): H contains functions X → {×, ○} and may be infinite, while H(x1, x2, ..., xN) contains vectors in {×, ○}^N, so its size is at most 2^N and therefore finite.
Growth function: remove the dependence on the particular sample by taking the maximum over all possible (x1, x2, ..., xN): mH(N) = max over (x1, ..., xN) ∈ X^N of |H(x1, x2, ..., xN)|, which is at most 2^N.
Four example growth functions: positive rays have mH(N) = N + 1; positive intervals have mH(N) = N(N+1)/2 + 1; convex sets have mH(N) = 2^N; 2D perceptrons have mH(N) < 2^N for N ≥ 4.
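As concrete examples of computing a growth function, here is a minimal sketch (my own; the helper names are made up) that enumerates the dichotomies of two classic hypothesis sets on N points of the real line: positive rays (h(x) = sign(x - θ)) and positive intervals (+1 inside an interval, -1 outside). The counts match the known growth functions N + 1 and N(N+1)/2 + 1:

```python
# Minimal sketch: brute-force the dichotomy counts for two simple hypothesis sets.
import itertools

def dichotomies_positive_rays(xs):
    """All labelings sign(x - theta) realizable on the points xs."""
    xs = sorted(xs)
    thetas = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x > t else -1 for x in xs) for t in thetas}

def dichotomies_positive_intervals(xs):
    """All labelings '+1 inside (l, r), -1 outside' realizable on xs."""
    xs = sorted(xs)
    cuts = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if l < x < r else -1 for x in xs)
            for l, r in itertools.combinations_with_replacement(cuts, 2)}

for N in (3, 5, 8):
    xs = list(range(N))                          # any N distinct points work
    rays = dichotomies_positive_rays(xs)
    ivals = dichotomies_positive_intervals(xs)
    print(f"N={N}: rays {len(rays)} (N+1 = {N + 1}), "
          f"intervals {len(ivals)} (N(N+1)/2+1 = {N * (N + 1) // 2 + 1})")
```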
Break point: if no set of k inputs can be shattered by H (i.e. H cannot generate all 2^k dichotomies on any k points), then k is a break point of H.
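For example, here is a minimal sketch (my own illustration) checking that positive rays shatter a single point but cannot shatter two points, so k = 2 is a break point of positive rays. For this hypothesis set the number of dichotomies depends only on how many distinct points there are, so checking one pair is enough:

```python
# Minimal sketch: the break point of the "positive rays" hypothesis set is 2.
def ray_dichotomies(xs):
    """Dichotomies sign(x - theta) realizable on the points xs."""
    xs = sorted(xs)
    thetas = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x > t else -1 for x in xs) for t in thetas}

def shattered(xs):
    """True if positive rays realize all 2^k dichotomies on the k points xs."""
    return len(ray_dichotomies(xs)) == 2 ** len(xs)

print(shattered([0.0]))        # True:  1 point  -> 2 dichotomies = 2^1
print(shattered([0.0, 1.0]))   # False: 2 points -> 3 dichotomies < 2^2 = 4
```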
In this way the infinite set of hypotheses is reduced to a finite set of dichotomies.