http://bubblexc.com/y2011/547/
In the preceding section, an empirical loss of 0 means that we obtain a discriminant function $f$ that satisfies every training sample exactly, i.e. $\forall (x_i, y_i),\ f(x_i) = y_i$. In most cases, however, there is no feasible solution that satisfies all of the constraints in that formulation. Therefore, borrowing the idea of slack variables from the SVM, we relax the optimization problem so that the model no longer has to fit the training set perfectly, which gives the following optimization problem:
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i \qquad \text{s.t.}\quad \forall i,\ \forall y \in \mathcal{Y}\setminus y_i:\ \langle w, \delta\psi_i(y)\rangle \ge 1 - \xi_i,\quad \xi_i \ge 0$$

We can interpret $\xi_i$ as the penalty paid for those outputs $y$ that violate the constraints. Here, however, two violating outputs $y_1$ and $y_2$ are penalized by the same amount, which is clearly unreasonable: if $\Delta(y_1, y_i) > \Delta(y_2, y_i)$, then $y_1$ should receive the larger penalty. We should therefore let the loss function enter the penalty as well, which leads to the following optimization problem (the detailed derivation is given in the paper):
$$\min_{w,\xi}\ \frac{1}{2}\|w\|^2 + \frac{C}{n}\sum_{i=1}^{n}\xi_i \qquad \text{s.t.}\quad \forall i,\ \forall y \in \mathcal{Y}\setminus y_i:\ \langle w, \delta\psi_i(y)\rangle \ge \Delta(y_i, y) - \xi_i,\quad \xi_i \ge 0$$
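As a concrete illustration (not part of the original post), the Python sketch below evaluates this margin-rescaling objective for a fixed $w$: at the optimum, each slack $\xi_i$ equals the largest constraint violation $\Delta(y_i, y) - \langle w, \delta\psi_i(y)\rangle$ over $y \ne y_i$, floored at zero. The helper names `margin_rescaled_slack`, `objective`, `psi`, and `delta`, as well as the toy multi-class setup with a small enumerable label space, are hypothetical choices made only for this sketch.

```python
import numpy as np

def margin_rescaled_slack(w, x_i, y_i, label_space, psi, delta):
    # Tightest feasible slack xi_i under the constraints
    #   <w, psi(x_i, y_i) - psi(x_i, y)> >= Delta(y_i, y) - xi_i   for all y != y_i,
    # i.e. the largest constraint violation over y, floored at zero.
    score_true = w @ psi(x_i, y_i)
    slack = 0.0
    for y in label_space:
        if y == y_i:
            continue
        violation = delta(y_i, y) - (score_true - w @ psi(x_i, y))
        slack = max(slack, violation)
    return slack

def objective(w, data, label_space, psi, delta, C):
    # 1/2 ||w||^2 + C/n * sum_i xi_i, with xi_i computed as above.
    n = len(data)
    total_slack = sum(margin_rescaled_slack(w, x, y, label_space, psi, delta)
                      for x, y in data)
    return 0.5 * float(w @ w) + C / n * total_slack

# Toy multi-class instance (hypothetical): psi(x, y) places x in the block of
# class y, Delta is the 0/1 loss, and the label space is {0, 1, 2}.
K, d = 3, 4

def psi(x, y):
    out = np.zeros(K * d)
    out[y * d:(y + 1) * d] = x
    return out

delta = lambda y_true, y: float(y_true != y)

rng = np.random.default_rng(0)
data = [(rng.normal(size=d), int(rng.integers(K))) for _ in range(5)]
w = rng.normal(size=K * d)
print(objective(w, data, range(K), psi, delta, C=1.0))
```

In real structured prediction the label space is exponentially large, so the explicit loop over $y$ is replaced by loss-augmented inference (finding the most violating output), but the slack being computed is the same quantity.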