Principle
Logarithmic loss, also called log-likelihood loss, logistic regression loss (logistic loss), or cross-entropy loss, is defined on probability estimates. It is commonly used in (multinomial) logistic regression and neural networks, as well as in some variants of the expectation-maximization (EM) algorithm, and it can be used to evaluate the probability outputs of a classifier.
Logarithmic loss quantifies the accuracy of a classifier by penalizing incorrect classifications; minimizing the logarithmic loss is essentially equivalent to maximizing the accuracy of the classifier. To compute the logarithmic loss, the classifier must provide a probability for every class the input may belong to, not just the most likely class. The logarithmic loss function is calculated as follows:

L(Y, P(Y|X)) = -(1/N) Σ_{i=1..N} Σ_{j=1..M} y_ij · log(p_ij)

where Y is the output variable, X is the input variable, and L is the loss function; N is the number of input samples, M is the number of possible classes, y_ij is a binary indicator of whether class j is the true class of input instance x_i, and p_ij is the probability, given by the model or classifier, that input instance x_i belongs to class j.
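To make the formula concrete, here is a minimal NumPy sketch that evaluates the double sum directly; the one-hot label matrix and the probability matrix are made up purely for illustration:

import numpy as np

# Hypothetical data: N = 3 samples, M = 3 classes.
# y_onehot[i, j] = 1 if class j is the true class of sample i, else 0.
y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1]])
# p[i, j] = probability, according to the classifier, that sample i belongs to class j.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

# L = -(1/N) * sum_i sum_j y_ij * log(p_ij)
N = y_onehot.shape[0]
loss = -np.sum(y_onehot * np.log(p)) / N
print(loss)  # the average of -log(probability assigned to the true class)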
If there are only two classes {0, 1}, the logarithmic loss function simplifies to

L(Y, P(Y|X)) = -(1/N) Σ_{i=1..N} [ y_i · log(p_i) + (1 - y_i) · log(1 - p_i) ]

where y_i is the true class of input instance x_i and p_i is the probability that input instance x_i belongs to class 1. The logarithmic loss over all samples is the average of the per-sample logarithmic losses, and for a perfect classifier the logarithmic loss is 0.
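For example, with the four samples used in the test code below, true labels y = [0, 0, 1, 1] and predicted probabilities p = [0.1, 0.2, 0.7, 0.99], the binary formula gives

L = -(1/4) · [ log(1 - 0.1) + log(1 - 0.2) + log(0.7) + log(0.99) ] ≈ 0.1738

(using natural logarithms), which is the value both implementations below should print, up to rounding.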
Python Implementation
Below, the logarithmic loss is computed both with a self-defined logloss() function and with the sklearn.metrics.log_loss function from the scikit-learn library:
#!/usr/bin/env python
# -*- coding: utf8 -*-
# Author: klchang
# Date: 2018.6.23

# y_true: list, the true labels of input instances
# y_pred: list, the predicted probability that each input instance belongs to class 1
def logloss(y_true, y_pred, eps=1e-15):
    import numpy as np

    # Prepare numpy array data
    y_true = np.array(y_true)
    y_pred = np.array(y_pred)
    assert (len(y_true) and len(y_true) == len(y_pred))

    # Clip y_pred between eps and 1-eps
    p = np.clip(y_pred, eps, 1 - eps)
    loss = np.sum(- y_true * np.log(p) - (1 - y_true) * np.log(1 - p))

    return loss / len(y_true)


def unitest():
    y_true = [0, 0, 1, 1]
    y_pred = [0.1, 0.2, 0.7, 0.99]

    print("Use self-defined logloss() in binary classification, the result is {}".format(logloss(y_true, y_pred)))

    from sklearn.metrics import log_loss
    print("Use log_loss() in scikit-learn, the result is {}".format(log_loss(y_true, y_pred)))


if __name__ == '__main__':
    unitest()
Note: in the implementation, the parameter eps is added to avoid numerical errors when the predicted probability is exactly 0 or 1; the input parameter y_pred is the probability that the predicted instance belongs to class 1; and the logarithmic loss is computed using the natural logarithm.
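The code above handles the binary case; for the multiclass case, sklearn.metrics.log_loss accepts a probability matrix with one column per class. A minimal sketch, with labels and probabilities invented purely for illustration:

from sklearn.metrics import log_loss

# Invented example: 3 samples, 3 classes; each row of probabilities sums to 1.
y_true = [0, 1, 2]
y_pred = [[0.7, 0.2, 0.1],
          [0.1, 0.8, 0.1],
          [0.2, 0.2, 0.6]]
print(log_loss(y_true, y_pred, labels=[0, 1, 2]))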