Detailed explanation of the Logloss calculation process in Sklearn (machine learning)

Source: Internet
Author: User

Reposted from: http://blog.csdn.net/ybdesire/article/details/73695163

Introduction of the problem

With Sklearn, computing Logloss for a multi-class problem with code like the following raises an error. Here y_true holds the true labels and y_pred holds the predicted labels.

from sklearn.metrics import log_loss

y_true = [0, 1, 3]
y_pred = [1, 2, 1]
log_loss(y_true, y_pred)

ValueError: y_true and y_pred contain different number of classes 3, 2. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 3]
     
     
      

What the hell is going on here?

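This error comes from a misunderstanding of how Logloss is computed. The calculation expects both the labels and the predictions laid out as one row per sample and one column per category, which for hard class labels means a one-hot encoding. A bare 1-D list of predicted labels is not read that way (Sklearn appears to interpret it as probabilities for a two-class problem, which would explain the "3, 2" in the message). The error can be fixed by one-hot encoding both lists with OneHotEncoder, as follows.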

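from sklearn.metrics import log_loss
from sklearn.preprocessing import OneHotEncoder

# n_values and sparse are parameter names from older Sklearn releases;
# newer releases use categories and sparse_output instead
one_hot = OneHotEncoder(n_values=4, sparse=False)

y_true = one_hot.fit_transform([0, 1, 3])
y_pred = one_hot.fit_transform([1, 2, 1])
log_loss(y_true, y_pred)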
     
     
      

So what exactly is the Logloss calculation process? It is explained in detail below.

Logloss calculation in detail

First, let's look at the Logloss formula:

$$\text{logloss} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m} y_{i,j}\log(p_{i,j})$$

The symbols in this formula mean:

  • n: the number of samples
  • m: the number of categories; in the multi-class example above, m is 4
  • y_{i,j}: 1 when the i-th sample belongs to category j, otherwise 0
  • p_{i,j}: the predicted probability that the i-th sample belongs to category j
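
The formula translates almost directly into code. The following is a minimal, dependency-free sketch rather than Sklearn's implementation; the function name `logloss_by_formula` and the two-sample data are made up for illustration, with Y holding the y_{i,j} indicators and P holding the p_{i,j} probabilities.

```python
import math

def logloss_by_formula(Y, P):
    """Average over samples of -sum_j y_ij * log(p_ij).

    Y and P are lists of rows (one row per sample, one column per category).
    Assumes every probability paired with a nonzero y is strictly positive.
    """
    n = len(Y)
    total = 0.0
    for y_row, p_row in zip(Y, P):
        for y, p in zip(y_row, p_row):
            if y:  # terms with y_ij == 0 contribute nothing
                total += y * math.log(p)
    return -total / n

# Hypothetical data: sample 0 is in category 0, sample 1 is in category 1.
Y = [[1, 0], [0, 1]]
P = [[0.8, 0.2], [0.3, 0.7]]
loss = logloss_by_formula(Y, P)  # equals -(log(0.8) + log(0.7)) / 2
print(loss)
```

Only the probability assigned to each sample's true category enters the sum, so a confident wrong prediction (a true-category probability near 0) is what drives the loss up.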

We use the following data to illustrate the calculation:

y_true = [0,1,3]
y_pred = [1,2,1]

Solving Logloss

First, we know that n = 3 (three samples) and m = 4 (four categories: 0, 1, 2, 3).

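So Y and P are both 3x4 matrices (one row per sample, one column per category):

Y = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]

P = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0]]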
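
These matrices are simply the one-hot encodings of the two label lists. As a sketch, the helper below (a hypothetical `one_hot` function, not part of Sklearn) builds one row per sample from the labels in the example, assuming the four categories 0 to 3.

```python
def one_hot(labels, n_categories):
    """Return one row per label, with a 1 in the label's column and 0 elsewhere."""
    return [[1 if j == label else 0 for j in range(n_categories)]
            for label in labels]

Y = one_hot([0, 1, 3], 4)  # true labels from the example
P = one_hot([1, 2, 1], 4)  # predicted labels from the example

for name, matrix in (("Y", Y), ("P", P)):
    print(name)
    for row in matrix:
        print("  ", row)
```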

However, taking the log of the P matrix is a problem, because log(0) is negative infinity. Sklearn avoids this by replacing each 0 in P with 1e-15 (10 to the power of -15).

# printed as a single row of 12 values: the three rows of P placed one after another
p = array([[  1.00000000e-15,   1.00000000e+00,   1.00000000e-15,
            1.00000000e-15,   1.00000000e-15,   1.00000000e-15,
            1.00000000e+00,   1.00000000e-15,   1.00000000e-15,
            1.00000000e+00,   1.00000000e-15,   1.00000000e-15]])
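
The replacement of 0 by 1e-15 can be sketched in plain Python. The helper name `clip_probability` is an assumption for illustration; the bound 1e-15 is the value quoted above, and also capping the upper end at 1 - 1e-15 is an assumption of a symmetric clip.

```python
import math

EPS = 1e-15  # lower bound quoted in the text

def clip_probability(p, eps=EPS):
    """Keep p inside [eps, 1 - eps] so that log(p) stays finite."""
    return max(eps, min(1 - eps, p))

row = [0.0, 1.0, 0.0, 0.0]             # first row of P before clipping
clipped = [clip_probability(p) for p in row]
logs = [math.log(p) for p in clipped]  # no math domain error after clipping
print(clipped)
print(logs)
```

Each clipped zero contributes log(1e-15), about -34.54, instead of an undefined value.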
     
     
      

After debugging (for how to step into the Sklearn source, see the debugging reference at the end), I also found that Sklearn makes a small change to the Logloss formula. In this example the 1/n factor appears to move inside the log, where it is applied to p:

$$\text{logloss} = -\sum_{i=1}^{n}\sum_{j=1}^{m} y_{i,j}\log\left(\frac{1}{n}\,p_{i,j}\right)$$

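This change corresponds to the following line of the source code, which divides each row of y_pred by that row's sum. Here the single 12-value row of p sums to about 3, which happens to equal n, so the division looks like a factor of 1/n: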

y_pred /= y_pred.sum(axis=1)[:, np.newaxis]
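
To see what this line does, here is a plain-Python sketch of the same row-wise division (the helper name `normalize_rows` is made up). It is applied to the single 12-value row of p shown earlier; the three entries close to 1 make the row sum close to 3, so every entry ends up divided by about 3.

```python
def normalize_rows(matrix):
    """Divide every entry by the sum of its row, so each row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total for value in row])
    return normalized

e = 1e-15
# The single 12-value row of p from the text (0 already replaced by 1e-15).
p = [[e, 1.0, e, e,  e, e, 1.0, e,  e, 1.0, e, e]]

p_norm = normalize_rows(p)
print(p_norm[0][:3])  # roughly [3.33e-16, 3.33e-01, 3.33e-16]
```

For a 3x4 matrix whose rows already sum to 1, the same division would leave the values essentially unchanged.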
     
     
      

After this division, the two matrices are:

# the p above divided by 3 gives this p
p = array([[  3.33333333e-16,   3.33333333e-01,   3.33333333e-16,
            3.33333333e-16,   3.33333333e-16,   3.33333333e-16,
            3.33333333e-01,   3.33333333e-16,   3.33333333e-16,
            3.33333333e-01,   3.33333333e-16,   3.33333333e-16]])

y = array([[1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])
     
     
      

With y and p in hand, the following element-wise multiplication gives the value of the Logloss.

loss = -(y * np.log(p)).sum(axis=1)
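
Putting the steps together, the value can be checked without Sklearn. This is a sketch under the assumptions used in this article (0 replaced by 1e-15, and the single 12-value row divided by its sum); the variable names are illustrative.

```python
import math

e = 1e-15
y = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 0, 1]
p = [e, 1.0, e, e,  e, e, 1.0, e,  e, 1.0, e, e]

total = sum(p)                      # close to 3
p = [value / total for value in p]  # the row-wise division from the source line

# Same as -(y * np.log(p)).sum(axis=1) for this single row.
loss = -sum(y_j * math.log(p_j) for y_j, p_j in zip(y, p))
print(loss)
```

The three nonzero terms are each -log(1e-15 / 3), about 35.637, and together they give about 106.912.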
     
     
      

The final Logloss is: 106.91216605.

Summary

  • To keep the calculation finite, Sklearn replaces the value 0 with 1e-15.
  • The Logloss calculation in Sklearn differs slightly from the traditional Logloss formula.

Reference

  • Sklearn's log_loss source: https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/metrics/classification.py#L1544
  • How to debug a Python third-party library dynamically: http://blog.csdn.net/ybdesire/article/details/54649211
