Reposted from: http://blog.csdn.net/ybdesire/article/details/73695163

Problem
When computing log loss for a multi-class problem with sklearn, code like the following raises an error. Here y_true holds the true labels and y_pred the predicted labels:

y_true = [0, 1, 3]
y_pred = [1, 2, 1]
log_loss(y_true, y_pred)

ValueError: y_true and y_pred contain different number of classes 3, 2. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1 3]
What is going on here?

The error arises from not understanding how log loss is computed: log loss requires its inputs to be one-hot encoded. The problem can be fixed with OneHotEncoder, as follows.
from sklearn.metrics import log_loss
from sklearn.preprocessing import OneHotEncoder

# note: n_values was deprecated in sklearn 0.20 (newer versions use categories)
one_hot = OneHotEncoder(n_values=4, sparse=False)
y_true = one_hot.fit_transform([[0], [1], [3]])
y_pred = one_hot.fit_transform([[1], [2], [1]])
log_loss(y_true, y_pred)
So, what exactly is the calculation process of log loss? It is explained in detail below.

Log Loss Calculation in Detail
First, let's look at the log loss formula:

$$\mathrm{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{i,j}\log(p_{i,j})$$
The symbols in this formula mean:
- N: the number of samples
- M: the number of classes; in the multi-class example above, M is 4
- y_{i,j}: 1 if sample i belongs to class j, otherwise 0
- p_{i,j}: the probability that sample i is predicted as class j
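As a sketch, the textbook formula above translates directly into NumPy. The probability matrix below is made up purely for illustration (rows sum to 1, as proper probabilities should):

```python
import numpy as np

# hypothetical example: 3 samples, 4 classes
# y encodes the true class one-hot; p holds predicted probabilities
y = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]])
p = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.6, 0.2, 0.1],
              [0.2, 0.1, 0.1, 0.6]])

n = y.shape[0]
# -(1/N) * sum over samples and classes of y_ij * log(p_ij)
logloss = -(y * np.log(p)).sum() / n
print(logloss)
```

Only the entries where y is 1 contribute, so the loss is the average negative log probability assigned to the true class.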
We use the following data to illustrate the computation:

y_true = [0, 1, 3]
y_pred = [1, 2, 1]

Computing the log loss
First, we know that N = 3 (3 samples) and M = 4 (4 classes: 0, 1, 2, 3).

So Y and P are both 3x4 one-hot matrices:

y = array([[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 1]])
p = array([[0, 1, 0, 0],
           [0, 0, 1, 0],
           [0, 1, 0, 0]])
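As a sketch, the 3x4 one-hot matrices can be built with plain NumPy, equivalent to what OneHotEncoder produces for these labels:

```python
import numpy as np

y_true = [0, 1, 3]
y_pred = [1, 2, 1]
n, m = 3, 4  # 3 samples, 4 classes

# put a 1 in column k of row i when sample i has label k
Y = np.zeros((n, m))
Y[np.arange(n), y_true] = 1
P = np.zeros((n, m))
P[np.arange(n), y_pred] = 1
```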
However, taking the log of the P matrix is a problem: log(0) is negative infinity. sklearn solves this by replacing every 0 in P with 1e-15:
p = array([[1.00000000e-15, 1.00000000e+00, 1.00000000e-15, 1.00000000e-15],
           [1.00000000e-15, 1.00000000e-15, 1.00000000e+00, 1.00000000e-15],
           [1.00000000e-15, 1.00000000e+00, 1.00000000e-15, 1.00000000e-15]])
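This replacement can be sketched with np.clip; sklearn clips probabilities into the range [1e-15, 1 − 1e-15] so that np.log stays finite:

```python
import numpy as np

eps = 1e-15
p = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

# clip probabilities away from 0 (and 1) so np.log never sees a 0
p_clipped = np.clip(p, eps, 1 - eps)
```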
Also, after debugging (see the reference below on how to debug sklearn source), it turns out that sklearn makes a small change to the log loss formula: the 1/N factor is moved inside the log, applied to p.
$$\mathrm{logloss} = -\sum_{i=1}^{N}\sum_{j=1}^{M} y_{i,j}\log\left(\frac{1}{N}\,p_{i,j}\right)$$
This change corresponds to the following line in the source code:

y_pred /= y_pred.sum(axis=1)[:, np.newaxis]
So the two matrices become:

# the P above divided by 3
p = array([[3.33333333e-16, 3.33333333e-01, 3.33333333e-16, 3.33333333e-16],
           [3.33333333e-16, 3.33333333e-16, 3.33333333e-01, 3.33333333e-16],
           [3.33333333e-16, 3.33333333e-01, 3.33333333e-16, 3.33333333e-16]])
y = array([[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 1]])
With Y and P in hand, the log loss is computed by element-wise multiplication and summation:

loss = -(y * np.log(p)).sum(axis=1)  # per-sample losses
loss = loss.sum()                    # total
The final log loss is 106.91216605.

Summary

- To avoid log(0), sklearn replaces 0 with 1e-15 in the log loss calculation.
- sklearn's calculation differs slightly from the textbook log loss formula.

References

- sklearn's log_loss source: https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/metrics/classification.py#L1544
- How to debug a Python third-party library dynamically: http://blog.csdn.net/ybdesire/article/details/54649211
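Putting the steps above together, the whole computation described in this post can be reproduced with a short NumPy sketch: one-hot encode both label vectors, replace 0 with 1e-15, apply the 1/N factor, then sum:

```python
import numpy as np

y_true = [0, 1, 3]
y_pred = [1, 2, 1]
n, m = 3, 4  # 3 samples, 4 classes

# one-hot encode both label vectors
Y = np.zeros((n, m))
Y[np.arange(n), y_true] = 1
P = np.zeros((n, m))
P[np.arange(n), y_pred] = 1

P = np.clip(P, 1e-15, 1 - 1e-15)  # avoid log(0)
P = P / n                          # the 1/N factor moved inside the log
loss = -(Y * np.log(P)).sum()
print(loss)  # ≈ 106.91216605
```

All three predictions miss the true class, so each sample contributes −log(1e-15 / 3) ≈ 35.637, giving the total of 106.912.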