Analysis of sklearn's roc_curve() function

Last Update:2018-08-20 Source: Internet

Author: User

When using sklearn's roc_curve() function, I found that the returned results were not what I expected. In theory, the thresholds should cover every value in y_score (that is, every model prediction), but roc_curve() returns only some of them. The reason can be found in the source code.

Initial data:

y_true = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
y_score = [0.31689620142873609, 0.32367439192936548, 0.42600526758001989, 0.38769987193780364, 0.3667541015524296, 0.39760831479768338, 0.42017521636505745, 0.41936155918127238, 0.33803961944475219, 0.33998332945141224]

sklearn's roc_curve() function returns the false positive rate, the true positive rate, and the corresponding thresholds:

from sklearn.metrics import roc_curve
fpr_skl, tpr_skl, thresholds_skl = roc_curve(y_true, y_score, drop_intermediate=False)

The calculated values are as follows:

fpr_skl
[0.          0.14285714  0.14285714  0.14285714  0.28571429  0.42857143
  0.57142857  0.71428571  0.85714286  1.        ]

tpr_skl
[0.33333333  0.33333333  0.66666667  1.          1.          1.
  1.          1.          1.          1.        ]

thresholds_skl
[0.42600527  0.42017522  0.41936156  0.39760831  0.38769987  0.3667541
  0.33998333  0.33803962  0.32367439  0.3168962 ]

The roc_curve() function

Reading the roc_curve() source shows how these three values are calculated. It is essentially the standard procedure used when computing the ROC curve and its AUC.

The first step is a call to the _binary_clf_curve() function:

    fps, tps, thresholds = _binary_clf_curve(
        y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)

fps and tps are the false positive (FP) and true positive (TP) counts of the confusion matrix at each threshold; thresholds is y_score sorted in descending order (the values only look different from the printed output above because of rounding; they are the same numbers). In this example, the values are as follows:

fps = [0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
tps = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
thresholds = [0.42600526758001989, 0.42017521636505745, 0.41936155918127238, 0.39760831479768338, 0.38769987193780364, 0.3667541015524296, 0.33998332945141224, 0.33803961944475219, 0.32367439192936548, 0.31689620142873609]
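These counts can be reproduced with one descending sort and a running sum. The following is only a plain-Python sketch of that idea, not sklearn's actual implementation: the helper name binary_clf_counts is hypothetical, and it assumes the labels are 0/1 with no sample weights.

```python
from itertools import accumulate

y_true = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
y_score = [0.31689620142873609, 0.32367439192936548, 0.42600526758001989,
           0.38769987193780364, 0.3667541015524296, 0.39760831479768338,
           0.42017521636505745, 0.41936155918127238, 0.33803961944475219,
           0.33998332945141224]


def binary_clf_counts(y_true, y_score):
    """Hypothetical helper: cumulative FP/TP counts per distinct threshold.

    Assumes y_true contains only 0 and 1 and that there are no sample weights.
    """
    # Sort sample indices by score, highest first.
    order = sorted(range(len(y_score)), key=lambda k: y_score[k], reverse=True)
    sorted_true = [y_true[k] for k in order]
    sorted_score = [y_score[k] for k in order]

    # Running totals: positives seen so far, and negatives seen so far.
    cum_tp = list(accumulate(sorted_true))
    cum_fp = [n + 1 - tp for n, tp in enumerate(cum_tp)]

    # Keep only the last position of each run of tied scores,
    # so that every distinct score yields exactly one threshold.
    last = len(sorted_score) - 1
    keep = [k for k in range(len(sorted_score))
            if k == last or sorted_score[k] != sorted_score[k + 1]]
    fps = [cum_fp[k] for k in keep]
    tps = [cum_tp[k] for k in keep]
    thresholds = [sorted_score[k] for k in keep]
    return fps, tps, thresholds


fps, tps, thresholds = binary_clf_counts(y_true, y_score)
print(fps)  # [0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
print(tps)  # [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
```

Sorting once and accumulating costs O(n log n), whereas re-scanning every sample for every threshold costs O(n^2).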

For ease of understanding, fps and tps can be computed in a more intuitive way:

for threshold in thresholds:
    # 1 if the score is greater than or equal to the threshold, otherwise 0
    y_prob = [1 if i >= threshold else 0 for i in y_score]
    # whether each prediction is correct
    result = [i == j for i, j in zip(y_true, y_prob)]
    # whether each sample is predicted to be the positive class
    positive = [i == 1 for i in y_prob]

    tp = [i and j for i, j in zip(result, positive)]  # predicted positive and correct
    fp = [(not i) and j for i, j in zip(result, positive)]  # predicted positive but wrong

    # print in the same order as fps, tps
    print(fp.count(True), tp.count(True))

# output
0 1
1 1
1 2
1 3
2 3
3 3
4 3
5 3
6 3
7 3

From fps and tps you can compute the corresponding fpr and tpr. Index -1 corresponds to the smallest threshold, at which every sample is classified as positive; accordingly, fps[-1] is the total number of negative samples and tps[-1] is the total number of positive samples. The corresponding source code, simplified, is as follows:

fpr = [i / fps[-1] for i in fps]  # fps / fps[-1]
tpr = [i / tps[-1] for i in tps]  # tps / tps[-1]
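Applied to the counts from this example, the normalization gives the following rates. This is a small illustrative sketch; the guard against an empty class is my own assumption about a sensible fallback, not a copy of what the library does.

```python
fps = [0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
tps = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]

n_neg = fps[-1]  # 7 negative samples in total
n_pos = tps[-1]  # 3 positive samples in total

# Assumption: if a class never occurs, its rate is undefined, so use NaN.
fpr = [fp / n_neg if n_neg else float("nan") for fp in fps]
tpr = [tp / n_pos if n_pos else float("nan") for tp in tps]

print([round(v, 4) for v in fpr])
# [0.0, 0.1429, 0.1429, 0.1429, 0.2857, 0.4286, 0.5714, 0.7143, 0.8571, 1.0]
print([round(v, 4) for v in tpr])
# [0.3333, 0.3333, 0.6667, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

As far as I know, newer scikit-learn releases also prepend an extra starting point at (0, 0), so the arrays returned by roc_curve() there may be one element longer than these.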

The drop_intermediate parameter

roc_curve() has a drop_intermediate parameter; the corresponding source code is:

if drop_intermediate and len(fps) > 2:
    optimal_idxs = np.where(np.r_[True,
                                  np.logical_or(np.diff(fps, 2),
                                                np.diff(tps, 2)),
                                  True])[0]
    fps = fps[optimal_idxs]
    tps = tps[optimal_idxs]
    thresholds = thresholds[optimal_idxs]

In this example, the intermediate values are:

# Take the second-order difference
np.diff(fps, 2)
[-1  0  1  0  0  0  0  0]
np.diff(tps, 2)
[ 1  0 -1  0  0  0  0  0]

# Take the logical OR
np.logical_or(np.diff(fps, 2), np.diff(tps, 2))
[ True False  True False False False False False]

# Add a True at the head and at the tail
np.r_[True, np.logical_or(np.diff(fps, 2), np.diff(tps, 2)), True]
[ True  True False  True False False False False False  True]

# Indices of the array where the value is True
np.where(np.r_[True, np.logical_or(np.diff(fps, 2), np.diff(tps, 2)), True])[0]
[0 1 3 9]
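The same selection can be written without NumPy, which makes each step explicit. This is a sketch for illustration only; second_diff and corner_indices are hypothetical helper names, not sklearn functions.

```python
def second_diff(values):
    """Second-order difference: values[i + 2] - 2 * values[i + 1] + values[i]."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


def corner_indices(fps, tps):
    """Hypothetical helper: indices kept when collinear points are dropped."""
    if len(fps) <= 2:
        return list(range(len(fps)))
    # An interior point is kept if the step changes in either direction.
    changed = [bool(a) or bool(b)
               for a, b in zip(second_diff(fps), second_diff(tps))]
    # The first and last points are always kept.
    mask = [True] + changed + [True]
    return [i for i, keep in enumerate(mask) if keep]


fps = [0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
tps = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
print(corner_indices(fps, tps))  # [0, 1, 3, 9]
```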

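optimal_idxs is in fact the set of corner points of the ROC curve; for plotting, only the corner points are needed. If you think of fps and tps as a person's displacement on a graph, the first-order difference is the "speed" and the second-order difference is the "acceleration". A point where the acceleration is zero in both directions lies on a straight segment between its neighbours, so it can be dropped.

The "ROC curve" for this example, plotted from the raw counts fps and tps, is as follows: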

import matplotlib.pyplot as plt

fps = [0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
tps = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]

plt.plot(fps, tps, 'b')
plt.xlim([-1, 8])
plt.ylim([-1, 8])
plt.ylabel('tps')
plt.xlabel('fps')
plt.show()

Therefore, the drop_intermediate parameter is an optimization of the ROC computation: it drops redundant points without changing the shape of the ROC curve.
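One way to check this claim is to compare the area under the curve computed from all ten points with the area computed from only the four kept points (indices 0, 1, 3, 9 in this example). The sketch below uses a hand-written trapezoidal rule; trapezoid_area is a hypothetical helper, not a library call.

```python
import math


def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2
               for i in range(len(x) - 1))


fps = [0, 1, 1, 1, 2, 3, 4, 5, 6, 7]
tps = [1, 1, 2, 3, 3, 3, 3, 3, 3, 3]
fpr = [fp / fps[-1] for fp in fps]
tpr = [tp / tps[-1] for tp in tps]

kept = [0, 1, 3, 9]  # the optimal_idxs value for this example
area_all = trapezoid_area(fpr, tpr)
area_kept = trapezoid_area([fpr[i] for i in kept], [tpr[i] for i in kept])

print(round(area_all, 4), round(area_kept, 4))  # 0.9048 0.9048
print(math.isclose(area_all, area_kept))        # True
```

Both areas equal 19/21, because the dropped points lie on straight segments and contribute nothing extra to the trapezoids.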

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
