Let us now consider a binary classification task. Our training data is denoted as the set $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is a feature vector and $y_i$ is its label.
Each label can take one of two outcomes, $y_i \in \{0, +1\}$. Let us call the outcome with a $+1$ value the positive value, while $0$ is called the negative value.
Our binary classifier model trained on $\mathcal{D}$ is denoted as $f(\mathbf{x})$. The number of positive samples in $\mathcal{D}$ is $N_+$, while the number of negative samples is $N_-$. The total number of samples is $N = N_+ + N_-$.
Furthermore, we denote the estimates of $N_+$ and $N_-$ given by our classifier as $\hat{N}_+$ and $\hat{N}_-$.
With these definitions, the confusion matrix is defined as
$$
C = \begin{pmatrix} \mathrm{TP} & \mathrm{FN} \\ \mathrm{FP} & \mathrm{TN} \end{pmatrix},
$$
where $\mathrm{TP}$ (true positives) and $\mathrm{FN}$ (false negatives) count the actual positives that are classified correctly and incorrectly, so that $\mathrm{TP} + \mathrm{FN} = N_+$, $\mathrm{FP} + \mathrm{TN} = N_-$, and $\mathrm{TP} + \mathrm{FP} = \hat{N}_+$.
When we divide by the number of samples $N$, we can give the confusion matrix a probabilistic interpretation: each entry becomes the joint probability of an actual label and a predicted label,
$$
\frac{C}{N} = \begin{pmatrix} P(\hat{y} = +1,\, y = +1) & P(\hat{y} = 0,\, y = +1) \\ P(\hat{y} = +1,\, y = 0) & P(\hat{y} = 0,\, y = 0) \end{pmatrix}.
$$
In the special case where our classifier doesn't make any mistakes, the false negatives and false positives are zero, therefore
$$
C = \begin{pmatrix} N_+ & 0 \\ 0 & N_- \end{pmatrix}.
$$
In the other extreme, where the classifier doesn't make any correct classifications, we have
$$
C = \begin{pmatrix} 0 & N_+ \\ N_- & 0 \end{pmatrix}.
$$
And in the case where the classifier is totally random, predicting each class with probability $1/2$ independently of the input,
$$
C = \frac{1}{2}\begin{pmatrix} N_+ & N_+ \\ N_- & N_- \end{pmatrix}.
$$
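As a quick numerical sketch of these definitions (the helper name and the $\{0, +1\}$ label encoding are my choices, not from the text), the confusion matrix and its two extreme cases can be checked directly:

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix, rows = actual class, columns = predicted class,
    with the positive class listed first: [[TP, FN], [FP, TN]]."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return np.array([[tp, fn], [fp, tn]])

y_true = np.array([1, 1, 1, 0, 0, 0])           # N_+ = 3, N_- = 3
perfect = confusion_matrix(y_true, y_true)       # no mistakes: off-diagonal is zero
inverted = confusion_matrix(y_true, 1 - y_true)  # always wrong: diagonal is zero
```

The row sums recover $N_+$ and $N_-$ regardless of the predictions, which is a useful sanity check on any implementation.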
Accuracy asks: how well did the classifier get the correct labels?
$$
\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{N}.
$$
Precision asks: of the $\hat{N}_+$ samples that the classifier thought were positive, how many are actually correct?
$$
\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = P(y = +1 \mid \hat{y} = +1).
$$
Recall is the metric that measures the fraction of actual positives that are correctly identified,
$$
\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = P(\hat{y} = +1 \mid y = +1).
$$
The $F_\beta$ metric is the following function of precision and recall:
$$
F_\beta = (1 + \beta^2)\,\frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}.
$$
In the case where $\beta = 1$, $F_1$ is the harmonic mean of the precision and recall,
$$
F_1 = \frac{2\,\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
$$
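These three metrics follow mechanically from the confusion-matrix counts; a minimal sketch (the function name and the illustrative counts are mine):

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Precision, recall, and F_beta from confusion-matrix counts."""
    precision = tp / (tp + fp)   # fraction of predicted positives that are correct
    recall = tp / (tp + fn)      # fraction of actual positives that were found
    fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, fbeta

# Illustrative counts: TP = 8, FP = 2, FN = 8.
p, r, f1 = precision_recall_fbeta(tp=8, fp=2, fn=8)
# p = 0.8, r = 0.5, f1 = 2 * 0.8 * 0.5 / (0.8 + 0.5) ≈ 0.615
```

Note how the harmonic mean pulls $F_1$ toward the weaker of the two metrics: despite a precision of $0.8$, the low recall keeps $F_1$ near $0.6$.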
The receiver operating characteristic (ROC) curve is the plot generated when one plots the true positive rate, $\mathrm{TPR} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$, against the false positive rate, $\mathrm{FPR} = \mathrm{FP}/(\mathrm{FP} + \mathrm{TN})$, for a classifier that depends on a threshold parameter $\tau$.
The area under the ROC curve, $\mathrm{AUC}$, can be interpreted as the probability that a randomly drawn positive sample is ranked above a randomly drawn negative sample, i.e., that it shows up as a true positive before the negative sample shows up as a false positive. The area also satisfies the following properties: $0 \le \mathrm{AUC} \le 1$, a totally random classifier gives $\mathrm{AUC} = 1/2$, and a perfect classifier gives $\mathrm{AUC} = 1$.
Let us suppose that we have a threshold classifier such that
$$
f(\mathbf{x}) = \Theta\big(s(\mathbf{x}) - \tau\big)
$$
for a given score function $s(\mathbf{x})$ and threshold $\tau$, where $\Theta$ is the Heaviside step function. In addition, there exist distributions $p_+(s)$ and $p_-(s)$ that represent the true positive distribution and true negative distribution of scores, respectively, that we are trying to distinguish with our classifier $f$. With this classifier, we have the following values for the confusion matrix:
$$
\mathrm{TPR}(\tau) = \int_{\tau}^{\infty} p_+(s)\, ds, \qquad \mathrm{FPR}(\tau) = \int_{\tau}^{\infty} p_-(s)\, ds.
$$
In Fig. 2, the solid blue area represents the $\mathrm{TPR}(\tau)$ value, while the red area is the $\mathrm{FPR}(\tau)$ value.
With these definitions, let us now compute the area under the ROC curve, $\mathrm{AUC} = \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR})$. Note that since $\mathrm{FPR}(\tau) \to 0$ as $\tau \to \infty$ and $\mathrm{FPR}(\tau) \to 1$ as $\tau \to -\infty$, with $d(\mathrm{FPR}) = -p_-(\tau)\, d\tau$, we compute the area with the limits of $\tau$ ranging from $\infty$ to $-\infty$:
$$
\mathrm{AUC} = \int_{-\infty}^{\infty} d\tau\, p_-(\tau) \int_{\tau}^{\infty} ds\, p_+(s) = P(s_+ > s_-),
$$
the probability that the score of a random positive sample exceeds that of a random negative sample.
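The equivalence between the geometric area and the pairwise probability $P(s_+ > s_-)$ can be verified numerically. The sketch below assumes two overlapping Gaussian score distributions (my choice, for illustration) and computes the AUC both ways:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed score distributions p_+ and p_-: unit-variance Gaussians at +1 and -1.
s_pos = rng.normal(1.0, 1.0, size=2000)   # scores of actual positives
s_neg = rng.normal(-1.0, 1.0, size=2000)  # scores of actual negatives

# Sweep the threshold tau and record TPR(tau), FPR(tau).
taus = np.linspace(-6.0, 6.0, 2001)
tpr = np.array([np.mean(s_pos > t) for t in taus])
fpr = np.array([np.mean(s_neg > t) for t in taus])

# AUC as the integral of TPR d(FPR); FPR decreases with tau, hence the minus sign.
auc_integral = float(np.sum(0.5 * (tpr[:-1] + tpr[1:]) * -np.diff(fpr)))

# AUC as P(s_+ > s_-), estimated over all positive/negative pairs.
auc_pairs = float(np.mean(s_pos[:, None] > s_neg[None, :]))
```

Both estimates agree to a few decimal places, and both approach the closed-form value $\Phi(2/\sqrt{2}) \approx 0.92$ for these two Gaussians as the sample size grows.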
Having written down recall and precision as conditional probabilities, it is much easier to relate the two quantities.
By Bayes' theorem, we have that
$$
\mathrm{Precision} = P(y = +1 \mid \hat{y} = +1) = \frac{P(\hat{y} = +1 \mid y = +1)\, P(y = +1)}{P(\hat{y} = +1)}.
$$
In other words, we find that
$$
\mathrm{Precision} = \mathrm{Recall} \times \frac{P(y = +1)}{P(\hat{y} = +1)} = \mathrm{Recall} \times \frac{N_+}{\hat{N}_+}.
$$
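This relation holds exactly at the level of counts, not just in expectation, since both sides reduce to $\mathrm{TP}/\hat{N}_+$. A quick sketch with a synthetic noisy classifier (the label-flipping noise model is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=10_000)
# Assumed toy classifier: flip each true label with probability 0.2.
y_pred = np.where(rng.random(10_000) < 0.2, 1 - y_true, y_true)

tp = np.sum((y_true == 1) & (y_pred == 1))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

precision = tp / (tp + fp)       # P(y = +1 | y_hat = +1)
recall = tp / (tp + fn)          # P(y_hat = +1 | y = +1)
n_pos = np.sum(y_true == 1)      # N_+
n_pos_hat = np.sum(y_pred == 1)  # N_+ hat, the classifier's positive count
```

The ratio $N_+ / \hat{N}_+$ is exactly the conversion factor between the two metrics for any classifier on any dataset.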