Let us now consider a binary classification task. Our training data is denoted as the set $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is a feature vector and $y_i$ is its label.

Each label can take one of two values, $y_i \in \{+1, 0\}$. We call the outcome with value $+1$ the positive class, while $0$ is called the negative class.

- Accuracy
- Precision
- Recall
- The F Metric
- The ROC curve
- The link between accuracy, recall and precision

Our binary classifier model trained on $\mathcal{D}$ is denoted as $f(\mathbf{x})$. The number of positive samples in $\mathcal{D}$ is $N_+$, while the number of negative samples is $N_-$. The total number of samples is $N$,

$$N = N_+ + N_-.$$

Furthermore, we denote the estimates of $N_+$ and $N_-$ given by our classifier as $\hat{N}_+$ and $\hat{N}_-$,

$$\hat{N}_+ + \hat{N}_- = N.$$

With these definitions, the confusion matrix is defined as

$$C = \begin{pmatrix} \mathrm{TP} & \mathrm{FN} \\ \mathrm{FP} & \mathrm{TN} \end{pmatrix},$$

where $\mathrm{TP}$ (true positives) counts positive samples classified as positive, $\mathrm{FN}$ (false negatives) positive samples classified as negative, $\mathrm{FP}$ (false positives) negative samples classified as positive, and $\mathrm{TN}$ (true negatives) negative samples classified as negative. Note that $\mathrm{TP} + \mathrm{FN} = N_+$ and $\mathrm{FP} + \mathrm{TN} = N_-$.

When we divide by the number of samples $N$, we can give the confusion matrix a probabilistic interpretation,

$$\frac{C}{N} = \begin{pmatrix} P(\hat{y}=+1,\, y=+1) & P(\hat{y}=0,\, y=+1) \\ P(\hat{y}=+1,\, y=0) & P(\hat{y}=0,\, y=0) \end{pmatrix},$$

i.e. each entry is the joint probability of a prediction $\hat{y}$ and a true label $y$.
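The counts and their normalization can be sketched in a few lines of Python (the function name and toy data below are illustrative, not from the text):

```python
def confusion_matrix(y_true, y_pred):
    """Count (TP, FN, FP, TN) for labels in {1, 0}, with positive = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

# Toy data: N = 8 samples, N_+ = 4 positives, N_- = 4 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_matrix(y_true, y_pred)
n = len(y_true)

# Dividing each count by N gives the joint probabilities P(y_hat, y),
# which necessarily sum to 1.
probs = [c / n for c in (tp, fn, fp, tn)]
print(tp, fn, fp, tn)   # 3 1 1 3
print(sum(probs))       # 1.0
```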

In the special case that our classifier doesn't make any mistakes, the false negatives and positives are zero, therefore

$$\frac{C}{N} = \begin{pmatrix} N_+/N & 0 \\ 0 & N_-/N \end{pmatrix}.$$

In the other extreme, where the classifier doesn't make any correct classifications, we have

$$\frac{C}{N} = \begin{pmatrix} 0 & N_+/N \\ N_-/N & 0 \end{pmatrix}.$$

And in the case where the classifier is totally random, predicting each label with probability $1/2$ regardless of the input,

$$\frac{C}{N} = \begin{pmatrix} N_+/2N & N_+/2N \\ N_-/2N & N_-/2N \end{pmatrix}.$$

Accuracy measures how well the classifier got the correct labels,

$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{N}.$$

- a **perfect** classifier would have $\mathrm{Accuracy} = 1$,
- a **terrible** classifier has $\mathrm{Accuracy} = 0$,
- a **random** classifier has $\mathrm{Accuracy} = 1/2$.

Precision asks: of the $\hat{N}_+$ samples that the classifier thought were positive, how many are actually correct?

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{\mathrm{TP}}{\hat{N}_+}.$$

- a **perfect** classifier would have $\mathrm{Precision} = 1$,
- a **terrible** classifier has $\mathrm{Precision} = 0$,
- a **random** classifier has $\mathrm{Precision} = N_+/N$.

Recall is the metric that measures the fraction of positive samples that are correctly identified,

$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} = \frac{\mathrm{TP}}{N_+}.$$

- a **perfect** classifier would have $\mathrm{Recall} = 1$,
- a **terrible** classifier has $\mathrm{Recall} = 0$,
- a **random** classifier has $\mathrm{Recall} = 1/2$.
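Accuracy, precision, and recall follow directly from the confusion-matrix counts; a minimal sketch (the function names are illustrative, not from the text):

```python
def accuracy(tp, fn, fp, tn):
    """(TP + TN) / N."""
    return (tp + tn) / (tp + fn + fp + tn)

def precision(tp, fp):
    """TP / (TP + FP), with the empty-prediction case sent to 0."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """TP / (TP + FN) = TP / N_+."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# A perfect classifier (FN = FP = 0) scores 1 on all three metrics:
print(accuracy(30, 0, 0, 70), precision(30, 0), recall(30, 0))     # 1.0 1.0 1.0

# A random classifier on data with N = 100 and N_+ = 30 splits each class
# evenly: accuracy = 1/2, precision = N_+/N = 0.3, recall = 1/2.
print(accuracy(15, 15, 35, 35), precision(15, 35), recall(15, 15))  # 0.5 0.3 0.5
```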

The $F_\beta$ metric is the following function of precision and recall:

$$F_\beta = (1+\beta^2)\,\frac{\mathrm{Precision}\cdot\mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}.$$

Clearly, $0 \le F_\beta \le 1$.

In the case where $\beta = 1$, $F_1$ is the harmonic mean of the precision and recall,

$$F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

- a **perfect** classifier would have $F_1 = 1$,
- a **terrible** classifier has $F_1 = 0$,
- a **random** classifier has $F_1 = 2N_+/(N + 2N_+)$.
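The formula can be implemented and checked in a few lines (a sketch; the name `f_beta` is mine, not from the text):

```python
def f_beta(p, r, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * p * r / (b2 * p + r)

# beta = 1 reduces to the harmonic mean of precision and recall:
print(f_beta(0.5, 1.0))            # 2 * 0.5 * 1.0 / (0.5 + 1.0) = 0.666...
# Whenever precision == recall, F_beta equals that common value:
print(f_beta(0.8, 0.8, beta=2.0))  # ≈ 0.8
```

Note that the harmonic mean is dominated by the smaller of the two inputs, so a high $F_1$ requires both precision and recall to be high.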

The Receiver Operating Characteristic (ROC) curve is the plot generated when one plots the true positive rate,

$$\mathrm{TPR} = \frac{\mathrm{TP}}{N_+},$$

versus the false positive rate,

$$\mathrm{FPR} = \frac{\mathrm{FP}}{N_-},$$

for a classifier that depends on a parameter $\tau$.

The area under the ROC curve, $A$, can be interpreted as the probability that a randomly drawn positive sample receives a higher score than a randomly drawn negative sample, i.e. that it is ranked as a true positive ahead of a false positive. The area also satisfies the following properties:

- a **perfect** classifier would have $A = 1$,
- a **terrible** classifier has $A = 0$,
- a **random** classifier has $A = 1/2$.

Let us suppose that we have a classifier such that

$$f(x) = \begin{cases} +1, & x \ge \tau, \\ 0, & x < \tau, \end{cases}$$

for a given score $x$ and threshold $\tau$. In addition, there exist distributions $p_+(x)$ and $p_-(x)$ that represent the true positive distribution and true negative distribution, respectively, that we are trying to distinguish with our classifier $f$. With this classifier, we have the following values for the confusion matrix:

$$\mathrm{TPR}(\tau) = \int_\tau^{\infty} p_+(x)\,dx, \qquad \mathrm{FPR}(\tau) = \int_\tau^{\infty} p_-(x)\,dx.$$

In Fig 2, the solid blue area represents the $\mathrm{TPR}$ value, while the red area is the $\mathrm{FPR}$ value.

With these definitions, let us now compute the area under the ROC curve, $A$. Note that since $\mathrm{FPR}(\tau) \to 0$ as $\tau \to \infty$ and $\mathrm{FPR}(\tau) \to 1$ as $\tau \to -\infty$, we can change variables from $\mathrm{FPR}$ to $\tau$ and compute the area with the limits ranging from $-\infty$ to $\infty$,

$$A = \int_0^1 \mathrm{TPR}\; d(\mathrm{FPR}) = \int_{-\infty}^{\infty} \mathrm{TPR}(\tau)\, p_-(\tau)\, d\tau,$$

where we used $d(\mathrm{FPR}) = -p_-(\tau)\,d\tau$, the minus sign being absorbed by reversing the limits of integration. Substituting the definition of $\mathrm{TPR}(\tau)$ gives

$$A = \int_{-\infty}^{\infty} d\tau\; p_-(\tau) \int_\tau^{\infty} dx\; p_+(x) = P(x_+ > x_-),$$

recovering the probabilistic interpretation of the area stated above.
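As a numerical sanity check under an assumed model (equal-variance Gaussian score distributions, which are not from the text), one can evaluate the integral $\int \mathrm{TPR}(\tau)\,p_-(\tau)\,d\tau$ directly and compare it with the known closed form $A = \Phi\big((\mu_+ - \mu_-)/(\sigma\sqrt{2})\big)$ for that special case:

```python
import math

# Assumed (illustrative) score distributions: p_+ = N(1, 1), p_- = N(0, 1).
mu_p, mu_n, sigma = 1.0, 0.0, 1.0

def gauss_cdf(x, mu, s):
    return 0.5 * (1.0 + math.erf((x - mu) / (s * math.sqrt(2.0))))

def tpr(tau):   # TPR(tau) = integral_tau^inf p_+(x) dx
    return 1.0 - gauss_cdf(tau, mu_p, sigma)

def pdf_neg(x):  # p_-(x)
    return math.exp(-(x - mu_n) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# A = integral over tau of TPR(tau) * p_-(tau), as a Riemann sum on [-8, 8].
steps = 20000
dt = 16.0 / steps
auc = sum(tpr(-8.0 + k * dt) * pdf_neg(-8.0 + k * dt) for k in range(steps + 1)) * dt

# Closed form for equal-variance Gaussians: Phi((mu_+ - mu_-) / (sigma * sqrt(2)))
auc_exact = gauss_cdf((mu_p - mu_n) / math.sqrt(2.0), 0.0, 1.0)
print(round(auc, 4), round(auc_exact, 4))   # both ≈ 0.7602
```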

Having written down recall and precision as conditional probabilities,

$$\mathrm{Recall} = P(\hat{y}=+1 \mid y=+1), \qquad \mathrm{Precision} = P(y=+1 \mid \hat{y}=+1),$$

it is much easier to see how the two are related.

By **Bayes' theorem** we have that

$$P(y=+1 \mid \hat{y}=+1) = \frac{P(\hat{y}=+1 \mid y=+1)\, P(y=+1)}{P(\hat{y}=+1)}.$$

In other words, we find that

$$\mathrm{Precision} = \mathrm{Recall} \times \frac{P(y=+1)}{P(\hat{y}=+1)} = \mathrm{Recall} \times \frac{N_+}{\hat{N}_+}.$$
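This identity is easy to verify numerically; a small sketch with illustrative toy data:

```python
# Toy data: positive class = 1, negative class = 0.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

n_pos = tp + fn        # N_+     (true positives in the data)
n_hat_pos = tp + fp    # N_hat_+ (samples predicted positive)

precision = tp / n_hat_pos
recall = tp / n_pos

# Bayes' theorem: Precision = Recall * N_+ / N_hat_+
print(abs(precision - recall * n_pos / n_hat_pos) < 1e-12)   # True
```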