Role of Confusion Matrix in Cyber Security

Dheeraj Singhal
3 min read · Jun 6, 2021

What is a Confusion Matrix?

A confusion matrix is a table that summarizes how many predictions a classifier got right and how many it got wrong, broken down by class. It is used to evaluate a classification model, and performance metrics such as accuracy, precision, recall, and F1-score are calculated from it.

Confusion matrices are widely used because they give a fuller picture of a model’s performance than classification accuracy alone: they show which kinds of errors the model makes, not just how many.

The following four terms are the basis for the metrics we are looking for.

· True Positives (TP): when the actual value is Positive and the prediction is also Positive.

· True Negatives (TN): when the actual value is Negative and the prediction is also Negative.

· False Positives (FP): when the actual value is Negative but the prediction is Positive. Also known as a Type 1 error.

· False Negatives (FN): when the actual value is Positive but the prediction is Negative. Also known as a Type 2 error.
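As a minimal sketch of the four definitions above, the counts can be tallied directly from paired lists of actual and predicted labels. The function name `confusion_counts` and the sample labels are hypothetical, chosen only for illustration (1 stands for the positive class, 0 for the negative class).

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted Positive: correct if the actual value is Positive
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted Negative: a miss if the actual value is Positive
            counts["FN" if a == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}

# Hypothetical labels: 1 = Positive, 0 = Negative
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(actual, predicted))
# {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```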

How to Calculate Metrics from a Confusion Matrix for a 2-class classification problem?

Precision: The fraction of positive predictions that are actually positive. It measures how likely a prediction of the positive class is to be correct.

TP / (TP + FP)

Accuracy: The ratio of correct predictions made by the model to the total number of predictions.

Overall, how often is the classifier correct?

(TP + TN) / (TP + TN + FP + FN)

F1-Score: The harmonic mean of Precision and Recall.

2 * Precision * Recall / (Precision + Recall)

Recall: The fraction of actual positives that the model correctly identifies as positive.

TP / (TP + FN)
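The four formulas above can be sketched in a few lines of Python. The function name `classification_metrics` and the example counts are assumptions made for illustration. Returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute precision, accuracy, recall and F1 from the four counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "accuracy": accuracy,
            "recall": recall, "f1": f1}

# Hypothetical counts, not taken from any real data set
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# precision: 0.889
# accuracy: 0.850
# recall: 0.800
# f1: 0.842
```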

What is Cyber Crime?

Cyber crimes are unlawful acts where the computer is used either as a tool or a target or both. The enormous growth in electronic commerce (e-commerce) and online share trading has led to a phenomenal spurt in incidents of cyber crime.

In the present world, cybercrime offenses are happening at an alarming rate. As the use of the Internet increases, many offenders use it as a means of communication in order to commit crimes. The framework developed in our work is essential to the creation of a model that can support analytics regarding the identification, detection, and classification of integrated cybercrime offenses (structured and unstructured). The main focus of our work is to find the attacks that take advantage of security vulnerabilities and to analyze these attacks using machine learning techniques.

Simple Use Case Used in Industry to Predict Cyber Threats

Cyber Crime Investigation Using a Confusion Matrix

True positive (tp), false positive (fp), true negative (tn), and false negative (fn) values are used to calculate the following performance measures:

1. True Positive Rate/recall/sensitivity (tpr): the fraction of ransomware samples correctly identified as ransomware;

2. False Positive Rate (fpr = 1 − tnr): the fraction of goodware samples incorrectly identified as ransomware;

3. True Negative Rate/specificity (tnr): the fraction of goodware samples correctly identified as goodware;

4. False Negative Rate (fnr = 1 − tpr): the fraction of ransomware samples incorrectly classified as goodware;

5. Accuracy is reported as the fraction of all samples correctly identified. More specifically, accuracy = (tp + tn) / (tp + tn + fp + fn);

6. Precision is calculated as precision = tp / (tp + fp); and

7. Youden’s index is calculated as Y = tpr + tnr − 1.
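The rates in the list above can be sketched from the four counts as follows. The function name `rate_metrics` and the counts are hypothetical. Here the positive class stands for ransomware and the negative class for goodware, and the sketch assumes at least one sample of each class so that no denominator is zero.

```python
def rate_metrics(tp, tn, fp, fn):
    """Compute tpr, fpr, tnr, fnr and Youden's index from the four counts.

    Assumes tp + fn > 0 and tn + fp > 0 (both classes are present).
    """
    tpr = tp / (tp + fn)   # ransomware correctly flagged
    fnr = fn / (tp + fn)   # ransomware missed, equals 1 - tpr
    tnr = tn / (tn + fp)   # goodware correctly passed
    fpr = fp / (tn + fp)   # goodware wrongly flagged, equals 1 - tnr
    return {"tpr": tpr, "fpr": fpr, "tnr": tnr, "fnr": fnr,
            "youden": tpr + tnr - 1}

# Hypothetical counts: 50 ransomware samples and 50 goodware samples
r = rate_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in r.items():
    print(f"{name}: {value:.2f}")
# tpr: 0.80
# fpr: 0.10
# tnr: 0.90
# fnr: 0.20
# youden: 0.70
```

Youden's index ranges from −1 to 1. A value near 0 means the classifier does no better than chance at separating the two classes, and a value of 1 means there are no false positives and no false negatives.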

A False Negative (Type 2 error) is the most critical value, because a cyber attack actually happened and the machine learning model did not inform the organization. This can cause huge losses: the organization does not get the information at the right time, so it cannot take immediate action after the attack.
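To illustrate why missed attacks deserve separate attention, here is a small sketch with made-up numbers: 990 benign events, 10 attacks, and a hypothetical model that never raises an alert. By the definitions given earlier, every missed attack counts as a false negative. Accuracy looks excellent, while recall shows that no attack was caught.

```python
# Hypothetical event log: 0 = benign, 1 = attack
actual = [0] * 990 + [1] * 10
predicted = [0] * 1000  # a model that never raises an alert

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)

print(f"missed attacks (fn): {fn}")   # missed attacks (fn): 10
print(f"accuracy: {accuracy}")        # accuracy: 0.99
print(f"recall: {recall}")            # recall: 0.0
```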
