
# 203.4.2 Calculating Sensitivity and Specificity in R

### Calculating Sensitivity and Specificity

#### Building Logistic Regression Model

``````Fiberbits <- read.csv("C:\\Amrita\\Datavedi\\Fiberbits\\Fiberbits.csv")
Fiberbits_model_1<-glm(active_cust~., family=binomial, data=Fiberbits)``````
``## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred``
``summary(Fiberbits_model_1)``
``````##
## Call:
## glm(formula = active_cust ~ ., family = binomial, data = Fiberbits)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -8.4904  -0.8752   0.4055   0.7619   2.9465
##
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)
## (Intercept)                -1.761e+01  3.008e-01  -58.54   <2e-16 ***
## income                      1.710e-03  8.213e-05   20.82   <2e-16 ***
## months_on_network           2.880e-02  1.005e-03   28.65   <2e-16 ***
## Num_complaints             -6.865e-01  3.010e-02  -22.81   <2e-16 ***
## number_plan_changes        -1.896e-01  7.603e-03  -24.94   <2e-16 ***
## relocated                  -3.163e+00  3.957e-02  -79.93   <2e-16 ***
## monthly_bill               -2.198e-03  1.571e-04  -13.99   <2e-16 ***
## technical_issues_per_month -3.904e-01  7.152e-03  -54.58   <2e-16 ***
## Speed_test_result           2.222e-01  2.378e-03   93.44   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 136149  on 99999  degrees of freedom
## Residual deviance:  98359  on 99991  degrees of freedom
## AIC: 98377
##
## Number of Fisher Scoring iterations: 8``````

### Confusion Matrix

``````threshold=0.5
predicted_values<-ifelse(predict(Fiberbits_model_1,type="response")>threshold,1,0)
actual_values<-Fiberbits_model_1$y
conf_matrix<-table(predicted_values,actual_values)
conf_matrix``````
``````##                 actual_values
## predicted_values     0     1
##                0 29492 10847
##                1 12649 47012``````

### Code: Sensitivity and Specificity

``library(caret)``
``## Warning: package 'caret' was built under R version 3.1.3``
``## Loading required package: lattice``
``## Loading required package: ggplot2``
``## Warning: package 'ggplot2' was built under R version 3.1.3``
``sensitivity(conf_matrix)``
``## [1] 0.699841``
``specificity(conf_matrix)``
``## [1] 0.812527``
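
It helps to check which class these numbers treat as "positive". When given a table, caret's `sensitivity()` and `specificity()` default to the first row level as the positive class, which here is `0`. The sketch below recomputes both rates by hand from the counts in the confusion matrix above, using base R only. The helper `class_rates()` is a hypothetical name introduced for illustration, not part of caret.

```r
# Counts copied from the threshold = 0.5 confusion matrix above
# (rows = predicted, columns = actual; matrix() fills column by column)
conf_matrix <- matrix(c(29492, 12649, 10847, 47012), nrow = 2,
                      dimnames = list(predicted_values = c("0", "1"),
                                      actual_values    = c("0", "1")))

# Hypothetical helper: sensitivity and specificity for a chosen positive class
class_rates <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  tp <- cm[positive, positive]   # predicted positive, actually positive
  fn <- cm[negative, positive]   # predicted negative, actually positive
  tn <- cm[negative, negative]   # predicted negative, actually negative
  fp <- cm[positive, negative]   # predicted positive, actually negative
  c(sensitivity = tp / (tp + fn), specificity = tn / (tn + fp))
}

rates_pos0 <- class_rates(conf_matrix, positive = "0")  # first level as positive
rates_pos1 <- class_rates(conf_matrix, positive = "1")  # active customer as positive

print(round(rates_pos0, 6))
print(round(rates_pos1, 6))
```

With `positive = "0"` the helper should reproduce the two values reported above. With `positive = "1"` the two numbers swap roles, so it is worth stating the positive class explicitly (caret accepts a `positive` argument) whenever the results are reported.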

### Changing Threshold

``````threshold=0.8
predicted_values<-ifelse(predict(Fiberbits_model_1,type="response")>threshold,1,0)
actual_values<-Fiberbits_model_1$y
conf_matrix<-table(predicted_values,actual_values)
conf_matrix``````
``````##                 actual_values
## predicted_values     0     1
##                0 37767 30521
##                1  4374 27338``````

### Changed Sensitivity and Specificity

``sensitivity(conf_matrix)``
``## [1] 0.8962056``
``specificity(conf_matrix)``
``## [1] 0.4724935``
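
The values above rise for sensitivity and fall for specificity because caret, by default, treats the first level (`0`) as the positive class. As a cross-check, the sketch below reads the two confusion matrices with class `1` (active customer) as the positive class instead. The counts are copied from the outputs above; the column names `tn`, `fn`, `fp` and `tp` are labels chosen for this illustration.

```r
# Cell counts from the two confusion matrices above, with 1 as the positive class
counts <- data.frame(
  threshold = c(0.5, 0.8),
  tn = c(29492, 37767),  # predicted 0, actual 0
  fn = c(10847, 30521),  # predicted 0, actual 1
  fp = c(12649, 4374),   # predicted 1, actual 0
  tp = c(47012, 27338)   # predicted 1, actual 1
)

counts$sensitivity <- counts$tp / (counts$tp + counts$fn)  # true positive rate
counts$specificity <- counts$tn / (counts$tn + counts$fp)  # true negative rate

print(counts[, c("threshold", "sensitivity", "specificity")])
```

Read this way, raising the threshold from 0.5 to 0.8 flags fewer customers as `1`, so sensitivity for class `1` drops while specificity rises. These are the same four numbers as above with the labels exchanged.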

### Sensitivity and Specificity

• Changing the threshold changes which customers are classified as good or bad, so the sensitivity and specificity change with it.
• Which of the two should we maximize? What is the ideal threshold?
• Ideally we want to maximize both sensitivity and specificity, but this is not always possible. Raising one usually lowers the other.
• Sometimes we want to be as sure as possible about the predicted negatives; sometimes we want to be as sure as possible about the predicted positives.
• Sometimes we cannot compromise on sensitivity; sometimes we cannot compromise on specificity.
• The threshold is set based on the business problem.
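
The tradeoff can be seen by sweeping the threshold over a grid. The following is a minimal sketch on simulated data, not the Fiberbits data. The variables `x` and `y`, the model `toy_model` and the helper `rates_at()` are all assumptions made for illustration, and class `1` is taken as the positive class.

```r
set.seed(42)

# Simulated toy data (an assumption, not the Fiberbits data)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))

toy_model <- glm(y ~ x, family = binomial)
p <- predict(toy_model, type = "response")

# Hypothetical helper: rates at one threshold, with 1 as the positive class
rates_at <- function(threshold, p, y) {
  predicted <- as.integer(p > threshold)
  c(threshold   = threshold,
    sensitivity = sum(predicted == 1 & y == 1) / sum(y == 1),
    specificity = sum(predicted == 0 & y == 0) / sum(y == 0))
}

thresholds <- seq(0.1, 0.9, by = 0.1)
tradeoff <- as.data.frame(t(sapply(thresholds, rates_at, p = p, y = y)))
print(round(tradeoff, 3))
```

As the threshold rises, fewer observations are predicted as `1`, so sensitivity can only stay the same or fall while specificity can only stay the same or rise.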

### When Sensitivity is a High Priority

• Example: predicting bad customers (defaulters) before issuing a loan. Here the positive class is the bad customer.
• The profit on one good customer's loan is not equal to the loss on one bad customer's loan.
• The loss on one bad loan might eat up the profit on 100 good customers.
• In this case one bad customer is not equal to one good customer.
• If p is the probability of default, we would like to set the threshold so that we miss as few bad customers as possible.
• We set the threshold so that sensitivity is high.
• We can compromise on specificity here. If we wrongly reject a good customer, our loss is much smaller than the loss from giving a loan to a bad customer.
• We do not worry much about the good customers here. They are not harmful, so we can accept lower specificity.
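
One way to turn this reasoning into a threshold is to attach a cost to each kind of error and pick the threshold with the lowest total cost. The sketch below uses simulated scores and assumed costs (100 for a missed defaulter, 1 for a wrongly rejected good customer, echoing the "100 good customers" remark above). The names `is_bad`, `p_default`, `total_cost()` and `sens_at()` are hypothetical.

```r
set.seed(1)

# Simulated scores (an assumption): 1 = defaulter, the positive class
n <- 5000
is_bad <- rbinom(n, 1, 0.1)
p_default <- plogis(rnorm(n, mean = ifelse(is_bad == 1, 1, -2)))

cost_missed_bad    <- 100  # assumed loss from lending to a defaulter
cost_rejected_good <- 1    # assumed profit lost by rejecting a good customer

# Total cost of the false negatives and false positives at a threshold
total_cost <- function(threshold) {
  flagged <- p_default > threshold
  fn <- sum(!flagged & is_bad == 1)  # defaulters we failed to flag
  fp <- sum(flagged & is_bad == 0)   # good customers we rejected
  cost_missed_bad * fn + cost_rejected_good * fp
}

sens_at <- function(threshold) {
  sum(p_default > threshold & is_bad == 1) / sum(is_bad == 1)
}

thresholds <- seq(0.01, 0.99, by = 0.01)
costs <- sapply(thresholds, total_cost)
best <- thresholds[which.min(costs)]

cat("cost-minimising threshold:", best, "\n")
cat("sensitivity at that threshold:", round(sens_at(best), 3), "\n")
cat("sensitivity at 0.5:", round(sens_at(0.5), 3), "\n")
```

Because a missed defaulter is assumed to be so much more expensive, the chosen threshold tends to sit well below 0.5, which buys sensitivity at the price of specificity.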

### When Specificity is a High Priority

• Example: testing whether a medicine is good or poisonous.
• Here we must avoid cases where the medicine is actually poisonous but the model predicts it as good.
• We can’t take any chances here.
• The specificity needs to be near 100%.
• The sensitivity can be compromised here. Failing to use a good medicine is much less harmful than using a poisonous one.
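
A simple way to enforce a specificity requirement is to pick the lowest threshold that still meets a target. The sketch below uses simulated scores with "good medicine" as the positive class. The target of 0.99 and the names `is_good`, `p_good`, `specificity_at()` and `sensitivity_at()` are assumptions for illustration.

```r
set.seed(7)

# Simulated scores (an assumption): 1 = good medicine (positive), 0 = poisonous
n <- 5000
is_good <- rbinom(n, 1, 0.7)
p_good <- plogis(rnorm(n, mean = ifelse(is_good == 1, 2, -2)))

# Share of poisonous items correctly predicted as not good
specificity_at <- function(threshold) {
  sum(p_good <= threshold & is_good == 0) / sum(is_good == 0)
}
# Share of good items correctly predicted as good
sensitivity_at <- function(threshold) {
  sum(p_good > threshold & is_good == 1) / sum(is_good == 1)
}

target <- 0.99  # assumed specificity requirement
thresholds <- seq(0.01, 0.99, by = 0.01)
spec <- sapply(thresholds, specificity_at)

# Lowest threshold on the grid that meets the target
chosen <- thresholds[which(spec >= target)[1]]

cat("chosen threshold:", chosen, "\n")
cat("specificity:", round(specificity_at(chosen), 4), "\n")
cat("sensitivity:", round(sensitivity_at(chosen), 4), "\n")
```

Taking the lowest qualifying threshold keeps as much sensitivity as the specificity requirement allows.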

### Sensitivity vs Specificity – Importance

• There are some cases where sensitivity is important and needs to be near 1.
• There are business cases where specificity is important and needs to be near 1.
• We need to understand the business problem to decide the relative importance of sensitivity and specificity.

## ROC Curve

• How good is the model overall if we consider all possible threshold values and the corresponding sensitivity and specificity?
• The ROC (receiver operating characteristic) curve is drawn with the false positive rate on the X-axis and the true positive rate on the Y-axis.
• The ROC curve tells us how many mistakes (false positives) we make in order to identify all the positives.
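
The construction described above can be sketched in base R by computing the true positive rate and false positive rate at each threshold and plotting one against the other. The data here are simulated (an assumption, not the Fiberbits data), and class `1` is the positive class.

```r
set.seed(123)

# Simulated toy data (an assumption)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))
p <- predict(glm(y ~ x, family = binomial), type = "response")

# One (FPR, TPR) point per threshold
thresholds <- seq(0, 1, by = 0.01)
tpr <- sapply(thresholds, function(t) sum(p > t & y == 1) / sum(y == 1))
fpr <- sapply(thresholds, function(t) sum(p > t & y == 0) / sum(y == 0))

plot(fpr, tpr, type = "l",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = "ROC curve (toy data)")
abline(0, 1, lty = 2)  # reference line for a model with no discriminating power
```

Each point on the curve corresponds to one threshold. A threshold of 0 sits in the top-right corner (everything predicted positive) and a threshold of 1 in the bottom-left (nothing predicted positive).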