In this post we review how logistic regression works, since it can be considered a building block for a neural network.

### Recap of Logistic Regression

• The output is categorical, of the YES/NO type
• The predictor variables are used to predict this categorical output
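As a reminder of the model behind these two points, logistic regression passes a linear combination of the predictors through the logistic (sigmoid) function to obtain a probability. A sketch for two predictors, where the symbols $\beta_0, \beta_1, \beta_2$ and $x_1, x_2$ are notation introduced here and not taken from the original post:

$$
P(y = 1 \mid x_1, x_2) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}
$$

The predicted class is YES when this probability exceeds a chosen cutoff, commonly 0.5.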

### LAB: Logistic Regression

• Dataset: Emp_Productivity/Emp_Productivity.csv
• Filter the data and take a subset of the above dataset. The filter condition is Sample_Set<3
• Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
• Build a logistic regression model to predict Productivity using Age and Experience
• Finally draw the decision boundary for this logistic regression model
• Create the confusion matrix
• Calculate the accuracy and error rates

### Solution

``Emp_Productivity_raw <- read.csv("C:\\Amrita\\Datavedi\\Emp_Productivity\\Emp_Productivity.csv")``
• Filter the data and take a subset of the above dataset. The filter condition is Sample_Set<3
``````Emp_Productivity1 <- Emp_Productivity_raw[Emp_Productivity_raw$Sample_Set < 3, ]

dim(Emp_Productivity1)``````
``## [1] 74  4``
``names(Emp_Productivity1)``
``## [1] "Age"          "Experience"   "Productivity" "Sample_Set"``
``head(Emp_Productivity1)``
``````##    Age Experience Productivity Sample_Set
## 1 20.0        2.3            0          1
## 2 16.2        2.2            0          1
## 3 20.2        1.8            0          1
## 4 18.8        1.4            0          1
## 5 18.9        3.2            0          1
## 6 16.7        3.9            0          1``````
``table(Emp_Productivity1$Productivity)``
``````##
##  0  1
## 33 41``````
• Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
``````library(ggplot2)
ggplot(Emp_Productivity1)+geom_point(aes(x=Age,y=Experience,color=factor(Productivity),shape=factor(Productivity)),size=5)``````

• Build a logistic regression model to predict Productivity using Age and Experience

``````Emp_Productivity_logit<-glm(Productivity~Age+Experience,data=Emp_Productivity1, family=binomial())
Emp_Productivity_logit``````
``````##
## Call:  glm(formula = Productivity ~ Age + Experience, family = binomial(),
##     data = Emp_Productivity1)
##
## Coefficients:
## (Intercept)          Age   Experience
##     -8.9361       0.2763       0.5923
##
## Degrees of Freedom: 73 Total (i.e. Null);  71 Residual
## Null Deviance:       101.7
## Residual Deviance: 46.77     AIC: 52.77``````
``coef(Emp_Productivity_logit)``
``````## (Intercept)         Age  Experience
##  -8.9361114   0.2762749   0.5923444``````
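To see what these coefficients mean, the predicted probability for a single employee can be computed by hand. The sketch below hard-codes the coefficients printed above so it runs without the dataset; the employee (Age 30, Experience 5) is a made-up example, not a row from the data.

```r
# Coefficients copied from the coef() output above
b <- c(intercept = -8.9361114, age = 0.2762749, experience = 0.5923444)

# Hypothetical employee (illustrative values, not from the dataset)
age <- 30
experience <- 5

# Linear predictor (log-odds), then the logistic function
log_odds <- b[["intercept"]] + b[["age"]] * age + b[["experience"]] * experience
prob <- 1 / (1 + exp(-log_odds))   # equivalent to plogis(log_odds)
prob
```

With the fitted model object available, `predict(Emp_Productivity_logit, newdata = data.frame(Age = 30, Experience = 5), type = "response")` should return the same probability.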
``````# Decision boundary: intercept + b_Age*Age + b_Exp*Experience = 0, solved for Experience
slope1 <- coef(Emp_Productivity_logit)[2]/(-coef(Emp_Productivity_logit)[3])
intercept1 <- coef(Emp_Productivity_logit)[1]/(-coef(Emp_Productivity_logit)[3])``````
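The slope and intercept computed here follow from setting the predicted probability to 0.5, which happens exactly when the linear predictor is zero. Writing the coefficients as $\beta_0$ (intercept), $\beta_1$ (Age) and $\beta_2$ (Experience), notation introduced here for illustration:

$$
\beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Experience} = 0
\;\Longrightarrow\;
\text{Experience} = -\frac{\beta_0}{\beta_2} - \frac{\beta_1}{\beta_2}\,\text{Age}
$$

With the coefficients printed earlier, this gives a slope of roughly -0.47 and an intercept of roughly 15.09.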
• Finally draw the decision boundary for this logistic regression model
``````library(ggplot2)
# base is the scatter plot; geom_abline() then adds the decision boundary
base<-ggplot(Emp_Productivity1)+geom_point(aes(x=Age,y=Experience,color=factor(Productivity),shape=factor(Productivity)),size=5)
# Note: in ggplot2 3.4.0 and later, use linewidth = 2 here; size is deprecated for lines
base+geom_abline(intercept = intercept1 , slope = slope1, color = "red", size = 2)``````

• Create the confusion matrix

``````# round(..., 0) turns each predicted probability into a 0/1 class (cutoff of 0.5)
predicted_values<-round(predict(Emp_Productivity_logit,type="response"),0)
conf_matrix<-table(predicted_values,Emp_Productivity_logit$y)
conf_matrix``````
``````##
## predicted_values  0  1
##                0 31  2
##                1  2 39``````
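Rounding the predicted probabilities to 0 decimal places amounts to using a cutoff of 0.5. A minimal sketch that makes the cutoff explicit, using short made-up vectors (`probs`, `actual`) in place of the model output:

```r
# Made-up predicted probabilities and true labels, for illustration only
probs  <- c(0.10, 0.40, 0.55, 0.80, 0.95, 0.30)
actual <- c(0,    0,    1,    1,    1,    1)

cutoff <- 0.5  # assumed cutoff; other values trade off the two kinds of error

# Fix the factor levels so the table is always 2 x 2,
# even if one class is never predicted
predicted <- factor(as.integer(probs > cutoff), levels = c(0, 1))
observed  <- factor(actual, levels = c(0, 1))

conf <- table(predicted = predicted, actual = observed)
conf
```

Fixing the factor levels is a design choice: indexing such as `conf[2, 2]` would otherwise fail whenever the model predicts only one class.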
• Calculate the accuracy and error rates
``````accuracy<-(conf_matrix[1,1]+conf_matrix[2,2])/(sum(conf_matrix))
accuracy``````
``## [1] 0.9459459``
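The error rate asked for in the task is the complement of the accuracy. A short sketch that rebuilds the confusion matrix from the counts printed above, so it runs without the dataset; the name `error_rate` is introduced here:

```r
# Counts copied from the confusion matrix above (rows = predicted, columns = actual)
conf_matrix <- matrix(c(31, 2, 2, 39), nrow = 2,
                      dimnames = list(predicted = c("0", "1"), actual = c("0", "1")))

# Correct predictions sit on the diagonal
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)

# Misclassified share: (2 + 2) / 74
error_rate <- 1 - accuracy

round(c(accuracy = accuracy, error_rate = error_rate), 4)
```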