
204.5.1 Neural Networks: A Recap of Logistic Regression

Welcome to this blog series on neural networks. Over series 204.5 we will go from the basics of neural networks to building a neural network model that recognizes images of handwritten digits and reads them correctly.

In this post we revise our understanding of how logistic regression works, since it can be considered a building block of a neural network.

Recap of Logistic Regression

  • The output is categorical, of the YES/NO type
  • Predictor variables are used to predict the class of the categorical output; the model turns a linear combination of the predictors into a probability (see the sketch below)
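
Under the hood, logistic regression passes that linear combination of the predictors through the logistic (sigmoid) function to squash it into a probability between 0 and 1. A minimal sketch of that function, independent of the dataset used below:

import numpy as np

def sigmoid(z):
    #squashes any real-valued score z into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- a score of 0 sits exactly on the decision boundary
print(sigmoid(4))    # ~0.98, confidently class 1
print(sigmoid(-4))   # ~0.02, confidently class 0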

Practice : Logistic Regression

  • Dataset: Emp_Productivity/Emp_Productivity.csv
  • Filter the data and take a subset of the above dataset. The filter condition is Sample_Set<3
  • Draw a scatter plot with Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
  • Build a logistic regression model to predict Productivity using Age and Experience
  • Finally, draw the decision boundary for this logistic regression model
  • Create the confusion matrix
  • Calculate the accuracy and error rates

Solution

In [1]:
import pandas as pd
Emp_Productivity_raw = pd.read_csv("datasets/Emp_Productivity/Emp_Productivity.csv")
Emp_Productivity_raw.head(10)
Out[1]:
Age Experience Productivity Sample_Set
0 20.0 2.3 0 1
1 16.2 2.2 0 1
2 20.2 1.8 0 1
3 18.8 1.4 0 1
4 18.9 3.2 0 1
5 16.7 3.9 0 1
6 16.3 1.4 0 1
7 20.0 1.4 0 1
8 18.0 3.6 0 1
9 21.2 4.3 0 1
In [2]:
#Filter the data and take a subset of the above dataset. Filter condition: Sample_Set<3
Emp_Productivity1=Emp_Productivity_raw[Emp_Productivity_raw.Sample_Set<3]
Emp_Productivity1.shape
Out[2]:
(74, 4)
In [3]:
#frequency table of Productivity variable
Emp_Productivity1.Productivity.value_counts()
Out[3]:
1    41
0    33
Name: Productivity, dtype: int64
In [4]:
####The classification graph
#Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes.
import matplotlib.pyplot as plt
%matplotlib inline

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.legend(loc='upper left');
plt.show()
In [5]:
#Predict Productivity using Age and Experience
###Logistic Regression model1
import statsmodels.formula.api as sm
model1 = sm.logit(formula='Productivity ~ Age+Experience', data=Emp_Productivity1)
fitted1 = model1.fit()
fitted1.summary()
Optimization terminated successfully.
         Current function value: 0.315987
         Iterations 7
Out[5]:
Logit Regression Results
==============================================================================
Dep. Variable:           Productivity   No. Observations:                  74
Model:                          Logit   Df Residuals:                      71
Method:                           MLE   Df Model:                           2
Date:                Tue, 15 Nov 2016   Pseudo R-squ.:                 0.5402
Time:                        16:08:12   Log-Likelihood:               -23.383
converged:                       True   LL-Null:                      -50.860
                                        LLR p-value:                1.167e-12
==============================================================================
                 coef    std err          z      P>|z|     [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     -8.9361      2.061     -4.335      0.000      -12.976    -4.896
Age            0.2763      0.105      2.620      0.009        0.070     0.483
Experience     0.5923      0.298      1.988      0.047        0.008     1.176
==============================================================================
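
A side note, not part of the original exercise: exponentiating the fitted coefficients converts them from log-odds into odds ratios, which are often easier to read:

import numpy as np
print(np.exp(fitted1.params))
#exp(0.2763) is roughly 1.32: each extra year of Age multiplies the odds
#of Productivity=1 by about 1.32, holding Experience fixed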
In [6]:
#coefficients of the fitted model
coef = fitted1.params
print(coef.round(4))
Intercept    -8.9361
Age           0.2763
Experience    0.5923
dtype: float64
In [7]:
# Getting the slope and intercept of the decision boundary line.
# The boundary is where the predicted probability equals 0.5, i.e. where
# Intercept + b1*Age + b2*Experience = 0, which rearranges to
# Experience = -(Intercept/b2) - (b1/b2)*Age
slope1 = coef['Age'] / (-coef['Experience'])
intercept1 = coef['Intercept'] / (-coef['Experience'])
round(slope1, 2), round(intercept1, 2)
Out[7]:
(-0.47, 15.09)
In [8]:
#Finally draw the decision boundary for this logistic regression model
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.legend(loc='upper left');

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1])
ax1.set_xlim([15,35])
ax1.set_ylim([0,10])
plt.show()
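As an extra check (not in the original solution), any point sitting on the drawn line should get a predicted probability close to 0.5. The Age value of 25 below is an arbitrary choice:

import pandas as pd
point_on_line = pd.DataFrame({'Age': [25.0], 'Experience': [intercept1 + slope1*25.0]})
print(fitted1.predict(point_on_line))   #should be close to 0.5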
  • Accuracy of the model
In [9]:
#Predicting classes with a probability threshold of 0.5
predicted_values = fitted1.predict(Emp_Productivity1[["Age", "Experience"]])

threshold = 0.5

import numpy as np
predicted_class = np.zeros(predicted_values.shape)
predicted_class[predicted_values > threshold] = 1

predicted_class
Out[9]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
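The 0.5 cut-off used above is a convention, not a requirement. As a quick, hypothetical experiment, you can count how many observations land in class 1 as the threshold moves:

for t in [0.3, 0.5, 0.7]:
    #number of observations predicted as class 1 at threshold t
    print(t, (predicted_values > t).sum())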
In [10]:
#Confusion Matrix, Accuracy and Error
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Productivity1[['Productivity']],predicted_class)
print('Confusion Matrix :', ConfusionMatrix)
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/sum(sum(ConfusionMatrix))
print('Accuracy : ',accuracy)
error=1-accuracy
print('Error: ',error)
Confusion Matrix : [[31  2]
 [ 2 39]]
Accuracy :  0.945945945946
Error:  0.0540540540541
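
As a sanity check (an addition to the original solution), sklearn's accuracy_score should reproduce the accuracy computed from the confusion matrix:

from sklearn.metrics import accuracy_score
print(accuracy_score(Emp_Productivity1['Productivity'], predicted_class))
#should print the same 0.9459... as above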
