# 204.5.3 Practice : Non Linear Decision Boundary

A linear decision boundary is not always the right choice, because the classes in our data may be separated by a polynomial (curved) boundary instead. In this post we will see what happens when we use a linear function to classify data with a more complex structure.

## Non-Linear Decision Boundaries

• Dataset: “Emp_Productivity/Emp_Productivity.csv”
• Draw a scatter plot with Age on the X-axis and Experience on the Y-axis. Distinguish the two classes with colors or shapes (visualizing the classes)
• Build a logistic regression model to predict Productivity using age and experience
• Finally draw the decision boundary for this logistic regression model
• Create the confusion matrix
• Calculate the accuracy and error rates

We use the entire dataset here, not just the subset.

In [12]:
```
import pandas as pd

Emp_Productivity_raw = pd.read_csv("datasets\\Emp_Productivity\\Emp_Productivity.csv")
```
In [13]:
```
#plotting the overall data
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

# split the rows by class so each class gets its own color and marker
prod0 = Emp_Productivity_raw[Emp_Productivity_raw.Productivity == 0]
prod1 = Emp_Productivity_raw[Emp_Productivity_raw.Productivity == 1]
ax.scatter(prod0.Age, prod0.Experience, s=10, c='b', marker="o", label='Productivity 0')
ax.scatter(prod1.Age, prod1.Experience, s=10, c='r', marker="+", label='Productivity 1')
ax.set_xlabel('Age')
ax.set_ylabel('Experience')
ax.legend(loc='upper left')
plt.show()
```
In [14]:
```
###Logistic Regression model1
import statsmodels.formula.api as sm
model = sm.logit(formula='Productivity ~ Age+Experience', data=Emp_Productivity_raw)
fitted = model.fit()
fitted.summary()
```
```
Optimization terminated successfully.
Current function value: 0.632202
Iterations 5
```
Out[14]:
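| Dep. Variable: | Productivity | No. Observations: | 119 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 116 |
| Method: | MLE | Df Model: | 2 |
| Date: | Tue, 15 Nov 2016 | Pseudo R-squ.: | 0.03361 |
| Time: | 16:08:50 | Log-Likelihood: | -75.232 |
| converged: | True | LL-Null: | -77.848 |
| | | LLR p-value: | 0.07307 |

| | coef | std err | z | P>\|z\| | 95.0% Conf. Int. (lower) | 95.0% Conf. Int. (upper) |
|---|---|---|---|---|---|---|
| Intercept | 0.4478 | 0.699 | 0.641 | 0.522 | -0.921 | 1.817 |
| Age | -0.0176 | 0.038 | -0.459 | 0.646 | -0.092 | 0.057 |
| Experience | -0.0632 | 0.091 | -0.698 | 0.485 | -0.241 | 0.114 |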
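The line `Current function value: 0.632202` printed by `model.fit()` is the average negative log-likelihood: 0.632202 × 119 ≈ 75.232, which is the summary's Log-Likelihood with its sign flipped. By default statsmodels' `Logit.fit` maximizes the likelihood with Newton's method (`method='newton'`), and `Iterations 5` is the number of Newton steps it took. The sketch below is a minimal NumPy version of that idea, not statsmodels' actual code. Because the CSV is not bundled here, it runs on hypothetical synthetic data; the `age`/`experience` ranges and the "true" coefficients are made up for illustration.

```python
import numpy as np

def fit_logit_newton(X, y, max_iter=35, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X must already contain a column of ones for the intercept.
    Returns (coefficients, average negative log-likelihood, iterations used).
    """
    beta = np.zeros(X.shape[1])
    for n_iter in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # current predicted probabilities
        score = X.T @ (y - p)                           # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # negative Hessian (information matrix)
        step = np.linalg.solve(info, score)             # Newton update
        beta = beta + step
        if np.max(np.abs(step)) < tol:                  # stop once the update is negligible
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    avg_nll = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, avg_nll, n_iter

# Hypothetical data: 500 employees generated from made-up "true" coefficients
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 60, n)
experience = rng.uniform(0, 20, n)
X = np.column_stack([np.ones(n), age, experience])
true_beta = np.array([1.0, -0.03, 0.05])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta, avg_nll, n_iter = fit_logit_newton(X, y)
print("coefficients:", beta)
print("average negative log-likelihood:", avg_nll)
print("iterations:", n_iter)
```

At the optimum the gradient `X.T @ (y - p)` is zero, which is why Newton's method typically stops after only a handful of iterations on well-behaved data like this.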
In [15]:
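```
#coefficients of the fitted model
#(fitted.normalized_cov_params holds the covariance matrix of the estimates, not the coefficients)
coef = fitted.params
coef
```
The coefficients are the `coef` column of the summary above: Intercept 0.4478, Age -0.0176 and Experience -0.0632.

In [16]:
```
# getting slope and intercept of the line
# the boundary is where Intercept + b_Age*Age + b_Experience*Experience = 0
slope = -coef['Age'] / coef['Experience']
intercept = -coef['Intercept'] / coef['Experience']
print('Slope :', slope)
print('Intercept :', intercept)
```
With these coefficients the slope is about -0.28 and the intercept about 7.09, so the boundary is roughly Experience = 7.09 - 0.28 × Age.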
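The decision boundary is the set of points where the fitted probability equals the 0.5 threshold used for classification. Writing $\beta_0$, $\beta_1$, $\beta_2$ for the Intercept, Age and Experience coefficients from the summary table, the slope and intercept of the line follow directly:

$$
\begin{aligned}
\frac{1}{1 + e^{-(\beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Experience})}} = 0.5
&\iff \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Experience} = 0 \\
&\iff \mathrm{Experience} = -\frac{\beta_1}{\beta_2}\,\mathrm{Age} - \frac{\beta_0}{\beta_2} \qquad (\beta_2 \neq 0)
\end{aligned}
$$

$$
\text{slope} = -\frac{-0.0176}{-0.0632} \approx -0.28,
\qquad
\text{intercept} = -\frac{0.4478}{-0.0632} \approx 7.09
$$

Because $\beta_2 < 0$, points below this line have a positive linear predictor, hence a probability above 0.5, and are the ones predicted as Productivity 1.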
In [17]:
```
#Finally draw the decision boundary for this logistic regression model
fig = plt.figure()
ax = fig.add_subplot(111)

prod0 = Emp_Productivity_raw[Emp_Productivity_raw.Productivity == 0]
prod1 = Emp_Productivity_raw[Emp_Productivity_raw.Productivity == 1]
ax.scatter(prod0.Age, prod0.Experience, s=10, c='b', marker="o", label='Productivity 0')
ax.scatter(prod1.Age, prod1.Experience, s=10, c='r', marker="+", label='Productivity 1')
ax.legend(loc='upper left')

# draw the boundary across the visible Age range, then restore the axis limits
x_min, x_max = ax.get_xlim()
y_min, y_max = ax.get_ylim()
ax.plot([x_min, x_max], [intercept + slope * x_min, intercept + slope * x_max])
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
plt.show()
```
• We can see above that the linear decision boundary does a poor job of separating the two classes.

Next, we build the confusion matrix and calculate the accuracy and error rate.

In [18]:
```
#Create the confusion matrix
from sklearn.metrics import confusion_matrix as cm

#predicting values: fitted probability that Productivity = 1
predicted_values = fitted.predict(Emp_Productivity_raw[["Age", "Experience"]])

#Lets convert them to classes using a threshold
threshold = 0.5
predicted_class = (predicted_values > threshold).astype(int)

#rows are actual classes, columns are predicted classes
ConfusionMatrix = cm(Emp_Productivity_raw['Productivity'], predicted_class)
ConfusionMatrix
```
Out[18]:
```
array([[69,  7],
       [43,  0]])
```
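`sklearn.metrics.confusion_matrix` puts the actual classes on the rows and the predicted classes on the columns, both in sorted label order. In the matrix above, 69 employees with Productivity 0 were predicted 0, 7 were wrongly predicted 1, and all 43 employees with Productivity 1 were predicted 0. A minimal NumPy sketch of the same counting for 0/1 labels, using a few hypothetical labels rather than the dataset (`confusion_matrix_2x2` is an illustrative name, not a library function):

```python
import numpy as np

def confusion_matrix_2x2(actual, predicted):
    """Count outcomes for 0/1 labels: rows = actual class, columns = predicted class."""
    actual = np.asarray(actual, dtype=int)
    predicted = np.asarray(predicted, dtype=int)
    matrix = np.zeros((2, 2), dtype=int)
    np.add.at(matrix, (actual, predicted), 1)   # add one count per (actual, predicted) pair
    return matrix

# Hypothetical labels, for illustration only
actual    = [0, 0, 0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1, 0, 0]
print(confusion_matrix_2x2(actual, predicted))
# [[3 1]
#  [2 1]]
```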
In [19]:
```
#Accuracy and Error
accuracy = (ConfusionMatrix[0, 0] + ConfusionMatrix[1, 1]) / ConfusionMatrix.sum()
print('Accuracy : ', accuracy)

error = 1 - accuracy
print('Error: ', error)
```
```
Accuracy :  0.579831932773
Error:  0.420168067227
```
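Accuracy is the diagonal of the confusion matrix divided by its total. The same matrix also provides a useful yardstick: predicting class 0 for everyone would be right for all 76 employees in the first row. The sketch below recomputes accuracy and error from the counts in `Out[18]`, and adds two extra checks that are not part of the original exercise: that majority-class baseline, and the share of Productivity 1 employees the model identifies (its recall for class 1).

```python
import numpy as np

# Confusion matrix copied from Out[18]: rows = actual, columns = predicted
conf = np.array([[69, 7],
                 [43, 0]])

total = conf.sum()
accuracy = np.trace(conf) / total          # correct predictions / all predictions
error = 1 - accuracy
baseline = conf[0].sum() / total           # accuracy of always predicting class 0
recall_1 = conf[1, 1] / conf[1].sum()      # share of actual 1s predicted as 1

print(f"accuracy={accuracy:.4f} error={error:.4f} baseline={baseline:.4f} recall_1={recall_1:.2f}")
# accuracy=0.5798 error=0.4202 baseline=0.6387 recall_1=0.00
```

The model's accuracy is below the majority-class baseline, and it never correctly identifies a Productivity 1 employee.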
• The model reaches an accuracy of only about 58% (an error rate of about 42%). This is because the two classes are not separated by a linear boundary.
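The same effect can be seen in isolation on a hypothetical synthetic dataset (not the Emp_Productivity data) in which class 1 lies inside a circle, so no straight line can separate it. The sketch fits one logistic-regression routine twice: once on the two raw features, and once with squared terms added, which lets the 0.5-probability boundary become a circle (the kind of polynomial boundary mentioned at the start of this post). The data, the function names and the learning-rate settings are all assumptions chosen for illustration.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=1.0, n_steps=5000):
    """Fit logistic regression by plain batch gradient descent on the mean log-loss."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted probabilities
        beta -= lr * X.T @ (p - y) / len(y)       # gradient step
    return beta

def classification_accuracy(X, y, beta):
    """Share of points on the correct side of the 0.5-probability boundary."""
    return np.mean((X @ beta > 0) == (y == 1))

# Hypothetical synthetic data: class 1 inside a circle, class 0 outside
rng = np.random.default_rng(42)
n = 400
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
y = (x1**2 + x2**2 < 0.5).astype(float)

ones = np.ones(n)
X_linear = np.column_stack([ones, x1, x2])               # straight-line boundary only
X_quad = np.column_stack([ones, x1, x2, x1**2, x2**2])   # allows a circular boundary

acc_lin = classification_accuracy(X_linear, y, fit_logistic_gd(X_linear, y))
acc_quad = classification_accuracy(X_quad, y, fit_logistic_gd(X_quad, y))
print("linear features   :", acc_lin)
print("with squared terms:", acc_quad)
```

Plain gradient descent is used here instead of Newton's method because, with the squared terms, this toy data is perfectly separable. In that case the maximum-likelihood coefficients do not exist (they grow without bound), so a fixed number of small steps is the safer choice.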

In the next post, we will look at the issues that come with non-linear decision boundaries.