# 203.5.2 Decision Boundary – Logistic Regression

## Decision Boundary

### Decision Boundary – Logistic Regression

• The line or margin that separates the classes
• Classification algorithms are all about finding the decision boundaries
• It need not always be a straight line
• The final function of our decision boundary looks like
• $$Y=1$$ if $$w^Tx+w_0>0$$; else $$Y=0$$
• In logistic regression, the decision boundary can be derived from the regression coefficients and the threshold
• Imagine the logistic regression line $$p(y)=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}$$
• Suppose if $$p(y)>0.5$$ then class-1; else class-0
• $$\log\left(\frac{y}{1-y}\right)=b_0+b_1x_1+b_2x_2$$
• At the threshold, $$\log\left(\frac{0.5}{0.5}\right)=b_0+b_1x_1+b_2x_2$$
• $$0=b_0+b_1x_1+b_2x_2$$
• $$b_0+b_1x_1+b_2x_2=0$$ is the line
• Rewriting it in $$mx+c$$ form:
• $$X_2=\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)$$
• Anything above this line is class-1, anything below it is class-0 (assuming $$b_2>0$$)
• $$X_2>\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)$$ is class-1
• $$X_2<\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)$$ is class-0
• $$X_2=\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)$$ is a tie, with probability 0.5
• We can change the decision boundary by changing the threshold value (here 0.5)
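The derivation above can be sketched in a few lines of code; the coefficients `b0`, `b1`, `b2` below are made-up values for illustration, not the coefficients of the lab's fitted model (shown in Python, while the lab itself uses R):

```python
# Sketch: deriving the decision-boundary line from logistic regression
# coefficients. b0, b1, b2 are hypothetical, not from a fitted model.
import math

b0, b1, b2 = -8.0, 0.22, 0.35

# At the 0.5 threshold, b0 + b1*x1 + b2*x2 = 0. In x2 = m*x1 + c form:
slope = -b1 / b2        # m = -b1/b2
intercept = -b0 / b2    # c = -b0/b2

def predicted_class(x1, x2, threshold=0.5):
    """Classify a point via the logistic probability and the threshold."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
    return 1 if p > threshold else 0

# Points above the boundary line fall in class 1, points below in class 0
x1 = 10.0
x2_on_line = slope * x1 + intercept
print(predicted_class(x1, x2_on_line + 1.0))  # above the line -> prints 1
print(predicted_class(x1, x2_on_line - 1.0))  # below the line -> prints 0
```

Changing the `threshold` argument shifts the boundary line, exactly as the last bullet above describes.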

### LAB: Decision Boundary

• Draw a scatter plot with Age on the X-axis and Experience on the Y-axis. Distinguish the two classes with colors or shapes (visualizing the classes)
• Build a logistic regression model to predict Productivity using Age and Experience
• Finally, draw the decision boundary for this logistic regression model
• Create the confusion matrix
• Calculate the accuracy and error rates

### Solution

• Drawing the decision boundary for the logistic regression model
library(ggplot2)
# Slope and intercept of the boundary line, from the model coefficients:
# X2 = (-b1/b2)*X1 + (-b0/b2)
slope1 <- -coef(Emp_Productivity_logit)[2]/coef(Emp_Productivity_logit)[3]
intercept1 <- -coef(Emp_Productivity_logit)[1]/coef(Emp_Productivity_logit)[3]
base <- ggplot(Emp_Productivity1) + geom_point(aes(x=Age, y=Experience, color=factor(Productivity), shape=factor(Productivity)), size=5)
base + geom_abline(intercept = intercept1, slope = slope1, color = "red", size = 2)

# base is the scatter plot; geom_abline adds the decision boundary
• Accuracy of the model
predicted_values <- round(predict(Emp_Productivity_logit, type="response"), 0)
conf_matrix <- table(predicted_values, Emp_Productivity_logit$y)
conf_matrix
##
## predicted_values  0  1
##                0 31  2
##                1  2 39
accuracy<-(conf_matrix[1,1]+conf_matrix[2,2])/(sum(conf_matrix))
accuracy
## [1] 0.9459459
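The accuracy arithmetic (and the error rate the lab also asks for) can be checked by hand; a small Python sketch, with the matrix values copied from the R output above:

```python
# Recomputing accuracy and error rate from the confusion matrix above
# (cell values copied from the R output).
conf_matrix = [[31, 2],   # predicted 0: 31 actual 0s, 2 actual 1s
               [2, 39]]   # predicted 1: 2 actual 0s, 39 actual 1s

correct = conf_matrix[0][0] + conf_matrix[1][1]   # diagonal cells: 70
total = sum(sum(row) for row in conf_matrix)      # all cells: 74
accuracy = correct / total
error_rate = 1 - accuracy
print(round(accuracy, 7))    # 0.9459459
print(round(error_rate, 7))  # 0.0540541
```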

## New Representation for Logistic Regression

$$y=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}$$

$$y=\frac{1}{1+e^{-(b_0+b_1x_1+b_2x_2)}}$$

$$y=g(w_0+w_1x_1+w_2x_2),\text{ where } g(x)=\frac{1}{1+e^{-x}}$$

$$y=g\left(\sum_k w_kx_k\right)$$

### Finding the weights in logistic regression

$$\text{out}(x)=y=g\left(\sum_k w_kx_k\right)$$

The above output is a non-linear function of a linear combination of the inputs: a typical multiple logistic regression line.

We find $$w$$ to minimize $$\sum_{i=1}^n \left[y_i - g\left(\sum_k w_kx_{ik}\right)\right]^2$$
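As a sketch of what "finding $$w$$" means, the squared-error objective above can be minimized with plain gradient descent; the tiny dataset and learning rate below are made up for illustration (Python, while the chapter's code is R):

```python
# Sketch: minimizing sum_i [y_i - g(sum_k w_k * x_ik)]^2 by gradient
# descent. The toy data and learning rate are illustrative only.
import math

def g(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: x0 = 1 carries the intercept w0; labels switch at x1 = 1.5
X = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 1]

w = [0.0, 0.0]
lr = 0.1
for _ in range(10000):
    grads = [0.0, 0.0]
    for xi, yi in zip(X, y):
        p = g(sum(wk * xk for wk, xk in zip(w, xi)))
        for k in range(len(w)):
            # d/dw_k of (yi - g(z))^2 = -2*(yi - g(z)) * g(z)*(1-g(z)) * x_ik
            grads[k] += -2.0 * (yi - p) * p * (1.0 - p) * xi[k]
    w = [wk - lr * gk for wk, gk in zip(w, grads)]

# The learned boundary should separate x1 = 1 (class 0) from x1 = 2 (class 1)
print(g(w[0] + w[1] * 1.0) < 0.5, g(w[0] + w[1] * 2.0) > 0.5)
```

Note that in practice logistic regression is usually fitted by maximum likelihood rather than by least squares; this sketch follows the squared-error objective as stated above.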