## Decision Boundary

### Decision Boundary – Logistic Regression

- The line or margin that separates the classes
- Classification algorithms are all about finding the decision boundary
- It need not always be a straight line
- The final decision function looks like
- \(Y=1\) if \(w^Tx+w_0>0\); else \(Y=0\)

- In logistic regression, the decision boundary can be derived from the regression coefficients and the threshold.
- Consider the logistic regression line \(p(y)=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}\)
- Suppose if \(p(y)>0.5\) then class-1, else class-0
- \(\log\left(\frac{y}{1-y}\right)=b_0+b_1x_1+b_2x_2\)
- At the threshold, \(\log\left(\frac{0.5}{0.5}\right)=b_0+b_1x_1+b_2x_2\)
- \(0=b_0+b_1x_1+b_2x_2\)
- \(b_0+b_1x_1+b_2x_2=0\) is the boundary line

- Rewriting it in \(y=mx+c\) form
- \(X_2=\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)\)

- Assuming \(b_2>0\), anything above this line is class-1 and anything below it is class-0
- \(X_2>\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)\) is class-1
- \(X_2<\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)\) is class-0
- \(X_2=\left(-\frac{b_1}{b_2}\right)X_1+\left(-\frac{b_0}{b_2}\right)\) is a tie, with probability 0.5

- We can change the decision boundary by changing the threshold value (here 0.5)
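The derivation above translates directly into a few lines of code. A minimal Python sketch (the lab itself uses R; the coefficients below are hypothetical values chosen only for illustration):

```python
# Hypothetical fitted logistic regression coefficients (illustration only)
b0, b1, b2 = -8.0, 0.2, 1.5

# Decision boundary: b0 + b1*x1 + b2*x2 = 0, rewritten as x2 = slope*x1 + intercept
slope = -b1 / b2
intercept = -b0 / b2

def classify(x1, x2):
    """Class 1 if the point lies on the positive side of the boundary."""
    return 1 if b0 + b1 * x1 + b2 * x2 > 0 else 0

print(slope, intercept)   # boundary parameters
print(classify(30, 10))   # b0 + b1*30 + b2*10 = 13 > 0, so class 1
```

Raising the threshold above 0.5 would shift the intercept of this line, which is exactly the "change the decision boundary by changing the threshold" point above.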

### LAB: Decision Boundary

- Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
- Build a logistic regression model to predict Productivity using age and experience
- Finally draw the decision boundary for this logistic regression model
- Create the confusion matrix
- Calculate the accuracy and error rates

### Solution

- Drawing the Decision boundary for the logistic regression model

```
library(ggplot2)
# Boundary slope and intercept derived from the model coefficients
# (assuming the model was fit as Productivity ~ Age + Experience)
slope1 <- -coef(Emp_Productivity_logit)[2] / coef(Emp_Productivity_logit)[3]
intercept1 <- -coef(Emp_Productivity_logit)[1] / coef(Emp_Productivity_logit)[3]
base <- ggplot(Emp_Productivity1) +
  geom_point(aes(x = Age, y = Experience, color = factor(Productivity), shape = factor(Productivity)), size = 5)
base + geom_abline(intercept = intercept1, slope = slope1, color = "red", size = 2)
```

`# base is the scatter plot; geom_abline then adds the decision boundary on top`

- Accuracy of the model1

```
predicted_values<-round(predict(Emp_Productivity_logit,type="response"),0)
conf_matrix<-table(predicted_values,Emp_Productivity_logit$y)
conf_matrix
```

```
##
## predicted_values 0 1
## 0 31 2
## 1 2 39
```

```
accuracy<-(conf_matrix[1,1]+conf_matrix[2,2])/(sum(conf_matrix))
accuracy
```

`## [1] 0.9459459`

## New Representation for Logistic Regression

\[y=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}\]
\[y=\frac{1}{1+e^{-(b_0+b_1x_1+b_2x_2)}}\]
\[y=g(w_0+w_1x_1+w_2x_2),\quad \text{where } g(x)=\frac{1}{1+e^{-x}}\]
\[y=g\left(\sum_k w_kx_k\right)\]

#### Finding the weights in logistic regression

out(\(x\)) = \(g\left(\sum_k w_kx_k\right)\)

The above output is a non-linear function of a linear combination of the inputs, i.e. a typical multiple logistic regression line.

We find \(w\) to minimize \(\sum_{i=1}^n \left[y_i - g\left(\sum_k w_kx_{ik}\right)\right]^2\)
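The function \(g\) in this representation is the logistic (sigmoid) function. A one-line Python sketch showing that it squashes any real-valued input into (0, 1), with 0.5 at the decision threshold:

```python
import math

def g(x):
    """Logistic (sigmoid) function: g(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(g(0))    # 0.5 -- exactly the decision threshold
print(g(5))    # close to 1
print(g(-5))   # close to 0
```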