## Multicollinearity

- When the response Y is binary, its relation to X is not linear, so we use logistic regression instead of linear regression
- Multicollinearity is an issue with the predictor variables, not the response, so it needs to be fixed in logistic regression as well
- Otherwise, the individual coefficients of the predictors will be affected by their interdependency: the estimates become unstable and their standard errors inflate
- The process of identification is the same as in linear regression: compute the variance inflation factor (VIF) of each predictor

### Multicollinearity in R

`library(car)`

`vif(Fiberbits_model_1)`

```
##                     income          months_on_network
##                   4.590705                   4.641040
##             Num_complaints        number_plan_changes
##                   1.018607                   1.126892
##                  relocated               monthly_bill
##                   1.145847                   1.017565
## technical_issues_per_month          Speed_test_result
##                   1.020648                   1.206999
```
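
As a rough illustration of what VIF measures, the sketch below computes it from its textbook definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. It uses simulated data with hypothetical variables `x1`, `x2` and `x3` (not the Fiberbits data) and base R only. Values reported by `car::vif` for a fitted `glm` are derived from the model's coefficient covariance matrix, so they may not match this definition exactly.

```r
# Simulated predictors (hypothetical, not the Fiberbits data)
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # built to be strongly correlated with x1
x3 <- rnorm(n)                  # independent of the other two
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the remaining predictors
manual_vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(manual_vif, 2))
```

A common rule of thumb treats a VIF above 5 (some authors use 10) as a sign of problematic multicollinearity. By that yardstick every predictor in the Fiberbits output stays under the threshold, with `income` and `months_on_network` closest to it.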

### Individual Impact of Variables

- Out of these predictor variables, which are the important ones?
- If we have to choose the top 5 variables, which are they?
- While selecting the model, we may want to drop a few variables that have less impact.
- How do we rank the predictor variables in order of importance?
- We can simply look at the z value of each variable and compare the absolute values
- Or calculate the Wald chi-square, which for a single coefficient is the square of the z value
- Either statistic gives the same ranking: the larger the value, the stronger the variable's individual impact

### Code-Individual Impact of Variables

`library(caret)`

`varImp(Fiberbits_model_1, scale = FALSE)`

```
##                             Overall
## income                     20.81981
## months_on_network          28.65421
## Num_complaints             22.81102
## number_plan_changes        24.93955
## relocated                  79.92677
## monthly_bill               13.99490
## technical_issues_per_month 54.58123
## Speed_test_result          93.43471
```

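For a `glm` model, `varImp` gives the absolute value of each coefficient's z value. Ranked by that value, the top five variables here are `Speed_test_result`, `relocated`, `technical_issues_per_month`, `months_on_network` and `number_plan_changes`.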
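
The same ranking can be sketched in base R by reading the z values from `summary()` and squaring them to get the Wald chi-square. The example below runs on simulated data; the variables `x1`, `x2`, `x3` and their effect sizes are assumptions made purely for illustration, not part of the Fiberbits model.

```r
# Simulated binary response (hypothetical data, not the Fiberbits set)
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
# Assumed true effects: x1 strong, x2 weak, x3 none
y  <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1 + 0.3 * x2))

fit <- glm(y ~ x1 + x2 + x3, family = binomial())

# Coefficient table without the intercept row
coefs <- summary(fit)$coefficients[-1, , drop = FALSE]

ranking <- data.frame(
  variable  = rownames(coefs),
  abs_z     = abs(coefs[, "z value"]),   # the quantity the ranking is based on
  wald_chi2 = coefs[, "z value"]^2,      # Wald chi-square with 1 degree of freedom
  row.names = NULL
)

# Sort from most to least impactful
ranking <- ranking[order(-ranking$abs_z), ]
print(ranking)
```

Because the Wald chi-square is a monotone function of the absolute z value, sorting by either column yields the same order.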

### Model Selection – AIC and BIC

- AIC and BIC play a role similar to adjusted R-squared in linear regression: they reward goodness of fit but penalize extra parameters. Unlike adjusted R-squared, lower values are better
- On its own, a model's AIC has no real use, but it helps when we are choosing between models fitted to the same data.
- Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models
- If we are choosing between two models, the model with the lower AIC is preferred
- AIC is an estimate of the information lost when a given model is used to represent the process that generates the data
- AIC = -2 ln(L) + 2k
- L is the maximum value of the likelihood function for the model
- k is the number of estimated parameters (the coefficients of the independent variables plus the intercept)
- BIC is an alternative to AIC with a heavier penalty on model size: BIC = -2 ln(L) + k ln(n), where n is the number of observations. We will follow either AIC or BIC throughout our analysis

### Code-AIC and BIC

```
library(stats)  # stats ships with base R and is attached by default, so this call is optional
AIC(Fiberbits_model_1)
```

`## [1] 98377.36`

`BIC(Fiberbits_model_1)`

`## [1] 98462.97`
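
To see how the formulas map onto R's output, here is a minimal sketch that recomputes AIC and BIC by hand from the log-likelihood and then compares two candidate models. It uses simulated data; the variables `x1`, `x2`, `noise` and the model names `small` and `big` are hypothetical and only serve the illustration.

```r
# Simulated data (hypothetical, not the Fiberbits set)
set.seed(7)
n     <- 2000
x1    <- rnorm(n)
x2    <- rnorm(n)
noise <- rnorm(n)   # generated independently of y
y     <- rbinom(n, 1, plogis(0.4 + 1.0 * x1 - 0.8 * x2))

small <- glm(y ~ x1 + x2,         family = binomial())
big   <- glm(y ~ x1 + x2 + noise, family = binomial())

# Recompute both criteria by hand for the smaller model
ll <- logLik(small)
k  <- attr(ll, "df")   # number of estimated parameters, intercept included
aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + k * log(nobs(small))

print(c(AIC = AIC(small), manual = aic_manual))
print(c(BIC = BIC(small), manual = bic_manual))

# Side-by-side comparison: the model with the lower value is preferred
print(AIC(small, big))
print(BIC(small, big))
```

Passing several fitted models to `AIC()` or `BIC()` returns a small table with one row per model, which is convenient when comparing candidates fitted to the same data.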