# 204.1.10 Practice : Multiple Regression with Multicollinearity

In this practice post we build a multiple regression model and then try to improve it by detecting and removing multicollinearity among the predictors.

### Practice : Multiple Regression

• Dataset: Webpage_Product_Sales/Webpage_Product_Sales.csv
• Build a model to predict sales using the rest of the variables.
• Drop the less impactful variables based on their p-values.
• Is there any multicollinearity?
• How many variables are there in the final model?
• What is the R-squared of the final model?
• Can you improve the model using the same data and variables?

In [57]:
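```
import pandas as pd
# Load the dataset (adjust the path to wherever the CSV is stored)
Webpage_Product_Sales = pd.read_csv("Webpage_Product_Sales/Webpage_Product_Sales.csv")
Webpage_Product_Sales.shape
```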
Out[57]:
`(675, 12)`
In [58]:
```
Webpage_Product_Sales.columns
```
Out[58]:
```
Index(['ID', 'DayofMonth', 'Weekday', 'Month', 'Social_Network_Ref_links',
       'Special_Discount', 'Holiday', 'Server_Down_time_Sec', 'Web_UI_Score',
       'Sales'],
      dtype='object')
```
In [59]:
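```
import statsmodels.formula.api as sm
# Assumed predictor set: every column except the ID and the target (matches Df Model = 10)
predictors = [c for c in Webpage_Product_Sales.columns if c not in ("ID", "Sales")]
model1 = sm.ols(formula="Sales ~ " + " + ".join(predictors), data=Webpage_Product_Sales)
fitted1 = model1.fit()
fitted1.summary()
```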
Out[59]:
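
| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Sales | R-squared: | 0.818 |
| Model: | OLS | Adj. R-squared: | 0.815 |
| Method: | Least Squares | F-statistic: | 298.4 |
| Date: | Wed, 27 Jul 2016 | Prob (F-statistic): | 5.54e-238 |
| Time: | 12:45:36 | Log-Likelihood: | -6456.7 |
| No. Observations: | 675 | AIC: | 1.294e+04 |
| Df Residuals: | 664 | BIC: | 1.299e+04 |
| Df Model: | 10 | | |
| Covariance Type: | nonrobust | | |

| Term | coef | std err | t | P>\|t\| | 95.0% CI low | 95.0% CI high |
|---|---|---|---|---|---|---|
| Intercept | 6545.8922 | 1286.240 | 5.089 | 0.000 | 4020.304 | 9071.481 |
| Web_UI_Score | -6.2582 | 11.545 | -0.542 | 0.588 | -28.928 | 16.412 |
| … | -134.0441 | 14.009 | -9.569 | 0.000 | -161.551 | -106.537 |
| … | 1.877e+04 | 683.077 | 27.477 | 0.000 | 1.74e+04 | 2.01e+04 |
| … | 4718.3978 | 402.019 | 11.737 | 0.000 | 3929.016 | 5507.780 |
| Clicks_From_Serach_Engine | -0.1258 | 0.944 | -0.133 | 0.894 | -1.980 | 1.728 |
| … | 6.1557 | 1.002 | 6.142 | 0.000 | 4.188 | 8.124 |
| … | 6.6841 | 0.411 | 16.261 | 0.000 | 5.877 | 7.491 |
| … | 481.0294 | 41.508 | 11.589 | 0.000 | 399.527 | 562.532 |
| … | 1355.2153 | 67.224 | 20.160 | 0.000 | 1223.218 | 1487.213 |
| … | 47.0579 | 15.198 | 3.096 | 0.002 | 17.216 | 76.900 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 40.759 | Durbin-Watson: | 1.356 |
| Prob(Omnibus): | 0 | Jarque-Bera (JB): | 102.136 |
| Skew: | 0.297 | Prob(JB): | 6.63e-23 |
| Kurtosis: | 4.811 | Cond. No. | 25700 |
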
In [60]:
```
# VIF of each predictor; vif_cal is a user-defined helper, not a library function
vif_cal(Webpage_Product_Sales, "Sales")
```
```
ID  VIF =  1.18
DayofMonth  VIF =  1.01
Weekday  VIF =  1.0
Month  VIF =  1.19
Clicks_From_Serach_Engine  VIF =  12.08
Special_Discount  VIF =  1.37
Holiday  VIF =  1.38
Server_Down_time_Sec  VIF =  1.02
Web_UI_Score  VIF =  1.02
```
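The `vif_cal` helper called above is not part of pandas or statsmodels, and its definition does not appear in this post. The sketch below is one possible implementation. It assumes the helper regresses each column (other than the dependent one) on all remaining columns and reports 1 / (1 - R²). It uses plain NumPy least squares, which may differ from the helper actually used here, and the demo data is synthetic.

```python
import numpy as np
import pandas as pd


def vif_cal(input_data, dependent_col):
    """Print and return the VIF of every column except the dependent one."""
    x_vars = input_data.drop(columns=[dependent_col])
    vifs = {}
    for name in x_vars.columns:
        y = x_vars[name].to_numpy(dtype=float)
        others = x_vars.drop(columns=[name]).to_numpy(dtype=float)
        # Add an intercept column, then regress this predictor on the others
        design = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        r_squared = 1.0 - residuals.var() / y.var()
        vifs[name] = round(1.0 / (1.0 - r_squared), 2)
        print(name, " VIF = ", vifs[name])
    return vifs


# Synthetic check (made-up data, not the sales dataset)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a + rng.normal(scale=0.05, size=200),  # nearly duplicates "a"
    "b": rng.normal(size=200),
    "y": rng.normal(size=200),
})
demo_vifs = vif_cal(demo, "y")
```

In the synthetic frame, `a` and `a_copy` should both show a very large VIF because each is almost fully explained by the other, while `b` should stay close to 1.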
In [61]:
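```
## Dropped Clicks_From_Serach_Engine based on VIF

import statsmodels.formula.api as sm
# Assumed predictor set: as in the first model, minus the high-VIF variable
predictors2 = [c for c in Webpage_Product_Sales.columns
               if c not in ("ID", "Sales", "Clicks_From_Serach_Engine")]
model2 = sm.ols(formula="Sales ~ " + " + ".join(predictors2), data=Webpage_Product_Sales)
fitted2 = model2.fit()
fitted2.summary()
```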
Out[61]:
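
| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Sales | R-squared: | 0.818 |
| Model: | OLS | Adj. R-squared: | 0.815 |
| Method: | Least Squares | F-statistic: | 332.0 |
| Date: | Wed, 27 Jul 2016 | Prob (F-statistic): | 2.98e-239 |
| Time: | 12:48:18 | Log-Likelihood: | -6456.7 |
| No. Observations: | 675 | AIC: | 1.293e+04 |
| Df Residuals: | 665 | BIC: | 1.298e+04 |
| Df Model: | 9 | | |
| Covariance Type: | nonrobust | | |

| Term | coef | std err | t | P>\|t\| | 95.0% CI low | 95.0% CI high |
|---|---|---|---|---|---|---|
| Intercept | 6598.7469 | 1222.658 | 5.397 | 0.000 | 4198.012 | 8999.482 |
| Web_UI_Score | -6.3332 | 11.523 | -0.550 | 0.583 | -28.959 | 16.293 |
| … | -133.9518 | 13.981 | -9.581 | 0.000 | -161.405 | -106.499 |
| … | 1.877e+04 | 681.292 | 27.557 | 0.000 | 1.74e+04 | 2.01e+04 |
| … | 4713.9295 | 400.323 | 11.775 | 0.000 | 3927.881 | 5499.978 |
| … | 6.0279 | 0.291 | 20.740 | 0.000 | 5.457 | 6.599 |
| … | 6.6872 | 0.410 | 16.307 | 0.000 | 5.882 | 7.492 |
| … | 480.6876 | 41.398 | 11.611 | 0.000 | 399.401 | 561.974 |
| … | 1355.2536 | 67.174 | 20.175 | 0.000 | 1223.355 | 1487.152 |
| … | 47.0168 | 15.184 | 3.097 | 0.002 | 17.203 | 76.831 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 40.826 | Durbin-Watson: | 1.356 |
| Prob(Omnibus): | 0 | Jarque-Bera (JB): | 102.313 |
| Skew: | 0.298 | Prob(JB): | 6.07e-23 |
| Kurtosis: | 4.812 | Cond. No. | 19400 |
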
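Comparing the two summaries, R-squared stays at 0.818 after Clicks_From_Serach_Engine is removed, while one slope that barely moves (6.1557 vs 6.0279) sees its standard error fall from 1.002 to 0.291. That pattern is typical of multicollinearity: a redundant predictor inflates the uncertainty of a coefficient without adding explanatory power. The sketch below reproduces the effect on synthetic data. The variable names `x1`, `x2` and `y` are made up for illustration and have nothing to do with the sales dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic data (not the sales dataset): x2 is almost a copy of x1
rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
y = 2.0 * x1 + rng.normal(size=n)
demo = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Fit with and without the redundant predictor
with_both = sm.ols("y ~ x1 + x2", data=demo).fit()
x1_only = sm.ols("y ~ x1", data=demo).fit()

# Standard error of the x1 coefficient in each model
se_with_both = with_both.bse["x1"]
se_x1_only = x1_only.bse["x1"]
print(f"std err of x1 with x2 in the model: {se_with_both:.3f}")
print(f"std err of x1 after dropping x2:    {se_x1_only:.3f}")
print(f"R-squared: {with_both.rsquared:.3f} vs {x1_only.rsquared:.3f}")
```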
In [62]:
```
# VIF for the updated model
vif_cal(Webpage_Product_Sales.drop(["Clicks_From_Serach_Engine"], axis=1), "Sales")
```
```
ID  VIF =  1.18
DayofMonth  VIF =  1.01
Weekday  VIF =  1.0
Month  VIF =  1.19
Special_Discount  VIF =  1.36
Holiday  VIF =  1.38
Server_Down_time_Sec  VIF =  1.02
Web_UI_Score  VIF =  1.02
```
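A VIF is easier to read once it is converted back into an R-squared. For predictor j, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing that predictor on all the other predictors, so R_j² = 1 - 1 / VIF_j. The short sketch below applies this to two of the values printed in this post. The function name is hypothetical. A commonly quoted rule of thumb, which this post does not state, flags VIF values above roughly 5 to 10.

```python
def implied_r_squared(vif):
    """R-squared of regressing one predictor on the others, recovered from its VIF."""
    return 1.0 - 1.0 / vif


# VIF values reported by vif_cal in this post
print(round(implied_r_squared(12.08), 3))  # Clicks_From_Serach_Engine -> 0.917
print(round(implied_r_squared(1.02), 3))   # Web_UI_Score -> 0.02
```

So about 92% of the variation in Clicks_From_Serach_Engine is explained by the other predictors, which is why it was dropped, while Web_UI_Score is almost independent of them.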
In [63]:
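```
## Drop the less impacting variables based on p-values.
## Dropped Web_UI_Score based on p-value

import statsmodels.formula.api as sm
# Assumed predictor set: as in the second model, minus the insignificant Web_UI_Score
predictors3 = [c for c in Webpage_Product_Sales.columns
               if c not in ("ID", "Sales", "Clicks_From_Serach_Engine", "Web_UI_Score")]
model3 = sm.ols(formula="Sales ~ " + " + ".join(predictors3), data=Webpage_Product_Sales)
fitted3 = model3.fit()
fitted3.summary()
```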
Out[63]:
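
| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Sales | R-squared: | 0.818 |
| Model: | OLS | Adj. R-squared: | 0.816 |
| Method: | Least Squares | F-statistic: | 373.9 |
| Date: | Wed, 27 Jul 2016 | Prob (F-statistic): | 1.74e-240 |
| Time: | 12:49:15 | Log-Likelihood: | -6456.9 |
| No. Observations: | 675 | AIC: | 1.293e+04 |
| Df Residuals: | 666 | BIC: | 1.297e+04 |
| Df Model: | 8 | | |
| Covariance Type: | nonrobust | | |

| Term | coef | std err | t | P>\|t\| | 95.0% CI low | 95.0% CI high |
|---|---|---|---|---|---|---|
| Intercept | 6101.1539 | 821.286 | 7.429 | 0.000 | 4488.532 | 7713.776 |
| … | -134.0717 | 13.972 | -9.596 | 0.000 | -161.507 | -106.637 |
| … | 1.874e+04 | 678.528 | 27.623 | 0.000 | 1.74e+04 | 2.01e+04 |
| … | 4726.1858 | 399.491 | 11.831 | 0.000 | 3941.771 | 5510.600 |
| … | 6.0357 | 0.290 | 20.802 | 0.000 | 5.466 | 6.605 |
| … | 6.6738 | 0.409 | 16.312 | 0.000 | 5.870 | 7.477 |
| … | 479.5231 | 41.322 | 11.605 | 0.000 | 398.386 | 560.660 |
| … | 1354.4252 | 67.122 | 20.179 | 0.000 | 1222.629 | 1486.221 |
| … | 46.9564 | 15.175 | 3.094 | 0.002 | 17.159 | 76.754 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 41.049 | Durbin-Watson: | 1.352 |
| Prob(Omnibus): | 0 | Jarque-Bera (JB): | 103.243 |
| Skew: | 0.298 | Prob(JB): | 3.81e-23 |
| Kurtosis: | 4.821 | Cond. No. | 13100 |
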
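The p-value screening in this step was done by reading the summary table, but the same check can be scripted from the `pvalues` attribute of a fitted model. The sketch below runs on synthetic data with made-up column names (`signal`, `noise`, `y`). The 0.05 cutoff is an assumption, since this post does not state the threshold it used.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic data (not the sales dataset): only "signal" drives y
rng = np.random.default_rng(1)
n = 300
demo = pd.DataFrame({
    "signal": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
demo["y"] = 3.0 * demo["signal"] + rng.normal(size=n)

fitted = sm.ols("y ~ signal + noise", data=demo).fit()

# Terms whose p-value exceeds the chosen cutoff (0.05 is an assumed threshold)
cutoff = 0.05
pvalues = fitted.pvalues.drop("Intercept")
weak_terms = pvalues[pvalues > cutoff].index.tolist()
print("candidates to drop:", weak_terms)
```

Dropping one variable at a time and refitting, as done in this post, is safer than removing every weak term at once, because p-values shift when correlated predictors leave the model.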
In [65]:
```
# How many variables are there in the final model?
len(fitted3.params) - 1  # number of coefficients minus the intercept
```
Out[65]:
`8`
In [69]:
```
# What is the R-squared of the final model?
fitted3.rsquared
```
Out[69]:
`0.8178742020411971`
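R-squared never decreases when predictors are added, so the adjusted R-squared is the fairer number for comparing the three models. As a worked check, the sketch below recomputes it from the final model's R-squared using the standard formula 1 - (1 - R²)(n - 1) / (n - k - 1), with n = 675 observations and k = 8 predictors taken from the summary. The function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalise R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Final model: R-squared 0.8178742020411971, 675 observations, 8 predictors
final_adj = adjusted_r_squared(0.8178742020411971, 675, 8)
print(round(final_adj, 3))  # 0.816
```

The result matches the Adj. R-squared of 0.816 in the final summary, slightly above the 0.815 of the first two models even though R-squared itself stayed at 0.818.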