
204.1.11 Interaction Terms

This is the final post in our Linear Regression Series.

This post is about a technique called interaction terms, which can sometimes improve the accuracy of a model.

Interaction Terms

  • An interaction term is a derived variable built from two or more pre-existing variables, typically by multiplying (or sometimes dividing) them.
  • Adding interaction terms can improve the prediction accuracy of the model.
  • Choosing useful interaction terms requires prior knowledge of the dataset and its variables.
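To see why a product of two variables can help, here is a minimal self-contained sketch on synthetic data (the variable names `holiday` and `weekday` only mimic the sales dataset used later; the data here is made up). It fits ordinary least squares with and without the interaction column and compares the R-squared values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
holiday = rng.integers(0, 2, n).astype(float)   # binary holiday flag
weekday = rng.integers(1, 8, n).astype(float)   # day of week, 1..7
# The true relationship includes a holiday x weekday interaction
sales = 100 + 20 * holiday + 5 * weekday + 15 * holiday * weekday + rng.normal(0, 1, n)

def r_squared(X, y):
    """R-squared of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

ones = np.ones(n)
X_main = np.column_stack([ones, holiday, weekday])                      # main effects only
X_int = np.column_stack([ones, holiday, weekday, holiday * weekday])    # plus interaction

print(r_squared(X_main, sales))  # main effects miss part of the signal
print(r_squared(X_int, sales))   # interaction term captures it
```

The interaction column lets the effect of `weekday` differ between holidays and non-holidays, which a purely additive model cannot express.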

Practice : Interaction Terms

  • Add a few interaction terms to the earlier web product sales model and check whether the accuracy improves. Note that in the formula interface, `Holiday*Weekday` expands to `Holiday + Weekday + Holiday:Weekday`, so both main effects and the interaction are included.
In [70]:
import statsmodels.formula.api as smf

model4 = smf.ols(
    formula=('Sales ~ Server_Down_time_Sec + Holiday + Special_Discount'
             ' + Online_Ad_Paid_ref_links + Social_Network_Ref_links'
             ' + Month + Weekday + DayofMonth + Holiday*Weekday'),
    data=Webpage_Product_Sales)
fitted4 = model4.fit()
fitted4.summary()
Out[70]:
OLS Regression Results
Dep. Variable: Sales R-squared: 0.865
Model: OLS Adj. R-squared: 0.863
Method: Least Squares F-statistic: 473.6
Date: Wed, 27 Jul 2016 Prob (F-statistic): 2.17e-282
Time: 12:59:08 Log-Likelihood: -6355.7
No. Observations: 675 AIC: 1.273e+04
Df Residuals: 665 BIC: 1.278e+04
Df Model: 9
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 6753.6923 708.791 9.528 0.000 5361.955 8145.430
Server_Down_time_Sec -140.4922 12.044 -11.665 0.000 -164.141 -116.844
Holiday 2201.8694 1232.336 1.787 0.074 -217.870 4621.608
Special_Discount 4749.0044 344.145 13.799 0.000 4073.262 5424.747
Online_Ad_Paid_ref_links 5.9515 0.250 23.805 0.000 5.461 6.442
Social_Network_Ref_links 7.0657 0.353 19.994 0.000 6.372 7.760
Month 480.3156 35.597 13.493 0.000 410.420 550.212
Weekday 1164.8864 59.143 19.696 0.000 1048.756 1281.017
DayofMonth 47.0967 13.073 3.603 0.000 21.428 72.766
Holiday:Weekday 4294.6865 281.683 15.247 0.000 3741.592 4847.782
Omnibus: 7.552 Durbin-Watson: 0.867
Prob(Omnibus): 0.023 Jarque-Bera (JB): 7.305
Skew: 0.219 Prob(JB): 0.0259
Kurtosis: 2.740 Cond. No. 2.32e+04
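As a sanity check on the summary above, the reported Adj. R-squared can be reproduced from the reported R-squared, the number of observations, and the number of model terms:

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
r2, n, p = 0.865, 675, 9   # R-squared, No. Observations, Df Model from the summary
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))  # 0.863, matching Adj. R-squared in the output
```

The small gap between R-squared and its adjusted version tells us the model is not being rewarded merely for having many terms.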

Conclusion – Regression

  • Try adding polynomial and interaction terms to your regression model; sometimes they work like a charm.
  • Adjusted R-squared is a good measure of training (in-sample) error, but it alone cannot tell us how the final model will perform on new data. Cross-validation gives a better idea of the testing error.
  • Outliers can influence the regression line, so the data needs to be cleaned before fitting the model.
