
# 204.1.11 Interaction Terms

This is the final post in our Linear Regression Series.

This post is about a trick called Interaction Terms, which may improve the accuracy of the model.

## Interaction Terms

• An interaction term is a variable derived from pre-existing variables, for example the product or the ratio of two of them.
• Adding interaction terms might help in improving the prediction accuracy of the model.
• The addition of interaction terms needs prior knowledge of the dataset and variables.
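To make the idea of a derived variable concrete, here is a minimal sketch in pandas. The data and the column names (`price`, `ad_spend`, `visits`) are hypothetical and are not taken from the web product sales dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.0, 8.0, 15.0],
    "ad_spend": [100.0, 150.0, 80.0, 200.0],
    "visits": [50.0, 60.0, 40.0, 80.0],
})

# Product interaction: lets the effect of ad_spend vary with price.
df["price_x_ad_spend"] = df["price"] * df["ad_spend"]

# Ratio interaction: ad spend per visit.
df["ad_spend_per_visit"] = df["ad_spend"] / df["visits"]

print(df)
```

The derived columns can then be used as ordinary predictors. In a statsmodels formula, a product term can also be written directly as `a:b`, and `a*b` expands to `a + b + a:b`.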

### Practice : Interaction Terms

• Add a few interaction terms to the previous web product sales model and see whether the accuracy increases.
In [70]:
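```
import statsmodels.formula.api as sm

# model4 is assumed to be the OLS model that includes the added interaction
# terms; its definition is not shown in this post.
fitted4 = model4.fit()
fitted4.summary()
```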
Out[70]:
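OLS Regression Results

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | Sales | R-squared: | 0.865 |
| Model: | OLS | Adj. R-squared: | 0.863 |
| Method: | Least Squares | F-statistic: | 473.6 |
| Date: | Wed, 27 Jul 2016 | Prob (F-statistic): | 2.17e-282 |
| Time: | 12:59:08 | Log-Likelihood: | -6355.7 |
| No. Observations: | 675 | AIC: | 1.273e+04 |
| Df Residuals: | 665 | BIC: | 1.278e+04 |
| Df Model: | 9 | | |
| Covariance Type: | nonrobust | | |

| coef | std err | t | P>\|t\| | 95.0% Conf. Int. (lower) | 95.0% Conf. Int. (upper) |
|---|---|---|---|---|---|
| 6753.6923 | 708.791 | 9.528 | 0.000 | 5361.955 | 8145.430 |
| -140.4922 | 12.044 | -11.665 | 0.000 | -164.141 | -116.844 |
| 2201.8694 | 1232.336 | 1.787 | 0.074 | -217.870 | 4621.608 |
| 4749.0044 | 344.145 | 13.799 | 0.000 | 4073.262 | 5424.747 |
| 5.9515 | 0.250 | 23.805 | 0.000 | 5.461 | 6.442 |
| 7.0657 | 0.353 | 19.994 | 0.000 | 6.372 | 7.760 |
| 480.3156 | 35.597 | 13.493 | 0.000 | 410.420 | 550.212 |
| 1164.8864 | 59.143 | 19.696 | 0.000 | 1048.756 | 1281.017 |
| 47.0967 | 13.073 | 3.603 | 0.000 | 21.428 | 72.766 |
| 4294.6865 | 281.683 | 15.247 | 0.000 | 3741.592 | 4847.782 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 7.552 | Durbin-Watson: | 0.867 |
| Prob(Omnibus): | 0.023 | Jarque-Bera (JB): | 7.305 |
| Skew: | 0.219 | Prob(JB): | 0.0259 |
| Kurtosis: | 2.74 | Cond. No. | 23200 |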
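The same kind of before/after comparison can be tried on made-up data. The sketch below is not the web product sales model: the variables `x1`, `x2` and `y` are synthetic, and the product term is built into the data on purpose, so the improvement is guaranteed here and may not appear on a real dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic data (an assumption for illustration, not the sales dataset).
rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
})
# The true relationship contains a product term by construction.
data["y"] = (
    2.0
    + 1.5 * data["x1"]
    - 1.0 * data["x2"]
    + 3.0 * data["x1"] * data["x2"]
    + rng.normal(scale=0.5, size=n)
)

# Model without and with the interaction term x1:x2.
main_only = sm.ols("y ~ x1 + x2", data=data).fit()
with_interaction = sm.ols("y ~ x1 + x2 + x1:x2", data=data).fit()

print("Adj. R-squared, main effects only:", main_only.rsquared_adj)
print("Adj. R-squared, with interaction: ", with_interaction.rsquared_adj)
```

Comparing adjusted R-squared rather than plain R-squared is deliberate, since plain R-squared never decreases when a term is added.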

## Conclusion – Regression

• Try adding polynomial and interaction terms to your regression model. Sometimes they work like a charm.
• Adjusted R-squared is a good measure of fit on the training (in-time) sample, but we can't be sure about the final model performance based on it alone. Cross-validation gives a better idea of the testing error.
• Outliers can influence the regression line, so we need to clean the data before building the model.
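
The cross-validation point above can be sketched with a hand-written k-fold loop. This is one possible way to estimate testing error, using NumPy least squares on synthetic data; the helper names `design` and `cv_rmse` are hypothetical and the data is not from the sales example.

```python
import numpy as np

# Synthetic data with a built-in interaction effect (illustrative assumption).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 1.5 * x1 * x2 + rng.normal(scale=0.5, size=n)

def design(x1, x2, interaction):
    """Build the design matrix: intercept, x1, x2 and optionally x1*x2."""
    cols = [np.ones_like(x1), x1, x2]
    if interaction:
        cols.append(x1 * x2)
    return np.column_stack(cols)

def cv_rmse(X, y, k=5, seed=0):
    """Average out-of-fold RMSE of a least-squares fit over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only, then score on the held-out fold.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))

rmse_main = cv_rmse(design(x1, x2, False), y)
rmse_inter = cv_rmse(design(x1, x2, True), y)
print("CV RMSE, main effects only:", rmse_main)
print("CV RMSE, with interaction: ", rmse_inter)
```

If an added term lowers the cross-validated error and not just the training error, that is better evidence that it is worth keeping.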