203.7.4 The Bagging Algorithm

Let’s move on to the first type of ensemble methodology: the Bagging algorithm.

We will cover the idea behind bagging and then implement it in R.

The Bagging Algorithm

  • Start with the training dataset D
  • Draw k bootstrap samples from dataset D
  • For each bootstrap sample i
    • Build a classifier model \(M_i\)
  • This gives a total of k classifiers \(M_1, M_2, \ldots, M_k\)
  • For classification, take a majority vote over the k outputs; for regression, average them
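The steps above can be sketched directly in R. This is a minimal illustration for the regression case, using `lm()` as the base learner on a made-up toy dataset; the names `k`, `boot_models`, and `new_x` are ours, not part of any package.

```r
# Minimal bagging sketch: draw k bootstrap samples, fit one model on each,
# then average the k predictions for the regression output.
set.seed(42)
D <- data.frame(x = runif(100))          # toy dataset D
D$y <- 2 * D$x + rnorm(100, sd = 0.3)

k <- 10
boot_models <- lapply(1:k, function(i) {
  idx <- sample(nrow(D), replace = TRUE) # bootstrap sample i (with replacement)
  lm(y ~ x, data = D[idx, ])             # base model M_i
})

# Consolidated prediction: average over the k models
new_x <- data.frame(x = c(0.2, 0.8))
preds <- sapply(boot_models, predict, newdata = new_x)  # 2 x k matrix
rowMeans(preds)                           # bagged regression output
```

For classification, the only change is the aggregation step: instead of `rowMeans()`, take the most frequent predicted class across the k models.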

Why Bagging Works

  • A bootstrap sample is drawn by selecting records one at a time with replacement: each selected record is returned to the population, so it has a chance to be selected again
  • If the samples were independent, the variance of the consolidated prediction would be reduced; this is how we reduce the unavoidable errors made by any single model
  • In a given bootstrap sample, some observations may be selected multiple times, while others may not be selected at all
  • It can be shown that a bootstrap sample contains, on average, only about 63.2% of the distinct observations in the population; the remaining ~36.8% are left out
  • Since the data used in each model is not exactly the same, the learned models are approximately independent, which helps keep their errors uncorrelated
  • Finally, the errors of the individual models tend to cancel out, giving us a better ensemble model with higher accuracy
  • Bagging is especially useful when our base model has high variance
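The 63% figure can be checked with a quick simulation. For a population of size n, the expected fraction of distinct observations in one bootstrap sample is \(1 - (1 - 1/n)^n\), which approaches \(1 - 1/e \approx 0.632\) as n grows.

```r
# Fraction of distinct observations that appear in one bootstrap sample
set.seed(1)
n <- 10000
frac_in_sample <- length(unique(sample(n, replace = TRUE))) / n
frac_in_sample     # close to 1 - exp(-1), i.e. about 0.632
1 - (1 - 1/n)^n    # exact expectation for this n
```

The ~36.8% of observations left out of each sample are the "out-of-bag" records, which is why the data seen by each base model differs.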

LAB: Bagging Models

  • Import the Boston house price data; it is part of the MASS package
  • Get some basic meta details of the data
  • Use 90% of the data for training and keep the remaining 10% as holdout data
  • Build a single linear regression model on the training data
  • On the holdout data, calculate the error (sum of squared deviations) for the regression model
  • Build a regression model using the bagging technique, with at least 25 base models
  • On the holdout data, calculate the error (sum of squared deviations) for the consolidated bagged regression model
  • How much does the bagged model improve over the single model?

Solution

# Import the Boston house price data
library(MASS)
data(Boston)
head(Boston)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12
##   lstat medv
## 1  4.98 24.0
## 2  9.14 21.6
## 3  4.03 34.7
## 4  2.94 33.4
## 5  5.33 36.2
## 6  5.21 28.7
dim(Boston)
## [1] 506  14
##Training and holdout sample
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
set.seed(500)
sampleseed <- createDataPartition(Boston$medv, p=0.9, list=FALSE)

train_boston<-Boston[sampleseed,]
test_boston<-Boston[-sampleseed,]

###Regression Model
reg_model<- lm(medv ~ ., data=train_boston)
summary(reg_model)
## 
## Call:
## lm(formula = medv ~ ., data = train_boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.4763  -2.7684  -0.4912   1.9030  26.4569 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.637e+01  5.534e+00   6.572 1.40e-10 ***
## crim        -1.042e-01  3.513e-02  -2.965 0.003195 ** 
## zn           4.482e-02  1.459e-02   3.073 0.002248 ** 
## indus        1.986e-02  6.566e-02   0.302 0.762462    
## chas         2.733e+00  8.765e-01   3.118 0.001939 ** 
## nox         -1.844e+01  4.018e+00  -4.590 5.79e-06 ***
## rm           3.845e+00  4.670e-01   8.234 2.04e-15 ***
## age          8.782e-04  1.434e-02   0.061 0.951211    
## dis         -1.488e+00  2.096e-01  -7.101 4.94e-12 ***
## rad          2.770e-01  6.993e-02   3.960 8.71e-05 ***
## tax         -1.062e-02  3.944e-03  -2.693 0.007348 ** 
## ptratio     -9.799e-01  1.385e-01  -7.073 5.92e-12 ***
## black        9.620e-03  2.827e-03   3.403 0.000726 ***
## lstat       -5.051e-01  5.706e-02  -8.852  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.787 on 444 degrees of freedom
## Multiple R-squared:  0.7309, Adjusted R-squared:  0.723 
## F-statistic: 92.75 on 13 and 444 DF,  p-value: < 2.2e-16
###Accuracy testing on holdout data
pred_reg<-predict(reg_model, newdata=test_boston[,-14])
reg_err<-sum((test_boston$medv-pred_reg)^2)
reg_err
## [1] 918.5927
###Bagging Ensemble Model
library(ipred)
bagg_model<- bagging(medv ~ ., data=train_boston , nbagg=30)

### Accuracy testing on holdout data
pred_bagg<-predict(bagg_model, newdata=test_boston[,-14])
bgg_err<-sum((test_boston$medv-pred_bagg)^2)
bgg_err
## [1] 390.9028
###Overall Improvement
reg_err
## [1] 918.5927
bgg_err
## [1] 390.9028
(reg_err-bgg_err)/reg_err
## [1] 0.5744547
The bagged model cuts the holdout squared error from about 919 to about 391, an improvement of roughly 57% over the single regression model.
