
# 204.7.6 Practice : Random Forest

Let’s put the Random Forest concept into practice using Python.

### Practice : Random Forest

• Dataset: /Car Accidents IOT/Train.csv
• Build a decision tree model on the training data to predict whether an accident is fatal.
• On the test data, calculate the classification error and accuracy.
• Build a random forest model on the training data.
• On the test data, calculate the classification error and accuracy.
• What is the improvement of the Random Forest model when compared with the single tree?
In [10]:
```
# Importing dataset
```
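The import cell above is empty in this write-up, so `car_train` and `car_test` are never defined on the page. A minimal sketch of how they could be loaded with pandas follows. The helper `load_car_data`, the file names, and the column names `ID` and `S1` are hypothetical, and the sketch writes two tiny stand-in CSV files with made-up rows only so that it runs on its own. Point the paths at your own copies of the train and test files instead.

```python
import os
import tempfile

import pandas as pd


def load_car_data(train_path, test_path):
    """Read the training and test CSV files into pandas DataFrames."""
    car_train = pd.read_csv(train_path)
    car_test = pd.read_csv(test_path)
    return car_train, car_test


# Stand-in files with made-up rows, only so the sketch is runnable.
# Replace these paths with the real locations of the train and test CSVs.
demo_dir = tempfile.mkdtemp()
train_path = os.path.join(demo_dir, "Train.csv")
test_path = os.path.join(demo_dir, "Test.csv")
pd.DataFrame({"ID": [1, 2, 3], "S1": [0.1, 0.4, 0.9], "Fatal": [0, 1, 1]}).to_csv(train_path, index=False)
pd.DataFrame({"ID": [4, 5], "S1": [0.3, 0.8], "Fatal": [0, 1]}).to_csv(test_path, index=False)

car_train, car_test = load_car_data(train_path, test_path)
print(car_train.shape, car_test.shape)
```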
In [11]:
```
from sklearn import tree

# Predictor columns (columns 1 to 21) and the target column
var = list(car_train.columns[1:22])
c = car_train[var]
d = car_train['Fatal']

# Building a decision tree on the training data
clf = tree.DecisionTreeClassifier()
clf.fit(c, d)
```
Out[11]:
```
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
```
In [12]:
```
# Predicting on the test data
tree_predict = clf.predict(car_test[var])
```
In [13]:
```
from sklearn.metrics import confusion_matrix

# Rows are the actual classes, columns are the predicted classes
cm1 = confusion_matrix(car_test['Fatal'], tree_predict)
print(cm1)
```
```
[[3244  648]
 [ 695 4478]]
```
In [14]:
```
# Calculating accuracy from the confusion matrix:
# correctly classified observations (the diagonal) divided by all observations
total1 = cm1.sum()
accuracy_tree = (cm1[0, 0] + cm1[1, 1]) / total1
accuracy_tree
```
Out[14]:
`0.85184776613348046`
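The exercise also asks for the classification error, which is one minus the accuracy. As a quick check, the sketch below recomputes both figures from the decision tree's confusion matrix printed above. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix of the single decision tree on the test data (from the output above)
cm_tree = np.array([[3244, 648],
                    [695, 4478]])

total = cm_tree.sum()            # all test observations
correct = np.trace(cm_tree)      # diagonal = correctly classified observations
accuracy_tree = correct / total
error_tree = 1 - accuracy_tree   # share of misclassified observations

print(total, correct)                                  # 9065 7722
print(round(accuracy_tree, 4), round(error_tree, 4))   # 0.8518 0.1482
```

The tree misclassifies 1343 of the 9065 test observations, a classification error of about 14.8%.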
In [16]:
```
# accuracy_score() gives the same result as the confusion matrix calculation
from sklearn.metrics import accuracy_score
accuracy_score(car_test['Fatal'], tree_predict)
```
Out[16]:
`0.85184776613348046`
In [17]:
```
# Building a random forest classifier on the training data
from sklearn.ensemble import RandomForestClassifier

# max_features='sqrt' is what older scikit-learn releases called 'auto' for classifiers;
# parameters not listed here are left at their defaults
forest = RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None,
                                min_samples_split=2, min_samples_leaf=1,
                                max_features='sqrt', bootstrap=True, oob_score=False,
                                n_jobs=1, random_state=None)

forest.fit(c, d)
```
Out[17]:
```
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
```
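Because the bootstrap samples and the feature subsets are drawn at random, a forest fitted with `random_state=None` can give slightly different results on each run. Here is a minimal sketch of fixing the seed and comparing a single tree with a 10-tree forest. It uses synthetic data from `make_classification`, not the car accident data, and the sample size, number of informative features, and seed are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 21 predictors and a binary target
X, y = make_classification(n_samples=2000, n_features=21, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a single tree and a 10-tree forest with fixed seeds
tree_model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest_model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

tree_acc = accuracy_score(y_test, tree_model.predict(X_test))
forest_acc = accuracy_score(y_test, forest_model.predict(X_test))
print("tree accuracy:  ", round(tree_acc, 4))
print("forest accuracy:", round(forest_acc, 4))

# Refitting with the same seed reproduces the same predictions
forest_again = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)
same = bool((forest_again.predict(X_test) == forest_model.predict(X_test)).all())
print("reproducible:", same)
```

Setting `random_state` does not make the model better. It only makes the comparison between the tree and the forest repeatable from run to run.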
In [18]:
```
# Predicting on the test data with the random forest model
forestpredict_test = forest.predict(car_test[var])
e = car_test['Fatal']  # actual labels of the test data
```
In [19]:
```
# Checking the accuracy on the test data
from sklearn.metrics import confusion_matrix

cm2 = confusion_matrix(car_test['Fatal'], forestpredict_test)
print(cm2)

# Calculating accuracy from the confusion matrix
total2 = cm2.sum()
accuracy_forest = (cm2[0, 0] + cm2[1, 1]) / total2
accuracy_forest
```
```
```
[[3383  509]
 [ 471 4702]]
```
Out[19]:
`0.89189189189189189`
• The Random Forest model improves test accuracy from about 85.2% to about 89.2%, a gain of roughly 4 percentage points over the single tree.
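To put a number on the improvement, the two confusion matrices can be compared directly. The sketch below reuses the matrices printed earlier. The helper name `accuracy_from_cm` is illustrative.

```python
import numpy as np


def accuracy_from_cm(cm):
    """Accuracy = correctly classified (diagonal) / all observations."""
    return np.trace(cm) / cm.sum()


# Confusion matrices on the test data, copied from the outputs above
cm_tree = np.array([[3244, 648], [695, 4478]])
cm_forest = np.array([[3383, 509], [471, 4702]])

acc_tree = accuracy_from_cm(cm_tree)
acc_forest = accuracy_from_cm(cm_forest)
gain = acc_forest - acc_tree

# Number of misclassified test observations for each model
errors_tree = cm_tree.sum() - np.trace(cm_tree)
errors_forest = cm_forest.sum() - np.trace(cm_forest)

print(round(acc_tree, 4), round(acc_forest, 4), round(gain, 4))  # 0.8518 0.8919 0.04
print(errors_tree, errors_forest)                                # 1343 980
```

On this test set the forest makes 980 errors against the tree's 1343, which is roughly 27% fewer misclassifications.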