204.4.5 What is a Best Model?

What is the best model? How do we build it?

  • A model with maximum accuracy / least error
  • A model that uses maximum information available in the given data
  • A model that has minimum squared error
  • A model that captures all the hidden patterns in the data
  • A model that produces the best prediction results
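
As a rough illustration of the quantities in the list above, the sketch below computes accuracy, error rate, and mean squared error with NumPy. The arrays are made-up toy values, not Fiberbits data, and the variable names are chosen only for this example.

```python
import numpy as np

# Toy labels and predictions, made up purely for illustration
y_true_cls = np.array([1, 0, 1, 1, 0])
y_pred_cls = np.array([1, 0, 0, 1, 0])

# Accuracy: share of predictions that match the true label
accuracy = np.mean(y_true_cls == y_pred_cls)
# Error rate: share of predictions that do not match
error_rate = 1 - accuracy

# Toy numeric targets and predictions for the squared-error case
y_true_reg = np.array([3.0, 5.0, 2.0])
y_pred_reg = np.array([2.5, 5.0, 3.0])

# Mean squared error: average of the squared differences
mse = np.mean((y_true_reg - y_pred_reg) ** 2)

print("accuracy:", accuracy)
print("error rate:", error_rate)
print("mean squared error:", mse)
```

Accuracy and error rate apply to classification problems such as the practice exercise in this post, while squared error is the usual choice when the target is numeric.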

Model Selection

  • How do we build or choose the best model?
  • Error on the training data is not a good measure of performance on future data
  • How do we select the best model out of the set of available models?
  • Are there any methods/metrics to choose the best model?
  • What is training error? What is testing error? What is hold out sample error?
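
One possible way to make these terms concrete is to set part of the data aside before training. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` rather than the Fiberbits file, it treats the test set and the hold-out sample as the same thing, and the names `X_all`, `X_hold`, and so on are invented for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (a stand-in, not the Fiberbits data)
X_all, y_all = make_classification(n_samples=1000, n_features=10,
                                   flip_y=0.1, random_state=42)

# Keep 25% of the rows aside; the model never sees them during training
X_train, X_hold, y_train, y_hold = train_test_split(
    X_all, y_all, test_size=0.25, random_state=42)

# Fully grown tree, trained on the training rows only
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Training error: error on the rows used to fit the model
train_error = 1 - model.score(X_train, y_train)
# Hold-out (testing) error: error on the rows kept aside
holdout_error = 1 - model.score(X_hold, y_hold)

print("training error:", train_error)
print("hold-out error:", holdout_error)
```

The hold-out error is the closer proxy for performance on future data, because it is measured on rows the model did not use while learning.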

Practice : The Most Accurate Model

  • Data: Fiberbits/Fiberbits.csv
  • Build a decision tree to predict active_cust
  • What is the accuracy of your model?
  • Grow the tree as much as you can and achieve 95% accuracy.

Solution

In [13]:
#Preparing the X and y to train the model
#(Fiber_df and np are assumed to be defined in earlier cells)
features = list(Fiber_df.drop(['active_cust'], axis=1).columns)

X = np.array(Fiber_df[features])
y = np.array(Fiber_df['active_cust'])
In [14]:
#Let's make a model by choosing some initial  parameters.
from sklearn import tree

tree_config = tree.DecisionTreeClassifier(criterion='gini', 
                                   splitter='best', 
                                   max_depth=10, 
                                   min_samples_split=2, 
                                   min_samples_leaf=30, 
                                   max_leaf_nodes=10)
In [15]:
#Training the model and finding the accuracy of the model                 
tree_config.fit(X,y)
tree_config.score(X,y)
Out[15]:
0.84972999999999999

The first decision tree we have built is giving us an accuracy of 84.97% on the training data. We will grow the tree to achieve 95% accuracy.

In [16]:
tree_config_new = tree.DecisionTreeClassifier(criterion='gini', 
                                              splitter='best', 
                                              max_depth=None, 
                                              min_samples_split=2, 
                                              min_samples_leaf=1, 
                                              max_leaf_nodes=None)
In [17]:
#Training the model and accuracy
tree_config_new.fit(X,y)
tree_config_new.score(X,y)
Out[17]:
0.99668999999999996

It may seem that accuracy is all that matters: the higher the accuracy, the better the model. But this accuracy is measured on the training data, and high training accuracy comes at a price. We will see why in the next posts.
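
As a preview of that price, the sketch below compares a constrained tree with a fully grown one on data the trees have not seen. It is a minimal illustration under assumptions: the data is synthetic (generated with `make_classification`, not the Fiberbits file), the two configurations only loosely mirror the ones used above, and the names `small_tree`, `full_tree`, and `results` are invented for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Fiberbits data (an assumption, not the real file)
X_demo, y_demo = make_classification(n_samples=2000, n_features=8,
                                     n_informative=4, flip_y=0.1,
                                     random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

# Constrained tree, similar in spirit to the first model above
small_tree = DecisionTreeClassifier(max_depth=10, min_samples_leaf=30,
                                    max_leaf_nodes=10, random_state=0)
# Fully grown tree, similar in spirit to the second model above
full_tree = DecisionTreeClassifier(random_state=0)

results = {}
for name, model in [("constrained", small_tree), ("fully grown", full_tree)]:
    model.fit(X_tr, y_tr)
    results[name] = {
        "leaves": model.get_n_leaves(),          # size of the fitted tree
        "train_acc": model.score(X_tr, y_tr),    # accuracy on training rows
        "holdout_acc": model.score(X_ho, y_ho),  # accuracy on unseen rows
    }
    print(name, results[name])
```

On this synthetic data the fully grown tree fits the training rows perfectly, yet its hold-out accuracy is lower than its training accuracy. The exact numbers depend on the data, so the gap on Fiberbits may look different.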
