Predictive Modeling & Machine Learning

203.5.5 Practice : Implementing Intermediate outputs in R

In this post we will learn how to implement the concept of intermediate outputs using R. Dataset: Emp_Productivity/Emp_Productivity_All_Sites.csv. Filter the data and take the first 74 observations from the dataset above; the filter condition is Sample_Set<3. Build a logistic regression model to predict …
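The filtering and model-fitting steps named in this excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the CSV file is not loaded, a small synthetic data frame stands in for it, and apart from `Sample_Set` every column name (`x1`, `x2`, `target`) is a hypothetical placeholder, since the excerpt is cut off before it names the target variable.

```r
# Synthetic stand-in for Emp_Productivity_All_Sites.csv.
# x1, x2 and target are hypothetical columns; only Sample_Set comes from the post.
set.seed(42)
emp <- data.frame(
  x1 = rnorm(120),
  x2 = rnorm(120),
  Sample_Set = rep(1:3, each = 40)
)
# Hypothetical binary target, loosely related to x1 and x2.
emp$target <- rbinom(120, 1, plogis(0.8 * emp$x1 - 0.5 * emp$x2))

# Keep only the rows that satisfy the filter condition Sample_Set < 3.
emp_sub <- subset(emp, Sample_Set < 3)

# Logistic regression: glm() with a binomial family.
model <- glm(target ~ x1 + x2, data = emp_sub, family = binomial())
summary(model)

# Predicted probabilities for the filtered rows.
pred <- predict(model, type = "response")
```

With the real file, `emp` would presumably come from `read.csv()` and the formula would use the post's actual predictors and target.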

203.3.10 Pruning a Decision Tree in R

Pruning: Growing the tree beyond a certain level of complexity leads to overfitting. In our data, age doesn’t have any impact on the target variable, so growing the tree beyond Gender is not going to add any value. We need to cut it at Gender. This process of trimming trees is called …
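As a rough illustration of trimming an overgrown tree in R, the sketch below uses the `rpart` package. The post's own dataset is not shown in this excerpt, so the `kyphosis` data bundled with `rpart` is used as a stand-in, and choosing the `cp` value with the lowest cross-validated error is one common convention, not necessarily the approach the post takes.

```r
library(rpart)  # recommended package shipped with standard R distributions

set.seed(1)  # the cross-validation inside rpart() is random

# Grow a deliberately over-complex tree: cp = 0 disables the complexity penalty.
full_tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))

# cptable lists the cross-validated error (xerror) for each candidate cp.
cp_table <- full_tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Cut the tree back to the subtree selected by that cp value.
pruned_tree <- prune(full_tree, cp = best_cp)

# Compare tree sizes (number of nodes) before and after pruning.
nrow(full_tree$frame)
nrow(pruned_tree$frame)
```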

203.3.5 Information Gain in Decision Tree Split

Information Gain: Information Gain = entropyBeforeSplit – entropyAfterSplit. An easy way to understand it: information gain = (overall entropy at parent node) – (sum of weighted entropy at each child node). The attribute with the maximum information gain is the best split attribute. Information Gain calculation: Overall entropy = 100% (impurity); entropy of Young segment = 99%; entropy of Old …
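The parent-minus-weighted-children formula can be sketched in R as below. Because the age figures in this excerpt are cut off, the sketch uses the gender counts from the entropy lab in post 203.3.4 (48 vs 12 among 60 males, 2 vs 38 among 40 females); the helper name `entropy` and the variable names are illustrative choices, not taken from the post.

```r
# Entropy (in bits) of a vector of class counts; zero counts contribute 0.
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Class counts (class 1, class 0) in each child node of the gender split.
male   <- c(48, 12)
female <- c(2, 38)
parent <- male + female  # 50 vs 50 at the root

# Weight each child by its share of the parent population.
weights <- c(sum(male), sum(female)) / sum(parent)
entropy_after <- sum(weights * c(entropy(male), entropy(female)))

info_gain <- entropy(parent) - entropy_after
info_gain  # roughly 0.45 for the gender split
```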

203.3.4 How to Calculate Entropy for Decision Tree Split?

LAB: Entropy Calculation – Example. Calculate the entropy at the root for the given population. Calculate the entropy for the two distinct gender segments. Code – Entropy Calculation: Entropy at root: 100%. Male segment: (-48/60)*log(48/60,2)-(12/60)*log(12/60,2) gives 0.7219281. Female segment: (-2/40)*log(2/40,2)-(38/40)*log(38/40,2) gives 0.286397.
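Written out as runnable R, the lab's expressions need explicit multiplication signs. The sketch below reproduces the two segment values quoted above; the 50/50 root split is an inference from adding the two segments (48 + 2 versus 12 + 38), not a figure stated in the excerpt.

```r
# Root node: 50 of 100 in each class (inferred), so entropy is 1, i.e. 100%.
root <- -(50/100) * log(50/100, 2) - (50/100) * log(50/100, 2)

# Male segment: 48 vs 12 out of 60.
male <- -(48/60) * log(48/60, 2) - (12/60) * log(12/60, 2)

# Female segment: 2 vs 38 out of 40.
female <- -(2/40) * log(2/40, 2) - (38/40) * log(38/40, 2)

print(c(root = root, male = male, female = female))
```

The female segment has the lower entropy because one class makes up 95% of it, compared with 80% in the male segment.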

203.3.3 How Decision tree Splits works?

The Splitting Criterion: The best split is the split that does the best job of separating the data into groups where a single class (either 0 or 1) predominates in each group. Example: Sales Segmentation Based on Age. Example: Sales Segmentation Based on Gender. Impurity (Diversity) Measures: We are looking for …

203.3.2 The Decision Tree Approach

The Decision Tree Approach: The aim is to divide the whole population or data set into segments. The segmentation needs to be useful for business decision making. If one class really dominates a segment, then it will be easy for us to classify the unknown items. Then …