In this post we will learn how to implement the concept of intermediate outputs using R. Dataset: Emp_Productivity/ Emp_Productivity_All_Sites.csv. Filter the data and take the first 74 observations from this dataset; the filter condition is Sample_Set<3. Build a logistic regression model to predict …

## 203.3.10 Pruning a Decision Tree in R

Growing the tree beyond a certain level of complexity leads to overfitting. In our data, age doesn’t have any impact on the target variable, so growing the tree beyond Gender is not going to add any value; we need to cut it at Gender. This process of trimming trees is called …
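As a rough illustration of trimming an overgrown tree, the sketch below grows a deliberately complex tree and then cuts it back. It assumes the `rpart` package and its bundled `kyphosis` data, neither of which is named in this excerpt, and the `cp` threshold of 0.05 is an arbitrary illustrative value.

```r
library(rpart)  # assumption: rpart is the tree package in use

# Grow an intentionally over-complex tree: cp = 0 removes the complexity
# penalty, and a small minsplit lets tiny nodes keep splitting.
full_tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class",
                   control = rpart.control(cp = 0, minsplit = 2))

# Trim the tree back using a complexity-parameter threshold (illustrative).
pruned_tree <- prune(full_tree, cp = 0.05)

# Compare tree sizes; each row of $frame describes one node.
nodes_full <- nrow(full_tree$frame)
nodes_pruned <- nrow(pruned_tree$frame)
cat("Nodes before pruning:", nodes_full, "\n")
cat("Nodes after pruning: ", nodes_pruned, "\n")
```

In practice the threshold is usually chosen by inspecting the cross-validated error table rather than fixed in advance.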

## 203.3.5 Information Gain in Decision Tree Split

Information Gain = entropyBeforeSplit - entropyAfterSplit. An easy way to understand it: information gain = (overall entropy at the parent node) - (sum of weighted entropy at each child node). The attribute with the maximum information gain is the best split attribute. Information Gain Calculation: Entropy Overall = 100% (Impurity), Entropy Young Segment = 99%, Entropy Old …
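The parent-minus-weighted-children formula can be sketched in a few lines of R. The helper names `entropy` and `info_gain` are hypothetical, and because the age-segment figures are cut off in this excerpt, the counts used here are the gender counts from the entropy lab excerpt on this page (48/12 for males, 2/38 for females).

```r
# Entropy (in bits) of a vector of class counts; zero counts contribute 0.
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Information gain = parent entropy - weighted sum of child entropies.
# `children` is a list of class-count vectors, one per child node.
info_gain <- function(parent, children) {
  n <- sum(parent)
  weighted <- sum(sapply(children, function(ch) sum(ch) / n * entropy(ch)))
  entropy(parent) - weighted
}

# Counts taken from the gender example: class totals are 48 + 2 and 12 + 38.
parent <- c(50, 50)
children <- list(male = c(48, 12), female = c(2, 38))

gain_gender <- info_gain(parent, children)
print(gain_gender)
```

Running the same function for each candidate attribute and picking the largest value gives the best split attribute.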

## 203.3.4 How to Calculate Entropy for Decision Tree Split?

LAB: Entropy Calculation – Example. Calculate the entropy at the root for the given population. Calculate the entropy for the two distinct gender segments. Code – Entropy Calculation: Entropy at root: 100%. Male Segment: (-48/60)*log(48/60,2)-(12/60)*log(12/60,2) = 0.7219281. Female Segment: (-2/40)*log(2/40,2)-(38/40)*log(38/40,2) = 0.286397
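The lab figures can be reproduced with a short R sketch. The function name `segment_entropy` is hypothetical, and the root counts of 50 and 50 are inferred by adding the class counts of the two gender segments (48 + 2 and 12 + 38).

```r
# Entropy (in bits) from class counts; assumes every count is positive.
segment_entropy <- function(counts) {
  p <- counts / sum(counts)
  -sum(p * log(p, 2))  # log(x, 2) is the base-2 logarithm
}

# Root node: class totals inferred from the two gender segments.
root_entropy <- segment_entropy(c(50, 50))

# Male segment: 48 of one class, 12 of the other (60 in total).
male_entropy <- segment_entropy(c(48, 12))

# Female segment: 2 of one class, 38 of the other (40 in total).
female_entropy <- segment_entropy(c(2, 38))

print(c(root = root_entropy, male = male_entropy, female = female_entropy))
```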

## 203.3.3 How Decision tree Splits works?

The Splitting Criterion: the best split is the one that does the best job of separating the data into groups where a single class (either 0 or 1) predominates in each group. Examples: sales segmentation based on age, and sales segmentation based on gender. Impurity (Diversity) Measures: we are looking for …

## 203.3.2 The Decision Tree Approach

The aim is to divide the whole population or data set into segments. The segmentation needs to be useful for business decision making. If one class really dominates in a segment, then it will be easy for us to classify the unknown items. Then …

## 203.1.3 Beyond Pearson Correlation

The correlation coefficient used previously was the Pearson correlation coefficient, so called because it was invented by Pearson. If the correlation is between X and Y and both X and Y are continuous, then the Pearson coefficient works well. But there are places where it doesn’t work. How to find …
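The excerpt is cut off before it names the cases where Pearson falls short. One commonly cited case, offered here as an assumption rather than as the post's own example, is a strong but non-linear relationship. The data in this sketch is synthetic.

```r
# Synthetic data: y is completely determined by x, but not linearly.
x <- -5:5
y <- x^2

# cor() computes the Pearson coefficient by default.
pearson_r <- cor(x, y)
print(pearson_r)
```

Because the parabola is symmetric around zero, the linear association cancels out and the coefficient comes out at essentially zero even though y depends entirely on x.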

## 203.1.2 Correlation Calculation in R

Let us learn the correlation concepts with an example. In this lab on correlation calculation, we have a dataset called Air Passengers, and we have to find some correlations in that dataset. We have to find the correlation between the number of passengers and the promotional budget, and we have …
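A minimal sketch of the calculation in R uses the base `cor()` function. The data frame below is a hypothetical stand-in, not the Air Passengers data, and the column names `passengers` and `promo_budget` are assumed for illustration.

```r
# Hypothetical stand-in data; the real lab reads the Air Passengers dataset.
air <- data.frame(
  passengers   = c(120, 135, 150, 170, 160, 190),
  promo_budget = c(10, 12, 14, 17, 15, 20)
)

# Pearson correlation between passenger count and promotional budget.
r_promo <- cor(air$passengers, air$promo_budget)
print(r_promo)
```

A value close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear association.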
