A decision tree can be built with the ID3 (Iterative Dichotomiser 3) algorithm. ID3 iteratively splits the data into segments, so that each split decreases Entropy and increases Information Gain.

The final goal is for the final (leaf) nodes to be as homogeneous as possible.

Two metrics used by Decision Tree algorithms are:

- **Entropy**: the uncertainty in the data, which we want to decrease with each split.
- **Information Gain**: the decrease in Entropy after each split, which we want to increase with each split.

We shall cover Entropy in this post and see how it can be calculated.

### Impurity (Diversity) Measures

- We are looking for an impurity (diversity) measure that gives a high score to the Age variable (high impurity while segmenting) and a low score to the Gender variable (low impurity while segmenting)

**Entropy**: characterizes the impurity/diversity of a segment

- It is a measure of uncertainty/impurity
- Entropy measures the amount of information in a message
- $S$ is a segment of training examples, $p_+$ is the proportion (probability) of the positive class, and $p_-$ is the proportion (probability) of the negative class
- $\text{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$

- Entropy is highest when the split has p of 0.5, i.e. the two classes are evenly mixed.
- Entropy is lowest when the split is pure, i.e. p of 1.
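The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `binary_entropy` and its parameter `p_pos` are assumptions made for this example, and a class with zero proportion is assumed to contribute nothing to the sum.

```python
import math


def binary_entropy(p_pos: float) -> float:
    """Entropy (in bits) of a two-class segment with positive-class proportion p_pos."""
    if not 0.0 <= p_pos <= 1.0:
        raise ValueError("p_pos must be between 0 and 1")
    entropy = 0.0
    for p in (p_pos, 1.0 - p_pos):
        # Assumption: a class with zero proportion contributes 0 to the sum.
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy


print(binary_entropy(0.5))  # evenly mixed segment
print(binary_entropy(1.0))  # pure segment
print(binary_entropy(0.9))  # mostly positive segment
```

The evenly mixed segment should give the largest value and the pure segment the smallest, with the 90-10 segment in between.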

### Entropy is highest when the split has p of 0.5

- A 50-50 class ratio in a segment is as impure as it can be, hence entropy is high
- $\text{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$
- $\text{Entropy}(S) = -0.5 \cdot \log_2(0.5) - 0.5 \cdot \log_2(0.5)$
- Since $\log_2(0.5) = -1$, $\text{Entropy}(S) = 0.5 + 0.5 = 1$
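To check that 0.5 really is the peak, one option is to evaluate the entropy on a grid of proportions. The sketch below does that; the helper `binary_entropy`, the grid spacing of 0.1, and the variable names are all choices made for this illustration.

```python
import math


def binary_entropy(p_pos):
    """Two-class entropy in bits; zero-proportion classes are skipped."""
    entropy = 0.0
    for p in (p_pos, 1 - p_pos):
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# Evaluate entropy for positive-class proportions 0.0, 0.1, ..., 1.0.
grid = [i / 10 for i in range(11)]
entropies = {p: binary_entropy(p) for p in grid}

for p, h in entropies.items():
    print(f"p+ = {p:.1f}  entropy = {h:.3f}")

# The proportion with the largest entropy on this grid.
peak = max(entropies, key=entropies.get)
print("peak at p+ =", peak)
```

The printed values rise from 0 at the pure ends towards the middle of the grid and fall off symmetrically on either side.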

### Entropy is lowest when the split is pure, i.e. p of 1

- A 100-0 class ratio in a segment is completely pure, hence entropy is low
- $\text{Entropy}(S) = -p_+ \log_2 p_+ - p_- \log_2 p_-$
- $\text{Entropy}(S) = -1 \cdot \log_2(1) - 0 \cdot \log_2(0)$
- $\log_2(1) = 0$, and the term $0 \cdot \log_2(0)$ is taken to be 0 by convention, so $\text{Entropy}(S) = 0$
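Strictly speaking, $\log_2(0)$ is undefined, so the second term needs a short justification. The usual convention treats $0 \cdot \log_2(0)$ as 0, which matches the limit of $p \log_2 p$ as $p$ approaches 0 from above. A brief derivation using L'Hôpital's rule:

$$
\lim_{p \to 0^+} p \log_2 p
= \lim_{p \to 0^+} \frac{\log_2 p}{1/p}
= \lim_{p \to 0^+} \frac{1/(p \ln 2)}{-1/p^2}
= \lim_{p \to 0^+} \left(-\frac{p}{\ln 2}\right)
= 0
$$

So a class that does not appear in the segment adds nothing to the entropy.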

### The lower the entropy, the better the split

- A split is better when the segments it produces have lower entropy
- Entropy is formulated in such a way that its value is high for impure segments and low for pure ones
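As a small illustration of this idea, the sketch below computes entropy directly from the class labels of two segments and compares them. The segments are made-up data for the example, and `entropy_of_labels` is a hypothetical helper name, not part of any library.

```python
import math
from collections import Counter


def entropy_of_labels(labels):
    """Entropy (in bits) of a segment, computed from its class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Only classes that actually occur are in `counts`, so every proportion is > 0.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical segments: 1 = positive example, 0 = negative example.
mixed_segment = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # 50-50 class ratio
purer_segment = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # 90-10 class ratio

mixed_entropy = entropy_of_labels(mixed_segment)
purer_entropy = entropy_of_labels(purer_segment)

print(f"mixed segment entropy: {mixed_entropy:.3f}")
print(f"purer segment entropy: {purer_entropy:.3f}")
```

The purer segment scores lower, which is why a split that produces segments like it would be preferred.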

In the next post, we will see how to calculate the entropy for each split.