Gini Impurity: Two Equivalent Formulas

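Gini impurity is a measure used in decision tree algorithms (such as CART) to quantify how “impure” a node is, that is, how mixed the classes in it are. A pure node, where every item belongs to one class, has G = 0, and the value grows as the classes become more evenly mixed.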

✅ Formula 1: Basic Form

G = 1 - Σ(pᵢ²)   (sum over i = 1, …, n)

Where:

  • pᵢ is the probability (or proportion) of class i in the node.
  • n is the number of classes.

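This formula gives the probability that two items drawn at random, independently and with replacement, from the node belong to different classes: Σ(pᵢ²) is the probability that both draws land in the same class, and G is its complement.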
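A minimal Python sketch of Formula 1, assuming the node is given as a plain list of class labels. The helper names (`class_proportions`, `gini_basic`, `prob_two_draws_differ`) and the example node are hypothetical. It uses exact fractions and also enumerates every ordered pair of draws with replacement, so the probabilistic reading above can be checked directly rather than taken on faith:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def class_proportions(labels):
    """Return the proportion p_i of each class among the node's labels.

    Assumes a non-empty list; an empty node would divide by zero.
    """
    counts = Counter(labels)
    total = len(labels)
    return [Fraction(c, total) for c in counts.values()]


def gini_basic(labels):
    """Formula 1: G = 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in class_proportions(labels))


def prob_two_draws_differ(labels):
    """Enumerate every ordered pair of draws (with replacement) and
    return the exact fraction of pairs whose labels differ."""
    pairs = list(product(labels, repeat=2))
    differ = sum(1 for a, b in pairs if a != b)
    return Fraction(differ, len(pairs))


node = ["A", "A", "A", "B", "B", "C"]  # hypothetical node: p = 1/2, 1/3, 1/6
print(gini_basic(node))             # 11/18
print(prob_two_draws_differ(node))  # 11/18
print(gini_basic(["A"] * 5))        # 0 (a pure node)
```

Here Σ(pᵢ²) = 1/4 + 1/9 + 1/36 = 7/18, so G = 11/18, and 22 of the 36 ordered draws pick two different classes, which is the same value.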

✅ Formula 2: Pairwise Form

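G = Σ(pᵢ × pⱼ), summed over all ordered pairs (i, j) with i ≠ j

Each term pᵢ × pⱼ is the probability that the first draw comes from class i and the second from class j, so the sum is directly the probability that the two draws come from different classes. Both (i, j) and (j, i) are counted; if each unordered pair is counted only once (i < j), the sum must be doubled.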
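A matching sketch of Formula 2 that takes the class proportions directly (the function names and example proportions are hypothetical). The second function illustrates why the ordered-pair convention matters: summing over unordered pairs i < j visits each cross-class pair once, so the result has to be doubled to equal G.

```python
from fractions import Fraction


def gini_pairwise(proportions):
    """Formula 2: sum of p_i * p_j over all ordered pairs (i, j), i != j."""
    n = len(proportions)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j:  # skip the same-class (diagonal) terms
                total += proportions[i] * proportions[j]
    return total


def gini_pairwise_unordered(proportions):
    """Same value using unordered pairs i < j, doubled because each
    cross-class pair is counted only once."""
    n = len(proportions)
    return 2 * sum(proportions[i] * proportions[j]
                   for i in range(n) for j in range(i + 1, n))


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(gini_pairwise(p))            # 11/18
print(gini_pairwise_unordered(p))  # 11/18
```

The double loop visits n(n − 1) terms, compared with n terms for Formula 1, which is why Formula 1 is the cheaper form to evaluate.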

✅ Why Are They Equivalent?

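Because the proportions sum to 1, the sum over all pairs (i, j), including i = j, is:

Σ(pᵢ × pⱼ) for all i, j = (Σ pᵢ)² = 1

Splitting this double sum into the diagonal terms (i = j), which give Σ(pᵢ²), and the off-diagonal terms (i ≠ j):

Σ(pᵢ²) + Σ(pᵢ × pⱼ) for i ≠ j = 1

So:

Σ(pᵢ × pⱼ) for i ≠ j = 1 - Σ(pᵢ²)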

Hence:

G = 1 - Σ(pᵢ²) = Σ(pᵢ × pⱼ) for i ≠ j
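The identity can also be spot-checked numerically. The sketch below (hypothetical helper names; stdlib `random` with a fixed seed so runs are reproducible) compares both formulas on randomly generated distributions and also checks the intermediate step that the full double sum equals 1. Floating-point results are compared with a small tolerance rather than exact equality.

```python
import random


def gini_formula1(p):
    """Formula 1: 1 - sum(p_i^2)."""
    return 1 - sum(x * x for x in p)


def gini_formula2(p):
    """Formula 2: sum of p_i * p_j over ordered pairs with i != j."""
    n = len(p)
    return sum(p[i] * p[j] for i in range(n) for j in range(n) if i != j)


def random_distribution(n, rng):
    """Random proportions for n classes, normalized to sum to 1."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed: the check is reproducible
for _ in range(1000):
    p = random_distribution(rng.randint(1, 10), rng)
    # Both formulas should agree up to floating-point rounding.
    assert abs(gini_formula1(p) - gini_formula2(p)) < 1e-12
    # Intermediate step: sum over all (i, j) equals (sum p_i)^2 = 1.
    assert abs(sum(a * b for a in p for b in p) - 1) < 1e-12
print("both formulas agree on 1000 random distributions")
```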

✅ Summary of the Differences

| Aspect | Formula 1: 1 - Σ(pᵢ²) | Formula 2: Σ(pᵢ × pⱼ), i ≠ j |
|---|---|---|
| Simpler to compute | ✅ Yes: a single sum over n classes | ❌ No: a double sum over n(n − 1) ordered pairs |
| Intuitive meaning | 1 minus the probability that two draws share a class | Sum of the probabilities of every cross-class pair |
| Used in practice | Very commonly used | Rarely written out explicitly |
| Mathematically cleaner | Yes | Equivalent, but more verbose |