
Ultimate Guide: Classification and Regression Tree (CART)

The decision tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used to solve both regression and classification problems. The goal of using a decision tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior (training) data.


Important Terminology related to Decision Trees

  • Root node: Represents the entire population or sample, which further gets divided into two or more homogeneous sets.
  • Decision node: A sub-node that is further divided into sub-nodes.
  • Pruning: The process of removing sub-nodes from a decision node.
  • Splitting: The process of dividing a node into two or more sub-nodes.
  • Terminal or leaf node: A node that does not split any further.
  • Parent and child node: A node that is divided into sub-nodes is called the parent node; its sub-nodes are called child nodes.
  • Branch or sub-tree: A subsection of the entire tree.

Classification And Regression Tree (CART)

CART (Classification And Regression Tree) is a variation of the decision tree algorithm that can handle both classification and regression tasks. Scikit-Learn uses the CART algorithm to train decision trees (also called “growing” trees). CART was introduced by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984. It also serves as the basis of ensemble methods such as bagged decision trees, boosted decision trees, and random forests.

Data preparation for CART: No special data preparation is required for the CART algorithm.
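Since Scikit-Learn's decision trees are grown with CART, a minimal sketch of training one looks like this; the dataset and the max_depth value are only illustrative choices.

```python
# Minimal sketch: growing a CART classification tree with Scikit-Learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# criterion="gini" is the default splitting measure; max_depth is illustrative.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```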

There are marked differences between regression trees and classification trees.

  • Regression trees: Predict continuous values based on prior (training) data. For instance, to predict the price of an item, historical price data is analyzed (see the regression sketch after this list).
  • Classification trees: Predict a discrete outcome, usually a yes-or-no answer to whether an event occurred. This type of decision tree is often used in real-world decision-making.
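For the regression case, Scikit-Learn offers DecisionTreeRegressor, which is grown with the same CART procedure. The tiny pricing dataset below is made up purely for illustration.

```python
# Minimal sketch: a CART regression tree on a small, made-up pricing dataset.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical features per item: [age_in_years, condition_score]
X = np.array([[1, 9], [2, 8], [3, 6], [5, 5], [7, 3], [10, 2]])
y = np.array([900, 820, 650, 500, 300, 150])  # made-up prices

reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)
print(reg.predict([[4, 6]]))  # predicted price for an unseen item
```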

Gini index

Decision trees are widely used when implementing machine learning algorithms. The hierarchical structure of a decision tree leads us to the final outcome by traversing the nodes of the tree. Each node tests an attribute or feature and is further split into more nodes as we move down the tree. But how do we decide:

  • Which attribute/feature should be placed at the root node?
  • Which features will act as internal nodes or leaf nodes?

To decide this, and how to split the tree, we use splitting measures like Gini Index, Information Gain, etc.

The Gini Index, or Gini impurity, measures the probability of a randomly chosen element being classified incorrectly if it were labeled randomly according to the class distribution at that node.

But what is actually meant by ‘impurity’?

If all the elements belong to a single class, the node can be called pure. The Gini Index varies between 0 and 1 (a short computation sketch follows the list below):

  1. ‘0’ denotes that all elements belong to a single class, i.e. the node is pure.
  2. Values approaching ‘1’ indicate that the elements are spread randomly across many classes (impure).
  3. A Gini Index of ‘0.5’ denotes elements equally distributed across two classes.
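To make these values concrete, here is a minimal sketch of the usual Gini impurity formula, Gini = 1 − Σ pᵢ², in plain Python; the class-count lists are made up for illustration.

```python
# Minimal sketch: Gini impurity of a node, Gini = 1 - sum(p_i ** 2),
# where p_i is the fraction of samples belonging to class i at that node.
def gini_impurity(class_counts):
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini_impurity([10, 0]))   # 0.0 -> pure node, single class
print(gini_impurity([5, 5]))    # 0.5 -> two classes, evenly mixed
print(gini_impurity([1] * 10))  # 0.9 -> many classes, approaching 1
```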

Advantages of CART

  • Results are simple to interpret.
  • Classification and regression trees are nonparametric and can capture nonlinear relationships.
  • Classification and regression trees implicitly perform feature selection.
  • Outliers have no meaningful effect on CART.
  • It requires minimal supervision and produces easy-to-understand models.

Limitations of CART

  • Overfitting: a fully grown tree can fit noise in the training data (a pruning sketch follows this list).
  • High variance: small changes in the training data can produce a very different tree.
  • Low bias: a fully grown tree fits the training data very closely, which goes hand in hand with the high variance above.
  • The tree structure may be unstable.
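Overfitting and variance are usually reduced by limiting tree growth or pruning. The sketch below shows two common Scikit-Learn options, a depth cap and cost-complexity pruning via ccp_alpha; the specific parameter values are only illustrative.

```python
# Minimal sketch: two common ways to rein in an overgrown CART tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree for comparison (tends to overfit).
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# 1) Pre-pruning: stop growth early by capping depth and leaf size.
shallow = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0).fit(X, y)

# 2) Post-pruning: cost-complexity pruning controlled by ccp_alpha.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print("full depth:", full.get_depth(),
      "| shallow depth:", shallow.get_depth(),
      "| pruned depth:", pruned.get_depth())
```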

Applications of the CART algorithm

  • For quick data insights.
  • In blood donor classification.
  • For environmental and ecological data.
  • In the financial sector.

Conclusion

This article has covered the basic theory behind CART. Its classification and regression implementations will be published in future articles.
