Chapter 6 Decision Trees
Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron
Like SVMs, Decision Trees are versatile Machine Learning algorithms that can perform both classification and regression tasks, and even multioutput tasks.
They are powerful algorithms, capable of fitting complex datasets.
For example, in Chapter 2 you trained a DecisionTreeRegressor model on the California housing dataset, fitting it perfectly (actually, overfitting it).
Decision Trees are also the fundamental components of Random Forests (see Chapter 7), which are among the most powerful Machine Learning algorithms available today.
In this chapter we will start by discussing how to train, visualize, and make predictions with Decision Trees.
Then we will go through the CART training algorithm used by Scikit-Learn, and we will discuss how to regularize trees and use them for regression tasks.
Finally, we will discuss some of the limitations of Decision Trees.
Training and Visualizing a Decision Tree
To understand Decision Trees, let's build one and take a look at how it makes predictions.
The following code trains a DecisionTreeClassifier on the iris dataset (see Chapter 4):
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)  # train the tree on the two petal features
Figure 6-1. Iris Decision Tree (the figure cannot be displayed here, so it is omitted).
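Because the figure is not reproduced in these notes, a plain-text rendering of the tree can serve as a rough substitute. The following is a minimal sketch using scikit-learn's export_text helper; the random_state value and the variable names feature_names and tree_report are assumptions added for illustration and are not part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# random_state is an assumption added here so repeated runs build the same tree
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Render the fitted tree as indented text: one line per split or leaf
feature_names = list(iris.feature_names[2:])
tree_report = export_text(tree_clf, feature_names=feature_names)
print(tree_report)
```

Each indentation level in the printed report corresponds to one level of depth in the tree, and the lines beginning with "class:" are the leaf nodes.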
Let's see how the tree represented in Figure 6-1 makes predictions.
Suppose you find an iris flower and you want to classify it.
You start at the root node (depth 0, at the top):
this node asks whether the flower's petal length is smaller than 2.45 cm.
If it is, then you move down to the root's left child node (depth 1, left).
In this case, it is a leaf node (i.e., it does not have any child nodes), so it does not ask any questions:
simply look at the predicted class for that node, and the Decision Tree predicts that your flower is an Iris setosa (class=setosa).
Now suppose you find another flower, and this time the petal length is greater than 2.45 cm.
You must move down to the root's right child node (depth 1, right), which is not a leaf node, so the node asks another question:
is the petal width smaller than 1.75 cm?
If it is, then your flower is most likely an Iris versicolor (depth 2, left).
If not, it is likely an Iris virginica (depth 2, right).
It's really that simple.
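The walkthrough above can be sketched as two nested questions. In the sketch below, classify_by_hand is a hypothetical helper written only to mirror the text, the three sample flowers are made-up measurements, and random_state is an assumption added for reproducibility. The helper uses "<=" because that is how scikit-learn expresses its split thresholds.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def classify_by_hand(petal_length, petal_width):
    """Hypothetical helper: the tree's questions written as plain if/else."""
    if petal_length <= 2.45:   # root node (depth 0)
        return "setosa"        # leaf node (depth 1, left)
    if petal_width <= 1.75:    # internal node (depth 1, right)
        return "versicolor"    # leaf node (depth 2, left)
    return "virginica"         # leaf node (depth 2, right)


iris = load_iris()
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(iris.data[:, 2:], iris.target)

# Made-up (petal length, petal width) pairs in cm, one per expected class
flowers = [(1.4, 0.2), (5.0, 1.5), (6.0, 2.2)]

hand_labels = [classify_by_hand(length, width) for length, width in flowers]
tree_labels = [str(iris.target_names[i]) for i in tree_clf.predict(flowers)]

for flower, by_hand, by_tree in zip(flowers, hand_labels, tree_labels):
    print(flower, "by hand:", by_hand, "| fitted tree:", by_tree)
```

For these three clear-cut flowers the hand-written rules and the fitted tree should agree; flowers lying close to a threshold are where a different tree could disagree with the hand-written version.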
One of the many qualities of Decision Trees is that they require very little data preparation.
In fact, they don't require feature scaling or centering at all.
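One way to check this claim informally is to train the same tree on raw and on standardized features and compare the predictions. The sketch below assumes StandardScaler as the scaling step and a fixed random_state; both are choices made for illustration. Standardization rescales each feature without changing the order of its values, so the same splits remain available to the training algorithm.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# Standardize each feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed; only the feature scale differs
tree_raw = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_scaled, y)

pred_raw = tree_raw.predict(X)
pred_scaled = tree_scaled.predict(X_scaled)

same_predictions = bool(np.array_equal(pred_raw, pred_scaled))
print("Identical training-set predictions:", same_predictions)
```

The split thresholds themselves differ between the two trees, since they are expressed in the units of each feature, but the way the training samples are partitioned should be the same.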