Chapter 7 Ensemble Learning and Random Forests

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron

Suppose you pose a complex question to thousands of random people, then aggregate their answers. In many cases you will find that this aggregated answer is better than an expert's answer. This is called the wisdom of the crowd.

Similarly, if you aggregate the predictions of a group of predictors (such as classifiers or regressors), you will often get better predictions than with the best individual predictor.

A group of predictors is called an ensemble; thus this technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method.

As an example of an Ensemble method, you can train a group of Decision Tree classifiers, each on a different random subset of the training set.

To make predictions, you obtain the predictions of all the individual trees, then predict the class that gets the most votes (see the last exercise in Chapter 6).

Such an ensemble of Decision Trees is called a Random Forest, and despite its simplicity, this is one of the most powerful Machine Learning algorithms available today.
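The recipe above (train each tree on a different random subset, then take a majority vote) can be sketched by hand. This is a minimal sketch assuming scikit-learn and NumPy, and a moons setup (n_samples=500, noise=0.30) that is only an assumption here. The helpers fit_tree_ensemble and predict_majority are hypothetical names. It is plain bagging of trees: scikit-learn's RandomForestClassifier additionally considers a random subset of features at each split, so treat this as an illustration of the voting idea, not a reimplementation of Random Forests.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_tree_ensemble(X, y, n_trees=25, subset_ratio=0.8, seed=42):
    """Hypothetical helper: train each tree on a different random subset."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    subset_size = int(subset_ratio * n_samples)
    trees = []
    for i in range(n_trees):
        # Draw row indices (with replacement) to build this tree's random subset
        idx = rng.integers(0, n_samples, size=subset_size)
        tree = DecisionTreeClassifier(random_state=i)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_majority(trees, X):
    """Hypothetical helper: predict the class that gets the most votes."""
    # votes has shape (n_trees, n_samples); labels assumed to be ints >= 0
    votes = np.array([tree.predict(X) for tree in trees]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])


# Assumed moons setup (parameters are illustrative)
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

trees = fit_tree_ensemble(X_train, y_train)
y_pred = predict_majority(trees, X_test)
print("ensemble accuracy:", (y_pred == y_test).mean())
```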

As discussed in Chapter 2, you will often use Ensemble methods near the end of a project, once you have already built a few good predictors, to combine them into an even better predictor. In fact, the winning solutions in Machine Learning competitions often involve several Ensemble methods.

In this chapter we will discuss the most popular Ensemble methods, including bagging, boosting, and stacking. We will also explore Random Forests.

Voting Classifiers

Suppose you have trained a few classifiers, each one achieving about 80% accuracy.

You may have a Logistic Regression classifier, an SVM classifier, a Random Forest Classifier, a K-Nearest Neighbors classifier, and perhaps a few more.

A very simple way to create an even better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes.

This majority-vote classifier is called a hard voting classifier.
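Hard voting itself is just a per-sample vote count. The sketch below, assuming NumPy and integer class labels, computes it from a matrix of predictions with one row per classifier. The function name hard_vote is hypothetical, and in this sketch ties go to the smallest label because argmax returns the first maximum.

```python
import numpy as np


def hard_vote(predictions):
    """Majority vote over class labels.

    predictions: int array of shape (n_classifiers, n_samples).
    Ties go to the smallest label, because argmax returns the first maximum.
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # counts[c, j] = number of classifiers that voted for class c on sample j
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)


# Three classifiers (rows) predicting labels for three samples (columns)
preds = [[0, 1, 1],
         [1, 1, 0],
         [1, 0, 0]]
print(hard_vote(preds))  # [1 1 0]
```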

Somewhat surprisingly, this voting classifier often achieves a higher accuracy than the best classifier in the ensemble.

In fact, even if each classifier is a weak learner (meaning it does only slightly better than random guessing), the ensemble can still be a strong learner (achieving high accuracy), provided there are a sufficient number of weak learners and they are sufficiently diverse.
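Why weak learners can add up to a strong one is easiest to see with a worked calculation. Assuming a binary task and classifiers whose errors are fully independent (an idealization that real classifiers trained on the same data do not satisfy), the number of correct classifiers follows a binomial distribution, and the ensemble is right when a strict majority is right. The function majority_accuracy is a hypothetical name; it works in log space so that large n does not overflow.

```python
import math


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent classifiers is correct.

    Assumes each classifier is right with probability p (0 < p < 1),
    independently of the others, on a binary task. Use odd n to avoid ties.
    """
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # log of the binomial pmf C(n, k) * p^k * (1 - p)^(n - k)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total


# Slightly-better-than-random learners (p = 0.51) with growing ensemble size
for n in (1, 3, 11, 101, 1001):
    print(n, round(majority_accuracy(0.51, n), 3))
```

As a hand check, three independent 80%-accurate classifiers give 0.8^3 + 3 x 0.8^2 x 0.2 = 0.512 + 0.384 = 0.896, which is what majority_accuracy(0.8, 3) returns.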

* One reason ensembles produce good results seems to lie in generalization.

Ensemble methods work best when the predictors are as independent from one another as possible.

One way to get diverse classifiers is to train them using very different algorithms.

This increases the chance that they will make very different types of errors, improving the ensemble's accuracy.
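The role of independence can be shown with a small simulation, assuming NumPy and a binary task (so the majority vote is correct exactly when more than half the classifiers are correct). Each simulated classifier is right with probability 0.6; in one case errors are independent, in the other every classifier makes the same errors. The numbers (25 classifiers, 10,000 samples) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classifiers, n_samples, p = 25, 10_000, 0.6

# Independent errors: each classifier is right on each sample with probability p
independent = rng.random((n_classifiers, n_samples)) < p

# Perfectly correlated errors: all classifiers are right or wrong together
shared = rng.random(n_samples) < p
correlated = np.tile(shared, (n_classifiers, 1))


def majority_correct_rate(correct):
    """Fraction of samples where more than half of the classifiers are right."""
    return np.mean(correct.sum(axis=0) > correct.shape[0] / 2)


print("single classifier:", independent[0].mean())
print("ensemble, independent errors:", majority_correct_rate(independent))
print("ensemble, correlated errors:", majority_correct_rate(correlated))
```

With correlated errors the vote adds nothing: the ensemble is exactly as accurate as any one member, while independent errors push the majority well above 0.6.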

The following code creates and trains a voting classifier in Scikit-Learn, composed of three diverse classifiers (the training set is the moons dataset, introduced in Chapter 5):

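from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Moons dataset from Chapter 5 (sample size, noise, and seeds are assumed here)
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# Hard voting: predict the class that gets the most votes
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)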

Let's look at each classifier's accuracy on the test set:

from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.888
VotingClassifier 0.904

There you have it!

The voting classifier slightly outperforms all the individual classifiers.

If all classifiers are able to estimate class probabilities (i.e., they all have a predict_proba() method), then you can tell Scikit-Learn to predict the class with the highest class probability, averaged over all the individual classifiers.

This is called soft voting.

It often achieves higher performance than hard voting because it gives more weight to highly confident votes.
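To see how confident votes get more weight, here is a hand-computed comparison for a single sample, assuming NumPy. The three probability rows are made-up values standing in for predict_proba() outputs; they are chosen so that hard and soft voting disagree.

```python
import numpy as np

# Made-up predict_proba outputs of three classifiers for ONE sample;
# columns are [P(class 0), P(class 1)]
probas = np.array([
    [0.10, 0.90],  # classifier A: very confident in class 1
    [0.60, 0.40],  # classifier B: leans toward class 0
    [0.55, 0.45],  # classifier C: leans toward class 0
])

# Hard voting: each classifier casts one vote for its most likely class
votes = probas.argmax(axis=1)                 # [1, 0, 0]
hard_prediction = np.bincount(votes).argmax()

# Soft voting: average the class probabilities, then take the argmax
avg_proba = probas.mean(axis=0)               # about [0.417, 0.583]
soft_prediction = avg_proba.argmax()

print("hard:", hard_prediction, "soft:", soft_prediction)  # hard: 0 soft: 1
```

Two lukewarm votes for class 0 outvote one confident vote for class 1 under hard voting, but averaging the probabilities lets classifier A's confidence decide the outcome.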

All you need to do is replace voting="hard" with voting="soft" and ensure that all classifiers can estimate class probabilities.

This is not the case for the SVC class by default, so you need to set its probability hyperparameter to True. This makes the SVC class use cross-validation to estimate class probabilities, which slows down training, and it adds a predict_proba() method.

If you modify the preceding code to use soft voting, you will find that the voting classifier achieves over 91.2% accuracy!
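A sketch of the soft-voting variant, assuming scikit-learn and the same illustrative moons setup (n_samples=500, noise=0.30, random_state=42). The random_state values are added here only to make runs repeatable; the exact accuracy you get may differ from the figure quoted above depending on library version and seeds.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumed moons setup; the parameters are illustrative
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        # probability=True gives SVC a predict_proba() method (fitted with internal CV)
        ("svc", SVC(probability=True, random_state=42)),
    ],
    voting="soft",  # average predicted class probabilities instead of counting votes
)
voting_clf.fit(X_train, y_train)

y_pred = voting_clf.predict(X_test)
print("soft voting accuracy:", accuracy_score(y_test, y_pred))
```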

Bagging and Pasting