AI_ML_DL’s diary


Chapter 11 Training Deep Neural Networks


Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron


In Chapter 10 we introduced artificial neural networks and trained our first deep neural networks.

But they were shallow nets, with just a few hidden layers.

What if you need to tackle a complex problem, such as detecting hundreds of types of objects in high-resolution images?

You may need to train a much deeper DNN, perhaps with 10 layers or many more, each containing hundreds of neurons, linked by hundreds of thousands of connections.

Training a deep DNN isn't a walk in the park.

Here are some of the problems you could run into:

・You may be faced with the tricky vanishing gradients problem or the related exploding gradients problem.  This is when the gradients grow smaller and smaller (or larger and larger) when flowing backward through the DNN during training.  Both of these problems make lower layers very hard to train. # lower layers = layers close to the input

・You might not have enough training data for such a large network, or it might be too costly to label.

・Training may be extremely slow.

・A model with millions of parameters would severely risk overfitting the training set, especially if there are not enough training instances or if they are too noisy.
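To see why gradients can vanish, here is a tiny numeric sketch (mine, not the book's): the logistic sigmoid's derivative is at most 0.25, so backpropagating through ten layers multiplies ten such factors together, even in the best case.

```python
import math

def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# Best case: every unit sits at z = 0, where the derivative peaks at 0.25.
# Even then the gradient signal shrinks geometrically with depth.
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)

print(gradient)  # 0.25**10, about 9.5e-07
```

Almost no signal reaches the lowest layers, which is exactly why the chapter turns to better initialization and nonsaturating activations.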



The Vanishing/Exploding Gradients Problems



Glorot and He Initialization
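As a reminder of what these schemes compute, a small sketch (assumptions mine: scalar weights, plain Python RNG) of the Glorot and He standard deviations and He-normal sampling:

```python
import math
import random

def glorot_normal_std(fan_in, fan_out):
    # Glorot (Xavier) initialization: variance 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_normal_std(fan_in):
    # He initialization (for ReLU-family activations): variance 2 / fan_in
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, rng=random):
    # Sample a fan_in x fan_out weight matrix from N(0, he_std**2)
    std = he_normal_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

weights = init_weights(300, 100)
```

In Keras this is just `kernel_initializer="glorot_uniform"` (the default) or `kernel_initializer="he_normal"` on a `Dense` layer.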



Nonsaturating Activation Functions
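The alternatives this section covers can be written down directly; a minimal scalar sketch (not the book's code):

```python
import math

def leaky_relu(z, alpha=0.01):
    # A small slope alpha for z < 0 keeps the gradient from dying
    return z if z > 0 else alpha * z

def elu(z, alpha=1.0):
    # Smooth for z < 0; saturates to -alpha instead of cutting off at 0
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

print(leaky_relu(-10.0))  # about -0.1
print(elu(-10.0))         # about -0.99995
```

In Keras these are available as the `keras.layers.LeakyReLU` layer and `activation="elu"`.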



Batch Normalization
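The core of the technique is easy to state: standardize each batch, then let the layer learn a scale and an offset. A toy sketch over scalar inputs (my simplification; real BN works per feature and keeps running statistics for inference):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=0.001):
    # Normalize a batch to zero mean / unit variance, then scale by
    # gamma and shift by beta (both learned during training)
    mu = sum(batch) / len(batch)
    var = sum((x - mu) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```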



Implementing Batch Normalization with Keras
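A minimal Keras sketch of the idea, assuming a Fashion-MNIST-style classifier like the one in Chapter 10 (the layer sizes are my choice, not prescribed here); the book also discusses placing BN before the activation instead of after:

```python
from tensorflow import keras

# One BatchNormalization layer after the input and after each hidden layer
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])
```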



Gradient Clipping
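The two common flavors, sketched in plain Python (illustrative, not library code):

```python
def clip_by_value(gradients, clip_value=1.0):
    # Clip each gradient component into [-clip_value, clip_value]
    return [max(-clip_value, min(clip_value, g)) for g in gradients]

def clip_by_norm(gradients, clip_norm=1.0):
    # Rescale the whole gradient vector if its L2 norm exceeds clip_norm
    # (preserves the gradient's direction, unlike per-component clipping)
    norm = sum(g * g for g in gradients) ** 0.5
    if norm > clip_norm:
        return [g * clip_norm / norm for g in gradients]
    return list(gradients)

print(clip_by_value([0.5, -3.0, 2.0]))   # [0.5, -1.0, 1.0]
print(clip_by_norm([3.0, 4.0]))          # rescaled to [0.6, 0.8]
```

In Keras you get this by passing `clipvalue=1.0` or `clipnorm=1.0` to any optimizer, e.g. `keras.optimizers.SGD(clipvalue=1.0)`.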



Reusing Pretrained Layers



Transfer Learning with Keras
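A hedged Keras sketch of the workflow: reuse all but the output layer of a model trained on a related task, and freeze the reused layers at first (the file path `"my_model_A.h5"` and the binary output layer are hypothetical):

```python
from tensorflow import keras

# Load a model trained on a related task; reuse everything but its output
model_A = keras.models.load_model("my_model_A.h5")  # hypothetical path
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

# Freeze the reused layers for the first few epochs so the new output
# layer's large initial gradients don't wreck the pretrained weights
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# You must compile (again) after changing a layer's trainable attribute
model_B_on_A.compile(loss="binary_crossentropy", optimizer="sgd",
                     metrics=["accuracy"])
```

After a few epochs you can unfreeze the layers, lower the learning rate, recompile, and continue training.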



Unsupervised Pretraining



Pretraining on an Auxiliary Task



Faster Optimizers



Momentum Optimization
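The update rule is two lines; a toy sketch minimizing f(θ) = θ² (my example, not the book's):

```python
def momentum_step(theta, velocity, grad, lr=0.01, beta=0.9):
    # Momentum update: the velocity accumulates past gradients,
    #   m <- beta * m - lr * grad ;  theta <- theta + m
    velocity = beta * velocity - lr * grad
    return theta + velocity, velocity

# Minimize f(theta) = theta**2 (gradient 2*theta), starting from 5.0
theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = momentum_step(theta, m, 2.0 * theta)

print(theta)  # close to the minimum at 0
```

In Keras: `keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)`.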



Nesterov Accelerated Gradient
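The Nesterov variant differs in one place: the gradient is measured slightly ahead, at θ + βm, rather than at θ. A toy sketch on the same quadratic (my example):

```python
def nesterov_step(theta, velocity, grad_fn, lr=0.01, beta=0.9):
    # Nesterov accelerated gradient: evaluate the gradient at the
    # "look-ahead" point theta + beta * velocity
    velocity = beta * velocity - lr * grad_fn(theta + beta * velocity)
    return theta + velocity, velocity

grad = lambda t: 2.0 * t      # gradient of f(theta) = theta**2
theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = nesterov_step(theta, m, grad)
```

In Keras: `keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)`.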









Adam and Nadam Optimization
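Adam combines momentum (a decaying mean of gradients, m) with per-parameter scaling (a decaying mean of squared gradients, v), plus bias correction for the first steps. A scalar sketch (illustrative, not library code):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-7):
    # Decaying averages of the gradient and the squared gradient
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction: m and v start at 0, so early estimates are too small
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, starting from 5.0
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 10001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

Nadam is Adam plus the Nesterov look-ahead trick; in Keras both are one-liners: `keras.optimizers.Adam(learning_rate=0.001)` and `keras.optimizers.Nadam(learning_rate=0.001)`.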



Training Sparse Models



Learning Rate Scheduling
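Exponential scheduling, one of the strategies this section covers, is a one-line formula; a sketch with my choice of defaults (lr₀ = 0.01, drop by 10× every s = 20 epochs):

```python
def exponential_decay(lr0=0.01, s=20):
    # Returns a schedule that divides the learning rate by 10 every s epochs
    def schedule(epoch):
        return lr0 * 0.1 ** (epoch / s)
    return schedule

schedule = exponential_decay()
print(schedule(0))    # 0.01
print(schedule(20))   # 0.001
```

One way to wire this into Keras is `keras.callbacks.LearningRateScheduler(schedule)`, passed to `model.fit(..., callbacks=[...])`.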



Avoiding Overfitting Through Regularization



l1 and l2 Regularization
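The two penalties added to the loss, sketched in plain Python (scalar weight list, my example values):

```python
def l1_penalty(weights, alpha=0.01):
    # l1 penalizes absolute values; it drives weights to exactly zero
    # (useful for sparse models)
    return alpha * sum(abs(w) for w in weights)

def l2_penalty(weights, alpha=0.01):
    # l2 penalizes squared values; it shrinks weights toward zero
    # in proportion to their size
    return alpha * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]
penalty_l1 = l1_penalty(weights)   # alpha * (0.5 + 2.0 + 0.0 + 1.5)
penalty_l2 = l2_penalty(weights)   # alpha * (0.25 + 4.0 + 0.0 + 2.25)
```

In Keras you attach these per layer, e.g. `keras.layers.Dense(100, kernel_regularizer=keras.regularizers.l2(0.01))`.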






Monte Carlo (MC) Dropout
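The idea is to keep dropout active at inference and average many stochastic predictions. A toy stand-in (the "network" here is just the identity with inverted dropout on its input, which is my simplification; in Keras the real trick is calling the model with `training=True` and averaging):

```python
import random

def predict_stochastic(x, drop_rate=0.5, rng=random):
    # One forward pass with dropout left ON at inference time
    if rng.random() < drop_rate:
        return 0.0
    return x / (1.0 - drop_rate)   # inverted dropout keeps the mean unbiased

def mc_dropout_predict(x, samples=10000, drop_rate=0.5, seed=42):
    # Average many stochastic passes; the spread across samples also
    # gives a rough uncertainty estimate
    rng = random.Random(seed)
    preds = [predict_stochastic(x, drop_rate, rng) for _ in range(samples)]
    return sum(preds) / len(preds)

estimate = mc_dropout_predict(2.0)   # close to 2.0 on average
```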



Max-Norm Regularization
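The constraint itself is a simple projection applied after each training step; a sketch over one neuron's incoming weight vector (illustrative, not library code):

```python
def max_norm_constrain(weights, max_norm=1.0):
    # Rescale the incoming weight vector of a neuron so its L2 norm
    # never exceeds max_norm; leave it alone otherwise
    norm = sum(w * w for w in weights) ** 0.5
    if norm > max_norm:
        return [w * max_norm / norm for w in weights]
    return list(weights)

print(max_norm_constrain([3.0, 4.0]))   # norm 5 -> rescaled to [0.6, 0.8]
print(max_norm_constrain([0.3, 0.4]))   # norm 0.5 -> unchanged
```

In Keras: `keras.layers.Dense(100, kernel_constraint=keras.constraints.max_norm(1.))`.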



Summary and Practical Guidelines










[figure: style-transfer images, style=132, iterations 1, 20, and 500]