AI_ML_DL’s diary


Chapter 11 Training Deep Neural Networks


Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron


In Chapter 10 we introduced artificial neural networks and trained our first deep neural networks.

But they were shallow nets, with just a few hidden layers.

What if you need to tackle a complex problem, such as detecting hundreds of types of objects in high-resolution images?

You may need to train a much deeper DNN, perhaps with 10 layers or many more, each containing hundreds of neurons, linked by hundreds of thousands of connections.

Training a deep DNN isn't a walk in the park.

Here are some of the problems you could run into:

・You may be faced with the tricky vanishing gradients problem or the related exploding gradients problem.  This is when the gradients grow smaller and smaller (or larger and larger) when flowing backward through the DNN during training.  Both of these problems make lower layers very hard to train. # lower layers = layers close to the input

・You might not have enough training data for such a large network, or it might be too costly to label.

・Training may be extremely slow.

・A model with millions of parameters would severely risk overfitting the training set, especially if there are not enough training instances or if they are too noisy.
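To see why gradients can vanish, here is a tiny numeric sketch (mine, not the book's): the logistic sigmoid's derivative is at most 0.25, so backpropagating through ten layers multiplies ten such factors together, even in the best case.

```python
import math

def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# Best case: every unit sits at z = 0, where the derivative peaks at 0.25.
# Even then the gradient signal shrinks geometrically with depth.
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)

print(gradient)  # 0.25**10, about 9.5e-07
```

Almost no signal reaches the lowest layers, which is exactly why the chapter turns to better initialization and nonsaturating activations.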



The Vanishing/Exploding Gradients Problems



Glorot and He Initialization
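As a reminder of what these schemes compute, a small sketch (assumptions mine: scalar weights, plain Python RNG) of the Glorot and He standard deviations and He-normal sampling:

```python
import math
import random

def glorot_normal_std(fan_in, fan_out):
    # Glorot (Xavier) initialization: variance 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_normal_std(fan_in):
    # He initialization (for ReLU-family activations): variance 2 / fan_in
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, rng=random):
    # Sample a fan_in x fan_out weight matrix from N(0, he_std**2)
    std = he_normal_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

weights = init_weights(300, 100)
```

In Keras this is just `kernel_initializer="glorot_uniform"` (the default) or `kernel_initializer="he_normal"` on a `Dense` layer.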



Nonsaturating Activation Functions
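The alternatives this section covers can be written down directly; a minimal scalar sketch (not the book's code):

```python
import math

def leaky_relu(z, alpha=0.01):
    # A small slope alpha for z < 0 keeps the gradient from dying
    return z if z > 0 else alpha * z

def elu(z, alpha=1.0):
    # Smooth for z < 0; saturates to -alpha instead of cutting off at 0
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

print(leaky_relu(-10.0))  # about -0.1
print(elu(-10.0))         # about -0.99995
```

In Keras these are available as the `keras.layers.LeakyReLU` layer and `activation="elu"`.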



Batch Normalization
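The core of the technique is easy to state: standardize each batch, then let the layer learn a scale and an offset. A toy sketch over scalar inputs (my simplification; real BN works per feature and keeps running statistics for inference):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=0.001):
    # Normalize a batch to zero mean / unit variance, then scale by
    # gamma and shift by beta (both learned during training)
    mu = sum(batch) / len(batch)
    var = sum((x - mu) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```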



Implementing Batch Normalization with Keras
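A minimal Keras sketch of the idea, assuming a Fashion-MNIST-style classifier like the one in Chapter 10 (the layer sizes are my choice, not prescribed here); the book also discusses placing BN before the activation instead of after:

```python
from tensorflow import keras

# One BatchNormalization layer after the input and after each hidden layer
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])
```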



Gradient Clipping
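The two common flavors, sketched in plain Python (illustrative, not library code):

```python
def clip_by_value(gradients, clip_value=1.0):
    # Clip each gradient component into [-clip_value, clip_value]
    return [max(-clip_value, min(clip_value, g)) for g in gradients]

def clip_by_norm(gradients, clip_norm=1.0):
    # Rescale the whole gradient vector if its L2 norm exceeds clip_norm
    # (preserves the gradient's direction, unlike per-component clipping)
    norm = sum(g * g for g in gradients) ** 0.5
    if norm > clip_norm:
        return [g * clip_norm / norm for g in gradients]
    return list(gradients)

print(clip_by_value([0.5, -3.0, 2.0]))   # [0.5, -1.0, 1.0]
print(clip_by_norm([3.0, 4.0]))          # rescaled to [0.6, 0.8]
```

In Keras you get this by passing `clipvalue=1.0` or `clipnorm=1.0` to any optimizer, e.g. `keras.optimizers.SGD(clipvalue=1.0)`.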



Reusing Pretrained Layers



Transfer Learning with Keras
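A hedged Keras sketch of the workflow: reuse all but the output layer of a model trained on a related task, and freeze the reused layers at first (the file path `"my_model_A.h5"` and the binary output layer are hypothetical):

```python
from tensorflow import keras

# Load a model trained on a related task; reuse everything but its output
model_A = keras.models.load_model("my_model_A.h5")  # hypothetical path
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

# Freeze the reused layers for the first few epochs so the new output
# layer's large initial gradients don't wreck the pretrained weights
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# You must compile (again) after changing a layer's trainable attribute
model_B_on_A.compile(loss="binary_crossentropy", optimizer="sgd",
                     metrics=["accuracy"])
```

After a few epochs you can unfreeze the layers, lower the learning rate, recompile, and continue training.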



Unsupervised Pretraining



Pretraining on an Auxiliary Task



Faster Optimizers



Momentum Optimization
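The update rule is two lines; a toy sketch minimizing f(θ) = θ² (my example, not the book's):

```python
def momentum_step(theta, velocity, grad, lr=0.01, beta=0.9):
    # Momentum update: the velocity accumulates past gradients,
    #   m <- beta * m - lr * grad ;  theta <- theta + m
    velocity = beta * velocity - lr * grad
    return theta + velocity, velocity

# Minimize f(theta) = theta**2 (gradient 2*theta), starting from 5.0
theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = momentum_step(theta, m, 2.0 * theta)

print(theta)  # close to the minimum at 0
```

In Keras: `keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)`.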



Nesterov Accelerated Gradient
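The Nesterov variant differs in one place: the gradient is measured slightly ahead, at θ + βm, rather than at θ. A toy sketch on the same quadratic (my example):

```python
def nesterov_step(theta, velocity, grad_fn, lr=0.01, beta=0.9):
    # Nesterov accelerated gradient: evaluate the gradient at the
    # "look-ahead" point theta + beta * velocity
    velocity = beta * velocity - lr * grad_fn(theta + beta * velocity)
    return theta + velocity, velocity

grad = lambda t: 2.0 * t      # gradient of f(theta) = theta**2
theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = nesterov_step(theta, m, grad)
```

In Keras: `keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)`.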









Adam and Nadam Optimization
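Adam combines momentum (a decaying mean of gradients, m) with per-parameter scaling (a decaying mean of squared gradients, v), plus bias correction for the first steps. A scalar sketch (illustrative, not library code):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-7):
    # Decaying averages of the gradient and the squared gradient
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction: m and v start at 0, so early estimates are too small
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, starting from 5.0
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 10001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

Nadam is Adam plus the Nesterov look-ahead trick; in Keras both are one-liners: `keras.optimizers.Adam(learning_rate=0.001)` and `keras.optimizers.Nadam(learning_rate=0.001)`.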



Training Sparse Models



Learning Rate Scheduling
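Exponential scheduling, one of the strategies this section covers, is a one-line formula; a sketch with my choice of defaults (lr₀ = 0.01, drop by 10× every s = 20 epochs):

```python
def exponential_decay(lr0=0.01, s=20):
    # Returns a schedule that divides the learning rate by 10 every s epochs
    def schedule(epoch):
        return lr0 * 0.1 ** (epoch / s)
    return schedule

schedule = exponential_decay()
print(schedule(0))    # 0.01
print(schedule(20))   # 0.001
```

One way to wire this into Keras is `keras.callbacks.LearningRateScheduler(schedule)`, passed to `model.fit(..., callbacks=[...])`.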



Avoiding Overfitting Through Regularization



l1 and l2 Regularization
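The two penalties added to the loss, sketched in plain Python (scalar weight list, my example values):

```python
def l1_penalty(weights, alpha=0.01):
    # l1 penalizes absolute values; it drives weights to exactly zero
    # (useful for sparse models)
    return alpha * sum(abs(w) for w in weights)

def l2_penalty(weights, alpha=0.01):
    # l2 penalizes squared values; it shrinks weights toward zero
    # in proportion to their size
    return alpha * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]
penalty_l1 = l1_penalty(weights)   # alpha * (0.5 + 2.0 + 0.0 + 1.5)
penalty_l2 = l2_penalty(weights)   # alpha * (0.25 + 4.0 + 0.0 + 2.25)
```

In Keras you attach these per layer, e.g. `keras.layers.Dense(100, kernel_regularizer=keras.regularizers.l2(0.01))`.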






Monte Carlo (MC) Dropout
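The idea is to keep dropout active at inference and average many stochastic predictions. A toy stand-in (the "network" here is just the identity with inverted dropout on its input, which is my simplification; in Keras the real trick is calling the model with `training=True` and averaging):

```python
import random

def predict_stochastic(x, drop_rate=0.5, rng=random):
    # One forward pass with dropout left ON at inference time
    if rng.random() < drop_rate:
        return 0.0
    return x / (1.0 - drop_rate)   # inverted dropout keeps the mean unbiased

def mc_dropout_predict(x, samples=10000, drop_rate=0.5, seed=42):
    # Average many stochastic passes; the spread across samples also
    # gives a rough uncertainty estimate
    rng = random.Random(seed)
    preds = [predict_stochastic(x, drop_rate, rng) for _ in range(samples)]
    return sum(preds) / len(preds)

estimate = mc_dropout_predict(2.0)   # close to 2.0 on average
```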



Max-Norm Regularization
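The constraint itself is a simple projection applied after each training step; a sketch over one neuron's incoming weight vector (illustrative, not library code):

```python
def max_norm_constrain(weights, max_norm=1.0):
    # Rescale the incoming weight vector of a neuron so its L2 norm
    # never exceeds max_norm; leave it alone otherwise
    norm = sum(w * w for w in weights) ** 0.5
    if norm > max_norm:
        return [w * max_norm / norm for w in weights]
    return list(weights)

print(max_norm_constrain([3.0, 4.0]))   # norm 5 -> rescaled to [0.6, 0.8]
print(max_norm_constrain([0.3, 0.4]))   # norm 0.5 -> unchanged
```

In Keras: `keras.layers.Dense(100, kernel_constraint=keras.constraints.max_norm(1.))`.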



Summary and Practical Guidelines










[figure: style-transfer images, style=132, iterations 1, 20, and 500]