AI_ML_DL’s diary


Chapter 11 Training Deep Neural Networks


Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron


In Chapter 10 we introduced artificial neural networks and trained our first deep neural networks.

But they were shallow nets, with just a few hidden layers.

What if you need to tackle a complex problem, such as detecting hundreds of types of objects in high-resolution images?

You may need to train a much deeper DNN, perhaps with 10 layers or many more, each containing hundreds of neurons, linked by hundreds of thousands of connections.

Training a deep DNN isn't a walk in the park.

Here are some of the problems you could run into:

・You may be faced with the tricky vanishing gradients problem or the related exploding gradients problem.  This is when the gradients grow smaller and smaller (or larger and larger) when flowing backward through the DNN during training.  Both of these problems make lower layers very hard to train. # lower layers = layers close to the input

・You might not have enough training data for such a large network, or it might be too costly to label.

・Training may be extremely slow.

・A model with millions of parameters would severely risk overfitting the training set, especially if there are not enough training instances or if they are too noisy.
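To see why gradients can vanish, here is a tiny numeric sketch (mine, not the book's): the logistic sigmoid's derivative is at most 0.25, so backpropagating through ten layers multiplies ten such factors together, even in the best case.

```python
import math

def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# Best case: every unit sits at z = 0, where the derivative peaks at 0.25.
# Even then the gradient signal shrinks geometrically with depth.
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)

print(gradient)  # 0.25**10, about 9.5e-07
```

Almost no signal reaches the lowest layers, which is exactly why the chapter turns to better initialization and nonsaturating activations.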



The Vanishing/Exploding Gradients Problems



Glorot and He Initialization
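As a reminder of what these schemes compute, a small sketch (assumptions mine: scalar weights, plain Python RNG) of the Glorot and He standard deviations and He-normal sampling:

```python
import math
import random

def glorot_normal_std(fan_in, fan_out):
    # Glorot (Xavier) initialization: variance 2 / (fan_in + fan_out)
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_normal_std(fan_in):
    # He initialization (for ReLU-family activations): variance 2 / fan_in
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, rng=random):
    # Sample a fan_in x fan_out weight matrix from N(0, he_std**2)
    std = he_normal_std(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

weights = init_weights(300, 100)
```

In Keras this is just `kernel_initializer="glorot_uniform"` (the default) or `kernel_initializer="he_normal"` on a `Dense` layer.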



Nonsaturating Activation Functions
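The alternatives this section covers can be written down directly; a minimal scalar sketch (not the book's code):

```python
import math

def leaky_relu(z, alpha=0.01):
    # A small slope alpha for z < 0 keeps the gradient from dying
    return z if z > 0 else alpha * z

def elu(z, alpha=1.0):
    # Smooth for z < 0; saturates to -alpha instead of cutting off at 0
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

print(leaky_relu(-10.0))  # about -0.1
print(elu(-10.0))         # about -0.99995
```

In Keras these are available as the `keras.layers.LeakyReLU` layer and `activation="elu"`.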



Batch Normalization
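The core of the technique is easy to state: standardize each batch, then let the layer learn a scale and an offset. A toy sketch over scalar inputs (my simplification; real BN works per feature and keeps running statistics for inference):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=0.001):
    # Normalize a batch to zero mean / unit variance, then scale by
    # gamma and shift by beta (both learned during training)
    mu = sum(batch) / len(batch)
    var = sum((x - mu) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in batch]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```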



Implementing Batch Normalization with Keras
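A minimal Keras sketch of the idea, assuming a Fashion-MNIST-style classifier like the one in Chapter 10 (the layer sizes are my choice, not prescribed here); the book also discusses placing BN before the activation instead of after:

```python
from tensorflow import keras

# One BatchNormalization layer after the input and after each hidden layer
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="elu", kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])
```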



Gradient Clipping
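The two common flavors, sketched in plain Python (illustrative, not library code):

```python
def clip_by_value(gradients, clip_value=1.0):
    # Clip each gradient component into [-clip_value, clip_value]
    return [max(-clip_value, min(clip_value, g)) for g in gradients]

def clip_by_norm(gradients, clip_norm=1.0):
    # Rescale the whole gradient vector if its L2 norm exceeds clip_norm
    # (preserves the gradient's direction, unlike per-component clipping)
    norm = sum(g * g for g in gradients) ** 0.5
    if norm > clip_norm:
        return [g * clip_norm / norm for g in gradients]
    return list(gradients)

print(clip_by_value([0.5, -3.0, 2.0]))   # [0.5, -1.0, 1.0]
print(clip_by_norm([3.0, 4.0]))          # rescaled to [0.6, 0.8]
```

In Keras you get this by passing `clipvalue=1.0` or `clipnorm=1.0` to any optimizer, e.g. `keras.optimizers.SGD(clipvalue=1.0)`.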



Reusing Pretrained Layers



Transfer Learning with Keras
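A hedged Keras sketch of the workflow: reuse all but the output layer of a model trained on a related task, and freeze the reused layers at first (the file path `"my_model_A.h5"` and the binary output layer are hypothetical):

```python
from tensorflow import keras

# Load a model trained on a related task; reuse everything but its output
model_A = keras.models.load_model("my_model_A.h5")  # hypothetical path
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

# Freeze the reused layers for the first few epochs so the new output
# layer's large initial gradients don't wreck the pretrained weights
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False

# You must compile (again) after changing a layer's trainable attribute
model_B_on_A.compile(loss="binary_crossentropy", optimizer="sgd",
                     metrics=["accuracy"])
```

After a few epochs you can unfreeze the layers, lower the learning rate, recompile, and continue training.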



Unsupervised Pretraining



Pretraining on an Auxiliary Task



Faster Optimizers



Momentum Optimization
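The update rule is two lines; a toy sketch minimizing f(θ) = θ² (my example, not the book's):

```python
def momentum_step(theta, velocity, grad, lr=0.01, beta=0.9):
    # Momentum update: the velocity accumulates past gradients,
    #   m <- beta * m - lr * grad ;  theta <- theta + m
    velocity = beta * velocity - lr * grad
    return theta + velocity, velocity

# Minimize f(theta) = theta**2 (gradient 2*theta), starting from 5.0
theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = momentum_step(theta, m, 2.0 * theta)

print(theta)  # close to the minimum at 0
```

In Keras: `keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)`.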



Nesterov Accelerated Gradient
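The Nesterov variant differs in one place: the gradient is measured slightly ahead, at θ + βm, rather than at θ. A toy sketch on the same quadratic (my example):

```python
def nesterov_step(theta, velocity, grad_fn, lr=0.01, beta=0.9):
    # Nesterov accelerated gradient: evaluate the gradient at the
    # "look-ahead" point theta + beta * velocity
    velocity = beta * velocity - lr * grad_fn(theta + beta * velocity)
    return theta + velocity, velocity

grad = lambda t: 2.0 * t      # gradient of f(theta) = theta**2
theta, m = 5.0, 0.0
for _ in range(200):
    theta, m = nesterov_step(theta, m, grad)
```

In Keras: `keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)`.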









Adam and Nadam Optimization
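Adam combines momentum (a decaying mean of gradients, m) with per-parameter scaling (a decaying mean of squared gradients, v), plus bias correction for the first steps. A scalar sketch (illustrative, not library code):

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001,
              beta1=0.9, beta2=0.999, eps=1e-7):
    # Decaying averages of the gradient and the squared gradient
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction: m and v start at 0, so early estimates are too small
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = theta**2, starting from 5.0
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 10001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

Nadam is Adam plus the Nesterov look-ahead trick; in Keras both are one-liners: `keras.optimizers.Adam(learning_rate=0.001)` and `keras.optimizers.Nadam(learning_rate=0.001)`.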



Training Sparse Models



Learning Rate Scheduling
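Exponential scheduling, one of the strategies this section covers, is a one-line formula; a sketch with my choice of defaults (lr₀ = 0.01, drop by 10× every s = 20 epochs):

```python
def exponential_decay(lr0=0.01, s=20):
    # Returns a schedule that divides the learning rate by 10 every s epochs
    def schedule(epoch):
        return lr0 * 0.1 ** (epoch / s)
    return schedule

schedule = exponential_decay()
print(schedule(0))    # 0.01
print(schedule(20))   # 0.001
```

One way to wire this into Keras is `keras.callbacks.LearningRateScheduler(schedule)`, passed to `model.fit(..., callbacks=[...])`.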



Avoiding Overfitting Through Regularization



l1 and l2 Regularization
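The two penalties added to the loss, sketched in plain Python (scalar weight list, my example values):

```python
def l1_penalty(weights, alpha=0.01):
    # l1 penalizes absolute values; it drives weights to exactly zero
    # (useful for sparse models)
    return alpha * sum(abs(w) for w in weights)

def l2_penalty(weights, alpha=0.01):
    # l2 penalizes squared values; it shrinks weights toward zero
    # in proportion to their size
    return alpha * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]
penalty_l1 = l1_penalty(weights)   # alpha * (0.5 + 2.0 + 0.0 + 1.5)
penalty_l2 = l2_penalty(weights)   # alpha * (0.25 + 4.0 + 0.0 + 2.25)
```

In Keras you attach these per layer, e.g. `keras.layers.Dense(100, kernel_regularizer=keras.regularizers.l2(0.01))`.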






Monte Carlo (MC) Dropout
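The idea is to keep dropout active at inference and average many stochastic predictions. A toy stand-in (the "network" here is just the identity with inverted dropout on its input, which is my simplification; in Keras the real trick is calling the model with `training=True` and averaging):

```python
import random

def predict_stochastic(x, drop_rate=0.5, rng=random):
    # One forward pass with dropout left ON at inference time
    if rng.random() < drop_rate:
        return 0.0
    return x / (1.0 - drop_rate)   # inverted dropout keeps the mean unbiased

def mc_dropout_predict(x, samples=10000, drop_rate=0.5, seed=42):
    # Average many stochastic passes; the spread across samples also
    # gives a rough uncertainty estimate
    rng = random.Random(seed)
    preds = [predict_stochastic(x, drop_rate, rng) for _ in range(samples)]
    return sum(preds) / len(preds)

estimate = mc_dropout_predict(2.0)   # close to 2.0 on average
```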



Max-Norm Regularization
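The constraint itself is a simple projection applied after each training step; a sketch over one neuron's incoming weight vector (illustrative, not library code):

```python
def max_norm_constrain(weights, max_norm=1.0):
    # Rescale the incoming weight vector of a neuron so its L2 norm
    # never exceeds max_norm; leave it alone otherwise
    norm = sum(w * w for w in weights) ** 0.5
    if norm > max_norm:
        return [w * max_norm / norm for w in weights]
    return list(weights)

print(max_norm_constrain([3.0, 4.0]))   # norm 5 -> rescaled to [0.6, 0.8]
print(max_norm_constrain([0.3, 0.4]))   # norm 0.5 -> unchanged
```

In Keras: `keras.layers.Dense(100, kernel_constraint=keras.constraints.max_norm(1.))`.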



Summary and Practical Guidelines










[figure: style-transfer images, style=132, iterations 1, 20, and 500]