AI_ML_DL’s diary

A diary on artificial intelligence, machine learning, and deep learning

Chapter 11 Training Deep Neural Networks

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron

 

In Chapter 10 we introduced artificial neural networks and trained our first deep neural networks.

But they were shallow nets, with just a few hidden layers.

What if you need to tackle a complex problem, such as detecting hundreds of types of objects in high-resolution images?

You may need to train a much deeper DNN, perhaps with 10 layers or many more, each containing hundreds of neurons, linked by hundreds of thousands of connections.

Training a deep DNN isn't a walk in the park.

Here are some of the problems you could run into:

・You may be faced with the tricky vanishing gradients problem or the related exploding gradients problem.  Gradients vanish when they grow smaller and smaller as they flow backward through the DNN during training, and explode when they instead grow larger and larger.  Both of these problems make lower layers very hard to train. # lower layers = layers close to the input

・You might not have enough training data for such a large network, or it might be too costly to label.

・Training may be extremely slow.

・A model with millions of parameters would severely risk overfitting the training set, especially if there are not enough training instances or if they are too noisy.

 

 

The Vanishing/Exploding Gradients Problems

 

 

Glorot and He Initialization
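
A minimal tf.keras sketch of He initialization (the layer sizes are illustrative, not the book's exact code):

from tensorflow import keras

# He initialization suits ReLU-family activations; Glorot (the Keras
# default) suits none/tanh/sigmoid/softmax.
dense = keras.layers.Dense(100, activation="relu",
                           kernel_initializer="he_normal")

# He initialization based on fan_avg rather than fan_in:
he_avg_init = keras.initializers.VarianceScaling(
    scale=2., mode="fan_avg", distribution="uniform")
dense_avg = keras.layers.Dense(100, activation="relu",
                               kernel_initializer=he_avg_init)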

 

 

Nonsaturating Activation Functions
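
A sketch of leaky ReLU and SELU in a small Sequential model (the architecture is illustrative):

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    # Leaky ReLU is added as its own layer after a linear Dense layer:
    keras.layers.Dense(100, kernel_initializer="he_normal"),
    keras.layers.LeakyReLU(alpha=0.2),
    # SELU self-normalizes in plain sequential dense nets, provided the
    # weights use LeCun initialization:
    keras.layers.Dense(100, activation="selu",
                       kernel_initializer="lecun_normal"),
    keras.layers.Dense(10, activation="softmax"),
])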

 

 

Batch Normalization

 

 

Implementing Batch Normalization with Keras
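
A sketch of Batch Normalization on a Fashion-MNIST-sized classifier, with a BN layer after the input and after each hidden layer:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(300, activation="elu",
                       kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(100, activation="elu",
                       kernel_initializer="he_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Dense(10, activation="softmax"),
])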

 

 

Gradient Clipping
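
In Keras, gradient clipping is just an optimizer argument; a minimal sketch:

from tensorflow import keras

# Clip every gradient component to [-1.0, 1.0]:
optimizer = keras.optimizers.SGD(clipvalue=1.0)

# Or clip the whole gradient vector to an L2 norm of at most 1.0,
# which preserves the gradient's direction:
optimizer = keras.optimizers.SGD(clipnorm=1.0)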

 

 

Reusing Pretrained Layers

 

 

Transfer Learning with Keras
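
A sketch of reusing all but the top layer of a trained model; "my_model_A.h5" is a hypothetical filename standing in for a previously saved model:

from tensorflow import keras

model_A = keras.models.load_model("my_model_A.h5")
# Note: these layers are shared with model_A; clone_model() plus
# set_weights() would give an independent copy instead.
model_B_on_A = keras.models.Sequential(model_A.layers[:-1])
model_B_on_A.add(keras.layers.Dense(1, activation="sigmoid"))

# Freeze the reused layers at first, so the new output layer's large
# initial gradients do not wreck the pretrained weights:
for layer in model_B_on_A.layers[:-1]:
    layer.trainable = False
model_B_on_A.compile(loss="binary_crossentropy", optimizer="sgd",
                     metrics=["accuracy"])
# Train a few epochs, then unfreeze, recompile with a lower learning
# rate, and continue training.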

 

 

Unsupervised Pretraining

 

 

Pretraining on an Auxiliary Task

 

 

Faster Optimizers

 

 

Momentum Optimization
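
Momentum optimization keeps a moving "velocity" of past gradients (m ← βm − η∇J(θ), then θ ← θ + m); in Keras it is a single argument:

from tensorflow import keras

# beta (momentum) = 0.9 is a good default:
optimizer = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9)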

 

 

Nesterov Accelerated Gradient
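
NAG measures the gradient slightly ahead in the direction of the momentum, at θ + βm, which usually converges a bit faster than plain momentum:

from tensorflow import keras

optimizer = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9,
                                 nesterov=True)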

 

 

AdaGrad
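
AdaGrad scales the learning rate down along steep dimensions, but it tends to stop too early when training deep nets; shown here for reference:

from tensorflow import keras

optimizer = keras.optimizers.Adagrad(learning_rate=0.001)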

 

 

RMSProp
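
RMSProp fixes AdaGrad's early-stopping tendency by accumulating only recent gradients, via an exponential decay rate rho:

from tensorflow import keras

optimizer = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)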

 

 

Adam and Nadam Optimization
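
Adam combines momentum (beta_1) with RMSProp-style scaling (beta_2); Nadam is Adam plus the Nesterov trick:

from tensorflow import keras

optimizer = keras.optimizers.Adam(learning_rate=0.001,
                                  beta_1=0.9, beta_2=0.999)
optimizer = keras.optimizers.Nadam(learning_rate=0.001,
                                   beta_1=0.9, beta_2=0.999)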

 

 

Training Sparse Models

 

 

Learning Rate Scheduling
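
A sketch of exponential decay via a Keras callback; the constants (eta0 = 0.01, s = 20) are illustrative:

from tensorflow import keras

def exponential_decay_fn(epoch):
    # The learning rate drops by a factor of 10 every 20 epochs.
    return 0.01 * 0.1 ** (epoch / 20)

lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)
# Pass callbacks=[lr_scheduler] to model.fit() to activate it.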

 

 

Avoiding Overfitting Through Regularization

 

 

l1 and l2 Regularization
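
l2 regularization is set per layer with kernel_regularizer (use keras.regularizers.l1() or l1_l2() for the l1 variants); functools.partial avoids repeating the same arguments on every layer:

from functools import partial
from tensorflow import keras

layer = keras.layers.Dense(100, activation="elu",
                           kernel_initializer="he_normal",
                           kernel_regularizer=keras.regularizers.l2(0.01))

# A reusable, pre-configured Dense layer:
RegularizedDense = partial(keras.layers.Dense,
                           activation="elu",
                           kernel_initializer="he_normal",
                           kernel_regularizer=keras.regularizers.l2(0.01))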

 

 

Dropout
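
A dropout sketch with rate=0.2 after the input and each hidden layer; the Dropout layers are only active during training:

from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dropout(rate=0.2),
    keras.layers.Dense(300, activation="elu",
                       kernel_initializer="he_normal"),
    keras.layers.Dropout(rate=0.2),
    keras.layers.Dense(100, activation="elu",
                       kernel_initializer="he_normal"),
    keras.layers.Dropout(rate=0.2),
    keras.layers.Dense(10, activation="softmax"),
])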

 

 

Monte Carlo (MC) Dropout
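
MC Dropout keeps dropout active at inference time (training=True) and averages many stochastic predictions; this sketch assumes a trained dropout model `model` and a test set `X_test`:

import numpy as np

# 100 stochastic forward passes, averaged into one prediction:
y_probas = np.stack([model(X_test, training=True)
                     for sample in range(100)])
y_proba = y_probas.mean(axis=0)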

 

 

Max-Norm Regularization
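
Max-norm regularization constrains each neuron's incoming weight vector; in Keras it is a kernel_constraint:

from tensorflow import keras

# Rescale the weights after each training step so that ||w||_2 <= 1:
layer = keras.layers.Dense(100, activation="elu",
                           kernel_initializer="he_normal",
                           kernel_constraint=keras.constraints.max_norm(1.))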

 

 

Summary and Practical Guidelines

 

 

Exercises

 

 

 

 

 

[image: style=132 iteration=1]

[image: style=132 iteration=20]

[image: style=132 iteration=500]