AI_ML_DL’s diary

A diary on artificial intelligence, machine learning, and deep learning

Chapter 14 Deep Computer Vision Using Convolutional Neural Networks


Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, by A. Géron

 

In this chapter we will explore where CNNs came from, what their building blocks look like, and how to implement them using TensorFlow and Keras.

Then we will discuss some of the best CNN architectures, as well as other visual tasks, including object detection (classifying multiple objects in an image and placing bounding boxes around them) and semantic segmentation (classifying each pixel according to the class of the object it belongs to).

 

The Architecture of the Visual Cortex

David H. Hubel and Torsten Wiesel performed a series of experiments on cats in 1958 and 1959 (and a few years later on monkeys), giving crucial insights into the structure of the visual cortex (the authors received the Nobel Prize in Physiology or Medicine in 1981 for this work).

In particular, they showed that many neurons in the visual cortex have a small local receptive field, meaning they react only to visual stimuli located in a limited region of the visual field (see Figure 14-1, in which the local receptive fields of five neurons are represented by dashed circles).

The receptive fields of different neurons may overlap, and together they tile the whole visual field.

 

Moreover, the authors showed that some neurons react only to images of horizontal lines, while others react only to lines with different orientations (two neurons may have the same receptive field but react to different line orientations).

They also noticed that some neurons have larger receptive fields, and they react to more complex patterns that are combinations of the lower-level patterns.

These observations led to the idea that the higher-level neurons are based on the outputs of neighboring lower-level neurons (in Figure 14-1, notice that each neuron is connected only to a few neurons from the previous layer).

This powerful architecture is able to detect all sorts of complex patterns in any area of the visual field.

 

Figure 14-1.  Biological neurons in the visual cortex respond to specific patterns in small regions of the visual field called receptive fields; as the visual signal makes its way through consecutive brain modules, neurons respond to more complex patterns in larger receptive fields.

 

These studies of the visual cortex inspired the neocognitron, introduced in 1980, which gradually evolved into what we call convolutional neural networks.

(Kunihiko Fukushima, "Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Biological Cybernetics 36 (1980): 193-202.)

<Addendum>

The author, Kunihiko Fukushima, is still in good health and, now past the age of 80, appears to be continuing his research energetically.

His IEICE Invited Paper "Recent advances in the deep CNN neocognitron," published in 2019, reads like a culmination of his research to date: it cites 13 of his own papers published over the 40 years from 1979 to 2018.

The neocognitron grew out of the Hubel and Wiesel studies of vision described above, and throughout his career Fukushima appears to have consistently pursued the mechanisms of the human brain.

Finally, I transcribe the Conclusion of the Invited Paper below.

This paper has discussed recent advances of the neocognitron and several networks extended from it.

The neocognitron is a network suggested from the biological brain.

The author feel that the deep learning is not the only way to realize networks like, or superior to, biological brain.

To make further advances in the research, it is important to learn from the biological brain.

There should be several algorithms that control the biological brain.

It is now important to find out these algorithms and apply them to the design of more advanced neural networks. 

  

Convolutional Layers

 

 

Filters
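This section is still empty in my notes. As a placeholder, here is a minimal NumPy sketch (my own, not the book's code) of the cross-correlation a convolutional layer actually computes, using a handcrafted vertical-line filter:

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (what a conv layer computes)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 vertical-line filter: it responds strongly wherever the
# image contains a bright vertical stripe.
vertical_filter = np.array([[0., 1., 0.],
                            [0., 1., 0.],
                            [0., 1., 0.]])

image = np.zeros((5, 5))
image[:, 2] = 1.0  # one bright vertical line down the middle
fmap = cross_correlate2d(image, vertical_filter)
print(fmap)  # nonzero only in the column over the line
```

The feature map lights up (value 3) only where the filter's window is centered on the vertical line, and stays 0 elsewhere.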

 

 

Stacking Multiple Feature Maps

 

 

TensorFlow Implementation
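Nothing written here yet; for reference, a minimal tf.keras sketch (my own, assuming the standard Conv2D API) of a convolutional layer in action:

```python
import tensorflow as tf
from tensorflow import keras

# 32 filters of size 3x3, stride 1, "same" padding (zero padding so
# the output keeps the input's spatial size), ReLU activation.
conv = keras.layers.Conv2D(filters=32, kernel_size=3, strides=1,
                           padding="same", activation="relu")

# A fake batch of two 28x28 grayscale images.
images = tf.random.uniform([2, 28, 28, 1])
feature_maps = conv(images)
print(feature_maps.shape)  # (2, 28, 28, 32): one feature map per filter
```

Note that the layer's kernels are created lazily on the first call, once the input depth (here 1) is known.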

 

 

Memory Requirements

 

 

Pooling Layers

 

 

TensorFlow Implementation
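Also still a stub; a minimal sketch (my own) of max pooling in tf.keras:

```python
import tensorflow as tf
from tensorflow import keras

# 2x2 max pooling; the stride defaults to the pool size, so the
# spatial dimensions are halved while the depth is unchanged.
max_pool = keras.layers.MaxPool2D(pool_size=2)

feature_maps = tf.random.uniform([2, 28, 28, 32])
pooled = max_pool(feature_maps)
print(pooled.shape)  # (2, 14, 14, 32)
```

A pooling layer has no weights at all; it only aggregates, which is why it adds no parameters to the model.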

 

 

CNN Architectures

 

 

LeNet-5

 

 

AlexNet

 

 

Data Augmentation
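No notes here yet; as a placeholder, a minimal sketch (my own, not the book's code) of one common augmentation, a horizontal flip with tf.image:

```python
import numpy as np
import tensorflow as tf

# A fake 64x64 RGB image.
image = tf.random.uniform([64, 64, 3])

# Deterministic horizontal flip; for training you would typically use
# tf.image.random_flip_left_right(image), which flips with probability 0.5.
flipped = tf.image.flip_left_right(image)

# The flip simply reverses the width axis.
assert np.allclose(flipped.numpy(), image.numpy()[:, ::-1, :])
```

Shifts, rotations, and brightness/contrast changes are applied the same way, always at training time only, so the model sees a slightly different variant of each image at every epoch.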

 

 

GoogLeNet

 

 

VGGNet

 

 

ResNet

 

 

Xception

 

 

SENet

 

 

Implementing a ResNet-34 CNN Using Keras
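Still to be filled in; here is a sketch of the residual unit at the heart of ResNet-34, written with the Keras subclassing style the book uses (the specific layer options are my own choices, not a transcription of the book's code):

```python
import tensorflow as tf
from tensorflow import keras

class ResidualUnit(keras.layers.Layer):
    """Two 3x3 conv layers plus a skip connection; the skip path gets a
    1x1 conv when strides > 1, so the two branches keep matching shapes."""
    def __init__(self, filters, strides=1, **kwargs):
        super().__init__(**kwargs)
        self.main_layers = [
            keras.layers.Conv2D(filters, 3, strides=strides,
                                padding="same", use_bias=False),
            keras.layers.BatchNormalization(),
            keras.layers.Activation("relu"),
            keras.layers.Conv2D(filters, 3, strides=1,
                                padding="same", use_bias=False),
            keras.layers.BatchNormalization()]
        self.skip_layers = []
        if strides > 1:
            self.skip_layers = [
                keras.layers.Conv2D(filters, 1, strides=strides,
                                    padding="same", use_bias=False),
                keras.layers.BatchNormalization()]

    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return tf.nn.relu(Z + skip_Z)

# strides=2 halves the spatial size; the 1x1 skip conv keeps shapes aligned.
unit = ResidualUnit(64, strides=2)
out = unit(tf.random.uniform([1, 32, 32, 64]))
print(out.shape)  # (1, 16, 16, 64)
```

ResNet-34 is then essentially a stem (7x7 conv + max pool) followed by a stack of these units (3, 4, 6, and 3 units with 64, 128, 256, and 512 filters), global average pooling, and a dense softmax layer.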

 

 

Using Pretrained Models from Keras
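Another stub; for reference, a minimal sketch of loading a model from keras.applications (I pass weights=None here so nothing is downloaded; weights="imagenet" would fetch the pretrained ImageNet weights instead):

```python
from tensorflow import keras

# Builds the full ResNet-50 architecture with random weights.
# keras.applications.ResNet50(weights="imagenet") would instead
# download and load the pretrained ImageNet weights.
model = keras.applications.ResNet50(weights=None)
print(model.output_shape)  # (None, 1000): one probability per ImageNet class
```

Each model in keras.applications comes with a matching preprocess_input function (e.g. keras.applications.resnet50.preprocess_input) that must be applied to images before inference.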

 

 

Pretrained Models for Transfer Learning

 

 

Classification and Localization

 

 

Object Detection

 

 

Fully Convolutional Networks

 

 

You Only Look Once (YOLO)

 

 

Mean Average Precision (mAP)

 

 

Semantic Segmentation

 

 

TensorFlow Convolution Operations

 

 

Exercises

  

 

 

 

f:id:AI_ML_DL:20200520091349p:plain

style=135 iteration=1

 

f:id:AI_ML_DL:20200520091252p:plain

style=135 iteration=20

 

f:id:AI_ML_DL:20200520091154p:plain

style=135 iteration=500