AI_ML_DL’s diary


Chapter 14 Deep Computer Vision Using Convolutional Neural Networks


Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow, 2nd Edition, by A. Géron


In this chapter we will explore where CNNs came from, what their building blocks look like, and how to implement them using TensorFlow and Keras.

Then we will discuss some of the best CNN architectures, as well as other visual tasks, including object detection (classifying multiple objects in an image and placing bounding boxes around them) and semantic segmentation (classifying each pixel according to the class of the object it belongs to).


The Architecture of the Visual Cortex

David H. Hubel and Torsten Wiesel performed a series of experiments on cats in 1958 and 1959 (and a few years later on monkeys), giving crucial insights into the structure of the visual cortex (the authors received the Nobel Prize in Physiology or Medicine in 1981 for this work).

In particular, they showed that many neurons in the visual cortex have a small local receptive field, meaning they react only to visual stimuli located in a limited region of the visual field (see Figure 14-1, in which the local receptive fields of five neurons are represented by dashed circles).

The receptive fields of different neurons may overlap, and together they tile the whole visual field.


Moreover, the authors showed that some neurons react only to images of horizontal lines, while others react only to lines with different orientations (two neurons may have the same receptive field but react to different line orientations).

They also noticed that some neurons have larger receptive fields, and they react to more complex patterns that are combinations of the lower-level patterns.

These observations led to the idea that the higher-level neurons are based on the outputs of neighboring lower-level neurons (in Figure 14-1, notice that each neuron is connected only to a few neurons from the previous layer).

This powerful architecture is able to detect all sorts of complex patterns in any area of the visual field.
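To make the idea of local receptive fields and orientation-selective neurons more concrete, here is a minimal pure-Python sketch. It is not code from the book: the function name `convolve2d_valid`, the two 3x3 filters, and the tiny 5x5 test image are all hypothetical choices made for illustration. Each output value plays the role of one "neuron" that sees only a 3x3 patch of the input, and the two filters stand in for neurons tuned to horizontal and vertical lines.

```python
# Illustrative sketch only; all names and values here are assumptions,
# not taken from the book.

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding.

    As in most deep learning libraries, this is technically a
    cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The "neuron" at (i, j) sees only the kh x kw patch whose
            # top-left corner is (i, j): its local receptive field.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Hypothetical filters: one tuned to horizontal lines, one to vertical lines.
horizontal_filter = [
    [-1, -1, -1],
    [2, 2, 2],
    [-1, -1, -1],
]
vertical_filter = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

# A 5x5 image containing a single horizontal line in the middle row.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

h_response = convolve2d_valid(image, horizontal_filter)
v_response = convolve2d_valid(image, vertical_filter)

for row in h_response:
    print(row)
for row in v_response:
    print(row)
```

In this toy setup, the horizontal filter responds most strongly where the line passes through the center of a receptive field, while the vertical filter does not respond at all. Neighboring receptive fields overlap (stride 1), so together they cover the whole image, much like the tiling of the visual field described above.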


Figure 14-1.  Biological neurons in the visual cortex respond to specific patterns in small regions of the visual field called receptive fields; as the visual signal makes its way through consecutive brain modules, neurons respond to more complex patterns in larger receptive fields.


These studies of the visual cortex inspired the neocognitron, introduced in 1980, which gradually evolved into what we call convolutional neural networks.

(Kunihiko Fukushima, "Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Biological Cybernetics 36 (1980): 193-202.)


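The author, Kunihiko Fukushima, is alive and well; now past 80, he appears to be still actively pursuing his research.

His IEICE invited paper published in 2019, "Recent advances in the deep CNN neocognitron," reads like a culmination of his research to date, and it cites 13 of his own papers published over the 40 years from 1979 to 2018.


To close, here is the Conclusion of the invited paper, transcribed: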

This paper has discussed recent advances of the neocognitron and several networks extended from it.

The neocognitron is a network suggested from the biological brain.

The author feel that the deep learning is not the only way to realize networks like, or superior to, biological brain.

To make further advances in the research, it is important to learn from the biological brain.

There should be several algorithms that control the biological brain.

It is now important to find out these algorithms and apply them to the design of more advanced neural networks. 


Convolutional Layers






Stacking Multiple Feature Maps



TensorFlow Implementation



Memory Requirements



Pooling Layers



TensorFlow Implementation



CNN Architectures









Data Augmentation


















Implementing a ResNet-34 CNN Using Keras



Using Pretrained Models from Keras



Pretrained Models for Transfer Learning



Classification and Localization



Object Detection



Fully Convolutional Networks



You Only Look Once (YOLO)



Mean Average Precision (mAP)



Semantic Segmentation



TensorFlow Convolution Operations









[Figures: style=135 at iteration=1, iteration=20, and iteration=500]