Computer vision is one of the most rapidly advancing fields in Artificial Intelligence and Machine Learning, and it is already in widespread use: from mass-scale factory applications like the automated sorting and scanning of eggs, to bleeding-edge applications like self-driving cars, to healthcare, where automated systems assist in diagnosing COVID and cancer from medical images.
However, this article is not about the advances in applications of Computer Vision. It is about the humble origins of Computer Vision and the inspiration behind it – provided by neurology.
P.S. This article is heavily influenced by an amazing book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.
In a previous article (Path to Superintelligence: Whole-Brain Emulation), I gave a general introduction to neurology-based machine learning structures called “Deep Neural Networks”, which are a subset of “Artificial Neural Networks”. In this article, let us explore “Convolutional Neural Networks” and how they mimic the activations between our eyes and our brains, or, more precisely, the architecture of the visual cortex.
[read more]
David H. Hubel and Torsten Wiesel performed a series of experiments on cats (and monkeys later on) in 1958. These experiments produced crucial insights into the structure of the visual cortex. In particular, they showed that many neurons in the visual cortex have a small “local receptive field”. This means that they react only to visual stimuli located in a limited region of the visual field!
For example, see this illustration:
Here we can see the local receptive fields of five neurons. The receptive fields are marked by the dashed circles on the image. The receptive fields may overlap. However, collectively they tile the whole visual field.
This was an important inspiration. Small receptive fields make it possible to divide and conquer: each neuron only has to make sense of its own small patch of the image. As a rule, computer scientists love dividing and conquering. It allows us to approach a big problem as a combination of multiple small, easily solvable problems.
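To make the idea of tiling receptive fields concrete, here is a minimal sketch in plain Python. It is only an illustration, not how the visual cortex or any particular library does it: the function names, the square 3x3 field, and the stride of 2 are all assumptions chosen for this example. Each "neuron" sits on a grid and sees one small window of a 7x7 "visual field"; neighboring windows overlap, yet together they cover every pixel.

```python
# Hypothetical helper names; the 3x3 field size and stride of 2 are
# illustrative assumptions, not values from the original experiments.

def receptive_field(row, col, size=3, stride=2):
    """Return the (top, left, bottom, right) window of the input seen by
    the neuron at grid position (row, col). Bottom/right are exclusive."""
    top = row * stride
    left = col * stride
    return top, left, top + size, left + size


def covered_pixels(height, width, size=3, stride=2):
    """Lay neurons out on a grid and collect every input pixel that
    falls inside at least one receptive field."""
    n_rows = (height - size) // stride + 1
    n_cols = (width - size) // stride + 1
    covered = set()
    for r in range(n_rows):
        for c in range(n_cols):
            top, left, bottom, right = receptive_field(r, c, size, stride)
            for y in range(top, bottom):
                for x in range(left, right):
                    covered.add((y, x))
    return n_rows, n_cols, covered


n_rows, n_cols, covered = covered_pixels(7, 7)
print(f"{n_rows} x {n_cols} neurons cover {len(covered)} of {7 * 7} pixels")
```

Because the stride (2) is smaller than the field size (3), adjacent fields share one row or column of pixels, which mirrors the overlapping dashed circles described above.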
There were more findings from this study:
- Some neurons only react to images of horizontal lines while others react exclusively to other orientations. In fact, two neurons could have the same visual receptive field, but react to different line orientations. Simply put – they process different visual signals in the same receptive field!
- Some neurons with larger receptive fields reacted only to more complex patterns that are combinations of the lower-level patterns. This is even more amazing. This means that the smaller solvable problems can simply be combined to “see” the whole image!
In other words, higher-level neurons are based on the outputs of neighboring lower-level neurons. The “neighboring” is important: a high-level neuron is not connected to all lower-level neurons, only to those in its vicinity. The result is a stack of layers, each detecting more complex patterns than the one below, that finally come together as a whole image!
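The two findings above can be sketched in a few lines of plain Python. Treat this as a toy under stated assumptions: the two 3x3 kernels are hand-written for illustration (in a real convolutional network the weights are learned from data), and the "plus sign" neuron with its `min` rule is a hypothetical way of combining lower-level outputs, invented here only to show the principle.

```python
# Two "low-level neurons" that share a receptive field but prefer
# different line orientations. Kernel values are hand-picked assumptions.
HORIZONTAL = [[-1, -1, -1],
              [ 2,  2,  2],
              [-1, -1, -1]]
VERTICAL = [[-1, 2, -1],
            [-1, 2, -1],
            [-1, 2, -1]]


def respond(patch, kernel):
    """Weighted sum over a 3x3 patch, clipped at zero (a ReLU)."""
    total = sum(patch[y][x] * kernel[y][x] for y in range(3) for x in range(3))
    return max(0, total)


def feature_map(image, kernel):
    """Slide the 3x3 kernel over the image, one neuron per position."""
    size = len(image) - 2
    return [[respond([row[x:x + 3] for row in image[y:y + 3]], kernel)
             for x in range(size)]
            for y in range(size)]


def plus_neuron(image):
    """Hypothetical higher-level neuron: it reads only the lower-level
    feature maps and fires when both orientations are present."""
    h_total = sum(map(sum, feature_map(image, HORIZONTAL)))
    v_total = sum(map(sum, feature_map(image, VERTICAL)))
    return min(h_total, v_total)


horizontal_line = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(respond(horizontal_line, HORIZONTAL))  # 6: this neuron fires
print(respond(horizontal_line, VERTICAL))    # 0: same field, no reaction

plus_sign = [[0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [1, 1, 1, 1, 1],
             [0, 0, 1, 0, 0],
             [0, 0, 1, 0, 0]]
print(plus_neuron(plus_sign))  # 12: both orientations detected
```

Note that `plus_neuron` never looks at the raw pixels. It sees the image only through the outputs of the lower-level neurons in its neighborhood, which is exactly the layered composition described above.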
These studies inspired the “neocognitron”, introduced in 1980, which gradually evolved into today’s “Convolutional Neural Networks”.
And this, my friends, is the inspiration behind creating deep learning networks that mimic the visual cortex and the neuron activations that go on behind “seeing”. And clearly, it has worked, for Computer Vision is ubiquitous now. The next time you unlock your iPhone with Face ID, think about how the iPhone’s camera “saw” you and “recognized” you. Be amazed!
In the next part of this blog, we will dive a bit deeper into how this is implemented, with some more insights into how the nuts and bolts of the visual cortex translate to software-based nuts and bolts![/read]