Visual Information Understanding Framework with Human Perception Model and Machine Learning

Project: Research


The human visual system has no difficulty performing perception tasks such as moving object detection and action or event recognition. These tasks have been active computer vision research topics over the past decades, and numerous deterministic systems using hand-crafted features have been proposed. Recently, automated visual perception systems have advanced rapidly through deep learning, resulting in many architectures, each often comprising a model designed for a specific application. For instance, a convolutional neural network (CNN) can be used to group image pixels into meaningful regions, and a two-stream network has been developed for human action recognition through parallel processing of spatial and temporal image data. The CNN is inspired by the connectivity of neurons in the human brain, with each neuron responding only to a small receptive field. The essence of the various CNN architectures is feature extraction, and some studies have demonstrated that features extracted by CNNs outperform hand-crafted features for recognition. However, while the goal of computer vision is for a machine to infer meaning from images or video, no part of a CNN can be considered to resemble human visual perception.
In the human eye, the retina contains photoreceptor cells, each of which senses light and translates optical information into neural impulses. The photoreceptor cells are not distributed uniformly. The fovea, located at the center of the retina, contains closely packed photoreceptor cells. Most of them are parvocellular cells (P-cells), which are sensitive to different wavelengths of light and create chromatic perception with the highest resolution. At the periphery of the visual field, the photoreceptor cells (mostly magnocellular cells, M-cells) are more sensitive to achromatic motion information. We propose a foveated model that mimics the human visual system.
It is a sequential process simulating the awareness of motion followed by the extraction of detailed information. Experiments will first be performed by applying the model to traditional computer vision methods. The concept of foveated vision will then be integrated with deep learning. A parallel inference framework can best simulate the simultaneous analysis of multi-modal visual signals in the brain. Through these experiments, we will demonstrate the improvement and applicability gained by embedding the foveated vision model in deep learning for computer vision research. Comparisons will be made with state-of-the-art methods on benchmark datasets. To investigate the correlation of the foveated model with subjective evaluation, specific quantitative measures will be proposed and verified.
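
The sequential idea described above (coarse motion awareness first, detailed extraction second) can be illustrated with a minimal sketch. This is not the project's model: plain frame differencing on a downsampled grayscale image stands in for the peripheral motion stage, a fixed-size full-resolution crop stands in for the foveal stage, and every function name and parameter (`downsample`, `locate_motion`, `foveate`, `factor`, `threshold`, `radius`) is a hypothetical chosen for illustration.

```python
def downsample(frame, factor):
    """Average-pool a 2D grayscale frame (list of lists) by `factor`."""
    h, w = len(frame) // factor, len(frame[0]) // factor
    return [
        [
            sum(frame[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w)
        ]
        for y in range(h)
    ]


def locate_motion(prev, curr, factor=4, threshold=10.0):
    """Peripheral stage (assumed): coarse, achromatic frame differencing.

    Returns the full-resolution (row, col) at the center of the coarse cell
    with the largest change above `threshold`, or None if nothing moved.
    """
    a, b = downsample(prev, factor), downsample(curr, factor)
    best, best_pos = threshold, None
    for y in range(len(a)):
        for x in range(len(a[0])):
            diff = abs(b[y][x] - a[y][x])
            if diff > best:
                best, best_pos = diff, (y, x)
    if best_pos is None:
        return None
    y, x = best_pos
    return (y * factor + factor // 2, x * factor + factor // 2)


def foveate(frame, center, radius=8):
    """Foveal stage (assumed): crop a full-resolution window around `center`,
    shifted as needed so that it stays inside the frame."""
    h, w = len(frame), len(frame[0])
    cy, cx = center
    top = max(0, min(cy - radius, h - 2 * radius))
    left = max(0, min(cx - radius, w - 2 * radius))
    return [row[left:left + 2 * radius] for row in frame[top:top + 2 * radius]]
```

In use, `locate_motion(prev, curr)` would run on every frame pair, and only when it returns a location would `foveate(curr, location)` hand a small high-resolution patch to a more expensive recognizer. The potential saving comes from analysing detail only where motion was noticed.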


Project number: 9042823
Grant type: GRF
Effective start/end date: 1/01/20 to 6/06/23