DEPTH ESTIMATION FROM A SINGLE IMAGE AND RELATED SYSTEM
Introduction
A method for estimating depth, optical flow, and other semantic information on low-power devices. Specifically, low-resolution images acquired from a single camera are processed by a lightweight and highly accurate self-supervised Convolutional Neural Network.
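For illustration only, the sketch below shows what a tiny encoder-decoder network of this kind could look like in PyTorch. The layer widths, the input resolution, and the name TinyDepthNet are assumptions made for the example, not the architecture of the invention.

```python
# A minimal sketch (not the patented network) of a tiny encoder-decoder CNN
# that maps a low-resolution grayscale frame to a coarse inverse-depth map.
# Layer sizes are illustrative: the model below has roughly 12k parameters,
# small enough to be plausible on a microcontroller after quantisation.
import torch
import torch.nn as nn


class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),  # normalised inverse depth
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


if __name__ == "__main__":
    net = TinyDepthNet()
    frame = torch.rand(1, 1, 48, 64)   # one low-resolution camera frame
    inv_depth = net(frame)             # coarse inverse-depth map, same spatial size
    print(inv_depth.shape, sum(p.numel() for p in net.parameters()), "parameters")
```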

Technical features
Estimating depth and optical flow from a scene is crucial in several computer vision applications. A recent trend aims to infer such cues from a single camera, simplifying the setup and enabling their use in application contexts characterised by severe cost and size constraints. To this end, the invention consists of a tiny Convolutional Neural Network capable of processing low-resolution images to obtain coarse semantic information about the observed scene. The network can run on off-the-shelf microcontroller units with minimal power requirements (a few hundred mW). Nevertheless, it is accurate enough to serve as the backbone of many high-level IoT applications, such as people tracking, simple traffic monitoring, and privacy-preserving monitoring systems. Moreover, the network is trained in a self-supervised manner and therefore does not require costly ground-truth annotations during training.
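As an illustration of the self-supervised idea, the sketch below shows a photometric-reconstruction objective commonly used to train single-camera depth networks: the predicted depth warps a neighbouring video frame into the current view, and the colour difference serves as the training signal. The photometric_loss helper, the availability of the camera intrinsics K and the relative pose T, and the plain L1 error are assumptions of this example, not details confirmed for the invention.

```python
# A minimal sketch of a view-synthesis (photometric) loss for self-supervised
# depth training; not necessarily the exact objective used by the invention.
import torch
import torch.nn.functional as F


def photometric_loss(depth, target, source, K, K_inv, T):
    """L1 error between `target` and `source` warped into the target view,
    given predicted depth (B,1,H,W), 3x3 intrinsics K, and a 4x4 pose T."""
    b, _, h, w = depth.shape
    dtype = depth.dtype
    # Homogeneous pixel grid of the target view, shape (3, H*W).
    ys, xs = torch.meshgrid(torch.arange(h, dtype=dtype),
                            torch.arange(w, dtype=dtype), indexing="ij")
    pix = torch.stack([xs.reshape(-1), ys.reshape(-1),
                       torch.ones(h * w, dtype=dtype)], dim=0)
    # Back-project pixels to 3D with the predicted depth, apply the rigid
    # motion T, and re-project the points into the source view.
    cam = (K_inv @ pix) * depth.reshape(b, 1, -1)   # (B,3,H*W)
    cam = T[:3, :3] @ cam + T[:3, 3:]
    proj = K @ cam
    u = proj[:, 0] / proj[:, 2].clamp(min=1e-6)
    v = proj[:, 1] / proj[:, 2].clamp(min=1e-6)
    # Normalise to [-1, 1] and sample the source frame at the warped locations.
    grid = torch.stack([2 * u / (w - 1) - 1,
                        2 * v / (h - 1) - 1], dim=-1).reshape(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```

In practice, self-supervised pipelines of this kind usually add an SSIM term, an edge-aware smoothness regulariser, and occlusion masking on top of the plain L1 error, and often estimate the pose T with a second small network.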
Possible Applications
- Proximity control, tracking and traffic monitoring systems;
- Privacy-preserving monitoring systems;
- Augmented and Virtual Reality.
Advantages
- Extraction of dense semantic information from a single image;
- Trained with self-supervised learning;
- Compatible with mobile battery-powered devices;
- Low cost.