PhD Position 2019

large scale model cut

URGENT : Real-time Visual Surface Reconstruction for Spatial AI Systems : with applications to robotics and augmented reality

Research team:

This PhD will be carried out jointly with the Robot Vision and Media Coding activities of the Signal, Image and Systems team from the I3S-CNRS/UCA laboratory. Both have an excellent research track record along with a close connection to innovation notably via the industrial transfer of their work for the creation of two successful startup companies, PIXMAP and CINTOO.

The Robot Vision (www.i3s.unice.fr/robotvision) activity is focused on the study of real-time visual localization and mapping applied mainly to robotics but also to augmented reality. The MediaCoding (www.i3s.unice.fr/me d iacoding) activity is focused on 3D modeling and compression with applications to 3D scanning and meshing for reconstruction and efficient streaming.

This project aims to build upon the complementary aspects of these related teams from the point of view of the 3D reconstruction of large scale environments while accounting for, incremental learning, efficiency, robustness and precision.

Context :

Deep learning is at the heart of recent advances in visual perception allowing large amounts of training data to be transformed into compact forms that can be reused online to provide prior knowledge. Real-time visual sensing algorithms form the basis for spatially intelligent systems such as augmented reality interfaces or autonomous robotics. This project is situated at the intersection between visual localization, mapping and life-long learning.

Color and depth hardware sensors have recently proven essential for spatially aware systems. This project therefore aims at directly exploiting RGB-D cameras in a sensor based approach for end-to-end training of deep CNN approaches combined with unsupervised 3D reconstruction systems.

The research objective is to propose and study new paradigms and concepts for real-time spatial intelligence by i) developing techniques to acquire a compact large-scale 3D environment representations and ii) by taking into account prior knowledge of the environment and using it to infer higher level semantics and occluded scene elements.

Tackling these objectives should enable applications such as real-time robot navigation or augmented reality.

PhD Subject :

Spatial intelligence is an area that deals with spatial judgment and the ability to visualize with the mind's eye. It enables solving the problems of navigation, visualization of objects from different angles and spaces or scene recognition, or to notice fine details. The aim of this PhD topic is therefore to tackle this challenging topic through the fundamentals of visual surface reconstruction whilst taking into account the added difficulty of real-time computational efficiency.

This research will be focused on the particular element of improving 3D surface reconstruction. Classically, visual localization and mapping solutions have focused on directly exploiting dense point clouds provided by RGB-D sensors [1]. On the other hand many geometric meshing strategies have been devised to address various criterion [2].

A fundamental objective of this thesis will aim at developing an efficient 3D map representation for large scale environments that takes into account incremental sensor pose uncertainty. This will involve taking a live stream of 3D point clouds from a RGB-D sensor undergoing movement and mapping this to efficient surface elements composed of triangulated meshes. This map representation should provide for incremental reconstruction, provide for multiple resolutions and be adapted to take into account prior knowledge. A more compact representation should allow for improved real-time performance.

Whilst 3D meshing is a widely studied problem, very few works have been proposed to exploit large-scale prior information about the scene within the 3D triangulation process. In this thesis, the central objective will be to exploit knowledge about parts of the scene to assist the mapping process. A state-of-the-art deep learning technique for performing semantic localization and mapping [3] will provide the basis for providing prior information about the scene in order to improve reconstruction precision, efficiency and robustness. The completion of hidden object parts [4] along with a generative approach for completing meshes from known objects [5] will be used to form the basis of a new approach to mapping that integrates prior knowledge of the environment and objects. For example, a robot navigating in an unknown indoor environment should be able to predict that the floor continues to extend behind occluding object in the foreground, etc.

Work plan :

The main goals of this thesis can be broken into the following main parts :

Real-time 3D mapping : developing techniques to acquire a compact large-scale 3D environment representation.
Spatial Learning : taking into account prior knowledge of the environment and using it to infer higher level semantics and occluded scene elements.
Online exploitation combining 3D mapping with a learnt spatial knowledge for large-scale environment mapping.

Prerequisites :

The candidate should be motivated to carry out world class research and should have a Master in Computer Vision, Computer Graphics or Robotics, along with solid skills in C/C++, Python, LINUX, Git and OpenCV. Experience with Pytorch, TensorFlow or ROS is of added value.

Contact :

Interested candidates must send a detailed CV, their Master's results and one or more letters of recommendation to This email address is being protected from spambots. You need JavaScript enabled to view it. and This email address is being protected from spambots. You need JavaScript enabled to view it.

Bibliography :

[1] On unifying key-frame and voxel-based dense visual SLAM at large scales, Maxime Meilland and Andrew I. Comport, International Conference on Intelligent Robots and Systems, 2013, Tokyo, Japan.

[2] Arnaud Bletterer, Frédéric Payan, Marc Antonini, Anis Meftah, Towards the reconstruction of cultural heritage sites : A local graph-based representation to resample gigantic acquisitions , EUROGRAPHICS Workshop on Graphics and Cultural Heritage (EG GCH), Vienna, Austria, 2018, November, 2018.

[3] Semantic-only Visual Odometry based on dense class-level segmentation, Howard Mahe, Denis Marraud and Andrew I. Comport, International Conference on Pattern Recognition (ICPR 2018), Aug 2018, Pékin, China.

[4] Dario Rethage, Federico Tombari, Felix Achilles and Nassir Navab, Deep Learned Full-3D Object Completion from Single View, arXiv, 2018.

[5] Angela Dai and Matthias Niessner, Scan2Mesh: From Unstructured Range Scans to 3D Meshes, Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2019.

Open Positions