Spatial AI for Vehicle Perception in Dynamic Environments



1 / Description of the research topic and the related thesis topic:

Next-generation autonomous driving systems will face the technical challenge of using a large number and wide variety of sensors (camera, radar, lidar, ultrasonic) to discern the surrounding environment. Environment perception is one of the most critical problems in both autonomous driving and robotics, since it is the ability to understand the vehicle's surroundings in order to estimate risks and decide on future movements. In recent years, Spatial Artificial Intelligence based on machine learning has been at the heart of advances in spatio-temporal perception, allowing an end-to-end convolutional approach to perform perception tasks such as obstacle detection, tracking and prediction simultaneously.

The core subject of this PhD is to study and develop innovative concepts, algorithms and methods to enhance situational awareness of autonomous systems in dynamic environments using spatial AI. The objective will be to design an end-to-end approach such that:

  1. The input to the network will be the set of images from multiple 3D sensors observing the scene, represented as a 3D occupancy grid. Voxel-based representations provide an excellent framework for low-level sensor fusion. In its most generic form, a voxel could also store any information relevant to the application, such as occupancy, velocity, danger or reachability. Different sensor models can be specified to match the distinct characteristics of each sensor, facilitating efficient fusion in the grid(s).
  2. The output of the network will be the detection of moving and static obstacles/objects, their bounding boxes, their trajectories and a short-term prediction of their movement (tracklet).
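
As an illustration of low-level fusion in a voxel grid, the sketch below accumulates measurements from several sensors into a sparse 3D occupancy grid using the standard log-odds Bayesian update. It is a minimal sketch under stated assumptions, not the method to be developed in the thesis: the class name VoxelGrid, the table SENSOR_P_HIT and its probability values are hypothetical, and only "hit" points are handled (free space along the sensor ray is ignored).

```python
import math


def logit(p):
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


# Hypothetical inverse sensor models: probability that a voxel is occupied
# given that the sensor returned a point inside it. Values are illustrative.
SENSOR_P_HIT = {"lidar": 0.9, "radar": 0.7, "camera": 0.6}


class VoxelGrid:
    """Sparse 3D occupancy grid storing one log-odds value per observed voxel."""

    def __init__(self, voxel_size=0.5, prior=0.5):
        self.voxel_size = voxel_size          # edge length of a voxel, in metres
        self.prior_log_odds = logit(prior)    # belief for unobserved voxels
        self.log_odds = {}                    # (i, j, k) -> log-odds

    def index(self, point):
        """Map a metric (x, y, z) point to integer voxel coordinates."""
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def update(self, point, sensor):
        """Fuse one measured point from the named sensor into the grid."""
        idx = self.index(point)
        current = self.log_odds.get(idx, self.prior_log_odds)
        # Log-odds Bayesian update: add the measurement evidence, remove the prior.
        self.log_odds[idx] = current + logit(SENSOR_P_HIT[sensor]) - self.prior_log_odds
        return idx

    def probability(self, idx):
        """Return the occupancy probability of a voxel."""
        value = self.log_odds.get(idx, self.prior_log_odds)
        return 1.0 / (1.0 + math.exp(-value))
```

Calling update() with points from different sensors that fall in the same voxel raises its occupancy probability more than either measurement alone, which is the sense in which per-sensor models make fusion in the grid straightforward. Additional voxel attributes such as velocity or reachability could be stored alongside the log-odds value in the same dictionary.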

To this end, we aim to use RGB-D sensor consistency to train a convolutional neural network such that all sensor measurements (the occupancy grid) transform correctly to every other image with minimal error in a Bayesian sense (i.e. with uncertainty).

2 / Description of research activities:

The research axes of this thesis can be broken into the following main parts:

  • Low-level sensor fusion into 3D occupancy grids. Several 3D sensors will be considered, such as cameras, radar and lidar.
  • Training an end-to-end CNN using 3D occupancy grids as input. The output will be the dynamic object positions and short-term trajectories. Large training datasets will be acquired to train these models.
  • Extending the training to long-term situational awareness by considering the future interactions of the dynamic objects in the scene, for example road users interacting with each other and with the vehicles on the road.
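
To make the expected output concrete, the sketch below shows one possible container for a detected object and a constant-velocity extrapolation that could serve as a simple baseline for the short-term prediction (tracklet). This is an assumption made for illustration only: the names TrackedObject and constant_velocity_tracklet are hypothetical, positions are reduced to 2D, and the thesis targets a learned predictor rather than this baseline.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in metres, hypothetical convention


@dataclass
class TrackedObject:
    """Hypothetical per-object output: identity, motion flag, past and predicted path."""
    object_id: int
    is_moving: bool
    history: List[Point]                                  # past positions, oldest first
    tracklet: List[Point] = field(default_factory=list)   # predicted future positions


def constant_velocity_tracklet(history: List[Point], dt: float, horizon_steps: int) -> List[Point]:
    """Extrapolate future positions from the last two observations.

    dt is the time in seconds between consecutive history samples, and the
    prediction uses the same sampling period.
    """
    if not history:
        return []
    if len(history) < 2:
        # A single observation gives no velocity estimate: predict no motion.
        return [history[-1]] * horizon_steps
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return [(x1 + vx * dt * k, y1 + vy * dt * k) for k in range(1, horizon_steps + 1)]
```

A baseline of this kind ignores interactions between road users, which is precisely what the third research axis aims to model.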

This PhD will be supervised jointly by Adrian Voicila from RENAULT Software Labs and Andrew Comport from the Robot Vision team of the I3S-CNRS/UCA laboratory.

Experimental data collected by prototype cars, and/or synthetic data, will be used to validate the algorithms and methods developed during the thesis in real open-road scenarios.



The candidate should be motivated to carry out world-class research and should have a Master's degree in Computer Vision and/or Robotics, along with solid skills in mathematics, estimation theory, C/C++, Python, Linux, Git and OpenCV. Experience with PyTorch, TensorFlow or ROS is a plus.



Interested candidates must send a detailed CV, their Master's results and one or more letters of recommendation to the two thesis supervisors.



We aim to fill this position as soon as possible, with a start date at the beginning of October. Applications will be considered until a suitable candidate is found, and no later than the second week of August.


Funding and location:

The PhD will be funded by a CIFRE contract and hosted in Sophia Antipolis, France.