Perception tasks for scene understanding in environments as complex as our streets are currently best solved using deep learning techniques. In particular, multi-task learning (MTL) has gained popularity, as several perception tasks (lane detection, street sign detection, semantic segmentation, …) can be trained on top of a shared backbone.
In this project, carried out as part of the course Deep Learning for Autonomous Driving, a classmate and I implemented a multi-task learning architecture for semantic segmentation and monocular depth estimation. The featured image above shows the validation set predictions of our final architecture.
We were given the Miniscapes dataset, which consists of images with per-pixel semantic labels for the classes an autonomous car might need to recognize (pedestrians, cars, cyclists, etc.), together with ground-truth depth values for the objects shown in these images. We implemented and explored various deep neural network architectures, e.g., DeepLabV3, a branched architecture, and task distillation. Our main contributions were introducing the Vortex Pooling layer into the branched architecture and mitigating the skewed distribution of the ground-truth depth values by applying a log transformation. In the end, we placed 5th in the class for the total score, 3rd in semantic segmentation, and 7th in depth estimation. For all the details, please refer to the project report below.
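To make the log transformation of the depth targets more concrete, here is a minimal sketch in plain Python. It is not the code from our project: the function names, the natural logarithm, the `eps` guard, the L1 loss, and the convention that a depth of zero marks an invalid pixel are all assumptions chosen for illustration.

```python
import math


def depth_to_log(depth_m, eps=1e-6):
    """Map a metric depth value to log space (eps is an assumed guard against log(0))."""
    return math.log(max(depth_m, eps))


def log_to_depth(log_depth):
    """Invert the transformation to recover a metric depth from a log-space prediction."""
    return math.exp(log_depth)


def log_l1_loss(pred_log, target_depths, invalid_depth=0.0):
    """Mean absolute error in log space.

    Pixels whose target depth is <= invalid_depth are skipped
    (hypothetical convention for missing ground truth).
    """
    errors = [
        abs(p - depth_to_log(t))
        for p, t in zip(pred_log, target_depths)
        if t > invalid_depth
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Depth values are heavily skewed towards the near range; in log space
# the ratio between two depths becomes a difference, so a far-away
# outlier no longer dominates the loss.
depths = [1.0, 2.0, 4.0, 100.0]
log_depths = [depth_to_log(d) for d in depths]
```

With this formulation the network regresses log-depth, and `log_to_depth` is applied to its output before computing metrics in metres.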
- C.-W. Xie, H.-Y. Zhou, and J. Wu, “Vortex Pooling: Improving Context Representation in Semantic Segmentation,” CoRR, vol. abs/1804.06242. 2018, [Online]. Available: http://arxiv.org/abs/1804.06242.
- D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, and L. Van Gool, “Fast Scene Understanding for Autonomous Driving,” CoRR, vol. abs/1708.02550. 2017, [Online]. Available: http://arxiv.org/abs/1708.02550.