Supervision: Konstantinos Pitas
Project type:
Semester project (master)
Master thesis
Finished
Project Description
Deriving uncertainty estimates for feedforward DNN predictions is critical in a number of tasks. For example, you might want to order a "bicycle spare part" by voice through Alexa. You'd want the deep neural network transcribing your speech to have low confidence if it mishears "heart", "dart", or "fart", and to ask you again what you'd really like to order!
Unfortunately, the outputs of the softmax layer cannot be interpreted in a principled way as a probability distribution. Instead, modelling the DNN weights themselves as coming from a probability distribution and taking a Bayesian view of predictions is much better grounded in theory, and results in better uncertainty estimates in practice.
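As a rough sketch of what the Bayesian view buys you, consider a toy one-layer "network" whose weight matrix is assumed (purely for illustration) to have a Gaussian posterior: instead of one softmax output, you average predictions over many sampled weight settings, and the spread across samples gives an uncertainty signal. All the names and numbers below are made up for the example.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Toy 2-class model: logits = x @ W. Assume an (illustrative) Gaussian
# posterior over W with the mean and scale below.
x = np.array([1.0, -0.5])
W_mean = np.array([[2.0, -1.0],
                   [0.5,  1.5]])
W_std = 0.5

# Monte Carlo predictive posterior: sample weights, run the model, average.
n_samples = 1000
probs = np.stack([
    softmax(x @ (W_mean + W_std * rng.standard_normal(W_mean.shape)))
    for _ in range(n_samples)
])
pred = probs.mean(axis=0)    # Bayesian model average of class probabilities
uncert = probs.std(axis=0)   # disagreement across weight samples
```

A single softmax pass would give one confident-looking vector; here `uncert` grows exactly when plausible weight settings disagree about the label, which is the behaviour you want from an uncertainty estimate.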
Assuming that the weights follow a probability distribution has shortcomings of its own, however. While in low dimensions (networks with few weights) probability distributions follow common intuition, in high dimensions things start to get weird... For example, a high-dimensional Gaussian concentrates its mass on a thin hypersphere far away from the mean, kind of like a soap bubble.
This distorts uncertainty estimates in the kind of deep neural networks with millions of parameters that are most useful in real applications [1].
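The soap-bubble effect is easy to check numerically: the norm of a standard Gaussian sample in dimension d concentrates around sqrt(d), with a relative spread that shrinks as d grows. A minimal demonstration (dimensions and sample counts chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw standard Gaussian samples in increasing dimension and look at
# where their mass sits: the norms concentrate around sqrt(d), i.e. on
# a thin shell far from the mean (which is the origin).
for d in (2, 100, 10_000):
    x = rng.standard_normal((1000, d))
    norms = np.linalg.norm(x, axis=1)
    print(f"d={d:6d}  mean norm / sqrt(d) = {norms.mean() / np.sqrt(d):.3f}"
          f"  relative spread = {norms.std() / norms.mean():.3f}")
```

At d = 10,000 essentially no sample lands near the mean, even though the mean is the single most likely point: almost all the probability mass lives on the shell.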
Project Goals
In this project the student will implement techniques aimed at scaling existing Bayesian approximate inference methods to realistic architectures and datasets.
Prerequisites
The student must be highly motivated and independent, with good knowledge of TensorFlow/Keras or PyTorch, and be able to implement and modify large DNN architectures such as VGG-16 and ResNet-56.
The project is roughly 20% theory and 80% practice, and touches on a variety of problems at the cutting edge of DNN research.
This is a master or semester project.
Contact
Contact me by email at konstantinos.pitas@epfl.ch or stop by ELE 227 for a quick discussion.
[1] Radial Bayesian Neural Networks: Beyond Discrete Support In Large-Scale Bayesian Deep Learning https://arxiv.org/pdf/1907.00865.pdf