Supervision: Benjamin Ricaud

Project type: Master thesis

Finished

Thanks to deep learning, image classification and object detection have reached impressive accuracy in recent years. Researchers are now applying these methods to audio recordings to identify particular sounds within them.

The goal of the project is to reproduce state-of-the-art deep learning techniques for audio and to go beyond them. The student will use Google AudioSet to train the networks and test their performance. Among the deep neural network architectures, we will use deep autoencoders and convolutional neural networks (CNNs) combined with recurrent neural networks (RNNs) and an attention network.
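AudioSet labels describe whole clips rather than individual frames, so an attention network is often used to pool frame-level predictions into a clip-level prediction. The NumPy sketch below shows one common form of this, decision-level attention pooling. The frame features `h` stand in for the output of a CNN+RNN front end. The function names, the random weights, and the shapes (ten 128-dimensional frames, 527 classes to mirror the AudioSet label set) are illustrative assumptions, not the project's design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pooling(h, w_cls, b_cls, w_att, b_att):
    """Pool frame features h of shape (T, D) into clip-level probabilities (C,)."""
    frame_probs = sigmoid(h @ w_cls + b_cls)      # (T, C): presence probability per frame
    att = softmax(h @ w_att + b_att, axis=0)      # (T, C): per-class weights summing to 1 over time
    clip_probs = (att * frame_probs).sum(axis=0)  # (C,): attention-weighted average over frames
    return clip_probs, att

# Hypothetical sizes: T frames, D-dimensional features, C classes.
T, D, C = 10, 128, 527
rng = np.random.default_rng(0)
h = rng.standard_normal((T, D))  # placeholder for CNN+RNN frame outputs
w_cls, w_att = 0.01 * rng.standard_normal((2, D, C))  # untrained, random weights
b_cls, b_att = np.zeros(C), np.zeros(C)

clip_probs, att = attention_pooling(h, w_cls, b_cls, w_att, b_att)
print(clip_probs.shape)  # (527,)
```

Because the attention weights form a convex combination over frames, each clip probability stays between the smallest and largest frame probability for that class. In a trained model, these weights and the front end would typically be learned jointly with a binary cross-entropy loss over the clip labels.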

The tasks of the student will be the following:

  • understand the architecture of neural networks,
  • build the networks using Keras, TensorFlow or PyTorch,
  • preprocess the data, and train and test the networks,
  • try modifications to the networks suggested by the supervisors and assess the quality of the detection/classification.
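For the preprocessing step, audio networks usually consume log-mel spectrograms rather than raw waveforms. The NumPy sketch below computes one. The parameters (16 kHz audio, 25 ms Hann windows with a 10 ms hop, 64 mel bands, the HTK-style mel formula) and the function names are common choices assumed here for illustration, not settings prescribed by the project.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1), evenly spaced in mel."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=400, hop=160, n_mels=64):
    """Return a (n_frames, n_mels) log-mel spectrogram of a mono signal."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return np.log(mel + 1e-6)                           # small offset avoids log(0)

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(tone, sr=sr)
print(S.shape)  # (98, 64)
```

Each row of the result is one 10 ms step, so a spectrogram like this can be fed to a CNN as a single-channel image, with time along one axis and mel frequency along the other.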

The student should have a good command of programming, in particular in Python. Experience with deep learning and/or audio signal processing would be a plus. The student should be highly motivated and curious, with a strong interest in artificial intelligence and audio.

References

https://arxiv.org/abs/1804.04715

https://arxiv.org/abs/1711.00927

https://arxiv.org/abs/1703.08019