Meeting 8 (04.05)

Goal: current implementation state (auto-encoder, classification) and preliminary classification results. Discuss music conferences and schedule.

Related notebooks: audio_classification, auto_encoder, audio_features.


  • Complete the description of the pre-processing steps in the paper.
  • Store data in sklearn format, i.e. \(R^{N \times n}\), which is the transpose of our math notation. Redo the pre-processing on the entire dataset.
  • Implementation of post-processing, i.e. classification: feature aggregation, feature vector visualization, feature rescaling, label generation, linear SVM, majority voting, random train/test splitting, cross-validation.
  • Classification results: see the audio_classification notebook for results on accuracy and speed for 2, 4 or 10 genres using spectrograms or raw audio, various classifier implementations and diverse scaling methods.
  • See also the exact methodology and various observations and ideas.
  • Implementation of the auto-encoder as a sklearn Estimator class. This eases the integration with scikit-learn and enables its use in a sklearn Pipeline, which avoids transductive learning. See the auto_encoders notebook for the implementation and comparison_xavier for a usage example.
  • Performance boost of a factor 10 by using ATLAS or OpenBLAS instead of numpy’s own BLAS implementation. Half of the execution time is now spent in matrix multiplications; the limiting factor for speed is now memory bandwidth. ATLAS performs multi-threaded matrix multiplication.
  • Unsupervised feature extraction on a reduced dataset (2 genres, 10 clips) using the new implementation of the auto-encoder. See the audio_features notebook for details.
  • Memory exhaustion when working on the full dataset. Three ways to circumvent it:
    1. Work on a reduced dataset. For now only 2 genres of 10 clips each fit in memory, which is not sufficient for significant accuracy measurements.
    2. Store \(X\) and \(Z\) on disk via HDF5. As we are already bandwidth limited, introducing SATA and the SSD into the data path may further decrease performance.
    3. As each column is independent when minimizing for \(Z\) and each row is independent when minimizing for \(D\), we can work independently on a subset of the problem in RAM while keeping the whole data \(X\) and \(Z\) on disk.
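The post-processing pipeline above (rescaling, linear SVM, random splits, cross-validation) can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: the feature matrix and labels are random placeholders standing in for the aggregated feature vectors and genre labels.

```python
# Hedged sketch of the classification post-processing: scaling + linear SVM
# evaluated by cross-validation. X and y are random placeholders.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(200, 64)             # placeholder features, sklearn format R^{N x n}
y = rng.randint(0, 2, size=200)    # placeholder labels for 2 genres

clf = Pipeline([
    ("scale", StandardScaler()),   # feature rescaling fitted on training folds only
    ("svm", LinearSVC()),          # linear SVM classifier
])
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(len(scores), scores.mean())
```

Putting the scaler inside the Pipeline ensures it is fitted on each training fold only, which is the point of avoiding transductive learning.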
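The sklearn Estimator wrapping mentioned above can be illustrated with a toy transformer. The `ToyEncoder` class and its PCA-like linear "encoding" are hypothetical stand-ins for the real auto-encoder; only the fit/transform structure mirrors what a sklearn-compatible estimator needs.

```python
# Hedged sketch of wrapping a model as a scikit-learn transformer, in the
# spirit of the auto-encoder Estimator. The SVD "dictionary" is a toy
# stand-in for the learned dictionary D.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ToyEncoder(BaseEstimator, TransformerMixin):
    def __init__(self, n_components=8):
        self.n_components = n_components

    def fit(self, X, y=None):
        # Learn the "dictionary" from training data only, so the estimator
        # can sit in a Pipeline without transductive leakage.
        X = np.asarray(X)
        _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self

    def transform(self, X):
        # Encode: project the data onto the learned components.
        return np.asarray(X) @ self.components_.T

Z = ToyEncoder(n_components=4).fit_transform(np.random.randn(50, 16))
print(Z.shape)  # (50, 4)
```

Inheriting from BaseEstimator and TransformerMixin is what makes the class usable inside a sklearn Pipeline and with tools like cross_val_score.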
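Option 3 can be sketched as a block-wise update over HDF5-backed arrays, assuming h5py is used for storage. All sizes, the file name features.hdf5, and the least-squares update are illustrative placeholders for the actual minimization step.

```python
# Hedged sketch of option 3: keep X and Z in an HDF5 file and update Z one
# block of columns at a time, since columns are independent when
# minimizing for Z. Dimensions and file name are hypothetical.
import numpy as np
import h5py

N, n, m, block = 32, 100, 16, 25   # clips, signal dim, dictionary atoms, block size
with h5py.File("features.hdf5", "w") as f:
    X = f.create_dataset("X", data=np.random.randn(n, N))
    Z = f.create_dataset("Z", shape=(m, N))
    D = np.random.randn(n, m)      # fixed dictionary for this half-step
    for start in range(0, N, block):
        # Load one block of columns into RAM, solve, write the result back.
        Xb = X[:, start:start + block]
        Z[:, start:start + block] = np.linalg.lstsq(D, Xb, rcond=None)[0]
```

The symmetric half-step for \(D\) would iterate over blocks of rows instead, with \(Z\) held fixed.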


  • We were not ready in time for ISMIR 2015 (h5-index of 31), so we will publish at ICASSP 2016 (h5-index of 47). The deadline is September 25th.
  • Mitigation of memory usage: reduce the number of frames per clip.
  • Schedule: report deadline June 19th, oral defense mid-July, conference paper September 25th.
  • Can Xavier be the expert? We could then do the defense at the end of June.


  • Classify data after unsupervised feature extraction.
  • Observe what the dictionary learned.
  • Look for an increase in accuracy.


Michaël Defferrard

I am currently pursuing master studies in Information Technologies at EPFL. My master project, conducted at the LTS2 Signal Processing laboratory led by Prof. Pierre Vandergheynst, is about audio classification with structured deep learning. I previously devised an image inpainting algorithm. It used a non-local patch graph representation of the image and a structure detector which leverages the graph representation and influences the fill-order of the exemplar-based algorithm. I have been a Research Assistant in the lab, where I investigated Super Resolution methods for Mass Spectrometry. I develop PyUNLocBoX, a convex optimization toolbox in Python.
