General

Starting with Python

Following my post on Python versus Matlab, I have started to use Python to solve various machine learning problems.

To start with Python, I recommend several resources:

More resources to come…

Enjoy learning Python!

 

Audio inpainting with similarity graphs

I’m very proud to announce the release of a new kind of audio inpainting algorithm. It is able to reconstruct long missing parts of a song by searching through the rest of the content for a suitable replacement.

You can try the algorithm online or download the code here and run it on your machine. A technical report associated with this algorithm is available on arXiv.

Enjoy!

Abstract
In this contribution, we present a method to compensate for long duration data gaps in audio signals, in particular music. To achieve this task, a similarity graph is constructed, based on a short-time Fourier analysis of reliable signal segments, e.g. the uncorrupted remainder of the music piece, and the temporal regions adjacent to the unreliable section of the signal. A suitable candidate segment is then selected through an optimization scheme and smoothly inserted into the gap.
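To give a feel for the idea in the abstract, here is a much-simplified Python sketch. It is not the released algorithm. It compares short-time Fourier magnitudes of the frames just before the gap with every other reliable position in the signal, then copies the segment that follows the best match. The function names (`stft_magnitudes`, `fill_gap`), the frame sizes and the toy signal are all assumptions made for illustration. The actual method builds a similarity graph, selects the candidate through an optimization scheme and inserts it smoothly, and none of that is reproduced here.

```python
import numpy as np

def stft_magnitudes(signal, frame_len=256, hop=128):
    """Magnitude spectra of Hann-windowed frames, one row per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def fill_gap(signal, gap_start, gap_len, frame_len=256, hop=128, context=4):
    """Fill signal[gap_start:gap_start + gap_len] with the best-matching segment.

    Assumes at least `context` full frames exist before the gap.
    Hypothetical helper: a nearest-neighbour search, not the paper's method.
    """
    feats = stft_magnitudes(signal, frame_len, hop)
    gap_end = gap_start + gap_len
    # Last frame lying entirely before the gap, and the samples left over
    # between the end of that frame and the start of the gap.
    last = (gap_start - frame_len) // hop
    offset = gap_start - (last * hop + frame_len)
    query = feats[last - context + 1:last + 1]

    best_start, best_dist = None, np.inf
    for j in range(context - 1, len(feats)):
        ctx_start = (j - context + 1) * hop
        seg_start = j * hop + frame_len + offset
        seg_end = seg_start + gap_len
        if seg_end > len(signal):
            break  # later candidates would run past the end of the signal
        if ctx_start < gap_end and seg_end > gap_start:
            continue  # candidate touches the unreliable region
        dist = np.linalg.norm(feats[j - context + 1:j + 1] - query)
        if dist < best_dist:
            best_start, best_dist = seg_start, dist
    if best_start is None:
        raise ValueError("no reliable candidate segment found")

    restored = signal.copy()
    # Hard copy, no cross-fade: the real algorithm inserts the segment smoothly.
    restored[gap_start:gap_end] = signal[best_start:best_start + gap_len]
    return restored

# Toy "song": a pattern of 8 short notes repeated 4 times.
n = np.arange(256)
pattern = np.concatenate([np.sin(2 * np.pi * f * n / 256)
                          for f in (3, 5, 7, 11, 13, 17, 19, 23)])
original = np.tile(pattern, 4)
damaged = original.copy()
damaged[4352:4864] = 0.0  # a 512-sample gap
restored = fill_gap(damaged, gap_start=4352, gap_len=512)
```

On this perfectly repeating toy signal, the search finds an earlier repetition of the same passage. Real music is of course less forgiving, which is why the actual algorithm needs the optimization and smooth insertion steps.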

Shall I use MATLAB or Python during my thesis?

At the beginning of my thesis, I chose to use MATLAB as my main programming language for my simulations. Today I believe it was the wrong choice.

Why did I opt for MATLAB?

Mainly because I was very used to it. Since it’s a quick and easy prototyping language, I was able to test my ideas in no time. The user interface is really intuitive too and makes debugging simple. And, since I was maintaining two MATLAB toolboxes, I had a lot of code that I didn’t want to rewrite in Python. Finally, Python frightened me because it had a very poor user interface. So, to master this “almost” new language, I would have had to go through a slow learning process and invest a lot of time.

What changed my mind today?

The IPython notebook is an interface that connects the Python console to a web browser, allowing the user to easily make plots, run cells, add comments, etc. Because of its success, it was extended to other programming languages and developed into a project called Jupyter. You can even use MATLAB with it.

The notebook gave a fresh new start to Python in the scientific community and new toolboxes were ported from MATLAB to Python. For my personal use, the gap in scientific tools between the two languages has been hugely reduced in the last two years.

On the other hand, MATLAB hasn’t been able to fix its main flaws. It’s still expensive, closed-source, inefficient and complicated to interface with other programming languages. And remember, no one cares if you know MATLAB, whereas mastering Python is a great asset for your CV.

Conclusion

While I believe MATLAB is still a great tool to experiment and play with, I suspect Python is even better for this task. When it comes to seriously implementing something, I believe Python is better. At the beginning of a thesis, PhD students often believe that they need to be productive. This is wrong: the first year should be used to understand the fundamentals of the field and to find appropriate work tools. So, if you’re at the beginning of your thesis, I can only recommend that you learn Python. I’m quite sure you won’t regret it.

Additional links

 

Where to find datasets?

I do not know of any website or repository that gathers all datasets. In this blog, I’m just listing a few links pointing to datasets or dataset websites. This list will grow with time.

New paper: compressive PCA on graphs

http://arxiv.org/abs/1602.02070

Abstract

Randomized algorithms reduce the complexity of low-rank recovery methods only w.r.t. dimension p of a big dataset Y ∈ ℝ^{p×n}. However, the case of large n is cumbersome to tackle without sacrificing the recovery. The recently introduced Fast Robust PCA on Graphs (FRPCAG) approximates a recovery method for matrices which are low-rank on graphs constructed between their rows and columns. In this paper we provide a novel framework, Compressive PCA on Graphs (CPCA) for an approximate recovery of such data matrices from sampled measurements. We introduce a RIP condition for low-rank matrices on graphs which enables efficient sampling of the rows and columns to perform FRPCAG on the sampled matrix. Several efficient, parallel and parameter-free decoders are presented along with their theoretical analysis for the low-rank recovery and clustering applications of PCA. On a single core machine, CPCA gains a speed up of p/k over FRPCAG, where k << p is the subspace dimension. Numerically, CPCA can efficiently cluster 70,000 MNIST digits in less than a minute and recover a low-rank matrix of size 10304 × 1000 in 15 secs, which is 6 and 100 times faster than FRPCAG and exact recovery.

New paper: Stationary signal processing on graphs

I’m proud to present a new paper. Using the ideas presented inside, we should be able to improve many graph-based models.

Abstract

Graphs are a central tool in machine learning and information processing as they allow to conveniently capture the structure of complex datasets. In this context, it is of high importance to develop flexible models of signals defined over graphs or networks. In this paper, we generalize the traditional concept of wide sense stationarity to signals defined over the vertices of arbitrary weighted undirected graphs. We show that stationarity is intimately linked to statistical invariance under a localization operator reminiscent of translation. We prove that stationary graph signals are characterized by a well-defined Power Spectral Density that can be efficiently estimated even for large graphs. We leverage this new concept to derive Wiener-type estimation procedures of noisy and partially observed signals and illustrate the performance of this new model for denoising and regression.
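To make the notion of a graph Power Spectral Density a bit more concrete, here is a small NumPy sketch under simplifying assumptions of my own: a ring graph with 16 vertices, a heat-like filter `exp(-lambda)`, and a full eigendecomposition of the Laplacian. Stationary signals are generated by filtering white noise with a function of the Laplacian, and the PSD is estimated by averaging the squared graph Fourier coefficients over many realizations. This is only the naive sample estimator. It does not scale to large graphs, and it is not the efficient estimation procedure the paper refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ring graph with N vertices: each vertex is linked to its two neighbours.
N = 16
A = np.zeros((N, N))
idx = np.arange(N)
A[idx, (idx + 1) % N] = 1
A[(idx + 1) % N, idx] = 1
L = np.diag(A.sum(axis=1)) - A  # combinatorial Laplacian

# Graph Fourier basis: eigenvectors of the Laplacian.
lam, U = np.linalg.eigh(L)

# Stationary signals: white noise filtered by h(L) = U diag(h(lam)) U^T.
h = np.exp(-lam)  # illustrative filter, an arbitrary choice
n_samples = 20000
W = rng.standard_normal((N, n_samples))
X = U @ (h[:, None] * (U.T @ W))

# Naive PSD estimate: mean squared graph Fourier coefficient per eigenvalue.
psd_est = np.mean((U.T @ X) ** 2, axis=1)
psd_true = h ** 2
```

With enough realizations, `psd_est` approaches `psd_true`, the squared filter response evaluated at the Laplacian eigenvalues.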

Links

A starter kit for Deep Learning

Courses

  1. A MOOC from Geoffrey Hinton, one of the fathers of deep learning
    https://www.coursera.org/course/neuralnets
  2. https://cs231n.github.io/

Book
http://www.deeplearningbook.org/

Blog posts

Selected software
The three main tools are:

  1. Theano, the classic guy in Python
    http://deeplearning.net/software/theano
    Tutorials found in
    http://deeplearning.net/tutorial/
  2. The other guy in the competition. (I started with that one)
    http://torch.ch/
    Torch has the advantage of being interfaced with Lua. It offers a simple way to create neural nets. Recently, PyTorch has gained a lot of attention.
    http://pytorch.org/
  3. TensorFlow, the newcomer from Google
    http://www.tensorflow.org/get_started/index.html

As a recommendation, I would advise PyTorch or TensorFlow.

MATLAB is not a very appropriate language for deep learning. However, it is interesting to use for learning purposes.

  1. https://github.com/rasmusbergpalm/DeepLearnToolbox
  2. http://devblogs.nvidia.com/parallelforall/deep-learning-for-computer-vision-with-matlab-and-cudnn/
  3. http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
  4. https://github.com/kyunghyuncho/deepmat
  5. https://github.com/sdemyanov/ConvNet

Publications (To be done)
http://research.microsoft.com/pubs/192769/tricks-2012.pdf