Author Archive: nperraud

Starting with Python

Following my post on Python versus MATLAB, I have started to use Python to solve various machine learning problems.

To start with Python, I recommend several resources:

More resources to come…

Enjoy your Python learning

 

Audio inpainting with similarity graphs

I’m very proud to announce the release of a new kind of audio inpainting algorithm. It is able to reconstruct long missing parts of a song by searching through the rest of the content for a suitable replacement.

You can try the algorithm online or download the code here and run it on your machine. A technical report associated with this algorithm is available on arXiv.

Enjoy!

Abstract
In this contribution, we present a method to compensate for long duration data gaps in audio signals, in particular music. To achieve this task, a similarity graph is constructed, based on a short-time Fourier analysis of reliable signal segments, e.g. the uncorrupted remainder of the music piece, and the temporal regions adjacent to the unreliable section of the signal. A suitable candidate segment is then selected through an optimization scheme and smoothly inserted into the gap.
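To give a feeling for the idea, here is a minimal sketch in Python. It is not the algorithm from the report: it swaps the similarity graph and the optimization scheme for a brute-force nearest-neighbor search over magnitude spectra, and it pastes the candidate without any smoothing. The function names (`frame_spectra`, `fill_gap`), the parameters and the toy signal (a random motif repeated six times) are all assumptions made for illustration.

```python
import numpy as np

def frame_spectra(signal, frame_len, hop):
    """Magnitude spectra of overlapping Hann-windowed frames (a bare-bones STFT)."""
    starts = np.arange(0, len(signal) - frame_len + 1, hop)
    window = np.hanning(frame_len)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    return starts, np.abs(np.fft.rfft(frames, axis=1))

def fill_gap(signal, gap_start, gap_len, frame_len=64, hop=16):
    """Fill signal[gap_start:gap_start + gap_len] with the best-matching segment."""
    gap_end = gap_start + gap_len
    starts, spectra = frame_spectra(signal, frame_len, hop)

    # Spectrum of the reliable context that immediately precedes the gap.
    context = signal[gap_start - frame_len:gap_start] * np.hanning(frame_len)
    target = np.abs(np.fft.rfft(context))

    # Cosine similarity between the context and every frame of the signal.
    norms = np.linalg.norm(spectra, axis=1) * np.linalg.norm(target)
    scores = spectra @ target / (norms + 1e-12)

    # A frame is usable only if the frame itself and the gap_len samples
    # that follow it are reliable, i.e. they do not touch the gap.
    seg_end = starts + frame_len + gap_len
    usable = (seg_end <= len(signal)) & ((seg_end <= gap_start) | (starts >= gap_end))
    scores[~usable] = -np.inf

    # Copy what follows the best-matching frame into the gap (no crossfade here).
    best = starts[np.argmax(scores)] + frame_len
    restored = signal.copy()
    restored[gap_start:gap_end] = signal[best:best + gap_len]
    return restored, best

# Toy "song": a random motif of 256 samples repeated six times, with a hole in it.
rng = np.random.default_rng(0)
clean = np.tile(rng.standard_normal(256), 6)
damaged = clean.copy()
damaged[800:900] = 0.0

restored, best = fill_gap(damaged, gap_start=800, gap_len=100)
print("replacement taken from sample", best)
print("max reconstruction error:", np.max(np.abs(restored - clean)))
```

On real music the repetition is never exact, which is where the graph, the optimization over candidates and the smooth insertion described in the abstract come into play.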

Shall I use MATLAB or Python during my thesis?

At the beginning of my thesis, I chose to use MATLAB as a main programming language for my simulations. Today I believe it was the wrong choice.

Why did I opt for MATLAB?

Mainly because I was very used to it. Since it’s a quick and easy prototyping language, I was able to test my ideas in no time. The user interface is really intuitive too and makes debugging simple. And, since I was maintaining two MATLAB toolboxes, I had a lot of code that I didn’t want to rewrite in Python. Finally, Python frightened me because it had a very bad user interface. So, to master this “almost” new language, I would have had to go through a slow learning process and invest a lot of time.

What changed my mind today?

The IPython notebook is an interface that connects the Python console to a web browser, allowing the user to easily make plots, run cells, add comments, etc. Because of its success, it was extended to other programming languages and developed into a project called Jupyter. You can even use MATLAB with it.

The notebook gave a fresh new start to Python in the scientific community and new toolboxes were ported from MATLAB to Python. For my personal use, the gap in scientific tools between the two languages has been hugely reduced in the last two years.

On the other hand, MATLAB hasn’t been able to deal with its main flaws. It’s still expensive, closed-source, inefficient and complicated to interface with other programming languages. And remember, no one cares if you know MATLAB, whereas mastering Python is a great asset for your CV.

Conclusion

While I believe MATLAB is still a great tool to experiment and play with, I suspect Python may be even better for this task. When it comes to seriously implementing something, I believe Python is better. At the beginning of a thesis, PhD students often believe that they need to be productive. This is wrong and the first year should be leveraged to understand the fundamentals of the field and to find appropriate work tools. So, if you’re at the beginning of your thesis, I can only recommend you learn Python. I’m quite sure you won’t regret it.

Additional links

 

New paper: Global and Local Uncertainty Principles for Signals on Graphs

I’m particularly proud to present this work since we have been working on it for almost 4 years. It started with my master’s thesis in 2012, before any publications on graph uncertainty were out. Even though new articles dealing with uncertainty appeared every few months, we kept working on our initial idea.

In this paper, we generalize some classical uncertainty principles for signals residing on Euclidean domains to uncertainty principles for signals residing on weighted graphs. To do so, we use generalizations of time-frequency transforms and ambiguity functions. Contrary to the classical setting, the uncertainty in the graph setting depends on the localization of the signal, leading to the new concept of “local uncertainty.”

ArXiv link: http://arxiv.org/abs/1603.03030

Abstract

Uncertainty principles such as Heisenberg’s provide limits on the time-frequency concentration of a signal, and constitute an important theoretical tool for designing and evaluating linear signal transforms. Generalizations of such principles to the graph setting can inform dictionary design for graph signals, lead to algorithms for reconstructing missing information from graph signals via sparse representations, and yield new graph analysis tools. While previous work has focused on generalizing notions of spreads of a graph signal in the vertex and graph spectral domains, our approach is to generalize the methods of Lieb in order to develop uncertainty principles that provide limits on the concentration of the analysis coefficients of any graph signal under a dictionary transform whose atoms are jointly localized in the vertex and graph spectral domains. One challenge we highlight is that due to the inhomogeneity of the underlying graph data domain, the local structure in a single small region of the graph can drastically affect the uncertainty bounds for signals concentrated in different regions of the graph, limiting the information provided by global uncertainty principles. Accordingly, we suggest a new way to incorporate a notion of locality, and develop local uncertainty principles that bound the concentration of the analysis coefficients of each atom of a localized graph spectral filter frame in terms of quantities that depend on the local structure of the graph around the center vertex of the given atom. Finally, we demonstrate how our proposed local uncertainty measures can improve the random sampling of graph signals.

Where to find datasets?

I do not know of any website or repository gathering all datasets. In this blog, I’m just listing a few links pointing to datasets or dataset websites. This list will grow with time.

New paper: compressive PCA on graphs

http://arxiv.org/abs/1602.02070

Abstract

Randomized algorithms reduce the complexity of low-rank recovery methods only w.r.t dimension p of a big dataset $Y \in \mathbb{R}^{p \times n}$. However, the case of large n is cumbersome to tackle without sacrificing the recovery. The recently introduced Fast Robust PCA on Graphs (FRPCAG) approximates a recovery method for matrices which are low-rank on graphs constructed between their rows and columns. In this paper we provide a novel framework, Compressive PCA on Graphs (CPCA) for an approximate recovery of such data matrices from sampled measurements. We introduce a RIP condition for low-rank matrices on graphs which enables efficient sampling of the rows and columns to perform FRPCAG on the sampled matrix. Several efficient, parallel and parameter-free decoders are presented along with their theoretical analysis for the low-rank recovery and clustering applications of PCA. On a single core machine, CPCA gains a speed up of p/k over FRPCAG, where k << p is the subspace dimension. Numerically, CPCA can efficiently cluster 70,000 MNIST digits in less than a minute and recover a low-rank matrix of size 10304 × 1000 in 15 secs, which is 6 and 100 times faster than FRPCAG and exact recovery.

New paper: Stationary signal processing on graphs

I’m proud to present a new paper. Using the ideas presented inside, we should be able to improve many graph-based models.

Abstract

Graphs are a central tool in machine learning and information processing as they allow to conveniently capture the structure of complex datasets. In this context, it is of high importance to develop flexible models of signals defined over graphs or networks. In this paper, we generalize the traditional concept of wide sense stationarity to signals defined over the vertices of arbitrary weighted undirected graphs. We show that stationarity is intimately linked to statistical invariance under a localization operator reminiscent of translation. We prove that stationary graph signals are characterized by a well-defined Power Spectral Density that can be efficiently estimated even for large graphs. We leverage this new concept to derive Wiener-type estimation procedures of noisy and partially observed signals and illustrate the performance of this new model for denoising and regression.
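For readers who prefer code to definitions, here is a small Python sketch of what a stationary graph signal and its Power Spectral Density look like. It is only an illustration under my own assumptions: a ring graph with 16 vertices, a made-up PSD `1 / (1 + lambda)^2`, white noise of variance 0.1, and a naive estimator that needs the full eigendecomposition of the Laplacian. It is not the scalable estimator from the paper.

```python
import numpy as np

# Ring graph on N vertices; combinatorial Laplacian L = D - W.
N = 16
idx = np.arange(N)
W = np.zeros((N, N))
W[idx, (idx + 1) % N] = 1.0
W[(idx + 1) % N, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Graph Fourier basis: eigenvectors of the Laplacian, eigenvalues = graph frequencies.
lam, U = np.linalg.eigh(L)

# Assumed "true" PSD, chosen arbitrarily as a function of the graph frequency.
true_psd = 1.0 / (1.0 + lam) ** 2

# Stationary signals: white noise shaped by a graph filter with response sqrt(PSD).
rng = np.random.default_rng(0)
n_samples = 5000
white = rng.standard_normal((N, n_samples))
X = U @ (np.sqrt(true_psd)[:, None] * (U.T @ white))

# Naive PSD estimate: average squared graph Fourier coefficient per frequency.
est_psd = np.mean((U.T @ X) ** 2, axis=1)

# Wiener-type denoising: attenuate each graph frequency by PSD / (PSD + noise variance).
sigma2 = 0.1
noisy = X + np.sqrt(sigma2) * rng.standard_normal(X.shape)
gain = est_psd / (est_psd + sigma2)
denoised = U @ (gain[:, None] * (U.T @ noisy))

mse_noisy = np.mean((noisy - X) ** 2)
mse_denoised = np.mean((denoised - X) ** 2)
print("max relative PSD error:", np.max(np.abs(est_psd / true_psd - 1.0)))
print("MSE noisy vs denoised:", mse_noisy, mse_denoised)
```

The eigendecomposition costs O(N^3), so this only works for small graphs. Avoiding it is precisely why an efficient PSD estimator for large graphs matters.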

Links

A starter kit for Deep Learning

Courses

  1. A MOOC from Geoffrey Hinton, one of the fathers of deep learning
    https://www.coursera.org/course/neuralnets
  2. https://cs231n.github.io/

Book
http://www.deeplearningbook.org/

Blog posts

Selected software
The three main tools are:

  1. Theano, the classic guy in Python
    http://deeplearning.net/software/theano
    Tutorials found in
    http://deeplearning.net/tutorial/
  2. The other guy in the competition. (I started with that one)
    http://torch.ch/
    Torch has the advantage of being interfaced with Lua. It offers a simple way to create neural nets. Recently, PyTorch has gained a lot of attention
    http://pytorch.org/
  3. TensorFlow, the newcomer from Google
    http://www.tensorflow.org/get_started/index.html

As a recommendation, I would advise PyTorch or TensorFlow.

MATLAB is not a very appropriate language for deep learning. However, it is interesting to use for learning purposes.

  1. https://github.com/rasmusbergpalm/DeepLearnToolbox
  2. http://devblogs.nvidia.com/parallelforall/deep-learning-for-computer-vision-with-matlab-and-cudnn/
  3. http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html
  4. https://github.com/kyunghyuncho/deepmat
  5. https://github.com/sdemyanov/ConvNet

Publications (To be done)
http://research.microsoft.com/pubs/192769/tricks-2012.pdf