I have been working on the uncertainty principle for some time now and I would like to share some interesting ideas I have discovered while studying it.

For many people the uncertainty principle, first stated by Heisenberg, is a mysterious fact about elementary particles. What is less well known is that it also appears in signal processing. In fact, this principle applies to any wave, be it a probability wavefunction in quantum mechanics or an acoustic wave in signal processing.

Contrary to what one may think, it is not an old-fashioned topic of the 20th century, over-investigated and with nothing more to be said. It is involved in information theory and in new methods and concepts of signal processing (and not only signal processing). The ones I am thinking of are sparsity and compressive sensing. It is much easier to denoise, extract information, or compress a signal when its relevant information is concentrated in a few locations. For example, most images become sparse when represented by their wavelet coefficients. Finding or knowing the representations in which data are sparse is a key to modern image and signal processing techniques. Measures of sparsity or concentration have been developed, as well as methods for sparsifying or spreading a function, and these have led to new versions of the uncertainty principle and a better understanding of it.

The uncertainty principle is based on the following idea. I have drawn a wave function in the figure. Two characteristics are important for a wave function: its position and its oscillation frequency. The wave needs some “space” in order to oscillate. You can see here that the oscillation is located between 50 and 80 (arbitrary units). If one wants to measure the frequency, at least one oscillation is needed. If the oscillation is slower, more space is needed.

This is where the problem lies. An oscillation cannot be reduced to one point in space, otherwise there is no oscillation. If one wants to be sure of the frequency, a minimum amount of space is required for the measurement.

In quantum mechanics, particles (electrons, protons, …) are described as waves (probability waves) and their velocity is given by the frequency of the wave (more precisely, by the probability distribution of frequencies). That is why Heisenberg’s uncertainty principle states that you cannot measure both the position and the velocity of a particle with high precision at the same time: you need to measure the oscillations.

In signal processing the same problem arises when computing the spectrogram of an audio signal, for example. There is a trade-off between the precision of the representation in the time domain and in the frequency domain. You can play with this trade-off by choosing the analysis window of the short-time Fourier transform. The following two figures illustrate this. Two spectrograms of the same audio signal have been computed, each with a different window. The signal is a short recording of a glockenspiel playing several notes. For the first figure, a narrow window, a few time steps wide, has been used to compute the spectrogram. The attacks of the notes (impact sounds) appear as thin, sharp vertical lines, as they are short events in time, whereas the harmonic oscillations appear as wider horizontal lines. The window is too localized in time to measure long oscillating behaviors.

On the second spectrogram a wider window has been used. This time the harmonic components are thin but the attacks are wide. The window is too long, not localized enough in time, to precisely measure events of short duration.
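This window trade-off is easy to reproduce numerically. Below is a minimal sketch using SciPy; since I cannot embed the glockenspiel recording here, the signal is a synthetic stand-in of my own: a steady tone (a “harmonic”) plus a single click (an “attack”).

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000                       # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)

# Synthetic stand-in for the glockenspiel: a steady 440 Hz tone
# plus a short click (the "attack") at t = 0.5 s
x = np.sin(2 * np.pi * 440 * t)
x[fs // 2] += 50.0

# Narrow window: fine time resolution, coarse frequency resolution
f1, t1, S1 = spectrogram(x, fs, nperseg=64)

# Wide window: fine frequency resolution, coarse time resolution
f2, t2, S2 = spectrogram(x, fs, nperseg=1024)

# The frequency-bin spacing is fs / nperseg: it shrinks as the window
# widens, while the number of time frames drops
print(f1[1] - f1[0], len(t1))   # 125 Hz bins, many time frames
print(f2[1] - f2[0], len(t2))   # ~7.8 Hz bins, few time frames
```

With the narrow window the click stays confined to one or two time frames but the tone smears over many frequency bins; with the wide window it is the other way around.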

This brings us to the problem of measuring the location and frequency of a wave. In quantum mechanics, you do not have a ruler to measure a particle; it is too small. You can only sense a particle with another particle. In a spectrogram, you measure a function by comparing it to another function (the window). The comparison is made using the scalar product. The act of measuring is limited by the precision of the probe, and that precision is given by *the spread* of the function.

In this figure, where is the function? Is it localized at 0.0? It is between -1 and 1 for sure, but where exactly? If the function is not symmetric or contains some noise, it is impossible to tell its precise localization.

This example is the real part of a modulated Gaussian \(g(x)=\frac{1}{\sigma\sqrt{\pi}}e^{-\frac{x^2}{\sigma^2}}e^{2i\pi fx}\), where \(\sigma^2=0.1\) and \(f=5\). The first exponential gives the envelope or amplitude, the second one gives the oscillation. Its spread is given by \(\sigma\). Its Fourier transform is \(\hat{g}(k)=e^{-\sigma^2 \pi^2(k-f)^2}\), a function with a spread of \(1/(\pi \sigma)\). For the Gaussian, the effect of the uncertainty principle can be seen directly: an increase of \(\sigma\) leads to a larger spread of the function in the space (or time) domain and a smaller spread in the frequency domain.
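One can check this trade-off numerically. The sketch below (my own illustration, not from the original figures) samples the modulated Gaussian on a grid, approximates its continuous Fourier transform with the FFT, and computes the standard deviation of \(|g|^2\) and \(|\hat{g}|^2\) for several values of \(\sigma\). Analytically the spreads are \(\sigma/2\) and \(1/(2\pi\sigma)\), so their product stays at the Heisenberg minimum \(1/(4\pi)\).

```python
import numpy as np

def spreads(sigma, N=4096, L=40.0):
    """Time and frequency spread (std. dev.) of the modulated Gaussian."""
    dx = L / N
    x = (np.arange(N) - N // 2) * dx
    g = np.exp(-x**2 / sigma**2) * np.exp(2j * np.pi * 5 * x)
    g /= np.linalg.norm(g) * np.sqrt(dx)   # unit l2 norm: sum |g|^2 dx = 1

    # Continuous Fourier transform approximated by the FFT
    ghat = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))) * dx
    k = np.fft.fftshift(np.fft.fftfreq(N, d=dx))

    def std(u, p, d):
        m = np.sum(u * p) * d              # mean of the energy density
        return np.sqrt(np.sum((u - m)**2 * p) * d)

    return std(x, np.abs(g)**2, dx), std(k, np.abs(ghat)**2, k[1] - k[0])

results = {s: spreads(s) for s in (0.2, 0.5, 1.0)}
for sigma, (st, sk) in results.items():
    # st grows with sigma, sk shrinks, the product stays near 1/(4*pi)
    print(sigma, st, sk, st * sk)
```

The Gaussian is the extremal case: any other wave has a strictly larger spread product.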

**Measures of spread as measures of uncertainty**

To measure the spreading of a function, one can follow Heisenberg's approach and compute the variance of the function around its mean value. The squared absolute value \(|f|^2\) of the function is seen as a probability distribution of the energy; the position of the function is the mean \(M\) of this distribution and the precision is given by its variance \(V^2\), as follows:

\[M(f)=\int t\,|f(t)|^2\,dt, \qquad V^2(f)=\int t^2 |f(t)|^2\,dt-M(f)^2.\]

We assume here, and in all the following formulae, that the function is normalized: its \(l^2\)-norm equals one. This leads to the Heisenberg uncertainty principle: the variances in the space and frequency domains cannot both be arbitrarily small, as stated in the famous formula

\[V(f)\cdot V(\hat{f})\ge C,\]

where \(C\) is a constant depending on the definition of the Fourier transform. Note that this approach assumes that the function is spread around a reference point (the mean). A function having two “bumps” far apart will have a large variance but still look concentrated. This measure of spreading is not appropriate for such functions.

The spread of this function over the spatial domain should be ‘spread1’ + ‘spread2’, not the variance. Other spreading measures can be used to solve this problem, and each of them gives a different uncertainty principle.
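A small numerical experiment (my own, with arbitrarily chosen bump widths and positions) makes the failure of the variance explicit: two narrow bumps placed far apart are each well concentrated, yet their variance explodes because it measures distance to the common mean.

```python
import numpy as np

N = 8192
L = 40.0
dx = L / N
x = (np.arange(N) - N // 2) * dx

def variance(f):
    """Variance V^2 of the normalized energy density |f|^2."""
    p = np.abs(f)**2 / (np.sum(np.abs(f)**2) * dx)
    m = np.sum(x * p) * dx
    return np.sum((x - m)**2 * p) * dx

bump = lambda c: np.exp(-(x - c)**2 / 0.1)   # narrow bump centered at c

# One narrow bump: small variance, as expected
v1 = variance(bump(0.0))

# Two identical narrow bumps placed far apart: each is concentrated,
# but the variance is dominated by the squared distance to the mean
v2 = variance(bump(-10.0) + bump(10.0))

print(v1, v2)   # v2 is roughly 100 + v1, thousands of times larger
```

The energy is still concentrated on a tiny fraction of the domain, but the variance no longer reflects that.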

One such measure is the entropy \(H\) of the function. Entropy is often used by physicists to measure disorder or spreading:

\[H(f)=-\int|f(x)|^2\ln|f(x)|^2\,dx.\]

A small entropy indicates an energy concentrated at a few locations (sparsity). This leads to the entropic uncertainty principle:

\[H(f)+H(\hat{f})\ge 1-\ln 2.\]

One can also measure the spreading with the \(l^p\) norms, denoted \(\|\cdot\|_p\) (the entropy belongs to the same family, through the Rényi entropies):

\[\|f\|_p^p=\int |f(x)|^p\,dx.\]

For \(p\) between 1 and 2, a small norm means a sparse signal (for \(p>2\) it is the opposite). This family measures the absolute spreading of a function: with the \(l^1\)-norm, two bumps of the same shape give twice the spread of a single one. The \(l^1\)-norm is used, for example, in signal processing when a sparse signal has to be recovered as the solution of an optimization problem. The uncertainty principle reads:

\[\|f\|_1\|\hat{f}\|_1\ge\frac{1}{\mu},\]

where \(\mu\) is the coherence between the bases used for the signal representations (here the canonical basis and the Fourier basis).
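In the discrete setting this is easy to verify. For the canonical and (unitary) Fourier bases of \(\mathbb{C}^N\) the coherence is \(\mu=1/\sqrt{N}\), and a Kronecker delta, the sparsest possible signal in time, attains the bound \(1/\mu=\sqrt{N}\) exactly, while a generic vector exceeds it. A small sketch of my own:

```python
import numpy as np

N = 64
mu = 1.0 / np.sqrt(N)   # coherence of the canonical and unitary Fourier bases

def l1_product(x):
    """||x||_1 * ||xhat||_1 for an l2-normalized signal."""
    x = x / np.linalg.norm(x)
    xhat = np.fft.fft(x) / np.sqrt(N)    # unitary DFT
    return np.sum(np.abs(x)) * np.sum(np.abs(xhat))

delta = np.zeros(N)
delta[0] = 1.0                           # maximally sparse in time

rng = np.random.default_rng(0)
noise = rng.standard_normal(N)

# The delta attains the bound 1/mu = sqrt(N); a generic vector exceeds it
print(l1_product(delta), l1_product(noise), 1 / mu)
```

The delta is fully concentrated in time, so its Fourier transform is maximally spread (a constant-magnitude vector), and the product of \(l^1\)-norms lands exactly on \(1/\mu\).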

Coming back to the spectrogram, there is an uncertainty principle for it too. The principle of the spectrogram is to compare the function to elementary waveforms (translated and modulated copies of the window) whose position and frequency are known. The uncertainty principle is then given by the “ambiguity function” uncertainty principle: within the spectrogram, the energy cannot be too concentrated. The following figure shows the spectrogram of a Gaussian function, with the same Gaussian as window: there is a minimal area of energy which cannot be reduced to a single point.

The concentration is measured with the \(l^p\) norms. For \(p\) between 1 and 2, as we have seen, this is a measure of the sparsity of the function: the smaller the \(l^1\)-norm, the sparser the function. Let us denote by \({\rm STFT}(f,g)\) the short-time Fourier transform of \(f\) with window \(g\). The uncertainty principle is expressed in the following manner:

\[\|{\rm STFT}(f,g)\|_1\ge 2\|f\|_2\|g\|_2.\]

Assume \(\|f\|_2=\|g\|_2=1\); then by construction the coefficients of the spectrogram satisfy \(|{\rm STFT}(f,g)|\le 1\). Since each coefficient is at most 1 while the total \(l^1\)-norm is at least 2, the energy must occupy a minimal area: the spectrogram cannot be reduced to a single red point.
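As with the previous bounds, the Gaussian analyzed with itself is the extremal case: one can show \(|{\rm STFT}(g,g)(t,\nu)|=e^{-\pi(t^2+\nu^2)/2}\), whose integral is exactly 2. The following sketch of mine discretizes the STFT on a grid and recovers that value.

```python
import numpy as np

N = 512
L = 16.0
dx = L / N
x = (np.arange(N) - N // 2) * dx
dk = 1.0 / L                            # frequency-bin spacing of the FFT

g = 2**0.25 * np.exp(-np.pi * x**2)     # Gaussian with ||g||_2 = 1

# Discretized STFT of g with itself as window:
# row i holds |STFT(g, g)(t_i, .)| on the frequency grid
absS = np.empty((N, N))
for i in range(N):
    # window centered at x[i]; wrap-around tails are negligible here
    window = np.roll(g, i - N // 2)
    absS[i] = np.abs(np.fft.fft(g * window)) * dx

l1 = np.sum(absS) * dx * dk
print(l1)   # ~2 = 2 ||g||_2 ||g||_2: the Gaussian attains the bound
```

Every coefficient is below 1, yet their total mass is pinned at 2, which is exactly the “minimal area of energy” visible in the figure.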

We have seen some examples of the uncertainty principle; there exist others. The relations above apply to continuous functions, and there are different formulae for the discrete case, where signals are sampled and of finite length. If the function domain is more “exotic”, a graph for example, there is still an uncertainty principle. Representing a function is often understood as projecting it onto an orthonormal basis. This can be generalized to frames, which are redundant sets of vectors. The redundancy brings additional flexibility, and the representation may describe the information inside a signal with better accuracy.

One can find more information in my work with Bruno Torresani, “A survey of uncertainty principles and some signal processing applications” and “Refined support and entropic uncertainty inequalities”. I am currently working on the uncertainty principle for signals on graphs, which will be the subject of a future post.