Meeting 11 (15.06)

Goal: results discussion and review of the report

Achieved

  • Fix various issues with the graph creation (audio_graph.ipynb). See git commits for further details.
  • Integrate the Dirichlet energy (via the graph Laplacian) in the objective function (auto_encoder.ipynb); a sketch of the regularized objective follows this list.
  • More efficient implementation of trace(Z.dot(L.dot(Z.T))) (see experiment f).
  • Run various experiments using the graph.
  • Implement a relative stopping criterion for the outer loop (a sketch follows this list).
  • Make a script (audio_experiment.ipynb) to perform automated experiments, i.e. to test the effect of a hyper-parameter on various metrics (e.g. accuracy, speed, sparsity, objective) by automatically generating plots. It also compares against the baseline.
  • Take the mean of multiple 10-fold cross-validation runs as the main metric (audio_classification.ipynb); a sketch of the metric follows this list.
  • Add an ls (redundant) parameter to control the weight of the sparse codes (see experiment k).
  • Complete reports of the newest results are now published online.
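
As a rough illustration of how the Dirichlet energy enters the objective, here is a minimal sketch (not the actual code of auto_encoder.ipynb; the shapes and the names X, D, Z, ld, lg are assumptions):

    import numpy as np

    def objective(X, D, Z, L, ld, lg):
        # X: data (n x N), D: dictionary (n x m), Z: sparse codes (m x N),
        # L: graph Laplacian (N x N), ld / lg: weights of the two penalties.
        fidelity = np.linalg.norm(X - D.dot(Z), 'fro')**2     # reconstruction error
        sparsity = ld * np.abs(Z).sum()                       # l1 penalty on the codes
        dirichlet = lg * np.einsum('ij,ji->', L.dot(Z.T), Z)  # tr(Z L Z^T), smoothness on the graph
        return fidelity + sparsity + dirichlet

The Dirichlet term penalizes codes that differ a lot between frames connected in the graph, which is what should make the features smoother on the manifold.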
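
The relative stopping criterion for the outer loop could be as simple as the following sketch (the exact formula used in the notebook may differ; the rtol values are those reported in the experiments below):

    def converged(objectives, rtol=1e-3):
        # Stop the outer loop once the relative decrease of the objective
        # between two consecutive iterations falls below rtol.
        if len(objectives) < 2:
            return False
        return abs(objectives[-2] - objectives[-1]) <= rtol * abs(objectives[-2])

The outer loop appends the current objective value after each iteration and breaks as soon as converged(objectives, rtol) returns True.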
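
A minimal sketch of the cross-validation metric used by the experiment script (the LinearSVC classifier is a placeholder, not necessarily the one used in audio_classification.ipynb):

    import numpy as np
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.svm import LinearSVC

    def accuracy(features, labels, n_runs=10):
        # Mean and standard deviation over several 10-fold cross-validation
        # runs, each with a different shuffling of the data.
        scores = []
        for seed in range(n_runs):
            cv = KFold(n_splits=10, shuffle=True, random_state=seed)
            scores.extend(cross_val_score(LinearSVC(), features, labels, cv=cv))
        return 100 * np.mean(scores), 100 * np.std(scores)

The experiment script then calls such a function for each value of the swept hyper-parameter and plots the resulting accuracy, along with speed, sparsity and objective, next to the baseline.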

Main results

  • Introducing the Dirichlet energy (via the graph Laplacian) helps to extract features that are more robust to noisy signals.
  • It also helps to extract better features:
    • 5 genres: accuracy from 79 (+/- 2.7) to 81 (+/- 3.4)

Idea

  • With our graph, we should be more robust to perturbed data, shouldn't we? (Since we impose smoothness on the manifold.) Have we created a kind of hybrid between a sparse and a denoising auto-encoder? All of this without having to perturb the data before training.
  • We may not increase the accuracy much, but we may become much more robust to perturbations. This is the idea behind NL-means.

Discussion

  • Center the data before measuring the cosine similarity? (See the sketch after this list.)
  • Tune hyper-parameters in the context of noisy data.
  • Number of correct predictions in majority voting.
  • Do not necessarily show the baseline. 😉
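
On the first point, centering before the cosine similarity amounts to computing a correlation coefficient between frames; a minimal sketch (not the actual graph-construction code of audio_graph.ipynb):

    import numpy as np

    def centered_cosine(x, y):
        # Cosine similarity after removing the mean of each vector,
        # i.e. the Pearson correlation coefficient between x and y.
        x = x - x.mean()
        y = y - y.mean()
        return x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))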

Experiments

  • (m) Influence of graph parameters.
  • (l) Influence of ld (small dataset).
  • (k) Influence of ls (small dataset).
  • (j) Influence of lg (small dataset, rtol=1e-5).
  • (i) Influence of convergence (small dataset, lg=100).
  • (h) Influence of lg, half dataset (500 songs, 644 frames, m=256), rtol=1e-3. Ran for 3h30.
    • Observations:
      • lg=0.1: 2208s, 5 outer, sparsity 6.6%, 78 (+/- 3.8)
      • lg=0.5: 1860s, 4 outer, sparsity 8.5%, 77 (+/- 4.9)
      • lg=1: 2118s, 5 outer, sparsity 8.2%, 78 (+/- 6.7)
      • lg=5: 1024s, 10 outer (?), sparsity 40%, 77 (+/- 6.1) (bad convergence)
      • lg=10: 1010s, 10 outer (?), sparsity 49.8%, 76 (+/- 6.0) (bad convergence)
      • lg=20: 973s, 10 outer (?), sparsity 64.3%, 75 (+/- 5.5) (bad convergence)
      • lg=50: 989s, 10 outer (?), sparsity 70%, 73 (+/- 4.5) (bad convergence)
    • Conclusions:
      • Atoms are all similar as we use the same initialization (via RNG seed), except for the runs that did not converge.
      • lg seems to have some impact, larger than ld.
      • Problem: the Dirichlet energy does not count for much, as it is two orders of magnitude smaller than the other terms of the objective.
  • (g) Influence of lg, small dataset (500 songs, 149 frames, m=128), rtol=1e-3.
    • lg=1: 292s, 3 outer, sparsity 10.3%, 72 (+/- 4.6)
    • lg=10: 330s, 3 outer, sparsity 17.1%, 69 (+/- 5.4)
    • lg=100: 120s, 1 outer, sparsity 78.5%, 55 (+/- 5.9)
  • (f) More efficient computation of \(tr(Z L Z^T)\); a sketch checking the equivalence of the three variants follows this list.
    • Original np.trace(Z.dot(L.dot(Z.T))): 286s, 72 (+/- 5.2), single eval 49.6 µs
    • Hadamard product np.multiply(L.dot(Z.T), Z.T).sum(): 282s, 72 (+/- 6.9), single eval 34.6 µs
    • Einstein summation np.einsum('ij,ji->', L.dot(Z.T), Z): 279s, 73 (+/- 5.5), single eval 30.6 µs
    • Speed increase: 2%
  • (e) Graph Laplacian as float32.
    • HDF5: 50MB to 40MB (500 songs, 149 frames)
    • float64: 296s, 72 (+/- 6.4)
    • float32: 283s, 72 (+/- 8.1)
    • Speed increase: 4%
  • (d) Runtime test.
    • 500 songs, 149 frames, m=512, rtol=1e-5: 7902s, 74 (+/- 3.8)
    • 500 songs, 149 frames, m=128, rtol=1e-3: 279s, 73 (+/- 5.5)
  • (c) Test on 500 songs with \(\lambda_g=1\).
    • Also converges after 7 outer iterations.
    • Same objective values.
    • The Dirichlet penalty is an order of magnitude lower.
    • A bit less sparse (because of increased smoothing): 1.5% -> 1.8%.
    • Atoms look similar.
    • Accuracy increase of 1-2%. 80 (+/- 4.0) without normalization, 81 (+/- 3.4) with minmax.
    • Slower: from 5h30 to 8h15 to extract features.
  • (b) Introducing the Dirichlet energy in the objective, small dataset.
  • (a) Fix the graph creation.
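
The three implementations compared in experiment (f) can be checked for equivalence on random data; a minimal sketch (the sizes are arbitrary and the Laplacian is a random symmetric stand-in):

    import numpy as np

    rng = np.random.RandomState(42)
    Z = rng.normal(size=(128, 644))   # sparse codes (m atoms x n frames)
    L = rng.normal(size=(644, 644))
    L = L + L.T                       # symmetric, like a graph Laplacian

    t1 = np.trace(Z.dot(L.dot(Z.T)))           # original: forms the full m x m product
    t2 = np.multiply(L.dot(Z.T), Z.T).sum()    # Hadamard product: avoids the m x m matrix
    t3 = np.einsum('ij,ji->', L.dot(Z.T), Z)   # Einstein summation: fastest in (f)
    assert np.allclose([t1, t2], t3)

The end-to-end gain reported in (f) is only about 2%, consistent with a single evaluation taking only tens of microseconds.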

