Goal: results discussion and review of the report

### Achieved

- Fix various issues with the graph creation (audio_graph.ipynb). See git commits for further details.
- Integrate the Dirichlet energy (via the graph Laplacian) in the objective function (auto_encoder.ipynb).
- More efficient implementation of `trace(Z.dot(L.dot(Z.T)))` (see experiment f).
- Run various experiments using the graph.
- Implement a relative stopping criterion for the outer loop (a minimal sketch follows this list).
- Make a script (audio_experiment.ipynb) to perform automated experiments, i.e. to test the effect of a hyper-parameter on various metrics (e.g. accuracy, speed, sparsity, objective) by automatically generating plots. It also compares against the baseline.
- Take the mean of multiple 10-fold cross-validation runs as the main metric (audio_classification.ipynb).
- Add a (redundant) `ls` parameter to control the weight of the sparse codes (see experiment k).
- Publish complete reports of the newest results online.
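
A minimal sketch of such a relative stopping criterion (the names and the `step` callback are illustrative, not the notebook's actual code):

```python
import numpy as np

def outer_loop(step, z0, rtol=1e-3, max_iter=10):
    """Run outer iterations until the relative change of the objective
    falls below rtol, or until max_iter is reached.
    `step` performs one outer iteration: z, obj = step(z)."""
    z, objective = z0, np.inf
    for _ in range(max_iter):
        z, new_objective = step(z)
        if abs(objective - new_objective) <= rtol * abs(new_objective):
            break
        objective = new_objective
    return z
```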

### Main results

- Introducing the Dirichlet energy (via the graph Laplacian) helps to extract features that are more robust to noisy signals.
- Introducing the Dirichlet energy helps to extract better features (the objective is sketched below):
- 5 genres: accuracy from 79 (+/- 2.7) to 81 (+/- 3.4).
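
For reference, a plausible form of the regularized auto-encoder objective, where \(\lambda_d\), \(\lambda_s\) and \(\lambda_g\) are the weights controlled by the ld, ls and lg hyper-parameters. The encoder fidelity term weighted by \(\lambda_d\) is an assumption about the formulation, not taken verbatim from auto_encoder.ipynb:

```latex
\min_{D,\,E,\,Z}\;
    \|X - DZ\|_F^2                          % reconstruction error
  + \lambda_d \|Z - EX\|_F^2                % encoder fidelity (assumed form)
  + \lambda_s \|Z\|_1                       % sparsity of the codes
  + \lambda_g \operatorname{tr}(Z L Z^\top) % Dirichlet energy on the graph
```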

### Idea

- With our graph, we should be more robust to perturbed data, shouldn't we? (Since we impose smoothness on the manifold.) Could we have created a sort of hybrid between a sparse and a denoising auto-encoder? All of this without having to perturb the data before training.
- We may not increase the accuracy much, but we may become much more robust to perturbations. This is the idea behind NL-means.

### Discussion

- Center the data before measuring the cosine similarity? (See the sketch after this list.)
- Tune hyper-parameters in the context of noisy data.
- Report the number of correct predictions in the majority voting.
- Do not necessarily show the baseline. 😉
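
On the first point, a minimal sketch of what centering changes: the cosine similarity of centered vectors is exactly the Pearson correlation (the function below is illustrative, not existing project code).

```python
import numpy as np

def cosine_similarity(x, y, center=True):
    """Cosine similarity; with centering it equals the Pearson correlation."""
    if center:
        x = x - x.mean()
        y = y - y.mean()
    return x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))
```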

### Experiments

- (m) Influence of graph parameters.
- Distance metric: euclidean or cosine (see the sketch below).
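
A minimal sketch of the kind of graph construction being tuned here, using scikit-learn (illustrative, not the actual audio_graph.ipynb code; `k` and `sigma` are assumed parameters):

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import kneighbors_graph

def graph_laplacian(X, k=10, metric='euclidean', sigma=1.0):
    """k-NN graph on the rows of X, Gaussian edge weights,
    combinatorial Laplacian L = D - W."""
    dist = kneighbors_graph(X, k, mode='distance', metric=metric)
    dist.data = np.exp(-dist.data**2 / sigma**2)  # distances -> weights
    W = (dist + dist.T) / 2                       # symmetrize the k-NN graph
    D = sparse.diags(np.ravel(W.sum(axis=1)))     # degree matrix
    return D - W
```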

- (l) Influence of ld (small dataset).
- (k) Influence of ls (small dataset).
- (j) Influence of lg (small dataset, rtol=1e-5).
- (i) Influence of convergence (small dataset, lg=100).
- (h) Influence of lg, half dataset (500 songs, 644 frames, m=256), rtol=1e-3. Ran for 3h30.
- Observations:

| lg  | time  | outer iterations | sparsity | accuracy     | notes           |
|-----|-------|------------------|----------|--------------|-----------------|
| 0.1 | 2208s | 5                | 6.6%     | 78 (+/- 3.8) |                 |
| 0.5 | 1860s | 4                | 8.5%     | 77 (+/- 4.9) |                 |
| 1   | 2118s | 5                | 8.2%     | 78 (+/- 6.7) |                 |
| 5   | 1024s | 10 (max?)        | 40%      | 77 (+/- 6.1) | bad convergence |
| 10  | 1010s | 10 (max?)        | 49.8%    | 76 (+/- 6.0) | bad convergence |
| 20  | 973s  | 10 (max?)        | 64.3%    | 75 (+/- 5.5) | bad convergence |
| 50  | 989s  | 10 (max?)        | 70%      | 73 (+/- 4.5) | bad convergence |

- Conclusions:
- Atoms are all similar because we use the same initialization (fixed RNG seed), except for the runs which did not converge.
- lg seems to have some impact, larger than ld.
- Problem: the Dirichlet energy does not count for much, as it is two orders of magnitude smaller than the other terms of the objective.

- (g) Influence of lg, small dataset (500 songs, 149 frames, m=128), rtol=1e-3.
- Observations:

| lg  | time | outer iterations | sparsity | accuracy     |
|-----|------|------------------|----------|--------------|
| 1   | 292s | 3                | 10.3%    | 72 (+/- 4.6) |
| 10  | 330s | 3                | 17.1%    | 69 (+/- 5.4) |
| 100 | 120s | 1                | 78.5%    | 55 (+/- 5.9) |

- (f) More efficient computation of \(tr(ZLZ^T)\) (the three variants are sketched below).
- Original `np.trace(Z.dot(L.dot(Z.T)))`: 286s, 72 (+/- 5.2), single eval 49.6 µs
- Hadamard product `np.multiply(L.dot(Z.T), Z.T).sum()`: 282s, 72 (+/- 6.9), single eval 34.6 µs
- Einstein summation `np.einsum('ij,ji->', L.dot(Z.T), Z)`: 279s, 73 (+/- 5.5), single eval 30.6 µs
- Speed increase: 2%
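
A minimal, self-contained check (random stand-ins for `Z` and `L`, dimensions illustrative) that the three expressions compute the same quantity:

```python
import numpy as np

rng = np.random.default_rng(42)
m, N = 128, 500                    # atoms x frames (illustrative sizes)
Z = rng.standard_normal((m, N))    # stand-in for the sparse codes
L = rng.standard_normal((N, N))
L = (L + L.T) / 2                  # symmetric stand-in for the Laplacian

t1 = np.trace(Z.dot(L.dot(Z.T)))          # forms the full m-by-m product
t2 = np.multiply(L.dot(Z.T), Z.T).sum()   # Hadamard product, avoids it
t3 = np.einsum('ij,ji->', L.dot(Z.T), Z)  # Einstein summation, same idea

assert np.allclose(t1, t2) and np.allclose(t1, t3)
```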

- (e) Graph Laplacian as float32 (see the storage sketch below).
- HDF5 file size: from 50 MB to 40 MB (500 songs, 149 frames)
- float64: 296s, 72 (+/- 6.4)
- float32: 283s, 72 (+/- 8.1)
- Speed increase: 4%
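
A minimal sketch of the float32 storage, assuming a dense Laplacian in an HDF5 file (file and dataset names are illustrative, not the project's actual layout):

```python
import h5py
import numpy as np

L = np.eye(1000)  # float64 stand-in for the graph Laplacian

# Cast to float32 before writing: halves the file size and the I/O,
# with an accuracy difference within the cross-validation noise.
with h5py.File('graph.hdf5', 'w') as f:
    f.create_dataset('L', data=L.astype(np.float32))

with h5py.File('graph.hdf5', 'r') as f:
    L32 = f['L'][...]  # reads back as float32
```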

- (d) Runtime test.
- 500 songs, 149 frames, m=512, rtol=1e-5: 7902s, 74 (+/- 3.8)
- 500 songs, 149 frames, m=128, rtol=1e-3: 279s, 73 (+/- 5.5)

- (c) Test on 500 songs with \(\lambda_g=1\).
- Converges after 7 outer iterations too.
- Same objective values.
- The Dirichlet penalty is an order of magnitude lower than the other terms of the objective.
- A bit less sparse (because of the increased smoothing): 1.5% -> 1.8%.
- Atoms look similar.
- Accuracy increase of 1-2%: 80 (+/- 4.0) without normalization, 81 (+/- 3.4) with minmax normalization.
- Slower: feature extraction went from 5h30 to 8h15.

- (b) Introducing the Dirichlet energy in the objective, small dataset.
- (a) Fix the graph creation.