Meeting 12 (19.06)

Goal: report and results review.

Achieved

  • Exploit the idea of the structuring auto-encoder as a hybrid between a sparse and a denoising auto-encoder.
  • Disable majority voting: smaller variance due to increased number of samples to classify (i.e. 12 times more). The accuracy increases: it is easier to classify feature vectors than whole clips.
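
The two evaluation modes mentioned above can be sketched as follows. This is only an illustration, not the project's code: the helper names and the toy labels are hypothetical, and the only figure taken from the notes is the 12 feature vectors per clip. The toy numbers are not results.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the feature vectors of one clip."""
    return Counter(labels).most_common(1)[0][0]

def accuracy(predicted, truth):
    """Fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Toy data (hypothetical): 2 clips, 12 feature-vector predictions each.
clip_truth = ["rock", "jazz"]
vector_preds = [
    ["rock"] * 7 + ["jazz"] * 5,   # clip 0: 7 of 12 vectors are correct
    ["jazz"] * 5 + ["rock"] * 7,   # clip 1: 5 of 12 vectors are correct
]

# With majority voting: one decision (one sample) per clip.
clip_preds = [majority_vote(v) for v in vector_preds]
acc_clips = accuracy(clip_preds, clip_truth)

# Without majority voting: every feature vector is a sample (12 times more).
flat_preds = [p for v in vector_preds for p in v]
flat_truth = [t for t, v in zip(clip_truth, vector_preds) for _ in v]
acc_vectors = accuracy(flat_preds, flat_truth)
```

With voting, a single clip flips the accuracy by a whole 1/N step; without it, the estimate is averaged over 12 times more samples, which is where the smaller variance comes from.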

Main results

  • Introducing the graph Laplacian makes the model very robust against noise.
  • Improvement of 7% in a noisy or noiseless environment. However, in the noiseless environment, the addition of the graph is not significant.
  • The kNN approximation influences the constructed graph and changes the accuracy by at least 1.3%. There is room for better performance with a better constructed graph. The quality of the FLANN approximation should be investigated.
  • The hyper-parameters do not seem to be very sensitive, i.e. we may not have to fine-tune them.
  • Speed:
    • Noiseless: graph generation is fast (~200s), feature extraction is slow (~2700s)
    • Noisy 10%: graph generation is slow (~1200s), feature extraction is fast (~900s)
    • Classification: ~400s with graph, ~700s without graph.
  • The encoder is adding structure.
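
A minimal sketch of the quantities behind these results: a kNN graph, its Laplacian, and the Dirichlet energy used to judge how much structure a representation has. Several choices here are assumptions and not taken from the notes: the search is exact (brute force) rather than the FLANN approximation, the edge weights use a Gaussian kernel, and the Laplacian is the combinatorial one, $L = D - W$. All function names are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric kNN weight matrix with Gaussian kernel weights (exact search)."""
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    np.fill_diagonal(d2, np.inf)              # exclude self-loops
    idx = np.argsort(d2, axis=1)[:, :k]       # k nearest neighbours per sample
    rows = np.repeat(np.arange(X.shape[0]), k)
    cols = idx.ravel()
    sigma2 = np.mean(d2[rows, cols])          # kernel width (assumption)
    W = np.zeros_like(d2)
    W[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return np.maximum(W, W.T)                 # symmetrize

def laplacian(W):
    """Combinatorial graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def dirichlet_energy(Z, L):
    """tr(Z^T L Z): small when Z varies little across connected samples."""
    return np.trace(Z.T @ L @ Z)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))              # toy data, 50 samples
W = knn_graph(X, k=5)
L = laplacian(W)
energy = dirichlet_energy(X, L)
```

Comparing `dirichlet_energy` of the input with that of the encoder output, on the same graph, is one way to check whether the encoder adds structure: a lower energy means a smoother representation on the graph.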

Idea

  • It works best when all the objectives have an equivalent value, which suggests auto-tuning the hyper-parameters. This can be approximately assessed after one outer iteration already.
  • The higher the $\lambda$, the higher the importance of the sub-objective. Try to give equal weights.
  • Train the auto-encoder (graph, dictionary, encoder) on clean data and test on noisy data.
  • Go further and classify individual frames? That would mean discarding feature aggregation.
  • Use non-linear dimensionality reduction (NLDR) to assess whether there is really an underlying structure or manifold in the data?
  • It would be interesting to launch multiple simulations with the same hyper-parameters, without seeding the RNG, to see if the results are stable.
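
The auto-tuning idea from the first two points could look like the sketch below: measure each unweighted sub-objective after one outer iteration, then pick the $\lambda$ values so that every weighted term has the same value. The term names, their values and the `balance_weights` helper are hypothetical; the sketch also assumes all measured values are strictly positive.

```python
def balance_weights(values, reference, ref_weight=1.0):
    """Weights such that weight * value is equal across all sub-objectives.

    values:     unweighted sub-objective values (assumed strictly positive)
    reference:  name of the term whose weight is fixed to ref_weight
    """
    target = ref_weight * values[reference]
    return {name: target / v for name, v in values.items()}

# Hypothetical unweighted sub-objective values after one outer iteration.
values = {"data": 4.0e3, "sparsity": 2.0e4, "encoder": 5.0e2, "graph": 1.0e3}

lambdas = balance_weights(values, reference="data")
weighted = {name: lambdas[name] * values[name] for name in values}
```

Since the sub-objectives change during optimization, the balance only holds approximately afterwards; the weights could be re-estimated after each outer iteration if that turns out to matter.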

Experiments

  • Keep smallest eigenvalues.
  • Hyper-parameters fine-tuning.
  • Increase sparsity in the presence of noise.
  • Standardization instead of scaling.
  • Accuracy results to be published (report, presentation).
    • (k) Without noise.
    • (k) With 20% noise.
  • (j) Encoder.
  • Performance stability (over various sources of randomness).
    • (i) Distance metric.
    • Zero mean for Euclidean.
  • (h) Graph weight lg, noiseless setting.
  • Training / testing ratio.
    • (g) Without graph, without noise.
    • (g) With graph, without noise.
  • Graph vs no graph vs spectrograms.
    • (f) With noise.
    • (f) Without noise.
  • Better graph
    • Scaling (features, samples, dataset). minmax vs std.
    • Disable feature scaling. Then scale the noise, and verify tolerance.
    • (e) Distance metric.
  • (d) Noise level.
  • (c) Classify individual feature vectors, i.e. no majority voting.
  • (b) Understand the influence of the graph in a noisy data setting. Warning: the graph was created with noiseless data.
  • (a) Noisy signal hypothesis.
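
For the "standardization instead of scaling" and "minmax vs std" experiments, the two per-feature transformations can be sketched as below. This is a generic illustration with hypothetical helper names and toy data, not the preprocessing code of the project.

```python
import numpy as np

def minmax_scale(X):
    """Rescale each feature (column) to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Constant features are left at 0 instead of dividing by zero.
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def standardize(X):
    """Give each feature (column) zero mean and unit variance."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)

# Toy data: 4 samples, 2 features; the first feature has one extreme value.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 200.0],
              [100.0, 400.0]])

X_mm = minmax_scale(X)
X_std = standardize(X)
```

Min-max scaling is determined entirely by the two extreme values of each feature, so a single outlier squeezes the remaining samples together, whereas standardization depends on the mean and variance of all samples.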

Discussion

  • In the end, the result will be a plot of baseline, no graph and graph w.r.t. noise level. Conclusion: preserving the structure (1) extracts better features and (2) is robust to noise.
  • What if each feature is normalized independently? It removes the bias toward features with large variations, but is that what we want? Xavier: “yes, in the general case.”
  • Xavier: “we can apply the well-known PageRank regularization, which is very robust to badly constructed graphs”.
  • Interesting fact: the encoder seems to add structure, because the Dirichlet energy decreases.
  • Xavier: “yes, but you must not forget that you are doing *transductive* learning, i.e. you learn the features using training + TEST data. That is why you do not get a significant improvement. If we were doing *supervised* learning, i.e. using only the TRAINING data, the problem would be much more challenging than the transductive problem, and there I think our results with graph would be much better! This is a comment I advise you to add to your result to put it in perspective. Also, the first thing to do after the PDM (master project) is *supervised* learning with graph, compared with no graph; I think we will get some (good) surprises!”
  • Is there anything we can say about good enough local minima? Xavier: “yes, there can be local minima that are excellent solutions to learning problems.”
  • Or that many local minima are actually similar and we don’t care in which we fall? Xavier: “NO! a bad solution and a good solution can have the same energy!”
  • Why are energy formulations good? Xavier: “Many reasons, such as good understanding, robustness, existence of solutions, design and analysis of optimization algorithms…”
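
The difference between the two protocols Xavier contrasts can be sketched as follows. PCA is used here purely as a stand-in for the unsupervised feature learning (it is not the auto-encoder from these notes), and the data, split sizes and function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Stand-in for unsupervised feature learning: mean and principal directions."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def transform(X, model):
    """Project samples onto the learned directions."""
    mean, components = model
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))    # toy data, no labels needed here
X_train, X_test = X[:70], X[70:]

# Transductive: the feature model is learned on training + test samples.
model_trans = fit_pca(np.vstack([X_train, X_test]), n_components=3)

# Training data only: the test samples stay unseen until evaluation.
model_train = fit_pca(X_train, n_components=3)

Z_test_trans = transform(X_test, model_trans)
Z_test_train = transform(X_test, model_train)
```

In both cases the classifier is trained on labeled training features only; what changes is whether the unlabeled test samples contribute to the learned representation (and, in this project, to the graph).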

Michaël Defferrard

I am currently pursuing master studies in Information Technologies at EPFL. My master project, conducted at the LTS2 Signal Processing laboratory led by Prof. Pierre Vandergheynst, is about audio classification with structured deep learning. I previously devised an image inpainting algorithm. It used a non-local patch graph representation of the image and a structure detector which leverages the graph representation and influences the fill-order of the exemplar-based algorithm. I've been a Research Assistant in the lab, where I investigated Super Resolution methods for Mass Spectrometry. I develop PyUNLocBoX, a convex optimization toolbox in Python.
