Goal: final review of the optimization schemes. We now move on implementation (toward ISMIR paper) and will come back to optimization later.

### Achieved

- Review of the proposed optimization scheme by Xavier which uses primal-dual (Chambolle-Pock) [5] for the three convex minimizations.
- Complete
*batch training* optimization scheme:
- \(\min_D\) : FISTA [3].
- \(\min_Z\) : FISTA with adaptive restart [4].
- \(\min_E\) : A forward-backward based primal-dual formulation which exploits the gradient (unlike Chambolle-Pock). It is a generalization of Chambolle-Pock and ISTA [2]. Unlike these two, the method is however not accelerated. There exists variants for this purpose.

- Characteristics:
- No trick, the formulation fits each solver.
- Optimal convergence in \(O(1/k^2)\) for \(Z\) and \(D\).
- Linear convergence for \(Z\) after some iterations thanks to adaptive restart.
- All available gradients are exploited.
- Each minimization is distributable, i.e. the sub-problems are independent. The matrices only need to be reunified in the outer loop.
- Easy to add the \(tr(ZLZ^T)\) regularization. We would however loose the parallelization ability by definition of the problem (smooth gradient for each dimension of the sparse code).

- The paper git repository is now on GitLab (was previously local).

### Discussion

- Why not set \(E = D^T\) ? See mathematically or with experiments if it is equivalent.
- Why constraints on rows of \(E\) ? It was a mistake by Xavier, they should be on columns.
- Discussion of my optimization scheme.
- How to clearly states rows and columns of matrices ?
- pyGSP status: can create KNN graphs and compute the graph Laplacian. There is some unit tests but it is preferable to compare with matlab.
- Literature reviews:
- Are other people working on clean optimization schemes for deep learning ?
- Optimization schemes for sparse coding. Does everybody use FISTA for Z ? What are they using for D ?

- Agenda:
- Auto-encoder prototype for images (easier to have some feelings).
- Setup: pre-processing and post-processing.
- Merge 1 and 2: results with genre recognition.
- Interpret results (classification accuracy, learned basis).
- Write paper (6+1 pages): abstract April 27th, deadline May 4th.

### Next week

- Start the implementation of a prototype in Python for images.
- Write down the graph definition.
- Prepare a new revision of the paper for review.
- Continue to read the amassed papers.

### Notes

- Another convex optimization algorithm by Nesterov [1].
- Nice discussion of duality with algorithms in [2].
- Other methods to investigate for \(\min_E\): SDMM (a parallel generalization of ADMM) [6], ppxa [7], generalized forward-backward [8]. Ils sont lents, mais pas de convergence optimale possible de toute manière. Autre solution: enlever \(||EX||_1\) (et résoudre avec FISTA), la contrainte (et résoudre avec Chambolle-Pock accéléré) ou les deux (et résoudre par inversion de matrice ou gradient descent). Convergence rapide dans tous les cas.
- Xavier: the Lipschitz constant is less important for Chambolle-Pock than it is for FISTA.

### References

[1] Nesterov, Y. (2007). Gradient methods for minimizing composite objective function.

[2] Komodakis, N., & Pesquet, J. C. (2014). Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. *arXiv preprint arXiv:1406.5429*.

[3] Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. *SIAM Journal on Imaging Sciences*, *2*(1), 183-202.

[4] O’Donoghue, B., & Candes, E. (2013). Adaptive restart for accelerated gradient schemes. *Foundations of computational mathematics*, 1-18.

[5] Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. *Journal of Mathematical Imaging and Vision*, *40*(1), 120-145.

[6] Pesquet, J. C., & Pustelnik, N. (2012). A parallel inertial proximal optimization method. *Pacific Journal of Optimization*, *8*(2), 273-305.

[7] Combettes, P. L., & Pesquet, J. C. (2008). A proximal decomposition method for solving convex variational inverse problems. *Inverse problems*, *24*(6), 065014.

[8] Raguet, H., Fadili, J., & Peyré, G. (2013). A generalized forward-backward splitting. *SIAM Journal on Imaging Sciences*, *6*(3), 1199-1226.