Goal: final review of the optimization schemes. We now move on implementation (toward ISMIR paper) and will come back to optimization later.
- Review of the proposed optimization scheme by Xavier which uses primal-dual (Chambolle-Pock)  for the three convex minimizations.
- Complete batch training optimization scheme:
- \(\min_D\) : FISTA .
- \(\min_Z\) : FISTA with adaptive restart .
- \(\min_E\) : A forward-backward based primal-dual formulation which exploits the gradient (unlike Chambolle-Pock). It is a generalization of Chambolle-Pock and ISTA . Unlike these two, the method is however not accelerated. There exists variants for this purpose.
- No trick, the formulation fits each solver.
- Optimal convergence in \(O(1/k^2)\) for \(Z\) and \(D\).
- Linear convergence for \(Z\) after some iterations thanks to adaptive restart.
- All available gradients are exploited.
- Each minimization is distributable, i.e. the sub-problems are independent. The matrices only need to be reunified in the outer loop.
- Easy to add the \(tr(ZLZ^T)\) regularization. We would however loose the parallelization ability by definition of the problem (smooth gradient for each dimension of the sparse code).
- The paper git repository is now on GitLab (was previously local).
- Why not set \(E = D^T\) ? See mathematically or with experiments if it is equivalent.
- Why constraints on rows of \(E\) ? It was a mistake by Xavier, they should be on columns.
- Discussion of my optimization scheme.
- How to clearly states rows and columns of matrices ?
- pyGSP status: can create KNN graphs and compute the graph Laplacian. There is some unit tests but it is preferable to compare with matlab.
- Literature reviews:
- Are other people working on clean optimization schemes for deep learning ?
- Optimization schemes for sparse coding. Does everybody use FISTA for Z ? What are they using for D ?
- Auto-encoder prototype for images (easier to have some feelings).
- Setup: pre-processing and post-processing.
- Merge 1 and 2: results with genre recognition.
- Interpret results (classification accuracy, learned basis).
- Write paper (6+1 pages): abstract April 27th, deadline May 4th.
- Start the implementation of a prototype in Python for images.
- Write down the graph definition.
- Prepare a new revision of the paper for review.
- Continue to read the amassed papers.
- Another convex optimization algorithm by Nesterov .
- Nice discussion of duality with algorithms in .
- Other methods to investigate for \(\min_E\): SDMM (a parallel generalization of ADMM) , ppxa , generalized forward-backward . Ils sont lents, mais pas de convergence optimale possible de toute manière. Autre solution: enlever \(||EX||_1\) (et résoudre avec FISTA), la contrainte (et résoudre avec Chambolle-Pock accéléré) ou les deux (et résoudre par inversion de matrice ou gradient descent). Convergence rapide dans tous les cas.
- Xavier: the Lipschitz constant is less important for Chambolle-Pock than it is for FISTA.
 Nesterov, Y. (2007). Gradient methods for minimizing composite objective function.
 Komodakis, N., & Pesquet, J. C. (2014). Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. arXiv preprint arXiv:1406.5429.
 Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183-202.
 O’Donoghue, B., & Candes, E. (2013). Adaptive restart for accelerated gradient schemes. Foundations of computational mathematics, 1-18.
 Chambolle, A., & Pock, T. (2011). A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1), 120-145.
 Pesquet, J. C., & Pustelnik, N. (2012). A parallel inertial proximal optimization method. Pacific Journal of Optimization, 8(2), 273-305.
 Combettes, P. L., & Pesquet, J. C. (2008). A proximal decomposition method for solving convex variational inverse problems. Inverse problems, 24(6), 065014.
 Raguet, H., Fadili, J., & Peyré, G. (2013). A generalized forward-backward splitting. SIAM Journal on Imaging Sciences, 6(3), 1199-1226.