Discussed April 1st, presented by Vassilis, notes by Vassilis

### Summary

The paper we reviewed today [1] is not as close to our work as we initially thought. It is actually quite complicated and builds heavily on other work that is itself already complex. Some interesting points:

- The graph is given from outside, not created from the standard data matrix. It encodes what they call "context", which depends on the application: the context of a node is its neighboring nodes.
- It is a fairly simple neural network that passes both the input data and the embeddings (to be learned) through a non-linearity (ReLU, i.e. max(0, x)), and finally through a softmax to predict the labels. They only used one hidden layer.
- The graph is only used in learning the embeddings, but the embeddings are used both for graph context prediction AND for class label prediction (therefore both loss functions are used for learning the embeddings).
- For input queries that don’t reside on nodes of the graph, the embeddings can be computed as the outputs of a hidden layer fed with the input query.
- To train using the graph, they uniformly sample a short random walk on the graph, then pick two nodes from the walk that are at most d hops apart. This is done because enumerating all possible pairs is too expensive, and this sampling is more efficient for (block) stochastic gradient descent.
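The random-walk sampling in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the paper's actual code; `graph` (an adjacency dict), `walk_len`, and `d` are assumed names.

```python
import random

def sample_context_pair(graph, walk_len=10, d=3):
    """Sample one (node, context-node) pair, as described in the notes.

    graph: dict mapping each node to a list of its neighbors (assumed format).
    """
    # Start a uniform random walk from a uniformly chosen node.
    walk = [random.choice(list(graph))]
    for _ in range(walk_len - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    # Pick two positions on the walk at most d hops apart;
    # cheaper than enumerating all pairs, and suits SGD-style training.
    i = random.randrange(len(walk))
    j = random.randrange(max(0, i - d), min(len(walk), i + d + 1))
    return walk[i], walk[j]
```

Each sampled pair then serves as one positive example for the graph-context loss.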

### Notes

Explanations of word2vec:

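For reference, the graph-context loss mirrors the word2vec skip-gram objective with negative sampling: nodes play the role of words, and sampled walk neighbors the role of context words. A minimal sketch of that objective (all names illustrative, not the paper's implementation):

```python
import numpy as np

def skipgram_loss(emb, ctx, center, context, negatives):
    """Negative-sampling skip-gram loss for one (center, context) pair.

    emb: (V, D) array of center embeddings; ctx: (V, D) context embeddings;
    negatives: array of sampled negative node/word indices (assumed setup).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    # Pull the true pair together...
    pos = np.log(sigmoid(emb[center] @ ctx[context]))
    # ...and push randomly sampled negatives apart.
    neg = np.log(sigmoid(-emb[center] @ ctx[negatives].T)).sum()
    return -(pos + neg)  # minimized during training
```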
### References

[1] Yang, Z., Cohen, W., and Salakhutdinov, R. (2016). Revisiting Semi-Supervised Learning with Graph Embeddings. arXiv:1603.08861 [cs].