Supervision: Daniel Probst

Project type: Semester project (master), Master thesis

Available

What's it about. Machine learning on big chemical data has recently produced intriguing results and has been shown to be capable of predicting the outcomes of chemical and biological reactions, the toxicity of compounds, or the potential of a newly synthesised molecule to become a drug. In the chemical and biological sciences, molecules are often represented by graphs in which vertices represent atoms and edges represent the covalent bonds between them. These molecular graphs can then either be used directly as inputs for machine learning methods such as graph neural networks (GNNs), or be further transformed into strings to become input for natural language processing (NLP) methods. While this graph representation is sufficient for common organic molecules, it fails to represent other chemical entities such as polymers or metal complexes.
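To make the graph view above a bit more concrete, here is a minimal Python sketch of a molecular graph and a naive graph-to-string step. The class name `MolecularGraph` and its methods are hypothetical and chosen only for illustration. The string writer handles acyclic molecules only and merely resembles SMILES; it is not a SMILES implementation and not part of any existing toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class MolecularGraph:
    """Toy molecular graph (hypothetical): vertices are atoms, edges are covalent bonds."""

    atoms: list = field(default_factory=list)   # element symbol per vertex
    bonds: dict = field(default_factory=dict)   # (i, j) with i < j -> bond order

    def add_atom(self, symbol: str) -> int:
        """Add a vertex and return its index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        """Add an undirected edge between atoms i and j."""
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int) -> list:
        """Indices of all atoms bonded to atom i, in ascending order."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))

    def to_string(self, start: int = 0) -> str:
        """Depth-first traversal; side branches go in parentheses.

        Assumption: the graph is acyclic (no ring handling).
        """
        seen = set()
        bond_symbol = {1: "", 2: "=", 3: "#"}

        def visit(i: int) -> str:
            seen.add(i)
            out = self.atoms[i]
            children = [n for n in self.neighbors(i) if n not in seen]
            for k, n in enumerate(children):
                order = self.bonds[(min(i, n), max(i, n))]
                part = bond_symbol[order] + visit(n)
                # The last child continues the main chain; earlier ones are branches.
                out += part if k == len(children) - 1 else f"({part})"
            return out

        return visit(start)


# Ethanol: C-C-O (hydrogens left implicit)
ethanol = MolecularGraph()
c1, c2, o = (ethanol.add_atom(s) for s in ("C", "C", "O"))
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)

# Acetic acid: C-C(=O)-O
acetic_acid = MolecularGraph()
a0, a1, a2, a3 = (acetic_acid.add_atom(s) for s in ("C", "C", "O", "O"))
acetic_acid.add_bond(a0, a1)
acetic_acid.add_bond(a1, a2, order=2)
acetic_acid.add_bond(a1, a3)

print(ethanol.to_string())      # CCO
print(acetic_acid.to_string())  # CC(=O)O
```

Every edge in this sketch joins exactly two atoms and carries an integer bond order. That assumption is convenient for small organic molecules, and it is the kind of built-in restriction a more universal representation would have to relax.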

Project Goal. The goal of this project is to develop a universal representation for molecules that encompasses the diverse space of chemical entities, either by generalizing an existing approach or by establishing a new one. The resulting representation will then be applied to a chemical data set and evaluated using an existing neural network architecture such as a transformer or a graph neural network. This project can lead to a publication.

Profile. You're a computer scientist or mathematician with an interest in the natural sciences, or a chemist or biologist with an interest in computer science. Experience in programming (Python and/or a C-style language) and machine learning is an advantage.

Supervisor. Daniel Probst. I'm a computer scientist with experience in biology and chemistry and a great interest in bringing people from diverse backgrounds together to create cool science.

Contact. If you're interested or have any questions, do not hesitate to contact me by e-mail at daniel.probst@epfl.ch.