Benchmarking machine-readable vectors of chemical reactions on computed activation barriers

Puck van Gerwen^1,2,3, Ksenia R. Briling^1,2*, Yannick Calvino Alonso^1,2*, Malte Franke^2, Clemence Corminboeuf^1,2,3*

1 Institut des Sciences et Ingénierie Chimiques, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

2 Laboratory for Computational Molecular Design, Institute of Chemical Sciences and Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

3 National Center for Competence in Research-Catalysis (NCCR-Catalysis), École Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland

* Corresponding authors' emails: ksenia.briling@epfl.ch, yannick.calvinoalonso@epfl.ch, clemence.corminboeuf@epfl.ch

DOI: 10.24435/materialscloud:xd-10 [version v1]

Publication date: Oct 16, 2024

How to cite this record

Puck van Gerwen, Ksenia R. Briling, Yannick Calvino Alonso, Malte Franke, Clemence Corminboeuf, Benchmarking machine-readable vectors of chemical reactions on computed activation barriers, Materials Cloud Archive 2024.163 (2024), https://doi.org/10.24435/materialscloud:xd-10

Description

In recent years, there has been a surge of interest in predicting computed activation barriers to accelerate the automated exploration of reaction networks. Consequently, various predictive approaches have emerged, ranging from graph-based models to methods based on the three-dimensional structure of reactants and products. In tandem, many representations have been developed to predict experimental targets, which may hold promise for barrier prediction as well. Here, we bring together all of these efforts and benchmark various methods (Morgan fingerprints, the DRFP, the CGR representation-based Chemprop, SLATM_d, B²R²_l, EquiReact, and the language model BERT + RXNFP) for the prediction of computed activation barriers on three diverse datasets. This record includes data to support the article "Benchmarking machine-readable vectors of chemical reactions on computed activation barriers". It also supports the GitHub repository https://github.com/lcmd-epfl/benchmark-barrier-learning, which contains the code and a duplicate of the data.

Files

File name	Size	Description
datasets.tar.gz (MD5: 4a979a671d7fdbcdc99bbe4578c36a0f)	944.0 MiB	The tarball `datasets.tar.gz` contains three folders, one for each dataset used in the article. Each folder contains the geometries (xyz files), the SMILES and properties (CSV file), and the raw binary data (data splits, results, and fingerprints/representations). See README.txt for more information.
README.txt (MD5: f9d6e150a6b9bc932fcc7ca2bd535e97)	5.2 KiB	README
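
The MD5 checksums listed above can be used to confirm that a download is intact before unpacking it. The following is a minimal sketch, not part of the record itself: it assumes `datasets.tar.gz` has already been downloaded locally, and the helper names `md5sum` and `verify_and_extract` are hypothetical, introduced here only for illustration.

```python
import hashlib
import tarfile
from pathlib import Path

# MD5 checksum listed for datasets.tar.gz in the file table of this record.
EXPECTED_MD5 = "4a979a671d7fdbcdc99bbe4578c36a0f"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads avoid loading the ~944 MiB archive into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, destination):
    """Extract a gzip-compressed tarball only if its MD5 matches (hypothetical helper)."""
    actual = md5sum(archive)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(destination)
    return Path(destination)
```

With the archive in the current directory, `verify_and_extract("datasets.tar.gz", EXPECTED_MD5, "datasets")` would unpack it into a local `datasets` folder (an assumed destination name), or raise `ValueError` if the download is corrupted. The layout of the extracted folders is described in README.txt.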

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

External references

Journal reference

P. van Gerwen, K. R. Briling, Y. Calvino Alonso, M. Franke, and C. Corminboeuf, Digital Discovery 3, 932–943 (2024) doi:10.1039/D3DD00175J

Keywords

chemical reactions, machine learning, benchmark, EPFL, MARVEL/P2, SNSF, ERC

Version history:

2024.163 (version v1) [This version]	Oct 16, 2024	DOI: 10.24435/materialscloud:xd-10
