Exploring the design space of machine-learning models for quantum chemistry with a fully differentiable framework


JSON Export

{
  "id": "2715", 
  "created": "2025-06-02T15:49:39.503639+00:00", 
  "updated": "2025-06-02T19:42:28.910552+00:00", 
  "revision": 2, 
  "metadata": {
    "_files": [
      {
        "key": "dataset.tar.xz", 
        "size": 1156517180, 
        "description": "Dataset of organic molecules of varying complexity along with scripts to train the model and reproduce the figures", 
        "checksum": "md5:86a870be93068bd7e2bb5a2524524827"
      }, 
      {
        "key": "README.md", 
        "size": 5895, 
        "description": "README file with the description of the structure of the files stored in dataset.tar.xz", 
        "checksum": "md5:6950ed956e59102be34dce9718396e36"
      }
    ], 
    "license_addendum": null, 
    "edited_by": 576, 
    "publication_date": "Jun 02, 2025, 21:42:28", 
    "contributors": [
      {
        "email": "divya.suman@epfl.ch", 
        "affiliations": [
          "Laboratory of Computational Science and Modeling, Institut des Mat\u00e9riaux, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "givennames": "Divya", 
        "familyname": "Suman"
      }, 
      {
        "affiliations": [
          "Laboratory of Computational Science and Modeling, Institut des Mat\u00e9riaux, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "givennames": "Jigyasa", 
        "familyname": "Nigam"
      }, 
      {
        "email": "sandra.saade@epfl.ch", 
        "affiliations": [
          "Laboratory of Computational Science and Modeling, Institut des Mat\u00e9riaux, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "givennames": "Sandra", 
        "familyname": "Saade"
      }, 
      {
        "email": "paolo.pegolo@epfl.ch", 
        "affiliations": [
          "Laboratory of Computational Science and Modeling, Institut des Mat\u00e9riaux, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "givennames": "Paolo", 
        "familyname": "Pegolo"
      }, 
      {
        "affiliations": [
          "Laboratory of Computational Science and Modeling, Institut des Mat\u00e9riaux, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland"
        ], 
        "givennames": "Hanna", 
        "familyname": "Tuerk"
      }, 
      {
        "affiliations": [
          "Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA"
        ], 
        "givennames": "Xing", 
        "familyname": "Zhang"
      }, 
      {
        "affiliations": [
          "Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA"
        ], 
        "givennames": "Garnet", 
        "familyname": "Kin-Lic Chan"
      }, 
      {
        "affiliations": [
          "Laboratory of Computational Science and Modeling, Institut des Mat\u00e9riaux, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL), 1015 Lausanne, Switzerland", 
          "Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA"
        ], 
        "givennames": "Michele", 
        "familyname": "Ceriotti"
      }
    ], 
    "description": "Traditional atomistic machine learning (ML) models serve as surrogates for quantum mechanical (QM) properties, predicting quantities such as dipole moments and polarizabilities directly from compositions and geometries of atomic configurations. With the emergence of ML approaches to predict the \u201cingredients\u201d of a QM calculation, such as the ground state charge density or the effective single-particle Hamiltonian, it has become possible to obtain multiple properties through analytical physics-based operations on these intermediate ML predictions. We present a framework to seamlessly integrate the prediction of an effective electronic Hamiltonian, for both molecular and condensed-phase systems, with PySCFAD, a differentiable QM workflow. This integration facilitates training models indirectly against functions of the Hamiltonian such as electronic energy levels, dipole moments, polarizability, etc. We then use this framework to explore various possible choices within the design space of hybrid ML/QM models, examining the influence of incorporating multiple targets on model performance and learning a reduced-basis ML Hamiltonian that can reproduce targets computed from a much larger basis. Our benchmarks evaluate the accuracy and transferability of these hybrid models, compare them against predictions of atomic properties from their surrogate models, and provide indications to guide the design of the interface between the ML and QM components of the model. For our benchmarks we have used a subset of the QM7 and QM9 datasets as well as two extrapolative datasets for long chain polyalkenes/acenes and polyenoic acid series, along with a small graphene dataset.", 
    "id": "2715", 
    "owner": 1784, 
    "references": [
      {
        "comment": "Preprint where the data is discussed", 
        "url": "https://arxiv.org/abs/2504.01187", 
        "type": "Preprint", 
        "citation": "D. Suman, J. Nigam, S. Saade, P. Pegolo, H. Tuerk, X. Zhang, G.K. Chan, and M. Ceriotti (2025). arXiv preprint.", 
        "doi": "arXiv:2504.01187"
      }, 
      {
        "comment": "Software used to generate machine learning outputs", 
        "url": "https://github.com/curiosity54/mlelec/tree/qm7", 
        "type": "Software", 
        "citation": "J. Nigam, P. Pegolo, and M. Ceriotti, \u201cIntegrating ML for Hamiltonian and electronic structure,\u201d https://github.com/curiosity54/mlelec (2024)."
      }
    ], 
    "version": 1, 
    "license": "Creative Commons Attribution 4.0 International", 
    "status": "published", 
    "title": "Exploring the design space of machine-learning models for quantum chemistry with a fully differentiable framework", 
    "keywords": [
      "Hamiltonian", 
      "machine learning", 
      "Automatic-differentiation"
    ], 
    "doi": "10.24435/materialscloud:mg-8f", 
    "mcid": "2025.92", 
    "is_last": true, 
    "_oai": {
      "id": "oai:materialscloud.org:2715"
    }, 
    "conceptrecid": "2714"
  }
}
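
The `_files` entries in the record above list a size and an `md5:`-prefixed checksum for each file, which can be used to check a download for corruption. The following is a minimal sketch, assuming the files have already been saved locally under the names given in `key`. The helper names `file_digest` and `verify_entry` and the `RECORD_FILES` list are illustrative and not part of any Materials Cloud tooling; the sizes and checksums are copied from the record.

```python
import hashlib
from pathlib import Path

# Sizes and checksums copied from the "_files" list of the record above.
RECORD_FILES = [
    {"key": "dataset.tar.xz", "size": 1156517180,
     "checksum": "md5:86a870be93068bd7e2bb5a2524524827"},
    {"key": "README.md", "size": 5895,
     "checksum": "md5:6950ed956e59102be34dce9718396e36"},
]


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so a large archive is never fully in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_entry(entry, directory="."):
    """Return True if the local file matches the entry's size and checksum."""
    path = Path(directory) / entry["key"]
    if not path.is_file():
        return False
    # Cheap size check first; it avoids hashing an obviously truncated file.
    if path.stat().st_size != entry["size"]:
        return False
    # The checksum field has the form "<algorithm>:<hex digest>".
    algorithm, _, expected = entry["checksum"].partition(":")
    return file_digest(path, algorithm) == expected.lower()


if __name__ == "__main__":
    # Assumption: the downloaded files sit in the current working directory.
    for entry in RECORD_FILES:
        status = "ok" if verify_entry(entry) else "missing or mismatched"
        print(f"{entry['key']}: {status}")
```

MD5 is used here only because it is the algorithm the record declares; it detects accidental corruption but is not a guarantee against deliberate tampering.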