This repository contains all the datasets (in the extxyz format of ASE) and the trained models for "Prediction rigidities for data-driven chemistry" by Chong et al., a contribution to Faraday Discussions: Data-driven discovery in the chemical sciences.

- Manuscript can be found here:
- Model training inputs and analysis details can be found here:

Files are organized according to the sections of the manuscript:

- `section_3`: contains (1) the QM9 dataset, organized into training (full or subsampled), validation, and test sets, along with Si10 clusters and their energies and forces, organized in the same way. We do not claim any ownership of the QM9 data; (2) the SOAP-BPNN, PaiNN, and MACE models trained on these datasets. Details of how the models were trained can be found in the associated GitHub repository:
- `section_4a`: contains the amorphous carbon structures of Deringer et al., Phys. Rev. B, 2017, organized into the density splits used for the analysis. We do not claim any ownership of the data.
- `section_4b`: contains the carbon structures of Deringer et al., Phys. Rev. B, 2017, organized into the training and test splits used for the analysis. We do not claim any ownership of the data.
- `section_5a`: contains the silicon dimer to pentamer configurations, with their energies and forces, used for the CPR analysis.
- `section_5b`: contains the water dimer configurations at separations between 3 and 10 Å used for the CPR analysis.
- `section_6`: contains the raw all-atom MD trajectory from GROMACS that was prepared for the coarse-grained ML model training, as well as the frames in which the water molecules are represented as beads, organized into the training, validation, and test sets used for the coarse-grained MACE model training.
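
Because the datasets are stored as extxyz files, it can be handy to check how many frames a file holds before loading it into a full toolchain. The following is a minimal, standard-library-only sketch and not the loader used for the manuscript. It assumes each frame consists of an atom-count line, one comment line, and then one line per atom. The file name `demo.xyz`, its contents, and the helper `iter_extxyz_frames` are hypothetical and only serve as an illustration.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_extxyz_frames(path: Path) -> Iterator[Tuple[str, List[str]]]:
    """Yield (comment_line, atom_lines) for each frame of an extxyz file."""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file, or a blank line where a frame should start
            n_atoms = int(header)  # first line of a frame: number of atoms
            comment = handle.readline().rstrip("\n")  # key=value metadata line
            atom_lines = [handle.readline().rstrip("\n") for _ in range(n_atoms)]
            yield comment, atom_lines


# Hypothetical two-frame file standing in for one of the repository's datasets.
DEMO = """\
2
Properties=species:S:1:pos:R:3 energy=-1.0
Si 0.0 0.0 0.0
Si 2.3 0.0 0.0
3
Properties=species:S:1:pos:R:3 energy=-2.5
Si 0.0 0.0 0.0
Si 2.3 0.0 0.0
Si 1.15 2.0 0.0
"""

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.xyz"
    demo_path.write_text(DEMO)
    frames = list(iter_extxyz_frames(demo_path))

# Number of atoms in each frame.
atom_counts = [len(atom_lines) for _, atom_lines in frames]
print(atom_counts)  # prints [2, 3]
```

For actual analysis, ASE itself is the more robust choice: with ASE installed, `ase.io.read(path, index=":", format="extxyz")` returns every frame as an `Atoms` object, including the parsed per-frame and per-atom properties that this sketch leaves as raw text.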