The tarball `datasets.tar.gz` contains three folders, one for each dataset used in the article. Each folder contains the geometries (xyz files), the SMILES and properties (CSV files), and the raw binary data (data splits, results, and fingerprints/representations).

./cyclo:
    full_dataset.csv                    full dataset and target properties
    dataset_subset_750.csv              subset splitting and properties
    B2R2(l)-model
        b2r2_l_10_fold.npy              results on the 10-fold cross-validation data splits
        b2r2_l_10_fold_xtb.npy          results on the 10-fold cross-validation data splits (xTB geometries)
        b2r2_l.npy                      representations for the full dataset
        b2r2_l_xtb.npy                  representations for the full dataset (xTB geometries)
    DRFP-model
        drfp_10_fold.npy                results on the 10-fold cross-validation data splits
        drfp.npy                        representations for the full dataset
    MFP-model
        mfp_10_fold.npy                 results on the 10-fold cross-validation data splits
        mfp.npy                         representations for the full dataset
    SLATM-model
        slatm_10_fold.npy               results on the 10-fold cross-validation data splits
        slatm_10_fold_xtb.npy           results on the 10-fold cross-validation data splits (xTB geometries)
    Geometries
        xyz                             DFT-level geometries
        xyz-xtb                         xTB-level geometries

./gdb7-22-ts:
    ccsdtf12_dz.csv                     CCSD-level computed data and target properties
    ccsdtf12_dz_subset_750.csv          subset of the CCSD-level computed data and target properties
    tr_sizes.npy                        training sizes for each split
    B2R2(l)-model
        b2r2_l_10_fold.npy              results on the 10-fold cross-validation data splits
        b2r2_l_10_fold_xtb.npy          results on the 10-fold cross-validation data splits (xTB geometries)
        b2r2_l.npy                      representations for the full dataset
        b2r2_l_xtb.npy                  representations for the full dataset (xTB geometries)
    DRFP-model
        drfp_10_fold.npy                results on the 10-fold cross-validation data splits
        drfp.npy                        representations for the full dataset
    MFP-model
        mfp_10_fold.npy                 results on the 10-fold cross-validation data splits
        mfp.npy                         representations for the full dataset
    SLATM-model
        slatm_10_fold.npy               results on the 10-fold cross-validation data splits
        slatm_10_fold_xtb.npy           results on the 10-fold cross-validation data splits (xTB geometries)
    Geometries
        xyz                             DFT-level geometries
        xyz-xtb                         xTB-level geometries

./proparg:
    data.csv                            full dataset and target properties
    data_fixarom_smiles.csv             fixed aromaticity
    data_fixarom_smiles_stereo.csv      fixed stereochemistry
    data_subset_750.csv                 subset splitting
    B2R2(l)-model
        b2r2_l_10_fold.npy              results on the 10-fold cross-validation data splits
        b2r2_l_10_fold_xtb.npy          results on the 10-fold cross-validation data splits (xTB geometries)
        b2r2_l.npy                      representations for the full dataset
        b2r2_l_xtb.npy                  representations for the full dataset (xTB geometries)
    DRFP-model
        drfp.npy                        representations for the full dataset
        drfp_10_fold.npy                results on the 10-fold cross-validation data splits
        drfp_combinatorial.npy          representations for the full dataset
        drfp_combinatorial_10_fold.npy  results on the 10-fold cross-validation data splits
        drfp_stereo.npy                 representations for the full dataset (including stereochemistry)
        drfp_stereo_10_fold.npy         results on the 10-fold cross-validation data splits
    MFP-model
        mfp.npy                         representations for the full dataset
        mfp_10_fold.npy                 results on the 10-fold cross-validation data splits
        mfp_combinatorial.npy           representations for the full dataset
        mfp_combinatorial_10_fold.npy   results on the 10-fold cross-validation data splits
        mfp_stereo.npy                  representations for the full dataset (including stereochemistry)
        mfp_stereo_10_fold.npy          results on the 10-fold cross-validation data splits
    SLATM-model
        slatm_10_fold.npy               results on the 10-fold cross-validation data splits
        slatm_10_fold_xtb.npy           results on the 10-fold cross-validation data splits (xTB geometries)
    Geometries
        xyz                             DFT-level geometries
        xyz-xtb                         xTB-level geometries
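
To check an extracted copy against this listing, a small helper can group the contents of one dataset folder by role. The sketch below is an illustration only and is not part of the archive. The function name `inventory` is hypothetical, and it assumes that the archive has already been extracted, that the files sit directly inside each dataset folder (the model names in the listing being labels, not subfolders), and that every `*_10_fold*.npy` file holds results while the remaining `.npy` files, apart from `tr_sizes.npy`, hold representations.

```python
from pathlib import Path


def inventory(dataset_dir):
    """Group the entries of one dataset folder by the role the listing gives them.

    Hypothetical helper; assumes a flat folder layout (no per-model subfolders).
    """
    groups = {
        "tables": [],
        "results": [],
        "representations": [],
        "geometries": [],
        "other": [],
    }
    for path in sorted(Path(dataset_dir).iterdir()):
        name = path.name
        if path.is_dir() and name in ("xyz", "xyz-xtb"):
            # Geometry folders: DFT-level (xyz) and xTB-level (xyz-xtb).
            groups["geometries"].append(name)
        elif name.endswith(".csv"):
            # SMILES and target properties.
            groups["tables"].append(name)
        elif name.endswith(".npy") and "_10_fold" in name:
            # Assumed naming rule: 10-fold cross-validation results.
            groups["results"].append(name)
        elif name.endswith(".npy") and name != "tr_sizes.npy":
            # Assumed naming rule: representations for the full dataset.
            groups["representations"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Only inspects folders that exist in the current working directory.
    for folder in ("cyclo", "gdb7-22-ts", "proparg"):
        if Path(folder).is_dir():
            print(folder, inventory(folder))
```

The `.npy` extension suggests the binary files can be read with `numpy.load`, but the array shapes and contents are not described in this listing, so they should be inspected before use.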