* Preliminary
Clone the GitHub repository:
#+begin_src bash
git clone git@github.com:HaoZeke/brms_idrot_repro.git
#+end_src

* About
Contains the reproduction details for the publication on the performance and success models for the dimer method across rotational optimizers and external rotation removal.

** Reference
If you use this repository or any of its parts, please cite the corresponding publication or data source.

*** Preprint
#+begin_quote
R. Goswami, “Bayesian hierarchical models for quantitative estimates for performance metrics applied to saddle search algorithms,” May 19, 2025, arXiv:2505.13621. doi: 10.48550/arXiv.2505.13621.
#+end_quote

** Replication data
Before using the scripts in the repository, extract the archives obtained from the materialscloud record. Assuming that the ~.xz~ files are in ~data~ relative to the repository root, this can be done by running:
#+begin_src bash
# Assumption: GITROOT is the repository root; derive it from git if it is unset
GITROOT="${GITROOT:-$(git rev-parse --show-toplevel)}"

# Fitted models with predictions
cd "$GITROOT/data"
tar -xf models_and_preds.tar.xz && rm -f models_and_preds.tar.xz

# Raw benchmark data, i.e., EON output logs
mkdir -p "$GITROOT/bench_runs/runs/hpc"
cp "$GITROOT/data/hpc.tar.xz" "$GITROOT/bench_runs/runs/hpc"
cd "$GITROOT/bench_runs/runs/hpc"
tar -xf hpc.tar.xz && rm -f hpc.tar.xz
#+end_src

*** Reusing models
To reuse the models and predictions, both F.A.I.R. formatted results and easier-to-work-with ~R~ formats are provided.

**** From R
The record contains ~R~ objects for the ~brms~ models and their predictions. The model dependencies need to be loaded first. After this, the base ~R~ function ~readRDS~ will suffice:
#+begin_src R
library('brms')
model <- readRDS("data/models/brms_pes_cglbfgs_norot.rds")
#+end_src

The predictions are ~zstd~ compressed files (level 22), which need to be read using ~archive::file_read~ and ~readRDS~:
#+begin_src R
# Open a decompressing connection, read the object, then release the connection
con <- archive::file_read(file = "data/models/preds/brms_pes_cg_rotrem.rds")
res <- readRDS(con)
close(con)
#+end_src

More helper functions for generating and using these models and predictions are in the GitHub repository.
**** F.A.I.R. formatted usage
Without ~R~, the steps for access are a bit more involved. Predictions are provided as Apache Arrow Parquet files, along with the model training data. Each trained model is also exported from ~brms~ as ~stan~ code.

For each model (e.g., ~brms_pes_cg_rotrem~), we provide three key components:
- Stan Code (~.stan~): The complete model definition translated into the Stan programming language. This is the logic of the model.
  - File example: ~data/fair_forms/stancode/brms_pes_cg_rotrem.stan~
- Stan Data (~.parquet~): The data that was passed to the Stan model for fitting. This file is essential for re-running the model from scratch.
  - File example: ~data/fair_forms/standata_parquet/brms_pes_cg_rotrem_standata.zstd.parquet~
- Predictions (~.parquet~): The pre-computed predictions generated by our R run of the model. This is the most direct way to use the model's output.
  - File example: ~data/fair_forms/brms_pes_cg_rotrem_preds.zstd.parquet~

*** Structure
The repository itself is structured into code archives, benchmark runs, and scripts for analysis.
#+begin_src bash
➜ tree -L 2
.
├── bench_runs
│   ├── base_config.ini
│   ├── calc_rundata.py
│   ├── profiles
│   ├── readme.org
│   ├── rundata
│   ├── run_eon.py
│   ├── scripts
│   └── Snakefile
├── data
│   └── sella_si_data.zip
├── docs
│   └── source
├── LICENSE
├── pixi.lock
├── pixi.toml
├── readme.org
├── scripts
│   └── env_setup.sh
└── subrepos
    ├── ase
    ├── chemparseplot
    ├── eOn
    ├── IterativeRotationsAssignments
    ├── nwchem
    ├── pychumpchem
    └── rgpycrumbs
#+end_src

The data in the archives expand into locations within the benchmark runs. Each benchmark has the following structure:
#+begin_src bash
.
├── doublets
│   ├── 000
│   # .....
│   └── 234
└── singlets
    ├── 000
    # .....
    └── 264
#+end_src

Together these comprise 500 systems (235 doublets and 265 singlets).

*** EON Dimer runs
#+begin_src bash
# hpc.tar.xz
# $GITROOT/bench_runs/runs/hpc
➜ tree -L 3 .
.
├── cg
│   ├── no_rot_remove
│   │   ├── doublets
│   │   └── singlets
│   └── rot_remove
│       ├── doublets
│       └── singlets
└── lbfgs
    ├── no_rot_remove
    │   ├── doublets
    │   └── singlets
    └── rot_remove
        ├── doublets
        └── singlets
#+end_src
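
The layout above can be checked after extracting ~hpc.tar.xz~. The following is a minimal sketch, not part of the repository: the function names ~expected_run_dirs~ and ~check_run_dirs~ are hypothetical, and the only thing assumed about the data is the optimizer / rotation-removal / spin-state nesting shown in the tree.
#+begin_src bash
# Sketch (hypothetical helpers): enumerate the eight leaf directories
# implied by the tree above, relative to the extraction directory.
expected_run_dirs() {
  for opt in cg lbfgs; do
    for rot in no_rot_remove rot_remove; do
      for spin in doublets singlets; do
        printf '%s/%s/%s\n' "$opt" "$rot" "$spin"
      done
    done
  done
}

# Report every expected directory that is absent under the base directory
# given as $1; succeeds only when none are missing.
check_run_dirs() {
  base="$1"
  missing=0
  for rel in $(expected_run_dirs); do
    if [ ! -d "$base/$rel" ]; then
      printf 'missing: %s\n' "$base/$rel" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
#+end_src

With ~GITROOT~ set to the repository root, running ~check_run_dirs "$GITROOT/bench_runs/runs/hpc"~ exits with a non-zero status and lists the missing paths on standard error if the extraction was incomplete.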