The dataset is provided in the `.arrow` (Apache Arrow/Feather) format, which stores a binary representation of a data frame.
It can be loaded in Python using the pandas library (this requires `pyarrow` to be installed) as follows:

    import pandas as pd
    df = pd.read_feather("<filename>.arrow")

This returns a pandas `DataFrame` with the following columns:

| key                 | description                                                |
|---------------------|------------------------------------------------------------|
| dir                 | The directory where the data came from                     |
| material            | The material name                                          |
| is_vdw              | Whether van der Waals corrections were used                |
| uv_iter             | The self-consistent step number                            |
| formula             | The chemical formula of the unit cell                      |
| cell                | The dimensions of the unit cell                            |
| n_atoms_uc          | The number of atoms in the unit cell                       |
| person              | The user who generated the data                            |
| structure_index     | A unique identifier for the structure this row belongs to  |
| pw_time_unix        | How long the pw.x calculation took                         |
| hp_time_unix        | How long the hp.x calculation took                         |
| param_in            | The input Hubbard parameter                                |
| param_out           | The output Hubbard parameter                               |
| param_type          | The parameter type of this row, can be "U" or "V"          |
| dist_bohr_in        | The interatomic distance before relaxation                 |
| dist_bohr_out       | The interatomic distance after relaxation                  |
| atom_1_idx          | The index of atom 1 in the atoms list                      |
| atom_1_idx_uc       | The unit cell index of atom 1 in the atoms list            |
| atom_1_element      | The element of atom 1                                      |
| atom_1_mass         | The mass of atom 1                                         |
| atom_1_z_valence    | The pseudopotential valence charge of atom 1               |
| atom_1_in_name      | The name used for atom 1 at the input                      |
| atom_1_in_type      | A unique index for atom 1 at the input                     |
| atom_1_out_name     | The name used for atom 1 at the output                     |
| atom_1_out_type     | A unique index for atom 1 at the output                    |
| atom_1_occs_1       | The flattened occupation matrix for atom 1, spin channel 1 |
| atom_1_occs_2       | The flattened occupation matrix for atom 1, spin channel 2 |
| atom_1_frac_coords  | The fractional coordinates of atom 1                       |
| atom_1_starting_mag | The starting magnetisation of atom 1                       |
| atom_1_final_mag    | The final magnetisation of atom 1                          |
| atom_2_idx          | The index of atom 2 in the atoms list                      |
| atom_2_idx_uc       | The unit cell index of atom 2 in the atoms list            |
| atom_2_element      | The element of atom 2                                      |
| atom_2_mass         | The mass of atom 2                                         |
| atom_2_z_valence    | The pseudopotential valence charge of atom 2               |
| atom_2_in_name      | The name used for atom 2 at the input                      |
| atom_2_in_type      | A unique index for atom 2 at the input                     |
| atom_2_out_name     | The name used for atom 2 at the output                     |
| atom_2_out_type     | A unique index for atom 2 at the output                    |
| atom_2_occs_1       | The flattened occupation matrix for atom 2, spin channel 1 |
| atom_2_occs_2       | The flattened occupation matrix for atom 2, spin channel 2 |
| atom_2_frac_coords  | The fractional coordinates of atom 2                       |
| atom_2_starting_mag | The starting magnetisation of atom 2                       |
| atom_2_final_mag    | The final magnetisation of atom 2                          |
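
The `atom_*_occs_*` columns store each occupation matrix flattened into a single sequence. The following is a minimal sketch for restoring the square matrix. It assumes the entries form a square matrix stored in row-major order (for example, 25 values for a 5x5 d-shell block). The helper name `occupation_matrix` is hypothetical and not part of the dataset or the `hubbardml` library.

```python
import math

import numpy as np


def occupation_matrix(flat) -> np.ndarray:
    """Reshape a flattened occupation matrix into a square 2-D array.

    Hypothetical helper: assumes the values form a square matrix
    stored in row-major order.
    """
    values = np.asarray(flat, dtype=float).ravel()
    n = math.isqrt(values.size)
    if n * n != values.size:
        # Not a perfect square, so the square-matrix assumption does not hold
        raise ValueError(f"expected a square number of entries, got {values.size}")
    return values.reshape(n, n)
```

Under the same assumptions, `np.trace(occupation_matrix(df.loc[0, "atom_1_occs_1"]))` would give the total occupation of that spin channel for atom 1 in the first row.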


If `param_type` is "U", atom 1 and atom 2 are the same atom, so the `atom_1_*` and `atom_2_*` columns contain duplicated data.
Otherwise (`param_type` is "V"), atoms 1 and 2 correspond to a transition metal and ligand pair.
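
Because U and V rows describe different kinds of parameters, it is often convenient to handle them separately. The following minimal sketch splits the frame by `param_type` and runs a quick consistency check that every U row refers to a single atom by comparing `atom_1_idx` with `atom_2_idx`. The function names are hypothetical and are not part of the `hubbardml` library.

```python
import pandas as pd


def split_by_param_type(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (U rows, V rows) as two separate frames (hypothetical helper)."""
    u_rows = df[df["param_type"] == "U"]
    v_rows = df[df["param_type"] == "V"]
    return u_rows, v_rows


def u_rows_are_on_site(u_rows: pd.DataFrame) -> bool:
    """Check that atom 1 and atom 2 are the same atom in every row."""
    return bool((u_rows["atom_1_idx"] == u_rows["atom_2_idx"]).all())
```

For example, after `df = pd.read_feather("<filename>.arrow")`, calling `u_df, v_df = split_by_param_type(df)` followed by `u_rows_are_on_site(u_df)` should return `True` if the U rows follow the convention described above.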

# Source code

The source code needed to perform training can be found in the file `hubbardml-v0.2.0.zip`.

To get started, run the following shell commands:

    unzip hubbardml-v0.2.0.zip
    cd camml-lab-hubbardml-1537fcb
    pip install -e .
    
This installs the Python library in editable mode. To train a model, run:

    python run.py experiment=predict_hp model=u
    
This trains a model to predict Hubbard U parameters using our supplied dataset.

