The basics

Under the hood, the graph-pes-train command performs the following steps:

1. loads your data, model, loss function, etc. This happens before anything else so that, if you run into errors, you can quickly identify the source of the problem.
2. optionally “pre-fits” the model on the training data. Under the hood, any torch.nn.Module components of your model that define a pre_fit method are passed the training data so that they can make any adjustments or calculations before training commences (see pre_fit_all_components() for details). This is useful for e.g. estimating energy scales and offsets from the training data.
3. trains the model using a PyTorch Lightning trainer.
4. saves the best model for later use, and “deploys” it for use in LAMMPS.
5. tests the best model on the training and validation data, together with any other test data you have specified.
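
The pre-fitting step can be pictured with a small, library-free sketch. Everything here is an assumption made for illustration: plain Python classes stand in for torch.nn.Module components, the training data is reduced to (number of atoms, total energy) pairs, and the names PerAtomOffset, Passthrough and pre_fit_components are hypothetical. The real mechanism is pre_fit_all_components().

```python
from statistics import mean


class PerAtomOffset:
    """Hypothetical component that estimates a constant per-atom energy offset."""

    def __init__(self):
        self.offset = 0.0

    def pre_fit(self, training_data):
        # Use the mean per-atom energy of the training set as the offset.
        self.offset = mean(energy / n_atoms for n_atoms, energy in training_data)


class Passthrough:
    """Hypothetical component that defines no pre_fit method, so it is skipped."""


def pre_fit_components(components, training_data):
    """Call pre_fit on every component that defines it (illustrative only)."""
    for component in components:
        pre_fit = getattr(component, "pre_fit", None)
        if callable(pre_fit):
            pre_fit(training_data)


# Toy training data: (number of atoms, total energy) per structure.
training_data = [(2, -10.0), (4, -22.0), (3, -15.0)]

offset_component = PerAtomOffset()
pre_fit_components([offset_component, Passthrough()], training_data)
print(offset_component.offset)
```

Only components that opt in by defining pre_fit are touched; everything else is left exactly as it was constructed.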

To control the behaviour of these steps, you need to pass graph-pes-train a nested dictionary of configuration options. These options are sourced from three places:

1. the default values defined in training-defaults.yaml
2. values you define in the config file(s) you pass to graph-pes-train: <config-1.yaml> <config-2.yaml> ...
3. additional command line arguments you pass to graph-pes-train: <nested/key=value> <nested/key=value> ...
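
As a rough mental model, the three sources can be thought of as nested dictionaries merged in order, with later sources taking precedence. The sketch below is not the graph-pes implementation: the helper names (deep_merge, parse_value, apply_override), the example keys, and the way command line values are parsed are all assumptions made for illustration.

```python
import ast
from copy import deepcopy


def deep_merge(base, update):
    """Return a new dict in which values from `update` override those in `base`."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def parse_value(text):
    """Interpret a command line value as a Python literal, else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_override(config, override):
    """Apply one `nested/key=value` override and return the updated copy."""
    path, raw_value = override.split("=", 1)
    *parents, leaf = path.split("/")
    result = deepcopy(config)
    node = result
    for name in parents:
        if not isinstance(node.get(name), dict):
            node[name] = {}
        node = node[name]
    node[leaf] = parse_value(raw_value)
    return result


# Hypothetical stand-ins for the three sources of options.
defaults = {"section": {"option_a": 1, "option_b": 2}}
user_config = {"section": {"option_b": 20}, "CUTOFF": 5.0}
overrides = ["section/option_c=hello", "CUTOFF=3.5"]

config = deep_merge(defaults, user_config)
for override in overrides:
    config = apply_override(config, override)

print(config)
```

Merging copies rather than mutating in place keeps each source intact, which makes it easy to check where a final value came from.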

The following .yaml config file contains the bare minimum information you need to specify in order to train a model:

minimal.yaml

# train a SchNet model...
model:
    +SchNet:
        layers: 3
        channels: 64
        cutoff: =/CUTOFF

# ...using some of the QM7 structures...
data:
    +load_atoms_dataset:
        id: QM7
        cutoff: =/CUTOFF
        n_train: 5_000
        n_valid: 100

# ...training on energy labels...
loss: +PerAtomEnergyLoss()

# ...using a cutoff of 5.0 Å
#    (referenced above)
CUTOFF: 5.0

To use this config file, while overriding the default CUTOFF value, you would run:

graph-pes-train minimal.yaml CUTOFF=3.5
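
Because both cutoff entries in minimal.yaml point at =/CUTOFF, this single override changes the model cutoff and the dataset cutoff together. The sketch below shows one way such references could be resolved on a plain dictionary. It is an illustration under stated assumptions, not the resolver graph-pes uses: the lookup and resolve helpers are hypothetical, and chained or cyclic references are not handled.

```python
def lookup(root, path):
    """Follow a /-separated path (e.g. "/CUTOFF") from the top of the config."""
    node = root
    for name in path.strip("/").split("/"):
        node = node[name]
    return node


def resolve(node, root):
    """Replace every "=/some/path" string with the value found at that path."""
    if isinstance(node, dict):
        return {key: resolve(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, root) for value in node]
    if isinstance(node, str) and node.startswith("=/"):
        return lookup(root, node[1:])
    return node


# minimal.yaml as a dictionary, with CUTOFF already overridden to 3.5.
config = {
    "model": {"+SchNet": {"layers": 3, "channels": 64, "cutoff": "=/CUTOFF"}},
    "data": {
        "+load_atoms_dataset": {
            "id": "QM7",
            "cutoff": "=/CUTOFF",
            "n_train": 5000,
            "n_valid": 100,
        }
    },
    "loss": "+PerAtomEnergyLoss()",
    "CUTOFF": 3.5,
}

resolved = resolve(config, config)
print(resolved["model"]["+SchNet"]["cutoff"])
print(resolved["data"]["+load_atoms_dataset"]["cutoff"])
```

Defining the cutoff once and referencing it keeps the model and the neighbour lists built for the dataset in agreement.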

Units

Provided that you use a consistent unit system, graph-pes doesn’t care which units you use.

What do we mean by consistent? If your energies are provided in units of A, and your lengths (positions and cell vectors) are provided in units of B, then:

forces should be in units of A/B
stress should be in units of A/B^3
virial should be in units of A

Common choices for A and B are:

A = eV and B = Å
A = kcal/mol and B = Å

but you could also use A = J and B = m if you wanted to!
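
The consistency rule can be written down as a small helper. This is only a sketch: derived_units and convert_labels are hypothetical names, not part of graph-pes, and the labels are treated as single scalar values for simplicity. The only physical constant used is the exact SI value of the electronvolt, 1 eV = 1.602176634e-19 J.

```python
import math

EV_IN_JOULES = 1.602176634e-19  # exact, by the SI definition of the eV
ANGSTROM_IN_METRES = 1e-10


def derived_units(energy_unit, length_unit):
    """Units implied for each property by energy unit A and length unit B."""
    return {
        "energy": energy_unit,
        "forces": f"{energy_unit}/{length_unit}",
        "stress": f"{energy_unit}/{length_unit}^3",
        "virial": energy_unit,
    }


def convert_labels(labels, energy_factor, length_factor):
    """Rescale scalar labels when changing both the energy and length units."""
    return {
        "energy": labels["energy"] * energy_factor,
        "forces": labels["forces"] * energy_factor / length_factor,
        "stress": labels["stress"] * energy_factor / length_factor**3,
        "virial": labels["virial"] * energy_factor,
    }


print(derived_units("eV", "Å"))

# One unit of each property in the eV / Å system, converted to J / m.
labels_ev_angstrom = {"energy": 1.0, "forces": 1.0, "stress": 1.0, "virial": 1.0}
labels_si = convert_labels(labels_ev_angstrom, EV_IN_JOULES, ANGSTROM_IN_METRES)
print(labels_si)
```

Converting every label with the same pair of factors is what keeps a dataset consistent; rescaling only the energies, for example, would leave the forces in the old units.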

See Theory for more details on the units of the various properties, and for the conventions adopted by graph-pes for calculating stresses and virials in particular.