Benchmarks

ANI-1ccx

The ANI-1ccx dataset comprises an “optimally spanning” subset of the ANI-1x dataset, with each structure re-labelled with its total energy at the “gold standard” CCSD(T)/CBS level of theory. Internally, files are downloaded from FigShare.

ANI-1x

The ANI-1x dataset is a comprehensive collection of labelled molecular structures designed for training machine-learned potentials. ANI-1x was generated using an active learning approach to produce a diverse and useful dataset covering the chemical space of organic molecules composed of C, H, N, and O atoms. Accurate energy and force labels are provided for each structure at the \(\omega\)B97x/6-31G(d) level of theory. Internally, files are downloaded from FigShare.

QM7

A collection of 7,165 small, saturated molecules containing up to 7 heavy atoms, with geometries relaxed using an empirical potential. Atomisation energies were calculated similarly to an FHI-AIMS implementation of the Perdew-Burke-Ernzerhof hybrid functional (PBE0). Original files were obtained from quantum-machine.org. Energies have been converted from kcal/mol to eV.
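The kcal/mol to eV conversion mentioned above can be sketched as follows. This is a minimal illustration rather than the code used to prepare the dataset: it assumes the thermochemical calorie (4.184 J) and the exact SI values of the Avogadro constant and the elementary charge, and the function name `kcal_per_mol_to_ev` is hypothetical.

```python
# Assumed constants (exact by SI definition, plus the thermochemical calorie).
AVOGADRO = 6.02214076e23             # particles per mole
ELEMENTARY_CHARGE = 1.602176634e-19  # joules per electronvolt
JOULES_PER_KCAL = 4184.0             # thermochemical kilocalorie

# 1 kcal/mol expressed in eV per molecule (roughly 0.04336 eV).
KCAL_PER_MOL_TO_EV = JOULES_PER_KCAL / (AVOGADRO * ELEMENTARY_CHARGE)


def kcal_per_mol_to_ev(energy_kcal_per_mol: float) -> float:
    """Convert a molar energy in kcal/mol to a per-molecule energy in eV."""
    return energy_kcal_per_mol * KCAL_PER_MOL_TO_EV
```

Defining the factor from the underlying constants, rather than hard-coding a rounded value, makes the assumed calorie definition explicit.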

QM9

134k stable organic molecules made up of C, H, O, N and F, and containing up to 9 heavy atoms. Each molecule’s geometry was relaxed at the PM7 semi-empirical level of theory, before being labelled with DFT. For more information, see “Quantum chemistry structures and properties of 134 kilo molecules”. Internally, files are downloaded from FigShare. Energy labels are quoted in eV, relative to the isolated atoms of the molecule.
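To illustrate what “relative to the isolated atoms” means, the sketch below subtracts the summed isolated-atom energies from a molecule's total energy. It is an assumption about the general convention, not the dataset's own processing code; the function name `energy_relative_to_atoms` is hypothetical, and the isolated-atom energies must be supplied by the caller at the same level of theory as the total energy.

```python
from collections import Counter
from typing import Iterable, Mapping


def energy_relative_to_atoms(
    total_energy: float,
    symbols: Iterable[str],
    isolated_atom_energies: Mapping[str, float],
) -> float:
    """Return total_energy minus the sum of isolated-atom energies (same units)."""
    # Count how many atoms of each element the molecule contains.
    counts = Counter(symbols)
    # Reference energy: the molecule's atoms, each taken in isolation.
    reference = sum(
        n * isolated_atom_energies[element] for element, n in counts.items()
    )
    return total_energy - reference
```

With this sign convention, a bound molecule has a negative relative energy, since its total energy lies below that of its separated atoms.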