Benchmarks
The ANI-1x dataset is a comprehensive collection of labelled molecular structures designed for training machine-learned potentials. ANI-1x was generated using an active learning approach to produce a diverse dataset covering the chemical space of organic molecules composed of C, H, N, and O atoms. Accurate energy and force labels are provided for each structure at the \(\omega\)B97x/6-31G(d) level of theory. Internally, files are downloaded from FigShare.
A collection of 7,165 small, saturated molecules containing up to 7 heavy atoms, with geometries relaxed using an empirical potential. Atomisation energies were calculated similarly to an FHI-AIMS implementation of the Perdew-Burke-Ernzerhof hybrid functional (PBE0). Original files were obtained from quantum-machine.org. Energies have been converted from kcal/mol to eV.
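As a rough illustration of the kcal/mol to eV conversion mentioned above, the sketch below derives the factor from exact SI constants. It assumes the thermochemical calorie (4.184 J); the exact factor used when preparing this dataset is not stated here, and the function name is hypothetical.

```python
# Exact SI defining constants (2019 redefinition).
AVOGADRO = 6.02214076e23             # particles per mol
ELEMENTARY_CHARGE = 1.602176634e-19  # joules per eV
JOULES_PER_KCAL = 4184.0             # assumption: thermochemical calorie

# 1 kcal/mol expressed in eV per particle.
KCAL_PER_MOL_TO_EV = JOULES_PER_KCAL / (AVOGADRO * ELEMENTARY_CHARGE)


def kcal_per_mol_to_ev(energy_kcal_per_mol: float) -> float:
    """Convert an energy from kcal/mol to eV (hypothetical helper)."""
    return energy_kcal_per_mol * KCAL_PER_MOL_TO_EV


# Example: an illustrative energy of -100 kcal/mol.
energy_ev = kcal_per_mol_to_ev(-100.0)
```

The factor comes out at roughly 0.04336 eV per kcal/mol, so energies shrink by a little more than a factor of 20 in magnitude.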
134k stable organic molecules composed of C, H, O, N, and F atoms and containing up to 9 heavy atoms. Each molecule’s geometry was relaxed at the PM7 semi-empirical level of theory before being labelled with DFT. For more information, see Quantum chemistry structures and properties of 134 kilo molecules. Internally, files are downloaded from FigShare. Energy labels are quoted in eV, relative to the isolated atoms of the molecule.
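To make "relative to the isolated atoms" concrete, the sketch below subtracts the summed isolated-atom energies from a molecule's total energy. The function name and the reference energies are placeholders chosen for illustration; they are not the values used for this dataset.

```python
from collections import Counter


def energy_relative_to_atoms(total_energy, symbols, isolated_atom_energies):
    """Return total_energy minus the sum of isolated-atom energies (all in eV).

    `symbols` lists the element of every atom in the molecule, and
    `isolated_atom_energies` maps element symbol -> energy of the free atom.
    """
    counts = Counter(symbols)
    reference = sum(n * isolated_atom_energies[el] for el, n in counts.items())
    return total_energy - reference


# Placeholder numbers, NOT real DFT values: a water-like molecule.
hypothetical_refs = {"H": -1.0, "O": -10.0}
relative = energy_relative_to_atoms(-13.5, ["O", "H", "H"], hypothetical_refs)
```

With these placeholder numbers the reference sum is -12.0 eV, so the relative energy is -1.5 eV; a negative value indicates the molecule is bound with respect to its free atoms.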
A dataset composed of a single MD trajectory for each of 10 molecules. Original structures are taken from Chmiela et al., with energy and force labels recalculated by Christensen and von Lilienfeld using “the PBE/def2-SVP level of theory [with] very tight SCF convergence and [a] very dense DFT integration grid”. The MD trajectories are presented one at a time, with structures within each trajectory in chronological order.
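Because the trajectories are stored one after another, the index range of each trajectory can be recovered from the per-trajectory lengths. The sketch below assumes those lengths are known; the function name and the example lengths are hypothetical and do not reflect the actual sizes in this dataset.

```python
from itertools import accumulate


def trajectory_slices(lengths):
    """Map per-trajectory lengths to slices into the concatenated dataset."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [slice(start, end) for start, end in zip(starts, ends)]


# Hypothetical example: three trajectories of 4, 2, and 3 structures.
structures = list(range(9))  # stand-in for the concatenated structures
slices = trajectory_slices([4, 2, 3])
second_trajectory = structures[slices[1]]
```

Since structures within a trajectory are in chronological order, neighbouring frames are likely to be strongly correlated, so it may be worth sampling with this in mind when building train and test splits.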